From: Jon Harrop <jon@ffconsultancy.com>
To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] strange timing between search trees
Date: Fri, 25 Jan 2008 02:28:07 +0000 [thread overview]
Message-ID: <200801250228.07262.jon@ffconsultancy.com> (raw)
In-Reply-To: <47993C41.1050900@univ-savoie.fr>
On Friday 25 January 2008 01:32:49 Christophe Raffalli wrote:
> Dear list members,
>
> I wanted to compare 2-3-4 trees (look in wikipedia if you do not know
> them) with the balanced trees of the standard library.
> I expected the 2-3-4 to be much faster for search because they use much
> less indirections. However, I thought
> that construction should make little difference ...
>
> I was wrong :
> Construction is arround 20% faster for 2-3-4 trees
> Search is slower arround 5-10% for 2-3-4 trees (the diff gets smaller
> when the trees are larger which is expected)
>
> I wonder if the difference in code size is the explanation (the search
> function for balanced trees is really small and fits better
> in cache) ?
I doubt it. You can make the built-in Set module much faster by increasing the
code size.
> I attach my code with two files (the code and the test, compile with
> ocamlopt unix.cmxa set234.ml test234tree.ml)
>
> Any remarks or comments ?
I get quantitatively similar performance results. You're missing a lot of
information in your benchmarking though:
What happens if the sets are constructed in-order (affects locality)?
What happens if you iterate the benchmark over an array of preallocated sets?
I did lots of benchmarking along these lines for the optimization chapter in
OCaml for Scientists (including benchmarking sets). I found that there are
several alternative set implementations out there but they all give almost
identical performance (as you're observing). However, the AVL trees of the
built-in sets provide asymptotically more efficient set-theoretic operations
(union, diff, inter) than most other implementations.
I also found that the benchmarking strategy more representative of the real
code that I had was to iterate over preallocate sets, i.e. more cache misses.
Ultimately, I found it much more productive to simply optimize the Set module
that comes with OCaml. I'll describe how in a later OCaml Journal article but
the main ideas are to get the comparison function inlined (OCaml doesn't
inline across functor boundaries) and add a new node type for leaves. The
latter greatly reduces the stress on the GC, which is the main performance
bottleneck of the current implementation, and I found it to be up to 30%
faster.
We've also discussed this in detail before here so you might find more
information in the caml-list archives.
HTH.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
next prev parent reply other threads:[~2008-01-25 2:34 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-01-25 1:32 Christophe Raffalli
2008-01-25 2:28 ` Jon Harrop [this message]
2008-01-25 18:12 ` [Caml-list] " Christophe Raffalli
2008-01-25 2:31 ` Jon Harrop
2008-01-30 17:50 ` Christophe Raffalli
2008-01-30 18:10 ` Edgar Friendly
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200801250228.07262.jon@ffconsultancy.com \
--to=jon@ffconsultancy.com \
--cc=caml-list@yquem.inria.fr \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox