Mailing list for all users of the OCaml language and system.
 help / color / mirror / Atom feed
From: Christophe Raffalli <christophe.raffalli@univ-savoie.fr>
To: Jon Harrop <jon@ffconsultancy.com>
Cc: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] strange timing between search trees
Date: Fri, 25 Jan 2008 19:12:30 +0100	[thread overview]
Message-ID: <479A268E.4080907@univ-savoie.fr> (raw)
In-Reply-To: <200801250228.07262.jon@ffconsultancy.com>

Jon Harrop a écrit :
> On Friday 25 January 2008 01:32:49 Christophe Raffalli wrote:
>   
>> Dear list members,
>>
>> I wanted to compare 2-3-4 trees (look in wikipedia if you do not know
>> them) with the balanced trees of the standard library.
>> I expected the 2-3-4 to be much faster for search because they use much
>> less indirections. However, I thought
>> that construction should make little difference ...
>>
>> I was wrong :
>> Construction is arround 20% faster for 2-3-4 trees
>> Search is slower arround 5-10% for 2-3-4 trees (the diff gets smaller
>> when the trees are larger which is expected)
>>
>> I wonder if the difference in code size is the explanation (the search
>> function for balanced trees is really small and fits better
>> in cache) ?
>>     
>
> I doubt it. You can make the built-in Set module much faster by increasing the 
> code size.
>
>   
>> I attach my code with two files (the code and the test, compile with
>> ocamlopt unix.cmxa set234.ml test234tree.ml)
>>
>> Any remarks or comments ?
>>     
>
> I get quantitatively similar performance results. You're missing a lot of 
> information in your benchmarking though:
>
> What happens if the sets are constructed in-order (affects locality)?
>
> What happens if you iterate the benchmark over an array of preallocated sets?
>
>   
I known, but I was quite surprised enough with these cases.

I now have an explanation: for binary balanced trees, at least one son 
of every node is near to its ancestor,
because you create one branch at each insertion (this is true if 
comparison do not allocate and if the GC do not break locality to much, but
I think copying GC will mostly keep this property if there is not too 
much sharing ?) ... Unfortunately,
making comparison allocates memory changed nothing to my timing so I am 
not sure my explanation is reasonnable ...
At least, this shows that there is no room for a lot of improvement by 
building larger nodes ...
> Ultimately, I found it much more productive to simply optimize the Set module 
> that comes with OCaml. I'll describe how in a later OCaml Journal article but 
> the main ideas are to get the comparison function inlined (OCaml doesn't 
> inline across functor boundaries) and add a new node type for leaves. The 
> latter greatly reduces the stress on the GC, which is the main performance 
> bottleneck of the current implementation, and I found it to be up to 30% 
> faster.
>
>   
Are you sure the leaf node is a gain ? I just tried that and it failed 
(slow down) on my code (not a benchmark, my real code).

Do you add one node for Leaf, or Three (one node when the two son are 
empty and two when one of the son is empty) ?
Do you make more subtil changes ?

And could you make your optimized set library available ?


Best regards,
Christophe


PS: yet another remark: filter and partition seem suboptimal in the set 
standard library ?

Why aren't they written with join and merge as in:

    let rec filter p = function
        | Empty -> Empty
        | Node(l, v, r, _) ->
        if p v then join (filter p l) v (filter p r)
        else merge (filter p l) (filter p r)

    let rec partition p = function
        | Empty -> (Empty, Empty)
        | Node(l, v, r, _) ->
        let (lt, lf) = partition p l in
        let (rt, rf) = partition p r in
        if p v then join lt v rt, merge lf rf else
          merge lt rt, join lf v rf


  reply	other threads:[~2008-01-25 18:08 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-25  1:32 Christophe Raffalli
2008-01-25  2:28 ` [Caml-list] " Jon Harrop
2008-01-25 18:12   ` Christophe Raffalli [this message]
2008-01-25  2:31 ` Jon Harrop
2008-01-30 17:50 ` Christophe Raffalli
2008-01-30 18:10   ` Edgar Friendly

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=479A268E.4080907@univ-savoie.fr \
    --to=christophe.raffalli@univ-savoie.fr \
    --cc=caml-list@yquem.inria.fr \
    --cc=jon@ffconsultancy.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox