RE: Re: [Caml-list] Persistence

* RE: Re: [Caml-list] Persistence
@ 2005-02-21 20:17 Don Syme
  2005-02-21 23:24 ` Jon Harrop
  0 siblings, 1 reply; 4+ messages in thread
From: Don Syme @ 2005-02-21 20:17 UTC (permalink / raw)
  To: Jon Harrop, Caml mailing list

Hi Jon,

Could you send around the source code to this?  I intuitively knew that
such performance results were likely, but had never had such a
convincing example to wave at people.  

It would also be interesting to ask some C++ programmers how they would
go about optimizing this code, for a couple of reasons

(a) to see if they come up with anything faster; and 

(b) to see if they would ever even think of using
sharing-of-complex-immutable-internal-data-structure-nodes as an
optimization technique.

Don

-----Original Message-----
From: caml-list-admin@yquem.inria.fr
[mailto:caml-list-admin@yquem.inria.fr] On Behalf Of Jon Harrop
Sent: 18 February 2005 09:40
To: caml-list@yquem.inria.fr
Subject: Fwd: Re: [Caml-list] Persistence

Many moons ago:

On Sunday 06 February 2005 22:26, Radu Grigore wrote:
> I had written:
> > The STL will probably take <<1ms because the STL's size() is
probably
> > O(1) whereas Map's fold is O(n). In contrast, forking lineages is
> > probably O(n) with the STL (requiring a copy) and O(1) with ocaml's
Map.
>
> I haven't actually tested with STL but the implementation of size() is
> indeed a simple return. Do you have an example where forking lineages
> is useful?

I do now. :-)

I've just finished converting one of my example scientific programs from
OCaml to C++. The program basically computes the "n"th nearest
neighbours of
a vertex in a graph. The implementations are based upon sets of
vertices.

The OCaml version simply implements the obvious mathematical expression.
The
only difference is that reuse is implicit in the maths (which is purely
functional) and has to be made explicit in the OCaml code by memoizing
the
function.

My first stab at a C++ version did exactly the same thing. As you know,
OCaml
performance is supposed to to be within a factor of 2 of C/C++. Somewhat
suprisingly, on my first timed run the C++ code turned out to be over
100
times slower than the OCaml. After some profiling it transpired that
virtually all of the time was spent computing set unions which I had
stupidly
implemented using the STL set_union function. ;-)

The reason is quite simple. Thanks to the purely functional
implementation in
OCaml, sets are immutable. OCaml's Set.union function exploits this by
reusing pieces of tree from the input sets in the output ("copying"
arbitrarily-large subsets in O(1) time), happily knowing that they
cannot be
altered afterwards. In contrast, the C++ implementation of set_union can
make
no such assumption and is forced to copy most of the elements from both
of
the input sets. The performance of set_union in the STL is, therefore,
vastly
worse than the performance of Set.union in OCaml, thanks to persistence.

In summary, like me, you've probably been using persistence a lot
without
realising it. Every time you do set operations, for example.

HTH.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

^ permalink raw reply	[flat|nested] 4+ messages in thread