* RE: Re: [Caml-list] Persistence @ 2005-02-21 20:17 Don Syme 2005-02-21 23:24 ` Jon Harrop 0 siblings, 1 reply; 4+ messages in thread From: Don Syme @ 2005-02-21 20:17 UTC (permalink / raw) To: Jon Harrop, Caml mailing list Hi Jon, Could you send around the source code to this? I intuitively knew that such performance results were likely, but had never had such a convincing example to wave at people. It would also be interesting to ask some C++ programmers how they would go about optimizing this code, for a couple of reasons (a) to see if they come up with anything faster; and (b) to see if they would ever even think of using sharing-of-complex-immutable-internal-data-structure-nodes as an optimization technique. Don -----Original Message----- From: caml-list-admin@yquem.inria.fr [mailto:caml-list-admin@yquem.inria.fr] On Behalf Of Jon Harrop Sent: 18 February 2005 09:40 To: caml-list@yquem.inria.fr Subject: Fwd: Re: [Caml-list] Persistence Many moons ago: On Sunday 06 February 2005 22:26, Radu Grigore wrote: > I had written: > > The STL will probably take <<1ms because the STL's size() is probably > > O(1) whereas Map's fold is O(n). In contrast, forking lineages is > > probably O(n) with the STL (requiring a copy) and O(1) with ocaml's Map. > > I haven't actually tested with STL but the implementation of size() is > indeed a simple return. Do you have an example where forking lineages > is useful? I do now. :-) I've just finished converting one of my example scientific programs from OCaml to C++. The program basically computes the "n"th nearest neighbours of a vertex in a graph. The implementations are based upon sets of vertices. The OCaml version simply implements the obvious mathematical expression. The only difference is that reuse is implicit in the maths (which is purely functional) and has to be made explicit in the OCaml code by memoizing the function. My first stab at a C++ version did exactly the same thing. As you know, OCaml performance is supposed to to be within a factor of 2 of C/C++. Somewhat suprisingly, on my first timed run the C++ code turned out to be over 100 times slower than the OCaml. After some profiling it transpired that virtually all of the time was spent computing set unions which I had stupidly implemented using the STL set_union function. ;-) The reason is quite simple. Thanks to the purely functional implementation in OCaml, sets are immutable. OCaml's Set.union function exploits this by reusing pieces of tree from the input sets in the output ("copying" arbitrarily-large subsets in O(1) time), happily knowing that they cannot be altered afterwards. In contrast, the C++ implementation of set_union can make no such assumption and is forced to copy most of the elements from both of the input sets. The performance of set_union in the STL is, therefore, vastly worse than the performance of Set.union in OCaml, thanks to persistence. In summary, like me, you've probably been using persistence a lot without realising it. Every time you do set operations, for example. HTH. -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. _______________________________________________ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Caml-list] Persistence 2005-02-21 20:17 Re: [Caml-list] Persistence Don Syme @ 2005-02-21 23:24 ` Jon Harrop 2005-02-22 9:06 ` Diego Olivier Fernandez Pons 0 siblings, 1 reply; 4+ messages in thread From: Jon Harrop @ 2005-02-21 23:24 UTC (permalink / raw) To: Don Syme; +Cc: Caml mailing list On Monday 21 February 2005 20:17, Don Syme wrote: > Could you send around the source code to this? I intuitively knew that > such performance results were likely, but had never had such a > convincing example to wave at people. Yes. The implementation of traveling salesman given in my book is another example which is several orders of magnitude faster than the C/C++/Fortran implementations given in all of the other text books that I have read. This problem also happens to be a defacto standard test given to most natural science undergrads on their computing courses, which should be good for book sales. Tell everyone you know. :-) > It would also be interesting to ask some C++ programmers how they would > go about optimizing this code, for a couple of reasons > > (a) to see if they come up with anything faster; and I managed to, after a while. However, doing this optimisation required me to have several things: 1. Estimates of run-time sizes of data structures. 2. Knowledge of the internals of the STL set implementation. 3. Knowledge of the complexities of various operations over STL sets. 4. Huge kahunas, as the resulting code is much more likely to be wrong. The optimised C++ is also specialised by (1), whereas the OCaml code will always be efficient (because Xavier wrote the hard part ;-). Also, the C++ implementation consumes twice as much memory (which could be a real killer as many scientific computations use resources to the limit of the available computer). Both the OCaml and the C++ can probably be optimised more, but I'm interested in problems where author-time is comparable to execution-time. Were I to continue optimising, I would move to hash sets, but these require an imperative approach which negates the principal advantage of the current OCaml implementation: persistence. > (b) to see if they would ever even think of using > sharing-of-complex-immutable-internal-data-structure-nodes as an > optimization technique. No way. :-) Also (from the traveling salesman solution): (c) to see if they would ever even think of reinventing closures. For people who just want a quick look, here is the OCaml code for the core function: let rec nth_nn = let memory = Hashtbl.create 1 in fun n (i, io) -> try Hashtbl.find memory (n, i) with Not_found -> match n with 0 -> AtomSet.singleton (i, io) | 1 -> let nn = bonds.(i - 1) in if io = zero then nn else let aux (j, jo) s = AtomSet.add (j, add_i io jo) s in AtomSet.fold aux nn AtomSet.empty | n -> let pprev = nth_nn (n-2) (i, io) in let prev = nth_nn (n-1) (i, io) in let aux j t = AtomSet.union (nth_nn 1 j) t in let t = AtomSet.fold aux prev AtomSet.empty in let t = AtomSet.diff (AtomSet.diff t prev) pprev in Hashtbl.add memory (n, i) t; t in $ time ./nth 30 1 <cfg-diamond >tmp real 0m4.167s user 0m3.940s sys 0m0.000s $ time ./nth.opt 30 1 <cfg-diamond >tmp real 0m0.492s user 0m0.450s sys 0m0.000s Here's the elegant C++ equivalent: const AtomSet nth_nn(int n, int i, const vector<int> io) { const Map::key_type k=make_pair(n, make_pair(i, io)); Map::const_iterator m=memory.find(k); if (m != memory.end()) return m->second; AtomSet s; if (n == 0) { s.insert(make_pair(i, io)); return s; } else if (n == 1) { const AtomSet &nn=bonds[i-1]; for (AtomSet::const_iterator it=nn.begin(); it != nn.end(); it++) { int j = it->first; vector<int> jo = it->second; for (i=0; i<d; i++) jo[i] += io[i]; s.insert(make_pair(j, jo)); } return s; } else { const AtomSet &pprev = nth_nn(n-2, i, io), &prev = nth_nn(n-1, i, io); for (AtomSet::const_iterator it=prev.begin(); it != prev.end(); it++) { const AtomSet &t = nth_nn(1, it->first, it->second); AtomSet t2; set_union(s.begin(), s.end(), t.begin(), t.end(), inserter(t2, t2.begin())); s=t2; } AtomSet s2; set_difference(s.begin(), s.end(), prev.begin(), prev.end(), inserter(s2, s2.begin())); s=AtomSet(); set_difference(s2.begin(), s2.end(), pprev.begin(), pprev.end(), inserter(s, s.begin())); } memory[k] = s; return memory.find(k)->second; } With this C++, I get: $ time ./nth 30 1 <cfg-diamond >tmp real 1m31.178s user 1m12.680s sys 0m0.100s With my sneaky optimisation: $ time ./nth 30 1 <cfg-diamond >tmp real 0m0.479s user 0m0.480s sys 0m0.000s I've just noticed that the timing results are quite interesting functions of "n". I think I'll plot them... -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://ffconsultancy.com ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Caml-list] Persistence 2005-02-21 23:24 ` Jon Harrop @ 2005-02-22 9:06 ` Diego Olivier Fernandez Pons 2005-02-22 11:59 ` Thomas Fischbacher 0 siblings, 1 reply; 4+ messages in thread From: Diego Olivier Fernandez Pons @ 2005-02-22 9:06 UTC (permalink / raw) To: caml-list Bonjour, > > Could you send around the source code to this? I intuitively knew > > that such performance results were likely, but had never had such > > a convincing example to wave at people. > > Yes. The implementation of traveling salesman given in my book is > another example which is several orders of magnitude faster than the > C/C++/Fortran implementations given in all of the other text books > that I have read. Branch-and-bound algorithms (pure integer, not LP based) are good candidates for important speed-ups due to data persistence : when you backtrack it is most of the time much faster to restart from a memorized data structure than to reconstruct it, specially with structured data like ordered sets, graphs, etc. Generally speaking, the _good_ paradigm to program B&B solvers for NP-hard problems is mixed functional/imperative programming (in other words ML), because you benefit from fast recursion, data persistence, garbage collection while you use very few floating point computations. Actually, most of the commercial C/C++ libraries for constraint programming (= integer branch and bound) include a 'functional layer' that contains one or several garbage collectors, persistent data structures, a system to make persistent user-defined data structures, function objects, etc. Winning against a mixed imperative/functional implementation in e.g. C/C++ require a lot of low-level implementation work - and you will end anyway implementing some kind of 'functional core'. Worse, algorithmic wins (better algorithms) are most of the time several order of magnitude superior than implementation wins (better implementation). Therefore it is also more convenient to use a high level language like Caml in order to focalize on algorithms rather than implementation. Diego Olivier ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Caml-list] Persistence 2005-02-22 9:06 ` Diego Olivier Fernandez Pons @ 2005-02-22 11:59 ` Thomas Fischbacher 0 siblings, 0 replies; 4+ messages in thread From: Thomas Fischbacher @ 2005-02-22 11:59 UTC (permalink / raw) To: Diego Olivier Fernandez Pons; +Cc: caml-list On Tue, 22 Feb 2005, Diego Olivier Fernandez Pons wrote: > Winning against a mixed imperative/functional implementation in e.g. > C/C++ require a lot of low-level implementation work - and you will > end anyway implementing some kind of 'functional core'. Greenspun's tenth rule, isn't it? -- regards, tf@cip.physik.uni-muenchen.de (o_ Thomas Fischbacher - http://www.cip.physik.uni-muenchen.de/~tf //\ (lambda (n) ((lambda (p q r) (p p q r)) (lambda (g x y) V_/_ (if (= x 0) y (g g (- x 1) (* x y)))) n 1)) (Debian GNU) ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2005-02-22 11:59 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2005-02-21 20:17 Re: [Caml-list] Persistence Don Syme 2005-02-21 23:24 ` Jon Harrop 2005-02-22 9:06 ` Diego Olivier Fernandez Pons 2005-02-22 11:59 ` Thomas Fischbacher
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox