* Re: OCaml is broken @ 2009-12-19 19:38 Jeff Shaw 2009-12-20 4:43 ` [Caml-list] " Jon Harrop 2009-12-20 11:56 ` [***SPAM*** Score/Req: 10.1/8.0] [Caml-list] Re: OCaml is broken Erik Rigtorp 0 siblings, 2 replies; 57+ messages in thread From: Jeff Shaw @ 2009-12-19 19:38 UTC (permalink / raw) To: caml-list My understanding is that since jocaml uses the regular ocaml runtime, it is also not multicore enabled. Haskell is a functional language that has good performance that can use multiple processors, but the learning curve is steeper and higher. OCaml is a close relative of Standard ML, so there might be some implementation of SML that you like. MLTon might allow multicore use, but I'm not sure how mature it is. SML/NJ has a library or language extension called Concurrent ML, but I think SML/NJ might not use multiple processors. Note that if you're not using a lot of threads, you can use Unix.fork to do true multithreaded programming ocaml. ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: OCaml is broken 2009-12-19 19:38 OCaml is broken Jeff Shaw @ 2009-12-20 4:43 ` Jon Harrop 2009-12-20 12:21 ` [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp 2009-12-21 13:31 ` multicore wish [Was: Re: [Caml-list] Re: OCaml is broken] Goswin von Brederlow 2009-12-20 11:56 ` [***SPAM*** Score/Req: 10.1/8.0] [Caml-list] Re: OCaml is broken Erik Rigtorp 1 sibling, 2 replies; 57+ messages in thread From: Jon Harrop @ 2009-12-20 4:43 UTC (permalink / raw) To: caml-list On Saturday 19 December 2009 19:38:41 Jeff Shaw wrote: > My understanding is that since jocaml uses the regular ocaml runtime, it > is also not multicore enabled. > > Haskell is a functional language that has good performance GHC and the Haskell language itself have serious performance problems. > that can use multiple processors, but the learning curve is steeper and > higher. And Haskell lacks many of the features OCaml programmers take for granted. > OCaml is a close relative of Standard ML, so there might be some > implementation of SML that you like. MLTon might allow multicore use, > but I'm not sure how mature it is. SML/NJ has a library or language > extension called Concurrent ML, but I think SML/NJ might not use > multiple processors. MLton and SML/NJ are both multicore incapable. The PolyML implementation of SML is multicore friendly but last time I looked (many years ago) it was 100x slower than OCaml for floating point. As long as you're looking at OCaml's close relatives with multicore support, F# is your only viable option. Soon, HLVM will provide a cross-platform open source solution. If you look further you will also find Scala and Clojure. > Note that if you're not using a lot of threads, you can use Unix.fork to > do true multithreaded programming ocaml. We've discussed the problems with that before. Writing a parallel generic quicksort seems to be a good test of a decent multicore capable language implementation. Currently, F# is a *long* way ahead of everything open source. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [***SPAM*** Score/Req: 10.1/8.0] Re: [Caml-list] Re: OCaml is broken 2009-12-20 4:43 ` [Caml-list] " Jon Harrop @ 2009-12-20 12:21 ` Erik Rigtorp 2009-12-20 13:22 ` Martin Jambon ` (4 more replies) 2009-12-21 13:31 ` multicore wish [Was: Re: [Caml-list] Re: OCaml is broken] Goswin von Brederlow 1 sibling, 5 replies; 57+ messages in thread From: Erik Rigtorp @ 2009-12-20 12:21 UTC (permalink / raw) Cc: caml-list On Sun, Dec 20, 2009 at 05:43, Jon Harrop <jon@ffconsultancy.com> wrote: > As long as you're looking at OCaml's close relatives with multicore support, > F# is your only viable option. Soon, HLVM will provide a cross-platform open > source solution. If you look further you will also find Scala and Clojure. F# is not viable since i'm developing for Solaris. I also believe the .NET GC is not good enough for real-time systems. Clojure running under real-time Java might be interesting. It's too bad that INRIA is not interested in fixing this bug. No matter what people say I consider this a bug. Two cores is standard by now, I'm used to 8, next year 32 and so on. OCaml will only become more and more irrelevant. I hate to see that happening. I think right now only Erlang got this right and they have a great library for developing enterprise applications too! The first step for OCaml would be to be able to run multiple communicating instances of the runtime bound to one core each in one process and have them communicate via lock free queues. Erik Rigtorp ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [***SPAM*** Score/Req: 10.1/8.0] Re: [Caml-list] Re: OCaml is broken 2009-12-20 12:21 ` [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp @ 2009-12-20 13:22 ` Martin Jambon 2009-12-20 13:47 ` Yaron Minsky ` (3 subsequent siblings) 4 siblings, 0 replies; 57+ messages in thread From: Martin Jambon @ 2009-12-20 13:22 UTC (permalink / raw) To: Erik Rigtorp; +Cc: caml-list Erik Rigtorp wrote: > It's too bad that INRIA is not interested in fixing this bug. Ask Santa Claus, you'll get it by Friday. Free shipping. ;-) Martin ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [***SPAM*** Score/Req: 10.1/8.0] Re: [Caml-list] Re: OCaml is broken 2009-12-20 12:21 ` [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp 2009-12-20 13:22 ` Martin Jambon @ 2009-12-20 13:47 ` Yaron Minsky 2009-12-20 16:01 ` Gerd Stolpmann 2009-12-21 22:50 ` [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp 2009-12-20 14:27 ` Dario Teixeira ` (2 subsequent siblings) 4 siblings, 2 replies; 57+ messages in thread From: Yaron Minsky @ 2009-12-20 13:47 UTC (permalink / raw) To: Erik Rigtorp; +Cc: caml-list [-- Attachment #1: Type: text/plain, Size: 1543 bytes --] On Sun, Dec 20, 2009 at 7:21 AM, Erik Rigtorp <erik@rigtorp.com> wrote: > The first step for OCaml would be to be able to run multiple > communicating instances of the runtime bound to one core each in one > process and have them communicate via lock free queues. We've done some experiments in this direction at Jane Street. On Linux, we've been able to get fast enough IPC channels for our purposes that slamming things into the same memory space has not in the end been necessary. (There is I agree some pain associated with running multiple runtimes in the same process. If you're interested, contact me off-list and I can try to get you some of the details of what we ran into.) But have you tried using shared-memory segments for communicating between different processes? You say the latencies are too high, but do you have any measurements you could share? Have you tried queues using shared memory segments, in particular? Inter-thread communication has latency as well, and the performance issues depend on lots of things, OS and hardware platform included. It would help in understanding the tradeoffs. As we go to higher-and-higher numbers of cores, I suspect that message-passing solutions are likely to scale better than shared memory, so I'm not so sure that OCaml is on the wrong path here. I think that most of the work that's needed is going to come in the form of libraries, with only a little work in the compiler and the runtime. Given that, I think this is an issue for the community to solve, not INRIA. y [-- Attachment #2: Type: text/html, Size: 1916 bytes --] ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: OCaml is broken 2009-12-20 13:47 ` Yaron Minsky @ 2009-12-20 16:01 ` Gerd Stolpmann 2009-12-21 22:50 ` [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp 1 sibling, 0 replies; 57+ messages in thread From: Gerd Stolpmann @ 2009-12-20 16:01 UTC (permalink / raw) To: yminsky; +Cc: Erik Rigtorp, caml-list Am Sonntag, den 20.12.2009, 08:47 -0500 schrieb Yaron Minsky: > On Sun, Dec 20, 2009 at 7:21 AM, Erik Rigtorp <erik@rigtorp.com> > wrote: > The first step for OCaml would be to be able to run multiple > communicating instances of the runtime bound to one core each > in one > process and have them communicate via lock free queues. > > > We've done some experiments in this direction at Jane Street. On > Linux, we've been able to get fast enough IPC channels for our > purposes that slamming things into the same memory space has not in > the end been necessary. (There is I agree some pain associated with > running multiple runtimes in the same process. If you're interested, > contact me off-list and I can try to get you some of the details of > what we ran into.) > > > But have you tried using shared-memory segments for communicating > between different processes? You say the latencies are too high, but > do you have any measurements you could share? Have you tried queues > using shared memory segments, in particular? Inter-thread > communication has latency as well, and the performance issues depend > on lots of things, OS and hardware platform included. It would help > in understanding the tradeoffs. I'm also experimenting now with shared memory (shm) as fast IPC mechanism. I've extended ocamlnet with a few functions that allow to copy an ocaml value into a shm segment which is accessible as bigarray: https://godirepo.camlcity.org/svn/lib-ocamlnet2/trunk/code/src/netsys/netsys_mem.mli Look especially for init_string. (I've also to mention Ancient here which inspired to this work.) Having ocaml values in shm saves us from some marshalling costs which is right now the biggest performance penalty when using multiple processes. However, this causes some problems, and at some point modifications of the ocaml runtime will be necessary: - The polymorphic equality and hash primitives do not work anymore for values in such shm segments (and that really hurts, especially string comparison is broken) - Given that the shm segment is set to read-only after being set up, it is not possible to have pointers from shm to other memory regions. This is good, as this would be very dangerous (GC may delete or move values in the regular heap). However, the question arises when the shm segment can be deleted. We would need help by the GC to identify segments that are no longer referenced. Without that, shm will be restricted to a role as low-level inter-process buffers. > As we go to higher-and-higher numbers of cores, I suspect that > message-passing solutions are likely to scale better than shared > memory, so I'm not so sure that OCaml is on the wrong path here. I > think that most of the work that's needed is going to come in the form > of libraries, with only a little work in the compiler and the > runtime. Given that, I think this is an issue for the community to > solve, not INRIA. Well, message passing and shm do not exclude each other. We should refine the terminology here: Actually, shm is just a basic mechanism where several execution threads (including processes) can share memory. What's often meant is, however, the role it plays for multi-threading, i.e. shared mutable data structures. What's typical here is that several threads write to the same memory regions. I don't know a good name for naming that programming style - maybe multi-threading style shm is the best. I'm working on a local message passing queue that can be used for long messages, based on shm, and where the messages can contain normal ocaml values (although it is likely that these are copied to the normal heap by the receiver, for the above mentioned reasons, but this is an expensive copy). The whole point will be that the data marshalling costs are minimized. So far I can already say, we will need some changes in the runtime to make such a mechanism fast and safe. Gerd -- ------------------------------------------------------------ Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de Phone: +49-6151-153855 Fax: +49-6151-997714 ------------------------------------------------------------ ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req: 10.1/8.0] Re: [Caml-list] Re: OCaml is broken 2009-12-20 13:47 ` Yaron Minsky 2009-12-20 16:01 ` Gerd Stolpmann @ 2009-12-21 22:50 ` Erik Rigtorp 2009-12-22 12:04 ` Erik Rigtorp 2009-12-29 12:07 ` [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req: 10.1/8.0] " Richard Jones 1 sibling, 2 replies; 57+ messages in thread From: Erik Rigtorp @ 2009-12-21 22:50 UTC (permalink / raw) To: yminsky; +Cc: caml-list On Sun, Dec 20, 2009 at 14:47, Yaron Minsky <yminsky@gmail.com> wrote: > On Sun, Dec 20, 2009 at 7:21 AM, Erik Rigtorp <erik@rigtorp.com> wrote: >> >> The first step for OCaml would be to be able to run multiple >> communicating instances of the runtime bound to one core each in one >> process and have them communicate via lock free queues. > > We've done some experiments in this direction at Jane Street. On Linux, > we've been able to get fast enough IPC channels for our purposes that > slamming things into the same memory space has not in the end been > necessary. (There is I agree some pain associated with running multiple > runtimes in the same process. If you're interested, contact me off-list and > I can try to get you some of the details of what we ran into.) > But have you tried using shared-memory segments for communicating between > different processes? You say the latencies are too high, but do you have > any measurements you could share? Have you tried queues using shared memory > segments, in particular? Inter-thread communication has latency as well, > and the performance issues depend on lots of things, OS and hardware > platform included. It would help in understanding the tradeoffs. Some IPC Benchmarks, Solaris 10 on a quad core Intel Core2 Duo. The benchmarks are running on a cpuset with 1 core. I measure the time from sending in one process until the other process receives the message. So a context switch and the message passing is included in the measurements. Max/Min/Avg * Pipes: 28205/5973/6259 * Unix domain sockets: 44256/7748/8153 * SYSv message queues: 19197/5895/6173 * Posix message queues: 37399/10965/11303 * TCP on loopback: 29017/7471/7885 So the latency is roughly 10µs for all these solutions. That latency is pretty high and would be several times the processing time of the message itself. I haven't tried using shm for IPC yet. I'll try see if I can do something with this lock-free queue: http://www.liblfds.org/. > As we go to higher-and-higher numbers of cores, I suspect that > message-passing solutions are likely to scale better than shared memory, so > I'm not so sure that OCaml is on the wrong path here. I think that most of I agree, but the message passing must be cheap and the creation and scheduling of lightweight threads must be fast. In order to do that the runtime needs to handle the communication via shared memory between OS threads or processes, binding OS processes and threads to cores and know about different cores different memory latencies and so on. ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req: 10.1/8.0] Re: [Caml-list] Re: OCaml is broken 2009-12-21 22:50 ` [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp @ 2009-12-22 12:04 ` Erik Rigtorp 2009-12-22 12:27 ` Mihamina Rakotomandimby 2009-12-22 13:27 ` Gerd Stolpmann 2009-12-29 12:07 ` [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req: 10.1/8.0] " Richard Jones 1 sibling, 2 replies; 57+ messages in thread From: Erik Rigtorp @ 2009-12-22 12:04 UTC (permalink / raw) To: yminsky; +Cc: caml-list On Mon, Dec 21, 2009 at 23:50, Erik Rigtorp <erik@rigtorp.com> wrote: > Some IPC Benchmarks, Solaris 10 on a quad core Intel Core2 Duo. The > benchmarks are running on a cpuset with 1 core. I measure the time > from sending in one process until the other process receives the > message. So a context switch and the message passing is included in > the measurements. > > Max/Min/Avg > * Pipes: 28205/5973/6259 > * Unix domain sockets: 44256/7748/8153 > * SYSv message queues: 19197/5895/6173 > * Posix message queues: 37399/10965/11303 > * TCP on loopback: 29017/7471/7885 > > So the latency is roughly 10µs for all these solutions. That latency > is pretty high and would be several times the processing time of the > message itself. Some more benchmarks: Max/Min/Avg * Spinlocking shm: 50897/403/761 (This one utilizes multiple cores, since one core is just burning while waiting for data) * Pthreads mutex shm: 27582/5246/6577 Forgot to say that all measurements are in nanoseconds. Erik ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req: 10.1/8.0] Re: [Caml-list] Re: OCaml is broken 2009-12-22 12:04 ` Erik Rigtorp @ 2009-12-22 12:27 ` Mihamina Rakotomandimby 2009-12-22 13:27 ` Gerd Stolpmann 1 sibling, 0 replies; 57+ messages in thread From: Mihamina Rakotomandimby @ 2009-12-22 12:27 UTC (permalink / raw) To: caml-list > Erik Rigtorp <erik@rigtorp.com> : > > Max/Min/Avg > > * Pipes: 28205/5973/6259 > > * Unix domain sockets: 44256/7748/8153 > > * SYSv message queues: 19197/5895/6173 > > * Posix message queues: 37399/10965/11303 > > * TCP on loopback: 29017/7471/7885 > Some more benchmarks: > Max/Min/Avg > * Spinlocking shm: 50897/403/761 (This one utilizes multiple cores, > since one core is just burning while waiting for data) > * Pthreads mutex shm: 27582/5246/6577 And, then... is OCaml "broken"? :-) -- Architecte Informatique chez Blueline/Gulfsat: Administration Systeme, Recherche & Developpement +261 34 29 155 34 / +261 33 11 207 36 ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: OCaml is broken 2009-12-22 12:04 ` Erik Rigtorp 2009-12-22 12:27 ` Mihamina Rakotomandimby @ 2009-12-22 13:27 ` Gerd Stolpmann 2009-12-23 11:25 ` Erik Rigtorp 1 sibling, 1 reply; 57+ messages in thread From: Gerd Stolpmann @ 2009-12-22 13:27 UTC (permalink / raw) To: Erik Rigtorp; +Cc: yminsky, caml-list Am Dienstag, den 22.12.2009, 13:04 +0100 schrieb Erik Rigtorp: > On Mon, Dec 21, 2009 at 23:50, Erik Rigtorp <erik@rigtorp.com> wrote: > > Some IPC Benchmarks, Solaris 10 on a quad core Intel Core2 Duo. The > > benchmarks are running on a cpuset with 1 core. I measure the time > > from sending in one process until the other process receives the > > message. So a context switch and the message passing is included in > > the measurements. > > > > Max/Min/Avg > > * Pipes: 28205/5973/6259 > > * Unix domain sockets: 44256/7748/8153 > > * SYSv message queues: 19197/5895/6173 > > * Posix message queues: 37399/10965/11303 > > * TCP on loopback: 29017/7471/7885 > > > > So the latency is roughly 10µs for all these solutions. That latency > > is pretty high and would be several times the processing time of the > > message itself. > > Some more benchmarks: > > Max/Min/Avg > * Spinlocking shm: 50897/403/761 (This one utilizes multiple cores, > since one core is just burning while waiting for data) > * Pthreads mutex shm: 27582/5246/6577 > > Forgot to say that all measurements are in nanoseconds. That's for communication between processes, right? How would the picture be different (especially comparing the latter two) if you do message passing between threads? If I remember correctly, threads are more light-weight in Solaris than processes. That could also affect context switching times, and scheduler decisions. Do you have source code? I could also run in on Linux, for comparison. Gerd -- ------------------------------------------------------------ Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de Phone: +49-6151-153855 Fax: +49-6151-997714 ------------------------------------------------------------ ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: OCaml is broken 2009-12-22 13:27 ` Gerd Stolpmann @ 2009-12-23 11:25 ` Erik Rigtorp 0 siblings, 0 replies; 57+ messages in thread From: Erik Rigtorp @ 2009-12-23 11:25 UTC (permalink / raw) To: Gerd Stolpmann; +Cc: yminsky, caml-list On Tue, Dec 22, 2009 at 14:27, Gerd Stolpmann <gerd@gerd-stolpmann.de> wrote: > > Am Dienstag, den 22.12.2009, 13:04 +0100 schrieb Erik Rigtorp: >> On Mon, Dec 21, 2009 at 23:50, Erik Rigtorp <erik@rigtorp.com> wrote: >> > Some IPC Benchmarks, Solaris 10 on a quad core Intel Core2 Duo. The >> > benchmarks are running on a cpuset with 1 core. I measure the time >> > from sending in one process until the other process receives the >> > message. So a context switch and the message passing is included in >> > the measurements. >> > >> > Max/Min/Avg >> > * Pipes: 28205/5973/6259 >> > * Unix domain sockets: 44256/7748/8153 >> > * SYSv message queues: 19197/5895/6173 >> > * Posix message queues: 37399/10965/11303 >> > * TCP on loopback: 29017/7471/7885 >> > >> > So the latency is roughly 10µs for all these solutions. That latency >> > is pretty high and would be several times the processing time of the >> > message itself. >> >> Some more benchmarks: >> >> Max/Min/Avg >> * Spinlocking shm: 50897/403/761 (This one utilizes multiple cores, >> since one core is just burning while waiting for data) >> * Pthreads mutex shm: 27582/5246/6577 >> >> Forgot to say that all measurements are in nanoseconds. > > That's for communication between processes, right? How would the picture > be different (especially comparing the latter two) if you do message > passing between threads? If I remember correctly, threads are more > light-weight in Solaris than processes. That could also affect context > switching times, and scheduler decisions. With a system supporting green threads/tasklets/erlang processes over multiple cores you can have 1µs message passing latencies without busy waiting. I'll checkout the thread message passing too, but probably not until after new years. > Do you have source code? I could also run in on Linux, for comparison. I'll have that approved by my company first. It would actually be interesting to create a open source multiplatform IPC message passing benchmark. Erik ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req: 10.1/8.0] Re: [Caml-list] Re: OCaml is broken 2009-12-21 22:50 ` [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp 2009-12-22 12:04 ` Erik Rigtorp @ 2009-12-29 12:07 ` Richard Jones 1 sibling, 0 replies; 57+ messages in thread From: Richard Jones @ 2009-12-29 12:07 UTC (permalink / raw) To: Erik Rigtorp; +Cc: yminsky, caml-list On Mon, Dec 21, 2009 at 11:50:36PM +0100, Erik Rigtorp wrote: > Some IPC Benchmarks, Solaris 10 on a quad core Intel Core2 Duo. The > benchmarks are running on a cpuset with 1 core. I measure the time > from sending in one process until the other process receives the > message. So a context switch and the message passing is included in > the measurements. > > Max/Min/Avg > * Pipes: 28205/5973/6259 > * Unix domain sockets: 44256/7748/8153 > * SYSv message queues: 19197/5895/6173 > * Posix message queues: 37399/10965/11303 > * TCP on loopback: 29017/7471/7885 > > So the latency is roughly 10µs for all these solutions. That latency > is pretty high and would be several times the processing time of the > message itself. Are you pinning processes? Without pinning and understanding the corresponding physical architecture of the machine, such tests are pretty much useless. Also - Solaris 10 ...? That boat left a long time ago. You should really be thinking about migrating to modern operating systems run by a company with a future. Rich. -- Richard Jones Red Hat ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: OCaml is broken 2009-12-20 12:21 ` [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp 2009-12-20 13:22 ` Martin Jambon 2009-12-20 13:47 ` Yaron Minsky @ 2009-12-20 14:27 ` Dario Teixeira 2009-12-20 21:14 ` Jon Harrop 2009-12-20 19:38 ` [***SPAM*** Score/Req: 10.1/8.0] " Jon Harrop 2009-12-21 13:07 ` [***SPAM*** Score/Req: 10.1/8.0] Re: [Caml-list] " Damien Doligez 4 siblings, 1 reply; 57+ messages in thread From: Dario Teixeira @ 2009-12-20 14:27 UTC (permalink / raw) To: Erik Rigtorp; +Cc: caml-list Hi, > It's too bad that INRIA is not interested in fixing this bug. No > matter what people say I consider this a bug. Two cores is standard by > now, I'm used to 8, next year 32 and so on. OCaml will only become > more and more irrelevant. I hate to see that happening. This is a perennial topic in this list. Without meaning to dwell too long on old arguments, I simply ask you to consider the following: - Do you really think a concurrent GC with shared memory will scale neatly to those 32 cores? - Will memory access remain homogeneous for all cores as soon as we get into the dozens of cores? - Have you considered that many Ocaml users prefer a GC that offers maximum single core performance, because their application is parallelised via multiple processes communicating via message passing? In this context, your "bug" is actually a "feature". Best regards, Dario Teixeira ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: OCaml is broken 2009-12-20 14:27 ` Dario Teixeira @ 2009-12-20 21:14 ` Jon Harrop 2009-12-21 1:08 ` Gerd Stolpmann 0 siblings, 1 reply; 57+ messages in thread From: Jon Harrop @ 2009-12-20 21:14 UTC (permalink / raw) To: caml-list On Sunday 20 December 2009 14:27:00 Dario Teixeira wrote: > Hi, > > > It's too bad that INRIA is not interested in fixing this bug. No > > matter what people say I consider this a bug. Two cores is standard by > > now, I'm used to 8, next year 32 and so on. OCaml will only become > > more and more irrelevant. I hate to see that happening. > > This is a perennial topic in this list. Without meaning to dwell too > long on old arguments, I simply ask you to consider the following: > > - Do you really think a concurrent GC with shared memory will scale neatly > to those 32 cores? > > - Will memory access remain homogeneous for all cores as soon as we get > into the dozens of cores? The following web page describes a commercial machine sold by Azul Systems that has up to 16 54-core CPUs (=864 cores) and 768 GB of memory in a flat SMP configuration: http://www.azulsystems.com/products/compute_appliance.htm As you can see, a GC with shared memory can already scale across dozens of cores and memory access is no more heterogeneous than it was 20 years ago. Also, note that homogeneous memory access is a red herring in this context because it does not undermine the utility of a shared heap on a multicore. > - Have you considered that many Ocaml users prefer a GC that offers maximum > single core performance, OCaml's GC is nowhere near offering maximum single core performance. Its uniform data representation renders OCaml many times slower than its competitors for many tasks. For example, filling a 10M float->float hash table is over 18x slower with OCaml than with F#. FFT with a complex number type is 5.5x slower with OCaml than F#. Fibonacci with floats is 3.3x slower with OCaml than my own HLVM project (!). > because their application is parallelised via multiple processes > communicating via message passing? A circular argument based upon the self-selected group of remaining OCaml users. Today's OCaml users use OCaml despite its shortcomings. If you want to see the impact of OCaml's multicore unfriendliness, consider why the OCaml community has haemorrhaged 50% of its users in only 2 years. > In this context, your "bug" is actually a "feature". I'm not even sure you can substantiate that in the very specific context of distributed parallel theorem provers because other languages are so much more efficient at handling common abstractions like parametric polymorphism. Got any benchmarks? -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: OCaml is broken 2009-12-20 21:14 ` Jon Harrop @ 2009-12-21 1:08 ` Gerd Stolpmann 2009-12-21 4:30 ` Jon Harrop 2009-12-26 17:08 ` orbitz 0 siblings, 2 replies; 57+ messages in thread From: Gerd Stolpmann @ 2009-12-21 1:08 UTC (permalink / raw) To: Jon Harrop; +Cc: caml-list > The following web page describes a commercial machine sold by Azul Systems > that has up to 16 54-core CPUs (=864 cores) and 768 GB of memory in a flat > SMP configuration: > > http://www.azulsystems.com/products/compute_appliance.htm > > As you can see, a GC with shared memory can already scale across dozens of > cores and memory access is no more heterogeneous than it was 20 years ago. > Also, note that homogeneous memory access is a red herring in this context > because it does not undermine the utility of a shared heap on a multicore. The benchmarks they mention can all easily be parallelized - that stuff you can also do with multi-processing. The interesting thing would be an inherent parallel algorithm where the same memory region is accessed by multiple threads. Or at least a numeric program (your examples seem to be mostly from that area). > > - Have you considered that many Ocaml users prefer a GC that offers maximum > > single core performance, > > OCaml's GC is nowhere near offering maximum single core performance. Its > uniform data representation renders OCaml many times slower than its > competitors for many tasks. For example, filling a 10M float->float hash > table is over 18x slower with OCaml than with F#. FFT with a complex number > type is 5.5x slower with OCaml than F#. Fibonacci with floats is 3.3x slower > with OCaml than my own HLVM project (!). Sure, but these micro benchmarks are first seldom correct, and do not really count for real-world programs. For example, an important parameter of such benchmarks is the frequency the GC runs. Ocaml runs the GC very often - good for latencies, but bad for micro benchmarks because other runtimes simply delay the GC until some limits are exceeded, so these other runtimes often haven't run the GC even once in the short period of time the benchmark runs. It is simply a fact that the ocaml developers had some preferences. E.g. allocating and freeing short-living values is extremely fast (often <10ns). This is very good when you do symbolic computations, or have lots of small strings, but ignorable for numeric stuff, or for programs where the lifetime of allocated memory is bound to server sessions. The minor GC is very fast, but, as you observe, the uniform representation has costs elsewhere. > > because their application is parallelised via multiple processes > > communicating via message passing? > > A circular argument based upon the self-selected group of remaining OCaml > users. Today's OCaml users use OCaml despite its shortcomings. If you want to > see the impact of OCaml's multicore unfriendliness, consider why the OCaml > community has haemorrhaged 50% of its users in only 2 years. Don't see that. That's just speculation - maybe some win32 ocaml users switched to F#, but there are for sure also other reasons than multicore support, e.g. GUIs and better Windows integration. Btw, where do you get your numbers from? There are many, many users for whom multicore is just a useless hype. Either the algorithms are inherently difficult to parallelize (and this is vast majority), or are that easy (like all client/server stuff) that multi-processing is sufficient. You can consider multicore as a marketing trick of the chip industry to let the ordinary desktop user pay for a feature that is mostly interesting for datacenters. Gerd -- ------------------------------------------------------------ Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de Phone: +49-6151-153855 Fax: +49-6151-997714 ------------------------------------------------------------ ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: OCaml is broken 2009-12-21 1:08 ` Gerd Stolpmann @ 2009-12-21 4:30 ` Jon Harrop 2009-12-21 3:58 ` Yaron Minsky 2009-12-21 5:32 ` Markus Mottl 2009-12-26 17:08 ` orbitz 1 sibling, 2 replies; 57+ messages in thread From: Jon Harrop @ 2009-12-21 4:30 UTC (permalink / raw) To: caml-list On Monday 21 December 2009 01:08:14 Gerd Stolpmann wrote: > > The following web page describes a commercial machine sold by Azul > > Systems that has up to 16 54-core CPUs (=864 cores) and 768 GB of memory > > in a flat SMP configuration: > > > > http://www.azulsystems.com/products/compute_appliance.htm > > > > As you can see, a GC with shared memory can already scale across dozens > > of cores and memory access is no more heterogeneous than it was 20 years > > ago. Also, note that homogeneous memory access is a red herring in this > > context because it does not undermine the utility of a shared heap on a > > multicore. > > The benchmarks they mention can all easily be parallelized - that stuff > you can also do with multi-processing. Only if the result is small, otherwise you spend all of your time deserializing it. With a shared heap, you just return the resulting value by reference. > The interesting thing would be an > inherent parallel algorithm where the same memory region is accessed by > multiple threads. Concurrent hash tables are a big thing for Azul: "Scales well up to 768 CPUs" - http://www.youtube.com/watch?v=WYXgtXWejRM This blog entry describes performance on 750 cores: http://blogs.azulsystems.com/cliff/2007/03/a_nonblocking_h.html > Or at least a numeric program (your examples seem to be mostly from that > area). Yes. You can look at matrix operations or linear algebra (QR decomposition) but also things like quicksort and graphics. Would be interesting to compare symbolic performance as well though. > > > - Have you considered that many Ocaml users prefer a GC that offers > > > maximum single core performance, > > > > OCaml's GC is nowhere near offering maximum single core performance. Its > > uniform data representation renders OCaml many times slower than its > > competitors for many tasks. For example, filling a 10M float->float hash > > table is over 18x slower with OCaml than with F#. FFT with a complex > > number type is 5.5x slower with OCaml than F#. Fibonacci with floats is > > 3.3x slower with OCaml than my own HLVM project (!). > > Sure, but these micro benchmarks are first seldom correct, and do not > really count for real-world programs. > > For example, an important parameter of such benchmarks is the frequency > the GC runs. Ocaml runs the GC very often - good for latencies, but bad > for micro benchmarks because other runtimes simply delay the GC until > some limits are exceeded, so these other runtimes often haven't run the > GC even once in the short period of time the benchmark runs. You're missing the point: every example I gave shouldn't be doing any GC at all and doesn't in F# but spends a lot of time in the GC in OCaml just because of unnecessary boxing. The mutator also takes longer because boxing damages locality. > It is simply a fact that the ocaml developers had some preferences. E.g. > allocating and freeing short-living values is extremely fast (often > <10ns). This is very good when you do symbolic computations, or have > lots of small strings, but ignorable for numeric stuff, or for programs > where the lifetime of allocated memory is bound to server sessions. The > minor GC is very fast, but, as you observe, the uniform representation > has costs elsewhere. Yes. That's why I think the best way forward is to develop HLVM. > > > because their application is parallelised via multiple processes > > > communicating via message passing? > > > > A circular argument based upon the self-selected group of remaining OCaml > > users. Today's OCaml users use OCaml despite its shortcomings. If you > > want to see the impact of OCaml's multicore unfriendliness, consider why > > the OCaml community has haemorrhaged 50% of its users in only 2 years. > > Don't see that. That's just speculation - maybe some win32 ocaml users > switched to F#, I wasn't a win32 user. :-) > but there are for sure also other reasons than multicore > support, e.g. GUIs and better Windows integration. Btw, where do you get > your numbers from? Traffic here: 2007: 5814 2008: 4051 2009: 3071 http://groups.google.com/group/fa.caml/about Or searches for OCaml on Google: http://www.google.com/trends?q=ocaml%2Cclojure%2Cf%23 The number of OCaml jobs has crashed as well: http://www.itjobswatch.co.uk/jobs/uk/ocaml.do And, of course, what our customers say. > There are many, many users for whom multicore is just a useless hype. In 2005, the OCaml community was composed largely of performance junkies who came here because OCaml produced excellent performance from succinct and readable code on benchmark after benchmark. More people were buying OFS than were using Coq. I don't believe for a second that many of OCaml's former users thought multicore was just useless hype. > Either the algorithms are inherently difficult to parallelize (and this > is vast majority), I have had great success parallelizing code. > or are that easy (like all client/server stuff) that multi-processing is > sufficient. There are certainly applications where multicore is not beneficial. > You can consider multicore as a marketing trick of the chip > industry to let the ordinary desktop user pay for a feature that is mostly > interesting for datacenters. Ordinary desktop users have been paying top dollar for parallel computers in the form of GPUs for some time now. The use of GPUs for more general programming has been a really hot topic for years and just became viable. Even games consoles have multicores. ARM are making quadcores for your phone and netbook! If I can get HLVM to make parallel OCaml-style programming easy, I think a lot of people would love it. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: OCaml is broken 2009-12-21 4:30 ` Jon Harrop @ 2009-12-21 3:58 ` Yaron Minsky 2009-12-21 5:32 ` Markus Mottl 1 sibling, 0 replies; 57+ messages in thread From: Yaron Minsky @ 2009-12-21 3:58 UTC (permalink / raw) To: Jon Harrop; +Cc: caml-list [-- Attachment #1: Type: text/plain, Size: 2389 bytes --] I find the ponderings on the popularity of OCaml to be of limited utility --- those who pick OCaml based on its popularity are making a terrible mistake. OCaml was a deeply unpopular language in 2005 and remains so today, the variations notwithstanding. There are other good reasons to use the language nonetheless. On Sun, Dec 20, 2009 at 11:30 PM, Jon Harrop <jon@ffconsultancy.com> wrote: > Or searches for OCaml on Google: > > http://www.google.com/trends?q=ocaml%2Cclojure%2Cf%23 I'm not sure if OCaml is becoming more or less popular, but I find the evidence for a decline less than convincing. It is true that there is less traffic on this list, but it's hard to know how to interpret this. I haven't gotten the sense that Python is in decline, but traffic on comp.lang.python has also been declining since 2005. Google Trends is also a confusing metric. For example, it suggests that Java, Python and C++ have been declining for years: http://www.google.com/trends?q=java&ctab=0&geo=all&date=all&sort=0 http://www.google.com/trends?q=C%2B%2B&ctab=0&geo=all&date=all&sort=0 http://www.google.com/trends?q=Python&ctab=0&geo=all&date=all&sort=0 My suspicion is that Google Trends gives numbers normalized to the overall search world, and so things that aren't growing fast look smaller as search volume in general grows. Obviously an up-and-coming language like clojure still shows an upswing, as one would expect from an up-and-coming language. The number of OCaml jobs has crashed as well: > > http://www.itjobswatch.co.uk/jobs/uk/ocaml.do I thought this was a silly metric when it spiked up, and continue to think it's a silly metric today. There are a tiny number of legitimate ocaml jobs (and the same is true for Haskell, Clojure, Scala, SML, etc.) and the ups-and-down in this tiny sample are not statistically significant. Again: don't pick OCaml because of the large number of OCaml jobs out there. There are very very few, both now and in '05. Reliable metrics on a community like this are hard to come by, but things seem quite vibrant to me. There are always new OCaml startups popping into existence, new libraries being written, and new things coming out of INRIA (for example, the arrival of modules as first-class values, which is expected in OCaml 3.12). From my point of view, there is still no platform out there I would rather be using. y [-- Attachment #2: Type: text/html, Size: 3572 bytes --] ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: OCaml is broken 2009-12-21 4:30 ` Jon Harrop 2009-12-21 3:58 ` Yaron Minsky @ 2009-12-21 5:32 ` Markus Mottl 2009-12-21 13:29 ` Jon Harrop 1 sibling, 1 reply; 57+ messages in thread From: Markus Mottl @ 2009-12-21 5:32 UTC (permalink / raw) To: Jon Harrop; +Cc: caml-list On Sun, Dec 20, 2009 at 23:30, Jon Harrop <jon@ffconsultancy.com> wrote: > Traffic here: > > 2007: 5814 > 2008: 4051 > 2009: 3071 That's because I don't have much time to post here nowaydays. I'm sure if Jon followed my example, we would have a parallel GC for OCaml by the end of the year. Regards, Markus -- Markus Mottl http://www.ocaml.info markus.mottl@gmail.com ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: OCaml is broken 2009-12-21 5:32 ` Markus Mottl @ 2009-12-21 13:29 ` Jon Harrop 0 siblings, 0 replies; 57+ messages in thread From: Jon Harrop @ 2009-12-21 13:29 UTC (permalink / raw) To: caml-list On Monday 21 December 2009 05:32:38 Markus Mottl wrote: > On Sun, Dec 20, 2009 at 23:30, Jon Harrop <jon@ffconsultancy.com> wrote: > > Traffic here: > > > > 2007: 5814 > > 2008: 4051 > > 2009: 3071 > > That's because I don't have much time to post here nowaydays. I'm > sure if Jon followed my example, we would have a parallel GC for OCaml > by the end of the year. HLVM already has a parallel GC. :-) -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: OCaml is broken 2009-12-21 1:08 ` Gerd Stolpmann 2009-12-21 4:30 ` Jon Harrop @ 2009-12-26 17:08 ` orbitz 1 sibling, 0 replies; 57+ messages in thread From: orbitz @ 2009-12-26 17:08 UTC (permalink / raw) To: Gerd Stolpmann; +Cc: Jon Harrop, caml-list On Dec 20, 2009, at 8:08 PM, Gerd Stolpmann wrote: >> The following web page describes a commercial machine sold by Azul >> Systems >> that has up to 16 54-core CPUs (=864 cores) and 768 GB of memory in >> a flat >> SMP configuration: >> >> http://www.azulsystems.com/products/compute_appliance.htm >> >> As you can see, a GC with shared memory can already scale across >> dozens of >> cores and memory access is no more heterogeneous than it was 20 >> years ago. >> Also, note that homogeneous memory access is a red herring in this >> context >> because it does not undermine the utility of a shared heap on a >> multicore. > > The benchmarks they mention can all easily be parallelized - that > stuff > you can also do with multi-processing. The interesting thing would > be an > inherent parallel algorithm where the same memory region is accessed > by > multiple threads. Or at least a numeric program (your examples seem to > be mostly from that area). I'm not sure if it is relevant here, but it should be noted that a lot of the performance gains Azul gets is because they have built their own chips that do a lot of tricks for you under the hood. Last I used an Azul Appliance, they perform quite poorly if you are hitting the same memory often from multiple threads (the machine I used was about 4x slower than an equivalent Intel machine for a single core). If the Azul tricks make it into desktop processors, that would likely be pretty great. Also, for what it's worth, lots of cores have actually been less performant in the type of computing I currently do. We want less cores and more physical boxes, making multiple processes running single threads a better solution for us. We tend to become memory IO bound by multiple cores (the bus cannot keep up with us). We are processing lots of biological data. For the record we are not using Ocaml for our project, just an observation of what model works well for us. ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [***SPAM*** Score/Req: 10.1/8.0] Re: [Caml-list] Re: OCaml is broken 2009-12-20 12:21 ` [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp ` (2 preceding siblings ...) 2009-12-20 14:27 ` Dario Teixeira @ 2009-12-20 19:38 ` Jon Harrop 2009-12-21 12:26 ` Mihamina Rakotomandimby 2009-12-21 13:07 ` [***SPAM*** Score/Req: 10.1/8.0] Re: [Caml-list] " Damien Doligez 4 siblings, 1 reply; 57+ messages in thread From: Jon Harrop @ 2009-12-20 19:38 UTC (permalink / raw) To: caml-list On Sunday 20 December 2009 12:21:44 Erik Rigtorp wrote: > On Sun, Dec 20, 2009 at 05:43, Jon Harrop <jon@ffconsultancy.com> wrote: > > As long as you're looking at OCaml's close relatives with multicore > > support, F# is your only viable option. Soon, HLVM will provide a > > cross-platform open source solution. If you look further you will also > > find Scala and Clojure. > > F# is not viable since i'm developing for Solaris. Yes. F# is Windows only for all intents and purposes. > I also believe the .NET GC is not good enough for real-time systems. Although heavily allocating threads will experience pauses of up to several seconds on .NET. However, threads that do not exceed their allocation quota run almost completely concurrently with the GC, so their real-time performance characteristics are good. This is the key to keeping UI threads responsive. Note that OCaml's GC has some problems. Specifically, the stack and arrays of pointers in the heap are not traversed incrementally, incurring arbitrarily-long stalls. > Clojure running under real-time Java might be interesting. Sounds like you have hard RT guarantees. > It's too bad that INRIA is not interested in fixing this bug. They spent something like a decade trying to write a decent concurrent GC and pioneered the field. > No matter what people say I consider this a bug. A perf bug at best: it just means that OCaml is slower for many tasks. > Two cores is standard by > now, I'm used to 8, next year 32 and so on. OCaml will only become > more and more irrelevant. I hate to see that happening. Me too. The OCaml language will continue to kick ass for some time to come but INRIA's implementation is no longer competitively performant for many tasks. However, open source offerings are all quite dire, particularly stand-alone ones. > I think right now only Erlang got this right and they have a great > library for developing enterprise applications too! I couldn't disagree more. The *only* reason to work on parallelism is performance and Erlang's performance sucks. I know Erlang scales better, but it scales from poor absolute performance on 1 core to poor absolute performance on n cores. Hence Erlang is hardly the defacto standard for HPC on shared-memory supercomputers. > The first step for OCaml would be to be able to run multiple > communicating instances of the runtime bound to one core each in one > process and have them communicate via lock free queues. I think the first step is simply to replace OCaml's GC with a stop-the-world parallel one like the one I wrote for HLVM. The problem is that OCaml's data representation gives absolutely dire performance and kills scalability if you do that. So you either need to optimize the GC for this or rewrite everything from the ground up. OC4MC is doing the former. My HLVM project is doing the latter. Suffice to say, there is no easy solution (although I prefer mine ;-). -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: OCaml is broken 2009-12-20 19:38 ` [***SPAM*** Score/Req: 10.1/8.0] " Jon Harrop @ 2009-12-21 12:26 ` Mihamina Rakotomandimby 2009-12-21 14:19 ` general question, was " Keyan 0 siblings, 1 reply; 57+ messages in thread From: Mihamina Rakotomandimby @ 2009-12-21 12:26 UTC (permalink / raw) To: caml-list > Jon Harrop <jon@ffconsultancy.com> : > > Two cores is standard by > > now, I'm used to 8, next year 32 and so on. OCaml will only become > > more and more irrelevant. I hate to see that happening. > > Me too. The OCaml language will continue to kick ass for some time to > come but INRIA's implementation is no longer competitively performant > for many tasks. However, open source offerings are all quite dire, > particularly stand-alone ones. That seems dark... -- Architecte Informatique chez Blueline/Gulfsat: Administration Systeme, Recherche & Developpement +261 34 29 155 34 / +261 33 11 207 36 ^ permalink raw reply [flat|nested] 57+ messages in thread
* general question, was Re: OCaml is broken 2009-12-21 12:26 ` Mihamina Rakotomandimby @ 2009-12-21 14:19 ` Keyan 2009-12-21 14:40 ` [Caml-list] " rixed ` (4 more replies) 0 siblings, 5 replies; 57+ messages in thread From: Keyan @ 2009-12-21 14:19 UTC (permalink / raw) To: caml-list Hi, i have a large project written in C++, for which i am planing to write add-ons and tools in ocaml, e.g. different tools to analyse my code (dependency stuff), an interpreter for a script-language i plan to include, etc, etc. form my time at the uni i remembered that ocaml allows to compile libraries which can be included in c/c++ program, and i know people who use it extensively in other projects. therefore, i decided to give ocaml a try. i like functional programming, and my first steps with ocaml are very promising. following this discussion, i am not so sure anymore, if ocaml is a good decision. may be i got this discussion wrong, but if ocaml is dying out, i might have to look for another functional programming language to use with my project. please dont take this email as an offence. i am just curious. at this point, i can still easily look for an alternative to ocaml, so its best to ask now. regards, keyan ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] general question, was Re: OCaml is broken 2009-12-21 14:19 ` general question, was " Keyan @ 2009-12-21 14:40 ` rixed 2009-12-21 14:42 ` Gerd Stolpmann ` (3 subsequent siblings) 4 siblings, 0 replies; 57+ messages in thread From: rixed @ 2009-12-21 14:40 UTC (permalink / raw) To: caml-list > following this discussion, i am not so sure anymore, if ocaml is a good decision. may be i got this discussion wrong, but if ocaml is dying out, i might have to look for another functional programming language to use with my project. Every programming language suffers its trolls and flamewars. ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] general question, was Re: OCaml is broken 2009-12-21 14:19 ` general question, was " Keyan 2009-12-21 14:40 ` [Caml-list] " rixed @ 2009-12-21 14:42 ` Gerd Stolpmann 2009-12-21 15:25 ` Eray Ozkural 2009-12-21 14:50 ` Philip ` (2 subsequent siblings) 4 siblings, 1 reply; 57+ messages in thread From: Gerd Stolpmann @ 2009-12-21 14:42 UTC (permalink / raw) To: Keyan; +Cc: caml-list Am Montag, den 21.12.2009, 15:19 +0100 schrieb Keyan: > Hi, > > i have a large project written in C++, for which i am planing to write add-ons and tools in ocaml, e.g. different tools to analyse my code (dependency stuff), an interpreter for a script-language i plan to include, etc, etc. form my time at the uni i remembered that ocaml allows to compile libraries which can be included in c/c++ program, and i know people who use it extensively in other projects. therefore, i decided to give ocaml a try. i like functional programming, and my first steps with ocaml are very promising. > > following this discussion, i am not so sure anymore, if ocaml is a good decision. may be i got this discussion wrong, but if ocaml is dying out, i might have to look for another functional programming language to use with my project. > > please dont take this email as an offence. i am just curious. at this point, i can still easily look for an alternative to ocaml, so its best to ask now. Please don't believe Jon's propaganda. He has just very specific needs (high performance computing on desktops), and generalizes them in the way "it's not perfect for me anymore, so it's bad anyway". He has been doing that for years now, not seeing that he really harms the way ocaml is seen by newcomers. The examples you mention are good matches for using ocaml - symbolic programming with lots of terms and trees. That's the stuff ocaml was originally developed for, and it delivers excellence performance. Also, ocaml is still backed by INRIA, and there is still a large community, including a growing number of industrial users. Gerd -- ------------------------------------------------------------ Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de Phone: +49-6151-153855 Fax: +49-6151-997714 ------------------------------------------------------------ ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] general question, was Re: OCaml is broken 2009-12-21 14:42 ` Gerd Stolpmann @ 2009-12-21 15:25 ` Eray Ozkural 0 siblings, 0 replies; 57+ messages in thread From: Eray Ozkural @ 2009-12-21 15:25 UTC (permalink / raw) To: caml-list On Mon, Dec 21, 2009 at 4:42 PM, Gerd Stolpmann <gerd@gerd-stolpmann.de> wrote: > > Please don't believe Jon's propaganda. He has just very specific needs > (high performance computing on desktops), and generalizes them in the > way "it's not perfect for me anymore, so it's bad anyway". He has been > doing that for years now, not seeing that he really harms the way ocaml > is seen by newcomers. I've seen some interesting parallel programming projects and language extensions using ocaml. I suppose ocaml could benefit from a parallelizing compiler & standardized explicit parallelism constructs, and be a serious contender for the multicore "market". I personally started out with Haskell with regards to contemporary high-level languages, and then switched to ocaml because of performance and sanity. I think I also love the higher-order modules =) I want to rewrite my stock prediction program in ocaml nowadays. In Haskell, it was a pain to work on large files. Good thing I lost the code in a hard drive crash. The way I see it, ocaml has adequate performance, and is excellent for algorithmic work. I have this half-finished project that features ocaml implementation of some algorithms. You should see them, they are almost identical to pseudo-code. I should move that project to ocamlforge. Cheers, -- Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara http://groups.yahoo.com/group/ai-philosophy http://myspace.com/arizanesil http://myspace.com/malfunct ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] general question, was Re: OCaml is broken 2009-12-21 14:19 ` general question, was " Keyan 2009-12-21 14:40 ` [Caml-list] " rixed 2009-12-21 14:42 ` Gerd Stolpmann @ 2009-12-21 14:50 ` Philip 2009-12-21 15:01 ` Keyan 2009-12-21 18:50 ` Jon Harrop 2009-12-21 18:48 ` Jon Harrop 2010-01-03 10:49 ` Sylvain Le Gall 4 siblings, 2 replies; 57+ messages in thread From: Philip @ 2009-12-21 14:50 UTC (permalink / raw) To: caml-list On Mon, 2009-12-21 at 15:19 +0100, Keyan wrote: > Hi, > > i have a large project written in C++, for which i am planing to write add-ons and tools in ocaml, e.g. different tools to analyse my code (dependency stuff), an interpreter for a script-language i plan to include, etc, etc. form my time at the uni i remembered that ocaml allows to compile libraries which can be included in c/c++ program, and i know people who use it extensively in other projects. therefore, i decided to give ocaml a try. i like functional programming, and my first steps with ocaml are very promising. > > following this discussion, i am not so sure anymore, if ocaml is a good decision. may be i got this discussion wrong, but if ocaml is dying out, i might have to look for another functional programming language to use with my project. > Functional programming languages will never become mainstream. There is a thread on haskell-mailing-list from time to time. What language you chose should depend always on your (your team) skills, tools and tasks. -Philip > please dont take this email as an offence. i am just curious. at this point, i can still easily look for an alternative to ocaml, so its best to ask now. > > regards, > keyan > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] general question, was Re: OCaml is broken 2009-12-21 14:50 ` Philip @ 2009-12-21 15:01 ` Keyan 2009-12-21 15:13 ` Stefano Zacchiroli 2009-12-21 15:27 ` Dario Teixeira 2009-12-21 18:50 ` Jon Harrop 1 sibling, 2 replies; 57+ messages in thread From: Keyan @ 2009-12-21 15:01 UTC (permalink / raw) To: Philip; +Cc: caml-list Hi, first of all thanks for the replies, and for not misunderstanding my email. > What language you chose should depend always on your (your team) skills, > tools and tasks. i dont want to go into a which-programming-language-is-best-for-what discussion (as this will never end), but at this point i wanted to know if ocaml is still alive, i.e. if you can still easily download and install it on a variety of OS, and if it will be supported in the future. i am working on an open-source project, so beside your arguments (which of course are valid), it is also important if people who decide not to download the binaries, but rather download the source, will be able to compile it. for me personally, i just like functional programming, and i like to learn new stuff from time to time. and to be honest, i am very happy to have found applications in the project which i can use a reason to include a functional programming language in the code :) so, thanks for your replies. i am confident again, and will continue to use ocaml. regards, keyan ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] general question, was Re: OCaml is broken 2009-12-21 15:01 ` Keyan @ 2009-12-21 15:13 ` Stefano Zacchiroli 2009-12-21 15:27 ` Dario Teixeira 1 sibling, 0 replies; 57+ messages in thread From: Stefano Zacchiroli @ 2009-12-21 15:13 UTC (permalink / raw) To: caml-list On Mon, Dec 21, 2009 at 04:01:31PM +0100, Keyan wrote: > but at this point i wanted to know if ocaml is still alive, i.e. if > you can still easily download and install it on a variety of OS, and Quite alive. http://packages.debian.org/changelogs/pool/main/o/ocaml/current/changelog (Look for "new upstream release" to check how often new upstream have been released and made part of Debian-based distributions. Fedora people have similar stories to tell.) Cheers. -- Stefano Zacchiroli -o- PhD in Computer Science \ PostDoc @ Univ. Paris 7 zack@{upsilon.cc,pps.jussieu.fr,debian.org} -<>- http://upsilon.cc/zack/ Dietro un grande uomo c'è ..| . |. Et ne m'en veux pas si je te tutoie sempre uno zaino ...........| ..: |.... Je dis tu à tous ceux que j'aime ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] general question, was Re: OCaml is broken 2009-12-21 15:01 ` Keyan 2009-12-21 15:13 ` Stefano Zacchiroli @ 2009-12-21 15:27 ` Dario Teixeira 2009-12-21 15:46 ` Jacques Carette 1 sibling, 1 reply; 57+ messages in thread From: Dario Teixeira @ 2009-12-21 15:27 UTC (permalink / raw) To: Philip, Keyan; +Cc: caml-list Hi, > i dont want to go into a which-programming-language-is-best-for-what > discussion (as this will never end), but at this point i wanted to know > if ocaml is still alive, i.e. if you can still easily download and > install it on a variety of OS, and if it will be supported in the future. The fact that the compiler's source code is (a) available, and (b) straightforward enough for mere mortals to understand should give you some assurances that Ocaml can never die by fiat. Moreover, there's a vibrant community around it, both in industry and in the open-source world. (Ocaml support in Debian and Fedora is top-notch, for example). Last but not least, Ocaml plays a central role in multiple INRIA projects, which means its creators have all the reason to continue maintaining it and improving it for the foreseeable future (and there's some interesting goodies in the upcoming 3.12 release, for example). Though I am grateful and acknowledge Jon Harrop's help in the beginner's list, you should take his prognostications with a grain of salt. Every now and again he proclaims that "Ocaml is doomed! We're all gonna die!". It has almost become a comedy catchphrase of sorts in this list... So yes, do choose Ocaml for your project. You won't regret it. Best regards, Dario Teixeira ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] general question, was Re: OCaml is broken 2009-12-21 15:27 ` Dario Teixeira @ 2009-12-21 15:46 ` Jacques Carette 0 siblings, 0 replies; 57+ messages in thread From: Jacques Carette @ 2009-12-21 15:46 UTC (permalink / raw) To: Dario Teixeira; +Cc: Philip, Keyan, caml-list I agree with most of what Dario Teixeira wrote, except for one small quibble: Dario Teixeira wrote: > Last but not least, Ocaml plays a central role in multiple INRIA > projects, which means its creators have all the reason to continue > maintaining it and improving it for the foreseeable future (and there's > some interesting goodies in the upcoming 3.12 release, for example). > Actually, this gives these projects an incentive to insure that Ocaml survives, which gives an incentive for some 'maintenance engineers' to be kept on-staff to insure that Ocaml does not bit-rot. This gives only quite partial incentive to a team of researchers (the creators of Ocaml) to do maintenance (as that is usually not research, thus not the kind of work of interest to researchers). And entropy is a real problem -- Ocaml is now quite mature, which means that radical changes are well nigh impossible; this is a serious disincentive for researchers. End of quibble. Personally, I would really really want to see a 4.00 release which really warrants that name. The 3.XX line can be maintained for a few more years while people switch, in the same way gcc did this. In any case, I have nevertheless voted with my time and effort: I have 1 large project being implemented in Ocaml, 3 medium ones in metaocaml, although I must admit that I have some 'research' code in Haskell (and in Maple, but that's another story). Jacques ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] general question, was Re: OCaml is broken 2009-12-21 14:50 ` Philip 2009-12-21 15:01 ` Keyan @ 2009-12-21 18:50 ` Jon Harrop 1 sibling, 0 replies; 57+ messages in thread From: Jon Harrop @ 2009-12-21 18:50 UTC (permalink / raw) To: caml-list On Monday 21 December 2009 14:50:15 Philip wrote: > Functional programming languages will never become mainstream. C# 3 and Javascript are mainstream functional languages. F# is next. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] general question, was Re: OCaml is broken 2009-12-21 14:19 ` general question, was " Keyan ` (2 preceding siblings ...) 2009-12-21 14:50 ` Philip @ 2009-12-21 18:48 ` Jon Harrop 2010-01-03 10:49 ` Sylvain Le Gall 4 siblings, 0 replies; 57+ messages in thread From: Jon Harrop @ 2009-12-21 18:48 UTC (permalink / raw) To: caml-list On Monday 21 December 2009 14:19:05 Keyan wrote: > Hi, > > i have a large project written in C++, for which i am planing to write > add-ons and tools in ocaml, e.g. different tools to analyse my code > (dependency stuff), an interpreter for a script-language i plan to include, > etc, etc. form my time at the uni i remembered that ocaml allows to compile > libraries which can be included in c/c++ program, and i know people who use > it extensively in other projects. therefore, i decided to give ocaml a try. > i like functional programming, and my first steps with ocaml are very > promising. > > following this discussion, i am not so sure anymore, if ocaml is a good > decision. may be i got this discussion wrong, but if ocaml is dying out, i > might have to look for another functional programming language to use with > my project. OCaml isn't dying out and it is still ideal for the applications you listed because they are not performance critical and do rely heavily upon an efficient GC. This discussion was essentially about the exact opposite end of the spectrum of applications, where performance is critical but GC is largely irrelevant. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: general question, was Re: OCaml is broken 2009-12-21 14:19 ` general question, was " Keyan ` (3 preceding siblings ...) 2009-12-21 18:48 ` Jon Harrop @ 2010-01-03 10:49 ` Sylvain Le Gall 2010-01-03 20:06 ` [Caml-list] " Jon Harrop 4 siblings, 1 reply; 57+ messages in thread From: Sylvain Le Gall @ 2010-01-03 10:49 UTC (permalink / raw) To: caml-list On 21-12-2009, Keyan <ml@pulsschlag.net> wrote: > Hi, > > i have a large project written in C++, for which i am planing to write > add-ons and tools in ocaml, e.g. different tools to analyse my code > (dependency stuff), an interpreter for a script-language i plan to > include, etc, etc. form my time at the uni i remembered that ocaml > allows to compile libraries which can be included in c/c++ program, > and i know people who use it extensively in other projects. therefore, > i decided to give ocaml a try. i like functional programming, and my > first steps with ocaml are very promising. > > following this discussion, i am not so sure anymore, if ocaml is a > good decision. may be i got this discussion wrong, but if ocaml is > dying out, i might have to look for another functional programming > language to use with my project. > OCaml is not dying out at all (v3.11.2 is being prepared, v3.12.0 is coming soon). Of course, the core OCaml distribution (the one shipped by INRIA), is missing some features. You can use libraries/alternative compiler to have these features back (cothreads, jocaml, camlp3l). The only point of the whole discussion -- which is a recurring point by some of those who participate -- is the lack of shared-memory parallelism in the core language. Other languages like C or C++ are also lacking this support in their core definition... I.e. there is way to do it but you need to use pthread or Win32 thread. All in all, you can go a long way writing your tools with OCaml without encounting these problems. For the topic you describe, OCaml is a good choice. Regards, Sylvain Le Gall ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: general question, was Re: OCaml is broken 2010-01-03 10:49 ` Sylvain Le Gall @ 2010-01-03 20:06 ` Jon Harrop 0 siblings, 0 replies; 57+ messages in thread From: Jon Harrop @ 2010-01-03 20:06 UTC (permalink / raw) To: caml-list; +Cc: Sylvain Le Gall On Sunday 03 January 2010 10:49:38 Sylvain Le Gall wrote: > The only point of the whole discussion -- which is a recurring point by > some of those who participate -- is the lack of shared-memory > parallelism in the core language. I solved the problem: the latest version of HLVM now facilitates high-performance shared-memory parallelism. The remaining challenges to making this more user friendly are: 1. High-level constructs for parallelism in HLVM (task queues). 2. OCaml<->HLVM interop, probably by destructuring values passed from the OCaml world so that HLVM programs can use them directly and return results by mutating values on the OCaml side. 3. Camlp4 macros so users can write their HLVM code in an OCaml-like DSL. I believe this is basically an optimal solution for OCaml's multicore problem given the practical constraints. The future's looking bright again. :-) -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [***SPAM*** Score/Req: 10.1/8.0] Re: [Caml-list] Re: OCaml is broken 2009-12-20 12:21 ` [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp ` (3 preceding siblings ...) 2009-12-20 19:38 ` [***SPAM*** Score/Req: 10.1/8.0] " Jon Harrop @ 2009-12-21 13:07 ` Damien Doligez 4 siblings, 0 replies; 57+ messages in thread From: Damien Doligez @ 2009-12-21 13:07 UTC (permalink / raw) To: caml-list On 2009-12-20, at 13:21, Erik Rigtorp wrote: > The first step for OCaml would be to be able to run multiple > communicating instances of the runtime bound to one core each in one > process and have them communicate via lock free queues. Does anyone know how to do lock-free queues in a weakly-consistent memory model? -- Damien ^ permalink raw reply [flat|nested] 57+ messages in thread
* multicore wish [Was: Re: [Caml-list] Re: OCaml is broken] 2009-12-20 4:43 ` [Caml-list] " Jon Harrop 2009-12-20 12:21 ` [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp @ 2009-12-21 13:31 ` Goswin von Brederlow 2009-12-21 14:19 ` multicore wish Mihamina Rakotomandimby 2009-12-21 19:53 ` multicore wish [Was: Re: [Caml-list] Re: OCaml is broken] Jon Harrop 1 sibling, 2 replies; 57+ messages in thread From: Goswin von Brederlow @ 2009-12-21 13:31 UTC (permalink / raw) To: Jon Harrop; +Cc: caml-list Jon Harrop <jon@ffconsultancy.com> writes: > We've discussed the problems with that before. Writing a parallel generic > quicksort seems to be a good test of a decent multicore capable language > implementation. Currently, F# is a *long* way ahead of everything open > source. How do you implement it? 1) divide at the top and then let each core sort its smaller array Say you have 2^n cores then the first n splits in merge sort would first sort the values into the 2 regions and then start a thread for each region (start a new one, do the other part in this thread). After n splits you would switch to a uni-core quicksort. For this you need to split well so each core ends up with a roughly equal sized chunk. 2) factory/worker approach Each core runs a factory/worker thread waiting on a job queue. You start by dumping the full array into the job queue. Some random core picks it up, splits it into 2 regions and dumps one into the job queue. If the job gets too small (PAGE_SIZE? cache line size? total size / cores^2?) the factory/worker switches to normal uni-core quicksort and sorts the whole chunk. The job queue should probably be priority based so larger chunks are sorted before smaller. Here the quality of each split is not so important. If a chunk is smaller, and therefore faster, the core just picks up the next job in the queue. But you need more syncronization between the cores for the job queue. On the other hand you aren't limited to 2^n cores. Any number will do. MfG Goswin ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: multicore wish 2009-12-21 13:31 ` multicore wish [Was: Re: [Caml-list] Re: OCaml is broken] Goswin von Brederlow @ 2009-12-21 14:19 ` Mihamina Rakotomandimby 2009-12-21 16:15 ` [Caml-list] " Fischbacher T. ` (2 more replies) 2009-12-21 19:53 ` multicore wish [Was: Re: [Caml-list] Re: OCaml is broken] Jon Harrop 1 sibling, 3 replies; 57+ messages in thread From: Mihamina Rakotomandimby @ 2009-12-21 14:19 UTC (permalink / raw) To: caml-list, ocsigen Ok, so for the beginner I am (must I ask on the beginners ML?): is multicore support just useless or not? I am beginning using Ocsigen, for a growing web project: Is multicore support useless for scaling on Ocsigen? X-post to Ocsigen ML. -- Architecte Informatique chez Blueline/Gulfsat: Administration Systeme, Recherche & Developpement +261 34 29 155 34 / +261 33 11 207 36 ^ permalink raw reply [flat|nested] 57+ messages in thread
* RE: [Caml-list] Re: multicore wish 2009-12-21 14:19 ` multicore wish Mihamina Rakotomandimby @ 2009-12-21 16:15 ` Fischbacher T. 2009-12-21 17:42 ` Dario Teixeira 2009-12-21 18:43 ` Jon Harrop 2 siblings, 0 replies; 57+ messages in thread From: Fischbacher T. @ 2009-12-21 16:15 UTC (permalink / raw) To: Mihamina Rakotomandimby, caml-list, ocsigen Mihamina, > Ok, so for the beginner I am (must I ask on the beginners ML?): is > multicore support just useless or not? That *entirely* depends on what you want to do. If, for example, you have to do a large calculation that is limited by memory and not by CPU, or, if you have an application that is trivially parallelized anyway, multicore support won't make much of a difference. There are (many) other applications, however, where it does matter quite a lot. Actually, the biggest effect of multicore architectures I see is to shift the emphasis from raw CPU power to memory bandwidth. -- Thomas ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: multicore wish 2009-12-21 14:19 ` multicore wish Mihamina Rakotomandimby 2009-12-21 16:15 ` [Caml-list] " Fischbacher T. @ 2009-12-21 17:42 ` Dario Teixeira 2009-12-21 18:43 ` Jon Harrop 2 siblings, 0 replies; 57+ messages in thread From: Dario Teixeira @ 2009-12-21 17:42 UTC (permalink / raw) To: caml-list, ocsigen, Mihamina Rakotomandimby Hi, > I am beginning using Ocsigen, for a growing web project: > Is multicore support useless for scaling on Ocsigen? Categorically, yes. In fact, I would say that the model used by Ocsigen is close to being optimal performance-wise as far as web applications are concerned. The Ocsigen server and Eliom applications use Lwt for concurrency, ensuring that the CPU is always busy and will not idle waiting for I/O. Moreover, the green threads offered by Lwt are much lighter than system threads, and avoid the context switching penalty incurred by the latter. And what about those multiple cores? Simple, if you have n cores, then simply fire up n instances of the Ocsigen server, and put a dispatching server like HAProxy or Ocsigen itself as frontend (there are some simple tricks to ensure that the same client is always directed to same server). This solution takes advantage of the fact that serving web requests from multiple clients is embarrasingly parallel from the web application standpoint. (Sure, you'll have contention on the database side, but most DBMSs handle that reasonably well). (And yes, I am aware that Lwt's performance could be improved further by using syscalls like epoll/kqeueue/etc instead of select. That is however an implementation issue, not an architectural flaw). Best regards, Dario Teixeira ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: multicore wish 2009-12-21 14:19 ` multicore wish Mihamina Rakotomandimby 2009-12-21 16:15 ` [Caml-list] " Fischbacher T. 2009-12-21 17:42 ` Dario Teixeira @ 2009-12-21 18:43 ` Jon Harrop 2 siblings, 0 replies; 57+ messages in thread From: Jon Harrop @ 2009-12-21 18:43 UTC (permalink / raw) To: caml-list On Monday 21 December 2009 14:19:36 Mihamina Rakotomandimby wrote: > Ok, so for the beginner I am (must I ask on the beginners ML?): is > multicore support just useless or not? I have found a great many uses for multicores but you need a decent foundation to make effective use of it. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: multicore wish [Was: Re: [Caml-list] Re: OCaml is broken] 2009-12-21 13:31 ` multicore wish [Was: Re: [Caml-list] Re: OCaml is broken] Goswin von Brederlow 2009-12-21 14:19 ` multicore wish Mihamina Rakotomandimby @ 2009-12-21 19:53 ` Jon Harrop 2009-12-22 13:09 ` multicore wish Goswin von Brederlow 1 sibling, 1 reply; 57+ messages in thread From: Jon Harrop @ 2009-12-21 19:53 UTC (permalink / raw) To: caml-list On Monday 21 December 2009 13:31:10 Goswin von Brederlow wrote: > Jon Harrop <jon@ffconsultancy.com> writes: > > We've discussed the problems with that before. Writing a parallel generic > > quicksort seems to be a good test of a decent multicore capable language > > implementation. Currently, F# is a *long* way ahead of everything open > > source. > > How do you implement it? > > 1) divide at the top and then let each core sort its smaller array > > Say you have 2^n cores then the first n splits in merge sort would > first sort the values into the 2 regions and then start a thread for > each region (start a new one, do the other part in this thread). After > n splits you would switch to a uni-core quicksort. > > For this you need to split well so each core ends up with a roughly > equal sized chunk. That is basically what is now being called "nested data parallelism". You precompute a parallel strategy, partition the work upfront and then execute embarassingly parallel tasks. This worked well decades ago for sparse linear algebra on supercomputers but it sucks on multicores because load is variable and there is no load balancing. So one of those "equal-sized" partitions is likely to take 10x longer than the others due to competing processes, scheduling or cache issues and all cores end up twiddling their thumbs waiting for that one to complete. > 2) factory/worker approach > > Each core runs a factory/worker thread waiting on a job queue. You > start by dumping the full array into the job queue. Some random core > picks it up, splits it into 2 regions and dumps one into the job > queue. If the job gets too small (PAGE_SIZE? cache line size? total > size / cores^2?) the factory/worker switches to normal uni-core > quicksort and sorts the whole chunk. > > The job queue should probably be priority based so larger chunks are > sorted before smaller. > > Here the quality of each split is not so important. If a chunk is > smaller, and therefore faster, the core just picks up the next job in > the queue. But you need more syncronization between the cores for the > job queue. On the other hand you aren't limited to 2^n cores. Any > number will do. This is a work sharing queue which is better because it is dynamically load balanced but still bad because you have global contention for a shared resource: the shared queue. Cilk pioneered wait-free work-stealing task deques and Microsoft's Task Parallel Library (which will be part of .NET 4 in March 2010) copied the idea. You have a separate deque of tasks for each core. A core tries to pop a task off its deque. If there are no tasks on its deque then it tries to pop off a randomly-chosen other deque. If there is a task then the core begins executing that task and any child tasks are pushed back onto the local deque (and might get stolen before they are executed). An important characteristic of this design is that child tasks are likely to be executed on the same core as their parent and, therefore, locality between related tasks is valuable. Consequently, many algorithms that recursively subdivide a large dataset (like quicksort) attain superlinear speedups because they make effective use of the sum of all L2 caches rather than just a single L2 cache. So this works extremely well with the multi-level hierarchical caches with interconnects that are the hallmark of multicore architecture. You can then implement quicksort (and many other parallel divide and conquer algorithms) something like this: let rec serial_qsort cmp a i0 i3 = if i3 - i0 > 1 then let i1, i2 = partition cmp a i0 i3 in serial_qsort i0 i1; serial_qsort i2 i3 let rec parallel_qsort cmp a i0 i3 = if i3 - i0 > 1 then if i3 - i0 < threshold then serial_qsort cmp a i0 i3 else let i1, i2 = partition cmp a i0 i3 in let future = invoke (parallel_qsort cmp i0) i1 in parallel_qsort i2 i3 future() where the "invoke" combinator pushes a task onto the local task deque and returns a closure that, when applied, waits for that task to complete and returns its (unit) result. The "threshold" value protects against the function spending more time spawning tasks that doing actual work. You can implement this in OCaml and F# but there are some important differences: 1. The array "a" is just an ordinary array of any type of values on the shared heap in F# but, for generality in OCaml, this must be both the underlying ordinary data and a manually-managed shared big array of indices into the ordinary data where the indices get sorted while the original data remain in place until they are permuted at the end. 2. The sorted subarrays are contiguous in memory and, at some subdivision, will fit into L2 cache. So F# offers optimal locality. In contrast, there is no locality whatsoever in the OCaml code and most accesses into the unsorted original array will incur cache misses right up to main memory. So the OCaml approach does not scale as well and will never see superlinear speedup because it cannot be cache friendly. 3. Child tasks are likely to be executed on the same core as their parent and use a subset of their parent's data in F#, so they offer the best possible locality. In contrast, child processes are likely to be executed on another core in OCaml and offer the worst possible locality. 4. Task deques can handle an arbitrary number of tasks limited only by memory whereas processes are a scarce resource and forking is likely to fail, whereupon the "invoke" combinator will simply execute sequentially. So it is much easier to write reliable and performant code in F# than OCaml. 5. OCaml's fork-based "invoke" combinator is many orders of magnitude slower than pushing a closure onto a concurrent task deque in F#. 6. The "threshold" value is a magic number derived from measurements on a given machine in my OCaml code but is dynamically adjusted in a machine-independent way by the "invoke" combinator in my F# code using atomic operations and real time profiling of the proportion of time spent spawning tasks vs doing actual work. The same basic principles apply to many algorithms. Although it can sometimes be tricky to figure out how best to use this technology to parallelize a given algorithm (e.g. successive over relaxation), I have found that a great many algorithms can be parallelized effectively using this approach when you have a suitable foundation in place (like the TPL). Moreover, the ability to use ordinary constructs in F# instead of hacks like type-specific shared memory big arrays in OCaml makes it a lot easier to parallelize programs. My parallel Burrows-Wheeler Transform (BWT), for example, took 30 minutes to develop in F# and 2 days in OCaml. Hopefully, HLVM will bring these kinds of benefits to OCaml, albeit in the form of a DSL for high performance parallel programming. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: multicore wish 2009-12-21 19:53 ` multicore wish [Was: Re: [Caml-list] Re: OCaml is broken] Jon Harrop @ 2009-12-22 13:09 ` Goswin von Brederlow 2009-12-22 19:12 ` [Caml-list] " Jon Harrop 0 siblings, 1 reply; 57+ messages in thread From: Goswin von Brederlow @ 2009-12-22 13:09 UTC (permalink / raw) To: Jon Harrop; +Cc: caml-list Jon Harrop <jon@ffconsultancy.com> writes: > Cilk pioneered wait-free work-stealing task deques and Microsoft's Task > Parallel Library (which will be part of .NET 4 in March 2010) copied the > idea. You have a separate deque of tasks for each core. A core tries to pop a > task off its deque. If there are no tasks on its deque then it tries to pop > off a randomly-chosen other deque. If there is a task then the core begins > executing that task and any child tasks are pushed back onto the local deque > (and might get stolen before they are executed). > > An important characteristic of this design is that child tasks are likely to > be executed on the same core as their parent and, therefore, locality between > related tasks is valuable. Consequently, many algorithms that recursively > subdivide a large dataset (like quicksort) attain superlinear speedups > because they make effective use of the sum of all L2 caches rather than just > a single L2 cache. So this works extremely well with the multi-level > hierarchical caches with interconnects that are the hallmark of multicore > architecture. > > You can then implement quicksort (and many other parallel divide and conquer > algorithms) something like this: > > let rec serial_qsort cmp a i0 i3 = > if i3 - i0 > 1 then > let i1, i2 = partition cmp a i0 i3 in > serial_qsort i0 i1; > serial_qsort i2 i3 > > let rec parallel_qsort cmp a i0 i3 = > if i3 - i0 > 1 then > if i3 - i0 < threshold then > serial_qsort cmp a i0 i3 > else > let i1, i2 = partition cmp a i0 i3 in > let future = invoke (parallel_qsort cmp i0) i1 in > parallel_qsort i2 i3 > future() > > where the "invoke" combinator pushes a task onto the local task deque and > returns a closure that, when applied, waits for that task to complete and > returns its (unit) result. The "threshold" value protects against the > function spending more time spawning tasks that doing actual work. > > You can implement this in OCaml and F# but there are some important > differences: > > 1. The array "a" is just an ordinary array of any type of values on the shared > heap in F# but, for generality in OCaml, this must be both the underlying > ordinary data and a manually-managed shared big array of indices into the > ordinary data where the indices get sorted while the original data remain in > place until they are permuted at the end. Unless you have a primitive type that isn't a pointer. The advantage with ocaml though is that you never have pointers into a structure. Makes thinks a lot simpler for the GC and avoids large overheads in memory. > 2. The sorted subarrays are contiguous in memory and, at some subdivision, > will fit into L2 cache. So F# offers optimal locality. In contrast, there is > no locality whatsoever in the OCaml code and most accesses into the unsorted > original array will incur cache misses right up to main memory. So the OCaml > approach does not scale as well and will never see superlinear speedup > because it cannot be cache friendly. On the other hand swapping two elements in the array has a constant cost no matter what size they have. At some size there will be a break even point where copying the data costs more than the cache misses and with increasing size the cache won't help F# so much either. > 3. Child tasks are likely to be executed on the same core as their parent and > use a subset of their parent's data in F#, so they offer the best possible > locality. In contrast, child processes are likely to be executed on another > core in OCaml and offer the worst possible locality. But, if I understood you right, you first fork one process per core. Those should then also each pin themself to one core. Each process then has a work queue which is works through. So they will always use the local data. Only when a queue runs dry they steal from another process and ruin locality. So I don't see where your argument fits. You are not creating childs on the fly. Only at the start and they run till all the work is done. At least in this example. > 4. Task deques can handle an arbitrary number of tasks limited only by memory > whereas processes are a scarce resource and forking is likely to fail, > whereupon the "invoke" combinator will simply execute sequentially. So it is > much easier to write reliable and performant code in F# than OCaml. Why would you fork in invoke? That only adds a closure to the task queue for the core. > 5. OCaml's fork-based "invoke" combinator is many orders of magnitude slower > than pushing a closure onto a concurrent task deque in F#. > > 6. The "threshold" value is a magic number derived from measurements on a > given machine in my OCaml code but is dynamically adjusted in a > machine-independent way by the "invoke" combinator in my F# code using atomic > operations and real time profiling of the proportion of time spent spawning > tasks vs doing actual work. 5+6 seem to be an implementation detail of some specific implementation you are talking about. I don't see anything in the theory that would require that. > The same basic principles apply to many algorithms. Although it can sometimes > be tricky to figure out how best to use this technology to parallelize a > given algorithm (e.g. successive over relaxation), I have found that a great > many algorithms can be parallelized effectively using this approach when you > have a suitable foundation in place (like the TPL). Moreover, the ability to > use ordinary constructs in F# instead of hacks like type-specific shared > memory big arrays in OCaml makes it a lot easier to parallelize programs. My > parallel Burrows-Wheeler Transform (BWT), for example, took 30 minutes to > develop in F# and 2 days in OCaml. It might be true that ocaml lacks some primitives for multi-core operations. But I would say that is mostly because so far it hasn't supported multi-core. If F# has great support for this then that might be a good place to steal some ideas from. But so far have heart nothing that would make F# fundamentally more capable than ocaml could become. > Hopefully, HLVM will bring these kinds of benefits to OCaml, albeit in the > form of a DSL for high performance parallel programming. MfG Goswin ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: multicore wish 2009-12-22 13:09 ` multicore wish Goswin von Brederlow @ 2009-12-22 19:12 ` Jon Harrop 2009-12-22 18:02 ` Edgar Friendly 2009-12-24 13:19 ` Goswin von Brederlow 0 siblings, 2 replies; 57+ messages in thread From: Jon Harrop @ 2009-12-22 19:12 UTC (permalink / raw) To: caml-list On Tuesday 22 December 2009 13:09:27 Goswin von Brederlow wrote: > Jon Harrop <jon@ffconsultancy.com> writes: > > 1. The array "a" is just an ordinary array of any type of values on the > > shared heap in F# but, for generality in OCaml, this must be both the > > underlying ordinary data and a manually-managed shared big array of > > indices into the ordinary data where the indices get sorted while the > > original data remain in place until they are permuted at the end. > > Unless you have a primitive type that isn't a pointer. In OCaml, you would need to write a custom quicksort optimized for that particular type. In F#, the generic version just works and works efficiently. > The advantage with ocaml though is that you never have pointers into a > structure. Makes thinks a lot simpler for the GC and avoids large > overheads in memory. I don't understand what you mean by OCaml "never has pointers into a structure". Half the problem with OCaml is that OCaml almost always uses pointers and the programmer has no choice, e.g. for complex numbers. > > 2. The sorted subarrays are contiguous in memory and, at some > > subdivision, will fit into L2 cache. So F# offers optimal locality. In > > contrast, there is no locality whatsoever in the OCaml code and most > > accesses into the unsorted original array will incur cache misses right > > up to main memory. So the OCaml approach does not scale as well and will > > never see superlinear speedup because it cannot be cache friendly. > > On the other hand swapping two elements in the array has a constant > cost no matter what size they have. At some size there will be a break > even point where copying the data costs more than the cache misses and > with increasing size the cache won't help F# so much either. In theory, yes. In practice, that threshold is far larger than any value type that a real program would use so it is of no practical concern. Moreover, F# gives the programmer control over whether data are unboxed (value types) or boxed (reference types) anyway. In contrast, OCaml is tied to a few value types that are hard-coded into the GC. > > 3. Child tasks are likely to be executed on the same core as their parent > > and use a subset of their parent's data in F#, so they offer the best > > possible locality. In contrast, child processes are likely to be executed > > on another core in OCaml and offer the worst possible locality. > > But, if I understood you right, you first fork one process per > core. No, in OCaml I fork every child. That is the only transparent way to give the child a coherent view of the heap but it is extremely slow (~1ms): "F# can do 60MFLOPS of computation in the time it takes OCaml to fork a single process" - http://caml.inria.fr/pub/ml-archives/caml-list/2009/06/542b8bed77022b4a4824de2da5b7f714.en.html > Those should then also each pin themself to one core. You have no idea which core a forked child process will run on. > Each process then has a work queue which is works through. So they will > always use the local data. Only when a queue runs dry they steal from > another process and ruin locality. You are correctly describing the efficient solution based upon work-stealing queues that F# uses but OCaml cannot express it. > So I don't see where your argument fits. You are not creating childs > on the fly. Only at the start and they run till all the work is done. > At least in this example. No, every recursive invocation of the parallel quicksort spawns another child on-the-fly. That's precisely why it parallelizes so efficiently when you have wait-free work-stealing task deques and a shared heap. In general, you rewrite algorithms into this recursive divide and conquer form and parallelize when possible. You can parallelize a *lot* of problems efficiently that way. > > 4. Task deques can handle an arbitrary number of tasks limited only by > > memory whereas processes are a scarce resource and forking is likely to > > fail, whereupon the "invoke" combinator will simply execute sequentially. > > So it is much easier to write reliable and performant code in F# than > > OCaml. > > Why would you fork in invoke? Fork is currently the only transparent way to implement "invoke" but it is extremely slow and unreliable. > > 5. OCaml's fork-based "invoke" combinator is many orders of magnitude > > slower than pushing a closure onto a concurrent task deque in F#. > > > > 6. The "threshold" value is a magic number derived from measurements on a > > given machine in my OCaml code but is dynamically adjusted in a > > machine-independent way by the "invoke" combinator in my F# code using > > atomic operations and real time profiling of the proportion of time spent > > spawning tasks vs doing actual work. > > 5+6 seem to be an implementation detail of some specific > implementation you are talking about. Yes. I'm talking about today's OCaml and F# implementations. > I don't see anything in the theory that would require that. If by "in theory" you mean that a new performant concurrent GC for OCaml would solve these problems then yes. But I doubt OCaml is ever going to get one. > > The same basic principles apply to many algorithms. Although it can > > sometimes be tricky to figure out how best to use this technology to > > parallelize a given algorithm (e.g. successive over relaxation), I have > > found that a great many algorithms can be parallelized effectively using > > this approach when you have a suitable foundation in place (like the > > TPL). Moreover, the ability to use ordinary constructs in F# instead of > > hacks like type-specific shared memory big arrays in OCaml makes it a lot > > easier to parallelize programs. My parallel Burrows-Wheeler Transform > > (BWT), for example, took 30 minutes to develop in F# and 2 days in OCaml. > > It might be true that ocaml lacks some primitives for multi-core > operations. But I would say that is mostly because so far it hasn't > supported multi-core. Well, yes. :-) > If F# has great support for this then that might be a good place to steal > some ideas from. The infrastructure for this kind of shared memory parallel programming is really very simple. You just need a GC that can handle a shared heap (which HLVM already has) and work-stealing task deques. Then you can easily write parallel programs that leverage multicores and run a *lot* faster than anything that can be written in OCaml. You can even make this accessible to OCaml programmers as a DSL with automatic interop to make high performance parallel programming as easy as possible from OCaml. > But so far have heart nothing that would make F# fundamentally more capable > than ocaml could become. In theory, OCaml could catch up with F#. In practice, the core of OCaml's implementation is so heavily optimized in another direction (and that will not change because it is OCaml's raison d'être) that it is worth starting from scratch. Modern libraries like LLVM make this comparatively painless and the improvements are vast. And besides, it is great fun! :-) -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: multicore wish 2009-12-22 19:12 ` [Caml-list] " Jon Harrop @ 2009-12-22 18:02 ` Edgar Friendly 2009-12-22 19:20 ` Jon Harrop 2009-12-24 13:19 ` Goswin von Brederlow 1 sibling, 1 reply; 57+ messages in thread From: Edgar Friendly @ 2009-12-22 18:02 UTC (permalink / raw) To: caml-list On 12/22/2009 01:12 PM, Jon Harrop wrote: > On Tuesday 22 December 2009 13:09:27 Goswin von Brederlow wrote: > >> The advantage with ocaml though is that you never have pointers into a >> structure. Makes thinks a lot simpler for the GC and avoids large >> overheads in memory. >> > I don't understand what you mean by OCaml "never has pointers into a > structure". Half the problem with OCaml is that OCaml almost always uses > pointers and the programmer has no choice, e.g. for complex numbers. > > I think he means that ocaml structs (records, tuples) will only ever have pointers pointing to their beginning - you can't have a pointer to somewhere in the middle of a structure. E ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: multicore wish 2009-12-22 18:02 ` Edgar Friendly @ 2009-12-22 19:20 ` Jon Harrop 2009-12-24 12:58 ` Goswin von Brederlow 0 siblings, 1 reply; 57+ messages in thread From: Jon Harrop @ 2009-12-22 19:20 UTC (permalink / raw) To: caml-list On Tuesday 22 December 2009 18:02:32 Edgar Friendly wrote: > On 12/22/2009 01:12 PM, Jon Harrop wrote: > > On Tuesday 22 December 2009 13:09:27 Goswin von Brederlow wrote: > >> The advantage with ocaml though is that you never have pointers into a > >> structure. Makes thinks a lot simpler for the GC and avoids large > >> overheads in memory. > > > > I don't understand what you mean by OCaml "never has pointers into a > > structure". Half the problem with OCaml is that OCaml almost always uses > > pointers and the programmer has no choice, e.g. for complex numbers. > > I think he means that ocaml structs (records, tuples) will only ever > have pointers pointing to their beginning - you can't have a pointer to > somewhere in the middle of a structure. If so then I do not understand the relevance. You cannot have pointers into a structure in F# or HLVM either... -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: multicore wish 2009-12-22 19:20 ` Jon Harrop @ 2009-12-24 12:58 ` Goswin von Brederlow 2009-12-24 16:51 ` Jon Harrop 0 siblings, 1 reply; 57+ messages in thread From: Goswin von Brederlow @ 2009-12-24 12:58 UTC (permalink / raw) To: Jon Harrop; +Cc: caml-list Jon Harrop <jon@ffconsultancy.com> writes: > On Tuesday 22 December 2009 18:02:32 Edgar Friendly wrote: >> On 12/22/2009 01:12 PM, Jon Harrop wrote: >> > On Tuesday 22 December 2009 13:09:27 Goswin von Brederlow wrote: >> >> The advantage with ocaml though is that you never have pointers into a >> >> structure. Makes thinks a lot simpler for the GC and avoids large >> >> overheads in memory. >> > >> > I don't understand what you mean by OCaml "never has pointers into a >> > structure". Half the problem with OCaml is that OCaml almost always uses >> > pointers and the programmer has no choice, e.g. for complex numbers. >> >> I think he means that ocaml structs (records, tuples) will only ever >> have pointers pointing to their beginning - you can't have a pointer to >> somewhere in the middle of a structure. Exactly. > If so then I do not understand the relevance. You cannot have pointers into a > structure in F# or HLVM either... If you have an array a of (int * int) then in ocaml a is an array of pointers to tuples. a.(5) is a pointer to the 6th tuple. You said that in F# the array will be really an array of tuples. Then a.(5) will be a pointer into the array at the position where the 6th tuple is. It does not point to the begining of an allocated block but to the middle of one. That means as long as a.(5) is reachable the full array has to remain allocated or the GC has to recognise that only one (a few) items of an array are reachable and copy them before freeing the array. The GC also needs a way to find the begining of an allocated block from a pointer into the block. Which means extra overhead in both memory and time. Another think is that tuples are immutable but arrays are mutable. In ocaml you get this nice behaviour: # let a = Array.init 5 (fun x -> (x,x));; val a : (int * int) array = [|(0, 0); (1, 1); (2, 2); (3, 3); (4, 4)|] # let x = a.(1);; val x : int * int = (1, 1) # a.(1) <- (2,2);; - : unit = () # x;; - : int * int = (1, 1) In F# you would get (2,2) for your fast sortable array. A different semantic. MfG Goswin ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: multicore wish 2009-12-24 12:58 ` Goswin von Brederlow @ 2009-12-24 16:51 ` Jon Harrop 0 siblings, 0 replies; 57+ messages in thread From: Jon Harrop @ 2009-12-24 16:51 UTC (permalink / raw) To: caml-list On Thursday 24 December 2009 12:58:18 Goswin von Brederlow wrote: > Jon Harrop <jon@ffconsultancy.com> writes: > > On Tuesday 22 December 2009 18:02:32 Edgar Friendly wrote: > >> On 12/22/2009 01:12 PM, Jon Harrop wrote: > >> > On Tuesday 22 December 2009 13:09:27 Goswin von Brederlow wrote: > >> >> The advantage with ocaml though is that you never have pointers into > >> >> a structure. Makes thinks a lot simpler for the GC and avoids large > >> >> overheads in memory. > >> > > >> > I don't understand what you mean by OCaml "never has pointers into a > >> > structure". Half the problem with OCaml is that OCaml almost always > >> > uses pointers and the programmer has no choice, e.g. for complex > >> > numbers. > >> > >> I think he means that ocaml structs (records, tuples) will only ever > >> have pointers pointing to their beginning - you can't have a pointer to > >> somewhere in the middle of a structure. > > Exactly. > > > If so then I do not understand the relevance. You cannot have pointers > > into a structure in F# or HLVM either... > > If you have an array a of (int * int) then in ocaml a is an array of > pointers to tuples. a.(5) is a pointer to the 6th tuple. Yes. > You said that in F# the array will be really an array of tuples. Then > a.(5) will be a pointer into the array at the position where the 6th > tuple is. No. The expression a.(5) loads all fields of the value type. For example, if it were an array of complex number then a.(5) would load the real and imaginary parts. > That means as long as a.(5) is reachable the full array has to remain > allocated or the GC has to recognise that only one (a few) items of an > array are reachable and copy them before freeing the array. The GC > also needs a way to find the begining of an allocated block from a > pointer into the block. Which means extra overhead in both memory and > time. > > Another think is that tuples are immutable but arrays are mutable. In > ocaml you get this nice behaviour: > > # let a = Array.init 5 (fun x -> (x,x));; > val a : (int * int) array = [|(0, 0); (1, 1); (2, 2); (3, 3); (4, 4)|] > # let x = a.(1);; > val x : int * int = (1, 1) > # a.(1) <- (2,2);; > - : unit = () > # x;; > - : int * int = (1, 1) > > In F# you would get (2,2) for your fast sortable array. A different > semantic. No. Even if F# represented tuples as structs (which it does not, unfortunately) you would still get the same result as you do in OCaml. HLVM already implements tuples as structs and your example works exactly as the OCaml does: # let xs = create(6, (1, 1)) in let x = xs.(1) in xs.(1) <- (2, 2); x;; - : `Struct[`Int; `Int] = (1, 1) The only difference is that HLVM unboxes the pairs in the array so the heap contains only a single pointer and accessing a pair requires only one indirection. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: multicore wish 2009-12-22 19:12 ` [Caml-list] " Jon Harrop 2009-12-22 18:02 ` Edgar Friendly @ 2009-12-24 13:19 ` Goswin von Brederlow 2009-12-24 17:06 ` Jon Harrop 1 sibling, 1 reply; 57+ messages in thread From: Goswin von Brederlow @ 2009-12-24 13:19 UTC (permalink / raw) To: Jon Harrop; +Cc: caml-list Jon Harrop <jon@ffconsultancy.com> writes: > On Tuesday 22 December 2009 13:09:27 Goswin von Brederlow wrote: >> Jon Harrop <jon@ffconsultancy.com> writes: >> > 1. The array "a" is just an ordinary array of any type of values on the >> > shared heap in F# but, for generality in OCaml, this must be both the >> > underlying ordinary data and a manually-managed shared big array of >> > indices into the ordinary data where the indices get sorted while the >> > original data remain in place until they are permuted at the end. >> >> Unless you have a primitive type that isn't a pointer. > > In OCaml, you would need to write a custom quicksort optimized for that > particular type. In F#, the generic version just works and works efficiently. > >> The advantage with ocaml though is that you never have pointers into a >> structure. Makes thinks a lot simpler for the GC and avoids large >> overheads in memory. > > I don't understand what you mean by OCaml "never has pointers into a > structure". Half the problem with OCaml is that OCaml almost always uses > pointers and the programmer has no choice, e.g. for complex numbers. > >> > 2. The sorted subarrays are contiguous in memory and, at some >> > subdivision, will fit into L2 cache. So F# offers optimal locality. In >> > contrast, there is no locality whatsoever in the OCaml code and most >> > accesses into the unsorted original array will incur cache misses right >> > up to main memory. So the OCaml approach does not scale as well and will >> > never see superlinear speedup because it cannot be cache friendly. >> >> On the other hand swapping two elements in the array has a constant >> cost no matter what size they have. At some size there will be a break >> even point where copying the data costs more than the cache misses and >> with increasing size the cache won't help F# so much either. > > In theory, yes. In practice, that threshold is far larger than any value type > that a real program would use so it is of no practical concern. Moreover, F# > gives the programmer control over whether data are unboxed (value types) or > boxed (reference types) anyway. In contrast, OCaml is tied to a few value > types that are hard-coded into the GC. > >> > 3. Child tasks are likely to be executed on the same core as their parent >> > and use a subset of their parent's data in F#, so they offer the best >> > possible locality. In contrast, child processes are likely to be executed >> > on another core in OCaml and offer the worst possible locality. >> >> But, if I understood you right, you first fork one process per >> core. > > No, in OCaml I fork every child. That is the only transparent way to give the > child a coherent view of the heap but it is extremely slow (~1ms): So if you add a (sleep 60) to the ocaml code then ocaml gets even more worse than F#? You are comparing an implementation that is specifically optimized to use things F# is good at and ocaml is bad. To give a fair comparison you need to optimize the implementation to the language. > "F# can do 60MFLOPS of computation in the time it takes OCaml > to fork a single process" - > http://caml.inria.fr/pub/ml-archives/caml-list/2009/06/542b8bed77022b4a4824de2da5b7f714.en.html > >> Those should then also each pin themself to one core. > > You have no idea which core a forked child process will run on. I can pin each child to one core and the scheduler will move it there. I would expect a multi-core ocaml to actualy do this internaly already so there is on GC threads per core ocaml runs on. But that depends on the GC implementation that will be used. >> Each process then has a work queue which is works through. So they will >> always use the local data. Only when a queue runs dry they steal from >> another process and ruin locality. > > You are correctly describing the efficient solution based upon work-stealing > queues that F# uses but OCaml cannot express it. You mean that you didn't implement that way. I can easily express that with closures and pre-forked worker threads. >> So I don't see where your argument fits. You are not creating childs >> on the fly. Only at the start and they run till all the work is done. >> At least in this example. > > No, every recursive invocation of the parallel quicksort spawns another child > on-the-fly. That's precisely why it parallelizes so efficiently when you have > wait-free work-stealing task deques and a shared heap. In general, you > rewrite algorithms into this recursive divide and conquer form and > parallelize when possible. You can parallelize a *lot* of problems > efficiently that way. That seems awfully ineficient. Then you have a million children running on maybe 4 cores and each job queue holds no job. I think we mean different things by children. By children I mean what the kernel sees as children. Newly forked threads. I think you mean jobs that get put into the queues. There is no reason to fork a new system thread every time the parallel quicksort splits an array in two. >> > 4. Task deques can handle an arbitrary number of tasks limited only by >> > memory whereas processes are a scarce resource and forking is likely to >> > fail, whereupon the "invoke" combinator will simply execute sequentially. >> > So it is much easier to write reliable and performant code in F# than >> > OCaml. >> >> Why would you fork in invoke? > > Fork is currently the only transparent way to implement "invoke" but it is > extremely slow and unreliable. No, it is the insane way. Use closures. >> > 5. OCaml's fork-based "invoke" combinator is many orders of magnitude >> > slower than pushing a closure onto a concurrent task deque in F#. >> > >> > 6. The "threshold" value is a magic number derived from measurements on a >> > given machine in my OCaml code but is dynamically adjusted in a >> > machine-independent way by the "invoke" combinator in my F# code using >> > atomic operations and real time profiling of the proportion of time spent >> > spawning tasks vs doing actual work. >> >> 5+6 seem to be an implementation detail of some specific >> implementation you are talking about. > > Yes. I'm talking about today's OCaml and F# implementations. > >> I don't see anything in the theory that would require that. > > If by "in theory" you mean that a new performant concurrent GC for OCaml would > solve these problems then yes. But I doubt OCaml is ever going to get one. > >> > The same basic principles apply to many algorithms. Although it can >> > sometimes be tricky to figure out how best to use this technology to >> > parallelize a given algorithm (e.g. successive over relaxation), I have >> > found that a great many algorithms can be parallelized effectively using >> > this approach when you have a suitable foundation in place (like the >> > TPL). Moreover, the ability to use ordinary constructs in F# instead of >> > hacks like type-specific shared memory big arrays in OCaml makes it a lot >> > easier to parallelize programs. My parallel Burrows-Wheeler Transform >> > (BWT), for example, took 30 minutes to develop in F# and 2 days in OCaml. >> >> It might be true that ocaml lacks some primitives for multi-core >> operations. But I would say that is mostly because so far it hasn't >> supported multi-core. > > Well, yes. :-) > >> If F# has great support for this then that might be a good place to steal >> some ideas from. > > The infrastructure for this kind of shared memory parallel programming is > really very simple. You just need a GC that can handle a shared heap (which Which there is one example so far for ocaml. Maybe not a verry good one. Last simple benchmarking I remember it was like 3 times slower than mono-core. > HLVM already has) and work-stealing task deques. Then you can easily write Task deques you can implement given a multi-core GC, worker threads and using closures. > parallel programs that leverage multicores and run a *lot* faster than > anything that can be written in OCaml. You can even make this accessible to > OCaml programmers as a DSL with automatic interop to make high performance > parallel programming as easy as possible from OCaml. Or just implement it properly in ocaml. The problem part is the GC. The rest is easy. >> But so far have heart nothing that would make F# fundamentally more capable >> than ocaml could become. > > In theory, OCaml could catch up with F#. In practice, the core of OCaml's > implementation is so heavily optimized in another direction (and that will > not change because it is OCaml's raison d'être) that it is worth starting > from scratch. Modern libraries like LLVM make this comparatively painless and > the improvements are vast. And besides, it is great fun! :-) I think there is an ocaml frontend for LLVM. So someone already did. MfG Goswin ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: multicore wish 2009-12-24 13:19 ` Goswin von Brederlow @ 2009-12-24 17:06 ` Jon Harrop 2009-12-27 12:45 ` Goswin von Brederlow 0 siblings, 1 reply; 57+ messages in thread From: Jon Harrop @ 2009-12-24 17:06 UTC (permalink / raw) To: caml-list On Thursday 24 December 2009 13:19:52 Goswin von Brederlow wrote: > Jon Harrop <jon@ffconsultancy.com> writes: > > No, in OCaml I fork every child. That is the only transparent way to give > > the child a coherent view of the heap but it is extremely slow (~1ms): > > So if you add a (sleep 60) to the ocaml code then ocaml gets even > more worse than F#? You are comparing an implementation that is > specifically optimized to use things F# is good at and ocaml is > bad. To give a fair comparison you need to optimize the implementation > to the language. Post a better OCaml solution if you can. > >> Each process then has a work queue which is works through. So they will > >> always use the local data. Only when a queue runs dry they steal from > >> another process and ruin locality. > > > > You are correctly describing the efficient solution based upon > > work-stealing queues that F# uses but OCaml cannot express it. > > You mean that you didn't implement that way. No, I mean OCaml cannot express it. > I can easily express that with closures and pre-forked worker threads. OCaml's threads do not run in parallel so that will not work. > >> So I don't see where your argument fits. You are not creating childs > >> on the fly. Only at the start and they run till all the work is done. > >> At least in this example. > > > > No, every recursive invocation of the parallel quicksort spawns another > > child on-the-fly. That's precisely why it parallelizes so efficiently > > when you have wait-free work-stealing task deques and a shared heap. In > > general, you rewrite algorithms into this recursive divide and conquer > > form and parallelize when possible. You can parallelize a *lot* of > > problems efficiently that way. > > That seems awfully ineficient. Then you have a million children > running on maybe 4 cores and each job queue holds no job. > > I think we mean different things by children. By children I mean what > the kernel sees as children. Newly forked threads. I think you mean > jobs that get put into the queues. There is no reason to fork a new > system thread every time the parallel quicksort splits an array in > two. The children are lightweight tasks, not threads or processes. > >> Why would you fork in invoke? > > > > Fork is currently the only transparent way to implement "invoke" but it > > is extremely slow and unreliable. > > No, it is the insane way. Use closures. Please demonstrate. > > parallel programs that leverage multicores and run a *lot* faster than > > anything that can be written in OCaml. You can even make this accessible > > to OCaml programmers as a DSL with automatic interop to make high > > performance parallel programming as easy as possible from OCaml. > > Or just implement it properly in ocaml. The problem part is the > GC. The rest is easy. No kidding. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: multicore wish 2009-12-24 17:06 ` Jon Harrop @ 2009-12-27 12:45 ` Goswin von Brederlow 2009-12-27 16:37 ` Jon Harrop 2009-12-28 12:28 ` Gerd Stolpmann 0 siblings, 2 replies; 57+ messages in thread From: Goswin von Brederlow @ 2009-12-27 12:45 UTC (permalink / raw) To: Jon Harrop; +Cc: caml-list Jon Harrop <jon@ffconsultancy.com> writes: > On Thursday 24 December 2009 13:19:52 Goswin von Brederlow wrote: >> Jon Harrop <jon@ffconsultancy.com> writes: >> > No, in OCaml I fork every child. That is the only transparent way to give >> > the child a coherent view of the heap but it is extremely slow (~1ms): >> >> So if you add a (sleep 60) to the ocaml code then ocaml gets even >> more worse than F#? You are comparing an implementation that is >> specifically optimized to use things F# is good at and ocaml is >> bad. To give a fair comparison you need to optimize the implementation >> to the language. > > Post a better OCaml solution if you can. > >> >> Each process then has a work queue which is works through. So they will >> >> always use the local data. Only when a queue runs dry they steal from >> >> another process and ruin locality. >> > >> > You are correctly describing the efficient solution based upon >> > work-stealing queues that F# uses but OCaml cannot express it. >> >> You mean that you didn't implement that way. > > No, I mean OCaml cannot express it. > >> I can easily express that with closures and pre-forked worker threads. > > OCaml's threads do not run in parallel so that will not work. >> Or just implement it properly in ocaml. The problem part is the >> GC. The rest is easy. > > No kidding. There is one implementation: http://www.algo-prog.info/ocmc/web/ But as said maybe not a verry good one. I tried implementing parallel threads under the original GC by forking multiple instances of the same program and using a Bigarray to mmap /dev/null for shared memory between the instances. That works for sharing primitive types, flat records (records of primitive types) and even fixed (or bound) sized structures but it gets more and more complex to share each and needs some Obj magic, marshaling or C stubs (except for primitive types). It works but is a real hack. I hope someone will pick up the pices and update OCaml4Multicore to the latest ocaml or maybe for ocaml to add it directly. If not then I fear ocaml will be left behind soon. MfG Goswin ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: multicore wish 2009-12-27 12:45 ` Goswin von Brederlow @ 2009-12-27 16:37 ` Jon Harrop 2009-12-28 12:28 ` Gerd Stolpmann 1 sibling, 0 replies; 57+ messages in thread From: Jon Harrop @ 2009-12-27 16:37 UTC (permalink / raw) To: caml-list On Sunday 27 December 2009 12:45:53 Goswin von Brederlow wrote: > There is one implementation: http://www.algo-prog.info/ocmc/web/ > But as said maybe not a very good one. > > I tried implementing parallel threads under the original GC by forking > multiple instances of the same program and using a Bigarray to mmap > /dev/null for shared memory between the instances. That works for > sharing primitive types, flat records (records of primitive types) and > even fixed (or bound) sized structures but it gets more and more > complex to share each and needs some Obj magic, marshaling or C stubs > (except for primitive types). It works but is a real hack. Once you've conceded to manual memory management (mmap of shared bigarrays) and low-level programming (no polymorphism, Obj.magic) you've lost the main advantages of OCaml and you still cannot get great performance. > I hope someone will pick up the pices and update OCaml4Multicore to > the latest ocaml or maybe for ocaml to add it directly. If not then I > fear ocaml will be left behind soon. Building upon OCaml rather than starting from scratch makes it vastly more difficult to implement a useful parallel GC not just because the GC and code gen must work in harmony together but because OCaml has been so heavily optimized in the wrong direction for this (e.g. a bit twiddled uniform representation that burdens the GC with everything from complex numbers to pairs). That's why I think the best solution is to start from scratch and build a completely separate VM bred for shared-memory parallelism. Indeed, HLVM already outperforms OC4MC even though I have put a fraction of the effort into it and built a lot more surrounding infrastructure. For example, my latest GC only took 5 days to write. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: multicore wish 2009-12-27 12:45 ` Goswin von Brederlow 2009-12-27 16:37 ` Jon Harrop @ 2009-12-28 12:28 ` Gerd Stolpmann 2009-12-28 15:07 ` Anil Madhavapeddy 2009-12-28 18:05 ` Xavier Leroy 1 sibling, 2 replies; 57+ messages in thread From: Gerd Stolpmann @ 2009-12-28 12:28 UTC (permalink / raw) To: Goswin von Brederlow; +Cc: Jon Harrop, caml-list Am Sonntag, den 27.12.2009, 13:45 +0100 schrieb Goswin von Brederlow: > Jon Harrop <jon@ffconsultancy.com> writes: > > > On Thursday 24 December 2009 13:19:52 Goswin von Brederlow wrote: > >> Jon Harrop <jon@ffconsultancy.com> writes: > >> > No, in OCaml I fork every child. That is the only transparent way to give > >> > the child a coherent view of the heap but it is extremely slow (~1ms): > >> > >> So if you add a (sleep 60) to the ocaml code then ocaml gets even > >> more worse than F#? You are comparing an implementation that is > >> specifically optimized to use things F# is good at and ocaml is > >> bad. To give a fair comparison you need to optimize the implementation > >> to the language. > > > > Post a better OCaml solution if you can. > > > >> >> Each process then has a work queue which is works through. So they will > >> >> always use the local data. Only when a queue runs dry they steal from > >> >> another process and ruin locality. > >> > > >> > You are correctly describing the efficient solution based upon > >> > work-stealing queues that F# uses but OCaml cannot express it. > >> > >> You mean that you didn't implement that way. > > > > No, I mean OCaml cannot express it. > > > >> I can easily express that with closures and pre-forked worker threads. > > > > OCaml's threads do not run in parallel so that will not work. > > >> Or just implement it properly in ocaml. The problem part is the > >> GC. The rest is easy. > > > > No kidding. > > There is one implementation: http://www.algo-prog.info/ocmc/web/ > But as said maybe not a verry good one. > > I tried implementing parallel threads under the original GC by forking > multiple instances of the same program and using a Bigarray to mmap > /dev/null for shared memory between the instances. That works for > sharing primitive types, flat records (records of primitive types) and > even fixed (or bound) sized structures but it gets more and more > complex to share each and needs some Obj magic, marshaling or C stubs > (except for primitive types). It works but is a real hack. It works with all types: https://godirepo.camlcity.org/svn/lib-ocamlnet2/trunk/code/src/netsys/netsys_mem.mli look for init_value. It's non-released code yet. However, there are some problems: Values outside the heap do not support the polymorphic comparison and hash functions. That's a hard limitation, e.g. you cannot even compare two strings, or build a hash table with strings as keys. That limits the usefulness of shared memory. Also, as we are striving for safe programs, there is the question when shared memory can be safely released. The GC could help here, but there are no ways to hook in, e.g. to get notified when there are no pointers anymore to a value living in shared memory. Gerd > > I hope someone will pick up the pices and update OCaml4Multicore to > the latest ocaml or maybe for ocaml to add it directly. If not then I > fear ocaml will be left behind soon. > > MfG > Goswin > > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > -- ------------------------------------------------------------ Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de Phone: +49-6151-153855 Fax: +49-6151-997714 ------------------------------------------------------------ ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: multicore wish 2009-12-28 12:28 ` Gerd Stolpmann @ 2009-12-28 15:07 ` Anil Madhavapeddy 2009-12-28 18:05 ` Xavier Leroy 1 sibling, 0 replies; 57+ messages in thread From: Anil Madhavapeddy @ 2009-12-28 15:07 UTC (permalink / raw) To: Gerd Stolpmann, Thomas Gazagnaire Cc: Goswin von Brederlow, Jon Harrop, caml-list On 28 Dec 2009, at 12:28, Gerd Stolpmann wrote: > However, there are some problems: Values outside the heap do not support > the polymorphic comparison and hash functions. That's a hard limitation, > e.g. you cannot even compare two strings, or build a hash table with > strings as keys. That limits the usefulness of shared memory. Camlp4 may help here; Thomas Gazagnaire and I have been working on a language-integrated ORM, and it has a reliable type-conv hash generator library [1]. It works with mutable records as well as immutable ones (the main limitation of the built-in polymorphic hash function) but it could be used to generate explicit comparison functions for shared memory as well. We need to split out the various support libraries in the ORM separately at some point anyway, so if the hash generator is of any use I'll cut a release. [1] http://github.com/avsm/ocaml-orm-sqlite/tree/master/hash/ -anill ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: multicore wish 2009-12-28 12:28 ` Gerd Stolpmann 2009-12-28 15:07 ` Anil Madhavapeddy @ 2009-12-28 18:05 ` Xavier Leroy 2009-12-29 16:44 ` Gerd Stolpmann 1 sibling, 1 reply; 57+ messages in thread From: Xavier Leroy @ 2009-12-28 18:05 UTC (permalink / raw) To: Gerd Stolpmann; +Cc: caml-list Gerd Stolpmann wrote: > It works with all types: > > https://godirepo.camlcity.org/svn/lib-ocamlnet2/trunk/code/src/netsys/netsys_mem.mli > > look for init_value. It's non-released code yet. > > However, there are some problems: Values outside the heap do not support > the polymorphic comparison and hash functions. That's a hard limitation, > e.g. you cannot even compare two strings, or build a hash table with > strings as keys. That limits the usefulness of shared memory. In OCaml 3.11 and later, you can call caml_page_table_add(In_static_area, start, end) to inform the run-time system that the address range [start, end) contains well-formed Caml data that polymorphic primitives can safely work on. This should solve your problem. - Xavier Leroy ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Caml-list] Re: multicore wish 2009-12-28 18:05 ` Xavier Leroy @ 2009-12-29 16:44 ` Gerd Stolpmann 0 siblings, 0 replies; 57+ messages in thread From: Gerd Stolpmann @ 2009-12-29 16:44 UTC (permalink / raw) To: Xavier Leroy; +Cc: caml-list Am Montag, den 28.12.2009, 19:05 +0100 schrieb Xavier Leroy: > Gerd Stolpmann wrote: > > > It works with all types: > > > > https://godirepo.camlcity.org/svn/lib-ocamlnet2/trunk/code/src/netsys/netsys_mem.mli > > > > look for init_value. It's non-released code yet. > > > > However, there are some problems: Values outside the heap do not support > > the polymorphic comparison and hash functions. That's a hard limitation, > > e.g. you cannot even compare two strings, or build a hash table with > > strings as keys. That limits the usefulness of shared memory. > > In OCaml 3.11 and later, you can call > > caml_page_table_add(In_static_area, start, end) > > to inform the run-time system that the address range [start, end) > contains well-formed Caml data that polymorphic primitives can safely > work on. This should solve your problem. Yes, in deed! Very nice. As there are still unused bits in the page table... I'm thinking about the following. One bit could be used by the GC to indicate that values somewhere in the page are still referenced from heap memory. At the beginning of the mark phase the bit is cleared. Whenever a pointer is seen that goes to a page marked as In_static_area, the bit for the area is set. As before, the mark phase ignores these pointers otherwise (they are not followed). After the mark phase, one could check which of the pages are still used, and this would make it possible to call finalisers for whole pages. As we only add an "else case" to the mark loop, the costs for this are negligible. In the shared memory scenario, each process mapping share memory could define such a finaliser, and only when it is called the memory block is unmapped. (And when a global counter is decreased to 0, the shared memory object as such can be deleted. But that's outside the ocaml runtime.) With that help, shared memory would be relatively safe to use. The only remaining trap would be value pointers that go from shared memory to heap memory - but ocaml has already lots of ways to control mutability of values, so I think it is ok to let the programmer do this. Gerd -- ------------------------------------------------------------ Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de Phone: +49-6151-153855 Fax: +49-6151-997714 ------------------------------------------------------------ ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [***SPAM*** Score/Req: 10.1/8.0] [Caml-list] Re: OCaml is broken 2009-12-19 19:38 OCaml is broken Jeff Shaw 2009-12-20 4:43 ` [Caml-list] " Jon Harrop @ 2009-12-20 11:56 ` Erik Rigtorp 1 sibling, 0 replies; 57+ messages in thread From: Erik Rigtorp @ 2009-12-20 11:56 UTC (permalink / raw) Cc: caml-list On Sat, Dec 19, 2009 at 20:38, Jeff Shaw <shawjef3@msu.edu> wrote: > My understanding is that since jocaml uses the regular ocaml runtime, it is > also not multicore enabled. > > Haskell is a functional language that has good performance that can use > multiple processors, but the learning curve is steeper and higher. Haskell has as far as i've understood very poor and unpredictable performance. > Note that if you're not using a lot of threads, you can use Unix.fork to do > true multithreaded programming ocaml. I would get to much IPC latency with that solution. ^ permalink raw reply [flat|nested] 57+ messages in thread
end of thread, other threads:[~2010-01-03 18:52 UTC | newest] Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2009-12-19 19:38 OCaml is broken Jeff Shaw 2009-12-20 4:43 ` [Caml-list] " Jon Harrop 2009-12-20 12:21 ` [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp 2009-12-20 13:22 ` Martin Jambon 2009-12-20 13:47 ` Yaron Minsky 2009-12-20 16:01 ` Gerd Stolpmann 2009-12-21 22:50 ` [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp 2009-12-22 12:04 ` Erik Rigtorp 2009-12-22 12:27 ` Mihamina Rakotomandimby 2009-12-22 13:27 ` Gerd Stolpmann 2009-12-23 11:25 ` Erik Rigtorp 2009-12-29 12:07 ` [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req: 10.1/8.0] " Richard Jones 2009-12-20 14:27 ` Dario Teixeira 2009-12-20 21:14 ` Jon Harrop 2009-12-21 1:08 ` Gerd Stolpmann 2009-12-21 4:30 ` Jon Harrop 2009-12-21 3:58 ` Yaron Minsky 2009-12-21 5:32 ` Markus Mottl 2009-12-21 13:29 ` Jon Harrop 2009-12-26 17:08 ` orbitz 2009-12-20 19:38 ` [***SPAM*** Score/Req: 10.1/8.0] " Jon Harrop 2009-12-21 12:26 ` Mihamina Rakotomandimby 2009-12-21 14:19 ` general question, was " Keyan 2009-12-21 14:40 ` [Caml-list] " rixed 2009-12-21 14:42 ` Gerd Stolpmann 2009-12-21 15:25 ` Eray Ozkural 2009-12-21 14:50 ` Philip 2009-12-21 15:01 ` Keyan 2009-12-21 15:13 ` Stefano Zacchiroli 2009-12-21 15:27 ` Dario Teixeira 2009-12-21 15:46 ` Jacques Carette 2009-12-21 18:50 ` Jon Harrop 2009-12-21 18:48 ` Jon Harrop 2010-01-03 10:49 ` Sylvain Le Gall 2010-01-03 20:06 ` [Caml-list] " Jon Harrop 2009-12-21 13:07 ` [***SPAM*** Score/Req: 10.1/8.0] Re: [Caml-list] " Damien Doligez 2009-12-21 13:31 ` multicore wish [Was: Re: [Caml-list] Re: OCaml is broken] Goswin von Brederlow 2009-12-21 14:19 ` multicore wish Mihamina Rakotomandimby 2009-12-21 16:15 ` [Caml-list] " Fischbacher T. 2009-12-21 17:42 ` Dario Teixeira 2009-12-21 18:43 ` Jon Harrop 2009-12-21 19:53 ` multicore wish [Was: Re: [Caml-list] Re: OCaml is broken] Jon Harrop 2009-12-22 13:09 ` multicore wish Goswin von Brederlow 2009-12-22 19:12 ` [Caml-list] " Jon Harrop 2009-12-22 18:02 ` Edgar Friendly 2009-12-22 19:20 ` Jon Harrop 2009-12-24 12:58 ` Goswin von Brederlow 2009-12-24 16:51 ` Jon Harrop 2009-12-24 13:19 ` Goswin von Brederlow 2009-12-24 17:06 ` Jon Harrop 2009-12-27 12:45 ` Goswin von Brederlow 2009-12-27 16:37 ` Jon Harrop 2009-12-28 12:28 ` Gerd Stolpmann 2009-12-28 15:07 ` Anil Madhavapeddy 2009-12-28 18:05 ` Xavier Leroy 2009-12-29 16:44 ` Gerd Stolpmann 2009-12-20 11:56 ` [***SPAM*** Score/Req: 10.1/8.0] [Caml-list] Re: OCaml is broken Erik Rigtorp
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox