* SMP multithreading
From: Wolfgang Draxinger @ 2010-11-15 17:27 UTC
To: caml-list

Hi,

I've just read
http://caml.inria.fr/pub/ml-archives/caml-list/2002/11/64c14acb90cb14bedb2cacb73338fb15.en.html
in particular this paragraph:

| What about hyperthreading? Well, I believe it's the last convulsive
| movement of SMP's corpse :-) We'll see how it goes market-wise. At
| any rate, the speedups announced for hyperthreading in the Pentium 4
| are below a factor of 1.5; probably not enough to offset the overhead
| of making the OCaml runtime system thread-safe.

This reads just like "640k ought to be enough for everyone". Multicore
systems are the standard today: even the cheapest consumer machines
come with at least two cores, and six-core machines are easy to get.

Still think SMP is a niche and dying?

So, what are the developments regarding SMP multithreading in OCaml?

Cheers

Wolfgang
* Re: [Caml-list] SMP multithreading
From: Edgar Friendly @ 2010-11-16 6:46 UTC
To: caml-list

On 11/15/2010 09:27 AM, Wolfgang Draxinger wrote:
> So, what are the developments regarding SMP multithreading in OCaml?

At the risk of feeding a (possibly unintentional) troll, I'd like to
share some possibly new thoughts on this ever-living topic.

It looks like high-performance computing of the near future will be
built out of many machines (message passing), each with many cores
(SMP). One could use message passing for all communication in such a
system, but a hybrid approach might be best for this architecture:
shared memory within each box and message passing between boxes. Of
course the best choice depends strongly on the particular task.

In the long run, the highest performance density will likely come from
combining a few large, powerful cores (Intel-CPU style, able to run a
single thread as fast as possible) with many smaller compute engines
(GPGPUs or the like, optimized for power and area, closely coupled
with memory).

The question of how to program such an architecture seems to be
getting answered without the functional community's input. What can we
contribute?

E.
* Re: [Caml-list] SMP multithreading
From: Gerd Stolpmann @ 2010-11-16 17:04 UTC
To: Edgar Friendly; +Cc: caml-list

On Monday, 2010-11-15 at 22:46 -0800, Edgar Friendly wrote:
> The question of how to program such an architecture seems to be
> getting answered without the functional community's input. What can
> we contribute?

Yes, that's generally the right question. Current hardware is a kind
of experiment - vendors have taken the multicore path only because it
is right now the easiest way of improving the performance potential,
although it is questionable whether non-server applications can really
benefit from it (server apps are excluded here because for them
parallelization is relatively easy to get). Future hardware will
probably be even more different - however, it is still unclear which
design paths will be taken. It could be manycores (many CPUs with
non-uniform RAM), or it could be specialized compute units. Maybe
we'll see again a separation of the consumer and datacenter markets -
the former optimizing for numeric simulation applications (i.e.
games), the latter for high-throughput data paths and parallel CPU
power. The problem is that this is all speculation.

There are some things we can do to improve the situation (and some
ideas are not realistic):

* A probably not-so-difficult improvement would be better message
  passing between independent but local processes. I've started an
  experiment with such a mechanism
  (http://projects.camlcity.org/projects/dl/ocamlnet-3.0.3/doc/html-main/Netcamlbox.html),
  which tries to exploit the fact that GC-managed memory has a
  well-known structure. With more help from the GC this could be made
  even better (safer, fewer corner cases). (For comparison, a baseline
  stdlib-only sketch is shown below, after this message.)

* We need more frameworks for parallel programming. I'm currently
  developing Plasma, a Map/Reduce framework. Using a framework has the
  big advantage that the whole program is structured so that it
  profits from parallelization, and that developers who have no idea
  about parallelization can be trained for it. There are probably more
  algorithm schemes where this is possible.

* I have a lot of doubts whether FP languages will ever run well on
  SMP with a larger number of cores. The problem is the relatively
  high memory allocation rate - the GC has to work a lot harder than
  in imperative languages. The OC4MC project uses thread-local minor
  heaps because of this. Probably this is not enough, and one even
  needs thread-local major heaps (plus a third generation for values
  accessed by several threads). All in all, you could get the same
  effect by instantiating the OCaml runtime several times (if this
  were possible), letting each runtime run in its own thread, and
  providing some extra functionality for passing values between
  threads and for sharing values. This would not be exactly the SMP
  model, but it would allow a number of parallelization techniques,
  and it is probably future-proof because it encourages message
  passing over sharing. This is certainly worth experimentation.

* One can also tackle the problem from the multi-processing side:
  provide better mechanisms for message passing (see above) and value
  sharing. (That's probably the path I'll follow for my own
  experiments - no modifications of the runtime, but playing tricks
  with the OS.)

* As somebody mentioned "implicit parallelization": don't expect
  anything from this. Even if a good compiler finds ways to
  parallelize 20% of the code (which would be a lot), the runtime
  effect would be marginal. 80% of the code runs at normal speed
  (hopefully) and dominates the runtime behavior. The point is that
  such compiler-driven code improvements are only local optimizations.
  For good parallelization results you need to restructure the design
  of the program - well, maybe compiler2.0 can do this at some time,
  but this is not in sight.

* Looking for more "automatic" speedups: I would rather look at
  parallelizing parts of the GC (e.g. a parallelized sweep), but this
  probably runs quickly against the memory bandwidth limit. Maybe
  using 2 cores for the GC would result in an improvement, and more
  cores get you nothing extra. At least worth an experiment.

Gerd

--
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt, Germany
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855          Fax: +49-6151-997714
------------------------------------------------------------
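For comparison, the cheapest form of local message passing needs
nothing beyond the stdlib - the point of Netcamlbox is to avoid the
serialization copy this baseline pays. A minimal sketch, assuming a
Unix system (build against unix.cma); this is not Netcamlbox's API:

  (* Baseline message passing: a forked worker sends one value to the
     parent over a pipe, paying a full Marshal copy on each side. *)
  let () =
    let r, w = Unix.pipe () in
    match Unix.fork () with
    | 0 ->                                   (* child: the worker *)
        Unix.close r;
        let oc = Unix.out_channel_of_descr w in
        Marshal.to_channel oc [1; 2; 3] [];  (* serialize and send *)
        close_out oc;
        exit 0
    | _pid ->                                (* parent: the receiver *)
        Unix.close w;
        let ic = Unix.in_channel_of_descr r in
        let (msg : int list) = Marshal.from_channel ic in
        Printf.printf "received %d items\n" (List.length msg);
        close_in ic;
        ignore (Unix.wait ())

Netcamlbox's improvement over this is to place the value into a
shared-memory buffer that already has the layout of GC-managed memory,
so the receiver can use it without deserializing.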
* Re: [Caml-list] SMP multithreading
From: Eray Ozkural @ 2010-11-16 20:35 UTC
To: Gerd Stolpmann; +Cc: Edgar Friendly, caml-list

On Tue, Nov 16, 2010 at 7:04 PM, Gerd Stolpmann <info@gerd-stolpmann.de> wrote:
> * As somebody mentioned "implicit parallelization": don't expect
>   anything from this. Even if a good compiler finds ways to
>   parallelize 20% of the code (which would be a lot), the runtime
>   effect would be marginal.

I think you are underestimating parallelizing compilers.

--
Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy
http://myspace.com/arizanesil http://myspace.com/malfunct
* Re: [Caml-list] SMP multithreading
From: Gerd Stolpmann @ 2010-11-16 22:13 UTC
To: Eray Ozkural; +Cc: caml-list

On Tuesday, 2010-11-16 at 22:35 +0200, Eray Ozkural wrote:
> I think you are underestimating parallelizing compilers.

I was mostly citing Amdahl's law, and did not want to criticize any
effort in this area. It's more about the usefulness for the majority
of problems. How useful is the best parallelizing compiler if only a
small part of the program _can_ actually benefit from it? Think about
it. If you are not working in an area where many subroutines can be
sped up, you will consider this way of parallelizing a waste of time.
And this is still true for the majority of problems. Also, for many
problems that can be tackled, the scalability is very limited.

Gerd
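To make the arithmetic explicit: Amdahl's law says that if a fraction
p of the runtime can be parallelized across n cores, the overall
speedup is 1 / ((1 - p) + p/n). A one-line OCaml sketch with the 20%
figure from above:

  (* Amdahl's law: overall speedup when a fraction p of the runtime
     runs in parallel on n cores. *)
  let speedup p n = 1.0 /. ((1.0 -. p) +. p /. n)

  let () =
    Printf.printf "p=0.2, n=8:        %.2fx\n" (speedup 0.2 8.0);
    (* 1.21x *)
    Printf.printf "p=0.2, n=infinity: %.2fx\n" (speedup 0.2 infinity)
    (* 1.25x: the hard ceiling, however many cores you add *)

So even a perfect parallelizer tops out at 1.25x if only 20% of the
work is parallel, which is the point being made here.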
* Re: [Caml-list] SMP multithreading
From: Eray Ozkural @ 2010-11-16 23:04 UTC
To: Gerd Stolpmann; +Cc: caml-list

On Wed, Nov 17, 2010 at 12:13 AM, Gerd Stolpmann <info@gerd-stolpmann.de> wrote:
> How useful is the best parallelizing compiler if only a small part of
> the program _can_ actually benefit from it?

What makes you think only 20% of the code can be parallelized? I've
worked on such a compiler project, and there were plenty of
opportunities for parallelization in ordinary C code, let alone a
functional language; implicit parallelism would work wonders there. Of
course you may think whatever you were thinking, but I know that a
high degree of parallelism can be achieved through a functional
language. I can't tell you much more, though. If you think that only a
very small portion of the code can be parallelized, you probably do
not appreciate the kinds of static and dynamic analysis those
compilers perform. If you are thinking of applying it to some office
or e-mail application, that may well be true, but automatic
parallelization strategies work best when applied to computationally
intensive programs.

The really limiting factor for current functional languages is their
reliance on inherently sequential primitives like list processing,
which may in some cases limit the compiler to pipeline parallelism
alone (something non-trivial to do by hand, actually). Instead they
would have to get their basic forms from high-level parallel languages
(which might mean rewriting a lot of things); the programs could then
look like something out of a category theory textbook. But I think
even without modification you could get a lot of speedup from OCaml
code by applying state-of-the-art automatic parallelization
techniques. By the way, what matters is how much of the serial *work*
is parallelized, not how much of the code length. Even if the
parallelism is not obvious, the analysis can find out which iterations
are parallelizable and which variables have which kinds of
dependencies, memory dependencies, etc. There is a public perception
that automatic parallelization does not work, which is wrong. It is
true that the popular compiler vendors do not understand much in this
regard; all I've seen in popular products are lame attempts to
generate vector instructions.

That is not to say that implicit parallelization of current code can
replace parallel algorithm design (which is AI-complete). Rather, I
think implicit parallelization is one of the things that will help
parallel computing people: by having them work at a more abstract
level, exposing more concurrency through functional forms, yet
avoiding writing low-level communication code. High-level explicit
parallelism is also quite favorable, as there may be many situations
where, say, dynamic load-balancing approaches are suitable. The
approach of HPF might be relevant here, with the programmer making
annotations to guide the distribution of data structures, and the rest
inferred from the code.

So whoever says that isn't possible probably hasn't read much from the
computer architecture community. Even expert PL researchers may be
misled here, as they have made the quoted remark (or similar remarks)
about multi-core/SMP architectures. It was known for a very long time
(20+ years) that clock speeds would hit a wall and then we'd have to
expend more area. This is true regardless of the underlying
architecture/technology. Mother nature is parallel; it is the sequence
that is an abstraction. And what is more natural than expressing a
solution to a problem in terms of functions and sets? The kind of
non-leaky abstractions required can be provided by a type-safe
functional language, and then the compiler has a lot more to work with
than C code, although analysis can reveal much even about C code. With
the correct heuristics, it might decide on good data and task
distribution strategies, translating, say, a set constructor notation
into an efficient parallel algorithm - why not?

What I say applies to parallelization based on analysis. For
functional languages, other parallelization strategies are also
possible; I suppose low-level task parallelization and dynamic
load-balancing strategies (where functions or closures are
distributed) are the most popular, the reasoning being that there can
be many independent function evaluations running concurrently. I
wonder if this has been attempted at all for OCaml? The impurity of
the language can be dealt with easily. Is there anyone who might make
suggestions to somebody who would like to work on automatic
parallelization of OCaml code? I actually think it might be fun to try
some of the simpler strategies (see the sketch below) and see what
they yield on current hardware platforms like multi-core and GPU. We
ideally wouldn't like just good automatic parallelization of, say,
linear algebra code, but also cool stuff like functional data
structures. Recursive procedures can be mapped to threads; it might
match the best hand-written parallelizations, and you'd get platform
independence as a bonus if the compiler is well-designed :)

Best,

--
Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara
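One of the simpler strategies mentioned here - distributing
independent function evaluations - can at least be prototyped today
without touching the runtime, by forking one worker process per
element and marshalling the results back. A hypothetical sketch,
assuming a Unix system and results small enough to fit in a pipe
buffer (otherwise the children block while the parent is not yet
reading):

  (* Process-based parallel map: evaluate f on each element in a
     forked child and marshal the result back through a pipe. *)
  let par_map (f : 'a -> 'b) (xs : 'a list) : 'b list =
    let spawn x =
      let r, w = Unix.pipe () in
      match Unix.fork () with
      | 0 ->                                  (* child: compute f x *)
          Unix.close r;
          let oc = Unix.out_channel_of_descr w in
          Marshal.to_channel oc (f x) [];
          close_out oc;
          exit 0
      | pid ->
          Unix.close w;
          (pid, Unix.in_channel_of_descr r)
    in
    let collect (pid, ic) =
      let (y : 'b) = Marshal.from_channel ic in
      close_in ic;
      ignore (Unix.waitpid [] pid);
      y
    in
    (* spawn all children first, then collect, so they overlap *)
    List.map collect (List.map spawn xs)

Granularity is the catch, as comes up later in this thread: f must be
expensive enough to amortize the fork and the Marshal copies, or this
turns into a slowdown.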
* Re: [Caml-list] SMP multithreading
From: Wolfgang Draxinger @ 2010-11-16 23:52 UTC
To: caml-list

On Wed, 17 Nov 2010 01:04:54 +0200, Eray Ozkural <examachine@gmail.com> wrote:
> [readworthy text]

I'd like to point out how the big competitor to OCaml deals with it.
The GHC Haskell system has had SMP parallelization built in for some
time, and it does it quite well.

Wolfgang
* Re: [Caml-list] SMP multithreading
From: Eray Ozkural @ 2010-11-17 1:55 UTC
To: Wolfgang Draxinger; +Cc: caml-list

On Wed, Nov 17, 2010 at 1:52 AM, Wolfgang Draxinger <wdraxinger.maillist@draxit.de> wrote:
> The GHC Haskell system has had SMP parallelization built in for some
> time, and it does it quite well.

I think I tested the parallel features just once in the distant past.
Something like that would be so useful for OCaml :) Explicit threading
that is suited to functional programming, with a syntax independent of
the actual thread implementation. The par combinator looks like fun to
use. Definitely a wishlist item :)

Cheers,

--
Eray Ozkural
* RE: [Caml-list] SMP multithreading
From: Jon Harrop @ 2010-11-17 3:41 UTC
To: 'Wolfgang Draxinger', caml-list

Wolfgang wrote:
> I'd like to point out how the big competitor to OCaml deals with it.
> The GHC Haskell system has had SMP parallelization built in for some
> time, and it does it quite well.

I beg to differ. Upon trying to reproduce many of the Haskell
community's results, I found that even their own parallel Haskell
programs often exhibit huge slowdowns. This is because Haskell's
unpredictable performance leads to unpredictable granularity and,
consequently, more time can be spent administering the tiny parallel
computations than is gained by doing so. The results I found here are
typical:

http://flyingfrogblog.blogspot.com/2010/01/naive-parallelism-with-hlvm.html

Note that the absolute performance peaks at an unpredictable number of
cores only in the case of Haskell. This is because the GC does not
scale beyond about 4 cores for any Haskell program doing significant
amounts of allocation - which is basically all Haskell programs,
because allocations are everywhere in Haskell. Ultimately, running on
all cores attains no speedup at all with Haskell in that case. This
was branded "the last core slowdown", but the slowdown clearly started
well before all 8 cores. There has been a significant development
towards improving this situation, but it won't fix the granularity
problem:

http://hackage.haskell.org/trac/ghc/blog/new-gc-preview

The paper "Regular, shape-polymorphic, parallel arrays in Haskell"
cites 2.5x speedups when existing techniques were not only already
getting 7x speedups but better absolute performance as well. Cache
complexity is the problem, as I explained here:

http://flyingfrogblog.blogspot.com/2010/06/regular-shape-polymorphic-parallel.html

Probably the best solution for multicore programming is Cilk. This
technique has already been adopted in both Intel's TBB and Microsoft's
.NET 4 but, AFAIK, the only functional language with access to it is
F#. There are some great papers on multicore-friendly cache-oblivious
algorithms written in Cilk:

http://www.fftw.org/~athena/papers/tocs08.pdf

Note, in particular, that Cilk is not only much faster but also much
easier to use than explicit message passing. To do something like
this, threads need to be able to run in parallel and mutate the same
shared heap. Although that is objectively easy (I did it in HLVM),
OCaml's reliance upon very high allocation rates, efficient collection
of young garbage and a ridiculous density of pointers in the heap
makes it a *lot* harder.

Cheers,
Jon.
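The granularity discipline that makes Cilk-style systems work fits in
a few lines. Assume a hypothetical combinator par2 : (unit -> int) ->
(unit -> int) -> int * int that runs its two thunks in parallel (the
stock OCaml runtime cannot supply one; it is taken as a parameter here
purely to show the shape of the algorithm):

  (* Cilk-style divide and conquer with a sequential cutoff: below
     the cutoff, the work is too small to pay for a spawn. *)
  let cutoff = 20

  let rec fib par2 n =
    if n < 2 then n
    else if n < cutoff then
      fib par2 (n - 1) + fib par2 (n - 2)   (* sequential fallback *)
    else
      let a, b = par2 (fun () -> fib par2 (n - 1))
                      (fun () -> fib par2 (n - 2)) in
      a + b

The cutoff is exactly what keeps spawn overhead amortized; naive
parallelization - and, per the argument above, much parallel Haskell -
is what you get without it.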
* RE: [Caml-list] SMP multithreading
From: Jon Harrop @ 2010-11-17 3:47 UTC
To: 'Eray Ozkural'; +Cc: caml-list

Eray Ozkural wrote:
> What makes you think only 20% of the code can be parallelized? [...]

Granularity and cache complexity are the reasons why not. If you find
anything and everything that can be done in parallel and parallelize
it, then you generally obtain only slowdowns. An essential trick is to
exploit locality via mutation but, of course, purely functional
programming sucks at that by design, not least because it strives to
abstract that very concept away.

I share your dream but I doubt it will ever be realized.

Cheers,
Jon.
* Re: [Caml-list] SMP multithreading
From: Eray Ozkural @ 2010-11-17 4:27 UTC
To: Jon Harrop; +Cc: caml-list

On Wed, Nov 17, 2010 at 5:47 AM, Jon Harrop wrote:
> Granularity and cache complexity are the reasons why not. [...]

Oh well, I'm not so surprised that fine-grain task parallelism with a
dynamic load-balancing strategy doesn't get much speedup. Doing HPC
with Haskell is a bit like using Java for writing parallel programs:
you might as well use a C-64 and Commodore BASIC. And yes, some people
do use Java with MPI; Java people have benchmarks too :) But for some
reason I had difficulty using Java and Haskell even with medium-size
problems.

On the other hand, a lot more can be achieved with a parallelizing
compiler that uses profiling and static analysis. As I said, good
results can be achieved even in C - I've seen that - so I know it's
doable with OCaml; it's just a difficult kind of compiler. The
functional features would expose more concurrency.

At any rate, implicit parallelism isn't the same as a parallelizing
compiler; it's better, because you would be using primitives that the
compiler knows to its heart. That's like combining the best of both
worlds, I think, because obviously parallelizing compilers work best
on the easier kinds of parallelism. Such a compiler can pull more
tricks than many assume, but it still would not replace a parallel
algorithm designer. You don't really expect to give quicksort as input
and get hypercube quicksort as output from these compilers, which
apply a number of heuristic transformations to the code, but on many
problems they can be made to generate pretty good code. The best part
is that once you have it you can apply it to every program; it's one
of the cheapest ways to get speedup, so I'd say it's worthwhile for
OCaml right now. Just not the way GHC does it.

--
Eray Ozkural
* Re: [Caml-list] SMP multithreading
From: Gabriel Kerneis @ 2010-11-17 6:50 UTC
To: Eray Ozkural; +Cc: Jon Harrop, caml-list

On Wed, Nov 17, 2010 at 06:27:14AM +0200, Eray Ozkural wrote:
> As I said, good results can be achieved even in C - I've seen that -
> so I know it's doable with OCaml; it's just a difficult kind of
> compiler. The functional features would expose more concurrency.

Could you share a pointer to a paper describing this compiler?

Thanks,
--
Gabriel Kerneis
* Re: [Caml-list] SMP multithreading
From: Eray Ozkural @ 2010-11-17 13:41 UTC
To: Gabriel Kerneis; +Cc: Jon Harrop, caml-list

On Wed, Nov 17, 2010 at 8:50 AM, Gabriel Kerneis <kerneis@pps.jussieu.fr> wrote:
> Could you share a pointer to a paper describing this compiler?

I can't reveal much, but just to point out that there are indeed more
sophisticated compilers than gcc:

http://www.research.ibm.com/vliw/compiler.html

So, uh, there are compilers that turn loops into threads, and also
parallelize independent blocks. Both coarse-grain and fine-grain
parallelization strategies from existing compiler research can be
effectively applied to multi-core architectures. In fact, some of the
more advanced compilers (like that of the RAW architecture) must
already be able to target them, but who knows :) Just consider that
most of the parallelization technology is language-independent; it can
be applied to any imperative language. So, would such a thing be able
to work on OCaml-generated binaries? Most definitely; I believe it is
in principle possible to start from the sequential binary and emit
parallel code!

Best,

--
Eray Ozkural
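The core test behind "turning loops into threads" is the check for
loop-carried dependencies. A sketch of the distinction, written in
OCaml purely for illustration (the analysis itself is
language-independent):

  (* Independent iterations: a.(i) depends only on b.(i), so a
     compiler may split the index range across threads. *)
  let scale a b =
    for i = 0 to Array.length a - 1 do
      a.(i) <- 2.0 *. b.(i)
    done

  (* Loop-carried dependency: iteration i reads what iteration i-1
     wrote, so this loop cannot be naively split. *)
  let prefix_sum a =
    for i = 1 to Array.length a - 1 do
      a.(i) <- a.(i) +. a.(i - 1)
    done

A parallel scan algorithm can still handle the second loop, but that
is an algorithm substitution of exactly the quicksort-to-hypercube-
quicksort kind that heuristic transformations will not find on their
own.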
* RE: [Caml-list] SMP multithreading
From: Jon Harrop @ 2010-11-17 21:15 UTC
To: 'Eray Ozkural', caml-list

Eray Ozkural wrote:
> I can't reveal much, but just to point out that there are indeed more
> sophisticated compilers than gcc:
> http://www.research.ibm.com/vliw/compiler.html

Can you cite any papers from this century? ;-)

Cheers,
Jon.
* Re: [Caml-list] SMP multithreading
From: Eray Ozkural @ 2010-11-18 0:28 UTC
To: Jon Harrop; +Cc: caml-list

On Wed, Nov 17, 2010 at 11:15 PM, Jon Harrop wrote:
> Can you cite any papers from this century? ;-)

Yes, actually. :P

--
Eray Ozkural
* Re: [Caml-list] SMP multithreading
From: Eray Ozkural @ 2010-11-18 1:00 UTC
To: Jon Harrop; +Cc: caml-list

http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1650134

This is one of the more recent papers a quick search turns up, but you
have to keep in mind that thread extraction is only one problem among
many for a parallelizing compiler. I think the keyword you are looking
for is "thread extraction", and the above is probably the simplest
kind of extraction. Food for some thought: assume you have a very good
compiler pass that extracts all possible threads from sequential code.
Can you name the other problems the compiler must solve to achieve
good performance?

I can't talk at all about the project I worked on, but as I mentioned
previously, familiarize yourself with the RAW project, which was
similar in some respects:

http://groups.csail.mit.edu/cag/raw/

This should dispel, at least a bit, the illusion that parallelizing
compilers are helpless when they confront C code. Reading the OS/400
book opened my mind about OS design; perhaps reading about recent
computer architecture research projects will open others' eyes about
compilers and how useful they really can be! There ought to be some
textbooks about multi-core architectures and the relevant compilation
strategies; I'll post a comprehensive reference if I find one.

The dream compiler would have all the cool linear algebra capabilities
of HPF plus the more general, free-form parallelization strategies of
recent compilers. What you really want to parallelize is applications
that can benefit from it - not file utilities or web browsers. If you
are curious, stuff like povray would be in the test suite. Sometimes a
parallelizing compiler parallelizes computations that a programmer
wouldn't bother with due to program complexity: a basic block here, a
basic block there, some pipelining or communication/computation
overlap elsewhere. I think it's a safe bet that, with all the general
lameness surrounding parallel programming languages, parallelizing
compilers will be very important in the near future.

Cheers,

--
Eray Ozkural
* Re: [Caml-list] SMP multithreading
From: Norman Hardy @ 2010-11-16 19:07 UTC
To: caml-list

On 2010 Nov 15, at 22:46, Edgar Friendly wrote:
> It looks like high-performance computing of the near future will be
> built out of many machines (message passing), each with many cores
> (SMP). [...]

OCaml code should be able to share immutable OCaml data with other
processes just as it shares libraries. See
http://cap-lore.com/Software/pch.html . Some of the ideas there might
be improved with hardware support.

Admission: if I had read all of the interesting pointers given on this
thread, I would never finish sending this e-mail.
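Short of sharing arbitrary OCaml values, one thing that already works
is sharing unboxed data: a memory-mapped Bigarray is physically shared
between a parent and its forked children with no copying. A minimal
sketch, assuming a Unix system and an example path (build against
bigarray.cma and unix.cma):

  let () =
    let fd = Unix.openfile "/tmp/shared.dat"
               [Unix.O_RDWR; Unix.O_CREAT] 0o600 in
    (* map 1M floats; shared=true makes writes visible across fork *)
    let a = Bigarray.Array1.map_file fd
              Bigarray.float64 Bigarray.c_layout true 1_000_000 in
    Bigarray.Array1.fill a 0.0;
    match Unix.fork () with
    | 0 -> a.{0} <- 42.0; exit 0    (* child writes through the map *)
    | pid ->
        ignore (Unix.waitpid [] pid);
        Printf.printf "parent sees %.1f\n" a.{0}  (* prints 42.0 *)

The limitation is that this covers only flat scalar data; sharing
structured immutable OCaml values, as the page above proposes, would
need the GC to tolerate heap blocks it does not own.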
* RE: [Caml-list] SMP multithreading
From: David Allsopp @ 2010-11-17 16:34 UTC
To: Edgar Friendly, caml-list

Edgar Friendly wrote:
> It looks like high-performance computing of the near future will be
> built out of many machines (message passing), each with many cores
> (SMP). One could use message passing for all communication in such a
> system, but a hybrid approach might be best for this architecture:
> shared memory within each box and message passing between boxes. Of
> course the best choice depends strongly on the particular task.

Absolutely - and the problem in OCaml seems to be that shared-memory
parallelism is just branded as evil and ignored...

> In the long run, the highest performance density will likely come
> from combining a few large, powerful cores (Intel-CPU style, able to
> run a single thread as fast as possible) with many smaller compute
> engines (GPGPUs or the like, optimized for power and area, closely
> coupled with memory).

I think the central thing we can be utterly sure about is that
desktops will always have *more than one* general-purpose CPU. Maybe
not an ever-increasing number of cores, but definitely more than one.

> The question of how to program such an architecture seems to be
> getting answered without the functional community's input. What can
> we contribute?

When SMP has been discussed in the past on this list, it has often
seemed to get dismissed out of hand because it doesn't look
future-proof, or because we're worried about what's around the corner
in technology terms. To me the principal question is not whether a
parallel/thread-safe GC will scale to 12, 16 or even the 2048 cores of
something like http://www.hpc.cam.ac.uk/services/darwin.html, but
whether it will hurt a single-threaded application - i.e. whether you
will still be able to implement message-passing libraries and other
scalable techniques without the parallel GC getting in the way of what
you're doing.

A parallel/thread-safe GC should aim to provide the same sort of
contract as the present one: it "just works" for most things, and in a
few borderline cases (like HPC - yes, it's a borderline case) you'll
need to tune your code or tweak GC parameters because it's causing
problems, or because squeezing every cycle out of the CPU matters in
your particular application. As long as the GC isn't (hugely) slower
than the present one in OCaml, we can continue to use libraries,
frameworks and technologies-still-to-come on top of a
parallel/thread-safe GC, which simply ignores shared-memory
thread-level parallelism if you don't instantiate threads. The
argument always seems to focus on utterly maxing out all available
resources (CPU time, memory bandwidth, etc.) rather than on whether
it's simply faster than what we're able to do at the moment on the
same system. Of course, it may be that the only way to achieve this is
to have different garbage collectors - one invoked when threads.cmxa
is linked and the normal one otherwise (that's so easy to type out as
a sentence, summarising a vast amount of potential work!!).

Multithreading in OCaml seems to be focused on jumping the entire
width of the river of concurrency in one go, rather than coming up
with stepping stones to cross it in bits...

David
* Re: [Caml-list] SMP multithreading 2010-11-17 16:34 ` David Allsopp @ 2010-11-19 13:57 ` Eray Ozkural 0 siblings, 0 replies; 28+ messages in thread From: Eray Ozkural @ 2010-11-19 13:57 UTC (permalink / raw) To: David Allsopp; +Cc: Edgar Friendly, caml-list [-- Attachment #1: Type: text/plain, Size: 5628 bytes --] There seem to be solutions in theory. I think a colleague had pointed out one of the papers below, so there is indeed something like a "lock-free garbage collector". Then why do we worry so much about synchronization overhead? I don't quite understand. Maurice Herlihy, J. Eliot B. Moss: Lock-Free Garbage Collection for Multiprocessors. IEEE Trans. Parallel Distrib. Syst. 3(3): 304-311 (1992). http://doi.ieeecomputersociety.org/10.1109/71.139204 Hui Gao, Jan Friso Groote, Wim H. Hesselink: Lock-free parallel and concurrent garbage collection by mark&sweep. Sci. Comput. Program. 64(3): 341-374 (2007). http://portal.acm.org/citation.cfm?id=1223239 Java's new garbage collector is lock-free? At any rate, we really needn't fall behind a mega-lame language like Java :) The first paper is from 1992, enough time for the knowledge to diffuse. The second, 2007, paper is probably what Jon was referring to earlier. In my mind, you can use one of these, use special pool allocation algorithms for small objects, and also use static lifetime analysis to bypass the garbage collection in many cases. Since there are many runtime designers here, I wonder, is there a language runtime that does all three of these? Cheers, Eray On Wed, Nov 17, 2010 at 6:34 PM, David Allsopp <dra-news@metastack.com> wrote: > Edgar Friendly wrote: > > It looks like high-performance computing of the near future will be built > > out of many machines (message passing), each with many cores (SMP). One > > could use message passing for all communication in such a system, but a > > hybrid approach might be best for this architecture, with use of shared > > memory within each box and message passing between. Of course the best > > choice depends strongly on the particular task. > > Absolutely - and the problem in OCaml seems to be that shared memory > parallelism is just branded as evil and ignored... > > > In the long run, it'll likely be a combination of a few large, powerful > > cores (Intel-CPU style w/ the capability to run a single thread as fast > as > > possible) with many many smaller compute engines (GPGPUs or the like, > > optimized for power and area, closely coupled with memory) that provides > > the highest performance density. > > I think the central thing that we can be utterly sure about is that > desktops will always have *> 1* general purpose CPU. Maybe not an > ever-increasing number of cores, but definitely more than one. > > > The question of how to program such an architecture seems as if it's > being > answered without the functional community's input. What can we > contribute?
> > It has often seemed to me when SMP has been discussed in the past on this > list that it almost gets dismissed out of hand because it doesn't look > future-proof or because we're worried about what's round the corner in > technology terms. > > To me the principal question is not about whether a parallel/thread-safe GC > will scale to 12, 16 or even the 2048 cores on something like > http://www.hpc.cam.ac.uk/services/darwin.html but whether it will hurt a > single-threaded application - i.e. whether you will still be able to > implement message passing libraries and other scalable techniques without > the parallel GC getting in the way of what you're doing. A > parallel/thread-safe GC should be aiming to provide the same sort of > contract as the present one - it "just works" for most things and in a few > borderline cases (like HPC - yes, it's a borderline case) you'll need to > tune your code or tweak GC parameters because it's causing some problems or > because in your particular application squeezing every cycle out of the CPU > is important. As long as the GC isn't (hugely) slower than the present one > in OCaml then we can continue to use libraries, frameworks and > technologies-still-to-come on top of a parallel/thread-safe GC which simply > ignores shared memory thread-level parallelism just by not instantiating > threads. > > The argument always seems to focus on utterly maxing out all possible > available resources (CPU time, memory bandwidth, etc.) rather than on > whether it's simply faster than what we're able to do at the moment on > the same system. Of course, it may be that the only way to do that is to > have different garbage collectors - one invoked when threads.cmxa is linked > and then the normal one otherwise (that's so easy to type out as a sentence, > summarising a vast amount of potential work!!) > > Multithreading in OCaml seems to be focused on jumping the entire width of > the river of concurrency in one go, rather than coming up with stepping > stones to cross it in bits... > > > David > > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > -- Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara http://groups.yahoo.com/group/ai-philosophy http://myspace.com/arizanesil http://myspace.com/malfunct [-- Attachment #2: Type: text/html, Size: 7728 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: SMP multithreading 2010-11-15 17:27 SMP multithreading Wolfgang Draxinger 2010-11-16 6:46 ` [Caml-list] " Edgar Friendly @ 2010-11-16 12:47 ` Sylvain Le Gall 2010-11-17 11:12 ` [Caml-list] " Goswin von Brederlow [not found] ` <AANLkTinyN2hHxm6ha2Yq4nx6NxY3So=BhFN_-EHKYfyc@mail.gmail.com> 2 siblings, 1 reply; 28+ messages in thread From: Sylvain Le Gall @ 2010-11-16 12:47 UTC (permalink / raw) To: caml-list Hi, On 15-11-2010, Wolfgang Draxinger <wdraxinger.maillist@draxit.de> wrote: > Hi, > > I've just read > http://caml.inria.fr/pub/ml-archives/caml-list/2002/11/64c14acb90cb14bedb2cacb73338fb15.en.html > in particular this paragraph: >| What about hyperthreading? Well, I believe it's the last convulsive >| movement of SMP's corpse :-) We'll see how it goes market-wise. At >| any rate, the speedups announced for hyperthreading in the Pentium 4 >| are below a factor of 1.5; probably not enough to offset the overhead >| of making the OCaml runtime system thread-safe. > > This reads just like the "640k ought be enough for everyone". Multicore > systems are the standard today. Even the cheapest consumer machines > come with at least two cores. One can easily get 6-core machines today. > > Still thinking SMP was a niche and was dying? > Hyperthreading was never remarkable for performance, and it is probably not pure SMP anyway (emulated SMP, maybe?). > So, what're the developments regarding SMP multithreading OCaml? > There are various developments regarding this subject (most recent first): - Plasma (MapReduce in OCaml) http://plasma.camlcity.org/plasma/index.html - OC4MC (OCaml for MultiCore) http://www.algo-prog.info/ocmc/web/ - ocamlp3l http://camlp3l.inria.fr/eng.htm - jocaml http://jocaml.inria.fr/ - ocamlmpi http://forge.ocamlcore.org/projects/ocamlmpi/ All these projects try to tackle the challenge of SMP from different points of view. Maybe you'll find your answer in one of them. Regards, Sylvain Le Gall ^ permalink raw reply [flat|nested] 28+ messages in thread
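What most of the projects above have in common is that they sidestep the runtime lock with processes rather than threads, talking over pipes or sockets. The core of that style needs nothing beyond the stdlib; a minimal sketch of a forked worker answering marshalled requests (error handling and worker shutdown omitted; work is a stand-in for a real computation, and Marshal is untyped, so the annotations are a promise, not a check):

    let work x = x * x  (* placeholder for an expensive computation *)

    (* fork a child that serves requests read from a pipe *)
    let spawn_worker () =
      let req_r, req_w = Unix.pipe () in
      let ans_r, ans_w = Unix.pipe () in
      match Unix.fork () with
      | 0 ->
          (* child: loop until the parent closes the request pipe *)
          Unix.close req_w; Unix.close ans_r;
          let ic = Unix.in_channel_of_descr req_r in
          let oc = Unix.out_channel_of_descr ans_w in
          (try
             while true do
               let x : int = Marshal.from_channel ic in
               Marshal.to_channel oc (work x) [];
               flush oc
             done
           with End_of_file -> ());
          exit 0
      | _pid ->
          (* parent: keep the ends used to talk to the child *)
          Unix.close req_r; Unix.close ans_w;
          (Unix.out_channel_of_descr req_w, Unix.in_channel_of_descr ans_r)

    let rpc (oc, ic) x =
      Marshal.to_channel oc x []; flush oc;
      (Marshal.from_channel ic : int)

Spawn one worker per core and keep them all busy; each child is a full OCaml runtime with its own heap, so the GC never needs to be shared.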
* Re: [Caml-list] Re: SMP multithreading 2010-11-16 12:47 ` Sylvain Le Gall @ 2010-11-17 11:12 ` Goswin von Brederlow 2010-11-17 11:34 ` Sylvain Le Gall 0 siblings, 1 reply; 28+ messages in thread From: Goswin von Brederlow @ 2010-11-17 11:12 UTC (permalink / raw) To: Sylvain Le Gall; +Cc: caml-list Sylvain Le Gall <sylvain@le-gall.net> writes: > Hi, > > On 15-11-2010, Wolfgang Draxinger <wdraxinger.maillist@draxit.de> wrote: >> Hi, >> >> I've just read >> http://caml.inria.fr/pub/ml-archives/caml-list/2002/11/64c14acb90cb14bedb2cacb73338fb15.en.html >> in particular this paragraph: >>| What about hyperthreading? Well, I believe it's the last convulsive >>| movement of SMP's corpse :-) We'll see how it goes market-wise. At >>| any rate, the speedups announced for hyperthreading in the Pentium 4 >>| are below a factor of 1.5; probably not enough to offset the overhead >>| of making the OCaml runtime system thread-safe. >> >> This reads just like the "640k ought be enough for everyone". Multicore >> systems are the standard today. Even the cheapest consumer machines >> come with at least two cores. One can easily get 6-core machines today. >> >> Still thinking SMP was a niche and was dying? >> > > Hyperthreading was never remarkable for performance, and it is probably > not pure SMP anyway (emulated SMP, maybe?). Hyperthreading is a hack to better utilize idle CPU sub-units. The CPU has multiple complete sets of registers, one per hyperthread. Execution of the threads is interleaved. Now when one thread is doing some floating point operation the CPU switches over to another thread and lets it do some integer arithmetic. But that assumes the threads are using different sub-units. If they are using the same unit then they just block each other and no speedup occurs. The speedup of hyperthreading is purely from avoiding dead cycles when one thread waits for something. On the other hand the cache is shared between threads, so per thread it is smaller and more easily thrashed. Hyperthreading can be much slower too. MfG Goswin ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: SMP multithreading 2010-11-17 11:12 ` [Caml-list] " Goswin von Brederlow @ 2010-11-17 11:34 ` Sylvain Le Gall 2010-11-17 23:08 ` [Caml-list] " Christophe Raffalli 0 siblings, 1 reply; 28+ messages in thread From: Sylvain Le Gall @ 2010-11-17 11:34 UTC (permalink / raw) To: caml-list On 17-11-2010, Goswin von Brederlow <goswin-v-b@web.de> wrote: > Sylvain Le Gall <sylvain@le-gall.net> writes: > >> Hi, >> >> On 15-11-2010, Wolfgang Draxinger <wdraxinger.maillist@draxit.de> wrote: >>> Hi, >>> >>> I've just read >>> http://caml.inria.fr/pub/ml-archives/caml-list/2002/11/64c14acb90cb14bedb2cacb73338fb15.en.html >>> in particular this paragraph: >>>| What about hyperthreading? Well, I believe it's the last convulsive >>>| movement of SMP's corpse :-) We'll see how it goes market-wise. At >>>| any rate, the speedups announced for hyperthreading in the Pentium 4 >>>| are below a factor of 1.5; probably not enough to offset the overhead >>>| of making the OCaml runtime system thread-safe. >>> >>> This reads just like the "640k ought be enough for everyone". Multicore >>> systems are the standard today. Even the cheapest consumer machines >>> come with at least two cores. One can easily get 6-core machines today. >>> >>> Still thinking SMP was a niche and was dying? >>> >> >> Hyperthreading was never remarkable for performance, and it is probably >> not pure SMP anyway (emulated SMP, maybe?). > > Hyperthreading is a hack to better utilize idle CPU sub-units. The CPU > has multiple complete sets of registers, one per hyperthread. Execution > of the threads is interleaved. Now when one thread is doing some > floating point operation the CPU switches over to another thread and > lets it do some integer arithmetic. But that assumes the threads are > using different sub-units. If they are using the same unit then they > just block each other and no speedup occurs. > > The speedup of hyperthreading is purely from avoiding dead cycles when > one thread waits for something. On the other hand the cache is shared > between threads, so per thread it is smaller and more easily > thrashed. Hyperthreading can be much slower too. > Indeed, the HT extension was designed to reduce pipeline bubbles, which occur most of the time when you need to load data from slow memory (slow = RAM as opposed to the L1/L2 cache). Back in the days of my P4, OCaml performed quite well on that processor. One story about it: while compiling cameleon on it, I often hit "thermal warnings" (the CPU was overheating). I think it could have been related to the fact that the CPU idle level was very low (i.e. no pipeline bubbles). I always thought that this was because the minor heap can stay inside the cache, which improves the hit/miss ratio (i.e. avoids fetching data from RAM). I have never really tested this hypothesis. Maybe you can tell me your opinion about this? Regards, Sylvain Le Gall ^ permalink raw reply [flat|nested] 28+ messages in thread
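The hypothesis is at least cheap to probe, since the minor heap size is an ordinary runtime parameter: shrink or grow it around the L1/L2 sizes and time an allocation-heavy loop. A sketch (sizes are in words, so 32_768 words is 128kB on a 32-bit machine; the benchmark body is a toy stand-in):

    (* time [f] with the minor heap resized to [words] words *)
    let time_with_minor_heap words f =
      Gc.set { (Gc.get ()) with Gc.minor_heap_size = words };
      Gc.compact ();                       (* start from a clean heap *)
      let t0 = Unix.gettimeofday () in
      f ();
      Unix.gettimeofday () -. t0

    let bench () =                         (* allocation-heavy toy load *)
      let r = ref [] in
      for i = 1 to 10_000_000 do
        r := i :: !r;
        if i mod 1_000 = 0 then r := []    (* keep everything short-lived *)
      done

    let () =
      List.iter
        (fun w ->
           Printf.printf "%8d words: %.3fs\n" w (time_with_minor_heap w bench))
        [ 8_192; 32_768; 262_144; 1_048_576 ]

If the cache story is right, the timings should degrade once the minor heap clearly outgrows the cache.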
* Re: [Caml-list] SMP multithreading 2010-11-17 11:34 ` Sylvain Le Gall @ 2010-11-17 23:08 ` Christophe Raffalli 2010-11-19 9:01 ` Christophe TROESTLER 0 siblings, 1 reply; 28+ messages in thread From: Christophe Raffalli @ 2010-11-17 23:08 UTC (permalink / raw) To: caml-list [-- Attachment #1: Type: text/plain, Size: 286 bytes --] Hello, And OCaml on GPU? We just tested a recent GPU card with 480 processors at 900 MHz ... this is quite impressive ... and supported by Matlab via cuda-lapack (http://www.culatools.com/) ... I imagine we could at least use cuda-lapack from OCaml? Cheers, Christophe [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 259 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Caml-list] SMP multithreading 2010-11-17 23:08 ` [Caml-list] " Christophe Raffalli @ 2010-11-19 9:01 ` Christophe TROESTLER 2010-11-19 15:58 ` Goswin von Brederlow 0 siblings, 1 reply; 28+ messages in thread From: Christophe TROESTLER @ 2010-11-19 9:01 UTC (permalink / raw) To: Christophe Raffalli; +Cc: caml-list On Thu, 18 Nov 2010 00:08:19 +0100, Christophe Raffalli wrote: > > And OCaml on GPU? We just tested a recent GPU card with 480 > processors at 900 MHz ... this is quite impressive ... and supported by > Matlab via cuda-lapack (http://www.culatools.com/) ... I imagine we > could at least use cuda-lapack from OCaml? This is certainly possible since they say that the standard LAPACK functions are available. If you try, let us know! Best, C. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Caml-list] SMP multithreading 2010-11-19 9:01 ` Christophe TROESTLER @ 2010-11-19 15:58 ` Goswin von Brederlow 2010-11-20 11:55 ` Jon Harrop 0 siblings, 1 reply; 28+ messages in thread From: Goswin von Brederlow @ 2010-11-19 15:58 UTC (permalink / raw) To: Christophe TROESTLER; +Cc: Christophe Raffalli, caml-list Christophe TROESTLER <Christophe.Troestler+ocaml@umh.ac.be> writes: > On Thu, 18 Nov 2010 00:08:19 +0100, Christophe Raffalli wrote: >> >> And OCaml on GPU? We just tested a recent GPU card with 480 >> processors at 900 MHz ... this is quite impressive ... and supported by >> Matlab via cuda-lapack (http://www.culatools.com/) ... I imagine we >> could at least use cuda-lapack from OCaml? > > This is certainly possible since they say that the standard LAPACK > functions are available. If you try, let us know! > > Best, > C. And the functions should call enter/leave_blocking_section() in the C stubs so you can have 480 OCaml threads. All of them can run LAPACK code, while only one at a time can run OCaml code. If the LAPACK functions take long enough, almost all threads will be running. This is actually a quick way to use multiple cores with OCaml. Find an often-called function that takes considerable time and offload it to C with enter/leave_blocking_section() around it. It isn't always possible, and you need to use Bigarray for the data or copy the arguments. MfG Goswin ^ permalink raw reply [flat|nested] 28+ messages in thread
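The OCaml side of this recipe is just an external declaration plus ordinary threads. All names here are hypothetical: dgemm_blocking stands for whatever long-running routine the C stub wraps, and the stub itself (which must release the runtime lock, sketched after Goswin's follow-up below) is assumed to exist under the name "caml_dgemm_blocking":

    type vec =
      (float, Bigarray.float64_elt, Bigarray.c_layout) Bigarray.Array1.t

    (* hypothetical binding; the C stub releases the runtime lock *)
    external dgemm_blocking : vec -> vec -> vec -> int -> unit
      = "caml_dgemm_blocking"

    let worker (a, b, c, n) = dgemm_blocking a b c n

    (* each thread spends almost all its time inside the stub, where the
       runtime lock is released, so the C work really runs in parallel *)
    let run_parallel jobs =
      let ts = List.map (Thread.create worker) jobs in
      List.iter Thread.join ts

Only the OCaml glue between calls is serialized by the runtime lock.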
* RE: [Caml-list] SMP multithreading 2010-11-19 15:58 ` Goswin von Brederlow @ 2010-11-20 11:55 ` Jon Harrop 2010-11-20 20:57 ` Goswin von Brederlow 0 siblings, 1 reply; 28+ messages in thread From: Jon Harrop @ 2010-11-20 11:55 UTC (permalink / raw) To: 'Goswin von Brederlow'; +Cc: caml-list > This is actually a quick way to use multiple cores with OCaml. Find an > often-called function that takes considerable time and offload it to C Or HLVM, F#, Scala, Clojure or any of the other languages that permit shared memory parallelism. C is particularly poor in this regard, so I would not restrict myself just to C... Cheers, Jon. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Caml-list] SMP multithreading 2010-11-20 11:55 ` Jon Harrop @ 2010-11-20 20:57 ` Goswin von Brederlow 0 siblings, 0 replies; 28+ messages in thread From: Goswin von Brederlow @ 2010-11-20 20:57 UTC (permalink / raw) To: Jon Harrop; +Cc: 'Goswin von Brederlow', caml-list Jon Harrop <jonathandeanharrop@googlemail.com> writes: >> This is actually a quick way to use multiple cores with OCaml. Find an >> often-called function that takes considerable time and offload it to C > > Or HLVM, F#, Scala, Clojure or any of the other languages that permit shared > memory parallelism. C is particularly poor in this regard, so I would not > restrict myself just to C... > > Cheers, > Jon. I'm not talking about any shared memory parallelism here. The parallelism is completely restricted to the OCaml side. You just find some single-threaded job that takes a long time, rewrite it as an external function and release the OCaml runtime lock while it is running. For example in my code I compute the sha256 sum of a block of data. Since I use a C library for sha256 anyway, the function is already external. All I had to do was switch the interface from using string to Bigarray and add enter/leave_blocking_section(). After that multiple threads can compute the sha256 sum for blocks of data in parallel, and my code ran 3.7 times faster with 4 cores. MfG Goswin ^ permalink raw reply [flat|nested] 28+ messages in thread
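On the C side, the stub Goswin describes looks roughly like this (a sketch, not his actual code; sha256 stands for whatever the C library really provides; the buffer must be a Bigarray because its data lives outside the OCaml heap, so it neither moves nor needs the runtime lock while the hash runs):

    #include <string.h>
    #include <caml/mlvalues.h>
    #include <caml/memory.h>
    #include <caml/alloc.h>
    #include <caml/bigarray.h>
    #include <caml/signals.h>

    CAMLprim value caml_sha256_ba(value vbuf)
    {
        CAMLparam1(vbuf);
        CAMLlocal1(res);
        unsigned char digest[32];
        void *data = Caml_ba_data_val(vbuf);
        long len = Caml_ba_array_val(vbuf)->dim[0];

        caml_enter_blocking_section();  /* release the runtime lock     */
        sha256(data, len, digest);      /* other OCaml threads may run  */
        caml_leave_blocking_section();  /* re-acquire before heap access */

        res = caml_alloc_string(32);
        memcpy(String_val(res), digest, 32);
        CAMLreturn(res);
    }

The matching OCaml declaration would be external sha256 : buf -> string = "caml_sha256_ba". The one rule is that no OCaml value may be touched between enter and leave, since another thread may trigger the GC (and move heap blocks) in the meantime.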
[parent not found: <AANLkTinyN2hHxm6ha2Yq4nx6NxY3So=BhFN_-EHKYfyc@mail.gmail.com>]
* Re: [Caml-list] SMP multithreading [not found] ` <AANLkTinyN2hHxm6ha2Yq4nx6NxY3So=BhFN_-EHKYfyc@mail.gmail.com> @ 2010-11-16 14:11 ` Wolfgang Draxinger 0 siblings, 0 replies; 28+ messages in thread From: Wolfgang Draxinger @ 2010-11-16 14:11 UTC (permalink / raw) Cc: caml-list On Mon, 15 Nov 2010 22:05:52 +0100, Philippe Wang <mail@philippewang.info> wrote: > Take the current Apple Mac Pro for instance (I take this reference > because it's easy to find and it doesn't evolve very often), with > 12-core configuration. > - Two 2.93GHz 6-Core Intel Xeon “Westmere” (12 cores) > - 1333MHz DDR3 ECC SDRAM (whatever the capacity) > => with HT, there are 24 logical units, which all share a tiny > bandwidth for CPU<->RAM communications. > Let's say bandwidth is about 2400MHz : 2400MHz/24Thread = > 100MHz/Thread. It's kind of "ridiculous..." You're assuming that there'd be a lot of communication between cores and RAM, which is not (or should not be) the case in well-written multithreaded programs. > OCaml is not (at least not yet) a language for HPC (high performance > computing), it is very efficient (compared to so many other languages) > and yet does not take advantage of SMP. Well, sooner or later it > will actually probably need to support SMP. (But somehow it already > does, via C code boxed in "blocking sections"). Which is a pity, since functional languages especially could parallelize tasks implicitly much better. > Well, if you take casual OCaml programs, and put them on SMP > architectures (on which indeed they often already are) while giving > them capacity to take advantage of SMP (via POSIX-C threads in > blocking sections, message-passing style, or OCaml-for-multicore, or > whatever else), they quickly become less efficient because there is a > bottleneck on the CPU<->RAM bus. Suppose you were to implement a convolution in n dimensions on a large data set. This is a prime example of where multithreading can help and where main-memory bandwidth is not the limiting factor. One can split up the whole task into small tasklets, dispatching them to individual cores (a small sketch of such tiling follows after this message). As long as the working set, i.e. the input, the convolution kernel (and the output buffer if not in-place) plus the code, fits into the L1 cache, everything will be executed on-cache. On current Intel CPUs this is 32kB, on AMD even 64kB -- per core! And all the cores on the same die share the L2 cache, which has far more bandwidth, about an order of magnitude, than the path to system memory. Modern OS schedulers thus try to keep threads of the same process together on the same die, and further group them by NUMA node. > I want to believe you're right to ask for SMP support, even if now I'm > pretty convinced that current state of OCaml is not compatible with "I > want to write HPC programs in pure OCaml". (One should implement a > brand new compiler maybe??) This is not just about HPC but about resource utilization. A single core running at full speed consumes far more power than 4 cores clocked down to minimal frequency. Even worse, only the most recent CPU generations can clock cores individually. So a single core running at full speed will significantly increase power consumption (and thermal output). > There are people studying how to have HPC with OCaml, but it has quite > little to do with SMP matters. Instead, it's more about (static or > dynamic) specialized-code generation for GPUs etc. We'll see in some > time what it produces...
For the time being I'm more interested in what's actually preventing proper SMP in OCaml right now. I've read something about issues with the garbage collector, which surprises me, as I switched over to using Boehm-GC in my C programs to resolve memory deallocation problems in multithreaded programs -- this of course was possible only after Boehm-GC became thread-safe. Wolfgang ^ permalink raw reply [flat|nested] 28+ messages in thread
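The tiling Wolfgang describes is easy to write down. The sketch below is sequential (today's runtime would serialize the tasklets anyway), but it shows that each tasklet touches only one tile of the input plus the kernel, which is what keeps the working set inside L1; with 64-bit floats, a tile_size around 2048 keeps a tile plus a small kernel well under 32kB:

    (* 1-D convolution of [src] with kernel [k] into [dst] (same length
       as [src]), cut into independent tiles; each call to [tile] is one
       tasklet that a core could run on its own *)
    let convolve_tiled src k dst tile_size =
      let n = Array.length src and m = Array.length k in
      let tile lo hi =                  (* computes dst.(lo) .. dst.(hi-1) *)
        for i = lo to hi - 1 do
          let s = ref 0.0 in
          for j = 0 to m - 1 do
            if i - j >= 0 then s := !s +. src.(i - j) *. k.(j)
          done;
          dst.(i) <- !s
        done
      in
      let t = ref 0 in
      while !t < n do
        tile !t (min n (!t + tile_size));   (* dispatch point for a core *)
        t := !t + tile_size
      done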
end of thread, other threads:[~2010-11-20 20:57 UTC | newest] Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2010-11-15 17:27 SMP multithreading Wolfgang Draxinger 2010-11-16 6:46 ` [Caml-list] " Edgar Friendly 2010-11-16 17:04 ` Gerd Stolpmann 2010-11-16 20:35 ` Eray Ozkural 2010-11-16 22:13 ` Gerd Stolpmann 2010-11-16 23:04 ` Eray Ozkural 2010-11-16 23:52 ` Wolfgang Draxinger 2010-11-17 1:55 ` Eray Ozkural 2010-11-17 3:41 ` Jon Harrop 2010-11-17 3:47 ` Jon Harrop 2010-11-17 4:27 ` Eray Ozkural 2010-11-17 6:50 ` Gabriel Kerneis 2010-11-17 13:41 ` Eray Ozkural 2010-11-17 21:15 ` Jon Harrop 2010-11-18 0:28 ` Eray Ozkural 2010-11-18 1:00 ` Eray Ozkural 2010-11-16 19:07 ` Norman Hardy 2010-11-17 16:34 ` David Allsopp 2010-11-19 13:57 ` Eray Ozkural 2010-11-16 12:47 ` Sylvain Le Gall 2010-11-17 11:12 ` [Caml-list] " Goswin von Brederlow 2010-11-17 11:34 ` Sylvain Le Gall 2010-11-17 23:08 ` [Caml-list] " Christophe Raffalli 2010-11-19 9:01 ` Christophe TROESTLER 2010-11-19 15:58 ` Goswin von Brederlow 2010-11-20 11:55 ` Jon Harrop 2010-11-20 20:57 ` Goswin von Brederlow [not found] ` <AANLkTinyN2hHxm6ha2Yq4nx6NxY3So=BhFN_-EHKYfyc@mail.gmail.com> 2010-11-16 14:11 ` Wolfgang Draxinger
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox