* "ok with parallel threads" GC (aka ocaml for multicore) @ 2009-04-10 17:04 Philippe Wang 2009-04-10 20:52 ` [Caml-list] " Jon Harrop ` (2 more replies) 0 siblings, 3 replies; 11+ messages in thread From: Philippe Wang @ 2009-04-10 17:04 UTC (permalink / raw) To: caml-list; +Cc: Philippe Wang Hello list, Mathias and Adrien have just started their internship (for their Master's degree requirements). Thus they have some time to spend on this project. Moreover, Mathias' internship is strongly related to this project. => man power dramatically increased We are currently searching for the last remaining bugs. Our thread library is restricted, it contains: Thread : create, join, yield, id, self, delay Mutex : full module Condition : full module Our alternative garbage collector - uses a Stop(the world)&Copy algorithm - has memory pages for threads (each thread takes a page at its creation) - has a shared heap for shared values and for old generation from pages (i.e. memory pages are flushed to this heap) - should be not to hard to replace. Blocking sections such as I/O operations or mutex locks do not prevent garbage collection. We currently do *not* support POSIX signals (let's say their behaviour is not specified). We should make a release soon, but before: - some code has to be cleaned - some benchmarks have to be done - some documentation has to be completed - an installation script still has to be written. Thus not a lot is left to do before the release :-) We are writing test programs to search for the last remaining bugs but also to measure performances. So far, as long as there are not too many concurrent memory accesses, it is not too hard to go n times faster with a n-core CPU; though intense memory accesses generate page faults and divide memory bandwidth by the number of concurrent accesses, and intense memory consuming programs show our GC is not as performant as INRIA's, of course. Cheers, -- Philippe Wang Philippe.Wang \at/ lip6.fr PS: Sorry for taking so much time, debugging parallel threads in shared memory style is hell (you can give it a try). ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Caml-list] "ok with parallel threads" GC (aka ocaml for multicore) 2009-04-10 17:04 "ok with parallel threads" GC (aka ocaml for multicore) Philippe Wang @ 2009-04-10 20:52 ` Jon Harrop 2009-04-10 23:20 ` Jerome Benoit [not found] ` <BB3C091F-3BE9-4883-A7CF-9D672CDDF829@x9c.fr> 2 siblings, 0 replies; 11+ messages in thread From: Jon Harrop @ 2009-04-10 20:52 UTC (permalink / raw) To: caml-list On Friday 10 April 2009 18:04:12 Philippe Wang wrote: > We should make a release soon, but before: > - some code has to be cleaned > - some benchmarks have to be done > - some documentation has to be completed > - an installation script still has to be written. > Thus not a lot is left to do before the release :-) This is the best news I've heard since OCaml 3.11 made it into testing and Batteries went into beta! -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Caml-list] "ok with parallel threads" GC (aka ocaml for multicore) 2009-04-10 17:04 "ok with parallel threads" GC (aka ocaml for multicore) Philippe Wang 2009-04-10 20:52 ` [Caml-list] " Jon Harrop @ 2009-04-10 23:20 ` Jerome Benoit [not found] ` <BB3C091F-3BE9-4883-A7CF-9D672CDDF829@x9c.fr> 2 siblings, 0 replies; 11+ messages in thread From: Jerome Benoit @ 2009-04-10 23:20 UTC (permalink / raw) To: Philippe Wang; +Cc: caml-list [-- Attachment #1: Type: text/plain, Size: 566 bytes --] Le Fri, 10 Apr 2009 19:04:12 +0200, Philippe Wang <Philippe.Wang@lip6.fr> a écrit : Hello, > We should make a release soon, but before: > - some code has to be cleaned > - some benchmarks have to be done > - some documentation has to be completed > - an installation script still has to be written. Is there a public SCM repository I can pull off the source ? Thanks. -- Jérôme Benoit aka fraggle La Météo du Net - http://grenouille.com OpenPGP Key ID : 9FE9161D Key fingerprint : 9CA4 0249 AF57 A35B 34B3 AC15 FAA0 CB50 9FE9 161D [-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
[parent not found: <BB3C091F-3BE9-4883-A7CF-9D672CDDF829@x9c.fr>]
* Re: [Caml-list] "ok with parallel threads" GC (aka ocaml for multicore) [not found] ` <BB3C091F-3BE9-4883-A7CF-9D672CDDF829@x9c.fr> @ 2009-04-14 10:21 ` Philippe Wang 2009-04-14 14:18 ` xclerc 2009-04-16 9:45 ` Philippe Wang 0 siblings, 2 replies; 11+ messages in thread From: Philippe Wang @ 2009-04-14 10:21 UTC (permalink / raw) To: forum; +Cc: Philippe Wang, caml-list On Apr 10, 2009, at 20:36 CEDT, forum@x9c.fr wrote: > Would it be correct to assume that the current state of the project > implies that you have patched the OCaml runtime to make it > fully reentrant ? Is this following partial answer relevant enough ? The garbage collector is clearly separated from the rest of the runtime library: the GC is contained in a libgc.a and our patched ocamlopt links programs to this GC. The GC variables are known by the GC only. > If so, is this code/patch available for download ? Officially, not yet (and not before April 20th). We did not expect the debugging part to be so complex and hard, and taking so long. The man power dramatically decreased in late September : the 2 Master's students went back to Master's courses, and the 3 supervisors had to do research in parallel with teaching. Some major bug fixes were made in February/March, a lot of major bug fixes were made in April (yes, these last 2 weeks). You know, bugs hiding other bugs... however we are hopefully getting close to the fix point: today there is no known bug ! :-) Unsupported features are - of course - not considered as bugs. For instance, posix signals are (currently) not supported. And, as parallel computing *potentially* requires quite a lot more memory, some programs can easily end up in a blocked state when the heap becomes full: our GC (currently) uses (parameterized) fixed size pages and heap. The next days, we will concentrate on making benchmarks (if you have some relevant testing programs, they are welcome), and if we don't discover new bugs then we will focus on (finishing) writing a documentation and a building script, for the release. If we release as such now, we will have too much support to do because of the lack of documentation. So it's not quite a good idea... When we have the minimal-but-sufficient documentation, we will make the release :-) In parallel, we try to make it work with OS X Leopard 64 bit and/or ocaml 3.11 (currently we only support 3.10.2 - Linux x86_64). > Anyway, wholehearted respect for undertaking such a complex project. > Good luck in your bug-chasing tasks ! Thanks. -- Philippe Wang Philippe.Wang \at/ lip6.fr N.B. I hope we will not discover new bugs in our amd64.S, our assembly guru is enjoying (abroad with no www) his "vacances de pâques"... ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Caml-list] "ok with parallel threads" GC (aka ocaml for multicore) 2009-04-14 10:21 ` Philippe Wang @ 2009-04-14 14:18 ` xclerc 2009-04-16 9:45 ` Philippe Wang 1 sibling, 0 replies; 11+ messages in thread From: xclerc @ 2009-04-14 14:18 UTC (permalink / raw) To: caml-list Le 14 avr. 09 à 12:21, Philippe Wang a écrit : > On Apr 10, 2009, at 20:36 CEDT, forum@x9c.fr wrote: > >> Would it be correct to assume that the current state of the project >> implies that you have patched the OCaml runtime to make it >> fully reentrant ? > > Is this following partial answer relevant enough ? > The garbage collector is clearly separated from the rest of the > runtime library: the GC is contained in a libgc.a and our patched > ocamlopt links programs to this GC. The GC variables are known by > the GC only. Well, this is not what I had in mind, but I realize that my question was not clear. A better question would have been: Is your implementation still based on a global runtime lock ? A negative answer would imply that you patched the OCaml runtime to make it reentrant. To illustrate my point, I will take the example of the file "byterun/compare.c". In this file, the code for the comparison of values makes use of a global variable (named "caml_compare_unordered"). So, allowing multiple threads to execute primitives from the runtime at the same time may result in incorrect outcomes. Unless you patch the runtime to make it reentrant, hence my original (foolishly indirect) question. If you patched the runtime to allow multiple threads to use it concurrently, I would also be interested by the strategy used: is the problem solved by additional parameters ? by the use of thread-local storage ? by any other mean ? Xavier Clerc ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Caml-list] "ok with parallel threads" GC (aka ocaml for multicore) 2009-04-14 10:21 ` Philippe Wang 2009-04-14 14:18 ` xclerc @ 2009-04-16 9:45 ` Philippe Wang 2009-04-17 22:15 ` Philippe Wang 1 sibling, 1 reply; 11+ messages in thread From: Philippe Wang @ 2009-04-16 9:45 UTC (permalink / raw) To: caml-list > Le 14 avr. 09 à 12:21, Philippe Wang a écrit : > >> The garbage collector is clearly separated from the rest of the >> runtime library: the GC is contained in a libgc.a and our patched >> ocamlopt links programs to this GC. The GC variables are known by >> the GC only. > > Well, this is not what I had in mind, but I realize that my question > was not clear. A better question would have been: > Is your implementation still based on a global runtime lock ? No, it isn't. (And it would probably too often prevent parallel threads, wouldn't it.) > A negative answer would imply that you patched the OCaml > runtime to make it reentrant. To illustrate my point, I will take > the example of the file "byterun/compare.c". In this file, the > code for the comparison of values makes use of a global > variable (named "caml_compare_unordered"). It should be, unless bugs are still hiding under that. It is supposed to have been done for a while. We'll make one more check. > If you patched the runtime to allow multiple threads to use > it concurrently, I would also be interested by the strategy > used: is the problem solved by additional parameters ? > by the use of thread-local storage ? by any other mean ? - local variable instead of global variable in functions - some functions are parameterized by thread identifier (that is, one more parameter than "before") (e.g. in amd64.S) Philippe Wang ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Caml-list] "ok with parallel threads" GC (aka ocaml for multicore) 2009-04-16 9:45 ` Philippe Wang @ 2009-04-17 22:15 ` Philippe Wang 2009-04-17 22:20 ` Joel Reymont ` (2 more replies) 0 siblings, 3 replies; 11+ messages in thread From: Philippe Wang @ 2009-04-17 22:15 UTC (permalink / raw) To: caml-list; +Cc: Philippe Wang On Apr 16, 2009, at 11:45 CEDT, Philippe Wang wrote: >> A negative answer would imply that you patched the OCaml >> runtime to make it reentrant. To illustrate my point, I will take >> the example of the file "byterun/compare.c". In this file, the >> code for the comparison of values makes use of a global >> variable (named "caml_compare_unordered"). > > It should be, unless bugs are still hiding under that. > It is supposed to have been done for a while. We'll make one more > check. > >> If you patched the runtime to allow multiple threads to use >> it concurrently, I would also be interested by the strategy >> used: is the problem solved by additional parameters ? >> by the use of thread-local storage ? by any other mean ? > > - local variable instead of global variable in functions > - some functions are parameterized by thread identifier (that is, > one more parameter than "before") (e.g. in amd64.S) Well, we went back into runtime code implementation. This is what can be said rapidly : - compare.c contains no global variables anymore, we use local variables instead - if a Caml-C or C-Caml interface uses caml_compare_unordered, we don't know what can happen with parallel threads - we have many global mutex locks with small scopes - we do use an "enter/leave blocking section" mechanism to prevent the GC from waiting on a blocking operation such as I/O or mutex locking etc. - we don't support weak values (not sure whether they don't work or they became strong, if they dont work anymore, they can be back in 2 minutes as strong values anyway) - serialisation of values is a little bit tricky, though it should work - most important : many global variables do not exist anymore because they are irrelevant in our implementation - we do not support unofficial-features of ocaml 3.10, e.g. the new features that come with 3.11 but actually have their roots in previous versions ~ it is almost sad to see all the based-on-"one-thread-at-a-time" optimisations removed... + (it looks like it works just fine) I hope there are no "hidden" bad global variables. Is it fully reentrant ? Hmmmm... maybe. We use a stop-the-world GC (which means no one is running is parallel with the GC), that is actually like original ocaml, that comes with its inconveniences : C calls not declared as blocking sections (which has quite a cost) may prevent other threads from running when the heap is full. Graphics module, for instance, is not reentrant at all (anyhow it's not part of the runtime). Same for Str. Funny thing is we can open several windows by launching parallel threads (though only one is useful at the end). Anyway, thank you for your questions and interest, they have helped us find&fix some bugs. -- Philippe Wang http://www-apr.lip6.fr/~pwang/ PS. We tried to switch to 3.11, but it seems to need too much time, it's far from being a piece of cake. We have tried to make it work on Leopard (actually, I failed the 1st time - half the way, I may try again if I have time). ===================================== A free very personal advice that may save you some headaches: do not program in concurrent shared memory style, especially when you can replace "concurrent" by "parallel". Even if it may have a future, even if it may sound great, even if it sounds exciting, even if it helps you go faster, even if <put-here- whatever-you-want>, it is **awful**. Well, if you really really don't have a choice, never mind what I said. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Caml-list] "ok with parallel threads" GC (aka ocaml for multicore) 2009-04-17 22:15 ` Philippe Wang @ 2009-04-17 22:20 ` Joel Reymont 2009-04-17 22:29 ` Philippe Wang 2009-04-17 23:05 ` forum 2009-04-23 16:49 ` Philippe Wang 2 siblings, 1 reply; 11+ messages in thread From: Joel Reymont @ 2009-04-17 22:20 UTC (permalink / raw) To: Philippe Wang; +Cc: caml-list On Apr 17, 2009, at 11:15 PM, Philippe Wang wrote: > PS. We tried to switch to 3.11, but it seems to need too much time, > it's far from being a piece of cake. > We have tried to make it work on Leopard (actually, I failed the 1st > time - half the way, I may try again if I have time). What was the problem with Leopard? --- Mac hacker with a performance bent http://linkedin.com/in/joelreymont ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Caml-list] "ok with parallel threads" GC (aka ocaml for multicore) 2009-04-17 22:20 ` Joel Reymont @ 2009-04-17 22:29 ` Philippe Wang 0 siblings, 0 replies; 11+ messages in thread From: Philippe Wang @ 2009-04-17 22:29 UTC (permalink / raw) To: Joel Reymont; +Cc: Philippe Wang, caml-list On Apr 18, 2009, at 00:20 CEDT, Joel Reymont wrote: > > On Apr 17, 2009, at 11:15 PM, Philippe Wang wrote: > >> PS. We tried to switch to 3.11, but it seems to need too much >> time, it's far from being a piece of cake. >> We have tried to make it work on Leopard (actually, I failed the >> 1st time - half the way, I may try again if I have time). > > > What was the problem with Leopard? We have our own amd64.S which is based on but quite different from ocaml 3.10 amd64.S. ocaml 3.11 has its amd64.S changed for Leopard. *And* we have made some small changes to emit.ml. Then, building the whole thing is not trivial. Building ocaml 3.10 with our modified runtime is not trivial with Linux x86_64 : we use an intern script that does it. Without it, it would need hours to build it. Adapting the script for Leopard is not trivial to do. But we think it's quite possible. Besides, it was late at night and I was sleepy and tired when I tried. Another answer is Leopard needs the changes made to amd64.S from 3.10 to 3.11, which mainly fix an alignment problem. -- Philippe Wang Philippe.Wang@lip6.fr http://www-apr.lip6.fr/~pwang/ ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Caml-list] "ok with parallel threads" GC (aka ocaml for multicore) 2009-04-17 22:15 ` Philippe Wang 2009-04-17 22:20 ` Joel Reymont @ 2009-04-17 23:05 ` forum 2009-04-23 16:49 ` Philippe Wang 2 siblings, 0 replies; 11+ messages in thread From: forum @ 2009-04-17 23:05 UTC (permalink / raw) To: caml users Le 18 avr. 09 à 00:15, Philippe Wang a écrit : > > On Apr 16, 2009, at 11:45 CEDT, Philippe Wang wrote: > >>> A negative answer would imply that you patched the OCaml >>> runtime to make it reentrant. To illustrate my point, I will take >>> the example of the file "byterun/compare.c". In this file, the >>> code for the comparison of values makes use of a global >>> variable (named "caml_compare_unordered"). >> >> It should be, unless bugs are still hiding under that. >> It is supposed to have been done for a while. We'll make one more >> check. >> >>> If you patched the runtime to allow multiple threads to use >>> it concurrently, I would also be interested by the strategy >>> used: is the problem solved by additional parameters ? >>> by the use of thread-local storage ? by any other mean ? >> >> - local variable instead of global variable in functions >> - some functions are parameterized by thread identifier (that is, >> one more parameter than "before") (e.g. in amd64.S) > > Well, we went back into runtime code implementation. > > This is what can be said rapidly : > - compare.c contains no global variables anymore, we use local > variables instead > - if a Caml-C or C-Caml interface uses caml_compare_unordered, we > don't know what can happen with parallel threads > - we have many global mutex locks with small scopes > - we do use an "enter/leave blocking section" mechanism to prevent > the GC from waiting on a blocking operation such as I/O or mutex > locking etc. > - we don't support weak values (not sure whether they don't work or > they became strong, if they dont work anymore, they can be back in 2 > minutes as strong values anyway) > - serialisation of values is a little bit tricky, though it should > work > - most important : many global variables do not exist anymore > because they are irrelevant in our implementation > - we do not support unofficial-features of ocaml 3.10, e.g. the new > features that come with 3.11 but actually have their roots in > previous versions > ~ it is almost sad to see all the based-on-"one-thread-at-a-time" > optimisations removed... > + (it looks like it works just fine) > > > I hope there are no "hidden" bad global variables. > > Is it fully reentrant ? Hmmmm... maybe. > > We use a stop-the-world GC (which means no one is running is > parallel with the GC), that is actually like original ocaml, that > comes with its inconveniences : C calls not declared as blocking > sections (which has quite a cost) may prevent other threads from > running when the heap is full. > > Graphics module, for instance, is not reentrant at all (anyhow it's > not part of the runtime). Same for Str. I was indeed mostly worried with the runtime itself. I wanted to have a fully reentrant runtime for the OCaml-Java project (to be able to execute several programs in the very same JVM) and remember that it implied primitives from "compare.c", "hash.c", "intern.c" and "extern.c" among others to be written differently for this purpose. Out of curiosity: you state that your GC is of "stop-the-world" nature, what about finalizers ? Are they executed by the GC thread when the world is stopped or concurrently with application threads ? Not sure this question really matters, just curious (I mean, it is doubtful that one would write finalizers with a long execution time). Keep up the good work ! Xavier Clerc ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Caml-list] "ok with parallel threads" GC (aka ocaml for multicore) 2009-04-17 22:15 ` Philippe Wang 2009-04-17 22:20 ` Joel Reymont 2009-04-17 23:05 ` forum @ 2009-04-23 16:49 ` Philippe Wang 2 siblings, 0 replies; 11+ messages in thread From: Philippe Wang @ 2009-04-23 16:49 UTC (permalink / raw) To: caml-list; +Cc: Philippe Wang > On Sat, Apr 18, 2009 at 1:05 AM, forum@x9c.fr <forum@x9c.fr> wrote: > I was indeed mostly worried with the runtime itself. I wanted to > have a fully reentrant runtime for the OCaml-Java > project (to be able to execute several programs in the very same > JVM) and remember that it implied primitives > from "compare.c", "hash.c", "intern.c" and "extern.c" among > others to be written differently for this purpose. Indeed. We have made them reentrant but we haven't made much stress testing on that. Reentrance on those are not free (they have a cost), and the way we chose is the "simplest or quickest way". > Out of curiosity: you state that your GC is of "stop-the-world" > nature, what about finalizers ? > Are they executed by the GC thread when the world is stopped or > concurrently with > application threads ? > Not sure this question really matters, just curious (I mean, it > is doubtful that one would write finalizers with > a long execution time). Finalizers are supposed to be called by the thread that does the garbage collection, so there is no concurrency with finalizers as the rest of the world is meant to be stopped when garbage collecting. (Our garbage collector does not try to be as smart as the original one on many many things) By the way, we are late on writing the documentation for our future release... but we have just implemented a (simple) experimental growing heap. Here is a quote from wikipedia (http://en.wikipedia.org/wiki/Speedup): << Sometimes a speedup of more than N when using N processors is observed in parallel computing, which is called super linear speedup. Super linear speedup rarely happens and often confuses beginners, who believe the theoretical maximum speedup should be N when N processors are used. >> Well, we *have* observed that on a matrix multiplication :-) -- Philippe Wang Philippe.Wang@lip6.fr http://www-apr.lip6.fr/~pwang/ ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2009-04-23 16:49 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2009-04-10 17:04 "ok with parallel threads" GC (aka ocaml for multicore) Philippe Wang 2009-04-10 20:52 ` [Caml-list] " Jon Harrop 2009-04-10 23:20 ` Jerome Benoit [not found] ` <BB3C091F-3BE9-4883-A7CF-9D672CDDF829@x9c.fr> 2009-04-14 10:21 ` Philippe Wang 2009-04-14 14:18 ` xclerc 2009-04-16 9:45 ` Philippe Wang 2009-04-17 22:15 ` Philippe Wang 2009-04-17 22:20 ` Joel Reymont 2009-04-17 22:29 ` Philippe Wang 2009-04-17 23:05 ` forum 2009-04-23 16:49 ` Philippe Wang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox