* mixing Ocaml with another GC-ed language @ 2000-02-25 13:37 STARYNKEVITCH Basile 2000-02-29 10:24 ` Xavier Leroy ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: STARYNKEVITCH Basile @ 2000-02-25 13:37 UTC (permalink / raw) To: caml-list [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain; charset=us-ascii, Size: 1672 bytes --] Hello All, In a few weeks I will perhaps code in Ocaml inside a static code analyzer tool written in C++ (don't ask me why.. it is a painful subject [see the thread "convincing management to code in Ocaml" on this same mailing list, that I started last year]). It is not really plain C++ because I did code a (precise, mostly copying, generational) garbage collector which is used in the project. So Ocaml code will be called from (and will probably upcall to) C++ code using my GC. So I do know my GC quite well (and studied Ocaml's GC a bit also). My GC also support finalized objects, which it does not move (so their address remain fixed). Does any people have concrete experience mixing Ocaml with another GC-ed language (e.g. Java or Common Lisp) inside the same program? I do have my precise ideas on the problem (essentially, avoid mixing pointers from both worlds, either by copying data or by using my finalized C++ GCed objects which are not moved by my GC). But I will be happy to hear from other people's experience. Any pitfalls to avoid are interesting to hear. The custom tag object (introduced in Ocaml3, see the Ocaml CVS webserver) might also be helpful. Regards N.B. Any opinions expressed here are only mine, and not of my organization. N.B. Les opinions exprimees ici me sont personnelles et n engagent pas le CEA. --------------------------------------------------------------------- Basile STARYNKEVITCH ---- Commissariat à l Energie Atomique DTA/LETI/DEIN/SLA * CEA/Saclay b.528 (p111f) * 91191 GIF/YVETTE CEDEX * France phone: 1,69.08.60.55; fax: 1.69.08.83.95 home: 1,46.65.45.53 email: Basile point Starynkevitch at cea point fr ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: mixing Ocaml with another GC-ed language 2000-02-25 13:37 mixing Ocaml with another GC-ed language STARYNKEVITCH Basile @ 2000-02-29 10:24 ` Xavier Leroy 2000-03-01 3:48 ` Nice feature Max Skaller 2000-03-01 3:52 ` Interpreter vs hardware threads Max Skaller 2 siblings, 0 replies; 14+ messages in thread From: Xavier Leroy @ 2000-02-29 10:24 UTC (permalink / raw) To: STARYNKEVITCH Basile, caml-list > It is not really plain C++ because I did code a (precise, mostly > copying, generational) garbage collector which is used in the > project. So Ocaml code will be called from (and will probably upcall > to) C++ code using my GC. So I do know my GC quite well (and studied > Ocaml's GC a bit also). My GC also support finalized objects, which it > does not move (so their address remain fixed). > > Does any people have concrete experience mixing Ocaml with another > GC-ed language (e.g. Java or Common Lisp) inside the same program? I can't say I have concrete experience, but I believe the following should work. So, we have two garbage-collected languages A and B, each managing its own heap. Assume both A and B support 1- finalized objects, and 2- explicit registration of GC roots. Then, to make an A object available to B, register the A pointer as a GC root for A (so that A doesn't reclaim it), allocate in B a proxy block containing the A pointer, and put a finalization function on the proxy block that un-registers the A pointer with A's GC when the proxy block is finalized. In this approach, A objects are viewed from B as an abstract type: B can't do anything with them except call A functions to operate on them. Allowing B to operate directly on A objects (e.g. read and write an array) is very language-dependent and generally hard; better go through foreign function calls. > I do have my precise ideas on the problem (essentially, avoid mixing > pointers from both worlds, either by copying data or by using my > finalized C++ GCed objects which are not moved by my GC). Copying is another option (that's what stubs generated by CamlIDL do, for instance). You get the benefit of having a concrete view on the data structure in both languages. But copying can be expensive on large structures, and also loses sharing for mutable objects. > The custom tag object (introduced in Ocaml3, see the Ocaml CVS > webserver) might also be helpful. Right. It's a generalization of OCaml's finalized objects, allowing you to attach to a Caml memory block not only a finalization function, but also an equality function, a hashing function, and serialization / deserialization functions (called by output_value and input_value). - Xavier Leroy ^ permalink raw reply [flat|nested] 14+ messages in thread
* Nice feature 2000-02-25 13:37 mixing Ocaml with another GC-ed language STARYNKEVITCH Basile 2000-02-29 10:24 ` Xavier Leroy @ 2000-03-01 3:48 ` Max Skaller 2000-03-01 3:52 ` Interpreter vs hardware threads Max Skaller 2 siblings, 0 replies; 14+ messages in thread From: Max Skaller @ 2000-03-01 3:48 UTC (permalink / raw) Cc: caml-list Just thought I'd say -- the new error diagnostic produced for missing cases from matches -- listing cases that are not tested for -- is superb. This is saving me a lot of time. Thanks! BTW: I'm about to find out about 'convincing management' to use ocaml. I'm working in a C++ shop (C++ is mandatory for much of the production code for reasons of both efficiency and binary compatibility), with the task of writing a code generator -- which will (probably) generate C++, but will (hopefully) be written in ocaml. [I tend to push people to breaking point though -- I'm also writing the code using my literate programming tool :-] -- John (Max) Skaller at OTT [Open Telecommications Ltd] mailto:maxs@in.ot.com.au -- at work mailto:skaller@maxtal.com.au -- at home ^ permalink raw reply [flat|nested] 14+ messages in thread
* Interpreter vs hardware threads 2000-02-25 13:37 mixing Ocaml with another GC-ed language STARYNKEVITCH Basile 2000-02-29 10:24 ` Xavier Leroy 2000-03-01 3:48 ` Nice feature Max Skaller @ 2000-03-01 3:52 ` Max Skaller 2000-03-01 18:55 ` Michael Hicks 2000-03-01 20:02 ` Stefan Monnier 2 siblings, 2 replies; 14+ messages in thread From: Max Skaller @ 2000-03-01 3:52 UTC (permalink / raw) Cc: caml-list Does anyone know how fast the ocaml bytecode interpreter 'switches' between threads (that is, what is the scheduling overhead?) Assume most of the threads are waiting (say, on a condition variable). Asume LOTS of threads (thousands). [Hardware threads are not fast enough] -- John (Max) Skaller at OTT [Open Telecommications Ltd] mailto:maxs@in.ot.com.au -- at work mailto:skaller@maxtal.com.au -- at home ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Interpreter vs hardware threads 2000-03-01 3:52 ` Interpreter vs hardware threads Max Skaller @ 2000-03-01 18:55 ` Michael Hicks 2000-03-01 20:02 ` Stefan Monnier 1 sibling, 0 replies; 14+ messages in thread From: Michael Hicks @ 2000-03-01 18:55 UTC (permalink / raw) To: Max Skaller; +Cc: caml-list > Does anyone know how fast the ocaml bytecode interpreter > 'switches' between threads (that is, what is the scheduling overhead?) > Assume most of the threads are waiting (say, on a condition variable). > Asume LOTS of threads (thousands). > > [Hardware threads are not fast enough] Remembering back to the thread scheduler in Ocaml 2.00 (I assume it's the same one), I recall that the context switch directly relates to how many threads are waiting. This is because at each CS point, every thread is checked to see if its timeout has expired or has some I/O ready. As a result, performance degrades sharply as you add more threads. With a small number of threads (around 10) on a Pentium II 300 MHz machine, we measured the context switch time to be about 100 microsecs. See our INFOCOM 99 paper for more info on the system we were measuring this in (our active network, PLANet): http://www.cis.upenn.edu/~switchware/papers/infocom99-plan.ps. Mike -- Michael Hicks Ph.D. Candidate, the University of Pennsylvania http://www.cis.upenn.edu/~mwh mailto://mwh@dsl.cis.upenn.edu "The pessimist sees difficulty in every opportunity. The optimist sees the opportunity in every difficulty." -- Winston Churchill ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Interpreter vs hardware threads 2000-03-01 3:52 ` Interpreter vs hardware threads Max Skaller 2000-03-01 18:55 ` Michael Hicks @ 2000-03-01 20:02 ` Stefan Monnier 2000-03-02 18:18 ` William Chesters 1 sibling, 1 reply; 14+ messages in thread From: Stefan Monnier @ 2000-03-01 20:02 UTC (permalink / raw) To: caml-list >>>>> "Max" == Max Skaller <maxs@in.ot.com.au> writes: > Asume LOTS of threads (thousands). (insert-file "many-threads-bad-design") > [Hardware threads are not fast enough] [ I assume you mean OS threads, since hardware-supported threads are not very common. ] I guess it depends on the OS and the pthreads implementation, but it seems that they are usually pretty fast (i.e. fast enough that people more or less stopped trying to make them faster). Stefan ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Interpreter vs hardware threads 2000-03-01 20:02 ` Stefan Monnier @ 2000-03-02 18:18 ` William Chesters 2000-03-03 16:59 ` John Max Skaller 2000-03-06 17:35 ` Xavier Leroy 0 siblings, 2 replies; 14+ messages in thread From: William Chesters @ 2000-03-02 18:18 UTC (permalink / raw) To: caml-list Stefan Monnier writes: > >>>>> "Max" == Max Skaller <maxs@in.ot.com.au> writes: > > Asume LOTS of threads (thousands). > > (insert-file "many-threads-bad-design") > > > [Hardware threads are not fast enough] > > [ I assume you mean OS threads, since hardware-supported threads > are not very common. ] > > I guess it depends on the OS and the pthreads implementation, but > it seems that they are usually pretty fast (i.e. fast enough that > people more or less stopped trying to make them faster). I think you are being patronising from a position of innocent ignorance. Judging by Max's .sig he's doing embedded telecoms boxes, something like a GSM HLR which juggles many thousands of concurrent transactions in some ghastly protocol or other. Typically the logic of each transaction is encoded in an unstructured FSM---the same way as the wretched committees "specify" them---rendered literally in C/C++. That's very fiddly and leads to bugs, hence loss of revenue on a telephone-number scale, and a well-founded reluctance to add features. The convenience of using threads and a more sensible language would pay for some loss of performance, but obviously a massive hit for using lots of threads is unacceptable. IIRC, Linux native pthreads, for one, aren't at the moment meant to be used in huge numbers---the flood-test performance of those Linux Java VMs which map Java threads onto them supposedly suffers compared with those that don't. But Xavier would be able to tell us more about that :). (Of course, one could always ask the ocaml team that won the ICFP competition in such spectacular fashion to knock off an ocaml-to-event-driven-FSM compiler ...) Max---if you end up using ocaml for embedded work I'd be very interested to hear about it. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Interpreter vs hardware threads 2000-03-02 18:18 ` William Chesters @ 2000-03-03 16:59 ` John Max Skaller 2000-03-06 17:35 ` Xavier Leroy 1 sibling, 0 replies; 14+ messages in thread From: John Max Skaller @ 2000-03-03 16:59 UTC (permalink / raw) To: William Chesters; +Cc: caml-list William Chesters wrote: > Max---if you end up using ocaml for embedded work I'd be very > interested to hear about it. I am using it at the moment to build 'code generation' tools -- for that ocaml is definitely suitable. -- John (Max) Skaller, mailto:skaller@maxtal.com.au 10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850 checkout Vyper http://Vyper.sourceforge.net download Interscript http://Interscript.sourceforge.net ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Interpreter vs hardware threads 2000-03-02 18:18 ` William Chesters 2000-03-03 16:59 ` John Max Skaller @ 2000-03-06 17:35 ` Xavier Leroy 2000-03-06 23:11 ` John Max Skaller 2000-03-07 13:19 ` Julian Assange 1 sibling, 2 replies; 14+ messages in thread From: Xavier Leroy @ 2000-03-06 17:35 UTC (permalink / raw) To: William Chesters, caml-list > IIRC, Linux native pthreads, for one, aren't at the moment meant to > be used in huge numbers---the flood-test performance of those Linux > Java VMs which map Java threads onto them supposedly suffers compared > with those that don't. But Xavier would be able to tell us more about > that :). My pleasure :-) It is true that threads in LinuxThreads are not as lightweight as some would like, mostly because they map 1-to-1 to kernel processes, and the Linux kernel limits the number of processes to 512 (for the super-user) or 256 (for normal users). Also, kernel scheduling takes time linear in the number of processes (just like the OCaml bytecode-thread scheduler, which was strongly inspired by that of the Linux kernel), so context switching times increase as more threads are run. More generally, there are two very different viewpoints on threads that call for radically different implementations. The first viewpoint is that threads are a convenient abstraction to exploit fully some hardware features, such as multiple processors and overlapping of I/O and computation. You can exploit multiple processors and overlap I/O without threads using e.g. communicating Unix processes and asynchronous I/O. But it's a pain to program; threads make the programming of such applications much easier. In this viewpoint, you never need huge numbers of threads. For heavy computation on a multiprocessor, N + 1 or N + 2 threads, where N is the number of processors, delivers optimal throughput; you're not going to run any faster with more threads. For overlapped I/O on PC-class hardware, 20 threads or so are plenty: if you overlap more than 20 I/O requests at the same time, the disks and network cards won't follow, and throughput will not be increased either. Both LinuxThreads and to a lesser extent Caml threads were designed with that kind of applications in mind. Caml threads can't exploit multiprocessors because the runtime system isn't safe w.r.t. concurrent execution, but they have proven quite effective for overlapped I/O, e.g. in the V6 intelligent Web proxy. The other viewpoint is that threads are a code structuring facility, making it easier to write programs that conceptually perform a lot of things concurrently, e.g. in response to external stimuli. In this viewpoint, threads should be as lightweight as possible (starting a thread shouldn't be much more expensive than calling a function), and limited in number only by available memory. The sad fact is that there doesn't exist any implementation technique for threads that satisfy both viewpoints at once. Very lightweight threads do exist, see e.g. the call/cc-based threads of SML/NJ, but entail significant performance penalties not only on I/O, but also on the actual running speed of the sequential code. "Heavyweight" threads such as LinuxThreads or Win32 threads are very efficient w.r.t. I/O, but are expensive to start and to context-switch. Two-level thread libraries are a compromise that is appealing on paper, but not that much "lighter" than pure kernel-based threads in reality. Add to this the fact that the primitives provided by the underlying OS affect quite a lot thread libraries. For instance, some Unixes provide async I/O notification via signals, and that could be used to speed up the Caml bytecode-thread scheduler, but not all Unixes provide them. Also, if we were certain that the underlying OS provides native threads, we could build a two-level scheduler for bytecode threads that would probably be more efficient. But of course we have to shoot for the lowest common denominator. So, don't expect miracles from Caml threads, either bytecode or system. As Michael Hicks said, the current bytecode thread scheduler could be improved to run in time proportional to the number of threads waiting on I/O, rather than on the number of threads. That would help if most of your threads are blocked on mutexes, conditions or event channels, but not if most of your threads perform I/O. > (Of course, one could always ask the ocaml team that won the ICFP > competition in such spectacular fashion to knock off an > ocaml-to-event-driven-FSM compiler ...) I'm not sure that OCaml is the best input language for generating FSMs, but, yes, generating FSMs from a high-level concurrent language is an excellent approach. You get both ultimate execution speed and a readable, maintainable description of the program. Have a look for instance at Esterel (another glorious INRIA product :-)) - Xavier Leroy ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Interpreter vs hardware threads 2000-03-06 17:35 ` Xavier Leroy @ 2000-03-06 23:11 ` John Max Skaller 2000-03-07 13:19 ` Julian Assange 1 sibling, 0 replies; 14+ messages in thread From: John Max Skaller @ 2000-03-06 23:11 UTC (permalink / raw) To: Xavier Leroy; +Cc: William Chesters, caml-list Xavier Leroy wrote: > I'm not sure that OCaml is the best input language for generating > FSMs, but, yes, generating FSMs from a high-level concurrent language > is an excellent approach. You get both ultimate execution speed and a > readable, maintainable description of the program. Have a look for > instance at Esterel (another glorious INRIA product :-)) OK. Thanks for the explanation. 'Threads' are typically delayed waiting for 'any' event, and occasionally, a specific event (response to a request). -- John (Max) Skaller, mailto:skaller@maxtal.com.au 10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850 checkout Vyper http://Vyper.sourceforge.net download Interscript http://Interscript.sourceforge.net ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Interpreter vs hardware threads 2000-03-06 17:35 ` Xavier Leroy 2000-03-06 23:11 ` John Max Skaller @ 2000-03-07 13:19 ` Julian Assange 2000-03-08 20:12 ` Xavier Leroy 2000-03-08 23:30 ` Max Skaller 1 sibling, 2 replies; 14+ messages in thread From: Julian Assange @ 2000-03-07 13:19 UTC (permalink / raw) To: Xavier Leroy; +Cc: William Chesters, caml-list, proff Xavier Leroy <Xavier.Leroy@inria.fr> writes: > > IIRC, Linux native pthreads, for one, aren't at the moment meant to > > be used in huge numbers---the flood-test performance of those Linux > > Java VMs which map Java threads onto them supposedly suffers compared > > with those that don't. But Xavier would be able to tell us more about > > that :). > > My pleasure :-) It is true that threads in LinuxThreads are not as > lightweight as some would like, mostly because they map 1-to-1 to > kernel processes, and the Linux kernel limits the number of processes > to 512 (for the super-user) or 256 (for normal users). Also, kernel > scheduling takes time linear in the number of processes (just like the > OCaml bytecode-thread scheduler, which was strongly inspired by that > of the Linux kernel), so context switching times increase as more > threads are run. I recently worked on a project in C that used a FSM to simulate several thousand concurrent IO threads, as part of a concurrent version of dig (able to recursively zone transfer all of *.au in under three minutes). Due to the complicated, but inherently forward moving nature of the operations involved (i.e do this. if there is no error or timeout, do the next step, and so on for about 30 steps) together with inter-state resources (i.e various queues, buffers and so on) this ended up as an extremely large and unintuitive piece of code. A lot of the added complexity was a result of having to explicity deallocate data and maintain reference counts, and manually follow allocation dependencies, which in the circumstances of multiple interdependent event queues and states became extremely tedious. What is the source of the linearity of thread context switches in ocaml? Are there ways to eliminate it? If so I'd be happy to have a look at doing so. Cheers, Julian. -- Stefan Kahrs in [Kah96] discusses the notion of completeness--programs which never go wrong can be type-checked--which complements Milner's notion of soundness--type-checked programs never go wrong [Mil78]. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Interpreter vs hardware threads 2000-03-07 13:19 ` Julian Assange @ 2000-03-08 20:12 ` Xavier Leroy 2000-03-19 6:10 ` Julian Assange 2000-03-08 23:30 ` Max Skaller 1 sibling, 1 reply; 14+ messages in thread From: Xavier Leroy @ 2000-03-08 20:12 UTC (permalink / raw) To: Julian Assange; +Cc: William Chesters, caml-list > What is the source of the linearity of thread context switches in > ocaml? Well, the list of all threads is scanned at each context switch to find runnable threads, determine whether we need to select(), etc. > Are there ways to eliminate it? If so I'd be happy to have a > look at doing so. It's been on my "to do" list for a while, so feel free to give it a try... The standard trick is to split the collection of threads into several queues: a queue of runnable threads and a queue of threads waiting for I/O or timer operations, for instance. Kepp the latter sorted by timer expiration date to find easily the timers that have expired. Also, suspended threads (threads waiting on internal events such as mutexes and conditions) are not in those two queues, but added back to the runnable queue when woken up. This should get the complexity of context switches down to O(N_io), where N_io is the number of threads waiting on I/O. You could do better by using heavyweight threads or signal-based async I/O instead of select(), but it becomes less portable. - Xavier Leroy ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Interpreter vs hardware threads 2000-03-08 20:12 ` Xavier Leroy @ 2000-03-19 6:10 ` Julian Assange 0 siblings, 0 replies; 14+ messages in thread From: Julian Assange @ 2000-03-19 6:10 UTC (permalink / raw) To: Xavier Leroy; +Cc: William Chesters, caml-list, proff Here <http://www.kegel.com/c10k.html> is an extremely interesting article/resource list about io/thread scaling implementation. I'm involved in a linguistic project that needs to make several thousand concurrent connections to mine the internet for corpi. At the moment neither ocaml lwt or heavier pthreads are capable of scaling well enough to do this. If this problem were solved ocaml would make an wonderful language for all kinds of massively concurrent io interaction. Cheers, Julian. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Interpreter vs hardware threads 2000-03-07 13:19 ` Julian Assange 2000-03-08 20:12 ` Xavier Leroy @ 2000-03-08 23:30 ` Max Skaller 1 sibling, 0 replies; 14+ messages in thread From: Max Skaller @ 2000-03-08 23:30 UTC (permalink / raw) To: Julian Assange; +Cc: Xavier Leroy, William Chesters, caml-list Julian Assange wrote: > What is the source of the linearity of thread context switches in > ocaml? Are there ways to eliminate it? If so I'd be happy to have a > look at doing so. One problem with this is that it only affects the bytecode interpreter. It would be nice to have lightweight threads available for compiled ocaml too. The system I need to implement uses a (heavyweight) event dispatcher thread and a few (heavyweight) worker threads. The events are 'executed', in other words, the client code built as an object with callbacks. This is very fast: there's no need to 'check' for a blocked IO operation and restart the 'thread'. The thread is restarted directly by the event. This all works fine, except that it is very hard to code the logic of a 'logical' thread of control. So I need to 'control invert' a client program, which is written in procedural style, to hide the ugly event driven implementation. -- John (Max) Skaller at OTT [Open Telecommications Ltd] mailto:maxs@in.ot.com.au -- at work mailto:skaller@maxtal.com.au -- at home ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2000-03-21 14:29 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2000-02-25 13:37 mixing Ocaml with another GC-ed language STARYNKEVITCH Basile 2000-02-29 10:24 ` Xavier Leroy 2000-03-01 3:48 ` Nice feature Max Skaller 2000-03-01 3:52 ` Interpreter vs hardware threads Max Skaller 2000-03-01 18:55 ` Michael Hicks 2000-03-01 20:02 ` Stefan Monnier 2000-03-02 18:18 ` William Chesters 2000-03-03 16:59 ` John Max Skaller 2000-03-06 17:35 ` Xavier Leroy 2000-03-06 23:11 ` John Max Skaller 2000-03-07 13:19 ` Julian Assange 2000-03-08 20:12 ` Xavier Leroy 2000-03-19 6:10 ` Julian Assange 2000-03-08 23:30 ` Max Skaller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox