From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id TAA26540 for caml-redistribution@pauillac.inria.fr; Fri, 10 Mar 2000 19:49:09 +0100 (MET)
Resent-Message-Id: <200003101849.TAA26540@pauillac.inria.fr>
Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id TAA03276 for <caml-list@pauillac.inria.fr>; Fri, 10 Mar 2000 19:02:09 +0100 (MET)
Received: from mail3.microsoft.com (mail3.microsoft.com [131.107.3.123])
	by nez-perce.inria.fr (8.8.7/8.8.7) with SMTP id TAA23877
	for <caml-list@inria.fr>; Fri, 10 Mar 2000 19:02:08 +0100 (MET)
Received: from 157.54.9.100 by mail3.microsoft.com (InterScan E-Mail VirusWall NT); Fri, 10 Mar 2000 10:02:02 -0800 (Pacific Standard Time)
Received: by INET-IMC-03 with Internet Mail Service (5.5.2651.58)
	id <GF5XVS31>; Fri, 10 Mar 2000 10:02:02 -0800
Message-ID: <783D93998201D311B0CF00805FEAA07B7E915F@RED-MSG-42>
From: Manuel Fahndrich <maf@microsoft.com>
To: "'Xavier Leroy'" <Xavier.Leroy@inria.fr>,
        "'caml-list@inria.fr'"
	 <caml-list@inria.fr>
Cc: "Norman Ramsey (E-mail)" <nr@deas.harvard.edu>
Subject: RE: Interpreter vs hardware threads
Date: Fri, 10 Mar 2000 10:01:53 -0800
X-Mailer: Internet Mail Service (5.5.2651.58)
Resent-From: weis@pauillac.inria.fr
Resent-Date: Fri, 10 Mar 2000 19:49:09 +0100
Resent-To: caml-redistribution@pauillac.inria.fr


The advantage of heap closures in SML/NJ for efficient thread creation is
only present when the threads are super lightweight, that is completely
handled by the SML/NJ system via callcc/throw. If we want the threads to be
scheduled by the OS, because e.g. we are running on multi-processor
machines, or we don't want to block on IO, then each thread again needs some
C stack for the OS interaction, which destroys the heap closure advantage.

What we really need is an OS w/o a stack.

-Manuel


-----Original Message-----
From: Xavier Leroy [mailto:Xavier.Leroy@inria.fr]
Sent: Friday, March 10, 2000 12:00 AM
To: caml-redistribution@pauillac.inria.fr
Cc: Norman Ramsey (E-mail)
Subject: Re: Interpreter vs hardware threads


> | Very lightweight
> | threads do exist, see e.g. the call/cc-based threads of SML/NJ, but
> | entail significant performance penalties not only on I/O, but also on
> | the actual running speed of the sequential code.
> Is that really so, Xavier?  What percentage performance penalty do
> you think is involved?  1%?  10%?  Factor of 2?

For SML/NJ, I think their stackless execution model (all activation
records beging heap-allocated) entail a significant speed penalty for
sequential code -- at least 20%, I'd say.  But it sure gives you
blazingly fast thread creation.

The Concurrent Haskell approach that you describe is somehow less
lightweight than the SML/NJ approach.  In particular, thread creation
is going to be more expensive because of the cost of allocating and
setting up a new stack.  Also, the memory consumption of each thread
is going to be higher.  I guess the SML/NJ approach could accommodate
one million threads, while Concurrent Haskell sounds more in the 100 000s.
Still, I agree the solution you outline is pretty lightweight.

The expression "lightweight threads" is getting too vague.  I guess we
need several degrees of lightweightness, from SML/NJ's "ultra-lightweight
threads" to POSIX's "not so lightweight threads", with Haskell's
"pretty lightweight" threads and Caml's "lightweight, but no more"
threads...

> For potentially-blocking I/O operations it is true that there is
> some extra work to do, much like a context switch.  The pointers need
> to be saved in a safe place in case GC strikes, and a global lock should
> be released so that a new heavyweight thread can take over the business
> of running the lightweight threads if the I/O blocks.  But none of
> this seems really expensive in terms of % of computation time, does it?

You have to add to this the overhead on the I/O operation itself.  If
your threads build upon heavyweight threads for I/O, that should be
negligible.  (Say, one or two extra kernel context switches.)  But if
you have to perform I/O polling via select() (like OCaml bytecode
threads do for maximal portability), the overhead becomes significant.
(select() is one of the most expensive Unix system calls, in
particular because the kernel has to scan a huge set of file
descriptors just to determine which ones to wait upon; it's so bad
that Unix 98 introduced an alternate form, poll(), that does exactly
the same thing but with a more compact description of the f.d. sets.)

- Xavier Leroy