On Fri, May 9, 2008 at 11:00 PM, Till Varoquaux <till.varoquaux@gmail.com> wrote:
First of all, let's try to stop the squabbling and have some actual
discussions with actual content (trolling is very tempting, and I am
the first to fall for it). OCaml is extremely nice but not perfect.
Other languages make other tradeoffs, and INRIA is not here to
fulfill all our desires.
On Fri, May 9, 2008 at 9:40 PM, Gerd Stolpmann <info@gerd-stolpmann.de> wrote:
>
> Am Freitag, den 09.05.2008, 19:10 +0100 schrieb Jon Harrop:
>> On Friday 09 May 2008 12:12:00 Gerd Stolpmann wrote:
>> > I think the parallelism capabilities are already excellent. We have been
>> > able to implement the application backend of Wink's people search in
>> > O'Caml, and it is of course a highly parallel system of programs. This
>> > is not the same class that raytracers or desktop parallelism fall into - this
>> > is highly professional supercomputing. I'm talking about a cluster of
>> > ~20 computers with something like 60 CPUs.
>> >
>> > Of course, we did not use multithreading very much. We are relying on
>> > multi-processing (both "fork"ed style and separately started programs),
>> > and multiplexing (i.e. application-driven micro-threading). I especially
>> > like the latter: Doing multiplexing in O'Caml is fun, and a substitute
>> > for most applications of multithreading. For example, you want to query
>> > multiple remote servers in parallel: Very easy with multiplexing,
>> > whereas the multithreaded counterpart would quickly run into scalability
>> > problems (threads are heavy-weight, and need a lot of resources).
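
As a concrete illustration of the multiplexing style described above,
here is a minimal sketch using the Lwt library (my choice; the post
names no library). The sleeping query stands in for real network I/O,
and all queries run concurrently in a single OS thread:

  (* Each query is a cooperative thread; Lwt multiplexes them all
     within one OS thread, so no locks are needed. *)
  open Lwt.Infix

  let query server =
    (* Stand-in for a real remote call. *)
    Lwt_unix.sleep 0.1 >|= fun () -> server ^ ": ok"

  let () =
    Lwt_main.run (Lwt_list.map_p query ["alpha"; "beta"; "gamma"])
    |> List.iter print_endline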
>>
>> If OCaml is good for concurrency on distributed systems that is great but it
>> is completely different to CPU-bound parallelism on multicores.
>
> You sound like somebody who tries to sell hardware :-)
>
> Well, our algorithms are quite easy to parallelize. I don't see a
> difference in whether they are CPU-bound or disk-bound - we also have
> lots of CPU-bound stuff, and the parallelization strategies are the
> same.
>
> The important thing is whether the algorithm can be formulated in a way
> so that state mutations are rare, or can at least be done in a
> "cache-friendly" way. Such algorithms exist for a lot of problems. I
> don't know which problems you want to solve, but it sounds as if they
> were special problems. Like for most industries, most of our problems
> are simply "do the same for N objects" where N is very large, and
> sometimes "sort data", also for large N.
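
That "do the same for N objects" shape maps directly onto the
fork-style multi-processing mentioned above. A hypothetical sketch
(the names are mine, and the results are simply discarded; a real
system would write them to disk or a pipe):

  (* Split [0, n) into [workers] chunks and fork one child per chunk.
     Each child applies [f] to its slice and exits; the parent waits. *)
  let parallel_iter ~workers f n =
    let chunk = (n + workers - 1) / workers in
    let pids =
      List.init workers (fun w ->
        match Unix.fork () with
        | 0 ->
            let lo = w * chunk and hi = min n ((w + 1) * chunk) in
            for i = lo to hi - 1 do f i done;
            exit 0
        | pid -> pid)
    in
    List.iter (fun pid -> ignore (Unix.waitpid [] pid)) pids

  let () = parallel_iter ~workers:4 (fun i -> ignore (i * i)) 1_000_000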
>
>> > In our case, the mutable data structures that count are on disk.
>> > Everything else is only temporary state.
>>
>> Exactly. That is a completely different kettle of fish to writing high
>> performance numerical codes for scientific computing.
>
> I don't understand. Relying on disk for sharing state is a big problem
> for us, but unavoidable. Disk is slow memory with a very special timing.
> Experience shows that even accessing state over the network is cheaper
> than over disk. Often, we end up designing our algorithms around the
> disk access characteristics. Compared to that the access to RAM-backed
> state over network is fast and easy.
>
shm_open shares memory through file descriptors and, under
linux/glibc, this is done using /dev/shm. You can mmap such a file as
a Bigarray and, voila, shared memory. This is quite nice for numerical
computation, plus you get closures etc. in your forks. Oh, and COW on
modern OSes makes forking very cheap.
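
In code, that might look like the following sketch (the /dev/shm path
is Linux-specific, and Unix.map_file requires OCaml 4.06 or later;
everything else is standard Unix):

  let () =
    let path = "/dev/shm/ocaml_shm_demo" in
    let fd =
      Unix.openfile path [Unix.O_RDWR; Unix.O_CREAT; Unix.O_TRUNC] 0o600
    in
    let n = 1024 in
    (* Map the file as a shared float64 bigarray; the file is grown to
       fit, and the mapping stays shared across fork. *)
    let a =
      Bigarray.array1_of_genarray
        (Unix.map_file fd Bigarray.float64 Bigarray.c_layout true [| n |])
    in
    match Unix.fork () with
    | 0 ->
        (* Child: write into the shared mapping, then exit. *)
        for i = 0 to n - 1 do a.{i} <- float_of_int i done;
        exit 0
    | pid ->
        ignore (Unix.waitpid [] pid);
        (* Parent sees the child's writes through the shared mapping. *)
        Printf.printf "a.{10} = %g\n" a.{10};
        Unix.unlink path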