From: Goswin von Brederlow <goswin-v-b@web.de>
To: caml-list@inria.fr
Subject: Re: [Caml-list] Asynchronous IO programming in OCaml
Date: Wed, 27 Oct 2010 15:43:37 +0200
In-Reply-To: <20101027111835.GA5664@gaia> (Jérémie Dimino's message of
 "Wed, 27 Oct 2010 13:18:35 +0200")

Jérémie Dimino writes:

> On Wed, Oct 27, 2010 at 11:33:51AM +0200, Goswin von Brederlow wrote:
>> You aren't doing any multithreading. You are creating a thread and
>> waiting for the thread to finish its read before starting a second.
>> There are never ever 2 reads running in parallel. So all you do is add
>> thread creation and destruction for every read to your first example.
>
> Yes, I know that. The idea was just to show the overhead of context
> switches.

But then you are tuning your benchmark to give you the result you want
instead of benchmarking what normally happens. You already get a context
switch when the read returns (or any other syscall that blocks).
Switching to a thread that waits on that read costs the same as
switching back to the main thread, at least when the call actually
blocked.

>> You should start multiple threads and let them read from different
>> offsets (use pread) and only once they are all started join them all
>> again.
>
> Sure, but doing this directly in Lwt raises other problems:
>
> - This means prefetching a large part of the file into the program
>   memory.

Yes, totally. But if you want to do work asynchronously then you have to
spend the memory to keep the data for multiple jobs in memory. Roughly
what I mean is sketched below.
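A rough, untested sketch of that pattern, using plain Unix threads
rather than Lwt. The names parallel_read, n_chunks and chunk_size are
made up here, and since the stdlib Unix module has no pread binding
(ExtUnix has one) each worker simply opens its own descriptor and seeks
to its offset:

let parallel_read path n_chunks chunk_size =
  let results = Array.make n_chunks "" in
  let read_chunk i =
    (* own descriptor per worker instead of pread on a shared one *)
    let fd = Unix.openfile path [Unix.O_RDONLY] 0 in
    ignore (Unix.lseek fd (i * chunk_size) Unix.SEEK_SET);
    let buf = Bytes.create chunk_size in
    (* a real version would loop until the whole chunk is read *)
    let n = Unix.read fd buf 0 chunk_size in
    Unix.close fd;
    results.(i) <- Bytes.sub_string buf 0 n
  in
  (* start every worker first, only then join them all *)
  let workers = Array.init n_chunks (Thread.create read_chunk) in
  Array.iter Thread.join workers;
  results

Compile with -thread and link unix and threads. The only point is that
all the reads are in flight before the first join, so they can actually
overlap.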
> - The kernel already prefetches files from the disk, so most of the time
>   this is just a memory copy in parallel...

That only works for sequential reads on a very limited number of files
(if more than one at all). If you have 1000+ clients requesting files
from a webserver, for example, they will never all be covered by the
kernel's read-ahead.

And don't forget: multi-core systems are more and more widespread. What
is wrong with 2 cores doing the memcpy twice as fast?

> - How many threads do we launch in parallel ? This depends on the
>   overhead of context switching between threads, which can not be
>   determined at runtime.
>
> I think that the solution with mincore + mmap is better because it uses
> threads only when really needed.
>
> Jérémie

For reads I have to agree with you there. You can only hide the cost of
a thread when the read blocks. By using mincore to test first whether a
read would block, you avoid context switches precisely in the cases
where they aren't free.

Unfortunately you can't use mmap() efficiently for writes. Writing to a
mmap()ed file will first read the old data in from disk, then overwrite
the memory, and only some time later write it back to disk. There seems
to be no way to tell the kernel to skip reading in a page on first
access when one is going to overwrite it completely anyway.

Do you have a different solution for writes that avoids threads when a
write won't block? And how do you restart jobs once the data has
actually been committed to disk (i.e. how do you do fsync())?

Regards,
        Goswin
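PS: For completeness, an untested sketch of how I currently picture the
write side, where there is no mincore-style test to predict whether the
write would block: detach both the write and the fsync to a worker
thread (here via Lwt_preemptive.detach) and only resume the job once the
data is really on disk. write_and_sync is a made-up name, and Unix.fsync
stands in for whatever fsync binding is available (older stdlibs need
ExtUnix.fsync or a one-line C stub).

let write_and_sync fd buf =
  Lwt_preemptive.detach
    (fun () ->
       let len = Bytes.length buf in
       (* plain blocking write; loop because write may be partial *)
       let rec loop off =
         if off < len then loop (off + Unix.write fd buf off (len - off))
       in
       loop 0;
       (* the promise only resolves after fsync, i.e. once the data
          has actually been committed to disk *)
       Unix.fsync fd)
    ()

Of course this always pays for a worker thread, even when the write
would not have blocked, which is exactly the part I am asking about.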