From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.1.3 Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id E6665BC6B for ; Sat, 28 Apr 2007 15:50:02 +0200 (CEST) Received: from ipmail01.adl2.internode.on.net (ipmail01.adl2.internode.on.net [203.16.214.140]) by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id l3SDo0a7009580 for ; Sat, 28 Apr 2007 15:50:02 +0200 X-IronPort-AV: E=Sophos;i="4.14,464,1170595800"; d="scan'208";a="120617712" Received: from ppp8-148.lns1.syd7.internode.on.net (HELO [192.168.1.201]) ([59.167.8.148]) by ipmail01.adl2.internode.on.net with ESMTP; 28 Apr 2007 23:19:56 +0930 Subject: Re: [Caml-list] mboxlib reloaded ;-) From: skaller To: Oliver Bandel Cc: caml-list@yquem.inria.fr In-Reply-To: <20070428114450.GI363@first.in-berlin.de> References: <20070427135425.GA1161@first.in-berlin.de> <20070427162911.GA10099@furbychan.cocan.org> <20070427231220.GA1507@first.in-berlin.de> <1177721646.16582.8.camel@rosella.wigram> <20070428104746.GA363@first.in-berlin.de> <20070428125453.1fef4ae4@kerneis.info> <20070428114450.GI363@first.in-berlin.de> Content-Type: text/plain; charset=utf-8 Date: Sat, 28 Apr 2007 23:49:51 +1000 Message-Id: <1177768191.11923.24.camel@rosella.wigram> Mime-Version: 1.0 X-Mailer: Evolution 2.10.1 Content-Transfer-Encoding: 8bit X-Miltered: at concorde with ID 46335108.002 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; reloaded:01 0200,:01 bandel:01 0200,:01 bandel:01 in-berlin:01 lexer:01 ocamllex:01 mll:01 transitions:01 afaik:01 ocaml:01 buffer:01 ocaml:01 buffer:01 On Sat, 2007-04-28 at 13:44 +0200, Oliver Bandel wrote: > On Sat, Apr 28, 2007 at 12:54:53PM +0200, Gabriel Kerneis wrote: > > Le Sat, 28 Apr 2007 12:47:47 +0200, Oliver Bandel > > a écrit : > > > > You should check the size (number of states) of the generated > > > > lexer. > > > > > > How? > > > > It's printed out by ocamllex when you run it on you .mll file. > > Regards, > > Ah, ok. :) > > > 18 states, 261 transitions, table size 1152 bytes. > > Does not loooks very huge ;-) Lol, no it is tiny. You are probably right, too many calls, and too much copying data around. AFAIK Ocaml channels also add an extra buffer layer (is that right?) so there's even more copying. Still, although Ocaml may generate more code than C, if your code is reasonably tight it should be cached and be fast: function calls are actually quite cheap. Here's an idea: you said: "For the about 100MB mbox there are 2.5 * 10^6 calls to to Buffer.add_string for the header and 1.6 * 10^6 calls to Buffer.add_string for the body, 2.6*10^6 calls to the function lexing.engine, ..." How about NOT storing the body text. Instead, just store the integer file offset of the first byte and the length? Not sure what you application is doing; perhaps that would work for you? -- John Skaller Felix, successor to C++: http://felix.sf.net