From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.1.3 Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by yquem.inria.fr (Postfix) with ESMTP id 3E9CBBC69 for ; Tue, 25 Sep 2007 10:54:20 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AgAAAIdp+EbAXQImh2dsb2JhbACOKwEBAQgKJw X-IronPort-AV: E=Sophos;i="4.20,294,1186351200"; d="scan'208";a="1417499" Received: from discorde.inria.fr ([192.93.2.38]) by mail1-smtp-roc.national.inria.fr with ESMTP; 25 Sep 2007 10:54:20 +0200 Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l8P8sJlK017639 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=OK) for ; Tue, 25 Sep 2007 10:54:19 +0200 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AgAAABBp+EaGuIEKnmdsb2JhbACOKwEBAQEHBAYFChg X-IronPort-AV: E=Sophos;i="4.20,294,1186351200"; d="scan'208";a="1733868" Received: from smtp.vub.ac.be (HELO dorado.vub.ac.be) ([134.184.129.10]) by mail2-smtp-roc.national.inria.fr with ESMTP; 25 Sep 2007 10:54:19 +0200 Received: from exchange.etro.vub.ac.be (etro10.vub.ac.be [134.184.2.10]) by dorado.vub.ac.be (Postfix) with ESMTP id E7BD961D for ; Tue, 25 Sep 2007 10:54:15 +0200 (CEST) Received: from ssel.vub.ac.be ([10.0.4.37]) by exchange.etro.vub.ac.be with Microsoft SMTPSVC(6.0.3790.3959); Tue, 25 Sep 2007 10:54:15 +0200 Received: by ssel.vub.ac.be (Postfix, from userid 241) id 910302468BD9; Tue, 25 Sep 2007 10:54:15 +0200 (CEST) Received: from localhost (localhost.localdomain [127.0.0.1]) by ssel.vub.ac.be (Postfix) with ESMTP id 291DA2468BD8; Tue, 25 Sep 2007 10:54:15 +0200 (CEST) X-Virus-Scanned: amavisd-new at ssel.vub.ac.be Received: from ssel.vub.ac.be ([127.0.0.1]) by localhost (ssel.vub.ac.be [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 6vJ8xATnNiCI; Tue, 25 Sep 2007 10:54:02 +0200 (CEST) Received: from [192.168.2.2] (213.219.134.88.adsl.dyn.edpnet.net [213.219.134.88]) by ssel.vub.ac.be (Postfix) with ESMTP id 1DD102468BD6; Tue, 25 Sep 2007 10:54:02 +0200 (CEST) In-Reply-To: <46F815F2.1010903@frisch.fr> References: <20070427135425.GA1161@first.in-berlin.de> <20070427162911.GA10099@furbychan.cocan.org> <20070427231220.GA1507@first.in-berlin.de> <46F815F2.1010903@frisch.fr> Mime-Version: 1.0 (Apple Message framework v752.3) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: Cc: Caml-list ml Content-Transfer-Encoding: 7bit From: Bruno De Fraine Subject: Re: ocamllex speed [was Re: [Caml-list] mboxlib reloaded ;-)] Date: Tue, 25 Sep 2007 10:53:59 +0200 To: Alain Frisch X-Mailer: Apple Mail (2.752.3) X-ClamAV-Alt: OK X-OriginalArrivalTime: 25 Sep 2007 08:54:15.0803 (UTC) FILETIME=[A6BA9CB0:01C7FF51] X-Miltered: at discorde with ID 46F8CCBB.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; ocamllex:01 frisch:01 avoided:01 compiler:01 ocamlopt:01 compiler:01 endline:01 getcwd:01 lexbuf:01 lexbuf:01 vub:98 2007,:98 3.5:98 0.35:98 white-space:98 On 24 Sep 2007, at 21:54, Alain Frisch wrote: > Bruno De Fraine wrote: >> Neither version does anything useful, except print the current >> directory when it encounters the string "current_directory". I >> tested this on a 57M text file (that has only a few >> "current_directory" occurrences). The ocamllex-version takes about >> 3.5s, while the Str-version takes only 0.35s. What causes this >> difference? Perhaps there is a high overhead in calling the >> translate function for every input character in such big input >> files, but I don't know how this can be avoided. > > Did you try to compile the same programs with the native compiler? Yes, the timings were for the ocamlopt compiler. Following skaller's suggestion, I also tried: rule translate = parse | [^ ' ' '\n' '\t']+ as word { if word = "current_directory" then print_endline (Sys.getcwd ()); translate lexbuf } | _ { translate lexbuf } | eof { () } This brings it down to 1.87s. Of course, only occurrences of "current_directory" that are white-space separated are picked up. Regards, Bruno