From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.4 required=5.0 tests=AWL autolearn=disabled version=3.1.3 Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by yquem.inria.fr (Postfix) with ESMTP id 14F45BBAF for ; Tue, 21 Apr 2009 17:57:53 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AgMBAOeJ7UnUnwdjjmdsb2JhbACCHpQiAQEBAQkLCAkPBr5Pg3oG X-IronPort-AV: E=Sophos;i="4.40,225,1238968800"; d="scan'208";a="26587611" Received: from relay.pcl-ipout01.plus.net ([212.159.7.99]) by mail3-smtp-sop.national.inria.fr with ESMTP/TLS/RC4-SHA; 21 Apr 2009 17:57:46 +0200 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AswEACWK7UnUnw4S/2dsb2JhbACCHtMhg3oG Received: from pih-relay05.plus.net ([212.159.14.18]) by relay.pcl-ipout01.plus.net with ESMTP; 21 Apr 2009 16:57:45 +0100 Received: from [87.115.105.35] (helo=leper.local) by pih-relay05.plus.net with esmtp (Exim) id 1LwILw-0007XD-6D; Tue, 21 Apr 2009 16:57:44 +0100 From: Jon Harrop Organization: Flying Frog Consultancy Ltd. To: David MENTRE , caml-list@inria.fr Subject: Re: [Caml-list] Parallelized parsing Date: Tue, 21 Apr 2009 17:04:36 +0100 User-Agent: KMail/1.9.9 References: <200904202215.27735.jon@ffconsultancy.com> <3d13dcfc0904210019x1f624bebv82855519b0885050@mail.gmail.com> In-Reply-To: <3d13dcfc0904210019x1f624bebv82855519b0885050@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Message-Id: <200904211704.36455.jon@ffconsultancy.com> X-Plusnet-Relay: d12548d4b8709ef53665c211c1a87925 X-Spam: no; 0.00; syntax:01 integers:01 lexing:01 tokens:01 grammars:01 speedup:01 speedups:01 2009:98 20,:98 2009:98 frog:98 parsed:01 wrote:01 wrote:01 parsing:01 On Tuesday 21 April 2009 08:19:33 you wrote: > On Mon, Apr 20, 2009 at 23:15, Jon Harrop wrote: > > For example, Mathematica syntax for nested lists of integers looks like: > > > > =A0{{{1, 2}}, {{3, 4}, {4, 5}}, ..} > > > > and there are obvious divide-and-conquer approaches to lexing and parsi= ng > > that grammar. You can recursively subdivide the string (e.g. memory > > mapped from a file) to build a tree of where the tokens { , and } appear > > by index and then recursively convert the tree into an AST. > > > > What other grammars can be lexed and/or parsed efficiently in parallel? > > Is it of any use? The overhead of parsing a single file in parallel is > so high that you won't have any speedup, especially compared to the > much simpler approach of parsing *several* files in parallel. I'm seeing near-linear speedups parsing nested Mathematica lists in F# usin= g=20 the algorithms I described. Moreover, dumping large quantities of data from= =20 Mathematica in that format is much easier than dividing it into separate=20 files because there is no clear break in the tree. =2D-=20 Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e