From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.4 required=5.0 tests=AWL autolearn=disabled version=3.1.3 Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by yquem.inria.fr (Postfix) with ESMTP id F4156BBAF for ; Mon, 20 Apr 2009 23:08:37 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AgYBAJiA7EnUnwdjjmdsb2JhbACCHZQfAQEBAQkLCAkPBrsig30G X-IronPort-AV: E=Sophos;i="4.40,219,1238968800"; d="scan'208";a="24888180" Received: from relay.pcl-ipout01.plus.net ([212.159.7.99]) by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/RC4-SHA; 20 Apr 2009 23:08:37 +0200 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Aq8EAEyB7EnUnw4U/2dsb2JhbACCHc92g30G Received: from pih-relay08.plus.net ([212.159.14.20]) by relay.pcl-ipout01.plus.net with ESMTP; 20 Apr 2009 22:08:37 +0100 Received: from [87.115.105.35] (helo=leper.local) by pih-relay08.plus.net with esmtp (Exim) id 1Lw0jF-0008Gw-0H for caml-list@inria.fr; Mon, 20 Apr 2009 22:08:37 +0100 From: Jon Harrop Organization: Flying Frog Consultancy Ltd. To: caml-list@inria.fr Subject: Parallelized parsing Date: Mon, 20 Apr 2009 22:15:27 +0100 User-Agent: KMail/1.9.9 MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200904202215.27735.jon@ffconsultancy.com> X-Plusnet-Relay: fd9d98baa5c524ea0ee8487e02fc9025 X-Spam: no; 0.00; ocaml:01 parsers:01 grammars:01 syntax:01 integers:01 lexing:01 tokens:01 grammars:01 frog:98 parsed:01 parsing:01 parsing:01 recursively:01 recursively:01 grammar:02 I'm desperately trying to prepare for the imminent drop of a rock-solid multicore-friendly OCaml implementation and was wondering what work has been done on parallelized parsers and/or parallel-friendly grammars? For example, Mathematica syntax for nested lists of integers looks like: {{{1, 2}}, {{3, 4}, {4, 5}}, ..} and there are obvious divide-and-conquer approaches to lexing and parsing that grammar. You can recursively subdivide the string (e.g. memory mapped from a file) to build a tree of where the tokens { , and } appear by index and then recursively convert the tree into an AST. What other grammars can be lexed and/or parsed efficiently in parallel? -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e