From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 0D044BC2A for ; Fri, 5 Aug 2005 11:14:41 +0200 (CEST) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id j759Eekb024212 for ; Fri, 5 Aug 2005 11:14:40 +0200 Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id LAA01440 for ; Fri, 5 Aug 2005 11:14:40 +0200 (MET DST) Received: from postfix4-1.free.fr (postfix4-1.free.fr [213.228.0.62]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id j759EdDG024206 for ; Fri, 5 Aug 2005 11:14:39 +0200 Received: from quatre.invalid (vol75-1-81-57-79-249.fbx.proxad.net [81.57.79.249]) by postfix4-1.free.fr (Postfix) with ESMTP id 98EF0319D1D; Fri, 5 Aug 2005 11:14:39 +0200 (CEST) Received: from berke by quatre.invalid with local (Exim 4.50) id 1E0yIM-0006Ea-1x; Fri, 05 Aug 2005 11:15:14 +0200 Date: Fri, 5 Aug 2005 11:15:14 +0200 From: Berke Durak To: james woodyatt Cc: caml-list@inria.fr Subject: Re: [Caml-list] ocamllex problem Message-ID: <20050805091513.GC4076@ara.zapto.org> References: <1123203791.6720.63.camel@localhost.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.9i X-Miltered: at concorde with ID 42F32E00.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Miltered: at concorde with ID 42F32DFF.001 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; berke:01 durak:01 caml-list:01 ocamllex:01 woodyatt:01 non-trivial:01 dfa:01 dfa:01 stack:01 backtracking:01 sequences:01 parser:01 parsers:01 toplas:01 berke:01 X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.1 required=5.0 tests=FORGED_RCVD_HELO autolearn=disabled version=3.0.3 On Thu, Aug 04, 2005 at 11:15:48PM -0700, james woodyatt wrote: > One thing: the pattern [':'((letter|' ')* as s)] is interesting. > You're definitely right that something non-trivial is happening > inside the DFA. My [Cf_dfa] module does not keep a stack of > backtracking sequences because I did something else to resolve the > problem. Look at the ( $@ ) operators, which allow you to use a > parser monad on the recognized input sequence to obtain the result of > a lexical rule. Using this, you can implement something like the > feature you're interested in by defining a nested hierarchy of parsers. You may also wish to have a look at : Thomas Reps, "Maximal-Munch" Tokenization in Linear Time, ACM Trans. Program. Lang. Syst., vol. 20, num. 2, 1998, pp.259-273 http://www.cs.wisc.edu/wpis/papers/toplas98b.pdf -- Berke Durak