Subject: Re: [Caml-list] Hacking the lexer in the new camlp4
From: skaller
To: Nicolas Pouillard
Cc: "Harrison, John R", caml-list@inria.fr
Date: Fri, 30 Mar 2007 03:07:58 +1000

On Thu, 2007-03-29 at 17:29 +0200, Nicolas Pouillard wrote:
> On 3/29/07, Harrison, John R wrote:
> > | > In the current camlp4, the only way I found to do this was basically
> > | > to copy the existing lexer and edit it. Although it works, it's ugly
> > | > and invariably means that I've had to change something with almost
> > | > every new version of camlp4. Does the new camlp4 offer a nicer way of
> > | > changing the lexer?
> > | >
> > | How did you do that in the previous one without copy/pasting the old lexer?
> Ok, so the new one is built with ocamllex, so it's not really extensible.

This doesn't entirely follow. There are at least two ways to
extend ocamllex lexers.

1. Recursively process a given lexeme.

2. Dispatch the error case with enough information
   to start another lexer.

Method 1 can either tokenise the given lexeme, or simply use
it as a trigger to start another lexer. Method 2 is just a
special case of method 1.

All you really need is to pass the lexer a class with an
overridable method for each class of lexeme the lexer
recognizes; each method accepts the state data and returns
a list of tokens.

Ocamllex currently ensures that the buffer is positioned at
the character following the lexeme just decoded, even if the
scanner had to overshoot to get there.

-- 
John Skaller
Felix, successor to C++: http://felix.sf.net
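
A minimal sketch of the overridable-method idea described above, in plain OCaml. It is not camlp4's lexer and not ocamllex output: the `token` type, the `handlers` and `split_idents` classes, and the hand-written `tokenize` scanner are all hypothetical stand-ins for illustration. For brevity each method receives only the lexeme text; a real version would presumably also be given the lexer state (for example the `Lexing.lexbuf`).

```ocaml
(* Hypothetical token type, for illustration only. *)
type token = Ident of string | Int of int | Sym of string

(* One overridable method per class of lexeme. Each takes the lexeme
   text and returns a list of tokens. *)
class handlers = object
  method ident (s : string) : token list = [Ident s]
  method number (s : string) : token list = [Int (int_of_string s)]
  method symbol (s : string) : token list = [Sym s]
end

let is_alpha c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c = '_'
let is_digit c = c >= '0' && c <= '9'

(* Hand-written stand-in for the generated scanner: it classifies each
   lexeme and dispatches it to the matching handler method. *)
let tokenize (h : #handlers) (src : string) : token list =
  let n = String.length src in
  (* Index of the first character at or after i that fails predicate p. *)
  let rec span p i = if i < n && p src.[i] then span p (i + 1) else i in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      let c = src.[i] in
      if c = ' ' || c = '\t' || c = '\n' then go (i + 1) acc
      else if is_alpha c then
        let j = span (fun c -> is_alpha c || is_digit c) i in
        go j (List.rev_append (h#ident (String.sub src i (j - i))) acc)
      else if is_digit c then
        let j = span is_digit i in
        go j (List.rev_append (h#number (String.sub src i (j - i))) acc)
      else go (i + 1) (List.rev_append (h#symbol (String.make 1 c)) acc)
  in
  go 0 []

(* Extension in the style of method 1: an identifier lexeme is split at
   '_' and each part is re-lexed by a fresh run of the base lexer. *)
class split_idents = object
  inherit handlers as super
  method! ident s =
    match List.filter (fun p -> p <> "") (String.split_on_char '_' s) with
    | [] -> super#ident s
    | parts -> List.concat (List.map (tokenize (new handlers)) parts)
end

let plain = tokenize (new handlers) "foo_1 + 42"
(* plain = [Ident "foo_1"; Sym "+"; Int 42] *)
let split = tokenize (new split_idents) "foo_1 + 42"
(* split = [Ident "foo"; Int 1; Sym "+"; Int 42] *)
```

The scanner itself never changes; only the object passed to it does, which is the point of the design. With ocamllex, the same dispatch would sit in the rule actions, each action calling the corresponding method.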