From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <skaller@users.sourceforge.net>
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr
X-Spam-Level: 
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled 
	version=3.1.3
Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39])
	by yquem.inria.fr (Postfix) with ESMTP id 66215BC69
	for <caml-list@yquem.inria.fr>; Thu, 29 Mar 2007 19:08:27 +0200 (CEST)
Received: from ipmail02.adl2.internode.on.net (ipmail02.adl2.internode.on.net [203.16.214.141])
	by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id l2TH8PJw004742
	for <caml-list@inria.fr>; Thu, 29 Mar 2007 19:08:26 +0200
Received: from ppp36-111.lns2.syd6.internode.on.net (HELO [192.168.1.201]) ([59.167.36.111])
  by ipmail02.adl2.internode.on.net with ESMTP; 30 Mar 2007 02:38:03 +0930
X-IronPort-AV: i="4.14,347,1170595800"; 
   d="scan'208"; a="103836184:sNHT6165700030"
Subject: Re: [Caml-list] Hacking the lexer in the new camlp4
From: skaller <skaller@users.sourceforge.net>
To: Nicolas Pouillard <nicolas.pouillard@gmail.com>
Cc: "Harrison, John R" <john.r.harrison@intel.com>,
	caml-list@inria.fr
In-Reply-To: <cd67f63a0703290829j73222254n49a05b1031919317@mail.gmail.com>
References: <cd67f63a0703290230j44c39dd2ma9314783b9313972@mail.gmail.com>
	 <509223F0BF55E74FA1247D17207E7A0C01433DE5@orsmsx419.amr.corp.intel.com>
	 <cd67f63a0703290829j73222254n49a05b1031919317@mail.gmail.com>
Content-Type: text/plain
Date: Fri, 30 Mar 2007 03:07:58 +1000
Message-Id: <1175188078.2454.32.camel@rosella.wigram>
Mime-Version: 1.0
X-Mailer: Evolution 2.8.1 
Content-Transfer-Encoding: 7bit
X-Miltered: at concorde with ID 460BF289.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)!
X-Spam: no; 0.00; lexer:01 camlp:01 0200,:01 camlp:01 lexer:01 ocamllex:01 ocamllex:01 lexers:01 lexeme:01 lexeme:01 recognizes:01 tokens:01 buffer:01 sourceforge:01 wrote:01 

On Thu, 2007-03-29 at 17:29 +0200, Nicolas Pouillard wrote:
> On 3/29/07, Harrison, John R <john.r.harrison@intel.com> wrote:
> > | > In the current camlp4, the only way I found to do this was basically
> > | > to copy the existing lexer and edit it. Although it works, it's ugly
> > | > and invariably means that I've had to change something with almost
> > | > every new version of camlp4. Does the new camlp4 offer a nicer way
> > | > of changing the lexer?
> > |
> > | How did you do that in the previous one without copy/pasting the old lexer?

> Ok, so the new one is built with ocamllex, so it's not really extensible.

That doesn't entirely follow. There are at least two ways to
extend ocamllex lexers.

1. Recursively process a given lexeme. 

2. Dispatch the error case with enough information to start
another lexer.

Method 1 can either tokenise the given lexeme, or simply
use it as a trigger to start another lexer. Method 2 is
just a special case of method 1.
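
A rough sketch of both methods, under some loud assumptions:
the scanner below is hand-written so that it runs without an
ocamllex generation step, and the names (token, fallback, scan,
backquote, Lex_error) are all made up for illustration, none of
them come from camlp4. The error case hands the input and the
offset to an extension (method 2), and the extension re-lexes
the lexeme it found with the base scanner (method 1).

```ocaml
(* Hypothetical token type, for illustration only. *)
type token = WORD of string | NUM of int | QUOTE of token list

(* Raised with the offset of a character nobody could handle. *)
exception Lex_error of int

(* An extension gets the input and the offset the base scanner rejected.
   It returns tokens plus the offset to resume from, or None. *)
type fallback = string -> int -> (token list * int) option

let is_alpha c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c = '_'
let is_digit c = c >= '0' && c <= '9'

(* Stand-in for an ocamllex rule: words, integers and spaces only. *)
let scan (fallback : fallback) (input : string) : token list =
  let n = String.length input in
  (* First index >= i at which the predicate p fails. *)
  let rec span p i = if i < n && p input.[i] then span p (i + 1) else i in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      let c = input.[i] in
      if c = ' ' then go (i + 1) acc
      else
        let stop, toks =
          if is_alpha c then
            let j = span is_alpha i in
            (j, [WORD (String.sub input i (j - i))])
          else if is_digit c then
            let j = span is_digit i in
            (j, [NUM (int_of_string (String.sub input i (j - i)))])
          else
            (* Method 2: dispatch the error case to the extension. *)
            match fallback input i with
            | Some (toks, j) when j > i -> (j, toks)
            | _ -> raise (Lex_error i)
        in
        go stop (List.rev_append toks acc)
  in
  go 0 []

(* The base scanner with no extension at all. *)
let no_fallback : fallback = fun _ _ -> None

(* Extension: `...` is taken as one lexeme, then re-lexed (method 1). *)
let backquote : fallback = fun input i ->
  if input.[i] <> '`' then None
  else
    match String.index_from_opt input (i + 1) '`' with
    | None -> None
    | Some j ->
      let inner = String.sub input (i + 1) (j - i - 1) in
      Some ([QUOTE (scan no_fallback inner)], j + 1)
```

With that, scan backquote accepts backquoted text that
scan no_fallback rejects with Lex_error, and the base scanner
itself was not edited.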

All you really need is to pass the lexer an object with
an overridable method for each lexical class the lexer
recognizes. Each method accepts the state data and returns
a list of tokens.
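
Something like the following, as a minimal sketch. Assumptions:
the token type, the class names and the method names are
hypothetical, the "state data" is reduced to the lexeme string,
and a hand-written scanner stands in for the ocamllex rule so
the example runs as is. In a real .mll file each action would
call the corresponding method instead.

```ocaml
(* Hypothetical token type, not taken from camlp4. *)
type token = IDENT of string | INT of int | SYM of string

(* One overridable method per lexical class. Each one receives the
   lexeme and returns a list of tokens. *)
class handlers = object
  method ident (s : string) : token list = [IDENT s]
  method integer (s : string) : token list = [INT (int_of_string s)]
  method symbol (s : string) : token list = [SYM s]
end

let is_alpha c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c = '_'
let is_digit c = c >= '0' && c <= '9'

(* Stand-in for the generated lexer: classify each lexeme, then let the
   handler object decide which tokens it produces. *)
let tokenize (h : handlers) (input : string) : token list =
  let n = String.length input in
  (* First index >= i at which the predicate p fails. *)
  let rec span p i = if i < n && p input.[i] then span p (i + 1) else i in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      let c = input.[i] in
      if c = ' ' then go (i + 1) acc
      else
        let stop, toks =
          if is_alpha c then
            let j = span is_alpha i in
            (j, h#ident (String.sub input i (j - i)))
          else if is_digit c then
            let j = span is_digit i in
            (j, h#integer (String.sub input i (j - i)))
          else (i + 1, h#symbol (String.make 1 c))
        in
        go stop (List.rev_append toks acc)
  in
  go 0 []

(* An extension overrides only the class it cares about. Expanding the
   made-up keyword "unless" into two tokens is purely illustrative. *)
class unless_handlers = object
  inherit handlers
  method! ident s =
    if s = "unless" then [IDENT "if"; IDENT "not"] else [IDENT s]
end
```

Calling tokenize (new unless_handlers) changes how identifiers
are tokenised without copying or editing tokenize. Returning a
list rather than a single token is what lets one lexeme expand
to several tokens, or to none.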

Ocamllex currently ensures that, once a lexeme has been
decoded, the buffer is ready to process the next character
after that lexeme, even if the matcher had to overshoot to
get there. So another lexer can take over from that point.


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net