From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.1.3 Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 3E185BC6B for ; Wed, 18 Apr 2007 16:21:18 +0200 (CEST) Received: from ipmail01.adl2.internode.on.net (ipmail01.adl2.internode.on.net [203.16.214.140]) by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id l3IELFm2004330 for ; Wed, 18 Apr 2007 16:21:17 +0200 X-IronPort-AV: E=Sophos;i="4.14,422,1170595800"; d="scan'208";a="116486453" Received: from ppp8-148.lns1.syd7.internode.on.net (HELO [192.168.1.201]) ([59.167.8.148]) by ipmail01.adl2.internode.on.net with ESMTP; 18 Apr 2007 23:51:13 +0930 Subject: Re: [Caml-list] using different lexers with one parser? From: skaller To: Oliver Bandel Cc: caml-list@yquem.inria.fr In-Reply-To: <20070418100142.GD336@first.in-berlin.de> References: <20070417203311.GA553@first.in-berlin.de> <20070418100142.GD336@first.in-berlin.de> Content-Type: text/plain Date: Thu, 19 Apr 2007 00:21:10 +1000 Message-Id: <1176906070.15945.39.camel@rosella.wigram> Mime-Version: 1.0 X-Mailer: Evolution 2.8.1 Content-Transfer-Encoding: 7bit X-Miltered: at concorde with ID 4626295B.002 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; lexers:01 parser:01 0200,:01 bandel:01 0200,:01 hendrik:01 tews:01 bandel:01 in-berlin:01 lexer:01 parsing:01 parser:01 ocamlyacc:01 lexer:01 lexbuf:01 On Wed, 2007-04-18 at 12:01 +0200, Oliver Bandel wrote: > On Wed, Apr 18, 2007 at 09:38:20AM +0200, Hendrik Tews wrote: > > Oliver Bandel writes: > > > > Is it possible (without too much effort) to switch > > the lexer during parsing (from within the parser)? > > > > Yes, see > > http://caml.inria.fr/pub/ml-archives/caml-list/2003/09/3e7f3495840e2bc851b91c3dba8abab9.en.html > > > [...] > > OK; I hoped to find a solution without the global-switching-var hacks; > something that is built in in ocamlyacc. There is a solution without global variables. Referring to Hendrick's example, write: rule lexer global_var = parse | "" { match !global_var with | Xlex -> xlex global_var lexbuf | Ylex -> ylex global_var lexbuf } instead. Now, you must change at least some -- but I recommend all -- of your tokens to include the global_var, for example %token TOKEN and xlex global var = parse | ... { TOKEN (global_var,42) } Now in any ocamlyacc action code you can match write this | TOKEN ... { let global_var, attrib = $1 in ... } to get the variable into the action code, and now you can change it. This leaves the problem of the lookahead token, which may or may not exist. You can predict whether it exists for each reduction. You can do this 'mentally' by asking "Does this production have a definite set of terminators, (no lookahead) or does it rely on bumping into something unrecognizable (lookahead)"? [If you can't tell .. you must refactor your grammar so you can] If the lookahead exists, you must 'put it back' into the lexbuf. At present you can do this by studying the implementation details in Lexing module. The lexeme is certain to be IN the buffer, so it is safe to reset the pointers to the lexbuf state before it was lexed. To get at the lexbuf from the parser .. you have to modify your tokens AGAIN to include the lexbuf. I recommend considering using an Ocaml class to encapsulate all the state information you need. Passing state from the lexer to the parser this way is ugly but it does work and is fully re-entrant. -- John Skaller Felix, successor to C++: http://felix.sf.net