From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by yquem.inria.fr (Postfix) with ESMTP id 50CF3BC3D for ; Mon, 8 Aug 2005 10:59:36 +0200 (CEST) Received: from smtp3.adl2.internode.on.net (smtp3.adl2.internode.on.net [203.16.214.203]) by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id j788xY9t027690 for ; Mon, 8 Aug 2005 10:59:35 +0200 Received: from rosella (ppp19-123.lns2.syd7.internode.on.net [59.167.19.123]) by smtp3.adl2.internode.on.net (8.12.9/8.12.9) with ESMTP id j788xN2e073716; Mon, 8 Aug 2005 18:29:29 +0930 (CST) Subject: Re: [Caml-list] ocamllex+ocamlyacc and not parsing properly From: skaller To: Jonathan Roewen Cc: caml-list@yquem.inria.fr In-Reply-To: References: <200508080058.12357.jon@ffconsultancy.com> Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="=-ukBUrNODpA5UqgN+JGvK" Date: Mon, 08 Aug 2005 18:59:23 +1000 Message-Id: <1123491563.9947.42.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 X-Miltered: at nez-perce with ID 42F71EF6.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; caml-list:01 ocamllex:01 ocamlyacc:01 parsing:01 ocamllex:01 lexer:01 tokens:01 tokens:01 grammar:01 lexer:01 parser:01 ocaml:01 grammar:01 recursive:01 rec:01 X-Attachments: type="application/pgp-signature" name="signature.asc" X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.3 --=-ukBUrNODpA5UqgN+JGvK Content-Type: text/plain Content-Transfer-Encoding: quoted-printable On Mon, 2005-08-08 at 16:23 +1200, Jonathan Roewen wrote: > Is there any way to call another rule based on some variable in > ocamllex? I see you can pass arguments to a rule, but what use are > these except in the actions part? You are trying to do too much work in the wrong places IMHO. Your lexer should always be context insensitive and build small sensible pretokens NOT tokens: identifier integer whitespace newline : # . Then postprocess these pretokens into tokens, this is easiest with a list where you can use pattern matching and functional techniques to look ahead. Then parse the tokens, this is easy because you just choose the tokens to make it easy :) Use the grammar production arguments $1 $2 ... to do further processing in the action as required. The point is: the easy stuff is done by the two automata (lexer, parser) and the hard stuff is done in OCAML code. For example: Felix lexer generates these pretokens: WHITESPACE NEWLINE COMMENT which are NOT tokens of the grammar. These tokens are stripped out by the preprocessor. You may actually do this: let pack_names tokens =3D match tokens with | NAME s1 :: WHITE s2:: NAME s3 :: t ->=20 NAME (String.concat [s1;s2;s3]) :: pack_names t | WHITE :: t -> pack_names t | h :: t -> h :: pack_names t | [] -> [] Yeah, this isn't idea because it isn't tail recursive, but it should illustrate the idea: do the hard stuff in a language capable of handling the hard stuff easily .. :) Tail rec version: let pack output input =3D match input with | NAME .... :: t ->=20 pack (NAME (Str.....) :: output) t ... [] -> List.rev output This technique can often be used to condition a nasty language into an LALR(1) .. I managed to turn Python into an LALR(1) languages .. but it took 17 preprocessing passes to do it .. :)) Mainly, fiddling with INDENT, UNDENT since Python is based on indentation rules, but a lot of hassles with the expression x,y, which is extremely hard to parse (optional trailing comma ..) --=20 John Skaller --=-ukBUrNODpA5UqgN+JGvK Content-Type: application/pgp-signature; name=signature.asc Content-Description: This is a digitally signed message part -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (GNU/Linux) iD8DBQBC9x7psRp8/9aGVGsRAsfoAJ9PIeGZNaypxLXCsHd74PmQf7nykACeJcoV N2KZp4isF0SbFdf16xVzsgg= =WHVE -----END PGP SIGNATURE----- --=-ukBUrNODpA5UqgN+JGvK--