From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <skaller@users.sourceforge.net>
Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78])
	by yquem.inria.fr (Postfix) with ESMTP id 50CF3BC3D
	for <caml-list@yquem.inria.fr>; Mon,  8 Aug 2005 10:59:36 +0200 (CEST)
Received: from smtp3.adl2.internode.on.net (smtp3.adl2.internode.on.net [203.16.214.203])
	by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id j788xY9t027690
	for <caml-list@yquem.inria.fr>; Mon, 8 Aug 2005 10:59:35 +0200
Received: from rosella (ppp19-123.lns2.syd7.internode.on.net [59.167.19.123])
	by smtp3.adl2.internode.on.net (8.12.9/8.12.9) with ESMTP id j788xN2e073716;
	Mon, 8 Aug 2005 18:29:29 +0930 (CST)
Subject: Re: [Caml-list] ocamllex+ocamlyacc and not parsing properly
From: skaller <skaller@users.sourceforge.net>
To: Jonathan Roewen <jonathan.roewen@gmail.com>
Cc: caml-list@yquem.inria.fr
In-Reply-To: <ad8cfe7e05080721237a609e@mail.gmail.com>
References: <ad8cfe7e050807143962166f9@mail.gmail.com>
	 <200508080058.12357.jon@ffconsultancy.com>
	 <ad8cfe7e0508071917d4d57f9@mail.gmail.com>
	 <ad8cfe7e05080721237a609e@mail.gmail.com>
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="=-ukBUrNODpA5UqgN+JGvK"
Date: Mon, 08 Aug 2005 18:59:23 +1000
Message-Id: <1123491563.9947.42.camel@localhost.localdomain>
Mime-Version: 1.0
X-Mailer: Evolution 2.2.1.1 
X-Miltered: at nez-perce with ID 42F71EF6.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)!
X-Spam: no; 0.00; caml-list:01 ocamllex:01 ocamlyacc:01 parsing:01 ocamllex:01 lexer:01 tokens:01 tokens:01 grammar:01 lexer:01 parser:01 ocaml:01 grammar:01 recursive:01 rec:01 
X-Attachments: type="application/pgp-signature" name="signature.asc" 
X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr
X-Spam-Level: 
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled 
	version=3.0.3


--=-ukBUrNODpA5UqgN+JGvK
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable

On Mon, 2005-08-08 at 16:23 +1200, Jonathan Roewen wrote:
> Is there any way to call another rule based on some variable in
> ocamllex? I see you can pass arguments to a rule, but what use are
> these except in the actions part?

You are trying to do too much work in the wrong
places IMHO.

Your lexer should always be context insensitive
and build small sensible pretokens NOT tokens:

identifier
integer
whitespace
newline
: # .

Then postprocess these pretokens into tokens,
this is easiest with a list where you can use
pattern matching and functional techniques to
look ahead.

Then parse the tokens, this is easy because you just
choose the tokens to make it easy :)

Use the grammar production arguments $1 $2 ...
to do further processing in the action as required.

The point is: the easy stuff is done by the two
automata (lexer, parser) and the hard stuff
is done in OCAML code.

For example: Felix lexer generates these pretokens:

WHITESPACE NEWLINE COMMENT

which are NOT tokens of the grammar. These tokens
are stripped out by the preprocessor. You may actually
do this:

let pack_names tokens =3D
match tokens with
| NAME s1 :: WHITE s2:: NAME s3 :: t ->=20
  NAME (String.concat [s1;s2;s3]) :: pack_names t
| WHITE :: t -> pack_names t
| h :: t -> h :: pack_names t
| [] -> []

Yeah, this isn't idea because it isn't tail recursive,
but it should illustrate the idea: do the hard stuff
in a language capable of handling the hard stuff easily .. :)

Tail rec version:

let pack output input =3D match input with
| NAME .... :: t ->=20
  pack (NAME (Str.....) :: output) t
 ...
[] -> List.rev output

This technique can often be used to condition a
nasty language into an LALR(1) .. I managed to turn
Python into an LALR(1) languages .. but it took
17 preprocessing passes to do it .. :))
Mainly, fiddling with INDENT, UNDENT since Python
is based on indentation rules, but a lot of hassles
with the expression x,y, which is extremely hard
to parse (optional trailing comma ..)


--=20
John Skaller <skaller at users dot sourceforge dot net>


--=-ukBUrNODpA5UqgN+JGvK
Content-Type: application/pgp-signature; name=signature.asc
Content-Description: This is a digitally signed message part

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)

iD8DBQBC9x7psRp8/9aGVGsRAsfoAJ9PIeGZNaypxLXCsHd74PmQf7nykACeJcoV
N2KZp4isF0SbFdf16xVzsgg=
=WHVE
-----END PGP SIGNATURE-----

--=-ukBUrNODpA5UqgN+JGvK--