Re: [Caml-list] [ANNOUNCE] Alpha release of Menhir, an LR(1) parser generator for ocaml

Mailing list for all users of the OCaml language and system.

From: skaller <skaller@users.sourceforge.net>
To: Nathaniel Gray <n8gray@gmail.com>
Cc: Francois.Pottier@inria.fr, Caml Mailing List <caml-list@yquem.inria.fr>
Subject: Re: [Caml-list] [ANNOUNCE] Alpha release of Menhir, an LR(1) parser generator for ocaml
Date: Wed, 14 Dec 2005 17:08:15 +1100
Message-ID: <1134540495.8980.63.camel@rosella>
In-Reply-To: <aee06c9e0512131307k3fc494a5k3591d549d552f1b@mail.gmail.com>

On Tue, 2005-12-13 at 13:07 -0800, Nathaniel Gray wrote:
> This is pretty nice!  Every time I use ocamlyacc I think "somebody
> should write something better."  Now it looks like somebody has!  I
> can't tell you how many times I've wanted parameterized rules and
> simple "library" rules for parsing delimiter-separated lists and
> such... 

Yes, it is pretty nice! However, it still appears to have some
problems. Any comments are appreciated.

0. The licence: the Q Public Licence for the generator????
Please NO NO NO!! Not unless it is distributed
as part of the official distro. Is there any chance of that?
If not, even the GPL would be better ;(

1. Generating a functor is cute, but it doesn't seem to
allow arguments to parser functions. Perhaps I missed something?
Is there a way to use the functorisation with closures to
add an argument?

In particular, can the parser be generated *inside*
an environment such as a function or a let binding?
[Felix allows that, which means an extra argument is
not required: a variable in the environment can be used
instead.]
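A minimal OCaml sketch of the closure trick being asked about, assuming (hypothetically) that the generated parser were a functor over an environment module. ENV, MakeParser, scale and parse_scaled are invented names for illustration, not Menhir's actual output; the point is only that applying a functor inside a function lets the anonymous argument structure capture a local variable, so the parser function itself takes no extra argument.

```ocaml
type token = INT of int | PLUS | EOF

(* Hypothetical environment module: whatever the semantic actions need. *)
module type ENV = sig
  val scale : int
end

(* Hypothetical stand-in for a generated parser functor. It sums a
   PLUS-separated list of integers, multiplying each by E.scale. *)
module MakeParser (E : ENV) = struct
  let parse (next : unit -> token) : int =
    let rec expr acc =
      match next () with
      | INT n ->
        (match next () with
         | PLUS -> expr (acc + n * E.scale)
         | EOF -> acc + n * E.scale
         | INT _ -> failwith "parse error")
      | PLUS | EOF -> failwith "parse error"
    in
    expr 0
end

(* Instantiate the functor inside a function: the local [scale] is
   captured by the anonymous structure, so P.parse needs no extra argument. *)
let parse_scaled (scale : int) (tokens : token list) : int =
  let module P = MakeParser (struct let scale = scale end) in
  let rest = ref tokens in
  let next () =
    match !rest with
    | [] -> EOF
    | t :: ts -> rest := ts; t
  in
  P.parse next
```

For example, parse_scaled 10 [INT 1; PLUS; INT 2; EOF] evaluates to 30. The cost is one functor application per call, which is cheap in OCaml since functor application is essentially building a record of closures.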

2. The signature of parsers is still wrong?
Ocamlyacc uses the typing

	val parser: (lexbuf -> token) -> lexbuf -> 'a

which is just bad. A better signature is

	val parser: (unit -> token) -> 'a

There is no need to pass a lexbuf just to provide location
information: the correct solution is to raise an exception,
which is caught in a context that can determine the location.

It would be nice to be able to generate this signature 
with a command line switch, pragma, or some other mechanism,
even if the default is chosen for ocamlyacc compatibility.
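A minimal sketch of the proposed interface, using a hand-written toy parser rather than generated code. The token type, Parse_error, parse_with_location and classic are hypothetical names. The parser only pulls tokens through a unit -> token function and raises on error; the caller, which owns the token source, remembers the position of the last token handed out and reports it. The last function shows that the ocamlyacc-style signature is recoverable by a one-line adapter, so a compatibility default would cost nothing.

```ocaml
type token = INT of int | PLUS | EOF

exception Parse_error

(* Parser with the proposed signature: no lexbuf, no positions. *)
let parse (next : unit -> token) : int =
  let rec expr acc =
    match next () with
    | INT n ->
      (match next () with
       | PLUS -> expr (acc + n)
       | EOF -> acc + n
       | INT _ -> raise Parse_error)
    | PLUS | EOF -> raise Parse_error
  in
  expr 0

(* The caller tracks positions (here: token paired with an offset) and
   turns the exception into a located error. *)
let parse_with_location (tokens : (token * int) list) : (int, int) result =
  let rest = ref tokens and last_pos = ref 0 in
  let next () =
    match !rest with
    | [] -> EOF
    | (t, pos) :: ts -> rest := ts; last_pos := pos; t
  in
  match parse next with
  | v -> Ok v
  | exception Parse_error -> Error !last_pos

(* Adapter back to the ocamlyacc-style signature, for compatibility. *)
let classic (lexer : Lexing.lexbuf -> token) (lexbuf : Lexing.lexbuf) : int =
  parse (fun () -> lexer lexbuf)
```

With tokens [(INT 1, 0); (PLUS, 2); (PLUS, 4)] the result is Error 4, the offset of the offending token, while the parser itself never saw a position.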

3. I have doubts about the claim that parsers can 'share'
token types. I do not see how this is possible. It is
contradicted by the compilation model description, which
explains how it is necessary to join separate files making
up a grammar specification. In this case, the joined system
is going to generate a single token type, and any other joining
is certain to generate a distinct type, because

(a) the type is defined in a distinct ocaml module (mli file)
(b) the typing of normal variants is nominal

This problem would go away if polymorphic variants
were used instead, because the type names are then simply
abbreviations: pm-variants are structurally, not
nominally, typed.

Perhaps a command line switch, pragma, or whatever, to use
polymorphic variants instead of ordinary ones?
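A small OCaml illustration of the nominal vs structural point. The module names are hypothetical stand-ins for two separately generated token modules. Two ordinary variant declarations with identical constructors are still distinct types, whereas two polymorphic variant abbreviations with the same tags denote the same type, so a value can flow between them with no conversion.

```ocaml
(* Ordinary variants: nominal, so these two types are distinct even
   though their constructors are identical. *)
module Nominal1 = struct type token = INT of int | PLUS | EOF end
module Nominal2 = struct type token = INT of int | PLUS | EOF end

(* Rejected by the type checker:
   let bad (t : Nominal1.token) : Nominal2.token = t *)

(* Polymorphic variants: the type names are mere abbreviations for the
   same structural type. *)
module Poly1 = struct type token = [ `INT of int | `PLUS | `EOF ] end
module Poly2 = struct type token = [ `INT of int | `PLUS | `EOF ] end

(* Accepted: Poly1.token and Poly2.token are the same type. *)
let share (t : Poly1.token) : Poly2.token = t

(* A consumer written against Poly2 works on values typed as Poly1. *)
let describe (t : Poly2.token) : string =
  match t with
  | `INT n -> string_of_int n
  | `PLUS -> "+"
  | `EOF -> "<eof>"
```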

Actually, I personally find the 'yacc' technique of
generating tokens to be rather lame. Felix does this
much better: the parser simply expects a token type
which is a variant, and that type can be defined wherever
you like. In particular, the lexer and parser can
share that definition.
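A minimal sketch of that arrangement in plain OCaml, with no parser generator involved: the token type lives in its own hand-written module, and both a toy lexer and a toy parser merely refer to it. Token, Lexer.tokenize and Parser.sum are hypothetical names chosen for illustration.

```ocaml
(* The token type is defined once, by hand, wherever we like. *)
module Token = struct
  type t = INT of int | PLUS | EOF
end

(* A toy lexer producing Token.t values. *)
module Lexer = struct
  let tokenize (s : string) : Token.t list =
    let n = String.length s in
    let rec go i acc =
      if i >= n then List.rev (Token.EOF :: acc)
      else
        match s.[i] with
        | ' ' -> go (i + 1) acc
        | '+' -> go (i + 1) (Token.PLUS :: acc)
        | '0' .. '9' ->
          (* Scan a run of digits starting at i. *)
          let j = ref i in
          while !j < n && s.[!j] >= '0' && s.[!j] <= '9' do incr j done;
          go !j (Token.INT (int_of_string (String.sub s i (!j - i))) :: acc)
        | c -> failwith (Printf.sprintf "unexpected character %C" c)
    in
    go 0 []
end

(* A toy parser that only expects Token.t; it does not define it. *)
module Parser = struct
  let sum (toks : Token.t list) : int =
    let rec go acc = function
      | Token.INT n :: Token.PLUS :: rest -> go (acc + n) rest
      | [ Token.INT n; Token.EOF ] -> acc + n
      | _ -> failwith "parse error"
    in
    go 0 toks
end
```

Here Parser.sum (Lexer.tokenize "1 + 22 + 3") evaluates to 26. Neither side owns the token type, which is the property being asked for.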

As far as I can see, Menhir COULD do this, except that
of course %token would be kept as a special way
of generating the variant. All that would be required,
I think, is the syntax

%import_tokens "filename"

which refers to the token definition file, as an
alternative to inlining these token definitions.
(If pm-variants are used, you could probably support both,
though I'm not sure.)

Processing a token definition file would then generate two files:
an ordinary mli file with the token variant type,
and a special information file for the parser generator
(with the same information, but in a more useful form).

In Felix none of this is necessary because parsing is
built in, so the compiler can find the information required
for the parser generator directly from the token variant type.

4. Just curious, but how practical is LR(1) in terms of
generated code size? Felix uses Elkhound as its parser
generator, which produces GLR parsers with an LALR(1) core.
In theory there is an option for choosing the core automaton,
which also allows LR(1); however, I recall Scott McPeak
commenting that it wasn't worth supporting because it
generated tables that were far too big.

I'm curious how one would be able to predict the size of the
generated code, since I don't really understand the
additional constraints LALR(1) introduces...
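The constraint LALR(1) adds can be seen on the textbook grammar S -> a A d | b B d | a B e | b A e, A -> c, B -> c. LALR(1) merges every group of LR(1) states that share the same LR(0) core, unioning their lookaheads; so it always has exactly as many states as the LR(0) automaton, while LR(1) may have more, which is where the size difference comes from. The merge can also create reduce/reduce conflicts that LR(1) does not have. The sketch below hand-codes only the two LR(1) states reached after "a c" and "b c" (it does not build the automaton from the grammar) and performs the merge; the item representation is a simplification assumed for illustration.

```ocaml
(* An LR(1) item: an LR(0) core (production with a dot) plus one lookahead. *)
type item = { core : string; lookahead : char }

type state = item list

(* The LR(0) core of a state: its distinct cores, sorted. *)
let core_of (s : state) : string list =
  List.sort_uniq compare (List.map (fun i -> i.core) s)

(* LALR(1): merge all LR(1) states with the same core, unioning lookaheads. *)
let merge_by_core (states : state list) : state list =
  let tbl = Hashtbl.create 16 in
  List.iter
    (fun s ->
      let k = core_of s in
      let prev = try Hashtbl.find tbl k with Not_found -> [] in
      Hashtbl.replace tbl k (List.sort_uniq compare (prev @ s)))
    states;
  Hashtbl.fold (fun _ s acc -> s :: acc) tbl []

(* Every core in this example is a completed item (a reduction), so a
   reduce/reduce conflict is one lookahead shared by two different cores. *)
let has_rr_conflict (s : state) : bool =
  List.exists
    (fun i -> List.exists (fun j -> i.core <> j.core && i.lookahead = j.lookahead) s)
    s

(* State after "a c": reduce A on d, reduce B on e. *)
let s1 = [ { core = "A -> c ."; lookahead = 'd' }; { core = "B -> c ."; lookahead = 'e' } ]

(* State after "b c": reduce A on e, reduce B on d. *)
let s2 = [ { core = "A -> c ."; lookahead = 'e' }; { core = "B -> c ."; lookahead = 'd' } ]

let () =
  let lr1 = [ s1; s2 ] in
  let lalr = merge_by_core lr1 in
  Printf.printf "LR(1) states: %d, conflict: %b\n"
    (List.length lr1) (List.exists has_rr_conflict lr1);
  Printf.printf "LALR(1) states: %d, conflict: %b\n"
    (List.length lalr) (List.exists has_rr_conflict lalr)
```

The two LR(1) states are conflict-free, while the single merged LALR(1) state reduces both A and B on d and on e. So the "extra constraint" is precisely that lookaheads from different contexts get pooled; in practice LR(1) generators can mitigate the table blow-up by merging only those states where pooling causes no new conflict.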

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
