Mailing list for all users of the OCaml language and system.
 help / color / mirror / Atom feed
From: Francois Pottier <Francois.Pottier@inria.fr>
To: skaller <skaller@users.sourceforge.net>
Cc: Caml Mailing List <caml-list@yquem.inria.fr>
Subject: Re: [Caml-list] [ANNOUNCE] Alpha release of Menhir, an LR(1) parser generator for ocaml
Date: Wed, 14 Dec 2005 10:04:27 +0100	[thread overview]
Message-ID: <20051214090427.GB1330@yquem.inria.fr> (raw)
In-Reply-To: <1134540495.8980.63.camel@rosella>


Hello,

On Wed, Dec 14, 2005 at 05:08:15PM +1100, skaller wrote:

> 0. The licence. Q public licence for the generator????
> Please NO NO NO!! Not unless it is distributed
> as part of the official distro. Is there any chance of that?

Our plan (not yet approved by the OCaml higher authorities)
is to make it part of the OCaml distribution.

> In particular, can the parser be generated *inside*
> an environment such a function or let binding?

No. The only way of passing arguments to the parser is through
functor arguments. (Or through global state, of course.) Why is
that a problem?

> Ocamlyacc usesthe typing
> 
> 	val parser: (lexbuf->token) -> lexbuf -> 'a
> 
> which is just bad. A better signature is
> 
> 	val parser: ( unit -> token ) -> 'a
> 
> There is no need to provide location information: the correct
> solution is to throw an exception, which is caught in a 
> context which can determine the location.

It can be nice to have the parser automatically extract position information
for you; doing it manually and explicitly within semantic actions would be
quite tedious.

On the other hand, the signature that you suggest would allow using lexers
that do not necessarily conform to the lexbuf interface; is that your point?
If so, we will consider adding a command line switch, as you suggest. It
would disallow the use of the position keywords ($startpos, $endpos, etc.)

> 3. I have doubts about the claim that parsers can 'share'
> token types. I do not see how this is possible. 

It certainly is possible: have a look at demos/calc-two in the
distribution. The example is artificial but illustrates the
possibility of sharing a token type.

> Actually, I personally find the 'yacc' technique of
> generating tokens to be rather lame. Felix does this
> much better -- the parser simply expects a token type
> which is a variant, the type can be defined wherever
> you like.

With Menhir, the token type can also be hand-written or generated via other
means. It doesn't have to be generated by Menhir. It has to be an algebraic
data type, though, not a polymorphic variant.

> %import_tokens "filename"

This is called --external-tokens Filename. It's a command line switch.

> 4. Just curious, but how practical is LR(1) in terms of
> generated code sizes? Felix is using Elkhound as its 
> parser which is a GLR parser with an LALR(1) core. In theory
> there is an option for choosing the core automaton, which
> also allows LR(1) however I recall Scott McPeak commenting
> it wasn't worth supporting because it generated tables
> which were far too big. 

There are two questions: how big is the automaton compared to
an LALR automaton? and how big is the code that implements the
automaton?

The answer to the first question is, in most cases, the grammar
is LALR(1) anyway, and the automaton generated by Menhir is
identical to the one that would be produced by ocamlyacc or
by some other LALR(1) parser generator. When the grammar is
not LALR(1), the LR(1) automaton has more states than an
LALR(1) automaton for the same grammar. The number of extra
states is usually small (often just one, or a handful). So
there's no concern here.

Considering the answer to the first question, the second
question would not arise at all if Menhir produced tables, like
ocamlyacc. However, Menhir does not produce tables; it
compiles the automaton down to a bunch of mutually recursive
functions. We have not yet scientifically assessed the
difference in size between tables and code, but a few
quick experiments indicate that it is reasonable (the
code is larger than the tables by a factor of two or
three).

-- 
François Pottier
Francois.Pottier@inria.fr
http://pauillac.inria.fr/~fpottier/


  reply	other threads:[~2005-12-14  9:04 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-12-12 17:58 Francois Pottier
2005-12-12 19:51 ` [Caml-list] " "Márk S. Zoltán"
2005-12-13 21:07 ` Nathaniel Gray
2005-12-14  6:08   ` skaller
2005-12-14  9:04     ` Francois Pottier [this message]
2005-12-14 10:27       ` Alessandro Baretta
2005-12-14 21:04         ` skaller
2005-12-15  8:46           ` Francois Pottier
2005-12-15 11:03             ` skaller
2005-12-14 20:51       ` skaller
2005-12-14 22:15         ` Joaquin Cuenca Abela
2005-12-15  8:40           ` Francois Pottier
2005-12-15  6:35 ` Stefan Monnier
2005-12-15  8:47   ` [Caml-list] " Francois Pottier
2005-12-15 16:41     ` Stefan Monnier
2005-12-15 16:50       ` Francois Pottier
2005-12-15 18:56         ` Stefan Monnier
2005-12-30 21:57         ` Florian Weimer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20051214090427.GB1330@yquem.inria.fr \
    --to=francois.pottier@inria.fr \
    --cc=caml-list@yquem.inria.fr \
    --cc=skaller@users.sourceforge.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox