From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by yquem.inria.fr (Postfix) with ESMTP id BAEBEBB81; Wed, 14 Dec 2005 21:51:14 +0100 (CET) Received: from smtp3.adl2.internode.on.net (smtp3.adl2.internode.on.net [203.16.214.203]) by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id jBEKpCDP005836; Wed, 14 Dec 2005 21:51:13 +0100 Received: from rosella (ppp33-4.lns1.syd2.internode.on.net [59.167.33.4] (may be forged)) by smtp3.adl2.internode.on.net (8.12.9/8.12.6) with ESMTP id jBEKp3Zu016722; Thu, 15 Dec 2005 07:21:03 +1030 (CST) (envelope-from skaller@users.sourceforge.net) Subject: Re: [Caml-list] [ANNOUNCE] Alpha release of Menhir, an LR(1) parser generator for ocaml From: skaller To: Francois.Pottier@inria.fr Cc: Caml Mailing List In-Reply-To: <20051214090427.GB1330@yquem.inria.fr> References: <20051212175838.GA8502@yquem.inria.fr> <1134540495.8980.63.camel@rosella> <20051214090427.GB1330@yquem.inria.fr> Content-Type: text/plain Date: Thu, 15 Dec 2005 07:51:02 +1100 Message-Id: <1134593462.8942.67.camel@rosella> Mime-Version: 1.0 X-Mailer: Evolution 2.4.1 Content-Transfer-Encoding: 7bit X-Miltered: at nez-perce with ID 43A085C0.001 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; caml-list:01 parser:01 ocaml:01 distro:01 ocaml:01 parser:01 functor:01 recursive:01 recursive:01 parsers:01 ocamlyacc:01 token:01 lexer:01 ocamlyacc:01 val:01 X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.3 On Wed, 2005-12-14 at 10:04 +0100, Francois Pottier wrote: > Hello, > > On Wed, Dec 14, 2005 at 05:08:15PM +1100, skaller wrote: > > > 0. The licence. Q public licence for the generator???? > > Please NO NO NO!! Not unless it is distributed > > as part of the official distro. Is there any chance of that? > > Our plan (not yet approved by the OCaml higher authorities) > is to make it part of the OCaml distribution. Good!!! > > In particular, can the parser be generated *inside* > > an environment such a function or let binding? > > No. The only way of passing arguments to the parser is through > functor arguments. (Or through global state, of course.) Which is the same thing... > Why is that a problem? Because it destroys re-entrancy. My parser is recursive, it actually calls a recursive descent parser to handle user defined extensions, that parser in turn can call other parsers, including the original ocamlyacc parser. Passing an argument to the parser is the natural way to do this. What I'm actually doing now is storing the data in a token, since this is the only way to get the required data to the parser. however it is a trick. It 'happens' to work at the moment, but may break at any time (it works because the information is constructed by the lexer which runs in an earlier phase -- that is good enough for the current extensions I allow, but this may change tomorrow -- I'd love to scope these extensions properly, but that really does require passing an argument) It's just the usual reason for functional programming and persistent data structures I guess. > > Ocamlyacc usesthe typing > > > > val parser: (lexbuf->token) -> lexbuf -> 'a > > > > which is just bad. A better signature is > > > > val parser: ( unit -> token ) -> 'a > > > > There is no need to provide location information: the correct > > solution is to throw an exception, which is caught in a > > context which can determine the location. > > It can be nice to have the parser automatically extract position information > for you; doing it manually and explicitly within semantic actions would be > quite tedious. Yes, it can be nice for a toy parser. For a complex production quality parser there is no way to avoid working hard to manage source location data. > On the other hand, the signature that you suggest would allow using lexers > that do not necessarily conform to the lexbuf interface; is that your point? Indeed. (My parsers run off lists of tokens .. they're lexed using ocamllex .. but in an earlier phase. And the lists are processed in between, for example, whitespace tokens are elided because the parser can't handle them) > If so, we will consider adding a command line switch, as you suggest. It > would disallow the use of the position keywords ($startpos, $endpos, etc.) Yes. That makes sense. There are other alternatives, such as requiring the tokeniser to return a pair, consisting of a token and the location. In Ocamlyacc this would be a bad idea, since it would require a fixed notion of location. In Menhir, however, this is not so, since it can generate a functor which can be parameterised by the location type. > > 3. I have doubts about the claim that parsers can 'share' > > token types. I do not see how this is possible. > > It certainly is possible: have a look at demos/calc-two in the > distribution. The example is artificial but illustrates the > possibility of sharing a token type. Hmm .. ok, I will look. > > Actually, I personally find the 'yacc' technique of > > generating tokens to be rather lame. Felix does this > > much better -- the parser simply expects a token type > > which is a variant, the type can be defined wherever > > you like. > > With Menhir, the token type can also be hand-written or generated via other > means. It doesn't have to be generated by Menhir. It has to be an algebraic > data type, though, not a polymorphic variant. Ah, I see, the same as Felix in that respect then. This solves a lot of problems, IMHO. > Considering the answer to the first question, the second > question would not arise at all if Menhir produced tables, like > ocamlyacc. However, Menhir does not produce tables; it > compiles the automaton down to a bunch of mutually recursive > functions. We have not yet scientifically assessed the > difference in size between tables and code, but a few > quick experiments indicate that it is reasonable (the > code is larger than the tables by a factor of two or > three). I'm also curious as to the performance of code vs table driven interpreter. One would think the code is faster .. but size matters a lot on current CPUs due to caching. Still, the O() performance will be much the same -- 2-3 times larger code is reasonable considering that it will probably make it easier to enhance the generator (as has already been done by be able to generate functors .. :) -- John Skaller Felix, successor to C++: http://felix.sf.net