From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.1.3 Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38]) by yquem.inria.fr (Postfix) with ESMTP id 43355BC69; Wed, 2 May 2007 18:29:23 +0200 (CEST) Received: from ipmail01.adl2.internode.on.net (ipmail01.adl2.internode.on.net [203.16.214.140]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l42GTKAW005821; Wed, 2 May 2007 18:29:21 +0200 X-IronPort-AV: E=Sophos;i="4.14,480,1170595800"; d="scan'208";a="122473396" Received: from ppp8-148.lns1.syd7.internode.on.net (HELO [192.168.1.201]) ([59.167.8.148]) by ipmail01.adl2.internode.on.net with ESMTP; 03 May 2007 01:59:19 +0930 Subject: Re: [Caml-list] menhir From: skaller To: Francois.Pottier@inria.fr Cc: caml-list@inria.fr In-Reply-To: <20070502123058.GC21560@yquem.inria.fr> References: <1177756336.11923.18.camel@rosella.wigram> <20070428165058.GA31584@yquem.inria.fr> <1177821783.25394.37.camel@rosella.wigram> <20070501155705.GA29617@yquem.inria.fr> <1178039464.8967.10.camel@rosella.wigram> <20070501173409.GB7308@yquem.inria.fr> <1178062950.8967.39.camel@rosella.wigram> <20070502053815.GA726@yquem.inria.fr> <1178095304.13660.71.camel@rosella.wigram> <20070502123058.GC21560@yquem.inria.fr> Content-Type: text/plain Date: Thu, 03 May 2007 02:29:17 +1000 Message-Id: <1178123357.6486.49.camel@rosella.wigram> Mime-Version: 1.0 X-Mailer: Evolution 2.10.1 Content-Transfer-Encoding: 7bit X-Miltered: at discorde with ID 4638BC60.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; 0200,:01 ocamlyacc:01 ocamlyacc:01 ocamllex:01 compilation:01 endmarker:01 non-terminal:01 endmarker:01 compilation:01 parser:01 parser:01 lexbuf:01 mutable:01 lexbuf:01 tokens:01 On Wed, 2007-05-02 at 14:30 +0200, Francois Pottier wrote: > Hello, > > On Wed, May 02, 2007 at 06:41:44PM +1000, skaller wrote: > > Exactly. In Ocamlyacc it is named 'eof', and you can use that > > token in your productions. > > As far as I know, this is incorrect. ocamlyacc does not have a predefined > eof token. Perhaps you are thinking of ocamllex, which has an eof pattern. I believe you're right, I apologise for confusion. > > compilation_unit: > > | statement_aster ENDMARKER { $1 } > > > > where no non-terminal on which statements_aster depends > > has a production containing ENDMARKER or # (eof). > > > > Therefore, there is no conflict. When compilation_unit > > is reduced the parser returns, the next token, whether > > it is # or any other, is irrelevant. > > Good. I seem to agree with you. Menhir should not report an end-of-stream > conflict here. So, what does it report? Built an LR(0) automaton with 1416 states. Built an LR(1) automaton with 2009 states. Warning: 145 states have an end-of-stream conflict. Can I send you the file? [signature of parser] > I believe this is a separate issue. Yes, I agree. > You are right in saying that the historic > signature, which involves lexbuf, is dubious. Following your suggestion, we > could just as well use > > parser: (state -> token * state) -> state -> ast * state > > if we wish to promote a purely functional style (where values of type > state are immutable), or just > > parser: (unit -> token) -> ast > > if we are willing to accept mutable state. (I am sweeping the issue of > locations under the rug; we should use token * location instead of just > token.) Or forget it, which is the approach taken by Felix: every token contains its location: the user can organise this. This has the advantage of not specifying a particular location format. > That said, the historic signature > > parser: (lexbuf -> token) -> lexbuf -> ast > > is really equivalent to the previous one, in the sense that I can write > functions that convert between the two styles (see attached file). Yes, but you cannot write functions that take a state argument because lexbuf is a fixed data type and there's no where to add in any user state data. > > The point again is that the token input to the parser is infinite: it can't > > ever be an error to read a next token. > > I beg to disagree. First, the input stream does not have to be infinite: if > I am reading from a file, clearly it is finite. EOF is returned an infinite number of times in C. > Second, regardless of whether > the stream is finite or infinite, it *is* an error to read more tokens than > you were supposed to. If the grammar's start symbol is S, then the parser > should read a sequence of tokens that derives from S, and nothing more; it > should not overshoot and consume the first token that follows. This requires the definition: parse the *shortest* head of the input stream. > The only way of avoiding these conflicts is to change your grammar somehow. > But I still haven't understood what causes these conflicts in your grammar. > Perhaps it would be time to show it? > ocamlyacc never complains. It just trusts you to know what you are doing. I generate an .output file, grep for the word 'conflict', and terminate my build if there is one found. I do not permit any conflicts in my grammar: it's strictly unambiguous LALR(1). It's also pure in the sense that it doesn't use crud like %left, %prec etc to resolve conflicts. [The way dypgen does this is vastly superior!] -- John Skaller Felix, successor to C++: http://felix.sf.net