From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=AWL autolearn=disabled version=3.1.3 Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38]) by yquem.inria.fr (Postfix) with ESMTP id 4B9D9BC69 for ; Wed, 2 May 2007 07:38:26 +0200 (CEST) Received: from yquem.inria.fr (yquem.inria.fr [128.93.8.37]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l425cFVl000539; Wed, 2 May 2007 07:38:15 +0200 Received: by yquem.inria.fr (Postfix, from userid 18965) id 279CEBC69; Wed, 2 May 2007 07:38:15 +0200 (CEST) Date: Wed, 2 May 2007 07:38:15 +0200 From: Francois Pottier To: skaller Cc: caml-list@inria.fr Subject: Re: [Caml-list] menhir Message-ID: <20070502053815.GA726@yquem.inria.fr> Reply-To: Francois.Pottier@inria.fr References: <1177756336.11923.18.camel@rosella.wigram> <20070428165058.GA31584@yquem.inria.fr> <1177821783.25394.37.camel@rosella.wigram> <20070501155705.GA29617@yquem.inria.fr> <1178039464.8967.10.camel@rosella.wigram> <20070501173409.GB7308@yquem.inria.fr> <1178062950.8967.39.camel@rosella.wigram> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <1178062950.8967.39.camel@rosella.wigram> User-Agent: Mutt/1.5.9i X-Miltered: at discorde with ID 463823C7.001 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; tokens:01 tokens:01 ocamlyacc:01 arises:01 lexer:01 token:01 token:01 cristal:01 caml-list:01 pottier:01 pottier:01 precisely:01 clarify:02 treating:02 treating:02 Hello, > That is, instead of treating the token set precisely as the > set of user tokens + eof, menhir is treating eof specially > and not consistently with other tokens. As far as Menhir is concerned, there is no special eof token. The set of tokens is exactly the set of user-defined tokens. This is true also of ocamlyacc. Internally, the construction of the automaton uses a pseudo-token, written #, which stands for the end of the token stream. This token can appear in conflict explanation messages. > I'm not sure i fully understand the 'do we need lookahead' > issue: I would have thought: you need a fetch if your action is > shift, and not if it is a reduce. What you mean is, you need to *consume* one token if your action is shift, and none if your action is reduce. However, in order to make a decision (in order to choose between shift and reduce, and between multiple reduce actions), you sometimes need to *consult* one lookahead token, without consuming it. This is necessary only *sometimes*, because, in some states, only one reduce action is possible, and can be taken without consulting a lookahead token. An end-of-stream conflict arises when a state has multiple actions, some of which are on real tokens, and some of which are on the pseudo-token #. In that case, one would like to consult the next token in order to make a decision; however, there is a possibility that we are at the end of the sentence that we are trying to recognize, so asking the lexer for one more token might be a mistake (in your terminology, it might be an overshoot). Does this clarify things? -- François Pottier Francois.Pottier@inria.fr http://cristal.inria.fr/~fpottier/