From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <francois.pottier@inria.fr>
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr
X-Spam-Level: 
X-Spam-Status: No, score=0.0 required=5.0 tests=AWL autolearn=disabled 
	version=3.1.3
Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38])
	by yquem.inria.fr (Postfix) with ESMTP id 4B9D9BC69
	for <caml-list@yquem.inria.fr>; Wed,  2 May 2007 07:38:26 +0200 (CEST)
Received: from yquem.inria.fr (yquem.inria.fr [128.93.8.37])
	by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l425cFVl000539;
	Wed, 2 May 2007 07:38:15 +0200
Received: by yquem.inria.fr (Postfix, from userid 18965)
	id 279CEBC69; Wed,  2 May 2007 07:38:15 +0200 (CEST)
Date: Wed, 2 May 2007 07:38:15 +0200
From: Francois Pottier <Francois.Pottier@inria.fr>
To: skaller <skaller@users.sourceforge.net>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] menhir
Message-ID: <20070502053815.GA726@yquem.inria.fr>
Reply-To: Francois.Pottier@inria.fr
References: <1177756336.11923.18.camel@rosella.wigram> <20070428165058.GA31584@yquem.inria.fr> <1177821783.25394.37.camel@rosella.wigram> <20070501155705.GA29617@yquem.inria.fr> <1178039464.8967.10.camel@rosella.wigram> <20070501173409.GB7308@yquem.inria.fr> <1178062950.8967.39.camel@rosella.wigram>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <1178062950.8967.39.camel@rosella.wigram>
User-Agent: Mutt/1.5.9i
X-Miltered: at discorde with ID 463823C7.001 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)!
X-Spam: no; 0.00; tokens:01 tokens:01 ocamlyacc:01 arises:01 lexer:01 token:01 token:01 cristal:01 caml-list:01 pottier:01 pottier:01 precisely:01 clarify:02 treating:02 treating:02 


Hello,

> That is, instead of treating the token set precisely as the
> set of user tokens + eof, menhir is treating eof specially
> and not consistently with other tokens.

As far as Menhir is concerned, there is no special eof token.
The set of tokens is exactly the set of user-defined tokens.
This is true also of ocamlyacc.

Internally, the construction of the automaton uses a pseudo-token,
written #, which stands for the end of the token stream. This token
can appear in conflict explanation messages.

> I'm not sure i fully understand the 'do we need lookahead'
> issue: I would have thought: you need a fetch if your action is 
> shift, and not if it is a reduce.

What you mean is, you need to *consume* one token if your action
is shift, and none if your action is reduce. However, in order to
make a decision (in order to choose between shift and reduce, and
between multiple reduce actions), you sometimes need to *consult*
one lookahead token, without consuming it.

This is necessary only *sometimes*, because, in some states, only
one reduce action is possible, and can be taken without consulting
a lookahead token.

An end-of-stream conflict arises when a state has multiple
actions, some of which are on real tokens, and some of which are
on the pseudo-token #. In that case, one would like to consult
the next token in order to make a decision; however, there is a
possibility that we are at the end of the sentence that we are
trying to recognize, so asking the lexer for one more token might
be a mistake (in your terminology, it might be an overshoot).

Does this clarify things?

-- 
François Pottier
Francois.Pottier@inria.fr
http://cristal.inria.fr/~fpottier/