From: skaller <skaller@users.sourceforge.net>
To: Francois.Pottier@inria.fr
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] menhir
Date: Thu, 03 May 2007 02:29:17 +1000
Message-ID: <1178123357.6486.49.camel@rosella.wigram>
In-Reply-To: <20070502123058.GC21560@yquem.inria.fr>
On Wed, 2007-05-02 at 14:30 +0200, Francois Pottier wrote:
> Hello,
>
> On Wed, May 02, 2007 at 06:41:44PM +1000, skaller wrote:
> > Exactly. In Ocamlyacc it is named 'eof', and you can use that
> > token in your productions.
>
> As far as I know, this is incorrect. ocamlyacc does not have a predefined
> eof token. Perhaps you are thinking of ocamllex, which has an eof pattern.
I believe you're right; I apologise for the confusion.
> > compilation_unit:
> > | statement_aster ENDMARKER { $1 }
> >
> > where no non-terminal on which statement_aster depends
> > has a production containing ENDMARKER or # (eof).
> >
> > Therefore, there is no conflict. When compilation_unit
> > is reduced the parser returns, the next token, whether
> > it is # or any other, is irrelevant.
>
> Good. I seem to agree with you. Menhir should not report an end-of-stream
> conflict here. So, what does it report?
Built an LR(0) automaton with 1416 states.
Built an LR(1) automaton with 2009 states.
Warning: 145 states have an end-of-stream conflict.
Can I send you the file?
[signature of parser]
> I believe this is a separate issue.
Yes, I agree.
> You are right in saying that the historic
> signature, which involves lexbuf, is dubious. Following your suggestion, we
> could just as well use
>
> parser: (state -> token * state) -> state -> ast * state
>
> if we wish to promote a purely functional style (where values of type
> state are immutable), or just
>
> parser: (unit -> token) -> ast
>
> if we are willing to accept mutable state. (I am sweeping the issue of
> locations under the rug; we should use token * location instead of just
> token.)
Or forget it, which is the approach taken by Felix: every token
contains its location: the user can organise this. This has the
advantage of not specifying a particular location format.
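
To illustrate tokens that carry their own locations, here is a minimal OCaml sketch. The `loc` record, the token constructors, and the helper names are hypothetical and are not taken from Felix or Menhir; the only point is that the parser sees a plain `unit -> token` source while the location format stays the user's business.

```ocaml
(* Hypothetical location and token types; neither is taken from Felix. *)
type loc = { file : string; line : int; col : int }

type token =
  | IDENT of loc * string
  | INT of loc * int
  | ENDMARKER of loc

(* The user, not the parser generator, decides how to recover a location. *)
let loc_of_token = function
  | IDENT (l, _) | INT (l, _) | ENDMARKER l -> l

(* A token source in the "(unit -> token)" style, backed by a list.
   Once the list is exhausted it keeps returning ENDMARKER, so asking
   for one more token never fails. *)
let source_of_list (toks : token list) (eof_loc : loc) : unit -> token =
  let rest = ref toks in
  fun () ->
    match !rest with
    | [] -> ENDMARKER eof_loc
    | t :: ts -> rest := ts; t
```

A generated parser of type `(unit -> token) -> ast` could consume such a source directly, and error reporting could call `loc_of_token` on whichever token it rejects.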
> That said, the historic signature
>
> parser: (lexbuf -> token) -> lexbuf -> ast
>
> is really equivalent to the previous one, in the sense that I can write
> functions that convert between the two styles (see attached file).
Yes, but you cannot write functions that take a state argument,
because lexbuf is a fixed data type and there is nowhere to
add any user state data.
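
The conversions under discussion might look roughly like the sketch below. This is not the attached file, which is not reproduced here; the function names are illustrative, and `'token` and `'state` are left abstract. It shows where user state can live in each style: in the closure styles it is captured by the closure, whereas `Lexing.lexbuf` itself has no slot for it.

```ocaml
(* All function names here are illustrative assumptions. *)

(* Historic style to closure style: fix the lexbuf once. *)
let closure_of_lexbuf (lexer : Lexing.lexbuf -> 'token)
    (lexbuf : Lexing.lexbuf) : unit -> 'token =
  fun () -> lexer lexbuf

(* Closure style to historic style: the lexbuf is a dummy that is
   never actually read. *)
let lexbuf_of_closure (next : unit -> 'token)
    : (Lexing.lexbuf -> 'token) * Lexing.lexbuf =
  ((fun _ -> next ()), Lexing.from_string "")

(* Purely functional style to closure style: the user state is kept
   in a reference hidden inside the closure. *)
let closure_of_functional (step : 'state -> 'token * 'state)
    (init : 'state) : unit -> 'token =
  let st = ref init in
  fun () ->
    let tok, st' = step !st in
    st := st';
    tok
```

With these, a parser expecting `(lexbuf -> token) -> lexbuf -> ast` can still be driven by a stateful token source, at the price of passing around a lexbuf that carries no information.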
> > The point again is that the token input to the parser is infinite: it can't
> > ever be an error to read a next token.
>
> I beg to disagree. First, the input stream does not have to be infinite: if
> I am reading from a file, clearly it is finite.
EOF is returned an infinite number of times in C.
> Second, regardless of whether
> the stream is finite or infinite, it *is* an error to read more tokens than
> you were supposed to. If the grammar's start symbol is S, then the parser
> should read a sequence of tokens that derives from S, and nothing more; it
> should not overshoot and consume the first token that follows.
This requires the definition: parse the *shortest* head of the
input stream.
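
As a toy illustration of "parse the shortest head", here is a hand-written OCaml sketch for a rule shaped like `compilation_unit: statement* ENDMARKER`. The token type and the counting wrapper are hypothetical and are not generated by Menhir or ocamlyacc; the sketch only shows a parser that returns on ENDMARKER without requesting a further lookahead token.

```ocaml
type token = STMT of string | ENDMARKER

(* Wraps a token list as a source and counts how many tokens are pulled. *)
let counting (toks : token list) : (unit -> token) * int ref =
  let rest = ref toks and pulled = ref 0 in
  let next () =
    incr pulled;
    match !rest with
    | [] -> ENDMARKER
    | t :: ts -> rest := ts; t
  in
  (next, pulled)

(* compilation_unit: statement* ENDMARKER
   Returns as soon as ENDMARKER is seen; whatever follows is never read. *)
let compilation_unit (next : unit -> token) : string list =
  let rec loop acc =
    match next () with
    | STMT s -> loop (s :: acc)
    | ENDMARKER -> List.rev acc
  in
  loop []
```

Feeding it a stream with extra tokens after ENDMARKER leaves those tokens unread, so the pull counter stops at the position of ENDMARKER.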
> The only way of avoiding these conflicts is to change your grammar somehow.
> But I still haven't understood what causes these conflicts in your grammar.
> Perhaps it would be time to show it?
> ocamlyacc never complains. It just trusts you to know what you are doing.
I generate an .output file, grep it for the word 'conflict',
and terminate my build if one is found. I do not permit
any conflicts in my grammar: it's strictly unambiguous LALR(1).
It's also pure in the sense that it doesn't use crud
like %left, %prec etc to resolve conflicts.
[The way dypgen does this is vastly superior!]
--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net