From: Francois Pottier <Francois.Pottier@inria.fr>
To: David Allsopp <dra-news@metastack.com>
Cc: OCaml List <caml-list@yquem.inria.fr>
Subject: Re: [Caml-list] Yet another yacc question
Date: Thu, 24 May 2007 15:24:36 +0200
Message-ID: <20070524132436.GA22868@yquem.inria.fr>
In-Reply-To: <00cf01c79d4c$6e33c3b0$6a7ba8c0@treble>
Hello,
On Wed, May 23, 2007 at 04:09:58PM +0100, David Allsopp wrote:
> The third case fails because "AA" is not in the grammar. However, the first
> two work even though "ABCZ" and "ACZ" are also not in the grammar (and Z
> isn't even a token!). They work because ocamlyacc doesn't need look-ahead
> after the "C" in each case to determine that it can reduce to the entry
> non-terminal and so return (). In the third case, look-ahead is required -
> it looks ahead, sees an A and so fails.
Your analysis is correct.
> I would quite like the third to match as well and ignore the second A
> (ignore and leave on the buffer ready for a future parse... so "peek-ahead"
> rather than "look-ahead", I guess).
It's hard to ignore the second A once the parser has requested it from the
lexer... The lexer interface does not have a "peek" or "put-back" operation,
and LR parsers aren't designed to support these operations anyway.
> So my question: can menhir, dypgen or any of the other parser generators out
> there do this - i.e. return one () on the first call and then another () on
> the second with the string "AA"? It would finally be a reason for abandoning
> ocamlyacc :o)
In theory, menhir adopts the same semantics as ocamlyacc, but it will attempt
to help you by providing more warnings -- here, it will complain that your
grammar has an end-of-stream conflict.
> (Incidentally, lest anyone have it confirmed that I'm mad, I'm trying to
> parse batches of SQL statements so have no obvious terminating token for a
> clause - the parser needs to do a longest possible match ignoring anything
> else following that would appear to be a syntax error)
If you want to parse sequences of statements without clear separators, it
seems to me that the best approach would be to invoke the parser just once for
an entire sequence, instead of trying to invoke it once per statement and have
it leave the token stream in a consistent state. This approach would require
an inversion of control (the parser would drive the rest of your code, instead
of the code invoking the parser). If you can accept that, you should be fine.
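To make the inversion of control concrete, here is a minimal hand-written sketch. It is not ocamlyacc or menhir output. The token type, the toy statement grammar (an A followed by an optional B and an optional C) and the names `parse_all`, `on_stmt` and `tokens_of_list` are all hypothetical. The parser is called once for the whole input and hands each completed statement to a callback. The look-ahead token that ends one statement is simply the first token of the next, so nothing ever has to be "put back":

```ocaml
(* Hypothetical token type; a real lexer would produce these. *)
type token = A | B | C | EOF

(* Hypothetical statement shape: A, then an optional B, then an optional C. *)
type stmt = { has_b : bool; has_c : bool }

(* Parse the entire token sequence in one call. Each completed statement is
   passed to [on_stmt], so the parser drives the client code rather than
   the client invoking the parser once per statement. *)
let parse_all (on_stmt : stmt -> unit) (next_token : unit -> token) : unit =
  (* One token of look-ahead. A token read while finishing one statement
     stays here and begins the next statement. *)
  let tok = ref (next_token ()) in
  let advance () = tok := next_token () in
  let rec statements () =
    match !tok with
    | EOF -> ()
    | A ->
        advance ();
        (* Longest match: consume an optional B, then an optional C. *)
        let has_b = !tok = B in
        if has_b then advance ();
        let has_c = !tok = C in
        if has_c then advance ();
        on_stmt { has_b; has_c };
        statements ()
    | B | C -> failwith "syntax error: a statement must start with A"
  in
  statements ()

(* Hypothetical helper: turn a token list into a lexer-like function. *)
let tokens_of_list (l : token list) : unit -> token =
  let rest = ref l in
  fun () ->
    match !rest with
    | [] -> EOF
    | t :: tl -> rest := tl; t

let () =
  let count = ref 0 in
  parse_all (fun _ -> incr count) (tokens_of_list [A; A]);
  Printf.printf "%d statements\n" !count
```

With the input "AA", this sketch reports two statements: the second A, read as look-ahead while completing the first statement, starts the second one instead of causing a syntax error.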
Hope this helps,
--
François Pottier
Francois.Pottier@inria.fr
http://cristal.inria.fr/~fpottier/