From: "David Allsopp" <dra-news@metastack.com>
To: "OCaml List" <caml-list@yquem.inria.fr>
Subject: Yet another yacc question
Date: Wed, 23 May 2007 16:09:58 +0100
Message-ID: <00cf01c79d4c$6e33c3b0$6a7ba8c0@treble>
Suppose I have the context-free grammar
P -> A | A B | A B C | A C
In ocamlyacc, I code this up as:
    %token A B C
    %type <unit> parse
    %start parse
    %%
    parse:
        A       {()}
      | A B     {()}
      | A B C   {()}
      | A C     {()}
    ;
I then set up a lexer (called lexer) so that the characters 'A', 'B' and 'C'
produce the tokens A, B and C, and write the following function:
    let f s = parse lexer (Lexing.from_string s)
Using it a few times:
    f "ABCZ" ... gives ()
    f "ACZ"  ... gives ()
    f "AA"   ... raises Parsing.Parse_error
The third case fails because "AA" is not in the grammar. However, the first
two work even though "ABCZ" and "ACZ" are also not in the grammar (and Z
isn't even a token!). They work because ocamlyacc doesn't need look-ahead
after the "C" in each case to determine that it can reduce to the entry
non-terminal and so return (). In the third case, look-ahead is required -
it looks ahead, sees an A and so fails.
I would quite like the third case to match as well, ignoring the second A
(ignoring it and leaving it in the buffer ready for a future parse... so
"peek-ahead" rather than "look-ahead", I guess). I think I'm probably right
in assuming that ocamlyacc can't do this. I'm not willing to alter my parser
to return a list of tokens, which, as far as I can see, is the only way to
make ocamlyacc do this correctly, i.e.
    parse:
        token parse {$1::$2}
      | EOF         {[]}
    ;
    token:
        /as for parse in the previous grammar/
(Incidentally, lest anyone have it confirmed that I'm mad, I'm trying to
parse batches of SQL statements and so have no obvious terminating token for
a clause: the parser needs to find the longest possible match and ignore
anything following it that would otherwise appear to be a syntax error.)
So my question: can menhir, dypgen or any of the other parser generators out
there do this - i.e. return one () on the first call and then another () on
the second with the string "AA"? It would finally be a reason for abandoning
ocamlyacc :o)
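To make the behaviour I'm after concrete, here is a minimal hand-written
sketch in plain OCaml. It is not ocamlyacc output, and it is not a claim about
how any parser generator works. The token type and the names
tokens_of_string, parse_prefix and count_parses are made up purely for
illustration, and a token list stands in for the lexer buffer:

```ocaml
(* Hypothetical token type mirroring the grammar's A, B and C.
   Other stands for any character the real lexer would not recognise. *)
type token = A | B | C | Other of char

(* Stand-in for the real lexer: turn a string into a list of tokens. *)
let tokens_of_string (s : string) : token list =
  List.map
    (function 'A' -> A | 'B' -> B | 'C' -> C | c -> Other c)
    (List.of_seq (String.to_seq s))

(* Match the longest prefix derivable from P -> A | A B | A B C | A C.
   Returns Some rest, where rest is the input left unconsumed for a
   future parse, or None if the input does not start with A. *)
let parse_prefix (toks : token list) : token list option =
  match toks with
  | A :: B :: C :: rest -> Some rest
  | A :: B :: rest -> Some rest
  | A :: C :: rest -> Some rest
  | A :: rest -> Some rest
  | _ -> None

(* Call parse_prefix repeatedly; return the number of successful
   parses together with whatever input could not be matched. *)
let count_parses (s : string) : int * token list =
  let rec loop n toks =
    match parse_prefix toks with
    | Some rest -> loop (n + 1) rest
    | None -> (n, toks)
  in
  loop 0 (tokens_of_string s)
```

Under these assumptions, count_parses "AA" reports two successful parses with
nothing left over, and count_parses "ABCZ" reports one parse with the Z left
on the "buffer", which is the result I would like from a generated parser.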
Thanks! (in hope that I haven't missed something blindingly obvious...)
David