Mailing list for all users of the OCaml language and system.
 help / color / mirror / Atom feed
* Yet another yacc question
@ 2007-05-23 15:09 David Allsopp
  2007-05-24 13:24 ` [Caml-list] " Francois Pottier
  0 siblings, 1 reply; 2+ messages in thread
From: David Allsopp @ 2007-05-23 15:09 UTC (permalink / raw)
  To: OCaml List

Suppose I have the context-free grammar

P -> A | A B | A B C | A C

In ocamlyacc, I code this up as:

%token A B C
%type <unit> parse
%start parse
%%

parse:
  A     {()}
| A B   {()}
| A B C {()}
| A C   {()}
;

And set-up a lexer (called lexer) so that the characters 'A', 'B' and 'C'
produce the tokens A, B and C. Then I write the following function:

let f s = parse lexer (Lexing.from_string s)

And use it a few times...

f "ABCZ"   ...   gives ()
f "ACZ"    ...   gives ()
f "AA"     ...   raises Parsing.Parse_error

The third case fails because "AA" is not in the grammar. However, the first
two work even though "ABCZ" and "ACZ" are also not in the grammar (and Z
isn't even a token!). They work because ocamlyacc doesn't need look-ahead
after the "C" in each case to determine that it can reduce to the entry
non-terminal and so return (). In the third case, look-ahead is required -
it looks ahead, sees an A and so fails.

I would quite like the third to match as well and ignore the second A
(ignore and leave on the buffer ready for a future parse... so "peek-ahead"
rather than "look-ahead", I guess). I think I'm probably right in assuming
that ocamlyacc can't do this. I'm not willing to alter my parser to return a
list of tokens which as far as I can see is the only way to make ocamlyacc
do this correctly - i.e.

parse:
  token parse {$1::$2}
| EOF {[]}
;

token:
  /as for parse in the previous grammar/ 

(Incidentally, lest anyone have it confirmed that I'm mad, I'm trying to
parse batches of SQL statements so have no obvious terminating token for a
clause - the parser needs to do a longest possible match ignoring anything
else following that would appear to be a syntax error)

So my question: can menhir, dypgen or any of the other parser generators out
there do this - i.e. return one () on the first call and then another () on
the second with the string "AA"? It would finally be a reason for abandoning
ocamlyacc :o)

Thanks! (in hope that I haven't missed something blindingly obvious...)


David


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2007-05-24 13:24 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-05-23 15:09 Yet another yacc question David Allsopp
2007-05-24 13:24 ` [Caml-list] " Francois Pottier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox