From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=-0.0 required=5.0 tests=NO_RELAYS autolearn=disabled version=3.1.3 Received: by yquem.inria.fr (Postfix, from userid 18965) id 45A5DBC0A; Thu, 24 May 2007 15:24:36 +0200 (CEST) Date: Thu, 24 May 2007 15:24:36 +0200 From: Francois Pottier To: David Allsopp Cc: OCaml List Subject: Re: [Caml-list] Yet another yacc question Message-ID: <20070524132436.GA22868@yquem.inria.fr> Reply-To: Francois.Pottier@inria.fr References: <00cf01c79d4c$6e33c3b0$6a7ba8c0@treble> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <00cf01c79d4c$6e33c3b0$6a7ba8c0@treble> User-Agent: Mutt/1.5.9i X-Spam: no; 0.00; 0100,:01 ocamlyacc:01 look-ahead:01 non-terminal:01 look-ahead:01 buffer:01 lexer:01 parsers:01 parser:01 ocamlyacc:01 semantics:01 terminating:01 parser:01 syntax:01 separators:01 Hello, On Wed, May 23, 2007 at 04:09:58PM +0100, David Allsopp wrote: > The third case fails because "AA" is not in the grammar. However, the first > two work even though "ABCZ" and "ACZ" are also not in the grammar (and Z > isn't even a token!). They work because ocamlyacc doesn't need look-ahead > after the "C" in each case to determine that it can reduce to the entry > non-terminal and so return (). In the third case, look-ahead is required - > it looks ahead, sees an A and so fails. Your analysis is correct. > I would quite like the third to match as well and ignore the second A > (ignore and leave on the buffer ready for a future parse... so "peek-ahead" > rather than "look-ahead", I guess). It's hard to ignore the second A once one has requested it... The lexer interface does not have a "peek" or "put-back" operation, and LR parsers aren't designed to support these operations anyway. > So my question: can menhir, dypgen or any of the other parser generators out > there do this - i.e. return one () on the first call and then another () on > the second with the string "AA"? It would finally be a reason for abandoning > ocamlyacc :o) In theory, menhir adopts the same semantics as ocamlyacc, but it will attempt to help you by providing more warnings -- here, it will complain that your grammar has an end-of-stream conflict. > (Incidentally, lest anyone have it confirmed that I'm mad, I'm trying to > parse batches of SQL statements so have no obvious terminating token for a > clause - the parser needs to do a longest possible match ignoring anything > else following that would appear to be a syntax error) If you want to parse sequences of statements without clear separators, it seems to me that the best approach would be to invoke the parser just once for an entire sequence, instead of trying to invoke it once per statement and have it leave the token stream in a consistent state. This approach would require an inversion of control (the parser would drive the rest of your code, instead of the code invoking the parser). If you can accept that, you should be fine. Hope this helps, -- François Pottier Francois.Pottier@inria.fr http://cristal.inria.fr/~fpottier/