From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.1.3 Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 31A98BC0A for ; Fri, 18 May 2007 15:48:50 +0200 (CEST) Received: from ipmail01.adl2.internode.on.net (ipmail01.adl2.internode.on.net [203.16.214.140]) by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id l4IDmlJ5031147 for ; Fri, 18 May 2007 15:48:49 +0200 X-IronPort-AV: E=Sophos;i="4.14,552,1170595800"; d="scan'208";a="130730197" Received: from ppp59-172.lns2.syd6.internode.on.net (HELO [192.168.1.201]) ([121.44.59.172]) by ipmail01.adl2.internode.on.net with ESMTP; 18 May 2007 23:18:45 +0930 Subject: Re: [Caml-list] Error messages with dypgen From: skaller To: Joel Reymont Cc: OCaml List , Emmanuel Onzon In-Reply-To: References: <66D9A385-E61C-4AA2-81AA-B352FB941A57@gmail.com> <1179477743.17322.10.camel@rosella.wigram> Content-Type: text/plain Date: Fri, 18 May 2007 23:48:43 +1000 Message-Id: <1179496123.6151.25.camel@rosella.wigram> Mime-Version: 1.0 X-Mailer: Evolution 2.10.1 Content-Transfer-Encoding: 7bit X-Miltered: at concorde with ID 464DAEBF.001 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; 0100,:01 parser:01 semicolon:01 parser:01 lexbuf:01 lexer:01 tokens:01 lexer:01 lexbuf:01 syntax:01 functor:01 parametric:01 parsers:01 point':01 threads:01 On Fri, 2007-05-18 at 12:36 +0100, Joel Reymont wrote: > John, > > There should at least be a textual message embedded in the exception. > > This is what I have right now: > > input_declarations: > | INPUT COLON input_decs { `InputDecls (List.rev $3) } > | INPUT COLON input_decs error { > parser_error "Missing semicolon" $startpos($4) $endpos($4) > } > | INPUT COLON error { > parser_error "Error after INPUT:" $startpos($3) $endpos($3) > } > | INPUT error { > parser_error "Missing ':' after INPUT" $startpos($2) $endpos($2) > } > > I clearly know why the error is happening here. Positions > notwithstanding, wow do I rewrite this for dypgen? There are two issues here. First, the above code will "work" in dypgen already, assuming you supply a parser_error function. This code would not work for me, because my lexbuf is a dummy: the lexer has already run and made a list of tokens, and the lexer function is bound to the list. It ignores, totally, the lexbuf. In my system the positional information is stored in every token. So my extant parser is an example that 'proves' that dypgen *must not* standardise the format of source reference information and certainly must not raise a syntax error exception encoding that information. One solution to this may be to use an abstract type and a functor to make the 'source' information parametric. Second: this style of error handling CANNOT work with a GLR parser, because GLR parsers can simultaneously try multiple alternatives. The only time you can be sure you have an error is at a 'cut point', that is, a point where all threads join, and none of them proceed. A conclusion: dypgen may need to be modified so that there is a way to 'return' an error. At present you can raise Giveup to indicate a parse thread failed, however your technique above is to *successfully* parse an error. A more advanced conclusion: Ocamlyacc parser interface is seriously broken, and should be supported only for compatibility. There are two proper parser interfaces, IMHO: one for input iterators (mutable streams) and one for forward iterators (functional streams). The mutable interface looks like: lexer: state -> info get_loc: info -> srcloc get_token: info -> token The functional interface looks like: lexer: state -> state * info instead. With this interface, backtracking to an old 'state' value is possible. Input and forward iterators are interconvertible. A forward iterator can be made into an input iterator by simply using a reference to the state, that is, use a state variable to record the current state. An input iterator can be converted to a forward iterator by 'buffering' tokens in a list. Doing this efficiently is slightly tricky, that is, only buffering enough tokens to satisfy a possible backtrack (usually done with cut points). In both these interfaces the srcloc type is supplied by the user. Ideally, the token type would be too. in that case another function is needed: get_token_code: token -> int which is what the parser uses: that's the tag of a variant constructor or whatever. These interfaces should be standardised for ALL parsers so we have 'plugin' ability. Of course, the semantics may depend on the kind of parser and grammar. Ocamlyacc itself could be easily modified to fit this design by the lexer simply returning its lexbuf with the token. in summary: the key problem with what you want to do is that it is makes no sense semantically. You want to return information that the parser cannot in principle obtain. The fact it appears visible is actually a design bug in Ocamlyacc which has been duplicated by Dypgen in compatibility mode. -- John Skaller Felix, successor to C++: http://felix.sf.net