* Error messages with dypgen
From: Joel Reymont @ 2007-05-18 7:20 UTC
To: OCaml List

I understand that dypgen throws an exception when a syntax error is found.

How do I get it to produce line/column numbers when that happens?

It would be rather groovy if the exception carried symbol_start_pos, symbol_end_pos, rhs_start_pos, rhs_end_pos from the dyp record since this information is available at the time that the exception is raised.

A placeholder for a textual message would also be very helpful.

A simple error function built into dypgen could then take a message and raise the syntax error exception with all the required info.

Thanks, Joel

--
http://wagerlabs.com/

* Re: [Caml-list] Error messages with dypgen
From: skaller @ 2007-05-18 8:42 UTC
To: Joel Reymont; +Cc: OCaml List

On Fri, 2007-05-18 at 08:20 +0100, Joel Reymont wrote:
> I understand that dypgen throws an exception when a syntax error is
> found.
>
> How do I get it to produce line/column numbers when that happens?
>
> It would be rather groovy if the exception carried
> symbol_start_pos, symbol_end_pos, rhs_start_pos, rhs_end_pos from the
> dyp record since this information is available at the time that the
> exception is raised.

No it isn't. Dypgen uses lexbufs for compatibility with the broken Ocamlyacc interface. Dypgen lets you use ulex or another lexer as well. The type of the error thrown by the automaton should not be polluted by positional information that has no reasonable standard specification.

If you want this information, you can look it up yourself in the lexbuf. The parser has no business at all examining the lexbuf; the lexbuf belongs to the lexer.

> A simple error function built into dypgen could then take a message
> and raise the syntax error exception with all the required info.

An error function is a good idea, except that the Ocamlyacc-style interface is broken: there's no way to pass the function in, so it would have to be global.

--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

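A minimal sketch of the "look it up yourself in the lexbuf" suggestion, using only the standard Lexing module. The exception name Syntax_error and the wrapper parse_with_location are hypothetical stand-ins, not dypgen's API; substitute whatever exception your generated parser actually raises. It also assumes the lexer keeps lexbuf positions up to date (an ocamllex lexer only advances the line number if its rules call Lexing.new_line), and it does not apply when the lexbuf is a dummy.

```ocaml
(* Hypothetical stand-in for the exception raised by the generated parser. *)
exception Syntax_error

(* Line and column (both 1-based) of the last token the lexer produced. *)
let line_col (lexbuf : Lexing.lexbuf) : int * int =
  let p = Lexing.lexeme_start_p lexbuf in
  (p.Lexing.pos_lnum, p.Lexing.pos_cnum - p.Lexing.pos_bol + 1)

(* Run [parse] on [lexbuf]; turn a bare syntax error into a located message.
   The position is read from the lexbuf by the caller, not by the parser. *)
let parse_with_location parse lexbuf =
  try Ok (parse lexbuf)
  with Syntax_error ->
    let line, col = line_col lexbuf in
    Error (Printf.sprintf "syntax error at line %d, column %d" line col)
```

The position lookup lives in the code that owns both the lexer and the parser, so the parser's exception type stays free of any particular location format.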
* Re: [Caml-list] Error messages with dypgen
From: Joel Reymont @ 2007-05-18 11:36 UTC
To: skaller; +Cc: OCaml List, Emmanuel Onzon

John,

There should at least be a textual message embedded in the exception.

This is what I have right now:

    input_declarations:
      | INPUT COLON input_decs { `InputDecls (List.rev $3) }
      | INPUT COLON input_decs error {
          parser_error "Missing semicolon" $startpos($4) $endpos($4)
        }
      | INPUT COLON error {
          parser_error "Error after INPUT:" $startpos($3) $endpos($3)
        }
      | INPUT error {
          parser_error "Missing ':' after INPUT" $startpos($2) $endpos($2)
        }

I clearly know why the error is happening here. Positions notwithstanding, how do I rewrite this for dypgen?

Thanks, Joel

On May 18, 2007, at 9:42 AM, skaller wrote:
> No it isn't. Dypgen uses lexbufs for compatibility with the
> broken Ocamlyacc interface. Dypgen lets you use ulex or
> other lexer as well. The type of the error thrown by the
> automaton should not be polluted by positional information
> that has no reasonable standard specification.

--
http://wagerlabs.com/

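The grammar above calls a user-supplied parser_error function that the message does not show. A minimal sketch of what such a helper might look like, assuming the two position arguments are Lexing.position values; the exception Parse_error and the formatter describe are hypothetical names introduced here for illustration.

```ocaml
(* Hypothetical exception: a message plus the source span it refers to. *)
exception Parse_error of string * Lexing.position * Lexing.position

(* Raise a located error. The result type is polymorphic, so a call can
   stand in for a semantic action of any type. *)
let parser_error msg startpos endpos =
  raise (Parse_error (msg, startpos, endpos))

(* Render a Parse_error as "line L, characters A-B: msg";
   any other exception yields None. *)
let describe = function
  | Parse_error (msg, s, e) ->
      let col p = p.Lexing.pos_cnum - p.Lexing.pos_bol in
      Some (Printf.sprintf "line %d, characters %d-%d: %s"
              s.Lexing.pos_lnum (col s) (col e) msg)
  | _ -> None
```

A driver would wrap the parser call in a try ... with and pass the caught exception to describe before printing it.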
* Re: [Caml-list] Error messages with dypgen
From: skaller @ 2007-05-18 13:48 UTC
To: Joel Reymont; +Cc: OCaml List, Emmanuel Onzon

On Fri, 2007-05-18 at 12:36 +0100, Joel Reymont wrote:
> John,
>
> There should at least be a textual message embedded in the exception.
>
> This is what I have right now:
>
> input_declarations:
>   | INPUT COLON input_decs { `InputDecls (List.rev $3) }
>   | INPUT COLON input_decs error {
>       parser_error "Missing semicolon" $startpos($4) $endpos($4)
>     }
>   | INPUT COLON error {
>       parser_error "Error after INPUT:" $startpos($3) $endpos($3)
>     }
>   | INPUT error {
>       parser_error "Missing ':' after INPUT" $startpos($2) $endpos($2)
>     }
>
> I clearly know why the error is happening here. Positions
> notwithstanding, how do I rewrite this for dypgen?

There are two issues here.

First, the above code will "work" in dypgen already, assuming you supply a parser_error function.

This code would not work for me, because my lexbuf is a dummy: the lexer has already run and made a list of tokens, and the lexer function is bound to the list. It ignores, totally, the lexbuf. In my system the positional information is stored in every token.

So my extant parser is an example that 'proves' that dypgen *must not* standardise the format of source reference information and certainly must not raise a syntax error exception encoding that information.

One solution to this may be to use an abstract type and a functor to make the 'source' information parametric.

Second: this style of error handling CANNOT work with a GLR parser, because GLR parsers can simultaneously try multiple alternatives. The only time you can be sure you have an error is at a 'cut point', that is, a point where all threads join, and none of them proceed.

A conclusion: dypgen may need to be modified so that there is a way to 'return' an error. At present you can raise Giveup to indicate that a parse thread failed; however, your technique above is to *successfully* parse an error.

A more advanced conclusion: the Ocamlyacc parser interface is seriously broken, and should be supported only for compatibility. There are two proper parser interfaces, IMHO: one for input iterators (mutable streams) and one for forward iterators (functional streams).

The mutable interface looks like:

    lexer: state -> info
    get_loc: info -> srcloc
    get_token: info -> token

The functional interface looks like:

    lexer: state -> state * info

instead. With this interface, backtracking to an old 'state' value is possible.

Input and forward iterators are interconvertible. A forward iterator can be made into an input iterator by simply using a reference to the state, that is, use a state variable to record the current state.

An input iterator can be converted to a forward iterator by 'buffering' tokens in a list. Doing this efficiently is slightly tricky, that is, only buffering enough tokens to satisfy a possible backtrack (usually done with cut points).

In both these interfaces the srcloc type is supplied by the user. Ideally, the token type would be too. In that case another function is needed:

    get_token_code: token -> int

which is what the parser uses: that's the tag of a variant constructor or whatever.

These interfaces should be standardised for ALL parsers so we have 'plugin' ability. Of course, the semantics may depend on the kind of parser and grammar.

Ocamlyacc itself could be easily modified to fit this design by the lexer simply returning its lexbuf with the token.

In summary: the key problem with what you want to do is that it makes no sense semantically. You want to return information that the parser cannot in principle obtain. The fact it appears visible is actually a design bug in Ocamlyacc which has been duplicated by Dypgen in compatibility mode.

--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
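The two lexer interfaces described in this message, and the conversion from a forward iterator to an input iterator by holding the state in a reference, can be sketched as OCaml module types. This is one illustrative reading of the message, not an existing dypgen or Ocamlyacc API: the module names, the toy token and srcloc types, and the list-backed lexer are all assumptions made for the example. In the proposal the srcloc type (and ideally the token type) would be supplied by the user rather than fixed as here.

```ocaml
(* Toy token and location types, fixed here only to keep the sketch short. *)
type token = Int of int | Plus | Eof
type srcloc = { line : int; col : int }

(* Input iterator: the lexer mutates its state. *)
module type INPUT = sig
  type state
  type info
  val lexer : state -> info
  val get_loc : info -> srcloc
  val get_token : info -> token
end

(* Forward iterator: the lexer returns the next state, so an old state
   value can be reused to backtrack. *)
module type FORWARD = sig
  type state
  type info
  val lexer : state -> state * info
  val get_loc : info -> srcloc
  val get_token : info -> token
end

(* Forward to input: keep the current functional state in a reference. *)
module Input_of_forward (F : FORWARD) :
  INPUT with type state = F.state ref and type info = F.info = struct
  type state = F.state ref
  type info = F.info
  let lexer st =
    let st', i = F.lexer !st in
    st := st';
    i
  let get_loc = F.get_loc
  let get_token = F.get_token
end

(* A forward lexer over a pre-built token list in which every token
   carries its own location, as in the setup described above. *)
module List_lexer = struct
  type info = token * srcloc
  type state = info list
  let lexer = function
    | [] -> ([], (Eof, { line = 0; col = 0 }))
    | i :: rest -> (rest, i)
  let get_loc (_, loc) = loc
  let get_token (tok, _) = tok
end

module Mutable_list_lexer = Input_of_forward (List_lexer)
```

With List_lexer, calling lexer twice on the same state yields the same token, which is what makes backtracking possible; with Mutable_list_lexer, each call advances the shared reference. The reverse conversion (buffering tokens from an input iterator) is not shown.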