Re: [Caml-list] Error messages with dypgen


From: skaller <skaller@users.sourceforge.net>
To: Joel Reymont <joelr1@gmail.com>
Cc: OCaml List <caml-list@inria.fr>,
	Emmanuel Onzon <emmanuel.onzon@ens-lyon.fr>
Subject: Re: [Caml-list] Error messages with dypgen
Date: Fri, 18 May 2007 23:48:43 +1000
Message-ID: <1179496123.6151.25.camel@rosella.wigram>
In-Reply-To: <F57F48B5-6005-4F22-AFB7-896986EECA50@gmail.com>

On Fri, 2007-05-18 at 12:36 +0100, Joel Reymont wrote:
> John,
> 
> There should at least be a textual message embedded in the exception.
> 
> This is what I have right now:
> 
> input_declarations:
>    | INPUT COLON input_decs { `InputDecls (List.rev $3) }
>    | INPUT COLON input_decs error {
>        parser_error "Missing semicolon" $startpos($4) $endpos($4)
>      }
>    | INPUT COLON error {
>        parser_error "Error after INPUT:" $startpos($3) $endpos($3)
>      }
>    | INPUT error {
>        parser_error "Missing ':' after INPUT" $startpos($2) $endpos($2)
>      }
> 
> I clearly know why the error is happening here. Positions
> notwithstanding, how do I rewrite this for dypgen?

There are two issues here. 

First, the above code will
"work" in dypgen already, assuming you supply a 
parser_error function. This code would not work for me,
because my lexbuf is a dummy: the lexer has already run
and made a list of tokens, and the lexer function is bound
to the list. It ignores, totally, the lexbuf. In my system
the positional information is stored in every token.

So my extant parser is an example that 'proves' that
dypgen *must not* standardise the format of source
reference information and certainly must not raise
a syntax error exception encoding that information.

One solution to this may be to use an abstract type
and a functor to make the 'source' information 
parametric.
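
A minimal sketch of that idea, assuming nothing about dypgen itself:
the names SOURCE, MakeErrors and LineCol are invented for illustration.
The error machinery is written once against an abstract srcloc, and the
user chooses the representation when applying the functor.

```ocaml
(* Hypothetical signature: the user, not the parser generator,
   decides what a source location is. *)
module type SOURCE = sig
  type srcloc
  val to_string : srcloc -> string
end

(* Error support parametrised over the user's location type. *)
module MakeErrors (S : SOURCE) = struct
  exception Syntax_error of string * S.srcloc

  let parser_error msg loc = raise (Syntax_error (msg, loc))

  let describe msg loc = Printf.sprintf "%s: %s" (S.to_string loc) msg
end

(* One possible instantiation: line/column pairs stored in each token. *)
module LineCol = struct
  type srcloc = { line : int; col : int }
  let to_string { line; col } = Printf.sprintf "line %d, column %d" line col
end

module E = MakeErrors (LineCol)
```

A system that keeps positions in tokens and one that reads them from a
lexbuf would simply apply the functor to different SOURCE modules.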

Second: this style of error handling CANNOT work with a GLR
parser, because GLR parsers can simultaneously try multiple
alternatives. The only time you can be sure you have an error
is at a 'cut point', that is, a point where all threads 
join, and none of them proceed.
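
To make the point concrete, here is a toy model only; it is not
dypgen's API or algorithm, and outcome, run_threads and at_cut_point
are invented names. Each 'thread' is one alternative tried on the same
input, and an error is reported only when every thread has given up.

```ocaml
(* Toy model of parallel parse threads; not how dypgen is implemented. *)
type 'a outcome = Parsed of 'a | Gave_up

(* Run every alternative on the same input and keep the survivors. *)
let run_threads threads input =
  List.filter_map
    (fun t -> match t input with Parsed v -> Some v | Gave_up -> None)
    threads

(* Only when no thread survives do we know there is a syntax error. *)
let at_cut_point threads input =
  match run_threads threads input with
  | [] -> Error "syntax error: no alternative matched"
  | results -> Ok results

(* Two hypothetical alternatives that both start with INPUT. *)
let input_decls = function
  | "INPUT" :: ":" :: _ -> Parsed "input declarations"
  | _ -> Gave_up

let input_call = function
  | "INPUT" :: "(" :: _ -> Parsed "call to INPUT"
  | _ -> Gave_up
```

In this model a single thread giving up on "INPUT (" says nothing
about whether the input is wrong, since the other thread may still
succeed.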

A conclusion: dypgen may need to be modified so that there
is a way to 'return' an error. At present you can
raise Giveup to indicate that a parse thread failed;
however, your technique above is to *successfully*
parse an error.

A more advanced conclusion: the Ocamlyacc parser interface
is seriously broken, and should be supported only
for compatibility.

There are two proper parser interfaces, IMHO:
one for input iterators (mutable streams) and
one for forward iterators (functional streams).

The mutable interface looks like:

	lexer: state -> info
	get_loc: info -> srcloc
	get_token: info -> token

The functional interface looks like:

	lexer: state -> state * info

instead. With this interface, backtracking to
an old 'state' value is possible.
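
A small sketch of the functional interface, under assumed concrete
types: here the state is just the remaining token list plus an index,
and srcloc is that index. All of these representations are made up
for the example.

```ocaml
type token = INPUT | COLON | IDENT of string | EOF
type srcloc = int                      (* assumption: a token index *)
type info = { tok : token; loc : srcloc }
type state = { rest : token list; pos : int }

let get_loc (i : info) : srcloc = i.loc
let get_token (i : info) : token = i.tok

(* lexer : state -> state * info; the old state is never modified. *)
let lexer (s : state) : state * info =
  match s.rest with
  | [] -> (s, { tok = EOF; loc = s.pos })
  | t :: rest -> ({ rest; pos = s.pos + 1 }, { tok = t; loc = s.pos })

let s0 = { rest = [INPUT; COLON; IDENT "x"]; pos = 0 }
let s1, i1 = lexer s0
let s2, i2 = lexer s1

(* Backtracking: s1 is still a valid value, so lexing from it again
   yields the same token as before. *)
let _, i2' = lexer s1
```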

Input and forward iterators are interconvertible.

A forward iterator can be made into an input
iterator by simply using a reference to the state,
that is, use a state variable to record the current
state.
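
As a sketch of that conversion (make_input_iterator and list_lexer are
illustrative names, and the mutable side is modelled as a closure of
type unit -> info rather than a function on a mutable state):

```ocaml
(* Wrap a functional lexer (state -> state * info) as a mutable one
   by keeping the current state in a reference. *)
let make_input_iterator (lexer : 'state -> 'state * 'info)
    (start : 'state) : unit -> 'info =
  let current = ref start in
  fun () ->
    let next_state, info = lexer !current in
    current := next_state;
    info

(* Hypothetical functional lexer over a list; None marks end of input. *)
let list_lexer = function
  | [] -> ([], None)
  | t :: rest -> (rest, Some t)

let next = make_input_iterator list_lexer ["INPUT"; ":"; "x"]
```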

An input iterator can be converted to a forward
iterator by 'buffering' tokens in a list. Doing
this efficiently, that is, buffering only enough
tokens to satisfy a possible backtrack (usually
bounded by cut points), is slightly tricky.
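
One way to sketch the buffering, assuming a memoised lazy stream is
acceptable in place of an explicit list (the type and function names
are invented):

```ocaml
(* A forward state is a lazily computed, memoised cell of the stream. *)
type 'info state = Cell of ('info * 'info state) Lazy.t

(* Pull from the mutable iterator at most once per position. *)
let rec of_input_iterator (next : unit -> 'info) : 'info state =
  Cell (lazy (let i = next () in (i, of_input_iterator next)))

(* lexer : state -> state * info *)
let lexer (Cell c) =
  let info, rest = Lazy.force c in
  (rest, info)
```

Lexing again from an old state returns the remembered token without
touching the underlying iterator. Cells stay alive only while some
state still refers to them, so once the parser drops the states before
a cut point the garbage collector can reclaim the corresponding tokens.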

In both these interfaces the srcloc type is supplied
by the user. Ideally, the token type would be too.
In that case another function is needed:

	get_token_code: token -> int

which is what the parser uses: that's the tag
of a variant constructor or whatever.
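
For a user-supplied token type this could look like the following;
the constructors and the numbering are arbitrary choices made for the
example.

```ocaml
(* Hypothetical user-defined token type. *)
type token = INPUT | COLON | SEMI | IDENT of string

(* The parser only needs to know which kind of token it is looking at,
   not the payload, so every IDENT maps to the same code. *)
let get_token_code : token -> int = function
  | INPUT -> 0
  | COLON -> 1
  | SEMI -> 2
  | IDENT _ -> 3
```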

These interfaces should be standardised for ALL
parsers so we have 'plugin' ability. Of course,
the semantics may depend on the kind of parser
and grammar.

Ocamlyacc itself could be easily modified to fit
this design by the lexer simply returning its lexbuf
with the token.

In summary: the key problem with what you want
to do is that it makes no sense semantically.
You want to return information that the parser
cannot in principle obtain. The fact that it appears
visible is actually a design bug in Ocamlyacc
which has been duplicated by Dypgen in compatibility
mode.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

