Mailing list for all users of the OCaml language and system.
 help / color / mirror / Atom feed
From: skaller <skaller@users.sourceforge.net>
To: Brian Hurt <bhurt@spnz.org>
Cc: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] yacc question
Date: Mon, 05 Dec 2005 03:37:40 +1100	[thread overview]
Message-ID: <1133714261.11050.35.camel@rosella> (raw)
In-Reply-To: <Pine.LNX.4.63.0512040919570.24132@localhost.localdomain>

On Sun, 2005-12-04 at 09:25 -0600, Brian Hurt wrote:
> 
> On Mon, 5 Dec 2005, skaller wrote:
> 
> > I have the 'usual' kind of parser for expressions, with two
> > nonterminals 'expr' and 'atom', the latter including ( expr )
> > and INTEGER of course.
> >
> > When I have input like
> >
> > 1;
> > 1 + 2 ;
> > (1 + 2) ;
> >
> > none of the case parse as expressions, the first and
> > last do parse as atoms (leaving the ; in the buffer).
> >
> > What I want is to parse the longest possible match.
> > The only way I can think of doing this is:
> >
> > top_expr:
> >  | expr token_not_allowed_in_expressions
> >
> > and then 'put back' the trailing token into the buffer.
> > Is there another way?
> >
> 
> The standard way to implement this is:
> 
> statement:
>  	expression SEMICOLON
> 
> expression:
>  	| mul_expression
>  	| expression PLUS mul_expression
>  	| expression MINUS mul_expression
> 
> mul_expression:
>  	atom_expression
>  	| mul_expression TIMES atom_expression
>  	| mul_expression DIVIDE atom_expression
>  	| mul_expression MODULO atom_expression
> 
> atom_expression:
>  	ATOM
>  	| OPEN_PAREN expression CLOSE_PAREN
> 
> Notice that precedence is taken care of immediately- 3 + 4 * 5 is parsed 
> as 3 + (4 * 5).  Also notice that the semicolon is what terminates us- we 
> keep collecting up an expression until we see a semicolon- only then do we 
> complete the statement.

Expressions cannot be statements, but I have instead:

statement:
	NAME LEFT_ARROW expression SEMICOLON

for example, so basically your example holds with a modification.

The problem is I want to parse a *user defined statement*.
This will be done by a preprocessor definition, something like:

#keyword within
#statement select expr within statement ;

This makes 'select' and 'within' keywords, when ocamlyacc
sees the user defined statement keyword 'select', it looks
up a table and finds the grammar production. I then parse
by recursive descent -- backtracking is cool in an FPL,
my tokens are a list, so I can backtrack easily.

When interpreting the grammar, if I see the token 'expr'
then I parse an expression using the ocamlyacc entry point.
Similarly for statements -- in particular, since the 
user defined statements can be 'added' into the statement
grammar, recursion in the 'within' clause will also include
the new statements (the recursion is covariant?)

There is no problem with statements .. but for expressions,
I cannot just terminate on SEMICOLON -- I have to terminate
on many symbols, which can't be part of an expression --
and leave them in the token buffer too. In the example
grammar, 'within' terminates the expression.

The problem of course is that Ocamlyacc doesn't know about
the new grammar .. but I'm still using it to parse expressions.

Mixing LALR1 and an RD parser seems messy .. but I see
no other way to obtain high performance for 'the usual case'
whilst still allowing extensibility -- Ocaml doesn't come
with an LL parser generator.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


      reply	other threads:[~2005-12-04 16:37 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-12-04 14:15 skaller
2005-12-04 15:03 ` [Caml-list] " Robert W.
2005-12-04 16:46   ` skaller
2005-12-04 15:25 ` Brian Hurt
2005-12-04 16:37   ` skaller [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1133714261.11050.35.camel@rosella \
    --to=skaller@users.sourceforge.net \
    --cc=bhurt@spnz.org \
    --cc=caml-list@yquem.inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox