Re: [Caml-list] Context Free Grammars?

Mailing list for all users of the OCaml language and system.
 help / color / mirror / Atom feed

From: Paul Snively <psnively@mac.com>
To: David McClain <David.McClain@Avisere.com>
Cc: Joshua Smith <josh@trdlnk.com>, caml-list@inria.fr
Subject: Re: [Caml-list] Context Free Grammars?
Date: Thu, 12 Aug 2004 12:25:14 -0700	[thread overview]
Message-ID: <55D5BF02-EC95-11D8-915C-000A27DEEC20@mac.com> (raw)
In-Reply-To: <7D827B04-EC8B-11D8-9939-000A95C19BAA@Avisere.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi David!

On Aug 12, 2004, at 11:14 AM, David McClain wrote:

> I am not a YACC expert (no kidding!?) but I always had the notion that  
> nonterminal reduction trees would somehow guide the reduce decisions.  
> Now, however, it appears that YACC is simply looking blindly for  
> syntactic token chains and reducing on them with absolutely no notion  
> of who might be interested in the result. Kind of takes the wind out  
> of my ancient understandings of YACC.
>
Quite right. YACC, as you no doubt know, is LALR(1), which expands to  
"Look Ahead Left-Right 1, which is a fancy way of saying basically two  
things:

1) Only one token at a time is taken into account in reductions.
2) Right-recursion will screw you over.

More in a moment.

> [So I guess I was writing my grammars as though there were some kind  
> of recursive descent engine in control. Not apparently so...]
>
Right! And hand-written recursive-descent parsers are almost always  
LL(1):

1) Only one token at a time is taken into account in reductions.
2) Left-recursion will screw you over.

If you're accustomed to writing LL(1) grammars, you'll need to factor  
right recursion to left recursion in order to go LALR(1). But you're  
still left with that pesky (1), which really seems to be what your post  
is about. So forget about LALR vs. LL and let's look at some solutions  
to the (1) problem.

O'Caml as a suite already has some quite nice parsing and lexing tools,  
and third parties have made others. Let me first recommend that you  
have a look at the Camlp4 tutorial at  
<http://caml.inria.fr/camlp4/tutorial>. Camlp4 is O'Caml's  
"preprocessor," but unfortunately that term has become one of rather  
ill repute thanks to the C/C++ communities. Camlp4 is much closer in  
expressive power and safety to Scheme's hygenic macros or to a good  
LL-based parser-generator system, as the tutorial makes abundantly  
clear.

However, as nice as Camlp4 undeniably is, it still leaves you with your  
(1) problem barring lexer hacks as described at  
<http://caml.inria.fr/camlp4/tutorial/tutorial003.html>: "The input 'A  
C' raises Stream.Error. There is no simple solution to this problem,  
but there is a solution (not very clean, actually): create a entry from  
a parser (it is possible via the function Grammar.Entry.of_parser).  
This parser must scan the stream using Stream.npeek until the number of  
necessary tokens allows to make the difference; return unit or raise  
Stream.Failure according to the cases." This is the traditional "smart  
lexer" approach to handling ambiguous (i.e. real-world) grammars in  
either an LALR(1) (ocamlyacc) or LL(1) (Camlp4) framework.

A lot of work has been done on relaxing the (1) restriction while still  
allowing efficient parsing. I won't go into the history and  
alternatives, but rather will just refer to  
<http://www.cs.berkeley.edu/~smcpeak/elkhound>. Elkhound is a GLR  
parser generator that will generate either C++ or O'Caml parsers. Being  
GLR, it deals nicely with ambiguities, and will even let you defer  
implementing resolution of them until after you've prototyped your  
entire grammar if you wish. There's a tutorial at  
<http://www.cs.berkeley.edu/~smcpeak/elkhound/sources/elkhound/ 
tutorial.html> that may be helpful, although it's in C++ terms rather  
than O'Caml. Simply putting:

option lang_OCaml;

at the beginning of your grammar will fix that. :-)

> David McClain
> Senior Corporate Scientist
> Avisere, Inc.
>
> david.mcclain@avisere.com
> +1.520.390.7738 (USA)
>
> -------------------
> To unsubscribe, mail caml-list-request@inria.fr Archives:  
> http://caml.inria.fr
> Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ:  
> http://caml.inria.fr/FAQ/
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>
I hope this is helpful,
Paul Snively

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (Darwin)

iEYEARECAAYFAkEbxDUACgkQbot1wzHBQBXfXgCdHCoAXU3CrE/VhmHozY5+z0sx
GfwAn2pgYKqTnksbdHjK6bqM0V/Dq1gN
=Pddt
-----END PGP SIGNATURE-----

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

next prev parent reply	other threads:[~2004-08-12 19:25 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-08-12 17:18 David McClain
2004-08-12 18:02 ` Joshua Smith
2004-08-12 18:14   ` David McClain
2004-08-12 19:25     ` Paul Snively [this message]
2004-08-12 21:47       ` Erik de Castro Lopo
2004-08-13  5:22       ` skaller
2004-08-13  5:59         ` David Brown
2004-08-13 14:20         ` Brian Hurt
2004-08-13  4:45 ` skaller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55D5BF02-EC95-11D8-915C-000A27DEEC20@mac.com \
    --to=psnively@mac.com \
    --cc=David.McClain@Avisere.com \
    --cc=caml-list@inria.fr \
    --cc=josh@trdlnk.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox