From: Paul Snively <psnively@mac.com>
To: David McClain <David.McClain@Avisere.com>
Cc: Joshua Smith <josh@trdlnk.com>, caml-list@inria.fr
Subject: Re: [Caml-list] Context Free Grammars?
Date: Thu, 12 Aug 2004 12:25:14 -0700 [thread overview]
Message-ID: <55D5BF02-EC95-11D8-915C-000A27DEEC20@mac.com> (raw)
In-Reply-To: <7D827B04-EC8B-11D8-9939-000A95C19BAA@Avisere.com>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi David!
On Aug 12, 2004, at 11:14 AM, David McClain wrote:
> I am not a YACC expert (no kidding!?) but I always had the notion that
> nonterminal reduction trees would somehow guide the reduce decisions.
> Now, however, it appears that YACC is simply looking blindly for
> syntactic token chains and reducing on them with absolutely no notion
> of who might be interested in the result. Kind of takes the wind out
> of my ancient understandings of YACC.
>
Quite right. YACC, as you no doubt know, is LALR(1), which expands to
"Look Ahead Left-Right 1, which is a fancy way of saying basically two
things:
1) Only one token at a time is taken into account in reductions.
2) Right-recursion will screw you over.
More in a moment.
> [So I guess I was writing my grammars as though there were some kind
> of recursive descent engine in control. Not apparently so...]
>
Right! And hand-written recursive-descent parsers are almost always
LL(1):
1) Only one token at a time is taken into account in reductions.
2) Left-recursion will screw you over.
If you're accustomed to writing LL(1) grammars, you'll need to factor
right recursion to left recursion in order to go LALR(1). But you're
still left with that pesky (1), which really seems to be what your post
is about. So forget about LALR vs. LL and let's look at some solutions
to the (1) problem.
O'Caml as a suite already has some quite nice parsing and lexing tools,
and third parties have made others. Let me first recommend that you
have a look at the Camlp4 tutorial at
<http://caml.inria.fr/camlp4/tutorial>. Camlp4 is O'Caml's
"preprocessor," but unfortunately that term has become one of rather
ill repute thanks to the C/C++ communities. Camlp4 is much closer in
expressive power and safety to Scheme's hygenic macros or to a good
LL-based parser-generator system, as the tutorial makes abundantly
clear.
However, as nice as Camlp4 undeniably is, it still leaves you with your
(1) problem barring lexer hacks as described at
<http://caml.inria.fr/camlp4/tutorial/tutorial003.html>: "The input 'A
C' raises Stream.Error. There is no simple solution to this problem,
but there is a solution (not very clean, actually): create a entry from
a parser (it is possible via the function Grammar.Entry.of_parser).
This parser must scan the stream using Stream.npeek until the number of
necessary tokens allows to make the difference; return unit or raise
Stream.Failure according to the cases." This is the traditional "smart
lexer" approach to handling ambiguous (i.e. real-world) grammars in
either an LALR(1) (ocamlyacc) or LL(1) (Camlp4) framework.
A lot of work has been done on relaxing the (1) restriction while still
allowing efficient parsing. I won't go into the history and
alternatives, but rather will just refer to
<http://www.cs.berkeley.edu/~smcpeak/elkhound>. Elkhound is a GLR
parser generator that will generate either C++ or O'Caml parsers. Being
GLR, it deals nicely with ambiguities, and will even let you defer
implementing resolution of them until after you've prototyped your
entire grammar if you wish. There's a tutorial at
<http://www.cs.berkeley.edu/~smcpeak/elkhound/sources/elkhound/
tutorial.html> that may be helpful, although it's in C++ terms rather
than O'Caml. Simply putting:
option lang_OCaml;
at the beginning of your grammar will fix that. :-)
> David McClain
> Senior Corporate Scientist
> Avisere, Inc.
>
> david.mcclain@avisere.com
> +1.520.390.7738 (USA)
>
> -------------------
> To unsubscribe, mail caml-list-request@inria.fr Archives:
> http://caml.inria.fr
> Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ:
> http://caml.inria.fr/FAQ/
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>
I hope this is helpful,
Paul Snively
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (Darwin)
iEYEARECAAYFAkEbxDUACgkQbot1wzHBQBXfXgCdHCoAXU3CrE/VhmHozY5+z0sx
GfwAn2pgYKqTnksbdHjK6bqM0V/Dq1gN
=Pddt
-----END PGP SIGNATURE-----
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
next prev parent reply other threads:[~2004-08-12 19:25 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2004-08-12 17:18 David McClain
2004-08-12 18:02 ` Joshua Smith
2004-08-12 18:14 ` David McClain
2004-08-12 19:25 ` Paul Snively [this message]
2004-08-12 21:47 ` Erik de Castro Lopo
2004-08-13 5:22 ` skaller
2004-08-13 5:59 ` David Brown
2004-08-13 14:20 ` Brian Hurt
2004-08-13 4:45 ` skaller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55D5BF02-EC95-11D8-915C-000A27DEEC20@mac.com \
--to=psnively@mac.com \
--cc=David.McClain@Avisere.com \
--cc=caml-list@inria.fr \
--cc=josh@trdlnk.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox