* [Caml-list] Context Free Grammars?
@ 2004-08-12 17:18 David McClain
2004-08-12 18:02 ` Joshua Smith
2004-08-13 4:45 ` skaller
0 siblings, 2 replies; 9+ messages in thread
From: David McClain @ 2004-08-12 17:18 UTC (permalink / raw)
To: caml-list
I think my mind has been poisoned from exposure to recursive descent
parsing...
I am running into a huge number of reduce/reduce conflicts in
OCamlYacc. It is beginning to dawn on me that Yacc really is for
context-free grammars... (that's what they said! only now I'm starting
to realize it..)
So the question is, does OCaml actually have a CFG description? I'm
confused about the similarity of patterns and expression from the
viewpoint of CFG description. They share many similarities, and in the
correct context there can be no confusion.
But when I try to generate a parser it appears that pieces of
expression syntax and pieces of pattern syntax are confusing the
parser. If the parser really ignores any kind of context -- such as the
parent tree for the subproduction -- then the lack of any context
knowledge would certainly be confusing.
Anyone have any hints about syntax transformations so that CFG's can
really be used here? I read a tremendous number of references that
indicate how nasty these reduce/reduce conflicts can be. I believe
them. Trouble is they don't go very far in explaining how to fix these
conflicts, other than to state that "you have a mistake in your
grammar". Some references do hint that syntax description
transformations can become unwieldy and unnatural to read.
I have to stop thinking like recursive descent and try to view the
universe as flat-land...
David McClain
Senior Corporate Scientist
Avisere, Inc.
david.mcclain@avisere.com
+1.520.390.7738 (USA)
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Caml-list] Context Free Grammars?
2004-08-12 17:18 [Caml-list] Context Free Grammars? David McClain
@ 2004-08-12 18:02 ` Joshua Smith
2004-08-12 18:14 ` David McClain
2004-08-13 4:45 ` skaller
1 sibling, 1 reply; 9+ messages in thread
From: Joshua Smith @ 2004-08-12 18:02 UTC (permalink / raw)
To: David McClain; +Cc: caml-list
[-- Attachment #1: Type: text/plain, Size: 193 bytes --]
David McClain wrote:
> I think my mind has been poisoned from exposure to recursive descent
> parsing...
Out of curiosity, why not use recursive descent parsing instead of
ocamlyacc?
-jbs
[-- Attachment #2: josh.vcf --]
[-- Type: text/x-vcard, Size: 277 bytes --]
begin:vcard
fn:Joshua Smith
n:Smith;Joshua
org:;IT
adr:Suite 2300;;200 W. Jackson;Chicago;IL;60606;USA
email;internet:josh@trdlnk.com
title:200 W. Jackson Suite 2300
tel;work:312-264-2046
tel;fax:312-264-2001
tel;cell:312-933-9916
x-mozilla-html:FALSE
version:2.1
end:vcard
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Caml-list] Context Free Grammars?
2004-08-12 18:02 ` Joshua Smith
@ 2004-08-12 18:14 ` David McClain
2004-08-12 19:25 ` Paul Snively
0 siblings, 1 reply; 9+ messages in thread
From: David McClain @ 2004-08-12 18:14 UTC (permalink / raw)
To: Joshua Smith; +Cc: caml-list
> Out of curiosity, why not use recursive descent parsing instead of
> ocamlyacc?
>
Well, not a bad idea... I suppose. It would give me better control over
the course of things. I had the impression that YACC was a much faster
system, being table driven and all. And possibly less bulky in
generated code, but that isn't much of an argument in its favor
today...
The problems I am seeing have to do with factoring the grammar, similar
to factoring threaded code. However the parent trees of these factors
have very different semantic notions about what that fragment should
mean. Recursive descent would certainly handle this better.
I am not a YACC expert (no kidding!?) but I always had the notion that
nonterminal reduction trees would somehow guide the reduce decisions.
Now, however, it appears that YACC is simply looking blindly for
syntactic token chains and reducing on them with absolutely no notion
of who might be interested in the result. Kind of takes the wind out of
my ancient understandings of YACC.
[So I guess I was writing my grammars as though there were some kind of
recursive descent engine in control. Not apparently so...]
David McClain
Senior Corporate Scientist
Avisere, Inc.
david.mcclain@avisere.com
+1.520.390.7738 (USA)
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Caml-list] Context Free Grammars?
2004-08-12 18:14 ` David McClain
@ 2004-08-12 19:25 ` Paul Snively
2004-08-12 21:47 ` Erik de Castro Lopo
2004-08-13 5:22 ` skaller
0 siblings, 2 replies; 9+ messages in thread
From: Paul Snively @ 2004-08-12 19:25 UTC (permalink / raw)
To: David McClain; +Cc: Joshua Smith, caml-list
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi David!
On Aug 12, 2004, at 11:14 AM, David McClain wrote:
> I am not a YACC expert (no kidding!?) but I always had the notion that
> nonterminal reduction trees would somehow guide the reduce decisions.
> Now, however, it appears that YACC is simply looking blindly for
> syntactic token chains and reducing on them with absolutely no notion
> of who might be interested in the result. Kind of takes the wind out
> of my ancient understandings of YACC.
>
Quite right. YACC, as you no doubt know, is LALR(1), which expands to
"Look Ahead Left-Right 1, which is a fancy way of saying basically two
things:
1) Only one token at a time is taken into account in reductions.
2) Right-recursion will screw you over.
More in a moment.
> [So I guess I was writing my grammars as though there were some kind
> of recursive descent engine in control. Not apparently so...]
>
Right! And hand-written recursive-descent parsers are almost always
LL(1):
1) Only one token at a time is taken into account in reductions.
2) Left-recursion will screw you over.
If you're accustomed to writing LL(1) grammars, you'll need to factor
right recursion to left recursion in order to go LALR(1). But you're
still left with that pesky (1), which really seems to be what your post
is about. So forget about LALR vs. LL and let's look at some solutions
to the (1) problem.
O'Caml as a suite already has some quite nice parsing and lexing tools,
and third parties have made others. Let me first recommend that you
have a look at the Camlp4 tutorial at
<http://caml.inria.fr/camlp4/tutorial>. Camlp4 is O'Caml's
"preprocessor," but unfortunately that term has become one of rather
ill repute thanks to the C/C++ communities. Camlp4 is much closer in
expressive power and safety to Scheme's hygenic macros or to a good
LL-based parser-generator system, as the tutorial makes abundantly
clear.
However, as nice as Camlp4 undeniably is, it still leaves you with your
(1) problem barring lexer hacks as described at
<http://caml.inria.fr/camlp4/tutorial/tutorial003.html>: "The input 'A
C' raises Stream.Error. There is no simple solution to this problem,
but there is a solution (not very clean, actually): create a entry from
a parser (it is possible via the function Grammar.Entry.of_parser).
This parser must scan the stream using Stream.npeek until the number of
necessary tokens allows to make the difference; return unit or raise
Stream.Failure according to the cases." This is the traditional "smart
lexer" approach to handling ambiguous (i.e. real-world) grammars in
either an LALR(1) (ocamlyacc) or LL(1) (Camlp4) framework.
A lot of work has been done on relaxing the (1) restriction while still
allowing efficient parsing. I won't go into the history and
alternatives, but rather will just refer to
<http://www.cs.berkeley.edu/~smcpeak/elkhound>. Elkhound is a GLR
parser generator that will generate either C++ or O'Caml parsers. Being
GLR, it deals nicely with ambiguities, and will even let you defer
implementing resolution of them until after you've prototyped your
entire grammar if you wish. There's a tutorial at
<http://www.cs.berkeley.edu/~smcpeak/elkhound/sources/elkhound/
tutorial.html> that may be helpful, although it's in C++ terms rather
than O'Caml. Simply putting:
option lang_OCaml;
at the beginning of your grammar will fix that. :-)
> David McClain
> Senior Corporate Scientist
> Avisere, Inc.
>
> david.mcclain@avisere.com
> +1.520.390.7738 (USA)
>
> -------------------
> To unsubscribe, mail caml-list-request@inria.fr Archives:
> http://caml.inria.fr
> Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ:
> http://caml.inria.fr/FAQ/
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>
I hope this is helpful,
Paul Snively
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (Darwin)
iEYEARECAAYFAkEbxDUACgkQbot1wzHBQBXfXgCdHCoAXU3CrE/VhmHozY5+z0sx
GfwAn2pgYKqTnksbdHjK6bqM0V/Dq1gN
=Pddt
-----END PGP SIGNATURE-----
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Caml-list] Context Free Grammars?
2004-08-12 19:25 ` Paul Snively
@ 2004-08-12 21:47 ` Erik de Castro Lopo
2004-08-13 5:22 ` skaller
1 sibling, 0 replies; 9+ messages in thread
From: Erik de Castro Lopo @ 2004-08-12 21:47 UTC (permalink / raw)
To: caml-list
On Thu, 12 Aug 2004 12:25:14 -0700
Paul Snively <psnively@mac.com> wrote:
<snip>
> A lot of work has been done on relaxing the (1) restriction while still
> allowing efficient parsing. I won't go into the history and
> alternatives, but rather will just refer to
> <http://www.cs.berkeley.edu/~smcpeak/elkhound>. Elkhound is a GLR
> parser generator that will generate either C++ or O'Caml parsers.
Since Bison 1.50, GNU Bison has had an optional GLP parser
http://www.delorie.com/gnu/docs/bison/bison_11.html
although I must admit that using it didn't solve any problems
for me :-).
Erik
--
+-----------------------------------------------------------+
Erik de Castro Lopo nospam@mega-nerd.com (Yes it's valid)
+-----------------------------------------------------------+
"I ran it on my DeathStation 9000 and demons flew out of my nose." --Kaz
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Caml-list] Context Free Grammars?
2004-08-12 17:18 [Caml-list] Context Free Grammars? David McClain
2004-08-12 18:02 ` Joshua Smith
@ 2004-08-13 4:45 ` skaller
1 sibling, 0 replies; 9+ messages in thread
From: skaller @ 2004-08-13 4:45 UTC (permalink / raw)
To: David McClain; +Cc: caml-list
On Fri, 2004-08-13 at 03:18, David McClain wrote:
> Anyone have any hints about syntax transformations so that CFG's can
> really be used here?
Yeah, I've been through this pain and still bump into it
a lot.
There are two tricks. The first is to change your thinking
to *bottom up*. The grammar names 'syntactic fragments'
of increasing size and complexity as it sees them,
rather than going searching for some wholistic shape.
So name your fragments uniquely; think in terms:
"the terms I'm parsing have intrinsic (synthetic) structure"
that is passed upwards, rather than "I'm looking down
the tree for something I expect, if I don't find it
I will try something else"
The second trick is: make your grammar coarse grained
and far too general. Don't use the LALR(1) parser to
enforce constraints. Do that in the { executable code }
part or in post processing. Type checking is the obvious
example of that.
In the Felix grammar (which is LALR(1) and entirely
unambiguous), I use exactly the same coarse syntax for
executable expressions and for type annotations.
In both cases I allow x + y * z (yup, Felix has anonymous
sum types). So when I parse
(x + y * z) : ( t + u * v)
// executable : type annotation
I use (the moral equivalent of):
expr COLON expr {
let x = $1 in
let t = expr_as_type $3 in
`Coercion (x,y)
}
Since not all executable expressions are type expressions,
I trap that in the function 'expr_as_type' and throw
an exception -- which produces a vastly superior error message
to 'Syntax Error' that is the best the parser can produce
automatically.
Finally: if you are parsing a *nasty* language, such as Python,
that isn't even remotely LALR(1), you can still use a LALR(1)
grammar with some care are trickery, to do a lot of the work.
To parse Python, I wrote multi-stage 'token filter' to preprocess
the token stream, generating 'INDENT' and 'UNDENT' tokens,
for example (Python uses indentation to specify block structure).
Another nastiness in Python is (a,) for a unary tuple: that trailing
comma is allowed in pairs too: (a,b,) and it really screws up
LALR(1) parsing.
It took around 10 separate passes to generate a list of tokens
that I could more easily parse with Ocamlyacc.
--
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850,
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Caml-list] Context Free Grammars?
2004-08-12 19:25 ` Paul Snively
2004-08-12 21:47 ` Erik de Castro Lopo
@ 2004-08-13 5:22 ` skaller
2004-08-13 5:59 ` David Brown
2004-08-13 14:20 ` Brian Hurt
1 sibling, 2 replies; 9+ messages in thread
From: skaller @ 2004-08-13 5:22 UTC (permalink / raw)
To: Paul Snively; +Cc: David McClain, Joshua Smith, caml-list
On Fri, 2004-08-13 at 05:25, Paul Snively wrote:
> 1) Only one token at a time is taken into account in reductions.
> 2) Right-recursion will screw you over.
Actually Ocamlyacc handles right recursion just fine.
I have no idea why .. but it does, in fact its the
prefered form in many cases: here's an example
of BOTH left and right recursion being used:
tlelement:
| lelement COLON factor { $1,Some (typecode_of_expr $3) }
| lelement { $1,None }
lexprs:
| tlelement COMMA lexprs { $1 :: $3 }
| tlelement { [$1] }
This works -- its part of the Felix grammar: tlelement
is left recursive, lexprs is right recursive,
and the reason is clear -- its the most efficient
way to build a list.
Perhaps someone can explain why this actually works ..??
--
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850,
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Caml-list] Context Free Grammars?
2004-08-13 5:22 ` skaller
@ 2004-08-13 5:59 ` David Brown
2004-08-13 14:20 ` Brian Hurt
1 sibling, 0 replies; 9+ messages in thread
From: David Brown @ 2004-08-13 5:59 UTC (permalink / raw)
To: skaller; +Cc: Paul Snively, David McClain, Joshua Smith, caml-list
On Fri, Aug 13, 2004 at 03:22:28PM +1000, skaller wrote:
> tlelement:
> | lelement COLON factor { $1,Some (typecode_of_expr $3) }
> | lelement { $1,None }
>
> lexprs:
> | tlelement COMMA lexprs { $1 :: $3 }
> | tlelement { [$1] }
>
> This works -- its part of the Felix grammar: tlelement
> is left recursive, lexprs is right recursive,
> and the reason is clear -- its the most efficient
> way to build a list.
>
> Perhaps someone can explain why this actually works ..??
It is similar to tail recursion and non-tail recursion in ocaml code. The
right recursive one works (at least some forms work), but you end up
shifting the entire expression off and not reducing the entire thing until
the end. In other words, even if you can use right recursion, the left
recursive version will be more efficient.
I think there are situations where a right recursive version wouldn't be
able to parse, but a left one would. But, it is too late, and I don't
think I can think about it right now :-)
Dave
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Caml-list] Context Free Grammars?
2004-08-13 5:22 ` skaller
2004-08-13 5:59 ` David Brown
@ 2004-08-13 14:20 ` Brian Hurt
1 sibling, 0 replies; 9+ messages in thread
From: Brian Hurt @ 2004-08-13 14:20 UTC (permalink / raw)
To: skaller; +Cc: caml-list
On 13 Aug 2004, skaller wrote:
> Perhaps someone can explain why this actually works ..??
Right recursion "works". The problem is that LALR parsers can do
something akin to tail recursion with left recursion that they can't do
with right recursion. Say, for example, you have the rules:
mul_expr: mul_expr '*' term
| mul_expr '/' term
| mul_expr MOD term
| term
term: NUMBER
Here, mul_expr is "left recursive"- the "recursive" call to mul_expr is on
the left hand side of the expression. Now, with an LALR parser, no mater
how many multiplications you had, it'd use constant stack space. Each
time it'd see a new multiplication term, it'd immediately collapse back
down to a mul_expr. An equally valid way to parse this would be right
recursively:
mul_expr: term '*' mul_expr
| term '/' mul_expr
| term MOD mul_expr
| term
The problem here is that if you had N multiplications in a row, it'd take
O(N) stack spaces on the parse tree to parse them all. This would still
work- right up until you hit the point where you have so many
multiplication terms that you blow stack.
It works, and there are times when it's the correct thing to use, but if
you have the choice between left recursive and right recursive grammars,
pick left recursive for LALR parsers.
--
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
- Gene Spafford
Brian
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2004-08-13 14:12 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-08-12 17:18 [Caml-list] Context Free Grammars? David McClain
2004-08-12 18:02 ` Joshua Smith
2004-08-12 18:14 ` David McClain
2004-08-12 19:25 ` Paul Snively
2004-08-12 21:47 ` Erik de Castro Lopo
2004-08-13 5:22 ` skaller
2004-08-13 5:59 ` David Brown
2004-08-13 14:20 ` Brian Hurt
2004-08-13 4:45 ` skaller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox