[Caml-list] Context Free Grammars?

Mailing list for all users of the OCaml language and system.
 help / color / mirror / Atom feed

* [Caml-list] Context Free Grammars?
@ 2004-08-12 17:18 David McClain
  2004-08-12 18:02 ` Joshua Smith
  2004-08-13  4:45 ` skaller
  0 siblings, 2 replies; 9+ messages in thread
From: David McClain @ 2004-08-12 17:18 UTC (permalink / raw)
  To: caml-list

I think my mind has been poisoned from exposure to recursive descent 
parsing...

I am running into a huge number of reduce/reduce conflicts in 
OCamlYacc. It is beginning to dawn on me that Yacc really is for 
context-free grammars... (that's what they said! only now I'm starting 
to realize it..)

So the question is, does OCaml actually have a CFG description? I'm 
confused about the similarity of patterns and expression from the 
viewpoint of CFG description. They share many similarities, and in the 
correct context there can be no confusion.

But when I try to generate a parser it appears that pieces of 
expression syntax and pieces of pattern syntax are confusing the 
parser. If the parser really ignores any kind of context -- such as the 
parent tree for the subproduction -- then the lack of any context 
knowledge would certainly be confusing.

Anyone have any hints about syntax transformations so that CFG's can 
really be used here? I read a tremendous number of references that 
indicate how nasty these reduce/reduce conflicts can be. I believe 
them. Trouble is they don't go very far in explaining how to fix these 
conflicts, other than to state that "you have a mistake in your 
grammar".  Some references do hint that syntax description 
transformations can become unwieldy and unnatural to read.

I have to stop thinking like recursive descent and try to view the 
universe as flat-land...

David McClain
Senior Corporate Scientist
Avisere, Inc.

david.mcclain@avisere.com
+1.520.390.7738 (USA)

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Context Free Grammars?
  2004-08-12 17:18 [Caml-list] Context Free Grammars? David McClain
@ 2004-08-12 18:02 ` Joshua Smith
  2004-08-12 18:14   ` David McClain
  2004-08-13  4:45 ` skaller
  1 sibling, 1 reply; 9+ messages in thread
From: Joshua Smith @ 2004-08-12 18:02 UTC (permalink / raw)
  To: David McClain; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 193 bytes --]

David McClain wrote:

> I think my mind has been poisoned from exposure to recursive descent
> parsing...


Out of curiosity, why not use recursive descent parsing instead of
ocamlyacc?

-jbs


[-- Attachment #2: josh.vcf --]
[-- Type: text/x-vcard, Size: 277 bytes --]

begin:vcard
fn:Joshua  Smith
n:Smith;Joshua 
org:;IT
adr:Suite 2300;;200 W. Jackson;Chicago;IL;60606;USA
email;internet:josh@trdlnk.com
title:200 W. Jackson Suite 2300
tel;work:312-264-2046
tel;fax:312-264-2001
tel;cell:312-933-9916
x-mozilla-html:FALSE
version:2.1
end:vcard


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Context Free Grammars?
  2004-08-12 18:02 ` Joshua Smith
@ 2004-08-12 18:14   ` David McClain
  2004-08-12 19:25     ` Paul Snively
  0 siblings, 1 reply; 9+ messages in thread
From: David McClain @ 2004-08-12 18:14 UTC (permalink / raw)
  To: Joshua Smith; +Cc: caml-list

> Out of curiosity, why not use recursive descent parsing instead of
> ocamlyacc?
>

Well, not a bad idea... I suppose. It would give me better control over 
the course of things. I had the impression that YACC was a much faster 
system, being table driven and all. And possibly less bulky in 
generated code, but that isn't much of an argument in its favor 
today...

The problems I am seeing have to do with factoring the grammar, similar 
to factoring threaded code. However the parent trees of these factors 
have very different semantic notions about what that fragment should 
mean. Recursive descent would certainly handle this better.

I am not a YACC expert (no kidding!?) but I always had the notion that 
nonterminal reduction trees would somehow guide the reduce decisions. 
Now, however, it appears that YACC is simply looking blindly for 
syntactic token chains and reducing on them with absolutely no notion 
of who might be interested in the result. Kind of takes the wind out of 
my ancient understandings of YACC.

[So I guess I was writing my grammars as though there were some kind of 
recursive descent engine in control. Not apparently so...]

David McClain
Senior Corporate Scientist
Avisere, Inc.

david.mcclain@avisere.com
+1.520.390.7738 (USA)

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Context Free Grammars?
  2004-08-12 18:14   ` David McClain
@ 2004-08-12 19:25     ` Paul Snively
  2004-08-12 21:47       ` Erik de Castro Lopo
  2004-08-13  5:22       ` skaller
  0 siblings, 2 replies; 9+ messages in thread
From: Paul Snively @ 2004-08-12 19:25 UTC (permalink / raw)
  To: David McClain; +Cc: Joshua Smith, caml-list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi David!

On Aug 12, 2004, at 11:14 AM, David McClain wrote:

> I am not a YACC expert (no kidding!?) but I always had the notion that  
> nonterminal reduction trees would somehow guide the reduce decisions.  
> Now, however, it appears that YACC is simply looking blindly for  
> syntactic token chains and reducing on them with absolutely no notion  
> of who might be interested in the result. Kind of takes the wind out  
> of my ancient understandings of YACC.
>
Quite right. YACC, as you no doubt know, is LALR(1), which expands to  
"Look Ahead Left-Right 1, which is a fancy way of saying basically two  
things:

1) Only one token at a time is taken into account in reductions.
2) Right-recursion will screw you over.

More in a moment.

> [So I guess I was writing my grammars as though there were some kind  
> of recursive descent engine in control. Not apparently so...]
>
Right! And hand-written recursive-descent parsers are almost always  
LL(1):

1) Only one token at a time is taken into account in reductions.
2) Left-recursion will screw you over.

If you're accustomed to writing LL(1) grammars, you'll need to factor  
right recursion to left recursion in order to go LALR(1). But you're  
still left with that pesky (1), which really seems to be what your post  
is about. So forget about LALR vs. LL and let's look at some solutions  
to the (1) problem.

O'Caml as a suite already has some quite nice parsing and lexing tools,  
and third parties have made others. Let me first recommend that you  
have a look at the Camlp4 tutorial at  
<http://caml.inria.fr/camlp4/tutorial>. Camlp4 is O'Caml's  
"preprocessor," but unfortunately that term has become one of rather  
ill repute thanks to the C/C++ communities. Camlp4 is much closer in  
expressive power and safety to Scheme's hygenic macros or to a good  
LL-based parser-generator system, as the tutorial makes abundantly  
clear.

However, as nice as Camlp4 undeniably is, it still leaves you with your  
(1) problem barring lexer hacks as described at  
<http://caml.inria.fr/camlp4/tutorial/tutorial003.html>: "The input 'A  
C' raises Stream.Error. There is no simple solution to this problem,  
but there is a solution (not very clean, actually): create a entry from  
a parser (it is possible via the function Grammar.Entry.of_parser).  
This parser must scan the stream using Stream.npeek until the number of  
necessary tokens allows to make the difference; return unit or raise  
Stream.Failure according to the cases." This is the traditional "smart  
lexer" approach to handling ambiguous (i.e. real-world) grammars in  
either an LALR(1) (ocamlyacc) or LL(1) (Camlp4) framework.

A lot of work has been done on relaxing the (1) restriction while still  
allowing efficient parsing. I won't go into the history and  
alternatives, but rather will just refer to  
<http://www.cs.berkeley.edu/~smcpeak/elkhound>. Elkhound is a GLR  
parser generator that will generate either C++ or O'Caml parsers. Being  
GLR, it deals nicely with ambiguities, and will even let you defer  
implementing resolution of them until after you've prototyped your  
entire grammar if you wish. There's a tutorial at  
<http://www.cs.berkeley.edu/~smcpeak/elkhound/sources/elkhound/ 
tutorial.html> that may be helpful, although it's in C++ terms rather  
than O'Caml. Simply putting:

option lang_OCaml;

at the beginning of your grammar will fix that. :-)

> David McClain
> Senior Corporate Scientist
> Avisere, Inc.
>
> david.mcclain@avisere.com
> +1.520.390.7738 (USA)
>
> -------------------
> To unsubscribe, mail caml-list-request@inria.fr Archives:  
> http://caml.inria.fr
> Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ:  
> http://caml.inria.fr/FAQ/
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>
I hope this is helpful,
Paul Snively

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (Darwin)

iEYEARECAAYFAkEbxDUACgkQbot1wzHBQBXfXgCdHCoAXU3CrE/VhmHozY5+z0sx
GfwAn2pgYKqTnksbdHjK6bqM0V/Dq1gN
=Pddt
-----END PGP SIGNATURE-----

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Context Free Grammars?
  2004-08-12 19:25     ` Paul Snively
@ 2004-08-12 21:47       ` Erik de Castro Lopo
  2004-08-13  5:22       ` skaller
  1 sibling, 0 replies; 9+ messages in thread
From: Erik de Castro Lopo @ 2004-08-12 21:47 UTC (permalink / raw)
  To: caml-list

On Thu, 12 Aug 2004 12:25:14 -0700
Paul Snively <psnively@mac.com> wrote:

<snip>


> A lot of work has been done on relaxing the (1) restriction while still  
> allowing efficient parsing. I won't go into the history and  
> alternatives, but rather will just refer to  
> <http://www.cs.berkeley.edu/~smcpeak/elkhound>. Elkhound is a GLR  
> parser generator that will generate either C++ or O'Caml parsers. 

Since Bison 1.50, GNU Bison has had an optional GLP parser 
    http://www.delorie.com/gnu/docs/bison/bison_11.html

although I must admit that using it didn't solve any problems 
for me :-).

Erik
-- 
+-----------------------------------------------------------+
  Erik de Castro Lopo  nospam@mega-nerd.com (Yes it's valid)
+-----------------------------------------------------------+
"I ran it on my DeathStation 9000 and demons flew out of my nose." --Kaz

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Context Free Grammars?
  2004-08-12 17:18 [Caml-list] Context Free Grammars? David McClain
  2004-08-12 18:02 ` Joshua Smith
@ 2004-08-13  4:45 ` skaller
  1 sibling, 0 replies; 9+ messages in thread
From: skaller @ 2004-08-13  4:45 UTC (permalink / raw)
  To: David McClain; +Cc: caml-list

On Fri, 2004-08-13 at 03:18, David McClain wrote:

> Anyone have any hints about syntax transformations so that CFG's can 
> really be used here? 

Yeah, I've been through this pain and still bump into it
a lot.

There are two tricks. The first is to change your thinking
to *bottom up*. The grammar names 'syntactic fragments'
of increasing size and complexity as it sees them,
rather than going searching for some wholistic shape.
So name your fragments uniquely; think in terms:

"the terms I'm parsing have intrinsic (synthetic) structure" 
that is passed upwards, rather than "I'm looking down
the tree for something I expect, if I don't find it
I will try something else"

The second trick is: make your grammar coarse grained
and far too general. Don't use the LALR(1) parser to
enforce constraints. Do that in the { executable code }
part or in post processing. Type checking is the obvious
example of that.

In the Felix grammar (which is LALR(1) and entirely
unambiguous), I use exactly the same coarse syntax for
executable expressions and for type annotations. 

In both cases I allow x + y * z (yup, Felix has anonymous
sum types). So when I parse

	(x + y * z) : ( t + u * v)
	// executable  : type annotation

I use (the moral equivalent of):

	expr COLON expr {
		let x = $1 in
		let t = expr_as_type $3 in
		`Coercion (x,y)
	}

Since not all executable expressions are type expressions,
I trap that in the function 'expr_as_type' and throw
an exception -- which produces a vastly superior error message
to 'Syntax Error' that is the best the parser can produce
automatically.

Finally: if you are parsing a *nasty* language, such as Python,
that isn't even remotely LALR(1), you can still use a LALR(1)
grammar with some care are trickery, to do a lot of the work.
To parse Python, I wrote  multi-stage 'token filter' to preprocess
the token stream, generating 'INDENT' and 'UNDENT' tokens,
for example (Python uses indentation to specify block structure).
Another nastiness in Python is (a,) for a unary tuple: that trailing
comma is allowed in pairs too: (a,b,) and it really screws up
LALR(1) parsing.

It took around 10 separate passes to generate a list of tokens
that I could more easily parse with Ocamlyacc.

-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Context Free Grammars?
  2004-08-12 19:25     ` Paul Snively
  2004-08-12 21:47       ` Erik de Castro Lopo
@ 2004-08-13  5:22       ` skaller
  2004-08-13  5:59         ` David Brown
  2004-08-13 14:20         ` Brian Hurt
  1 sibling, 2 replies; 9+ messages in thread
From: skaller @ 2004-08-13  5:22 UTC (permalink / raw)
  To: Paul Snively; +Cc: David McClain, Joshua Smith, caml-list

On Fri, 2004-08-13 at 05:25, Paul Snively wrote:

> 1) Only one token at a time is taken into account in reductions.
> 2) Right-recursion will screw you over.

Actually Ocamlyacc handles right recursion just fine.
I have no idea why .. but it does, in fact its the
prefered form in many cases: here's an example
of BOTH left and right recursion being used:
 
tlelement:
  | lelement COLON factor { $1,Some (typecode_of_expr $3) } 
  | lelement { $1,None }

lexprs:
  | tlelement COMMA lexprs { $1 :: $3 }
  | tlelement { [$1] }


This works -- its part of the Felix grammar: tlelement
is left recursive, lexprs is right recursive,
and the reason is clear -- its the most efficient
way to build a list. 

Perhaps someone can explain why this actually works ..??

-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net



-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Context Free Grammars?
  2004-08-13  5:22       ` skaller
@ 2004-08-13  5:59         ` David Brown
  2004-08-13 14:20         ` Brian Hurt
  1 sibling, 0 replies; 9+ messages in thread
From: David Brown @ 2004-08-13  5:59 UTC (permalink / raw)
  To: skaller; +Cc: Paul Snively, David McClain, Joshua Smith, caml-list

On Fri, Aug 13, 2004 at 03:22:28PM +1000, skaller wrote:

> tlelement:
>   | lelement COLON factor { $1,Some (typecode_of_expr $3) } 
>   | lelement { $1,None }
> 
> lexprs:
>   | tlelement COMMA lexprs { $1 :: $3 }
>   | tlelement { [$1] }
> 
> This works -- its part of the Felix grammar: tlelement
> is left recursive, lexprs is right recursive,
> and the reason is clear -- its the most efficient
> way to build a list. 
> 
> Perhaps someone can explain why this actually works ..??

It is similar to tail recursion and non-tail recursion in ocaml code.  The
right recursive one works (at least some forms work), but you end up
shifting the entire expression off and not reducing the entire thing until
the end.  In other words, even if you can use right recursion, the left
recursive version will be more efficient.

I think there are situations where a right recursive version wouldn't be
able to parse, but a left one would.  But, it is too late, and I don't
think I can think about it right now :-)

Dave

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Context Free Grammars?
  2004-08-13  5:22       ` skaller
  2004-08-13  5:59         ` David Brown
@ 2004-08-13 14:20         ` Brian Hurt
  1 sibling, 0 replies; 9+ messages in thread
From: Brian Hurt @ 2004-08-13 14:20 UTC (permalink / raw)
  To: skaller; +Cc: caml-list

On 13 Aug 2004, skaller wrote:

> Perhaps someone can explain why this actually works ..??

Right recursion "works".  The problem is that LALR parsers can do 
something akin to tail recursion with left recursion that they can't do 
with right recursion.  Say, for example, you have the rules:

mul_expr: mul_expr '*' term
        | mul_expr '/' term
        | mul_expr MOD term
        | term

term: NUMBER

Here, mul_expr is "left recursive"- the "recursive" call to mul_expr is on
the left hand side of the expression.  Now, with an LALR parser, no mater
how many multiplications you had, it'd use constant stack space.  Each
time it'd see a new multiplication term, it'd immediately collapse back
down to a mul_expr.  An equally valid way to parse this would be right 
recursively:

mul_expr: term '*' mul_expr
        | term '/' mul_expr
        | term MOD mul_expr
        | term

The problem here is that if you had N multiplications in a row, it'd take 
O(N) stack spaces on the parse tree to parse them all.  This would still 
work- right up until you hit the point where you have so many 
multiplication terms that you blow stack.

It works, and there are times when it's the correct thing to use, but if 
you have the choice between left recursive and right recursive grammars, 
pick left recursive for LALR parsers.

-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford 
Brian

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2004-08-13 14:12 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-08-12 17:18 [Caml-list] Context Free Grammars? David McClain
2004-08-12 18:02 ` Joshua Smith
2004-08-12 18:14   ` David McClain
2004-08-12 19:25     ` Paul Snively
2004-08-12 21:47       ` Erik de Castro Lopo
2004-08-13  5:22       ` skaller
2004-08-13  5:59         ` David Brown
2004-08-13 14:20         ` Brian Hurt
2004-08-13  4:45 ` skaller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox