* [Caml-list] Context Free Grammars? @ 2004-08-12 17:18 David McClain 2004-08-12 18:02 ` Joshua Smith 2004-08-13 4:45 ` skaller 0 siblings, 2 replies; 9+ messages in thread From: David McClain @ 2004-08-12 17:18 UTC (permalink / raw) To: caml-list I think my mind has been poisoned from exposure to recursive descent parsing... I am running into a huge number of reduce/reduce conflicts in OCamlYacc. It is beginning to dawn on me that Yacc really is for context-free grammars... (that's what they said! only now I'm starting to realize it..) So the question is, does OCaml actually have a CFG description? I'm confused about the similarity of patterns and expression from the viewpoint of CFG description. They share many similarities, and in the correct context there can be no confusion. But when I try to generate a parser it appears that pieces of expression syntax and pieces of pattern syntax are confusing the parser. If the parser really ignores any kind of context -- such as the parent tree for the subproduction -- then the lack of any context knowledge would certainly be confusing. Anyone have any hints about syntax transformations so that CFG's can really be used here? I read a tremendous number of references that indicate how nasty these reduce/reduce conflicts can be. I believe them. Trouble is they don't go very far in explaining how to fix these conflicts, other than to state that "you have a mistake in your grammar". Some references do hint that syntax description transformations can become unwieldy and unnatural to read. I have to stop thinking like recursive descent and try to view the universe as flat-land... David McClain Senior Corporate Scientist Avisere, Inc. david.mcclain@avisere.com +1.520.390.7738 (USA) ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Caml-list] Context Free Grammars? 2004-08-12 17:18 [Caml-list] Context Free Grammars? David McClain @ 2004-08-12 18:02 ` Joshua Smith 2004-08-12 18:14 ` David McClain 2004-08-13 4:45 ` skaller 1 sibling, 1 reply; 9+ messages in thread From: Joshua Smith @ 2004-08-12 18:02 UTC (permalink / raw) To: David McClain; +Cc: caml-list [-- Attachment #1: Type: text/plain, Size: 193 bytes --] David McClain wrote: > I think my mind has been poisoned from exposure to recursive descent > parsing... Out of curiosity, why not use recursive descent parsing instead of ocamlyacc? -jbs [-- Attachment #2: josh.vcf --] [-- Type: text/x-vcard, Size: 277 bytes --] begin:vcard fn:Joshua Smith n:Smith;Joshua org:;IT adr:Suite 2300;;200 W. Jackson;Chicago;IL;60606;USA email;internet:josh@trdlnk.com title:200 W. Jackson Suite 2300 tel;work:312-264-2046 tel;fax:312-264-2001 tel;cell:312-933-9916 x-mozilla-html:FALSE version:2.1 end:vcard ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Caml-list] Context Free Grammars? 2004-08-12 18:02 ` Joshua Smith @ 2004-08-12 18:14 ` David McClain 2004-08-12 19:25 ` Paul Snively 0 siblings, 1 reply; 9+ messages in thread From: David McClain @ 2004-08-12 18:14 UTC (permalink / raw) To: Joshua Smith; +Cc: caml-list > Out of curiosity, why not use recursive descent parsing instead of > ocamlyacc? > Well, not a bad idea... I suppose. It would give me better control over the course of things. I had the impression that YACC was a much faster system, being table driven and all. And possibly less bulky in generated code, but that isn't much of an argument in its favor today... The problems I am seeing have to do with factoring the grammar, similar to factoring threaded code. However the parent trees of these factors have very different semantic notions about what that fragment should mean. Recursive descent would certainly handle this better. I am not a YACC expert (no kidding!?) but I always had the notion that nonterminal reduction trees would somehow guide the reduce decisions. Now, however, it appears that YACC is simply looking blindly for syntactic token chains and reducing on them with absolutely no notion of who might be interested in the result. Kind of takes the wind out of my ancient understandings of YACC. [So I guess I was writing my grammars as though there were some kind of recursive descent engine in control. Not apparently so...] David McClain Senior Corporate Scientist Avisere, Inc. david.mcclain@avisere.com +1.520.390.7738 (USA) ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Caml-list] Context Free Grammars? 2004-08-12 18:14 ` David McClain @ 2004-08-12 19:25 ` Paul Snively 2004-08-12 21:47 ` Erik de Castro Lopo 2004-08-13 5:22 ` skaller 0 siblings, 2 replies; 9+ messages in thread From: Paul Snively @ 2004-08-12 19:25 UTC (permalink / raw) To: David McClain; +Cc: Joshua Smith, caml-list -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi David! On Aug 12, 2004, at 11:14 AM, David McClain wrote: > I am not a YACC expert (no kidding!?) but I always had the notion that > nonterminal reduction trees would somehow guide the reduce decisions. > Now, however, it appears that YACC is simply looking blindly for > syntactic token chains and reducing on them with absolutely no notion > of who might be interested in the result. Kind of takes the wind out > of my ancient understandings of YACC. > Quite right. YACC, as you no doubt know, is LALR(1), which expands to "Look Ahead Left-Right 1, which is a fancy way of saying basically two things: 1) Only one token at a time is taken into account in reductions. 2) Right-recursion will screw you over. More in a moment. > [So I guess I was writing my grammars as though there were some kind > of recursive descent engine in control. Not apparently so...] > Right! And hand-written recursive-descent parsers are almost always LL(1): 1) Only one token at a time is taken into account in reductions. 2) Left-recursion will screw you over. If you're accustomed to writing LL(1) grammars, you'll need to factor right recursion to left recursion in order to go LALR(1). But you're still left with that pesky (1), which really seems to be what your post is about. So forget about LALR vs. LL and let's look at some solutions to the (1) problem. O'Caml as a suite already has some quite nice parsing and lexing tools, and third parties have made others. Let me first recommend that you have a look at the Camlp4 tutorial at <http://caml.inria.fr/camlp4/tutorial>. Camlp4 is O'Caml's "preprocessor," but unfortunately that term has become one of rather ill repute thanks to the C/C++ communities. Camlp4 is much closer in expressive power and safety to Scheme's hygenic macros or to a good LL-based parser-generator system, as the tutorial makes abundantly clear. However, as nice as Camlp4 undeniably is, it still leaves you with your (1) problem barring lexer hacks as described at <http://caml.inria.fr/camlp4/tutorial/tutorial003.html>: "The input 'A C' raises Stream.Error. There is no simple solution to this problem, but there is a solution (not very clean, actually): create a entry from a parser (it is possible via the function Grammar.Entry.of_parser). This parser must scan the stream using Stream.npeek until the number of necessary tokens allows to make the difference; return unit or raise Stream.Failure according to the cases." This is the traditional "smart lexer" approach to handling ambiguous (i.e. real-world) grammars in either an LALR(1) (ocamlyacc) or LL(1) (Camlp4) framework. A lot of work has been done on relaxing the (1) restriction while still allowing efficient parsing. I won't go into the history and alternatives, but rather will just refer to <http://www.cs.berkeley.edu/~smcpeak/elkhound>. Elkhound is a GLR parser generator that will generate either C++ or O'Caml parsers. Being GLR, it deals nicely with ambiguities, and will even let you defer implementing resolution of them until after you've prototyped your entire grammar if you wish. There's a tutorial at <http://www.cs.berkeley.edu/~smcpeak/elkhound/sources/elkhound/ tutorial.html> that may be helpful, although it's in C++ terms rather than O'Caml. Simply putting: option lang_OCaml; at the beginning of your grammar will fix that. :-) > David McClain > Senior Corporate Scientist > Avisere, Inc. > > david.mcclain@avisere.com > +1.520.390.7738 (USA) > > ------------------- > To unsubscribe, mail caml-list-request@inria.fr Archives: > http://caml.inria.fr > Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: > http://caml.inria.fr/FAQ/ > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > I hope this is helpful, Paul Snively -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (Darwin) iEYEARECAAYFAkEbxDUACgkQbot1wzHBQBXfXgCdHCoAXU3CrE/VhmHozY5+z0sx GfwAn2pgYKqTnksbdHjK6bqM0V/Dq1gN =Pddt -----END PGP SIGNATURE----- ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Caml-list] Context Free Grammars? 2004-08-12 19:25 ` Paul Snively @ 2004-08-12 21:47 ` Erik de Castro Lopo 2004-08-13 5:22 ` skaller 1 sibling, 0 replies; 9+ messages in thread From: Erik de Castro Lopo @ 2004-08-12 21:47 UTC (permalink / raw) To: caml-list On Thu, 12 Aug 2004 12:25:14 -0700 Paul Snively <psnively@mac.com> wrote: <snip> > A lot of work has been done on relaxing the (1) restriction while still > allowing efficient parsing. I won't go into the history and > alternatives, but rather will just refer to > <http://www.cs.berkeley.edu/~smcpeak/elkhound>. Elkhound is a GLR > parser generator that will generate either C++ or O'Caml parsers. Since Bison 1.50, GNU Bison has had an optional GLP parser http://www.delorie.com/gnu/docs/bison/bison_11.html although I must admit that using it didn't solve any problems for me :-). Erik -- +-----------------------------------------------------------+ Erik de Castro Lopo nospam@mega-nerd.com (Yes it's valid) +-----------------------------------------------------------+ "I ran it on my DeathStation 9000 and demons flew out of my nose." --Kaz ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Caml-list] Context Free Grammars? 2004-08-12 19:25 ` Paul Snively 2004-08-12 21:47 ` Erik de Castro Lopo @ 2004-08-13 5:22 ` skaller 2004-08-13 5:59 ` David Brown 2004-08-13 14:20 ` Brian Hurt 1 sibling, 2 replies; 9+ messages in thread From: skaller @ 2004-08-13 5:22 UTC (permalink / raw) To: Paul Snively; +Cc: David McClain, Joshua Smith, caml-list On Fri, 2004-08-13 at 05:25, Paul Snively wrote: > 1) Only one token at a time is taken into account in reductions. > 2) Right-recursion will screw you over. Actually Ocamlyacc handles right recursion just fine. I have no idea why .. but it does, in fact its the prefered form in many cases: here's an example of BOTH left and right recursion being used: tlelement: | lelement COLON factor { $1,Some (typecode_of_expr $3) } | lelement { $1,None } lexprs: | tlelement COMMA lexprs { $1 :: $3 } | tlelement { [$1] } This works -- its part of the Felix grammar: tlelement is left recursive, lexprs is right recursive, and the reason is clear -- its the most efficient way to build a list. Perhaps someone can explain why this actually works ..?? -- John Skaller, mailto:skaller@users.sf.net voice: 061-2-9660-0850, snail: PO BOX 401 Glebe NSW 2037 Australia Checkout the Felix programming language http://felix.sf.net ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Caml-list] Context Free Grammars? 2004-08-13 5:22 ` skaller @ 2004-08-13 5:59 ` David Brown 2004-08-13 14:20 ` Brian Hurt 1 sibling, 0 replies; 9+ messages in thread From: David Brown @ 2004-08-13 5:59 UTC (permalink / raw) To: skaller; +Cc: Paul Snively, David McClain, Joshua Smith, caml-list On Fri, Aug 13, 2004 at 03:22:28PM +1000, skaller wrote: > tlelement: > | lelement COLON factor { $1,Some (typecode_of_expr $3) } > | lelement { $1,None } > > lexprs: > | tlelement COMMA lexprs { $1 :: $3 } > | tlelement { [$1] } > > This works -- its part of the Felix grammar: tlelement > is left recursive, lexprs is right recursive, > and the reason is clear -- its the most efficient > way to build a list. > > Perhaps someone can explain why this actually works ..?? It is similar to tail recursion and non-tail recursion in ocaml code. The right recursive one works (at least some forms work), but you end up shifting the entire expression off and not reducing the entire thing until the end. In other words, even if you can use right recursion, the left recursive version will be more efficient. I think there are situations where a right recursive version wouldn't be able to parse, but a left one would. But, it is too late, and I don't think I can think about it right now :-) Dave ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Caml-list] Context Free Grammars? 2004-08-13 5:22 ` skaller 2004-08-13 5:59 ` David Brown @ 2004-08-13 14:20 ` Brian Hurt 1 sibling, 0 replies; 9+ messages in thread From: Brian Hurt @ 2004-08-13 14:20 UTC (permalink / raw) To: skaller; +Cc: caml-list On 13 Aug 2004, skaller wrote: > Perhaps someone can explain why this actually works ..?? Right recursion "works". The problem is that LALR parsers can do something akin to tail recursion with left recursion that they can't do with right recursion. Say, for example, you have the rules: mul_expr: mul_expr '*' term | mul_expr '/' term | mul_expr MOD term | term term: NUMBER Here, mul_expr is "left recursive"- the "recursive" call to mul_expr is on the left hand side of the expression. Now, with an LALR parser, no mater how many multiplications you had, it'd use constant stack space. Each time it'd see a new multiplication term, it'd immediately collapse back down to a mul_expr. An equally valid way to parse this would be right recursively: mul_expr: term '*' mul_expr | term '/' mul_expr | term MOD mul_expr | term The problem here is that if you had N multiplications in a row, it'd take O(N) stack spaces on the parse tree to parse them all. This would still work- right up until you hit the point where you have so many multiplication terms that you blow stack. It works, and there are times when it's the correct thing to use, but if you have the choice between left recursive and right recursive grammars, pick left recursive for LALR parsers. -- "Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." - Gene Spafford Brian ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Caml-list] Context Free Grammars? 2004-08-12 17:18 [Caml-list] Context Free Grammars? David McClain 2004-08-12 18:02 ` Joshua Smith @ 2004-08-13 4:45 ` skaller 1 sibling, 0 replies; 9+ messages in thread From: skaller @ 2004-08-13 4:45 UTC (permalink / raw) To: David McClain; +Cc: caml-list On Fri, 2004-08-13 at 03:18, David McClain wrote: > Anyone have any hints about syntax transformations so that CFG's can > really be used here? Yeah, I've been through this pain and still bump into it a lot. There are two tricks. The first is to change your thinking to *bottom up*. The grammar names 'syntactic fragments' of increasing size and complexity as it sees them, rather than going searching for some wholistic shape. So name your fragments uniquely; think in terms: "the terms I'm parsing have intrinsic (synthetic) structure" that is passed upwards, rather than "I'm looking down the tree for something I expect, if I don't find it I will try something else" The second trick is: make your grammar coarse grained and far too general. Don't use the LALR(1) parser to enforce constraints. Do that in the { executable code } part or in post processing. Type checking is the obvious example of that. In the Felix grammar (which is LALR(1) and entirely unambiguous), I use exactly the same coarse syntax for executable expressions and for type annotations. In both cases I allow x + y * z (yup, Felix has anonymous sum types). So when I parse (x + y * z) : ( t + u * v) // executable : type annotation I use (the moral equivalent of): expr COLON expr { let x = $1 in let t = expr_as_type $3 in `Coercion (x,y) } Since not all executable expressions are type expressions, I trap that in the function 'expr_as_type' and throw an exception -- which produces a vastly superior error message to 'Syntax Error' that is the best the parser can produce automatically. Finally: if you are parsing a *nasty* language, such as Python, that isn't even remotely LALR(1), you can still use a LALR(1) grammar with some care are trickery, to do a lot of the work. To parse Python, I wrote multi-stage 'token filter' to preprocess the token stream, generating 'INDENT' and 'UNDENT' tokens, for example (Python uses indentation to specify block structure). Another nastiness in Python is (a,) for a unary tuple: that trailing comma is allowed in pairs too: (a,b,) and it really screws up LALR(1) parsing. It took around 10 separate passes to generate a list of tokens that I could more easily parse with Ocamlyacc. -- John Skaller, mailto:skaller@users.sf.net voice: 061-2-9660-0850, snail: PO BOX 401 Glebe NSW 2037 Australia Checkout the Felix programming language http://felix.sf.net ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2004-08-13 14:12 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2004-08-12 17:18 [Caml-list] Context Free Grammars? David McClain 2004-08-12 18:02 ` Joshua Smith 2004-08-12 18:14 ` David McClain 2004-08-12 19:25 ` Paul Snively 2004-08-12 21:47 ` Erik de Castro Lopo 2004-08-13 5:22 ` skaller 2004-08-13 5:59 ` David Brown 2004-08-13 14:20 ` Brian Hurt 2004-08-13 4:45 ` skaller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox