* yacc question @ 2005-12-04 14:15 skaller 2005-12-04 15:03 ` [Caml-list] " Robert W. 2005-12-04 15:25 ` Brian Hurt 0 siblings, 2 replies; 5+ messages in thread From: skaller @ 2005-12-04 14:15 UTC (permalink / raw) To: caml-list I have the 'usual' kind of parser for expressions, with two nonterminals 'expr' and 'atom', the latter including ( expr ) and INTEGER of course. When I have input like 1; 1 + 2 ; (1 + 2) ; none of the case parse as expressions, the first and last do parse as atoms (leaving the ; in the buffer). What I want is to parse the longest possible match. The only way I can think of doing this is: top_expr: | expr token_not_allowed_in_expressions and then 'put back' the trailing token into the buffer. Is there another way? -- John Skaller <skaller at users dot sf dot net> Felix, successor to C++: http://felix.sf.net ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Caml-list] yacc question 2005-12-04 14:15 yacc question skaller @ 2005-12-04 15:03 ` Robert W. 2005-12-04 16:46 ` skaller 2005-12-04 15:25 ` Brian Hurt 1 sibling, 1 reply; 5+ messages in thread From: Robert W. @ 2005-12-04 15:03 UTC (permalink / raw) To: caml-list On Mon, Dec 05, 2005 at 01:15:40AM +1100, skaller wrote: > I have the 'usual' kind of parser for expressions, with two > nonterminals 'expr' and 'atom', the latter including ( expr ) > and INTEGER of course. > > When I have input like > > 1; > 1 + 2 ; > (1 + 2) ; > > none of the case parse as expressions, the first and > last do parse as atoms (leaving the ; in the buffer). > > What I want is to parse the longest possible match. > The only way I can think of doing this is: > ocamlyacc parses longest match by default. Your rule for atoms seem to complex or you lexer isn't able to extract the token for atoms correctly from the line. > | expr token_not_allowed_in_expressions > > and then 'put back' the trailing token into the buffer. > Is there another way? > This shouldn't be necessary, normally you can redesign your parser rules to avoid this. -- Robert... ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Caml-list] yacc question 2005-12-04 15:03 ` [Caml-list] " Robert W. @ 2005-12-04 16:46 ` skaller 0 siblings, 0 replies; 5+ messages in thread From: skaller @ 2005-12-04 16:46 UTC (permalink / raw) To: Robert W.; +Cc: caml-list On Sun, 2005-12-04 at 16:03 +0100, Robert W. wrote: > ocamlyacc parses longest match by default. > Your rule for atoms seem to complex or you lexer isn't able to extract > the token for atoms correctly from the line. When I parse atoms only, it all works fine. The problem is that if the user specifies a production like: #statement select expr within statement ; then if I parse 'expr' with 'atom' this will not be allowed: select 1 + 2 within print x; because '1 + 2' isn't an atom. The user would be forced to write: select (1 + 2) within print x; But if I parse with 'expr' instead of atom, the parse fails when it hits the unknown symbol 'within'. > > | expr token_not_allowed_in_expressions > > > > and then 'put back' the trailing token into the buffer. > > Is there another way? > > > This shouldn't be necessary, normally you can redesign your parser > rules to avoid this. The problem is I'm trying to add 'on the fly' user defined grammar productions -- so the 'grammar' is extensible. This will be done by recursive descent, but hooked inside, and hooking, the existing ocamlyacc grammar. -- John Skaller <skaller at users dot sf dot net> Felix, successor to C++: http://felix.sf.net ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Caml-list] yacc question 2005-12-04 14:15 yacc question skaller 2005-12-04 15:03 ` [Caml-list] " Robert W. @ 2005-12-04 15:25 ` Brian Hurt 2005-12-04 16:37 ` skaller 1 sibling, 1 reply; 5+ messages in thread From: Brian Hurt @ 2005-12-04 15:25 UTC (permalink / raw) To: skaller; +Cc: caml-list On Mon, 5 Dec 2005, skaller wrote: > I have the 'usual' kind of parser for expressions, with two > nonterminals 'expr' and 'atom', the latter including ( expr ) > and INTEGER of course. > > When I have input like > > 1; > 1 + 2 ; > (1 + 2) ; > > none of the case parse as expressions, the first and > last do parse as atoms (leaving the ; in the buffer). > > What I want is to parse the longest possible match. > The only way I can think of doing this is: > > top_expr: > | expr token_not_allowed_in_expressions > > and then 'put back' the trailing token into the buffer. > Is there another way? > The standard way to implement this is: statement: expression SEMICOLON expression: | mul_expression | expression PLUS mul_expression | expression MINUS mul_expression mul_expression: atom_expression | mul_expression TIMES atom_expression | mul_expression DIVIDE atom_expression | mul_expression MODULO atom_expression atom_expression: ATOM | OPEN_PAREN expression CLOSE_PAREN Notice that precedence is taken care of immediately- 3 + 4 * 5 is parsed as 3 + (4 * 5). Also notice that the semicolon is what terminates us- we keep collecting up an expression until we see a semicolon- only then do we complete the statement. Hope this helps. Brian ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Caml-list] yacc question 2005-12-04 15:25 ` Brian Hurt @ 2005-12-04 16:37 ` skaller 0 siblings, 0 replies; 5+ messages in thread From: skaller @ 2005-12-04 16:37 UTC (permalink / raw) To: Brian Hurt; +Cc: caml-list On Sun, 2005-12-04 at 09:25 -0600, Brian Hurt wrote: > > On Mon, 5 Dec 2005, skaller wrote: > > > I have the 'usual' kind of parser for expressions, with two > > nonterminals 'expr' and 'atom', the latter including ( expr ) > > and INTEGER of course. > > > > When I have input like > > > > 1; > > 1 + 2 ; > > (1 + 2) ; > > > > none of the case parse as expressions, the first and > > last do parse as atoms (leaving the ; in the buffer). > > > > What I want is to parse the longest possible match. > > The only way I can think of doing this is: > > > > top_expr: > > | expr token_not_allowed_in_expressions > > > > and then 'put back' the trailing token into the buffer. > > Is there another way? > > > > The standard way to implement this is: > > statement: > expression SEMICOLON > > expression: > | mul_expression > | expression PLUS mul_expression > | expression MINUS mul_expression > > mul_expression: > atom_expression > | mul_expression TIMES atom_expression > | mul_expression DIVIDE atom_expression > | mul_expression MODULO atom_expression > > atom_expression: > ATOM > | OPEN_PAREN expression CLOSE_PAREN > > Notice that precedence is taken care of immediately- 3 + 4 * 5 is parsed > as 3 + (4 * 5). Also notice that the semicolon is what terminates us- we > keep collecting up an expression until we see a semicolon- only then do we > complete the statement. Expressions cannot be statements, but I have instead: statement: NAME LEFT_ARROW expression SEMICOLON for example, so basically your example holds with a modification. The problem is I want to parse a *user defined statement*. This will be done by a preprocessor definition, something like: #keyword within #statement select expr within statement ; This makes 'select' and 'within' keywords, when ocamlyacc sees the user defined statement keyword 'select', it looks up a table and finds the grammar production. I then parse by recursive descent -- backtracking is cool in an FPL, my tokens are a list, so I can backtrack easily. When interpreting the grammar, if I see the token 'expr' then I parse an expression using the ocamlyacc entry point. Similarly for statements -- in particular, since the user defined statements can be 'added' into the statement grammar, recursion in the 'within' clause will also include the new statements (the recursion is covariant?) There is no problem with statements .. but for expressions, I cannot just terminate on SEMICOLON -- I have to terminate on many symbols, which can't be part of an expression -- and leave them in the token buffer too. In the example grammar, 'within' terminates the expression. The problem of course is that Ocamlyacc doesn't know about the new grammar .. but I'm still using it to parse expressions. Mixing LALR1 and an RD parser seems messy .. but I see no other way to obtain high performance for 'the usual case' whilst still allowing extensibility -- Ocaml doesn't come with an LL parser generator. -- John Skaller <skaller at users dot sf dot net> Felix, successor to C++: http://felix.sf.net ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2005-12-04 16:48 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2005-12-04 14:15 yacc question skaller 2005-12-04 15:03 ` [Caml-list] " Robert W. 2005-12-04 16:46 ` skaller 2005-12-04 15:25 ` Brian Hurt 2005-12-04 16:37 ` skaller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox