* [Caml-list] yacc & lex: Does the contructor name matter? @ 2002-12-02 8:55 Andrzej M. Ostruszka 2002-12-03 10:01 ` Xavier Leroy 0 siblings, 1 reply; 3+ messages in thread From: Andrzej M. Ostruszka @ 2002-12-02 8:55 UTC (permalink / raw) To: Caml List I've got two questions (I hope they qualify to this list -- otherwise my apologies) 1. Does the constructor name matter in yacc and lex? I've suddenly wanted to have an interpreter (I'm rather numeric guy so this comes to me as an surprise :)) so as you probably guess I started with a calculator :). The problem is that I don't understand behaviour of the following implementation -- if I change the constructor name EoL into EOF then everything works OK but otherwise I have a "parse error". Could someone enlighten me? If the grammar (line rule especially) is incorrect then I should always get "parse error", if it is correct then it should work no matter the name of the constructor --- at least this is what I think. --->8--parser.mly----- %token <float> Num %token Plus Minus Times Div Pow %token LParen RParen %token EoL <-- change this into EOF and it will work %left Plus Minus %left Times Div %right Pow %nonassoc UMinus %start line %type <float option> line %% line: | expr { Some $1 } | /* nothing */ { None } ; expr: | Num { $1 } | LParen expr RParen { $2 } | expr Plus expr { $1 +. $3 } | expr Minus expr { $1 -. $3 } | expr Times expr { $1 *. $3 } | expr Div expr { $1 /. $3 } | expr Pow expr { $1 ** $3 } | Minus expr %prec UMinus { -. $2 } ; --->8--lexer.mll----- { open Parser } let white = [' ' '\t'] let digit = ['0'-'9'] rule token = parse | white+ { token lexbuf } (* skip white *) | digit* '.'? digit* (['e' 'E'] ['+' '-']? digit+)? { Num (float_of_string (Lexing.lexeme lexbuf)) } | '(' { LParen } | ')' { RParen } | '+' { Plus } | '-' { Minus } | '*' { Times } | '/' { Div } | '^' { Pow } | eof { EoL } <--- change this to EOF --->8---------------- And if that matters here's calc.ml let main_loop () = let go_on = ref true in while !go_on do begin try let lexbuf = Lexing.from_string (read_line ()) in begin match Parser.line Lexer.token lexbuf with | Some res -> print_float res; print_newline () | None -> print_endline "Give some meaningful expression" end with | End_of_file -> go_on := false | Failure s -> print_endline ("failure: " ^ s) | Parsing.Parse_error -> print_endline "parse error" end; flush stdout done let _ = main_loop () 2. My second question is what is the most efficient representation of the compiled expression. I don't care how long (with some reasonable limits :)) it will take to compile but I want the result be the fastest afterwords. The only representations I can come up with are: - keep AST and recursively pattern match on it - compile AST to stack like evaluator: there's stack of the results and a list of functions that perform operations on it: push_var, push_const, plus, minus etc... and the result is left as the only value on the stack after applying all function in the list - compile AST to list of functions which will take in arguments the operands and the "pointer" of the place where to store the result. I don't have now a clear idea how to do it (I still think in C :)) but the motivation is to remove pushing functions and stack management of the previous solution I guess that most of you are quite experienced in this stuff and can shed a bit of light on this subject. Best regards PS. Sorry for a length post -- ____ _ ___ / | \_/ |/ _ \ Andrzej Marek Ostruszka / _ | | (_) | Instytut Fizyki, Uniwersytet Jagiellonski (Cracow) /_/ L|_|V|_|\___/ (PGP <-- finger ostruszk@order.if.uj.edu.pl) ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [Caml-list] yacc & lex: Does the contructor name matter? 2002-12-02 8:55 [Caml-list] yacc & lex: Does the contructor name matter? Andrzej M. Ostruszka @ 2002-12-03 10:01 ` Xavier Leroy 2002-12-03 10:48 ` [Caml-list] " Andrzej M. Ostruszka 0 siblings, 1 reply; 3+ messages in thread From: Xavier Leroy @ 2002-12-03 10:01 UTC (permalink / raw) To: Caml List, ostruszk > 1. Does the constructor name matter in yacc and lex? > > I've suddenly wanted to have an interpreter (I'm rather numeric guy so > this comes to me as an surprise :)) so as you probably guess I started > with a calculator :). The problem is that I don't understand behaviour > of the following implementation -- if I change the constructor name EoL > into EOF then everything works OK but otherwise I have a "parse error". > Could someone enlighten me? If the grammar (line rule especially) is > incorrect then I should always get "parse error", if it is correct then > it should work no matter the name of the constructor --- at least this > is what I think. The EOF token is indeed special, in the sense that it is the only valid token for the lookahead of a grammar entry point. To understand why in full details, you need to look at the .output file generated by ocamlyacc -v. But here is a simplified account. The actual grammar that ocamlyacc compiles down to a PDA has some additional productions: $accept: %entry EOF { should not reduce } %entry: '\001' line { stop parsing with result $2 } line: expr { Some $1 } | /*nothing*/ { None } ... where "$accept" is the real entry point and EOF is either a user-provided token with that name, or if none was provided, an internally-generated token distinct from any other. (The %entry and '\001' business is used to select between multiple entry points; you can ignore it for the purposes of this explanation.) Here, the parsing automaton cannot reduce "line" by the "nothing" production (thus terminating the parsing) without looking at the next token, which is EoL in your case. However, EoL and EOF are distinct tokens, so the input doesn't match the grammar, and a parse error is generated. The best way to avoid this difficulty is to make sure that all your grammar entry points are explicitly terminated by a token, as shown in the example in section 12.6 of the manual: %start line %% line: | expr EoL { Some $1 } | /* nothing */ EoL { None } ... Here, the parsing automaton will reduce "line" by the correct rule just by seeing the EoL token; no test "is the next token EOF?" will ever be performed. The moral of this story is thus: always terminate your entry points. > 2. My second question is what is the most efficient representation of > the compiled expression. I don't care how long (with some reasonable > limits :)) it will take to compile but I want the result be the fastest > afterwords. The only representations I can come up with are: > - keep AST and recursively pattern match on it > - compile AST to stack like evaluator: there's stack of the results and > a list of functions that perform operations on it: push_var, > push_const, plus, minus etc... and the result is left as the only > value on the stack after applying all function in the list > - compile AST to list of functions which will take in arguments the > operands and the "pointer" of the place where to store the result. > I don't have now a clear idea how to do it (I still think in C :)) but > the motivation is to remove pushing functions and stack management of > the previous solution I don't understand your third option. My advice would be to go for the simplest, clearest solution, which is the first you mentioned. Your second solution is closer to a bytecode interpreter (a virtual machine), but to make it significantly more efficient than the first, you'd need to go further: encode instructions as a datatype, not as functions; represent the program by arrays or strings rather than lists of instructions; etc. Make sure it's worth the effort before embarking on this task :-) - Xavier Leroy ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 3+ messages in thread
* [Caml-list] Re: yacc & lex: Does the contructor name matter? 2002-12-03 10:01 ` Xavier Leroy @ 2002-12-03 10:48 ` Andrzej M. Ostruszka 0 siblings, 0 replies; 3+ messages in thread From: Andrzej M. Ostruszka @ 2002-12-03 10:48 UTC (permalink / raw) To: Caml List On Tue, Dec 03 (2002), Xavier Leroy wrote: [...] First of all thank you for an explanation > The actual grammar that ocamlyacc compiles down to a PDA has some > additional productions: > > $accept: %entry EOF { should not reduce } > %entry: '\001' line { stop parsing with result $2 } > line: expr { Some $1 } > | /*nothing*/ { None } I've looked at the .ml and .mli but haven't found anything regarding EOF so that's why I asked. > The best way to avoid this difficulty is to make sure that all your > grammar entry points are explicitly terminated by a token, as shown > in the example in section 12.6 of the manual: > > %start line > > %% > line: > | expr EoL { Some $1 } > | /* nothing */ EoL { None } :), I've read that section and knew the solution and just wanted to know the cause of this behaviour. > > - compile AST to list of functions which will take in arguments the > > operands and the "pointer" of the place where to store the result. > > I don't have now a clear idea how to do it (I still think in C :)) but > > the motivation is to remove pushing functions and stack management of > > the previous solution > > I don't understand your third option. Well I thought about having some list of records { double x, y; double* r; and maybe function pointer } where the function would take x and y and store the result where r points to and the job of the compiler would be to craft those pointers (and x's in case of constants) appropriately. As I've told I'm still rather C than OCaml guy :). Best regards PS. Doubling of the posts is not my fault it happens "somewhere" in France (one can check that by message ID). Maybe this is because I'm sending from another address that I'm subscribed? -- ____ _ ___ / | \_/ |/ _ \ Andrzej Marek Ostruszka / _ | | (_) | Instytut Fizyki, Uniwersytet Jagiellonski (Cracow) /_/ L|_|V|_|\___/ (PGP <-- finger ostruszk@order.if.uj.edu.pl) ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2002-12-03 19:16 UTC | newest] Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2002-12-02 8:55 [Caml-list] yacc & lex: Does the contructor name matter? Andrzej M. Ostruszka 2002-12-03 10:01 ` Xavier Leroy 2002-12-03 10:48 ` [Caml-list] " Andrzej M. Ostruszka
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox