* Another nasty with ocamlyacc: magic needed? @ 2005-12-05 5:35 skaller 2005-12-05 6:25 ` [Caml-list] " Geoff Wozniak ` (2 more replies) 0 siblings, 3 replies; 6+ messages in thread From: skaller @ 2005-12-05 5:35 UTC (permalink / raw) To: caml-list I have what looks like a showstopper (meaning black magic seems to be the only solution). I need two *distinct* parsers which use the same set of tokens. This is currently impossible AFAICS without magic (distinct -- in a different module entirely) Roughly, the problem is that Ocamlyacc generates the tokens, and they go in the parser.mli file. So any second parser has to depend on it. However the only way to *invoke* the parser is to pass it inside a token. the interface for that parser must be defined AFTER the parser.mli since it needs to know the token types. But it has to be known INSIDE the actual parser, to invoke it. There is no way to extend the interface Ocamlyacc generates for a parser (you can add extra helper functions but they don't go in the interface). Thus -- it is impossible. IMHO the traditional yacc style design is poor, but it works fine for C because there is no problem with recursive dependencies in interfaces (header files). In Ocaml, there is such a problem, and so the yacc design basically fails -- it isn't suitable for Ocaml. But i need to work around this somehow, does anyone have any smart ideas how to fix the problem, without abandoning Ocamlyacc? If I could abstract the type of the secondary parser so it didn't need to expose the knowledge of tokens, it would fix the problem I think -- then the type could be defined BEFORE the ocamlyacc parser, but the implementation defined AFTER it (using Ocaml classes and class types). Just to be clear the situation is: user_statement: | USER_KEYWORD { let srt, kw = $1 in let sr = slift srt in print_endline ("USER KEYWORD " ^ kw); `AST_nop (sr,kw) } Instead of `AST_nop I need to call a function which parses more of the input, probably passing it the current lexbuf as well. The lexer knows this information and could put it into the USER_KEYWORD token .. if it were possible to define the type of the attribute of the USER_KEYWORD token.. but that can only be done in one place .. prior to the parser mli file. So I think I may have to use Obj.magic .. or completely replace Ocamlyacc. -- John Skaller <skaller at users dot sf dot net> Felix, successor to C++: http://felix.sf.net ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] Another nasty with ocamlyacc: magic needed? 2005-12-05 5:35 Another nasty with ocamlyacc: magic needed? skaller @ 2005-12-05 6:25 ` Geoff Wozniak 2005-12-05 7:08 ` Alessandro Baretta 2005-12-06 0:41 ` skaller 2 siblings, 0 replies; 6+ messages in thread From: Geoff Wozniak @ 2005-12-05 6:25 UTC (permalink / raw) To: caml-list On 5-Dec-05, at 12:35 AM, skaller wrote: > I need two *distinct* parsers which use the same set > of tokens. This is currently impossible AFAICS without > magic (distinct -- in a different module entirely) > For the record, I ran into this problem myself. The easiest solution to my particular problem was to switch to using Lisp, although I am not advocating that you do the same. (I'm not interested in any kind of language evangelizing flamewar.) I couldn't think of an appealing way to deal with the problem. -- Geoff Wozniak PhD Candidate Department of Computer Science University of Western Ontario ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] Another nasty with ocamlyacc: magic needed? 2005-12-05 5:35 Another nasty with ocamlyacc: magic needed? skaller 2005-12-05 6:25 ` [Caml-list] " Geoff Wozniak @ 2005-12-05 7:08 ` Alessandro Baretta 2005-12-05 8:52 ` skaller 2005-12-06 0:41 ` skaller 2 siblings, 1 reply; 6+ messages in thread From: Alessandro Baretta @ 2005-12-05 7:08 UTC (permalink / raw) To: skaller; +Cc: caml-list skaller wrote: > I have what looks like a showstopper (meaning black magic > seems to be the only solution). > > I need two *distinct* parsers which use the same set > of tokens. This is currently impossible AFAICS without > magic (distinct -- in a different module entirely) Let's not overstate the problem. Yacc is an imperfect tool. Everybody who used it knows this, but it is usually flexible enough that you can--eventually--get it to do what you have in mind. I would suggest that you consider a couple of tools which might help get out of this /impasse/. Firstly, remember that can define as many parser entry points as your heat desires. Each entry point is virtually a distinct parser, except that the token type is shared and the non-terminal namespace is also shared, which might be a bane or a boon depending on whether the two grammars you wish to impolement share some productions or not. Secondly, after compiling the automaton with ocamlyacc, you can always delete and regenerate the interface with ocamlc -i. My Makefile generator for ocaml, builder, actually handles this pretty nicely, allowing the parser to export anything you put in the header. Alex -- ********************************************************************* http://www.barettadeit.com/ Baretta DE&IT A division of Baretta SRL tel. +39 02 370 111 55 fax. +39 02 370 111 54 Our technology: The Application System/Xcaml (AS/Xcaml) <http://www.asxcaml.org/> The FreerP Project <http://www.freerp.org/> ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] Another nasty with ocamlyacc: magic needed? 2005-12-05 7:08 ` Alessandro Baretta @ 2005-12-05 8:52 ` skaller 2005-12-05 9:05 ` Alessandro Baretta 0 siblings, 1 reply; 6+ messages in thread From: skaller @ 2005-12-05 8:52 UTC (permalink / raw) To: Alessandro Baretta; +Cc: caml-list On Mon, 2005-12-05 at 08:08 +0100, Alessandro Baretta wrote: > skaller wrote: > > I have what looks like a showstopper (meaning black magic > > seems to be the only solution). > > > > I need two *distinct* parsers which use the same set > > of tokens. This is currently impossible AFAICS without > > magic (distinct -- in a different module entirely) > Let's not overstate the problem. Yacc is an imperfect tool. Yup :) > Firstly, remember that can > define as many parser entry points as your heat desires. I do. It is useful. But the secondary parser is an RD parser interpreter. The idea isn't to allow arbitrary grammar extensions .. only to make particular yacc productions open-recursive closed by the dynamically built table of extensions. Thus the RD parser calls back into yacc entry points, and, the yacc productions call into the RD parser entry points. This is organised via the lexer. > Secondly, after compiling the automaton with ocamlyacc, you can always > delete and regenerate the interface with ocamlc -i. Yes, that is possible, but the result is contains undesired entry points (tables and whatnot). It also isn't clear how good a solution this is if I have to paste many other modules into it -- I can always build the whole program as a single module :) -- John Skaller <skaller at users dot sf dot net> Felix, successor to C++: http://felix.sf.net ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] Another nasty with ocamlyacc: magic needed? 2005-12-05 8:52 ` skaller @ 2005-12-05 9:05 ` Alessandro Baretta 0 siblings, 0 replies; 6+ messages in thread From: Alessandro Baretta @ 2005-12-05 9:05 UTC (permalink / raw) To: skaller; +Cc: caml-list skaller wrote: > I do. It is useful. But the secondary parser is an RD parser > interpreter. The idea isn't to allow arbitrary grammar > extensions .. only to make particular yacc productions > open-recursive closed by the dynamically built table > of extensions. > > Thus the RD parser calls back into yacc entry points, > and, the yacc productions call into the RD parser entry > points. This is organised via the lexer. By applying the 'ocamlyacc parser.mly; rm parser.mli; ocamlc -i parser.ml' approach, you can enclose the code generated by ocamlyacc into a structure. I used this approach when I was young and inexperienced to generate a functor from a parser definition, whereby the parser actions called a bunch of functions passed as parameters to the functor. I can also imagine that, possibly with some help from pa_ocamllex, you can define a pair of mutually recursive lexer and parser modules, which is about as far as you think about going with the yacc architecture. All this does require some "black magic", but no Obj.magic, which is rather neat. Alex -- ********************************************************************* http://www.barettadeit.com/ Baretta DE&IT A division of Baretta SRL tel. +39 02 370 111 55 fax. +39 02 370 111 54 Our technology: The Application System/Xcaml (AS/Xcaml) <http://www.asxcaml.org/> The FreerP Project <http://www.freerp.org/> ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] Another nasty with ocamlyacc: magic needed? 2005-12-05 5:35 Another nasty with ocamlyacc: magic needed? skaller 2005-12-05 6:25 ` [Caml-list] " Geoff Wozniak 2005-12-05 7:08 ` Alessandro Baretta @ 2005-12-06 0:41 ` skaller 2 siblings, 0 replies; 6+ messages in thread From: skaller @ 2005-12-06 0:41 UTC (permalink / raw) To: caml-list Thank you to all who helped on this and my other ocamlyacc problem. A proof-of-principle implementation is now working!! ------USER DEFINED STATEMENTS DEMO------- #statement while expr do statements done ; #keyword whenever #statement repeat statements whenever expr ; #statement forall ident in expr do statements done ; repeat f x; g k; whenever 1>0; while 1>0 do f x; g x; done; forall i in (1,2,3) do f i; done; -------------------------------------- The code integrates Ocamlyacc LALR1 parser for a complex language (Felix) including expressions and statements, with a hand written recursive descent parser engine driven by a table of extensions to the statement non-terminal of the Ocamlyacc grammar. With respect to Felix these extensions can be added on the fly (dynamically) as illustrated in the example. To make this all work I had to cast two spells, however the code is free of the evil Obj.magic and is reentrant: no global variables! Spell 1: to parse expressions, I had to add a special entry point: %type<expr_t * token> exprx exprx: expr expr_terminating_token { $1, $2 } expr_terminating_token: | TOK1 { TOK1 $1 } | TOK2 { TOK2 $1 } ... and the function parsing an expression has to 'put back' the gratuitous lookahead token into the token source. Having to list, by hand, all the tokens which terminate an expression is a maintenance problem .. but it does work. I formed the list by adding ALL tokens, and then using ocamlyacc -v output to remove tokens that caused a conflict. Spell 2: In order to parse anything, the token source must be available, however it is defined *after* the ocamlyacc parser and so cannot be embedded into the token, because there is no way then to specify the type of the token. In addition, it cannot be placed in the token at the point of defining the grammar extensions either, because that is handled by the pre-processor, whose job is to *construct* the token source -- it doesn't exist until after the preprocessor has run. So I used a two phase approach. A new USER_STATEMENT_KEYWORD token is used to capture the grammar production in the preprocessor -- the production is just a list of tokens, and so it can be defined in the ocamlyacc parser, it has type: %token<Flx_ast.srcref * string * token list> USER_STATEMENT_KEYWORD Then, I modified the token source function itself, so that when this token is seen it is replaced by: %token<Flx_ast.srcref * string * (unit -> Flx_ast.statement_t)> USER_STATEMENT_DRIVER The function here is a closure, here is the actual code: method token_src (dummy :Lexing.lexbuf) = let tmp = List.hd tokens in tokens <- List.tl tokens; current_token_index <- current_token_index + 1; match tmp with | Flx_parse.USER_STATEMENT_KEYWORD (sr,s,tks) -> let f = fun () -> self#parse_user_statement s tks in Flx_parse.USER_STATEMENT_DRIVER (sr,s,f) | _ -> tmp and so, by delaying the binding until the token source object is known, and then currying it into the new driver token, I was able to hide all the gritty details defined after the ocamlyacc parser from it. The Ocamlyacc parser then calls back the function stored in the token, which calls the RD parser, which in turn can recursively call ocamlyacc parser entry points: the actual ocamlyacc production used is just: user_statement: | USER_STATEMENT_DRIVER { let srt, kw, f = $1 in f () } In effect, this is 'covariant open recursion' closed by the dynamically extensible driver tables. The LALR1/RD combination is pleasing because it is expected to be faster and more reliable than a pure RD parser, yet many of the advantages, including the ease of extension remain available: in particular the compiler writer can extend the Ocamlyacc grammar and the user can extend the RD grammar: the coupling is fairly well isolated. Again thanks to those who encouraged my to continue seeking a solution! -- John Skaller <skaller at users dot sf dot net> Felix, successor to C++: http://felix.sf.net ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2005-12-06 0:41 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2005-12-05 5:35 Another nasty with ocamlyacc: magic needed? skaller 2005-12-05 6:25 ` [Caml-list] " Geoff Wozniak 2005-12-05 7:08 ` Alessandro Baretta 2005-12-05 8:52 ` skaller 2005-12-05 9:05 ` Alessandro Baretta 2005-12-06 0:41 ` skaller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox