* [Caml-list] Calling ocamlyacc from ocamlyacc
From: katre @ 2003-09-01 17:28 UTC
  To: caml-list

Hello,

I am currently involved in a project to re-build a compiler for an old
system from the mid-1980s, where we have the original compiler docs, we
have the original source files in the language used, but the actual
compiler itself is lost.  This is an interesting pursuit, and I am
making use of it to learn OCaml.

However, due to the nature of this language, which is not very regular
at all, I am having trouble expressing a parser in ocamlyacc that isn't
a large hack.  What would be ideal would be to have one main parser, and
then one sub-parser that I could call only for a specified domain.
However, all the source code is in one place.

Is there a way to specify a separate parser and lexer (using ocamllex
and ocamlyacc), and then to jump into them from an ocamlyacc action?

Thanks for the help,

John

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
* Re: [Caml-list] Calling ocamlyacc from ocamlyacc
From: Michal Moskal @ 2003-09-02 15:19 UTC
  To: katre; +Cc: caml-list

On Mon, Sep 01, 2003 at 01:28:28PM -0400, katre wrote:
> Hello,
>
> I am currently involved in a project to re-build a compiler for an old
> system from the mid-1980s, where we have the original compiler docs, we
> have the original source files in the language used, but the actual
> compiler itself is lost.  This is an interesting pursuit, and I am
> making use of it to learn OCaml.
>
> However, due to the nature of this language, which is not very regular
> at all, I am having trouble expressing a parser in ocamlyacc that isn't
> a large hack.  What would be ideal would be to have one main parser, and
> then one sub-parser that I could call only for a specified domain.
> However, all the source code is in one place.
>
> Is there a way to specify a separate parser and lexer (using ocamllex
> and ocamlyacc), and then to jump into them from an ocamlyacc action?

You can define several start symbols in your grammar; a parsing function
is generated for each one. You can also define several rule ... entry
points in your lexer; a lexing function is generated for each one. Hope
that helps. I can't help more, since I don't quite understand the nature
of your problem.

--
: Michal Moskal :: http://www.kernel.pl/~malekith : GCS {C,UL}++++$ a? !tv
: When in doubt, use brute force. -- Ken Thompson : {E-,w}-- {b++,e}>+++ h
* Re: [Caml-list] Calling ocamlyacc from ocamlyacc
From: katre @ 2003-09-02 15:23 UTC
  To: caml-list

Michal Moskal wrote:
>
> You can define several start symbols in your grammar. Parsing functions
> are defined for each. You can also define several rule ... in your lexer
> (lexing functions are defined for each). Hope that helps, I can't help
> more, since I don't quite understand nature of your problem.
>

Right, but is there a way, in an ocamlyacc action, to switch which lexer
rule you're using?  That seems to be the main part I am missing.  Or if
I could access the lexbuf directly, I could also use that.

John
* Re: [Caml-list] Calling ocamlyacc from ocamlyacc
From: Michal Moskal @ 2003-09-02 15:40 UTC
  To: katre; +Cc: caml-list

On Tue, Sep 02, 2003 at 11:23:40AM -0400, katre wrote:
> Michal Moskal wrote:
> >
> > You can define several start symbols in your grammar. Parsing functions
> > are defined for each. You can also define several rule ... in your lexer
> > (lexing functions are defined for each). Hope that helps, I can't help
> > more, since I don't quite understand nature of your problem.
> >
>
> Right, but is there a way, in an ocamlyacc action, to switch which lexer
> rule you're using?  That seems to be the main part I am missing.  Or if
> I could access the lexbuf directly, I could also use that.

I believe you can set a flag in the lexer (from a parser action) to make
it switch to another rule. But you have to consider lookahead: the parser
may already have fetched the next token before your action runs.

--
: Michal Moskal :: http://www.kernel.pl/~malekith : GCS {C,UL}++++$ a? !tv
: When in doubt, use brute force. -- Ken Thompson : {E-,w}-- {b++,e}>+++ h
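
The lookahead caveat can be made concrete with a small toy model. This is only a sketch under stated assumptions, not ocamlyacc's actual driver: the `token` type, the `mode` flag, the pre-split `input` list and the `peek`/`next` functions are all hypothetical names invented for the illustration. It shows a token that was lexed under the old mode because the "parser" had already looked ahead before the action flipped the flag.

```ocaml
(* Hypothetical token type and lexer mode, invented for this sketch. *)
type token = WORD of string | NUMBER of int | EOF
type mode = Words | Numbers

let mode = ref Words                   (* flag a parser action would set *)
let input = ref [ "begin"; "42"; "7" ] (* pre-split input, for simplicity *)

(* Toy lexer: only recognises numbers when the flag says Numbers. *)
let lex () =
  match !input with
  | [] -> EOF
  | s :: rest ->
    input := rest;
    (match !mode, int_of_string_opt s with
     | Numbers, Some n -> NUMBER n
     | _ -> WORD s)

(* Toy LR(1)-style driver: keeps at most one token of lookahead. *)
let lookahead = ref None

let peek () =
  match !lookahead with
  | Some t -> t
  | None ->
    let t = lex () in
    lookahead := Some t;
    t

let next () =
  let t = peek () in
  lookahead := None;
  t

(* The "parser" consumes one token, looks ahead to decide on a reduction,
   and only then runs the action that switches the lexer mode. *)
let run_demo () =
  mode := Words;
  input := [ "begin"; "42"; "7" ];
  lookahead := None;
  let t1 = next () in
  let _ = peek () in      (* lookahead token "42" is lexed in Words mode *)
  mode := Numbers;        (* the reduce action switches mode, too late *)
  let t2 = next () in     (* stale lookahead token *)
  let t3 = next () in     (* first token actually lexed in Numbers mode *)
  [ t1; t2; t3 ]
```

In this model `run_demo ()` yields `WORD "42"` for the second token even though the mode was switched before it was consumed; only the third token sees the new mode.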
* Re: [Caml-list] Calling ocamlyacc from ocamlyacc
From: Hendrik Tews @ 2003-09-03 10:28 UTC
  To: caml-list

katre writes:

   Right, but is there a way, in an ocamlyacc action, to switch which lexer
   rule you're using?  That seems to be the main part I am missing.  Or if
   I could access the lexbuf directly, I could also use that.

In the lexer you can do

    rule lexer = parse
      | ""  { match !global_var with
                | Xlex -> xlex lexbuf
                | Ylex -> ylex lexbuf
            }
    and xlex = parse
      ....
    and ylex = parse
      ....

You can set the global_var from the actions in the grammar.

However, there is probably a better solution: first note that
ocamlyacc-generated functions expect a (Lexing.lexbuf -> token)
function. So you can write your own master lexer:

    let lexer lexbuf =
      match !global_var with
        | Xlex -> Lexer.xlex lexbuf
        | Ylex -> ....

With a bit of hacking you can also combine ocamllex lexers with
other lexers.

In both approaches the problem is the lookahead token: in some
cases yacc fetches the next token and decides on that token
whether to shift or reduce. If the action taken on reduce changes
the lexer, then you have already used the wrong lexer for the next
token. You can examine the grammar.output file and the
OCAMLRUNPARAM=p trace to find out whether ocamlyacc needs the
lookahead token for a given rule. (I can give examples of that if
you are interested.)

Bye,

Hendrik
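
The master-lexer idea can be tried out without ocamllex. This is a minimal sketch, with loud assumptions: the `token` and `mode` types, `current_mode`, and the hand-written `xlex`/`ylex` are hypothetical stand-ins for the generated lexing functions, and the direct access to the `Lexing.lexbuf` fields only works for buffers created with `Lexing.from_string` (channel-based buffers would need refilling).

```ocaml
(* Hypothetical token type; in a real project ocamlyacc generates it. *)
type token = WORD of string | NUMBER of int | EOF

(* Which sub-lexer the master lexer should dispatch to. *)
type mode = Xlex | Ylex
let current_mode = ref Xlex

(* Consume characters satisfying [p], starting at the current position.
   Assumption: the whole input is already in lex_buffer, which holds for
   lexbufs made with Lexing.from_string. *)
let take_while p (lb : Lexing.lexbuf) =
  let start = lb.Lexing.lex_curr_pos in
  while lb.Lexing.lex_curr_pos < lb.Lexing.lex_buffer_len
        && p (Bytes.get lb.Lexing.lex_buffer lb.Lexing.lex_curr_pos)
  do
    lb.Lexing.lex_curr_pos <- lb.Lexing.lex_curr_pos + 1
  done;
  Bytes.sub_string lb.Lexing.lex_buffer start (lb.Lexing.lex_curr_pos - start)

let is_space c = c = ' ' || c = '\n' || c = '\t'

(* Skip whitespace, then return the next run of non-space characters. *)
let next_chunk lb =
  ignore (take_while is_space lb);
  take_while (fun c -> not (is_space c)) lb

(* Stand-in for the first sub-lexer: everything is a WORD. *)
let xlex lb =
  match next_chunk lb with
  | "" -> EOF
  | s -> WORD s

(* Stand-in for the second sub-lexer: integers become NUMBER tokens. *)
let ylex lb =
  match next_chunk lb with
  | "" -> EOF
  | s ->
    (match int_of_string_opt s with
     | Some n -> NUMBER n
     | None -> WORD s)

(* Master lexer of type Lexing.lexbuf -> token, the shape a generated
   parsing function expects as its first argument. *)
let lexer lb =
  match !current_mode with
  | Xlex -> xlex lb
  | Ylex -> ylex lb
```

Because both sub-lexers advance the same lexbuf, switching `current_mode` between calls changes how the next chunk of the same input is tokenized. The lookahead caveat described above still applies once a real parser drives `lexer`.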
* Re: [Caml-list] Calling ocamlyacc from ocamlyacc
From: skaller @ 2003-09-06  3:36 UTC
  To: katre; +Cc: caml-list

On Wed, 2003-09-03 at 01:23, katre wrote:
> Michal Moskal wrote:
> >
> > You can define several start symbols in your grammar. Parsing functions
> > are defined for each. You can also define several rule ... in your lexer
> > (lexing functions are defined for each). Hope that helps, I can't help
> > more, since I don't quite understand nature of your problem.
> >
>
> Right, but is there a way, in an ocamlyacc action, to switch which lexer
> rule you're using?  That seems to be the main part I am missing.  Or if
> I could access the lexbuf directly, I could also use that.

I think the answer is no, you can't do that.
The reason is that yacc et al are LR(1), meaning
1 token of lookahead is needed before a reduction:
in general, when a reduction occurs there is no
guarantee that the next token hasn't already
been fetched.

Now yacc/lex are normally driven by the parser fetching tokens.
So what you _can_ do is pretend that the 'deviant sublanguage'
you need to use a different grammar for is a single *huge*
token.

The lexer, unlike the parser, can be invoked recursively,
and in particular when a regex is matched, the code
which returns a value can do anything.

In particular, you can switch to another lexing rule.
I do this all the time for handling comments and strings:
the lexeme for the open quote is recognised and it calls a
string gathering rule which uses the same lexbuf.

Lexbufs have a current position, which is the end of the lexeme,
even if the finite state automaton actually looked ahead further.
So with the lexbuf, you know *exactly* where you are whenever the
code associated with a regexp is run.

Here is the mainline lexer:

    ....
    (* Python strings *)
    | quote { fun state -> state#inbody; parse_qstring lexbuf state }
    | qqq   { fun state -> state#inbody; parse_qqqstring lexbuf state }

which invokes the sublexer:

    rule parse_qstring = parse
    | qstring {
        fun state ->
          state#inbody;
          [STRING (
            state#get_srcref lexbuf,
            state#decode decode_qstring (lexeme lexbuf)
          )]
      }
    | _ {
        fun state ->
          [ERRORTOKEN (
            state#get_srcref lexbuf,
            "' string"
          )]
      }

    and parse_qqqstring = parse
    | qqqstring {
        fun state ->
          state#inbody;
          [STRING (
            state#get_srcref lexbuf,
            state#decode decode_qqqstring (lexeme lexbuf)
          )]
      }
    | _ {
        fun state ->
          state#inbody;
          [ERRORTOKEN (
            state#get_srcref lexbuf,
            "''' string"
          )]
      }

Now, I said you can do anything, and in the example
I just call another lexer rule, but .. there is no reason
you can't call a parser function, passing the same lexbuf.

Note that you have to do this from the LEXER code,
to ensure that the sub-parser is invoked on exactly
the correct starting character.

You may also note that in the example my lexer actions all have the form

    fun state -> ...

(for every lexeme, which is boring to write). This state
is a mutable object which is passed to the lexer as an extra
argument (just add it after the call to the rule, as in the
nested example):

    | quote { fun state -> state#inbody; parse_qstring lexbuf state }
                                         *************        *****
                                         rule                 extra arg

Note: I am returning lists of tokens, not tokens.
My lexer code is NOT called by the parser. I call it
myself and build a list of tokens, pre-process them,
and pass the output of that to the parser via
a dummy lexbuf.

I have in fact constructed a PYTHON parser using
Ocamlyacc, even though Python grammar is 'strongly
not LR(1)' :-))

I do this by something like 13 filterings of the token
streams (to find the indentation etc) before it is
in an LR(1) form.
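
The "single huge token" approach can be sketched in plain OCaml. This is a hand-written stand-in for an ocamllex rule calling a second rule on the same lexbuf, not the Python lexer described above. Everything here is an assumption made for illustration: the `token` type with its `BLOCK` constructor, the choice of `{` and `}` as delimiters of the sublanguage, the names `main_lex` and `sub_lex`, and the direct use of `Lexing.lexbuf` fields, which is only valid for buffers built with `Lexing.from_string`.

```ocaml
(* Hypothetical tokens: BLOCK carries the raw text of the sublanguage. *)
type token = WORD of string | BLOCK of string | EOF

(* Look at the current character without consuming it.
   Assumption: the lexbuf came from Lexing.from_string, so the whole
   input is already in lex_buffer. *)
let peek_char (lb : Lexing.lexbuf) =
  if lb.Lexing.lex_curr_pos < lb.Lexing.lex_buffer_len
  then Some (Bytes.get lb.Lexing.lex_buffer lb.Lexing.lex_curr_pos)
  else None

let advance (lb : Lexing.lexbuf) =
  lb.Lexing.lex_curr_pos <- lb.Lexing.lex_curr_pos + 1

(* Sub-lexer: called right after '{' has been consumed. It gathers
   everything up to the closing '}' into one token. A sub-parser could
   be run here instead, starting at exactly this position. *)
let sub_lex lb =
  let buf = Buffer.create 16 in
  let rec loop () =
    match peek_char lb with
    | None -> failwith "unterminated block"
    | Some '}' -> advance lb; BLOCK (Buffer.contents buf)
    | Some c -> Buffer.add_char buf c; advance lb; loop ()
  in
  loop ()

(* Main lexer: hands over to the sub-lexer on '{', sharing the lexbuf. *)
let rec main_lex lb =
  match peek_char lb with
  | None -> EOF
  | Some (' ' | '\n' | '\t') -> advance lb; main_lex lb
  | Some '{' -> advance lb; sub_lex lb
  | Some _ ->
    let buf = Buffer.create 16 in
    let rec word () =
      match peek_char lb with
      | Some c when c <> ' ' && c <> '\n' && c <> '\t' && c <> '{' ->
        Buffer.add_char buf c; advance lb; word ()
      | _ -> WORD (Buffer.contents buf)
    in
    word ()
```

From the main parser's point of view the whole block is one token, so its lookahead never reaches inside the sublanguage. The payload of `BLOCK` could just as well be a syntax tree produced by a second parsing function called from `sub_lex`.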