* Incremental, undoable parsing in OCaml as the general parser inversion @ 2007-07-03 11:38 oleg 2007-07-03 15:28 ` [Caml-list] " skaller 2007-07-04 19:41 ` Paul Snively 0 siblings, 2 replies; 10+ messages in thread From: oleg @ 2007-07-03 11:38 UTC (permalink / raw) To: caml-list This message is to show that the incremental parsing in OCaml is a solved problem -- essentially for any parser. Moreover, the procedure to convert from a regular parser to an incremental one is independent of the parser implementation. The incremental parsing lets us parse from a stream that is only partially known. The parser may report what it can and will ask for more input. When more input is supplied, the parsing resumes. No OS threads are required. The solution to control inversion is to use delimited continuations -- which are designed for the purpose of giving fine control over continuations in direct-style programs. skaller wrote: > This problem is not restricted to parsers .. it's a general > problem with Ocaml and also C, C++, and most other languages. > > My language Felix solves this problem for procedures with > user space threading .. but it doesn't work with functions. > [And the builtin GLR parser is purely functional:] The solution below works especially well with functions. That is, if the parser maintains no visible state, the parsing is not only incremental but is also undoable and restartable. If, after ingesting the current chunk of input the parser reported a problem, we can `go back' and supply a different. > Other languages like Scheme and I think Haskell and MLton > have more general solutions because they're not restricted > to the archaic concept of a linear stack. Linear or segmented stack is an implementation detail. The central issue is delimited continuations. In fact, OCaml has a rare distinction of being the first system with widely available native delimited continuations (the native implementation of `control' was announced for Scheme48 back at ICFP02, but it was not included in the Scheme48 distribution). So, OCaml is well ahead of the pack. The solution is illustrated below. For simplicity, we'll be using Genlex.make_lexer of OCaml 3.09 as our parser (it is quite inconvenient for me at the moment to upgrade to OCaml 3.10). Code fragments (cut and pasted from the eshell buffer) are given below within [[ ... ]] brackets. First, preliminaries: [[ (* Open the DelimCC library http://pobox.com/~oleg/ftp/Computation/Continuations.html#caml-shift *) open Delimcc;; let shift p f = take_subcont p (fun sk () -> push_prompt p (fun () -> (f (fun c -> push_prompt p (fun () -> push_subcont sk (fun () -> c)))))) ;; open Stream;; open Printf;; ]] First, we define the lexer and demonstrate non-incremental parsing. We apply the lexer to a sample stream and print out the produced tokens. We use the standard library Stream.iter function. [[ let lexer = Genlex.make_lexer ["+";"-";"*";"/";"let";"="; "("; ")"];; let stream_fn1 str = fun pos -> (* printf "stream_fn1: asked for char #%d\n" pos; *) if pos < String.length str then Some str.[pos] else None;; let show_token_stream = Stream.iter (function | Genlex.Kwd s -> printf "Stream elem: Kwd %s\n" s | Genlex.Ident s -> printf "Stream elem: Ident %s\n" s | Genlex.Int d -> printf "Stream elem: int %d\n" d | Genlex.Float f -> printf "Stream elem: float %g\n" f | Genlex.String s -> printf "Stream elem: \"%s\"\n" s | Genlex.Char c -> printf "Stream elem: '%c'\n" c );; ]] Evaluating [[ let test1 = show_token_stream (lexer (Stream.from (stream_fn1 "let x = 1 + 2 in (* comment *) \"xxx\" ^ x")));; ]] produces the expected result Stream elem: Kwd let Stream elem: Ident x Stream elem: Kwd = Stream elem: int 1 Stream elem: Kwd + Stream elem: int 2 Stream elem: Ident in Stream elem: "xxx" Stream elem: Ident ^ Stream elem: Ident x val test1 : unit = () We now invert the parser, in the following two lines: [[ type stream_req = ReqDone | ReqChar of int * (char option -> stream_req);; let stream_inv p = fun pos -> shift p (fun sk -> ReqChar (pos,sk));; ]] That is it. If we wish to ask the user for chunks (rather than characters) of data, we need a simple buffering layer. [[ let rec filler buffer buffer_pos = function ReqDone -> ReqDone | ReqChar (pos,k) as req -> let pos_rel = pos - buffer_pos in let _ = (* we don't accumulate already emptied buffers. We could. *) assert (pos_rel >= 0) in if pos_rel < String.length buffer then (* desired char is already in buffer *) filler buffer buffer_pos (k (Some buffer.[pos_rel])) else (* buffer underflow. Ask the user to fill the buffer *) req ;; let cont str (ReqChar (pos,k) as req) = filler str pos req;; let finish (ReqChar (pos,k)) = filler "" pos (k None);; ]] We are all set. Please compare the iterator below with test1 above. The composition "show_token_stream (lexer (Stream.from" is exactly as before. We merely replaced the stream function stream_fn with stream_inv. In calculus terms, we replaced "x" with "x+dx". [[ let test2c0 = let p = new_prompt () in push_prompt p (fun () -> filler "" 0 ( show_token_stream (lexer (Stream.from (stream_inv p))); ReqDone));; ]] Now, the result is val test2c0 : stream_req = ReqChar (0, <fun>) That is, the parser is suspended, awaiting the character number 0. We now feed the parser chunks of input. The chunks do not have to contain complete lexems. For example, a comment may start in one chunk and end in another. [[ let test2c1 = cont "let x = 1 + 2 " test2c0;; ]] Stream elem: Kwd let Stream elem: Ident x Stream elem: Kwd = Stream elem: int 1 Stream elem: Kwd + Stream elem: int 2 val test2c1 : stream_req = ReqChar (14, <fun>) The parser ate the chunk, produced some tokens, and waits for more input. [[ let test2c2 = cont "in (* com " test2c1;; ]] Stream elem: Ident in val test2c2 : stream_req = ReqChar (24, <fun>) Here, the chunk contains a non-terminating comment. [[ let test2c3 = cont "ment *) " test2c2;; ]] val test2c3 : stream_req = ReqChar (32, <fun>) [[ let test2c4 = cont "\"xx" test2c3;; ]] val test2c4 : stream_req = ReqChar (35, <fun>) [[ let test2c5 = cont "x\" " test2c4;; ]] Stream elem: "xxx" val test2c5 : stream_req = ReqChar (38, <fun>) Finally the parser produced something, because it got the matching double quote. [[ let test2c6 = cont " ^ x" test2c5;; ]] Stream elem: Ident ^ val test2c6 : stream_req = ReqChar (42, <fun>) [[ let test2c7 = finish test2c6;; ]] Stream elem: Ident x val test2c7 : stream_req = ReqDone And we are finished. As we said earlier, the parser can be restartable and undoable. Suppose, we have changed our mind and decided to continue parsing after the checkpoint test2c6: [[ let test2c71 = cont "yx * 10" test2c6;; let test2c8 = finish test2c71;; ]] Stream elem: Ident xyx Stream elem: Kwd * val test2c71 : stream_req = ReqChar (49, <fun>) Stream elem: int 10 val test2c8 : stream_req = ReqDone and so we get an `alternative' parse, of an alternative stream. We can go back to any checkpoint test2c1, test2c2, etc. and continue parsing from that point. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] Incremental, undoable parsing in OCaml as the general parser inversion 2007-07-03 11:38 Incremental, undoable parsing in OCaml as the general parser inversion oleg @ 2007-07-03 15:28 ` skaller 2007-07-04 19:41 ` Paul Snively 1 sibling, 0 replies; 10+ messages in thread From: skaller @ 2007-07-03 15:28 UTC (permalink / raw) To: oleg; +Cc: caml-list On Tue, 2007-07-03 at 04:38 -0700, oleg@pobox.com wrote: > This message is to show that the incremental parsing in OCaml is a > solved problem -- essentially for any parser. Hmm .. I'm afraid I don't understand. If i have an Ocamlyacc parser function f .. how do I control invert it? -- John Skaller <skaller at users dot sf dot net> Felix, successor to C++: http://felix.sf.net ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] Incremental, undoable parsing in OCaml as the general parser inversion 2007-07-03 11:38 Incremental, undoable parsing in OCaml as the general parser inversion oleg 2007-07-03 15:28 ` [Caml-list] " skaller @ 2007-07-04 19:41 ` Paul Snively 2007-07-04 23:42 ` Paul Snively 1 sibling, 1 reply; 10+ messages in thread From: Paul Snively @ 2007-07-04 19:41 UTC (permalink / raw) To: oleg; +Cc: caml-list [-- Attachment #1: Type: text/plain, Size: 640 bytes --] Oleg, Thanks for this! I've attempted to rewrite this in terms of your monadic version of delimited continuations, as included in the pa_monad syntax extension distribution. I'm attaching my effort so far, which I have run in an OCaml 3.09.3 toploop in which I've done: #use "topfind";; #camlp4o;; #require "monad";; #load "cc.cmo";; I'm having some difficulty, I think, due to stream_req being a recursive type that I need to turn into a type that sometimes represents a computation that returns the (recursive) type. Any advice you (or others) might have would be most welcome. Many thanks and best regards, Paul Snively [-- Attachment #2: incremental-parsing.ml --] [-- Type: application/octet-stream, Size: 1781 bytes --] open Cc;; open Stream;; open Printf;; let lexer = Genlex.make_lexer ["+";"-";"*";"/";"let";"="; "("; ")"];; let stream_fn1 str = fun pos -> (* printf "stream_fn1: asked for char #%d\n" pos; *) if pos < String.length str then Some str.[pos] else None;; let show_token_stream = Stream.iter (function | Genlex.Kwd s -> printf "Stream elem: Kwd %s\n" s | Genlex.Ident s -> printf "Stream elem: Ident %s\n" s | Genlex.Int d -> printf "Stream elem: int %d\n" d | Genlex.Float f -> printf "Stream elem: float %g\n" f | Genlex.String s -> printf "Stream elem: \"%s\"\n" s | Genlex.Char c -> printf "Stream elem: '%c'\n" c );; let test1 = show_token_stream (lexer (Stream.from (stream_fn1 "let x = 1 + 2 in (* comment *) \"xxx\" ^ x")));; type stream_req = ReqDone | ReqChar of (int * stream_req Cc.m);; let stream_inv p = fun pos -> run (shiftP p (fun sk -> ReqChar (pos,sk)));; let rec filler buffer buffer_pos = function ReqDone -> ReqDone | ReqChar (pos,k) as req -> let pos_rel = pos - buffer_pos in let _ = (* we don't accumulate already emptied buffers. We could. *) assert (pos_rel >= 0) in if pos_rel < String.length buffer then (* desired char is already in buffer *) filler buffer buffer_pos (k (Some buffer.[pos_rel])) else (* buffer underflow. Ask the user to fill the buffer *) req ;; let finish (ReqChar (pos,k)) = filler "" pos (run (k (return None)));; let cont str (ReqChar (pos,k) as req) = filler str pos req;; let finish (ReqChar (pos,k)) = filler "" pos (run (k (return None)));; let test2c0 = perform p <-- new_prompt (); return (pushP p (return (filler "" 0 ( show_token_stream (lexer (Stream.from (stream_inv p))); ReqDone))));; let test2c1 = cont "let x = 1 + 2 " test2c0;; [-- Attachment #3: Type: text/plain, Size: 7679 bytes --] On Jul 3, 2007, at 4:38 AM, oleg@pobox.com wrote: > > This message is to show that the incremental parsing in OCaml is a > solved problem -- essentially for any parser. Moreover, the procedure > to convert from a regular parser to an incremental one is independent > of the parser implementation. The incremental parsing lets us parse > from a stream that is only partially known. The parser may > report what it can and will ask for more input. When more input is > supplied, the parsing resumes. No OS threads are required. The > solution to control inversion is to use delimited continuations -- > which are designed for the purpose of giving fine control over > continuations in direct-style programs. > > skaller wrote: >> This problem is not restricted to parsers .. it's a general >> problem with Ocaml and also C, C++, and most other languages. >> >> My language Felix solves this problem for procedures with >> user space threading .. but it doesn't work with functions. >> [And the builtin GLR parser is purely functional:] > > The solution below works especially well with functions. That is, if > the parser maintains no visible state, the parsing is not only > incremental but is also undoable and restartable. If, after ingesting > the current chunk of input the parser reported a problem, we can `go > back' and supply a different. > >> Other languages like Scheme and I think Haskell and MLton >> have more general solutions because they're not restricted >> to the archaic concept of a linear stack. > > Linear or segmented stack is an implementation detail. The central > issue is delimited continuations. In fact, OCaml has a rare > distinction of being the first system with widely available native > delimited continuations (the native implementation of `control' was > announced for Scheme48 back at ICFP02, but it was not included in the > Scheme48 distribution). So, OCaml is well ahead of the pack. > > The solution is illustrated below. For simplicity, we'll be using > Genlex.make_lexer of OCaml 3.09 as our parser (it is quite > inconvenient for me at the moment to upgrade to OCaml 3.10). Code > fragments (cut and pasted from the eshell buffer) are given below > within [[ ... ]] brackets. > > First, preliminaries: > > [[ > > (* Open the DelimCC library > http://pobox.com/~oleg/ftp/Computation/Continuations.html#caml-shift > *) > open Delimcc;; > let shift p f = take_subcont p (fun sk () -> > push_prompt p (fun () -> (f (fun c -> > push_prompt p (fun () -> push_subcont sk (fun () -> c)))))) > ;; > > open Stream;; > open Printf;; > ]] > > First, we define the lexer and demonstrate non-incremental parsing. We > apply the lexer to a sample stream and print out the produced tokens. > We use the standard library Stream.iter function. > > [[ > let lexer = Genlex.make_lexer ["+";"-";"*";"/";"let";"="; "("; ")"];; > > let stream_fn1 str = fun pos -> > (* printf "stream_fn1: asked for char #%d\n" pos; *) > if pos < String.length str then Some str.[pos] else None;; > > let show_token_stream = Stream.iter (function > | Genlex.Kwd s -> printf "Stream elem: Kwd %s\n" s > | Genlex.Ident s -> printf "Stream elem: Ident %s\n" s > | Genlex.Int d -> printf "Stream elem: int %d\n" d > | Genlex.Float f -> printf "Stream elem: float %g\n" f > | Genlex.String s -> printf "Stream elem: \"%s\"\n" s > | Genlex.Char c -> printf "Stream elem: '%c'\n" c > );; > ]] > > Evaluating > [[ > let test1 = show_token_stream (lexer (Stream.from (stream_fn1 > "let x = 1 + 2 in (* comment *) \"xxx\" ^ x")));; > ]] > > produces the expected result > > Stream elem: Kwd let > Stream elem: Ident x > Stream elem: Kwd = > Stream elem: int 1 > Stream elem: Kwd + > Stream elem: int 2 > Stream elem: Ident in > Stream elem: "xxx" > Stream elem: Ident ^ > Stream elem: Ident x > val test1 : unit = () > > > We now invert the parser, in the following two lines: > [[ > type stream_req = ReqDone | ReqChar of int * (char option -> > stream_req);; > let stream_inv p = fun pos -> shift p (fun sk -> ReqChar (pos,sk));; > ]] > > That is it. If we wish to ask the user for chunks (rather than > characters) of data, we need a simple buffering layer. > > [[ > let rec filler buffer buffer_pos = function ReqDone -> ReqDone > | ReqChar (pos,k) as req -> > let pos_rel = pos - buffer_pos in > let _ = > (* we don't accumulate already emptied buffers. We could. *) > assert (pos_rel >= 0) in > if pos_rel < String.length buffer then > (* desired char is already in buffer *) > filler buffer buffer_pos (k (Some buffer.[pos_rel])) > else > (* buffer underflow. Ask the user to fill the buffer *) > req > ;; > > > let cont str (ReqChar (pos,k) as req) = filler str pos req;; > let finish (ReqChar (pos,k)) = filler "" pos (k None);; > > ]] > > We are all set. Please compare the iterator below with test1 above. > The composition "show_token_stream (lexer (Stream.from" is exactly as > before. We merely replaced the stream function stream_fn with > stream_inv. In calculus terms, we replaced "x" with "x+dx". > > [[ > let test2c0 = > let p = new_prompt () in > push_prompt p (fun () -> > filler "" 0 ( > show_token_stream (lexer (Stream.from (stream_inv p))); > ReqDone));; > ]] > > Now, the result is > val test2c0 : stream_req = ReqChar (0, <fun>) > > That is, the parser is suspended, awaiting the character number 0. We > now feed the parser chunks of input. The chunks do > not have to contain complete lexems. For example, a comment may start > in one chunk and end in another. > > [[ > let test2c1 = cont "let x = 1 + 2 " test2c0;; > ]] > Stream elem: Kwd let > Stream elem: Ident x > Stream elem: Kwd = > Stream elem: int 1 > Stream elem: Kwd + > Stream elem: int 2 > val test2c1 : stream_req = ReqChar (14, <fun>) > > The parser ate the chunk, produced some tokens, and waits for more > input. > > [[ > let test2c2 = cont "in (* com " test2c1;; > ]] > > Stream elem: Ident in > val test2c2 : stream_req = ReqChar (24, <fun>) > > Here, the chunk contains a non-terminating comment. > > [[ > let test2c3 = cont "ment *) " test2c2;; > ]] > val test2c3 : stream_req = ReqChar (32, <fun>) > > [[ > let test2c4 = cont "\"xx" test2c3;; > ]] > val test2c4 : stream_req = ReqChar (35, <fun>) > > [[ > let test2c5 = cont "x\" " test2c4;; > ]] > Stream elem: "xxx" > val test2c5 : stream_req = ReqChar (38, <fun>) > > Finally the parser produced something, because it got the matching > double quote. > > [[ > let test2c6 = cont " ^ x" test2c5;; > ]] > > Stream elem: Ident ^ > val test2c6 : stream_req = ReqChar (42, <fun>) > [[ > let test2c7 = finish test2c6;; > ]] > > Stream elem: Ident x > val test2c7 : stream_req = ReqDone > > And we are finished. As we said earlier, the parser can be > restartable and undoable. Suppose, we have changed our mind and > decided to continue parsing after the checkpoint test2c6: > > [[ > let test2c71 = cont "yx * 10" test2c6;; > let test2c8 = finish test2c71;; > ]] > > Stream elem: Ident xyx > Stream elem: Kwd * > val test2c71 : stream_req = ReqChar (49, <fun>) > Stream elem: int 10 > val test2c8 : stream_req = ReqDone > > and so we get an `alternative' parse, of an alternative stream. We > can go > back to any checkpoint test2c1, test2c2, etc. and continue parsing > from that point. > > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] Incremental, undoable parsing in OCaml as the general parser inversion 2007-07-04 19:41 ` Paul Snively @ 2007-07-04 23:42 ` Paul Snively 2007-07-05 8:13 ` oleg 0 siblings, 1 reply; 10+ messages in thread From: Paul Snively @ 2007-07-04 23:42 UTC (permalink / raw) To: Paul Snively; +Cc: oleg, caml-list [-- Attachment #1: Type: text/plain, Size: 289 bytes --] I think I made some boneheaded mistakes in my previous code. I think I've rectified them, but the code still doesn't compile. I've attached my new code, which at least fails to compile in consistent, suggestive ways, (hopefully) instead of being trivially ridiculous. Thanks, Paul [-- Attachment #2: incremental-parsing.ml --] [-- Type: application/octet-stream, Size: 1796 bytes --] open Cc;; open Stream;; open Printf;; let lexer = Genlex.make_lexer ["+";"-";"*";"/";"let";"="; "("; ")"];; let stream_fn1 str = fun pos -> (* printf "stream_fn1: asked for char #%d\n" pos; *) if pos < String.length str then Some str.[pos] else None;; let show_token_stream = Stream.iter (function | Genlex.Kwd s -> printf "Stream elem: Kwd %s\n" s | Genlex.Ident s -> printf "Stream elem: Ident %s\n" s | Genlex.Int d -> printf "Stream elem: int %d\n" d | Genlex.Float f -> printf "Stream elem: float %g\n" f | Genlex.String s -> printf "Stream elem: \"%s\"\n" s | Genlex.Char c -> printf "Stream elem: '%c'\n" c );; let test1 = show_token_stream (lexer (Stream.from (stream_fn1 "let x = 1 + 2 in (* comment *) \"xxx\" ^ x")));; type stream_req = ReqDone | ReqChar of (char option -> stream_req) Cc.m;; let stream_inv p = fun pos -> run (shiftP p (fun sk -> ReqChar (pos,sk)));; let rec filler buffer buffer_pos = function ReqDone -> ReqDone | ReqChar (pos,k) as req -> let pos_rel = pos - buffer_pos in let _ = (* we don't accumulate already emptied buffers. We could. *) assert (pos_rel >= 0) in if pos_rel < String.length buffer then (* desired char is already in buffer *) filler buffer buffer_pos ((run k) (Some buffer.[pos_rel])) else (* buffer underflow. Ask the user to fill the buffer *) req ;; let finish (ReqChar (pos,k)) = filler "" pos ((run k) (return None));; let cont str (ReqChar (pos,k) as req) = filler str pos req;; let finish (ReqChar (pos,k)) = filler "" pos ((run k) (return None));; let test2c0 = perform p <-- new_prompt (); return (pushP p (return (filler "" 0 ( show_token_stream (lexer (Stream.from (stream_inv p))); ReqDone))));; let test2c1 = cont "let x = 1 + 2 " test2c0;; [-- Attachment #3: Type: text/plain, Size: 8990 bytes --] On Jul 4, 2007, at 12:41 PM, Paul Snively wrote: > Oleg, > > Thanks for this! I've attempted to rewrite this in terms of your > monadic version of delimited continuations, as included in the > pa_monad syntax extension distribution. I'm attaching my effort so > far, which I have run in an OCaml 3.09.3 toploop in which I've done: > > #use "topfind";; > #camlp4o;; > #require "monad";; > #load "cc.cmo";; > > I'm having some difficulty, I think, due to stream_req being a > recursive type that I need to turn into a type that sometimes > represents a computation that returns the (recursive) type. Any > advice you (or others) might have would be most welcome. > > Many thanks and best regards, > Paul Snively > > <incremental-parsing.ml> > On Jul 3, 2007, at 4:38 AM, oleg@pobox.com wrote: > >> >> This message is to show that the incremental parsing in OCaml is a >> solved problem -- essentially for any parser. Moreover, the procedure >> to convert from a regular parser to an incremental one is independent >> of the parser implementation. The incremental parsing lets us parse >> from a stream that is only partially known. The parser may >> report what it can and will ask for more input. When more input is >> supplied, the parsing resumes. No OS threads are required. The >> solution to control inversion is to use delimited continuations -- >> which are designed for the purpose of giving fine control over >> continuations in direct-style programs. >> >> skaller wrote: >>> This problem is not restricted to parsers .. it's a general >>> problem with Ocaml and also C, C++, and most other languages. >>> >>> My language Felix solves this problem for procedures with >>> user space threading .. but it doesn't work with functions. >>> [And the builtin GLR parser is purely functional:] >> >> The solution below works especially well with functions. That is, if >> the parser maintains no visible state, the parsing is not only >> incremental but is also undoable and restartable. If, after ingesting >> the current chunk of input the parser reported a problem, we can `go >> back' and supply a different. >> >>> Other languages like Scheme and I think Haskell and MLton >>> have more general solutions because they're not restricted >>> to the archaic concept of a linear stack. >> >> Linear or segmented stack is an implementation detail. The central >> issue is delimited continuations. In fact, OCaml has a rare >> distinction of being the first system with widely available native >> delimited continuations (the native implementation of `control' was >> announced for Scheme48 back at ICFP02, but it was not included in the >> Scheme48 distribution). So, OCaml is well ahead of the pack. >> >> The solution is illustrated below. For simplicity, we'll be using >> Genlex.make_lexer of OCaml 3.09 as our parser (it is quite >> inconvenient for me at the moment to upgrade to OCaml 3.10). Code >> fragments (cut and pasted from the eshell buffer) are given below >> within [[ ... ]] brackets. >> >> First, preliminaries: >> >> [[ >> >> (* Open the DelimCC library >> http://pobox.com/~oleg/ftp/Computation/Continuations.html#caml- >> shift >> *) >> open Delimcc;; >> let shift p f = take_subcont p (fun sk () -> >> push_prompt p (fun () -> (f (fun c -> >> push_prompt p (fun () -> push_subcont sk (fun () -> c)))))) >> ;; >> >> open Stream;; >> open Printf;; >> ]] >> >> First, we define the lexer and demonstrate non-incremental >> parsing. We >> apply the lexer to a sample stream and print out the produced tokens. >> We use the standard library Stream.iter function. >> >> [[ >> let lexer = Genlex.make_lexer ["+";"-";"*";"/";"let";"="; "("; ")"];; >> >> let stream_fn1 str = fun pos -> >> (* printf "stream_fn1: asked for char #%d\n" pos; *) >> if pos < String.length str then Some str.[pos] else None;; >> >> let show_token_stream = Stream.iter (function >> | Genlex.Kwd s -> printf "Stream elem: Kwd %s\n" s >> | Genlex.Ident s -> printf "Stream elem: Ident %s\n" s >> | Genlex.Int d -> printf "Stream elem: int %d\n" d >> | Genlex.Float f -> printf "Stream elem: float %g\n" f >> | Genlex.String s -> printf "Stream elem: \"%s\"\n" s >> | Genlex.Char c -> printf "Stream elem: '%c'\n" c >> );; >> ]] >> >> Evaluating >> [[ >> let test1 = show_token_stream (lexer (Stream.from (stream_fn1 >> "let x = 1 + 2 in (* comment *) \"xxx\" ^ x")));; >> ]] >> >> produces the expected result >> >> Stream elem: Kwd let >> Stream elem: Ident x >> Stream elem: Kwd = >> Stream elem: int 1 >> Stream elem: Kwd + >> Stream elem: int 2 >> Stream elem: Ident in >> Stream elem: "xxx" >> Stream elem: Ident ^ >> Stream elem: Ident x >> val test1 : unit = () >> >> >> We now invert the parser, in the following two lines: >> [[ >> type stream_req = ReqDone | ReqChar of int * (char option -> >> stream_req);; >> let stream_inv p = fun pos -> shift p (fun sk -> ReqChar (pos,sk));; >> ]] >> >> That is it. If we wish to ask the user for chunks (rather than >> characters) of data, we need a simple buffering layer. >> >> [[ >> let rec filler buffer buffer_pos = function ReqDone -> ReqDone >> | ReqChar (pos,k) as req -> >> let pos_rel = pos - buffer_pos in >> let _ = >> (* we don't accumulate already emptied buffers. We could. *) >> assert (pos_rel >= 0) in >> if pos_rel < String.length buffer then >> (* desired char is already in buffer *) >> filler buffer buffer_pos (k (Some buffer.[pos_rel])) >> else >> (* buffer underflow. Ask the user to fill the buffer *) >> req >> ;; >> >> >> let cont str (ReqChar (pos,k) as req) = filler str pos req;; >> let finish (ReqChar (pos,k)) = filler "" pos (k None);; >> >> ]] >> >> We are all set. Please compare the iterator below with test1 above. >> The composition "show_token_stream (lexer (Stream.from" is exactly as >> before. We merely replaced the stream function stream_fn with >> stream_inv. In calculus terms, we replaced "x" with "x+dx". >> >> [[ >> let test2c0 = >> let p = new_prompt () in >> push_prompt p (fun () -> >> filler "" 0 ( >> show_token_stream (lexer (Stream.from (stream_inv p))); >> ReqDone));; >> ]] >> >> Now, the result is >> val test2c0 : stream_req = ReqChar (0, <fun>) >> >> That is, the parser is suspended, awaiting the character number 0. We >> now feed the parser chunks of input. The chunks do >> not have to contain complete lexems. For example, a comment may start >> in one chunk and end in another. >> >> [[ >> let test2c1 = cont "let x = 1 + 2 " test2c0;; >> ]] >> Stream elem: Kwd let >> Stream elem: Ident x >> Stream elem: Kwd = >> Stream elem: int 1 >> Stream elem: Kwd + >> Stream elem: int 2 >> val test2c1 : stream_req = ReqChar (14, <fun>) >> >> The parser ate the chunk, produced some tokens, and waits for more >> input. >> >> [[ >> let test2c2 = cont "in (* com " test2c1;; >> ]] >> >> Stream elem: Ident in >> val test2c2 : stream_req = ReqChar (24, <fun>) >> >> Here, the chunk contains a non-terminating comment. >> >> [[ >> let test2c3 = cont "ment *) " test2c2;; >> ]] >> val test2c3 : stream_req = ReqChar (32, <fun>) >> >> [[ >> let test2c4 = cont "\"xx" test2c3;; >> ]] >> val test2c4 : stream_req = ReqChar (35, <fun>) >> >> [[ >> let test2c5 = cont "x\" " test2c4;; >> ]] >> Stream elem: "xxx" >> val test2c5 : stream_req = ReqChar (38, <fun>) >> >> Finally the parser produced something, because it got the matching >> double quote. >> >> [[ >> let test2c6 = cont " ^ x" test2c5;; >> ]] >> >> Stream elem: Ident ^ >> val test2c6 : stream_req = ReqChar (42, <fun>) >> [[ >> let test2c7 = finish test2c6;; >> ]] >> >> Stream elem: Ident x >> val test2c7 : stream_req = ReqDone >> >> And we are finished. As we said earlier, the parser can be >> restartable and undoable. Suppose, we have changed our mind and >> decided to continue parsing after the checkpoint test2c6: >> >> [[ >> let test2c71 = cont "yx * 10" test2c6;; >> let test2c8 = finish test2c71;; >> ]] >> >> Stream elem: Ident xyx >> Stream elem: Kwd * >> val test2c71 : stream_req = ReqChar (49, <fun>) >> Stream elem: int 10 >> val test2c8 : stream_req = ReqDone >> >> and so we get an `alternative' parse, of an alternative stream. We >> can go >> back to any checkpoint test2c1, test2c2, etc. and continue parsing >> from that point. >> >> _______________________________________________ >> Caml-list mailing list. Subscription management: >> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list >> Archives: http://caml.inria.fr >> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners >> Bug reports: http://caml.inria.fr/bin/caml-bugs > > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] Incremental, undoable parsing in OCaml as the general parser inversion 2007-07-04 23:42 ` Paul Snively @ 2007-07-05 8:13 ` oleg 2007-07-05 12:39 ` skaller ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: oleg @ 2007-07-05 8:13 UTC (permalink / raw) To: psnively, skaller; +Cc: caml-list Paul Snively wrote: > I've attempted to rewrite this in terms of your > monadic version of delimited continuations, as included in the > pa_monad syntax extension distribution. I'm afraid that cannot be done. That is, short of re-writing make_lexer and all other library functions in the monadic style. That is the trouble with the monadic style, well articulated by Greg Morrisett in his invited talk at the ML Workshop 2005. In Haskell, we essentially need two versions of every function. For example, we need a pure map of the type (a->b) -> [a] -> [b] and the monadic version mapM, of the type (a->m b) -> [a] -> m [b]. The claim goes that once we went into trouble of writing a monadic version, we can instantiate it with any monad and so `extend' it arbitrarily. For one thing, not arbitrarily: the monadic transformer approach is of limited expressiveness and there are practically significant computations that cannot be expressed with monad transformers because they impose the fixed order of layers. Mainly, monadic meta-language is of limited expressiveness itself and is just plain ugly. One can't write guards with monadic expressions (for a good reason), a lot of other conveniences of Haskell become unavailable. Therefore, people quite often resort to unsafePerformIO -- even in reputable projects like Lava. That is really as unsound as it sounds: one has to specifically disable compiler optimizations (inlining); so-called lazy IO can lead to problems like reading from a file after it was closed and so one must _deeply_ force expressions and enforce evaluation order, etc. I guess it is clear how much burden monadic notation really is if even the high priests and most faithful of Haskell would rather commit cardinal sins and forsake the static benefits of Haskell -- only to avoid programming extensively in monadic style. This reminds me of a funny event at the Haskell workshop 2006. One participant stood up and sincerely proposed that Haskell' standard would find a way to automatically derive a monadic version of a pure expression. Chung-chieh Shan recommended that person to take a look at OCaml... The beauty of delimited continuations is that we take any function that accepts some kind of `callback' and pass a specifically constructed callback (which is essentially shift f f). Thus we have accomplished inversion. The function in question can be a map-like traversal combinator (the callback is the function to map), a parser (the callback here is the stream, the function that provides data to the parser), a GUI system with its own event loop, etc. An analogy with a debugger might make it easy to understand why it all works. The operation (shift f f) is akin to the break point to the debugger (INT 3 aka CC for x86 aficionados). When we pass this break point as a callback to parse, for example, we drop into the debugger whenever parse invokes our callback, that is, whenever it needs a new character. Once in the debugger, we can examine our state and resume the parser passing it the character of our choice. The form `reset' (aka push_prompt) makes it possible for us to `script the debugger' in our own program. Thus _delimited_ continuations offer us a scriptable debugger. Unlike the free-wheeling debugger like gdb, the debugger of delimited continuations respects abstractions and is type sound, supporting static invariants of the type system. John Skaller wrote: > If i have an Ocamlyacc parser function f .. how do I control invert > it? That parser takes stream, which can be (made of) a function. We merely need to create a stream (lexbuf) type stream_req = ReqDone | ReqBuf of string * int * (int -> stream_req);; Lexing.from_function (fun buffer n -> shift p (fun sk -> ReqBuf (buffer,n,sk))) and pass the that stream to the parser. We need to write something like the filler in the previous message, the handler for ReqBuf requests. The handler will fill the buffer (the first argument of the ReqBuf request) and invoke 'sk' passing it the number of placed characters. I would be remiss if I don't mention one of the main benefits of the monadic style: monadic style converts control flow into data flow. Therefore, types (which normally enforce data-flow invariants like prohibiting the addition of an integer and a string) can be used to _statically_ enforce invariants of control flow, e.g., all accessible file handles are open, all acquired locks are released, no blocking operation is attempted while a lock is held, etc. We have many working examples of statically enforcing such properties. Without monadic style, we need the dual of type systems -- effect systems -- to provide similar static guarantees. Alas, despite Filinski's prescient call for more attention to effect systems 13 years ago, hardly any effect system is available in a mainstream functional language. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] Incremental, undoable parsing in OCaml as the general parser inversion 2007-07-05 8:13 ` oleg @ 2007-07-05 12:39 ` skaller 2007-07-05 23:00 ` [Caml-list] Incremental, undoable parsing in OCaml as the oleg 2007-07-05 13:23 ` [Caml-list] Incremental, undoable parsing in OCaml as the general parser inversion skaller 2007-07-05 13:54 ` Paul Snively 2 siblings, 1 reply; 10+ messages in thread From: skaller @ 2007-07-05 12:39 UTC (permalink / raw) To: oleg; +Cc: psnively, caml-list On Thu, 2007-07-05 at 01:13 -0700, oleg@pobox.com wrote: > Paul Snively wrote: > > I've attempted to rewrite this in terms of your > > monadic version of delimited continuations, as included in the > > pa_monad syntax extension distribution. > > I'm afraid that cannot be done. That is, short of re-writing > make_lexer and all other library functions in the monadic style. That was my original point. You wrote: "This message is to show that the incremental parsing in OCaml is a solved problem -- essentially for any parser. Moreover, the procedure to convert from a regular parser to an incremental one is independent of the parser implementation." But this isn't so, because you confuse the meaning of 'parser' and 'implementation' to a programmer, as compared to a theoretician. A programmer thinks an 'implementation' is concrete syntax .. i.e. an actual text file.. a theoretician's 'concrete syntax' is still an abstract encoding. What you've shown is that it is possible to *manually* control invert a given algorithm. Now you should note what Felix does. It *mechanically* control inverts any algorithm encoded with Felix procedures. And I mean *concrete syntax* here :) That translation is universal and generated code is expensive to run due to many closures, but the cost is reduced by the optimiser so that code not requiring control inversion reduces to plain old subroutines. [Literally, procedure calls are converted to yielding the procedure to call, etc .. it's continuation passing] I am guessing my technique differs from what you describe in that the 'continuations' aren't statically delimited. That's a serious drawback of the current Felix implementation. The effect is that the inverted code only works with channels at the top level, that is, not inside a functional closure, because the latter use the machine stack and do cannot yield to the driver (scheduler). BTW: procedural code is just 'code written in a monadic style'. Every imperative programming language is just a purely functional one with monads.. This shows that monads have all the same problems as ordinary imperative code if you over use them (as of course is common practice in low level procedural programming languages). -- John Skaller <skaller at users dot sf dot net> Felix, successor to C++: http://felix.sf.net ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] Incremental, undoable parsing in OCaml as the 2007-07-05 12:39 ` skaller @ 2007-07-05 23:00 ` oleg 2007-07-06 4:40 ` skaller 0 siblings, 1 reply; 10+ messages in thread From: oleg @ 2007-07-05 23:00 UTC (permalink / raw) To: skaller; +Cc: psnively, caml-list The claim of *mechanical*, automatic and universal inversion of existing, unmodified, separately compiled parsers, etc. stands. I believe we have a mis-understanding. There are TWO implementations of delimited continuations for OCaml. One is the monadic, implemented in pure 100% OCaml and available for both bytecode and native compilers. The other is non-monadic implementation, which is at present available for byte-code only. Paul Snively asked we about the first, monadic one. To use that implementation, one indeed has to re-write the parser and everything else for that matter in monadic style. The original message was talking about the non-monadic, direct style approach. There, the inversion of the parser works automatically and mechanically. There is no need to re-write the parser or anything else. John Skaller wrote: > What you've shown is that it is possible to *manually* > control invert a given algorithm. > > Now you should note what Felix does. It *mechanically* > control inverts any algorithm encoded with Felix procedures. > And I mean *concrete syntax* here :) The original message demonstrated how to *mechanically* and automatically invert the given algorithm. You could have tried it yourself: the runnable code was included. The code specifically used Genlex.make_lexer from the standard library as it is. The code also used Stream.iter, Stream.from and other library functions -- as they are, as they *had* been compiled and placed in the standard library. No modifications were done to any of these functions. > But this isn't so, because you confuse the meaning of 'parser' > and 'implementation' to a programmer, as compared to a > theoretician. A programmer thinks an 'implementation' is > concrete syntax .. i.e. an actual text file.. a theoretician's > 'concrete syntax' is still an abstract encoding. I assure you that I'm an industrial programmer and I don't confuse concrete with abstract syntax. When the message said that the procedure to invert a parser is mechanical and independent of the parser implementation, it meant exactly that and for what _you_ call `concrete syntax'. The message itself contained the proof: the complete code. If you consider that evidence unsatisfactory, please send me your parser (that can be compiled with OCaml 3.09) and I will send you the proof of inversion without modifying a bit of your code. You can run that code for yourself. You could just send me compiled *.cmi and *cmo files instead (but I need to check the precise version of my system then so it can load your files). > This shows that monads have all the same problems as > ordinary imperative code if you over use them > (as of course is common practice in low level procedural > programming languages). Partly I agree with you. Still, with monad and an expressive type system, one can do better (for example, assure that operations on memory pointers respect alignment and sizes of memory area, even if we permit pointer arithmetic). That was the subject of a recent paper on strongly typed memory areas http://okmij.org/ftp/Computation/resource-aware-prog/tfp.pdf We can even statically reason about some timing constraints. Paul Snively wrote: > what the issues in providing delimited continuations natively for > ocamlopt are? I guess the major issue is of not actually trying. I believe the simple and inefficient version (which just copies the stack) can be done relatively quickly. I need to check if there are any absolute pointers in the stack to locations on the stack itself (but I don't think they are). In any case, frame tables describe exactly what is on the stack. A more efficient version is to make the stack segmented; that takes longer. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] Incremental, undoable parsing in OCaml as the 2007-07-05 23:00 ` [Caml-list] Incremental, undoable parsing in OCaml as the oleg @ 2007-07-06 4:40 ` skaller 0 siblings, 0 replies; 10+ messages in thread From: skaller @ 2007-07-06 4:40 UTC (permalink / raw) To: oleg; +Cc: psnively, caml-list On Thu, 2007-07-05 at 16:00 -0700, oleg@pobox.com wrote: > The claim of *mechanical*, automatic and universal inversion of > existing, unmodified, separately compiled parsers, etc. stands. > > I believe we have a mis-understanding. Probably. > There are TWO implementations > of delimited continuations for OCaml. One is the monadic, implemented > in pure 100% OCaml and available for both bytecode and native > compilers. The other is non-monadic implementation, which is at > present available for byte-code only. Paul Snively asked we about the > first, monadic one. To use that implementation, one indeed has to > re-write the parser and everything else for that matter in monadic > style. So we agree, the monadic implementation requires a rewrite. > The original message was talking about the non-monadic, direct style > approach. There, the inversion of the parser works automatically and > mechanically. There is no need to re-write the parser or anything > else. Obviously you have rewrite the interfaces.. :) But yes, this seems to be the source of confusion. A bytecode version is not only feasible .. Ocaml's vmthreads have already provided control inversion for years -- but only for blocking I/O operations. Native code control inversion is not so easy. In absence of foreign code (C code or whatever) it can probably be done most easily by patching Ocaml native code generators so they stop using the stack and use a heap based spaghetti stack instead. Inversion by copying or switching stacks in a more general context isn't possible, unfortunately. By 'more general' I mean in the presence of arbitrary foreign code 'up stack'. For example, you can't expect C++ callbacks which throw exceptions to work if you fiddle the stack. I think Hans Boehm has the most data on this, since to implement his conservative collector he needs to know where just about everything is on a wide class of platforms. [However Felix compiler is 100% pure ocaml.. ] -- John Skaller <skaller at users dot sf dot net> Felix, successor to C++: http://felix.sf.net ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] Incremental, undoable parsing in OCaml as the general parser inversion 2007-07-05 8:13 ` oleg 2007-07-05 12:39 ` skaller @ 2007-07-05 13:23 ` skaller 2007-07-05 13:54 ` Paul Snively 2 siblings, 0 replies; 10+ messages in thread From: skaller @ 2007-07-05 13:23 UTC (permalink / raw) To: oleg; +Cc: psnively, caml-list On Thu, 2007-07-05 at 01:13 -0700, oleg@pobox.com wrote: > The beauty of delimited continuations is that we take any function > that accepts some kind of `callback' and pass a specifically > constructed callback (which is essentially shift f f). Thus we have > accomplished inversion. The function in question can be a map-like > traversal combinator (the callback is the function to map), a parser > (the callback here is the stream, the function that provides data to > the parser), a GUI system with its own event loop, etc. Unfortunately what you need for a GUI is not control inverting the callback -- programmers in industry do that every day: it's an extremely common task. Of course the way they do it is naive and laborious. But it isn't the real problem.. it's the GUI event loop that needs to be inverted. Luckily that can be done for both X- and MS- Windows by the simple expedient of not writing an event loop in the first place!! In both systems, there is no event loop. The API provides a function to read the message queue -- writing an event loop is the users job .. so inverting it is easy .. just don't write it! Unfortunately .. doing that throws away all the GUI toolkits out there -- GTK is gone, for example. They were all written the wrong way .. Rewriting GTK in a monadic style is not going to be so easy .. it's rather a lot of code and much of the interface is *based* on the standard C model, where continuations are passed by pushing the return address implicitly onto the machine stack. Luckily .. most of the work done in a GUI tool kit can be done much better using Open GL anyhow .. and OpenGL is algorithmic not event driven. In particular SDL is also algorithmic, not even driven. So .. if you really want to invert GTK .. you are left with the system supported control inversion device, which is more or less universal and works with some C code at least .. Posix threads. After all the *primary* job of an operating system is precisely control inversion. -- John Skaller <skaller at users dot sf dot net> Felix, successor to C++: http://felix.sf.net ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] Incremental, undoable parsing in OCaml as the general parser inversion 2007-07-05 8:13 ` oleg 2007-07-05 12:39 ` skaller 2007-07-05 13:23 ` [Caml-list] Incremental, undoable parsing in OCaml as the general parser inversion skaller @ 2007-07-05 13:54 ` Paul Snively 2 siblings, 0 replies; 10+ messages in thread From: Paul Snively @ 2007-07-05 13:54 UTC (permalink / raw) To: oleg; +Cc: skaller, caml-list Oleg, Thank you for the detailed explanation--it's as clear as can be. Given that, can I ask for a follow-up explanation as to what the issues in providing delimited continuations natively for ocamlopt are? Many thanks and best regards, Paul On Jul 5, 2007, at 1:13 AM, oleg@pobox.com wrote: > I'm afraid that cannot be done. That is, short of re-writing > make_lexer and all other library functions in the monadic style... ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2007-07-06 4:40 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2007-07-03 11:38 Incremental, undoable parsing in OCaml as the general parser inversion oleg 2007-07-03 15:28 ` [Caml-list] " skaller 2007-07-04 19:41 ` Paul Snively 2007-07-04 23:42 ` Paul Snively 2007-07-05 8:13 ` oleg 2007-07-05 12:39 ` skaller 2007-07-05 23:00 ` [Caml-list] Incremental, undoable parsing in OCaml as the oleg 2007-07-06 4:40 ` skaller 2007-07-05 13:23 ` [Caml-list] Incremental, undoable parsing in OCaml as the general parser inversion skaller 2007-07-05 13:54 ` Paul Snively
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox