From: Paul Snively <psnively@mac.com>
To: oleg@pobox.com
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Incremental, undoable parsing in OCaml as the general parser inversion
Date: Wed, 4 Jul 2007 12:41:19 -0700 [thread overview]
Message-ID: <D691E383-3CEA-4617-9D2B-0F5232B75516@mac.com> (raw)
In-Reply-To: <20070703113836.2317EAD43@Adric.metnet.fnmoc.navy.mil>
[-- Attachment #1: Type: text/plain, Size: 640 bytes --]
Oleg,
Thanks for this! I've attempted to rewrite this in terms of your
monadic version of delimited continuations, as included in the
pa_monad syntax extension distribution. I'm attaching my effort so
far, which I have run in an OCaml 3.09.3 toploop in which I've done:
#use "topfind";;
#camlp4o;;
#require "monad";;
#load "cc.cmo";;
I'm having some difficulty, I think, due to stream_req being a
recursive type that I need to turn into a type that sometimes
represents a computation that returns the (recursive) type. Any
advice you (or others) might have would be most welcome.
Many thanks and best regards,
Paul Snively
[-- Attachment #2: incremental-parsing.ml --]
[-- Type: application/octet-stream, Size: 1781 bytes --]
open Cc;;
open Stream;;
open Printf;;
let lexer = Genlex.make_lexer ["+";"-";"*";"/";"let";"="; "("; ")"];;
let stream_fn1 str = fun pos ->
(* printf "stream_fn1: asked for char #%d\n" pos; *)
if pos < String.length str then Some str.[pos] else None;;
let show_token_stream = Stream.iter (function
| Genlex.Kwd s -> printf "Stream elem: Kwd %s\n" s
| Genlex.Ident s -> printf "Stream elem: Ident %s\n" s
| Genlex.Int d -> printf "Stream elem: int %d\n" d
| Genlex.Float f -> printf "Stream elem: float %g\n" f
| Genlex.String s -> printf "Stream elem: \"%s\"\n" s
| Genlex.Char c -> printf "Stream elem: '%c'\n" c
);;
let test1 = show_token_stream (lexer (Stream.from (stream_fn1
"let x = 1 + 2 in (* comment *) \"xxx\" ^ x")));;
type stream_req = ReqDone | ReqChar of (int * stream_req Cc.m);;
let stream_inv p = fun pos -> run (shiftP p (fun sk -> ReqChar (pos,sk)));;
let rec filler buffer buffer_pos = function ReqDone -> ReqDone
| ReqChar (pos,k) as req ->
let pos_rel = pos - buffer_pos in
let _ =
(* we don't accumulate already emptied buffers. We could. *)
assert (pos_rel >= 0) in
if pos_rel < String.length buffer then
(* desired char is already in buffer *)
filler buffer buffer_pos (k (Some buffer.[pos_rel]))
else
(* buffer underflow. Ask the user to fill the buffer *)
req
;;
let finish (ReqChar (pos,k)) = filler "" pos (run (k (return None)));;
let cont str (ReqChar (pos,k) as req) = filler str pos req;;
let finish (ReqChar (pos,k)) = filler "" pos (run (k (return None)));;
let test2c0 = perform
p <-- new_prompt ();
return (pushP p (return
(filler "" 0 (
show_token_stream (lexer (Stream.from (stream_inv p))); ReqDone))));;
let test2c1 = cont "let x = 1 + 2 " test2c0;;
[-- Attachment #3: Type: text/plain, Size: 7679 bytes --]
On Jul 3, 2007, at 4:38 AM, oleg@pobox.com wrote:
>
> This message is to show that the incremental parsing in OCaml is a
> solved problem -- essentially for any parser. Moreover, the procedure
> to convert from a regular parser to an incremental one is independent
> of the parser implementation. The incremental parsing lets us parse
> from a stream that is only partially known. The parser may
> report what it can and will ask for more input. When more input is
> supplied, the parsing resumes. No OS threads are required. The
> solution to control inversion is to use delimited continuations --
> which are designed for the purpose of giving fine control over
> continuations in direct-style programs.
>
> skaller wrote:
>> This problem is not restricted to parsers .. it's a general
>> problem with Ocaml and also C, C++, and most other languages.
>>
>> My language Felix solves this problem for procedures with
>> user space threading .. but it doesn't work with functions.
>> [And the builtin GLR parser is purely functional:]
>
> The solution below works especially well with functions. That is, if
> the parser maintains no visible state, the parsing is not only
> incremental but is also undoable and restartable. If, after ingesting
> the current chunk of input the parser reported a problem, we can `go
> back' and supply a different.
>
>> Other languages like Scheme and I think Haskell and MLton
>> have more general solutions because they're not restricted
>> to the archaic concept of a linear stack.
>
> Linear or segmented stack is an implementation detail. The central
> issue is delimited continuations. In fact, OCaml has a rare
> distinction of being the first system with widely available native
> delimited continuations (the native implementation of `control' was
> announced for Scheme48 back at ICFP02, but it was not included in the
> Scheme48 distribution). So, OCaml is well ahead of the pack.
>
> The solution is illustrated below. For simplicity, we'll be using
> Genlex.make_lexer of OCaml 3.09 as our parser (it is quite
> inconvenient for me at the moment to upgrade to OCaml 3.10). Code
> fragments (cut and pasted from the eshell buffer) are given below
> within [[ ... ]] brackets.
>
> First, preliminaries:
>
> [[
>
> (* Open the DelimCC library
> http://pobox.com/~oleg/ftp/Computation/Continuations.html#caml-shift
> *)
> open Delimcc;;
> let shift p f = take_subcont p (fun sk () ->
> push_prompt p (fun () -> (f (fun c ->
> push_prompt p (fun () -> push_subcont sk (fun () -> c))))))
> ;;
>
> open Stream;;
> open Printf;;
> ]]
>
> First, we define the lexer and demonstrate non-incremental parsing. We
> apply the lexer to a sample stream and print out the produced tokens.
> We use the standard library Stream.iter function.
>
> [[
> let lexer = Genlex.make_lexer ["+";"-";"*";"/";"let";"="; "("; ")"];;
>
> let stream_fn1 str = fun pos ->
> (* printf "stream_fn1: asked for char #%d\n" pos; *)
> if pos < String.length str then Some str.[pos] else None;;
>
> let show_token_stream = Stream.iter (function
> | Genlex.Kwd s -> printf "Stream elem: Kwd %s\n" s
> | Genlex.Ident s -> printf "Stream elem: Ident %s\n" s
> | Genlex.Int d -> printf "Stream elem: int %d\n" d
> | Genlex.Float f -> printf "Stream elem: float %g\n" f
> | Genlex.String s -> printf "Stream elem: \"%s\"\n" s
> | Genlex.Char c -> printf "Stream elem: '%c'\n" c
> );;
> ]]
>
> Evaluating
> [[
> let test1 = show_token_stream (lexer (Stream.from (stream_fn1
> "let x = 1 + 2 in (* comment *) \"xxx\" ^ x")));;
> ]]
>
> produces the expected result
>
> Stream elem: Kwd let
> Stream elem: Ident x
> Stream elem: Kwd =
> Stream elem: int 1
> Stream elem: Kwd +
> Stream elem: int 2
> Stream elem: Ident in
> Stream elem: "xxx"
> Stream elem: Ident ^
> Stream elem: Ident x
> val test1 : unit = ()
>
>
> We now invert the parser, in the following two lines:
> [[
> type stream_req = ReqDone | ReqChar of int * (char option ->
> stream_req);;
> let stream_inv p = fun pos -> shift p (fun sk -> ReqChar (pos,sk));;
> ]]
>
> That is it. If we wish to ask the user for chunks (rather than
> characters) of data, we need a simple buffering layer.
>
> [[
> let rec filler buffer buffer_pos = function ReqDone -> ReqDone
> | ReqChar (pos,k) as req ->
> let pos_rel = pos - buffer_pos in
> let _ =
> (* we don't accumulate already emptied buffers. We could. *)
> assert (pos_rel >= 0) in
> if pos_rel < String.length buffer then
> (* desired char is already in buffer *)
> filler buffer buffer_pos (k (Some buffer.[pos_rel]))
> else
> (* buffer underflow. Ask the user to fill the buffer *)
> req
> ;;
>
>
> let cont str (ReqChar (pos,k) as req) = filler str pos req;;
> let finish (ReqChar (pos,k)) = filler "" pos (k None);;
>
> ]]
>
> We are all set. Please compare the iterator below with test1 above.
> The composition "show_token_stream (lexer (Stream.from" is exactly as
> before. We merely replaced the stream function stream_fn with
> stream_inv. In calculus terms, we replaced "x" with "x+dx".
>
> [[
> let test2c0 =
> let p = new_prompt () in
> push_prompt p (fun () ->
> filler "" 0 (
> show_token_stream (lexer (Stream.from (stream_inv p)));
> ReqDone));;
> ]]
>
> Now, the result is
> val test2c0 : stream_req = ReqChar (0, <fun>)
>
> That is, the parser is suspended, awaiting the character number 0. We
> now feed the parser chunks of input. The chunks do
> not have to contain complete lexems. For example, a comment may start
> in one chunk and end in another.
>
> [[
> let test2c1 = cont "let x = 1 + 2 " test2c0;;
> ]]
> Stream elem: Kwd let
> Stream elem: Ident x
> Stream elem: Kwd =
> Stream elem: int 1
> Stream elem: Kwd +
> Stream elem: int 2
> val test2c1 : stream_req = ReqChar (14, <fun>)
>
> The parser ate the chunk, produced some tokens, and waits for more
> input.
>
> [[
> let test2c2 = cont "in (* com " test2c1;;
> ]]
>
> Stream elem: Ident in
> val test2c2 : stream_req = ReqChar (24, <fun>)
>
> Here, the chunk contains a non-terminating comment.
>
> [[
> let test2c3 = cont "ment *) " test2c2;;
> ]]
> val test2c3 : stream_req = ReqChar (32, <fun>)
>
> [[
> let test2c4 = cont "\"xx" test2c3;;
> ]]
> val test2c4 : stream_req = ReqChar (35, <fun>)
>
> [[
> let test2c5 = cont "x\" " test2c4;;
> ]]
> Stream elem: "xxx"
> val test2c5 : stream_req = ReqChar (38, <fun>)
>
> Finally the parser produced something, because it got the matching
> double quote.
>
> [[
> let test2c6 = cont " ^ x" test2c5;;
> ]]
>
> Stream elem: Ident ^
> val test2c6 : stream_req = ReqChar (42, <fun>)
> [[
> let test2c7 = finish test2c6;;
> ]]
>
> Stream elem: Ident x
> val test2c7 : stream_req = ReqDone
>
> And we are finished. As we said earlier, the parser can be
> restartable and undoable. Suppose, we have changed our mind and
> decided to continue parsing after the checkpoint test2c6:
>
> [[
> let test2c71 = cont "yx * 10" test2c6;;
> let test2c8 = finish test2c71;;
> ]]
>
> Stream elem: Ident xyx
> Stream elem: Kwd *
> val test2c71 : stream_req = ReqChar (49, <fun>)
> Stream elem: int 10
> val test2c8 : stream_req = ReqDone
>
> and so we get an `alternative' parse, of an alternative stream. We
> can go
> back to any checkpoint test2c1, test2c2, etc. and continue parsing
> from that point.
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
next prev parent reply other threads:[~2007-07-04 19:41 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-07-03 11:38 oleg
2007-07-03 15:28 ` [Caml-list] " skaller
2007-07-04 19:41 ` Paul Snively [this message]
2007-07-04 23:42 ` Paul Snively
2007-07-05 8:13 ` oleg
2007-07-05 12:39 ` skaller
2007-07-05 23:00 ` [Caml-list] Incremental, undoable parsing in OCaml as the oleg
2007-07-06 4:40 ` skaller
2007-07-05 13:23 ` [Caml-list] Incremental, undoable parsing in OCaml as the general parser inversion skaller
2007-07-05 13:54 ` Paul Snively
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=D691E383-3CEA-4617-9D2B-0F5232B75516@mac.com \
--to=psnively@mac.com \
--cc=caml-list@inria.fr \
--cc=oleg@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox