From: Gerd Stolpmann <gerd@gerd-stolpmann.de>
To: Paul Argentoff <argentoff@rtelekom.ru>
Cc: caml-list@inria.fr
Subject: Re: :pxp_evpull notation (was: yet another silly question on PXP)
Date: Sun, 27 Feb 2005 20:05:24 +0100
Message-ID: <1109531124.5835.12.camel@localhost.localdomain>
In-Reply-To: <86vf8gk45m.fsf_-_@paul.rtelekom.ru>
On Friday, 25 Feb 2005, at 19:14 +0300, Paul Argentoff wrote:
> Dear Gerd Stolpmann,
>
> Let GS = "Gerd Stolpmann" in
> written_by GS =>
>
> GS> See the file doc/PREPROCESSOR which is part of the distribution
> GS> tarball.
>
> Thanks again for a reference. My next question is about :pxp_evpull
> notation. Can I make such a construct:
>
> let pile = <:pxp_evpull<
> <foo> (: some_fun () :) >>
>
> where some_fun generates a further "subtree" using the same pxp_evpull
> notation.

Yes, this works. some_fun is called when the events for the children of
foo are generated. You must have

    some_fun : unit -> Pxp_types.event option

and some_fun is repeatedly called until it returns None.

pxp_evpull generates automata in which every state returns an event.
External functions like some_fun are represented as loops: the automaton
stays in the same state while the function returns Some _, and moves on
to the following state when it returns None.

For your example, <:pxp_evpull< <foo> (: some_fun () :) >>, the
automaton is:

let _ =
  let _eid = Pxp_dtd.Entity.create_entity_id () in
  let rec _generator =
    let _state = ref 0 in
    fun _arg ->
      match !_state with
        0 ->
          let ev = Pxp_types.E_start_tag ("foo", [], None, _eid) in
          _state := 1; Some ev
      | 1 ->
          begin match some_fun () _arg with
            None -> _state := 2; _generator _arg
          | Some Pxp_types.E_end_of_stream -> _generator _arg
          | Some ev -> Some ev
          end
      | 2 ->
          let ev = Pxp_types.E_end_tag ("foo", _eid) in _state := 3; Some ev
      | 3 -> None
      | _ -> assert false
  in
  _generator

(output generated with "camlp4 -I ... pa_o.cmo pa_op.cmo pcre.cma
unix.cma netstring.cma pxp_pp.cma pr_o.cmo sample.ml")

some_fun can even be another pxp_evpull automaton.
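
To illustrate the nesting without needing PXP installed, here is a minimal hand-written sketch of the same kind of automaton. The event type, pull_of_list and wrap are hypothetical stand-ins, not part of PXP, and the child is simplified to a plain unit -> event option function (the generated code above additionally applies the result of some_fun () to _arg). The child is held in state 1 until it returns None, and because wrap itself returns a pull function, automata can be nested:

```ocaml
(* Simplified, hypothetical stand-in for Pxp_types.event. *)
type event =
  | E_start_tag of string
  | E_char_data of string
  | E_end_tag of string

(* Hypothetical helper: turn a list of events into a pull function. *)
let pull_of_list (evs : event list) : unit -> event option =
  let rest = ref evs in
  fun () ->
    match !rest with
    | [] -> None
    | ev :: tl -> rest := tl; Some ev

(* Wrap a child pull function in <name> ... </name>, one event per call. *)
let wrap (name : string) (child : unit -> event option)
  : unit -> event option =
  let state = ref 0 in
  let rec next () =
    match !state with
    | 0 -> state := 1; Some (E_start_tag name)
    | 1 ->
      (match child () with
       | Some ev -> Some ev             (* loop: stay in state 1 *)
       | None -> state := 2; next ())   (* child exhausted: move on *)
    | 2 -> state := 3; Some (E_end_tag name)
    | _ -> None
  in
  next

(* Pull until None and collect the events (for demonstration only). *)
let drain (pull : unit -> event option) : event list =
  let rec go acc =
    match pull () with
    | None -> List.rev acc
    | Some ev -> go (ev :: acc)
  in
  go []

(* The child of "foo" is itself an automaton wrapping "bar". *)
let events =
  drain (wrap "foo" (wrap "bar" (pull_of_list [E_char_data "hello"])))
```

Here events contains the start tag of foo, the start tag of bar, the character data, and then the two end tags in reverse order. Note that the child must keep its own state between calls, which is why pull_of_list and wrap each close over a reference.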
> My task really is to build a converter from a huge (>100M) text file (or
> string Stream.t) to a huge xml file. Of course, I need to do all job with
> lazy streams to avoid out-of-memory exceptions.
Pull parsers are your friend. They were created with such applications
in mind.
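
As a rough sketch of how such a converter could be structured, the following uses only the standard library. The event type, events_of_lines, write_event and convert are hypothetical names, the <line> element is an invented output format, and Seq is used in place of Stream. A Buffer stands in for the output so the example is easy to check; for a really large file one would presumably write to an out_channel instead. Only one input line and its pending events are held at a time:

```ocaml
(* Simplified, hypothetical event type standing in for Pxp_types.event. *)
type event =
  | Start of string
  | Data of string
  | End of string

(* Escape the characters that are special in XML character data. *)
let escape (s : string) : string =
  let b = Buffer.create (String.length s) in
  String.iter
    (function
      | '<' -> Buffer.add_string b "&lt;"
      | '>' -> Buffer.add_string b "&gt;"
      | '&' -> Buffer.add_string b "&amp;"
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

(* Pull function yielding <line>text</line> for each input line.
   The input sequence is only forced one element at a time. *)
let events_of_lines (lines : string Seq.t) : unit -> event option =
  let rest = ref lines in
  let pending = ref [] in
  fun () ->
    match !pending with
    | ev :: tl -> pending := tl; Some ev
    | [] ->
      (match !rest () with
       | Seq.Nil -> None
       | Seq.Cons (l, tl) ->
         rest := tl;
         pending := [Data l; End "line"];
         Some (Start "line"))

(* Serialize a single event as XML text. *)
let write_event (out : Buffer.t) (ev : event) : unit =
  match ev with
  | Start n -> Buffer.add_string out ("<" ^ n ^ ">")
  | Data s -> Buffer.add_string out (escape s)
  | End n -> Buffer.add_string out ("</" ^ n ^ ">")

(* Pull events one by one and write each immediately. *)
let convert (lines : string Seq.t) (out : Buffer.t) : unit =
  let pull = events_of_lines lines in
  let rec loop () =
    match pull () with
    | None -> ()
    | Some ev -> write_event out ev; loop ()
  in
  loop ()

let result =
  let out = Buffer.create 64 in
  convert (List.to_seq ["a < b"; "c & d"]) out;
  Buffer.contents out
```

The loop in convert is tail-recursive, so the stack does not grow with the size of the input.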
Gerd
--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de
------------------------------------------------------------