* yet another silly question on PXP
@ 2005-02-22 17:07 Paul Argentoff
2005-02-22 17:34 ` [Caml-list] " Jerome Simeon
2005-02-22 18:25 ` Paul Argentoff
0 siblings, 2 replies; 12+ messages in thread
From: Paul Argentoff @ 2005-02-22 17:07 UTC
To: caml-list
Hello world!
I have recently found features in PXP named "pull parser" and "event
interface". I hope these can help me with problems such as XMPP stream
parsing or parsing huuuuge files using OCaml lazy streams (to avoid
"Out of memory" errors). Can anybody suggest a URL or other place to read
more about them? For now I'm reading the PXP source comments and the
version information on its site.
Thanks.
--
Yours truly, WBR, Paul Argentoff.
Jabber: paul@jabber.rtelekom.ru
RIPE: PA1291-RIPE
* Re: [Caml-list] yet another silly question on PXP
2005-02-22 17:07 yet another silly question on PXP Paul Argentoff
@ 2005-02-22 17:34 ` Jerome Simeon
2005-02-22 18:25 ` Paul Argentoff
1 sibling, 0 replies; 12+ messages in thread
From: Jerome Simeon @ 2005-02-22 17:34 UTC
To: Paul Argentoff; +Cc: caml-list
These are just the pull variant of a SAX parser.
People at BEA have done some work on that (they call it a token stream):
Daniela Florescu, Chris Hillery, Donald Kossmann, Paul Lucas, Fabio
Riccardi, Till Westmann, Michael J. Carey, Arvind Sundararajan, Geetika
Agrawal: The BEA/XQRL Streaming XQuery Processor. VLDB 2003: 997-1008
http://www.informatik.uni-trier.de/~ley/db/conf/vldb/vldb2003.html#FlorescuHKLRWCSA03
The XTiSP system which was presented at PLAN-X in January seems to have
something similar as well:
# XTiSP presented by Keisuke Nakano (UTokyo) http://xtisp.psdlab.org/
XML pull token streams are also used extensively inside Galax's query
engine.
There are probably other projects using those.
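To make the push/pull distinction concrete, here is a minimal OCaml sketch. The three-constructor event type and the function names are hypothetical stand-ins, not PXP's Pxp_types.event or its API. In push (SAX-style) parsing the producer drives the loop and calls back into the consumer for every event; in pull parsing the consumer asks for the next event whenever it wants one, and can simply stop asking, which is what makes pull streams attractive for very large inputs.

```ocaml
(* Hypothetical event type; PXP's real type is Pxp_types.event. *)
type event =
  | Start_tag of string
  | Data of string
  | End_tag of string

(* Push (SAX-style): the producer drives the loop and invokes the
   callback once per event. *)
let push_events (events : event list) (callback : event -> unit) : unit =
  List.iter callback events

(* Pull: the consumer calls the returned function to obtain the next
   event; None means the stream is exhausted. *)
let pull_of_list (events : event list) : unit -> event option =
  let rest = ref events in
  fun () ->
    match !rest with
    | [] -> None
    | ev :: tl -> rest := tl; Some ev

(* A pull consumer that stops as soon as it has found the first text
   node, leaving the remaining events unread. *)
let first_data (next : unit -> event option) : string option =
  let rec loop () =
    match next () with
    | None -> None
    | Some (Data s) -> Some s
    | Some _ -> loop ()
  in
  loop ()
```

In a real pull parser the list would be replaced by a lexer reading the input incrementally; the consumer-side code stays the same.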
- Jerome
caml-list-admin@yquem.inria.fr wrote on 02/22/2005 12:07:18 PM:
> Hello world!
>
> I have recently found features in PXP named "pull parser" and "event
> interface". I hope these can help me with problems such as XMPP stream
> parsing or parsing huuuuge files using OCaml lazy streams (to avoid
> "Out of memory" errors). Can anybody suggest a URL or other place to read
> more about them? For now I'm reading the PXP source comments and the
> version information on its site.
>
> Thanks.
> --
> Yours truly, WBR, Paul Argentoff.
> Jabber: paul@jabber.rtelekom.ru
> RIPE: PA1291-RIPE
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
* Re: [Caml-list] yet another silly question on PXP
2005-02-22 17:07 yet another silly question on PXP Paul Argentoff
2005-02-22 17:34 ` [Caml-list] " Jerome Simeon
@ 2005-02-22 18:25 ` Paul Argentoff
2005-02-22 19:03 ` Gerd Stolpmann
1 sibling, 1 reply; 12+ messages in thread
From: Paul Argentoff @ 2005-02-22 18:25 UTC
To: caml-list
Dear Paul Argentoff,
Let PA = "Paul Argentoff" in
written_by PA =>
PA> Hello world! I have recently found features in PXP named "pull
PA> parser" and "event interface". I hope these can help me with problems
PA> such as XMPP stream parsing or parsing huuuuge files using OCaml lazy
PA> streams (to avoid "Out of memory" errors). Can anybody suggest a URL
PA> or other place to read more about them? For now I'm reading the PXP
PA> source comments and the version information on its site.
One more question: where can I find any documentation (besides comments) on
the pxp-pp library? How can I use it?
--
Yours truly, WBR, Paul Argentoff.
Jabber: paul@jabber.rtelekom.ru
RIPE: PA1291-RIPE
* Re: [Caml-list] yet another silly question on PXP
2005-02-22 18:25 ` Paul Argentoff
@ 2005-02-22 19:03 ` Gerd Stolpmann
2005-02-24 7:49 ` Paul Argentoff
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Gerd Stolpmann @ 2005-02-22 19:03 UTC
To: Paul Argentoff; +Cc: caml-list
On Tuesday, 22.02.2005, at 21:25 +0300, Paul Argentoff wrote:
> Dear Paul Argentoff,
>
> Let PA = "Paul Argentoff" in
> written_by PA =>
>
> PA> Hello world! I have recently found features in PXP named "pull
> PA> parser" and "event interface". I hope these can help me with problems
> PA> such as XMPP stream parsing or parsing huuuuge files using OCaml lazy
> PA> streams (to avoid "Out of memory" errors). Can anybody suggest a URL
> PA> or other place to read more about them? For now I'm reading the PXP
> PA> source comments and the version information on its site.
>
> One more question: where can I find any documentation (besides comments) on
> the pxp-pp library? How can I use it?
See the file doc/PREPROCESSOR which is part of the distribution tarball.
Gerd
--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de
------------------------------------------------------------
* Re: [Caml-list] yet another silly question on PXP
2005-02-22 19:03 ` Gerd Stolpmann
@ 2005-02-24 7:49 ` Paul Argentoff
2005-02-24 12:11 ` Paul Argentoff
2005-02-25 16:14 ` :pxp_evpull notation (was: yet another silly question on PXP) Paul Argentoff
2 siblings, 0 replies; 12+ messages in thread
From: Paul Argentoff @ 2005-02-24 7:49 UTC
To: Gerd Stolpmann; +Cc: caml-list
Dear Gerd Stolpmann,
Let GS = "Gerd Stolpmann" in
written_by GS =>
GS> See the file doc/PREPROCESSOR
thnx
--
Yours truly, WBR, Paul Argentoff.
Jabber: paul@jabber.rtelekom.ru
RIPE: PA1291-RIPE
* Re: [Caml-list] yet another silly question on PXP
2005-02-22 19:03 ` Gerd Stolpmann
2005-02-24 7:49 ` Paul Argentoff
@ 2005-02-24 12:11 ` Paul Argentoff
2005-02-25 7:35 ` Paul Argentoff
2005-02-25 16:14 ` :pxp_evpull notation (was: yet another silly question on PXP) Paul Argentoff
2 siblings, 1 reply; 12+ messages in thread
From: Paul Argentoff @ 2005-02-24 12:11 UTC
To: Gerd Stolpmann; +Cc: caml-list
Dear Gerd Stolpmann,
Let GS = "Gerd Stolpmann" in
written_by GS =>
GS> See the file doc/PREPROCESSOR which is part of the distribution
GS> tarball.
OK. But I can't compile it with OCamlMakefile. Is there any way to do that?
--
Yours truly, WBR, Paul Argentoff.
Jabber: paul@jabber.rtelekom.ru
RIPE: PA1291-RIPE
* Re: [Caml-list] yet another silly question on PXP
2005-02-24 12:11 ` Paul Argentoff
@ 2005-02-25 7:35 ` Paul Argentoff
0 siblings, 0 replies; 12+ messages in thread
From: Paul Argentoff @ 2005-02-25 7:35 UTC
To: Gerd Stolpmann; +Cc: caml-list
Dear Paul Argentoff,
Let PA = "Paul Argentoff" in
written_by PA =>
PA> But I can't compile it with OCamlMakefile. Is there any way to do
PA> that?
Here's the workaround I found:
In the first line of the preprocessed file I write (*pp sh pp.sh *). That's
the standard OCamlMakefile convention, except that the preprocessor is a
custom sh script which the Makefile generates as a .PHONY target. Here's
the relevant fragment of my Makefile:
PACKS = zip \
	equeue \
	netclient \
	pxp-engine \
	pxp-ulex-utf8 \
	pxp-pp \
	annexlib \
	postgresql \
	dbi
PPPACKS = netstring \
	pcre
USE_CAMLP4 = yes
PPLIBS = unix.cma \
	pcre.cma \
	netstring.cma \
	pxp_pp.cma
PRE_TARGETS = pp.sh

# Recipe lines must start with a TAB character.
.PHONY: pp.sh
pp.sh:
	echo -n "camlp4o" >pp.sh
	$(foreach pack, ${PACKS}, echo -n " -I `ocamlfind query ${pack}`" >>pp.sh;) \
	$(foreach pack, ${PPPACKS}, echo -n " -I `ocamlfind query ${pack}`" >>pp.sh;) \
	echo -n " -I `ocamlc -where`" >>pp.sh
	$(foreach lib, ${PPLIBS}, echo -n " ${lib}" >>pp.sh;) \
	echo -n ' $$1' >>pp.sh
The latter part may not look very elegant, but it's the best I could manage
last night after reading the GNU make manuals...
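In effect the recipe writes a single camlp4o command line that ends in $1. Since ocamlc's -pp option runs the preprocessor command with the source file name appended, $1 receives the file being compiled. The OCaml sketch below assembles the same command line, assuming the include directories have already been resolved (for example by running ocamlfind query by hand). pp_command and write_pp_sh are hypothetical names used only for illustration.

```ocaml
(* Hypothetical helper: build the command line that the pp.sh recipe
   writes, from already-resolved include directories and the
   preprocessor libraries, in the same order as the Makefile. *)
let pp_command ~(include_dirs : string list) ~(libs : string list) : string =
  let includes = List.concat (List.map (fun d -> [ "-I"; d ]) include_dirs) in
  (* "$1" is left literal so that sh expands it to the source file name. *)
  String.concat " " (("camlp4o" :: includes) @ libs @ [ "$1" ])

(* Hypothetical helper: write the script. It is run as "sh pp.sh file.ml",
   so it does not need to be executable. *)
let write_pp_sh (path : string) (command : string) : unit =
  let oc = open_out path in
  output_string oc command;
  output_char oc '\n';
  close_out oc
```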
--
Yours truly, WBR, Paul Argentoff.
Jabber: paul@jabber.rtelekom.ru
RIPE: PA1291-RIPE
* :pxp_evpull notation (was: yet another silly question on PXP)
2005-02-22 19:03 ` Gerd Stolpmann
2005-02-24 7:49 ` Paul Argentoff
2005-02-24 12:11 ` Paul Argentoff
@ 2005-02-25 16:14 ` Paul Argentoff
2005-02-27 19:05 ` Gerd Stolpmann
2 siblings, 1 reply; 12+ messages in thread
From: Paul Argentoff @ 2005-02-25 16:14 UTC
To: Gerd Stolpmann; +Cc: caml-list
Dear Gerd Stolpmann,
Let GS = "Gerd Stolpmann" in
written_by GS =>
GS> See the file doc/PREPROCESSOR which is part of the distribution
GS> tarball.
Thanks again for the reference. My next question is about the :pxp_evpull
notation. Can I write a construct like this:
let pile = <:pxp_evpull<
<foo> (: some_fun () :) >>
where some_fun generates a further "subtree" using the same pxp_evpull
notation?
My actual task is to build a converter from a huge (>100M) text file (or a
string Stream.t) to a huge XML file. Of course, I need to do the whole job
with lazy streams to avoid out-of-memory exceptions.
--
Yours truly, WBR, Paul Argentoff.
Jabber: paul@jabber.rtelekom.ru
RIPE: PA1291-RIPE
* Re: :pxp_evpull notation (was: yet another silly question on PXP)
2005-02-25 16:14 ` :pxp_evpull notation (was: yet another silly question on PXP) Paul Argentoff
@ 2005-02-27 19:05 ` Gerd Stolpmann
2005-02-28 10:24 ` :pxp_evpull notation Paul Argentoff
0 siblings, 1 reply; 12+ messages in thread
From: Gerd Stolpmann @ 2005-02-27 19:05 UTC
To: Paul Argentoff; +Cc: caml-list
On Friday, 25.02.2005, at 19:14 +0300, Paul Argentoff wrote:
> Dear Gerd Stolpmann,
>
> Let GS = "Gerd Stolpmann" in
> written_by GS =>
>
> GS> See the file doc/PREPROCESSOR which is part of the distribution
> GS> tarball.
>
> Thanks again for the reference. My next question is about the :pxp_evpull
> notation. Can I write a construct like this:
>
> let pile = <:pxp_evpull<
> <foo> (: some_fun () :) >>
>
> where some_fun generates a further "subtree" using the same pxp_evpull
> notation?
Yes, this works. some_fun is called when the events for the children of
foo are generated. You must have
some_fun : unit -> Pxp_types.event option
and some_fun is repeatedly called until it returns None.
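As an illustration of that contract, here is a minimal sketch of a generator and of a driver that calls it until it returns None. The event type and the names items and drain are hypothetical stand-ins, not part of PXP. The generator produces one event per call and keeps only a little state, so it never builds the whole subtree in memory.

```ocaml
(* Hypothetical stand-in for Pxp_types.event. *)
type event =
  | Start_tag of string
  | Data of string
  | End_tag of string

(* Generator: wraps each string in <item>...</item>, producing one
   event per call and None once the list is exhausted. *)
let items (xs : string list) : unit -> event option =
  let rest = ref xs in
  let phase = ref 0 in (* 0: start tag, 1: text, 2: end tag *)
  fun () ->
    match !rest with
    | [] -> None
    | x :: tl ->
      (match !phase with
       | 0 -> phase := 1; Some (Start_tag "item")
       | 1 -> phase := 2; Some (Data x)
       | _ -> phase := 0; rest := tl; Some (End_tag "item"))

(* Driver: call the generator repeatedly until it returns None.
   Collecting into a list is only for inspection; a streaming consumer
   would handle each event as it arrives. *)
let drain (gen : unit -> event option) : event list =
  let rec loop acc =
    match gen () with
    | None -> List.rev acc
    | Some ev -> loop (ev :: acc)
  in
  loop []
```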
pxp_evpull generates automata where every state returns an event.
External functions like some_fun are represented as loops, i.e. the
automaton stays in the same state as long as the function returns Some _,
and moves on to the following state when it returns None.
For your example, <:pxp_evpull< <foo> (: some_fun () :) >>, the
automaton is:
let _ =
  let _eid = Pxp_dtd.Entity.create_entity_id () in
  let rec _generator =
    let _state = ref 0 in
    fun _arg ->
      match !_state with
        0 ->
          let ev = Pxp_types.E_start_tag ("foo", [], None, _eid) in
          _state := 1; Some ev
      | 1 ->
          begin match some_fun () _arg with
            None -> _state := 2; _generator _arg
          | Some Pxp_types.E_end_of_stream -> _generator _arg
          | Some ev -> Some ev
          end
      | 2 ->
          let ev = Pxp_types.E_end_tag ("foo", _eid) in _state := 3; Some ev
      | 3 -> None
      | _ -> assert false
  in
  _generator
(output generated with "camlp4 -I ... pa_o.cmo pa_op.cmo pcre.cma
unix.cma netstring.cma pxp_pp.cma pr_o.cmo sample.ml")
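The same automaton can be written by hand as a standalone program. This sketch assumes a hypothetical mini event type in place of Pxp_types.event and takes the child generator as a parameter sub, created once outside the automaton (an assumption for illustration; the generated code instead applies the expression some_fun () to _arg). The state numbering follows the generated code, including the rule that an end-of-stream event coming from the child is skipped rather than forwarded.

```ocaml
(* Hypothetical mini event type mirroring the constructors used above. *)
type event =
  | Start_tag of string
  | Data of string
  | End_tag of string
  | End_of_stream

(* Hand-written equivalent of the automaton for <foo> (: sub :).
   States: 0 = emit start tag, 1 = forward events from [sub] until it
   returns None, 2 = emit end tag, 3 = finished. *)
let foo_automaton (sub : unit -> event option) : unit -> event option =
  let state = ref 0 in
  let rec next () =
    match !state with
    | 0 -> state := 1; Some (Start_tag "foo")
    | 1 ->
      (match sub () with
       | None -> state := 2; next ()     (* child finished: advance *)
       | Some End_of_stream -> next ()   (* skipped, not forwarded *)
       | Some ev -> Some ev)             (* stay in state 1 *)
    | 2 -> state := 3; Some (End_tag "foo")
    | 3 -> None
    | _ -> assert false
  in
  next

(* Helpers for trying it out: a list-backed generator and a drain. *)
let of_list (evs : event list) : unit -> event option =
  let rest = ref evs in
  fun () ->
    match !rest with
    | [] -> None
    | ev :: tl -> rest := tl; Some ev

let to_list (gen : unit -> event option) : event list =
  let rec loop acc =
    match gen () with
    | None -> List.rev acc
    | Some ev -> loop (ev :: acc)
  in
  loop []
```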
some_fun can even be another pxp_evtree automaton.
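Nesting works because an automaton is itself a generator that returns None when it is finished. A minimal sketch under the same kind of assumptions (hypothetical event type; element and text are illustrative helpers, not PXP functions): element wraps any child generator in a start and an end tag, so its result can in turn be the child of another element.

```ocaml
type event =
  | Start_tag of string
  | Data of string
  | End_tag of string

type phase = Open | Children | Close | Done

(* Hypothetical helper: emit <name>, then every event of [children]
   until it returns None, then </name>, then None forever. *)
let element (name : string) (children : unit -> event option) : unit -> event option =
  let state = ref Open in
  let rec next () =
    match !state with
    | Open -> state := Children; Some (Start_tag name)
    | Children ->
      (match children () with
       | Some ev -> Some ev
       | None -> state := Close; next ())
    | Close -> state := Done; Some (End_tag name)
    | Done -> None
  in
  next

(* Hypothetical helper: a generator yielding a single text event. *)
let text (s : string) : unit -> event option =
  let sent = ref false in
  fun () -> if !sent then None else (sent := true; Some (Data s))

let to_list (gen : unit -> event option) : event list =
  let rec loop acc =
    match gen () with
    | None -> List.rev acc
    | Some ev -> loop (ev :: acc)
  in
  loop []
```

For example, element "foo" (element "bar" (text "hi")) yields the events for <foo><bar>hi</bar></foo>. Each child generator is created once and keeps its own state.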
> My actual task is to build a converter from a huge (>100M) text file (or a
> string Stream.t) to a huge XML file. Of course, I need to do the whole job
> with lazy streams to avoid out-of-memory exceptions.
Pull parsers are your friend. They were created with such applications
in mind.
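For the text-to-XML converter, here is a minimal pull-style sketch. It uses a hypothetical event type and helper names rather than PXP's API, and the tag names lines and line are illustrative assumptions. lines_to_events turns a line source into events for <lines><line>...</line>...</lines>, and write_events serialises each event as soon as it is pulled, so memory use does not grow with the input size.

```ocaml
(* Hypothetical event type standing in for Pxp_types.event. *)
type event =
  | Start_tag of string
  | Data of string
  | End_tag of string

(* Line source over a channel: one input_line per call, None at EOF. *)
let channel_lines (ic : in_channel) : unit -> string option =
  fun () -> try Some (input_line ic) with End_of_file -> None

(* Pull generator for <lines><line>TEXT</line>...</lines>. At most one
   input line (two pending events) is buffered at any time. *)
let lines_to_events (next_line : unit -> string option) : unit -> event option =
  let pending = Queue.create () in
  let started = ref false and finished = ref false in
  fun () ->
    if not !started then (started := true; Some (Start_tag "lines"))
    else if not (Queue.is_empty pending) then Some (Queue.pop pending)
    else if !finished then None
    else
      match next_line () with
      | Some line ->
        Queue.push (Data line) pending;
        Queue.push (End_tag "line") pending;
        Some (Start_tag "line")
      | None -> finished := true; Some (End_tag "lines")

(* Escape <, > and & in character data. *)
let escape (s : string) : string =
  let b = Buffer.create (String.length s) in
  String.iter
    (function
      | '<' -> Buffer.add_string b "&lt;"
      | '>' -> Buffer.add_string b "&gt;"
      | '&' -> Buffer.add_string b "&amp;"
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

(* Consumer: pull events one at a time and write them out immediately. *)
let write_events (output : string -> unit) (gen : unit -> event option) : unit =
  let rec loop () =
    match gen () with
    | None -> ()
    | Some ev ->
      (match ev with
       | Start_tag n -> output ("<" ^ n ^ ">")
       | Data s -> output (escape s)
       | End_tag n -> output ("</" ^ n ^ ">"));
      loop ()
  in
  loop ()
```

For instance, write_events print_string (lines_to_events (channel_lines stdin)) streams standard input to standard output as XML. A real converter would also emit an XML declaration and handle character encodings, which this sketch omits.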
Gerd
--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de
------------------------------------------------------------
* Re: :pxp_evpull notation
2005-02-27 19:05 ` Gerd Stolpmann
@ 2005-02-28 10:24 ` Paul Argentoff
2005-02-28 10:39 ` Gerd Stolpmann
0 siblings, 1 reply; 12+ messages in thread
From: Paul Argentoff @ 2005-02-28 10:24 UTC
To: Gerd Stolpmann; +Cc: caml-list
Dear Gerd Stolpmann,
Let GS = "Gerd Stolpmann" in
written_by GS =>
GS> some_fun can even be another pxp_evtree automaton.
pxp_evtree? That sounds a bit new. I can't find such a notation in PXP
1.95. Or are you speaking figuratively?
--
Yours truly, WBR, Paul Argentoff.
Jabber: paul@jabber.rtelekom.ru
RIPE: PA1291-RIPE
* Re: :pxp_evpull notation
2005-02-28 10:24 ` :pxp_evpull notation Paul Argentoff
@ 2005-02-28 10:39 ` Gerd Stolpmann
2005-02-28 11:00 ` Paul Argentoff
0 siblings, 1 reply; 12+ messages in thread
From: Gerd Stolpmann @ 2005-02-28 10:39 UTC
To: Paul Argentoff; +Cc: caml-list
On Monday, 28.02.2005, at 13:24 +0300, Paul Argentoff wrote:
> Dear Gerd Stolpmann,
>
> Let GS = "Gerd Stolpmann" in
> written_by GS =>
>
> GS> some_fun can even be another pxp_evtree automaton.
>
> pxp_evtree? That sounds a bit new. I can't find such a notation in PXP
> 1.95. Or are you speaking figuratively?
Sorry, I meant pxp_evpull.
Gerd
--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de
------------------------------------------------------------
end of thread
Thread overview: 12+ messages
2005-02-22 17:07 yet another silly question on PXP Paul Argentoff
2005-02-22 17:34 ` [Caml-list] " Jerome Simeon
2005-02-22 18:25 ` Paul Argentoff
2005-02-22 19:03 ` Gerd Stolpmann
2005-02-24 7:49 ` Paul Argentoff
2005-02-24 12:11 ` Paul Argentoff
2005-02-25 7:35 ` Paul Argentoff
2005-02-25 16:14 ` :pxp_evpull notation (was: yet another silly question on PXP) Paul Argentoff
2005-02-27 19:05 ` Gerd Stolpmann
2005-02-28 10:24 ` :pxp_evpull notation Paul Argentoff
2005-02-28 10:39 ` Gerd Stolpmann
2005-02-28 11:00 ` Paul Argentoff