* xpath or alternatives
From: Richard Jones @ 2009-09-28 12:17 UTC
To: caml-list

I need to do some relatively simple extraction of fields from an XML
document.  In Perl I would use xpath, very specifically if $xml was an
XML document[1] stored as a string, then:

  my $p = XML::XPath->new (xml => $xml);
  my @disks = $p->findnodes ('//devices/disk/source/@dev');
  push (@disks, $p->findnodes ('//devices/disk/source/@file'));

This isn't type safe or pretty, but it is very easy to use for quick
and dirty extraction.

What is the OCaml equivalent for this sort of code?

Alain Frisch has a library called Xpath
(http://alain.frisch.fr/soft.html#xpath), but unfortunately this
relies on the now obsolete wlex program.

Is there a completely alternative way to do this?  Better still, in 3
lines of code??

Rich.

[1] for XML doc, see: http://libvirt.org/formatdomain.html

--
Richard Jones
Red Hat

* Re: [Caml-list] xpath or alternatives
From: Yaron Minsky @ 2009-09-28 12:48 UTC
To: Richard Jones; +Cc: caml-list

I don't have the code in front of me, but I've done something like this
using the list monad, i.e., using bind (= concat-map) and map chained
together, along with a couple of operators I wrote for lifting bits of
XML documents into lists, by, say, returning the subnodes of the present
node as a list.

It was quite effective.  I got the inspiration from a similar tool we
have for navigating s-expressions, which we should release at some
point...

Yaron Minsky

On Sep 28, 2009, at 8:17 AM, Richard Jones <rich@annexia.org> wrote:

> I need to do some relatively simple extraction of fields from an XML
> document.  In Perl I would use xpath, very specifically if $xml was an
> XML document[1] stored as a string, then:
>
>   my $p = XML::XPath->new (xml => $xml);
>   my @disks = $p->findnodes ('//devices/disk/source/@dev');
>   push (@disks, $p->findnodes ('//devices/disk/source/@file'));
>
> This isn't type safe or pretty, but it is very easy to use for quick
> and dirty extraction.
>
> What is the OCaml equivalent for this sort of code?
>
> Alain Frisch has a library called Xpath
> (http://alain.frisch.fr/soft.html#xpath), but unfortunately this
> relies on the now obsolete wlex program.
>
> Is there a completely alternative way to do this?  Better still, in 3
> lines of code??
>
> Rich.
>
> [1] for XML doc, see: http://libvirt.org/formatdomain.html
>
> --
> Richard Jones
> Red Hat
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
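
The list-monad approach described in the message above might look
roughly like the sketch below. It is not Yaron's code (which was never
posted). The xml type is only a stand-in shaped like what a parser such
as xml-light returns, and the helper names (child, descendant, attr,
disks, sample) are made up for this illustration.

```ocaml
(* A small XML tree type; an assumption standing in for a real parser's output. *)
type xml =
  | Element of string * (string * string) list * xml list
  | PCData of string

(* List-monad bind: apply f to every item and concatenate the results. *)
let ( >>= ) xs f = List.concat (List.map f xs)

(* Lift a node into the list of its direct children named [name]. *)
let child name = function
  | Element (_, _, kids) ->
      List.filter
        (function Element (n, _, _) -> n = name | PCData _ -> false)
        kids
  | PCData _ -> []

(* The node itself and all its descendants named [name], in document
   order (roughly what //name selects in XPath). *)
let rec descendant name node =
  let here =
    match node with
    | Element (n, _, _) when n = name -> [node]
    | _ -> []
  in
  let below =
    match node with
    | Element (_, _, kids) -> kids >>= descendant name
    | PCData _ -> []
  in
  here @ below

(* Lift a node into the zero- or one-element list holding an attribute value. *)
let attr name = function
  | Element (_, attrs, _) ->
      (match List.assoc_opt name attrs with Some v -> [v] | None -> [])
  | PCData _ -> []

(* Counterpart of the two XPath queries: all dev values, then all file values. *)
let disks doc =
  let sources =
    [doc] >>= descendant "devices" >>= child "disk" >>= child "source"
  in
  (sources >>= attr "dev") @ (sources >>= attr "file")

(* A hand-built sample document, used only to exercise the sketch. *)
let sample =
  Element ("domain", [], [
    Element ("devices", [], [
      Element ("disk", [], [Element ("source", [("dev", "/dev/sda")], [])]);
      Element ("disk", [], [Element ("source", [("file", "/tmp/a.img")], [])]);
      Element ("interface", [], [
        Element ("source", [("network", "default")], [])])])])
```

Each step takes a list of nodes and returns a list of nodes, so a path
such as //devices/disk/source becomes a chain of binds. Nodes that do
not match simply contribute an empty list.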

* Re: [Caml-list] xpath or alternatives
From: Till Varoquaux @ 2009-09-28 15:06 UTC
To: Yaron Minsky; +Cc: Richard Jones, caml-list

There are a few projects out there:

  xtisp    http://www.xtisp.org
  xstream  http://yquem.inria.fr/~frisch/xstream/

and of course the good old cduce/xduce/ocamlduce.

All in all, naive querying is not hard, and tree automata (e.g.
http://www.grappa.univ-lille3.fr/~filiot/tata/) can provide a good
middle ground between efficiency and simplicity. The problem you might
run into is that XML is a tricky format to deal with, and some of these
tools will choke on complex files (namespaces, switching character
encodings, weird entities in the DTD, etc.).

Till

P.S.: Alain has a good paper on how to compile queries (as done in
cduce). I am just too lazy to look for it.

* Re: [Caml-list] xpath or alternatives
From: Mikkel Fahnøe Jørgensen @ 2009-09-29 23:00 UTC
To: Till Varoquaux; +Cc: Yaron Minsky, caml-list, Richard Jones

In line with what Yaron suggests, you can use a combinator parser.

I do this to parse json, and this parser could be adapted to xml by
focusing on basic syntax and ignoring the details, or you could
prefilter xml and use the json parser directly.

See the Fleece parser embedded here:

There is also the object abstraction that dives into an object
hierarchy after parsing, see the Objects module. The combination of
these two makes it quite easy to work on structured data, but 3 lines
only come after some xml adaptation work - but you can see many
one-liner json access in the last part of the file.

http://git.dvide.com/pub/symbiosis/tree/myocamlbuild_config.ml

Otherwise there is xmlm which is self-contained in single xml file,
and as I recall, has some sort of zipper navigator. (I initially
intended to use it before deciding on the json format):

http://erratique.ch/software/xmlm

* Re: [Caml-list] xpath or alternatives
From: Richard Jones @ 2009-09-30 10:16 UTC
To: Mikkel Fahnøe Jørgensen; +Cc: Till Varoquaux, Yaron Minsky, caml-list

On Wed, Sep 30, 2009 at 01:00:15AM +0200, Mikkel Fahnøe Jørgensen wrote:
> In line with what Yaron suggests, you can use a combinator parser.
>
> I do this to parse json, and this parser could be adapted to xml by
> focusing on basic syntax and ignoring the details, or you could
> prefilter xml and use the json parser directly.
>
> See the Fleece parser embedded here:
>
> There is also the object abstraction that dives into an object
> hierarchy after parsing, see the Objects module. The combination of
> these two makes it quite easy to work on structured data, but 3 lines
> only come after some xml adaptation work - but you can see many
> one-liner json access in the last part of the file.
>
> http://git.dvide.com/pub/symbiosis/tree/myocamlbuild_config.ml
>
> Otherwise there is xmlm which is self-contained in single xml file,
> and as I recall, has some sort of zipper navigator. (I initially
> intended to use it before deciding on the json format):
>
> http://erratique.ch/software/xmlm

It's interesting you mention xmlm, because I couldn't write the code
using xmlm at all.

The discussion here has got quite theoretical, but it's not helping
me to write the original 3 lines of Perl in OCaml.

  my $p = XML::XPath->new (xml => $xml);
  my @disks = $p->findnodes ('//devices/disk/source/@dev');
  push (@disks, $p->findnodes ('//devices/disk/source/@file'));

My best effort, using xml-light, is around 40 lines:

http://git.et.redhat.com/?p=libguestfs.git;a=blob;f=ocaml/examples/viewer.ml;h=ef6627b1b92a4fff7d4fa1fa4aca63eeffc05ece;hb=HEAD#l322

Rich.

--
Richard Jones
Red Hat

* Re: [Caml-list] xpath or alternatives
From: Sebastien Mondet @ 2009-09-30 10:36 UTC
To: Richard Jones; +Cc: caml-list

> The discussion here has got quite theoretical, but it's not helping
> me to write the original 3 lines of Perl in OCaml.
>
>   my $p = XML::XPath->new (xml => $xml);
>   my @disks = $p->findnodes ('//devices/disk/source/@dev');
>   push (@disks, $p->findnodes ('//devices/disk/source/@file'));
>
> My best effort, using xml-light, is around 40 lines:
>
> http://git.et.redhat.com/?p=libguestfs.git;a=blob;f=ocaml/examples/viewer.ml;h=ef6627b1b92a4fff7d4fa1fa4aca63eeffc05ece;hb=HEAD#l322

Galax is (or was ??) an XQuery implementation in ocaml, and XPath 2.0
is included in XQuery... so maybe you can use it...

The site does not seem to respond now:
http://www.galaxquery.org/

but there is a debian package:
http://upsilon.cc/~zack/blog/posts/2008/02/galax_in_debian/

* Re: [Caml-list] xpath or alternatives
From: Mikkel Fahnøe Jørgensen @ 2009-09-30 10:49 UTC
To: Richard Jones; +Cc: Till Varoquaux, Yaron Minsky, caml-list

2009/9/30 Richard Jones <rich@annexia.org>:
> On Wed, Sep 30, 2009 at 01:00:15AM +0200, Mikkel Fahnøe Jørgensen wrote:
>> In line with what Yaron suggests, you can use a combinator parser.

> It's interesting you mention xmlm, because I couldn't write
> the code using xmlm at all.

If you can manage to convert an xml document into a json like tagged
tree structure, then a simple solution like

  module Value = struct
    type value_type =
        Object of (string * value_type) list
      | Array of value_type list
      | String of string
      | Int of int
      | Float of float
      | Bool of bool
      | Null
  end
  ..
  (* fail and the Found exception used below are defined in the
     elided parts of the file *)
  let get_object v = match v with Object x -> x
    | _ -> fail "json object expected"
  ..
  let pattern_path value names =
    let rec again value = function
      | "*" :: names -> List.iter (fun (n, v) -> try again v names
          with Invalid_argument _ | Not_found -> ()) (get_object value)
      | name :: names -> again (List.assoc name (get_object value)) names
      | [] -> raise (Found value)
    in try again value names; raise Not_found with Found value -> value

combined with a path split function

  let split c s =
    let n = String.length s in
    let rec again i lst =
      begin try let k = String.rindex_from s i c in
        again (k - 1) ((if i = k then "" else (String.sub s (k + 1) (i - k))) :: lst)
      with _ -> (String.sub s 0 (i + 1)) :: lst
      end
    in again (n - 1) []

will do almost exactly what you are asking for - notice the "*"
searches broadly in all subtrees.

You can add your own xpath like functions as you discover a need for them.

I believe that the xmlm examples have a tree transformation operation
that could easily be adapted to produce a json like tree, if modified a
little.

  let out_tree o t =
    let frag = function
    | E (tag, childs) -> `El (tag, childs)
    | D d -> `Data d
    in
    Xmlm.output_doc_tree frag o t

> My best effort, using xml-light, is around 40 lines:

If you spend those 40 lines on a layer on top of a lightweight xml
parser, you might get away with 3 lines the next time.
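
The conversion step that the message above leaves open ("if you can
manage to convert an xml document into a json like tagged tree") could
be sketched as follows. Everything here is an assumption made for
illustration: the reduced value type, the xml stand-in type, the "@"
prefix for attributes and the "#text" key for character data are
invented conventions, and pattern_path is a re-typed copy of the
function quoted above with its missing pieces filled in locally.

```ocaml
(* Reduced version of the Value type above: just what XML needs. *)
type value =
  | Object of (string * value) list
  | String of string

(* Stand-in XML tree type; an assumption, not any particular parser's type. *)
type xml =
  | Element of string * (string * string) list * xml list
  | PCData of string

exception Found of value

(* Attributes become "@name" fields, child elements become fields named
   after their tag, and character data goes under "#text".  These naming
   conventions are invented for this sketch. *)
let rec to_value = function
  | PCData s -> String s
  | Element (_, attrs, kids) ->
      let attr_fields = List.map (fun (k, v) -> ("@" ^ k, String v)) attrs in
      let kid_fields =
        List.map
          (fun kid ->
            match kid with
            | Element (tag, _, _) -> (tag, to_value kid)
            | PCData _ -> ("#text", to_value kid))
          kids
      in
      Object (attr_fields @ kid_fields)

let get_object = function
  | Object fields -> fields
  | String _ -> invalid_arg "object expected"

(* Same logic as the pattern_path quoted above: "*" tries every field,
   and the first complete match wins. *)
let pattern_path value names =
  let rec again value = function
    | "*" :: names ->
        List.iter
          (fun (_, v) ->
            try again v names with Invalid_argument _ | Not_found -> ())
          (get_object value)
    | name :: names -> again (List.assoc name (get_object value)) names
    | [] -> raise (Found value)
  in
  try again value names; raise Not_found with Found v -> v

(* A hand-built sample document, used only to exercise the sketch. *)
let sample =
  Element ("domain", [], [
    Element ("devices", [], [
      Element ("disk", [("type", "block")], [
        Element ("source", [("dev", "/dev/sda")], [])])])])

let by_name = pattern_path (to_value sample) ["devices"; "disk"; "source"; "@dev"]
let by_wildcard = pattern_path (to_value sample) ["*"; "*"; "source"; "@dev"]
```

One limitation worth noting: a named step uses List.assoc, so only the
first of several sibling <disk> elements is visited, and the whole
lookup returns a single value. Collecting every match, as the Perl
findnodes call does, would need a variant that accumulates results
instead of raising Found.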

* Re: [Caml-list] xpath or alternatives
From: Dario Teixeira @ 2009-09-30 11:05 UTC
To: Richard Jones; +Cc: caml-list

Hi,

Ocamlduce has been mentioned before in this thread, but I didn't catch
the reason why it has been discarded as a solution.  Is it because you
don't want to carry the extra (large) dependency, or is there some other
reason?

And on the subject of simple XML parsers for Ocaml, there's also the
aptly named Simplexmlparser from the Ocsigen project [1].  It's about
as spartan as one can conceive, yet sufficient for a large subset of
XML extraction tasks.

Cheers,
Dario Teixeira

[1] http://ocsigen.org/docu/1.2.0/Simplexmlparser.html

* Re: [Caml-list] xpath or alternatives
From: Richard Jones @ 2009-09-30 11:57 UTC
To: Dario Teixeira; +Cc: caml-list

On Wed, Sep 30, 2009 at 04:05:03AM -0700, Dario Teixeira wrote:
> Hi,
>
> Ocamlduce has been mentioned before in this thread, but I didn't catch
> the reason why it has been discarded as a solution.  Is it because you
> don't want to carry the extra (large) dependency, or is there some other
> reason?

Actually the reason is that I thought it wasn't available for 3.11.1,
but I just checked the website and it is, and ocamlduce does seem to
be the obvious solution for this problem.  (However I'll need to try
and see if I can come up with the equivalent code).

> And on the subject of simple XML parsers for Ocaml, there's also the
> aptly named Simplexmlparser from the Ocsigen project [1].  It's about
> as spartan as one can conceive, yet sufficient for a large subset of
> XML extraction tasks.
>
> [1] http://ocsigen.org/docu/1.2.0/Simplexmlparser.html

Thanks - but if I understand that page correctly, then isn't it just
parsing XML into a tree?

Rich.

--
Richard Jones
Red Hat

* Re: [Caml-list] xpath or alternatives
From: Richard Jones @ 2009-09-30 12:59 UTC
To: caml-list

On Wed, Sep 30, 2009 at 12:57:23PM +0100, Richard Jones wrote:
> On Wed, Sep 30, 2009 at 04:05:03AM -0700, Dario Teixeira wrote:
> > Hi,
> >
> > Ocamlduce has been mentioned before in this thread, but I didn't catch
> > the reason why it has been discarded as a solution.  Is it because you
> > don't want to carry the extra (large) dependency, or is there some other
> > reason?
>
> Actually the reason is that I thought it wasn't available for 3.11.1,
> but I just checked the website and it is, and ocamlduce does seem to
> be the obvious solution for this problem.  (However I'll need to try
> and see if I can come up with the equivalent code).

Do any cduce developers want to give me a clue here?  It would seem
like I need something along these lines:

  let devs = match xml with
    | {{ <domain>[<devices>[<source dev=(String & dev) ..>[]]] }} -> dev
    | {{ <domain>[<devices>[<source file=(String & file) ..>[]]] }} -> file in

However according to the compiler, devs has type <XML>.  In any case,
I think I may need either the map or map* operator, since I want to
match all, not just the first one.

Rich.

--
Richard Jones
Red Hat

* Re: [Caml-list] xpath or alternatives
From: Till Varoquaux @ 2009-09-30 13:33 UTC
To: Richard Jones; +Cc: caml-list

OCamlduce (Alain, correct me if I am wrong) basically maintains two
separate type systems side by side (the Xduce one and the OCaml one).
This is done in order to make OCamlduce maintainable by keeping a
clear separation. As a result you have to explicitly convert values
between type systems using {:...:}. These casts are type safe but do
lead to some work at runtime.

Also note that OCaml's strings are Latin1, not String, in the XML
world. So:

  let devs = match xml with
    | {{ <domain>[<devices>[<source dev=(Latin1 & dev) ..>[]]] }} -> {:dev:}
    | {{ <domain>[<devices>[<source file=(Latin1 & file) ..>[]]] }} -> {:file:} in

should work (I'm rusty and have nothing handy to check with).

Till

* Re: [Caml-list] xpath or alternatives
From: Richard Jones @ 2009-09-30 14:01 UTC
To: caml-list

On Wed, Sep 30, 2009 at 09:33:07AM -0400, Till Varoquaux wrote:
> OCamlduce (Alain correct me if I am wrong) basically maintains two
> separate type systems side by side (the Xduce one and the Ocaml one).
> This is done in order to make Ocamlduce maintainable by keeping a
> clear separation. As a result you have to explicitly convert values
> between type systems using {:...:}. These casts are type safe but do
> lead to some work at runtime.
>
> Also note that ocaml's string are Latin1 and not String in the XML world. So:
>
> let devs = match xml with
>   | {{ <domain>[<devices>[<source dev=(Latin1 & dev) ..>[]]] }} -> {:dev:}
>   | {{ <domain>[<devices>[<source file=(Latin1 & file) ..>[]]] }} -> {:file:} in
>
> Should work (I'm rusty and have nothing to check handy).

I tried variations on the above, but couldn't get it to work.
ocamlduce is very fond of a mysterious error called "Error: Subtyping
failed", which is very difficult for me to understand, and therefore
must be absolutely impossible for someone not used to strong typing.

This is where I'm heading at the moment (sorry, my previous
example missed a <disk> level inside <devices>), so:

  let xml = from_string xml in
  prerr_endline (Ocamlduce.to_string xml);

  let devs = {{ map [xml] with
    | <domain..>[<devices..>[<disk..>[<source dev=(Latin1 & s) ..>_]]]
    | <domain..>[<devices..>[<disk..>[<source file=(Latin1 & s) ..>_]]] -> [s]
    | _ -> [] }} in
  prerr_endline (Ocamlduce.to_string devs);

+1 : this compiles
-1 : it doesn't work, devs is empty

This is what the first prerr_endline prints:

<domain
  type="kvm"
  id="2">[
  <name>[ 'CentOS5x32' ]
  <uuid>[ '2ce397d9-1931-feb1-8ad8-15f22c4f18af' ]
  <memory>[ '524288' ]
  <currentMemory>[ '524288' ]
  <vcpu>[ '1' ]
  <os>[ <type arch="x86_64" machine="pc-0.11">[ 'hvm' ] <boot dev="hd">[ ] ]
  <features>[ <acpi>[ ] <apic>[ ] <pae>[ ] ]
  <clock offset="utc">[ ]
  <on_poweroff>[ 'destroy' ]
  <on_reboot>[ 'restart' ]
  <on_crash>[ 'restart' ]
  <devices>[
    <emulator>[ '/usr/bin/qemu-kvm' ]
    <disk
      type="block"
      device="disk">[
      <source dev="/dev/vg_trick/CentOS5x32">[ ]
      <target bus="ide" dev="hda">[ ]
      ]
    <interface
      type="network">[
      <mac address="54:52:00:3c:76:11">[ ]
      <source network="default">[ ]
      <target dev="vnet0">[ ]
      ]
    <serial type="pty">[ <source path="/dev/pts/7">[ ] <target port="0">[ ] ]
    <console
      type="pty"
      tty="/dev/pts/7">[
      <source path="/dev/pts/7">[ ]
      <target port="0">[ ]
      ]
    <input type="mouse" bus="ps2">[ ]
    <graphics autoport="yes" port="5900" type="vnc">[ ]
    <video>[ <model type="cirrus" vram="9216" heads="1">[ ] ]
    ]
  ]

and what the second prerr_endline prints:

""

Rich.

--
Richard Jones
Red Hat

* Re: [Caml-list] xpath or alternatives
From: Till Varoquaux @ 2009-09-30 14:28 UTC
To: Richard Jones; +Cc: caml-list

If I am not mistaken, you are selecting a domain whose first child is a
devices node whose only child is a disk node ...

Instead of:

  <domain..>[<devices..>[<disk..>[<source dev=(Latin1 & s) ..>_]]]

you should aim for something in the vein of:

  <domain ..> [_* (<devices..> (<disk..>(<source dev=(Latin1 & s)>| <source file = (Latin1 &s)>_)* |_)* _*]

Till

* Re: [Caml-list] xpath or alternatives
From: Alain Frisch @ 2009-09-30 14:51 UTC
To: Richard Jones; +Cc: caml-list

Richard Jones wrote:
> let devs = {{ map [xml] with
>   | <domain..>[<devices..>[<disk..>[<source dev=(Latin1 & s) ..>_]]]
>   | <domain..>[<devices..>[<disk..>[<source file=(Latin1 & s) ..>_]]] -> [s]
>   | _ -> [] }} in

The following should work:

  let l = {{ [xml] }} in
  let l = {{ map l with <domain..>l -> l | _ -> [] }} in
  let l = {{ map l with <devices..>l -> l | _ -> [] }} in
  let l = {{ map l with <disk..>l -> l | _ -> [] }} in
  let l = {{ map l with <source dev=(Latin1 & s) ..>_
                      | <source file=(Latin1 & s) ..>_-> s
                      | _ -> [] }} in
  ...

  let () =
    let l = {{ [xml] }} in
    let l = {{ (((l.(<domain..>_)) / .(<devices..>_)) / .(<disk..>_)) / }} in
    let l = {{ map l with <source dev=(Latin1 & s) ..>_
                        | <source file=(Latin1 & s) ..>_ -> s
                        | _ -> [] }} in
    ..

This uses the constructions e/ and e.(t) as described in the manual.

That said, using OCamlDuce for this kind of XML data-extraction seems
just crazy to me.

Cheers,

Alain

* Re: [Caml-list] xpath or alternatives
From: Richard Jones @ 2009-09-30 15:09 UTC
To: Alain Frisch; +Cc: caml-list

On Wed, Sep 30, 2009 at 04:51:01PM +0200, Alain Frisch wrote:
> Richard Jones wrote:
> > let devs = {{ map [xml] with
> >   | <domain..>[<devices..>[<disk..>[<source dev=(Latin1 & s) ..>_]]]
> >   | <domain..>[<devices..>[<disk..>[<source file=(Latin1 & s) ..>_]]] -> [s]
> >   | _ -> [] }} in
>
> The following should work:
>
>   let l = {{ [xml] }} in
>   let l = {{ map l with <domain..>l -> l | _ -> [] }} in
>   let l = {{ map l with <devices..>l -> l | _ -> [] }} in
>   let l = {{ map l with <disk..>l -> l | _ -> [] }} in
>   let l = {{ map l with <source dev=(Latin1 & s) ..>_
>                       | <source file=(Latin1 & s) ..>_-> s
>                       | _ -> [] }} in
>   ...
>
>   let () =
>     let l = {{ [xml] }} in
>     let l = {{ (((l.(<domain..>_)) / .(<devices..>_)) / .(<disk..>_)) / }} in
>     let l = {{ map l with <source dev=(Latin1 & s) ..>_
>                         | <source file=(Latin1 & s) ..>_ -> s
>                         | _ -> [] }} in
>     ..

Thanks Alain.

My latest attempt was similar to your version 1 above, and it works :-)

Now my code looks like your version 2:

  let xml = from_string xml in
  let xs = {{ [xml] }} in
  let xs = {{ (((xs.(<domain..>_)) / .(<devices..>_)) / .(<disk..>_)) / }} in
  let xs = {{ map xs with
              | <source dev=(Latin1 & s) ..>_
              | <source file=(Latin1 & s) ..>_ -> [s]
              | _ -> [] }} in
  {: xs :}

(plus the boilerplate for interfacing xml-light and CDuce).

We're getting close to the xpath/perl solution (8 lines vs 3 lines),
with some added type safety and the possibility of validating the XML.

On the other hand, the code is hard to understand.  It's not clear to
me what the .( ) syntax means, nor why there is an apparently trailing
/ character.

> This uses the constructions e/ and e.(t) as described in the manual.
>
> That said, using OCamlDuce for this kind of XML data-extraction seems
> just crazy to me.

I have some comments:

(A) "Subtyping failed" is a very common error, but is only mentioned
briefly in the manual.  I have no idea what these errors mean, so they
should have more explanation.  Here is a simple one which was caused
by me using a value instead of a list (but that is not at all obvious
from the error message):

  Error: Subtyping failed Latin1 <= [ Latin1* ]
  Sample:
  [ Latin1Char ]

(B) I think the interfacing code here:

  http://yquem.inria.fr/~frisch/ocamlcduce/samples/expat/
  http://yquem.inria.fr/~frisch/ocamlcduce/samples/pxp/
  http://yquem.inria.fr/~frisch/ocamlcduce/samples/xmllight/

should be distributed along with ocamlduce.

Rich.

--
Richard Jones
Red Hat
* Re: [Caml-list] xpath or alternatives 2009-09-30 15:09 ` Richard Jones @ 2009-09-30 15:18 ` Alain Frisch 0 siblings, 0 replies; 21+ messages in thread From: Alain Frisch @ 2009-09-30 15:18 UTC (permalink / raw) To: Richard Jones; +Cc: caml-list Richard Jones wrote: > On the other hand, the code is hard to understand. It's not clear to > me what the .( ) syntax means, nor why there is an apparently trailing > / character. From the manual: If the x-expression e evaluates to an x-sequence, the construction e/ will result in a new x-sequence obtained by taking in order all the children of the XML elements from the sequence e. For instance, the x-expression [<a>[ 1 2 3 ] 4 5 <b>[ 6 7 8 ] ]/ evaluates to the x-value [ 1 2 3 6 7 8 ]. If the x-expression e evaluates to an x-sequence, the construction e.(t) (where t is an x-type) will result in a new x-sequence obtained by filtering e to keep only the elements of type t. For instance, the x-expression [<a>[ 1 2 3 ] 4 5 <b>[ 6 7 8 ] ].(Int) evaluates to the x-value [ 4 5 ]. > I have some comments: > > (A) "Subtyping failed" is a very common error, but is only mentioned > briefly in the manual. I have no idea what these errors mean, so they > should have more explanation. Here is a simple one which was caused > by me using a value instead of a list (but that is not at all obvious > from the error message): > > Error: Subtyping failed Latin1 <= [ Latin1* ] > Sample: > [ Latin1Char ] The error tells you that Latin1 is not a subtype of [ Latin1* ]. It probably means that you are trying to use a value of type Latin1 where a value of type [ Latin1* ] is expected. > (B) I think the interfacing code here: > > http://yquem.inria.fr/~frisch/ocamlcduce/samples/expat/ > http://yquem.inria.fr/~frisch/ocamlcduce/samples/pxp/ > http://yquem.inria.fr/~frisch/ocamlcduce/samples/xmllight/ > > should be distributed along with ocamlduce. There was a GODI package that includes them. 
It would be ok to put these files in the distribution without compiling them (otherwise it would create a dependency on more OCaml packages). It's up to Stéphane Glondu, the new maintainer of OCamlDuce. Cheers, Alain ^ permalink raw reply [flat|nested] 21+ messages in thread
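For readers without OCamlDuce at hand, the two constructions quoted from the manual can be approximated in plain OCaml. This is only a sketch: the `xvalue` type and the functions `children` and `only_ints` are hypothetical names invented for illustration, not part of OCamlDuce, and the model ignores everything about x-types except "is it an integer".

```ocaml
(* Hypothetical model of an x-value: an integer or an element with children. *)
type xvalue =
  | Int of int
  | Elem of string * xvalue list

(* Rough analogue of [e/]: take, in order, the children of every element
   in the sequence; items that are not elements contribute nothing. *)
let children (seq : xvalue list) : xvalue list =
  List.concat
    (List.map (function Elem (_, kids) -> kids | Int _ -> []) seq)

(* Rough analogue of [e.(Int)]: keep only the items that are integers. *)
let only_ints (seq : xvalue list) : xvalue list =
  List.filter (function Int _ -> true | Elem _ -> false) seq

(* The sequence used in the manual excerpt:
   [<a>[ 1 2 3 ] 4 5 <b>[ 6 7 8 ] ] *)
let sample =
  [ Elem ("a", [ Int 1; Int 2; Int 3 ]);
    Int 4;
    Int 5;
    Elem ("b", [ Int 6; Int 7; Int 8 ]) ]
```

With this model, `children sample` yields the integers 1 2 3 6 7 8 and `only_ints sample` yields 4 5, matching the two results given in the manual excerpt.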
* Re: [Caml-list] xpath or alternatives 2009-09-30 10:16 ` Richard Jones ` (2 preceding siblings ...) 2009-09-30 11:05 ` Dario Teixeira @ 2009-10-28 2:22 ` Daniel Bünzli 3 siblings, 0 replies; 21+ messages in thread From: Daniel Bünzli @ 2009-10-28 2:22 UTC (permalink / raw) To: Richard Jones, Mikkel Fahnøe Jørgensen; +Cc: caml-list

Sorry for the late reply.

On Wed, Sep 30, 2009 at 01:00:15AM +0200, Mikkel Fahnøe Jørgensen wrote:
> Otherwise there is xmlm which is self-contained in single xml file,
> and as I recall, has some sort of zipper navigator. (I initially
> intended to use it before deciding on the json format):

The cursor API was removed from the library in 1.0.0.

On Wed, Sep 30, 2009 at 6:16 PM, Richard Jones <rich@annexia.org> wrote:
> It's interesting you mention xmlm, because I couldn't write
> the code using xmlm at all.

Why? That doesn't feel like an insurmountable task. Below is a function that extracts, from a (sub)tree's sequence of signals, the attribute data found at an absolute path (i.e. the particular XPath pattern you're after, if I understand correctly). Each attribute's data is stored in a separate list.

The function is simpler than it looks; in essence it's just a recursive case analysis on signals. In the function [aux], [pos] maintains the current path in the parse tree, and [mismatch] counts the level of mismatch w.r.t. the [path] we are looking for.

let absolute_path_atts i path atts =
  let rec aux i pos mismatch path accs = match Xmlm.input i with
  | `El_start (tag, atts) ->
      if mismatch > 0 then aux i (tag :: pos) (mismatch + 1) path accs else
      begin match path with
      | n :: path' when n = tag ->
          if path' <> [] then aux i (tag :: pos) 0 path' accs else
          let update_acc ((att, acc) as v) =
            try att, (List.assoc att atts) :: acc with Not_found -> v
          in
          aux i (tag :: pos) 0 [] (List.map update_acc accs)
      | _ -> aux i (tag :: pos) (mismatch + 1) path accs
      end
  | `El_end ->
      begin match pos with
      | _ :: [] -> List.rev_map (fun (_, acc) -> List.rev acc) accs
      | tag :: pos' ->
          if mismatch > 0 then aux i pos' (mismatch - 1) path accs else
          aux i pos' 0 (tag :: path) accs
      | [] -> assert false
      end
  | `Data _ -> aux i pos mismatch path accs
  | `Dtd _ -> assert false
  in
  let accs = List.rev_map (fun att -> att, []) atts in
  begin match Xmlm.peek i with
  | `El_start _ -> aux i [] 0 path accs
  | `Dtd _ | `El_end | `Data _ -> invalid_arg "no subtree here"
  end

Now your function becomes something like this:

let get_devices_from_xml xml =
  try
    let i = Xmlm.make_input (`String (0, xml)) in
    ignore (Xmlm.input i); (* `Dtd signal *)
    let path = ["", "domain"; "", "devices"; "", "disk"; "", "source"] in
    match absolute_path_atts i path ["", "dev"; "", "file"] with
    | [devs; files] when Xmlm.eoi i -> devs @ files
    | _ -> failwith "xml document not well-formed"
  with
  | Xmlm.Error ((l, c), e) ->
      failwith (Printf.sprintf "%d:%d: %s" l c (Xmlm.error_message e))

I know this is still more effort than you'd like, but Xmlm is purposely low-level and will remain so. It provides only a robust XML parser that is convenient (I believe) for developing higher-level abstractions to process the insane uses of this standard.

It would be nice to develop a module using xmlm to provide a (non-camlp4) DSL for XML queries. Unfortunately I do not have the time for that at the moment (unless someone wants to fund me to do that...).
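
To make the path-matching idea easier to follow without installing xmlm, here is a minimal sketch of the same query over an in-memory tree. Everything in it is an assumption made for illustration: the `tree` type, the function `attrs_at`, and the sample document (including its device names) are hypothetical and are not part of Xmlm or of the code above, which works on a stream of signals and on namespaced names.

```ocaml
(* Hypothetical in-memory XML tree; names carry no namespace here. *)
type tree =
  | El of string * (string * string) list * tree list
  | Data of string

(* Collect, in document order, the values of attribute [att] on every
   element reached by following the absolute [path] from the root. *)
let rec attrs_at path att tree =
  match path, tree with
  | [ name ], El (tag, atts, _) when name = tag ->
      (* Last path component matched: read the attribute if present. *)
      (match List.assoc_opt att atts with Some v -> [ v ] | None -> [])
  | name :: rest, El (tag, _, kids) when name = tag ->
      (* Intermediate component matched: descend into the children. *)
      List.concat (List.map (attrs_at rest att) kids)
  | _ -> []  (* mismatch, character data, or empty path *)

(* Made-up document with the shape of the libvirt query in this thread. *)
let doc =
  El ("domain", [], [
    El ("devices", [], [
      El ("disk", [], [ El ("source", [ ("dev", "/dev/sda") ], []) ]);
      El ("disk", [], [ El ("source", [ ("file", "/tmp/a.img") ], []) ]);
      El ("emulator", [], [ Data "/usr/bin/qemu" ]);
    ]);
  ])

let path = [ "domain"; "devices"; "disk"; "source" ]
let disks = attrs_at path "dev" doc @ attrs_at path "file" doc
```

The streaming version above avoids building the tree at all, which is why it has to track [pos] and [mismatch] by hand; the tree version trades memory for a much shorter recursion.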

Best,

Daniel
* Re: [Caml-list] xpath or alternatives 2009-09-28 12:17 xpath or alternatives Richard Jones 2009-09-28 12:48 ` [Caml-list] " Yaron Minsky @ 2009-09-30 13:39 ` Stefano Zacchiroli 2009-09-30 14:49 ` Gerd Stolpmann 1 sibling, 1 reply; 21+ messages in thread From: Stefano Zacchiroli @ 2009-09-30 13:39 UTC (permalink / raw) To: caml-list; +Cc: PXP Users ML

On Mon, Sep 28, 2009 at 01:17:45PM +0100, Richard Jones wrote:
> I need to do some relatively simple extraction of fields from an XML
> document. In Perl I would use xpath, very specifically if $xml was an
> XML document[1] stored as a string, then:
>
> my $p = XML::XPath->new (xml => $xml);
> my @disks = $p->findnodes ('//devices/disk/source/@dev');
> push (@disks, $p->findnodes ('//devices/disk/source/@file'));

I've just realized that this thread can look a bit ridiculous, at least to people used to other languages where XPath implementations can even be found in the language's standard library (the best solutions we have thus far are: a 40-line xml-light solution, the need to use a modified version of the OCaml compiler [yes, I know, it is compatible, but still ...], Galax with an unreachable homepage, ...).

So, I was wondering, has anybody ever tried to develop an XPath implementation on top of, say, PXP? The original announcement page of PXP (now archived) mentions "rumors" about people who, back then, were developing one. Has anything ever been released?

At first glance, there doesn't seem to be any specific typing problem, at least with XPath 1.0, since the PXP node interface is already common to all node types. Sure, XPath 2.0, when static typing is in use, can be better integrated with the language, but that's probably already happening in Galax.

[ Cc-ing the PXP mailing list ]

Cheers.

--
Stefano Zacchiroli -o- PhD in Computer Science \ PostDoc @ Univ. Paris 7
zack@{upsilon.cc,pps.jussieu.fr,debian.org} -<>- http://upsilon.cc/zack/
Dietro un grande uomo c'è ..|  .  |. Et ne m'en veux pas si je te tutoie
sempre uno zaino ...........| ..: |.... Je dis tu à tous ceux que j'aime
* Re: [Caml-list] xpath or alternatives 2009-09-30 13:39 ` Stefano Zacchiroli @ 2009-09-30 14:49 ` Gerd Stolpmann 2009-09-30 15:12 ` Stefano Zacchiroli 0 siblings, 1 reply; 21+ messages in thread From: Gerd Stolpmann @ 2009-09-30 14:49 UTC (permalink / raw) To: Stefano Zacchiroli; +Cc: caml-list, PXP Users ML

On Wednesday, 30.09.2009, at 15:39 +0200, Stefano Zacchiroli wrote:
> On Mon, Sep 28, 2009 at 01:17:45PM +0100, Richard Jones wrote:
> > I need to do some relatively simple extraction of fields from an XML
> > document. In Perl I would use xpath, very specifically if $xml was an
> > XML document[1] stored as a string, then:
> >
> > my $p = XML::XPath->new (xml => $xml);
> > my @disks = $p->findnodes ('//devices/disk/source/@dev');
> > push (@disks, $p->findnodes ('//devices/disk/source/@file'));
>
> I've just realized that this thread can look a bit ridiculous, at least
> for people used to other languages where XPath implementations can even
> be found in the language standard library (the best solutions we have
> thus far are: a 40-line xml-light solution, the need to use a modified
> version of the OCaml compiler [yes, I know, it is compatible, but still
> ...], Galax with unreachable homepage, ...).
>
> So, I was wondering, has anybody ever tried to develop an XPath
> implementation on top of, say, PXP? The original announcement page of
> PXP (now archived) mentions "rumors" about people which, back then, were
> developing it. Has anything ever been released?

No. However, there is a little XPath evaluator in SVN:

https://godirepo.camlcity.org/svn/lib-pxp/trunk/src/pxp-engine/pxp_xpath.ml

I have never found the time to complete it, or to add a syntax extension for painless use. But maybe somebody wants to take this over?

Gerd

> At first glance, it doesn't seem to exist any specific typing problem,
> at least with XPath 1.0, since the PXP node interface is already common
> for all node types. Sure XPath 2.0, when static typing is in use, can be
> better integrated with the language, but that's probably already
> happening in Galax.
>
> [ Cc-ing the PXP mailing list ]
>
> Cheers.
>

--
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany
gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------
* Re: [Caml-list] xpath or alternatives 2009-09-30 14:49 ` Gerd Stolpmann @ 2009-09-30 15:12 ` Stefano Zacchiroli 2009-09-30 15:22 ` Jordan Schatz 0 siblings, 1 reply; 21+ messages in thread From: Stefano Zacchiroli @ 2009-09-30 15:12 UTC (permalink / raw) To: caml-list, PXP Users ML

On Wed, Sep 30, 2009 at 04:49:37PM +0200, Gerd Stolpmann wrote:
> No. However, there is a little XPath evaluator in SVN:
> https://godirepo.camlcity.org/svn/lib-pxp/trunk/src/pxp-engine/pxp_xpath.ml

Cool, and you have even already implemented all of the XPath 1.0 standard library!

> I have never found the time to complete it, and to add some syntax
> extension for painless use. But maybe somebody wants to take this
> over?

If I'm not mistaken, what that evaluator needs, more than a syntax extension, is a parser from the concrete syntax to the abstract syntax you've already implemented. Once you have that, I don't think there is really a need for any syntax extension. What would be wrong with using it as follows:

let nodes = xpath_eval ~xpath:(xpath "/foo/bar[2]/@baz") tree in
let nodes2 = xpath_eval ~expr:"/foo/bar[2]/@baz" tree in
...

We already use regexps this way, and it is more than handy. Or am I missing something here?

I don't have the energy to volunteer myself, but I duly note that Alain's old XPath implementation already contains a parser that can be reused (whereas the lexer should be changed, as already observed; most likely it should be ported to Ulex). All in all, it is probably just a matter of integration work (modulo the limitations of the current evaluator, of course).

Any volunteer? :-)

Cheers.

--
Stefano Zacchiroli -o- PhD in Computer Science \ PostDoc @ Univ. Paris 7
zack@{upsilon.cc,pps.jussieu.fr,debian.org} -<>- http://upsilon.cc/zack/
Dietro un grande uomo c'è ..|  .  |. Et ne m'en veux pas si je te tutoie
sempre uno zaino ...........| ..: |.... Je dis tu à tous ceux que j'aime
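
As a rough illustration of the "parser from concrete syntax to abstract syntax" idea, the sketch below turns a very small subset of XPath strings into a list of steps using only the OCaml standard library. It is an assumption-laden toy: the `step` type and the functions `parse_step` and `xpath` are hypothetical names chosen here, they are unrelated to the abstract syntax in pxp_xpath.ml, and only absolute paths made of element names, an optional numeric position such as `[2]`, and `@attribute` steps are handled.

```ocaml
(* Hypothetical abstract syntax for a tiny XPath subset. *)
type step =
  | Child of string * int option  (* element name, optional 1-based position *)
  | Attribute of string           (* @name *)

(* Parse one step such as "foo", "bar[2]" or "@baz". *)
let parse_step s =
  let len = String.length s in
  if len = 0 then invalid_arg "empty step"
  else if s.[0] = '@' then Attribute (String.sub s 1 (len - 1))
  else
    match String.index_opt s '[' with
    | None -> Child (s, None)
    | Some i ->
        if s.[len - 1] <> ']' then invalid_arg "unterminated predicate";
        let name = String.sub s 0 i in
        (* int_of_string raises Failure on a non-numeric predicate. *)
        let pos = int_of_string (String.sub s (i + 1) (len - i - 2)) in
        Child (name, Some pos)

(* Parse an absolute path such as "/foo/bar[2]/@baz" into its steps. *)
let xpath s =
  match String.split_on_char '/' s with
  | "" :: (_ :: _ as steps) -> List.map parse_step steps
  | _ -> invalid_arg "only absolute paths are supported"
```

An evaluator in the style of the hypothetical `xpath_eval` above would then walk the document tree one `step` at a time; a real implementation would of course need the full XPath 1.0 grammar (axes, `//`, general predicates, functions), which is where reusing an existing parser pays off.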
* Re: [Caml-list] xpath or alternatives 2009-09-30 15:12 ` Stefano Zacchiroli @ 2009-09-30 15:22 ` Jordan Schatz 0 siblings, 0 replies; 21+ messages in thread From: Jordan Schatz @ 2009-09-30 15:22 UTC (permalink / raw) To: caml-list, PXP Users ML

I hope this is germane; I am very new to OCaml. Do these help at all?

http://packages.debian.org/sid/libxml-light-ocaml-dev
http://tech.motion-twin.com/xmllight.html

I expect it wouldn't be too difficult to write a wrapper around libxml:

http://xmlsoft.org/index.html

-Jordan
end of thread, other threads: [~2009-10-28 2:22 UTC]

Thread overview: 21+ messages
2009-09-28 12:17 xpath or alternatives Richard Jones
2009-09-28 12:48 ` [Caml-list] " Yaron Minsky
2009-09-28 15:06 ` Till Varoquaux
2009-09-29 23:00 ` Mikkel Fahnøe Jørgensen
2009-09-30 10:16 ` Richard Jones
2009-09-30 10:36 ` Sebastien Mondet
2009-09-30 10:49 ` Mikkel Fahnøe Jørgensen
2009-09-30 11:05 ` Dario Teixeira
2009-09-30 11:57 ` Richard Jones
2009-09-30 12:59 ` Richard Jones
2009-09-30 13:33 ` Till Varoquaux
2009-09-30 14:01 ` Richard Jones
2009-09-30 14:28 ` Till Varoquaux
2009-09-30 14:51 ` Alain Frisch
2009-09-30 15:09 ` Richard Jones
2009-09-30 15:18 ` Alain Frisch
2009-10-28 2:22 ` Daniel Bünzli
2009-09-30 13:39 ` Stefano Zacchiroli
2009-09-30 14:49 ` Gerd Stolpmann
2009-09-30 15:12 ` Stefano Zacchiroli
2009-09-30 15:22 ` Jordan Schatz