From: Drup <drupyog+caml@zoho.com>
To: "Anton Bachin" <antonbachin@yahoo.com>,
"François Bobot" <francois.bobot@cea.fr>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] [ANN] Lambda Soup - HTML scraping and rewriting with CSS selectors
Date: Mon, 23 Nov 2015 18:16:41 +0100
Message-ID: <565349F9.6020405@zoho.com>
In-Reply-To: <98E819C0-76A2-4038-A5E6-DFBDC08DF7FA@yahoo.com>
There seems to be a slight misunderstanding about how tyxml is
constructed, so let me clarify things a bit.
- Tyxml doesn't have a canonical xml datatype; it's functorized over a
generic Xml signature (defined in [Xml_sigs.T]). As far as tyxml is
concerned, xml nodes are a fully abstract type and can only be
constructed. Multiple modules implement this signature in the ocsigen
stack (two in js_of_ocaml's Tyxml_js, three in eliom), and they have
different characteristics. In particular, some of them are really
abstract (React signals ...) and I doubt you could construct selectors
over them in a meaningful way (but I would be happy to be proven wrong).
- Another signature, [Xml_sigs.ITERABLE], provides global iteration
over xml trees. An XML implementation used by tyxml is not required to
satisfy it and, in particular, it is not implemented for
js_of_ocaml's Tyxml_js. As pointed out previously, it doesn't make sense
for all implementations, but we could implement it for some of them.
- There is no signature for mutation (at the moment). This may be an
interesting improvement.
- The [Xml] module implements a "bare" XML datatype that is not really
used by ocsigen, but can be used to build simple xml trees in a typeful
manner (and then print them). It also satisfies ITERABLE.
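To make the points above concrete, here is a minimal sketch of the overall shape. It is not tyxml's actual code: the signatures XML and ITERABLE below are simplified stand-ins for [Xml_sigs.T] and [Xml_sigs.ITERABLE], and every name in it (Make_html, Tree, Printer, Size, children) is made up for illustration.

```ocaml
(* Simplified stand-in for a signature like Xml_sigs.T: nodes are
   abstract and can only be constructed. *)
module type XML = sig
  type elt
  val pcdata : string -> elt
  val node : string -> elt list -> elt
end

(* Hypothetical stand-in for ITERABLE: nodes can also be traversed. *)
module type ITERABLE = sig
  include XML
  val children : elt -> elt list
end

(* Combinators are written once, against the abstract signature. *)
module Make_html (X : XML) = struct
  let txt s = X.pcdata s
  let p children = X.node "p" children
  let div children = X.node "div" children
end

(* Implementation 1: a plain tree. It can satisfy ITERABLE. *)
module Tree = struct
  type elt = Text of string | Node of string * elt list
  let pcdata s = Text s
  let node name children = Node (name, children)
  let children = function Text _ -> [] | Node (_, cs) -> cs
end

(* Implementation 2: a node is only a printing action (no escaping, to
   keep the sketch short). There is nothing left to iterate over, so it
   satisfies XML but not ITERABLE. *)
module Printer = struct
  type elt = Buffer.t -> unit
  let pcdata s = fun buf -> Buffer.add_string buf s
  let node name children = fun buf ->
    Buffer.add_string buf ("<" ^ name ^ ">");
    List.iter (fun child -> child buf) children;
    Buffer.add_string buf ("</" ^ name ^ ">")
  let to_string elt =
    let buf = Buffer.create 64 in
    elt buf;
    Buffer.contents buf
end

module Tree_html = Make_html (Tree)
module Printer_html = Make_html (Printer)

(* Anything that needs traversal has to ask for ITERABLE explicitly. *)
module Size (X : ITERABLE) = struct
  let rec size e =
    List.fold_left (fun acc c -> acc + size c) 1 (X.children e)
end

module Tree_size = Size (Tree)

let doc = Tree_html.(div [ p [ txt "hello" ]; p [ txt "world" ] ])
let n = Tree_size.size doc
let s = Printer.to_string Printer_html.(div [ p [ txt "hi" ] ])
```

Size (Printer) would be rejected by the type checker, which is the situation a selector library would face with the more abstract implementations.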
Now, typing lambda_soup using tyxml's types is going to be
a bit of work. You can perfectly well reuse all of tyxml's types, but you
need typeful combinators instead of strings, otherwise you have no way to
know what your selection is going to return. You may be able to cheat
your way through by creating a fake xml module and instantiating tyxml's
functors on it to create all the combinators (that would be fun :p).
In any case, you will pay for type safety with a significant increase in
verbosity and awkwardness. I'm not sure it's worth the effort, since a
lot of real-world html trees are not correct and you never really
need to select from tyxml-constructed trees anyway. Simple compatibility
with tyxml is much easier: you just have to agree with tyxml's signatures
(which would deserve a bit of a cleanup).
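As a rough illustration of the strings-versus-typeful-combinators point, here is a small sketch using phantom types. It uses neither lambda_soup's nor tyxml's real API: the node record, select_string, and the Typed module are all hypothetical, and select only matches descendants by tag name.

```ocaml
(* Untyped tree, roughly as a scraper would see it. *)
type node = {
  name : string;
  attrs : (string * string) list;
  children : node list;
}

let rec descendants n =
  List.concat_map (fun child -> child :: descendants child) n.children

(* String selector: the result type says nothing about what matched. *)
let select_string tag root =
  List.filter (fun n -> n.name = tag) (descendants root)

(* Typed selectors: the phantom parameter records the element kind, so
   the type of a selection is known statically. *)
module Typed : sig
  type 'kind elt
  type 'kind selector
  val a : [ `A ] selector
  val li : [ `Li ] selector
  val select : 'kind selector -> node -> 'kind elt list
  val href : [ `A ] elt -> string option
end = struct
  type 'kind elt = node
  type 'kind selector = string
  let a = "a"
  let li = "li"
  let select = select_string
  let href n = List.assoc_opt "href" n.attrs
end

let page =
  { name = "ul"; attrs = []; children = [
      { name = "li"; attrs = []; children = [
          { name = "a"; attrs = [ ("href", "https://ocaml.org") ];
            children = [] } ] } ] }

let links = Typed.select Typed.a page
let hrefs = List.filter_map Typed.href links

(* Rejected by the type checker, since `Li elements have no href here:
   List.filter_map Typed.href (Typed.select Typed.li page) *)
```

Note that each tag needs its own typed selector value, and real CSS selectors (combinators, attribute filters, pseudo-classes) would each need a typed counterpart, which is where the verbosity comes from.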
[Xml_sigs.T]:
https://github.com/ocsigen/tyxml/blob/master/lib/xml_sigs.mli#L21
[Xml_sigs.ITERABLE]:
https://github.com/ocsigen/tyxml/blob/master/lib/xml_sigs.mli#L70
[Xml]: https://github.com/ocsigen/tyxml/blob/master/lib/xml.mli
[Tyxml_js]:
https://github.com/ocsigen/js_of_ocaml/blob/master/lib/tyxml/tyxml_js.mli