* [Caml-list] OCaml HTML parsing & manipulation
From: Jacques du Preez @ 2014-08-10 17:38 UTC
To: Caml List

Hi,

I've been searching for an OCaml library to parse HTML, and then be able to
query and manipulate it similar to jQuery.

The JSoup Java library, http://jsoup.org, allows me to do this. Is there
something like this for OCaml?

Thanks!

==============================
Jacques du Preez

Web: OpenLandscape.net
Twitter: @jacquesdp
[parent not found: <20140810.224256.1353397051109538039.Christophe.Troestler@umons.ac.be>]
* Re: [Caml-list] OCaml HTML parsing & manipulation
From: Jacques du Preez @ 2014-08-11 6:57 UTC
To: OCaml Mailing List

Thanks. I eventually discovered ocamlnet, but I'm hoping there's maybe more
than 1 option?

==============================
Jacques du Preez

Web: OpenLandscape.net
Twitter: @jacquesdp

On Sun, Aug 10, 2014 at 10:42 PM, Christophe Troestler
<Christophe.Troestler@umons.ac.be> wrote:

> Hi,
>
> On Sun, 10 Aug 2014 19:38:39 +0200, Jacques du Preez wrote:
> >
> > I've been searching for an OCaml library to parse HTML, and then be able
> > to query and manipulate it similar to jQuery.
> >
> > The JSoup Java library, http://jsoup.org, allows me to do this. Is there
> > something like this for OCaml?
>
> Nethtml in ocamlnet partly does what you need (you can easily write
> recursive functions to extract the desired data from the HTML tree).
>
> Best,
> C.
* Re: [Caml-list] OCaml HTML parsing & manipulation
From: Paolo Donadeo @ 2014-08-12 9:48 UTC
To: OCaml mailing list

On Mon, Aug 11, 2014 at 8:57 AM, Jacques du Preez <jacquesdpz@gmail.com>
wrote:

> Thanks. I eventually discovered ocamlnet, but I'm hoping there's maybe
> more than 1 option?

The HTML parser in Ocamlnet is actually very robust.

--
Paolo
* Re: [Caml-list] OCaml HTML parsing & manipulation
From: Andrew Herron @ 2014-08-24 21:54 UTC
To: Jacques du Preez; +Cc: OCaml Mailing List

I did an evaluation of HTML parsers back in February. Most of the options
are XML parsers, and a lot of them are very old.

Other than Nethtml, I came up with two alternatives to consider:

http://erratique.ch/software/xmlm
https://github.com/facebook/pfff/tree/master/lang_html

I didn't end up spending much time on either. It quickly became clear that
Nethtml was what I needed. It handles content that isn't strictly valid,
which was important to me, and has good performance.

Cheers,
Andy

On Mon, Aug 11, 2014 at 4:57 PM, Jacques du Preez <jacquesdpz@gmail.com>
wrote:

> Thanks. I eventually discovered ocamlnet, but I'm hoping there's maybe
> more than 1 option?
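For anyone landing on this thread later: the "recursive functions over the
HTML tree" approach suggested for Nethtml can be sketched roughly as below.
This is only an illustration under stated assumptions, not code posted by
anyone in the thread. The `document` type is redeclared locally so that the
snippet compiles without ocamlnet installed; it is meant to mirror the shape
of `Nethtml.document`. The names `find_all`, `attr`, `map_elements`,
`add_nofollow` and `sample` are hypothetical helpers made up for this example.

```ocaml
(* Local stand-in for the tree type, assumed to match the shape of
   Nethtml.document: an element has a name, attributes and children;
   everything else is text data. *)
type document =
  | Element of string * (string * string) list * document list
  | Data of string

(* Query: collect every element with the given tag name, in document
   order, descending recursively into children. *)
let rec find_all name docs =
  List.concat
    (List.map
       (function
         | Data _ -> []
         | Element (n, _, children) as el ->
             let nested = find_all name children in
             if n = name then el :: nested else nested)
       docs)

(* Value of an attribute on a node, if the node is an element that has it. *)
let attr key = function
  | Element (_, attrs, _) -> List.assoc_opt key attrs
  | Data _ -> None

(* Manipulate: rebuild the tree bottom-up, applying [f] to every element. *)
let rec map_elements f docs =
  List.map
    (function
      | Data _ as d -> d
      | Element (n, attrs, children) ->
          f (Element (n, attrs, map_elements f children)))
    docs

(* Hand-built sample tree, standing in for parser output. *)
let sample =
  [ Element ("p", [],
      [ Data "See ";
        Element ("a", [ ("href", "http://jsoup.org") ], [ Data "jsoup" ]) ]) ]

(* All link targets in the tree. *)
let hrefs = List.filter_map (attr "href") (find_all "a" sample)

(* Example rewrite: force rel="nofollow" on every link. *)
let add_nofollow = function
  | Element ("a", attrs, children) ->
      Element ("a", ("rel", "nofollow") :: List.remove_assoc "rel" attrs,
               children)
  | other -> other

let rewritten = map_elements add_nofollow sample
```

With ocamlnet installed, the local type declaration would be dropped and the
tree would come from the parser instead of `sample`; as far as I know that is
`Nethtml.parse (new Netchannels.input_string html)`, which returns a
`Nethtml.document list`. This is a long way from jQuery-style selectors, but
small helpers like these cover simple querying and rewriting.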