From: Gabriel Kerneis <gabriel.kerneis@enst.fr>
To: "Till Varoquaux" <till.varoquaux@gmail.com>, caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Fast XML parser
Date: Thu, 19 Jul 2007 08:24:21 +0200
Message-ID: <E1IBPR6-0000rx-I4@kerneis.info>
In-Reply-To: <9d3ec8300707181548n2c7ffa01xa9d2bea20c90c056@mail.gmail.com>
On Thu, 19 Jul 2007 00:48:07 +0200, "Till Varoquaux"
<till.varoquaux@gmail.com> wrote:
> Ouch,
>
> I beg to differ: if you want speed and can work in a streaming fashion
> (linear top-down, left-to-right exploration of the graph), you want an
> event-based XML parser. expat is probably one of the fastest (the C
> library is known to be a speed demon). PXP does everything, including
> talking Klingon and controlling the kitchen sink. It provides an
> event-based layer.
I certainly wouldn't recommend xml-light for *every* project where an
XML parser is needed, but look at the OP's requirements:
> > > I am interested in parsing Wiki markup language that has a few
> > > tags, like <pre>...</pre>, <math>...,</math>.
> > > These tags are sparse, meaning that the ratio of number of tags /
> > > number of bytes is low.
In such a simple case, xml-light (which is basically a simple ocamllex
file plus a little code to build the syntax tree) should perform quite
well. I know it doesn't handle DTDs, etc., but in *that* case, who cares?
> Ultimately, if you are parsing very simple files and are aiming for
> pure speed, you could write a simple lexer with ocamllex and use that
> as a base layer.
That could be a solution, and (provided the licence you choose for your
project is compatible) you could even use xml-light as a starting point,
stripping out the things you don't need.
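To give a rough idea of how little is needed for the sparse-tag case,
here is a minimal sketch in plain OCaml (no ocamllex, so it can be pasted
into the toplevel). It is not taken from xml-light: the names `token`,
`parse_tag` and `tokenize` are made up for illustration, tag names are
assumed to be letters only, attributes are not handled, and
`String.index_from_opt` assumes OCaml 4.05 or later. The point is that
the scanner jumps straight to the next `<`, which is cheap when tags are
rare.

```ocaml
(* Tokens for a tiny wiki-markup scanner: runs of text and simple tags. *)
type token =
  | Text of string
  | Open of string   (* <name>  *)
  | Close of string  (* </name> *)

(* Assumption: tag names consist of ASCII letters only. *)
let is_name_char c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')

(* [parse_tag s i] assumes s.[i] = '<'. When a well-formed tag starts at
   [i], it returns the tag token and the index just past the '>'. *)
let parse_tag s i =
  let n = String.length s in
  let closing = i + 1 < n && s.[i + 1] = '/' in
  let start = if closing then i + 2 else i + 1 in
  let rec name_end j =
    if j < n && is_name_char s.[j] then name_end (j + 1) else j in
  let stop = name_end start in
  if stop > start && stop < n && s.[stop] = '>' then
    let name = String.sub s start (stop - start) in
    Some ((if closing then Close name else Open name), stop + 1)
  else None

let tokenize s =
  let n = String.length s in
  (* [from] is where the pending text run starts; [i] is where the
     search for the next '<' resumes. *)
  let rec go acc from i =
    let flush upto acc =
      if upto > from then Text (String.sub s from (upto - from)) :: acc
      else acc in
    if i >= n then List.rev (flush n acc)
    else
      match String.index_from_opt s i '<' with
      | None -> List.rev (flush n acc)
      | Some lt ->
        (match parse_tag s lt with
         | Some (tok, next) -> go (tok :: flush lt acc) next next
         | None -> go acc from (lt + 1))  (* stray '<': keep it as text *)
  in
  go [] 0 0
```

For example, `tokenize "a <pre>x < y</pre> b"` keeps the stray `<` inside
the text run and yields the `pre` open and close tokens around it. An
ocamllex version would express the same thing as two or three rules, and
building a tree on top of the token list is a short recursive function.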
Kind regards,
--
Gabriel