ANNOUNCE: Xmlm - Daniel Bünzli

Mailing list for all users of the OCaml language and system.

From: "Daniel Bünzli" <daniel.buenzli@epfl.ch>
To: caml-list@yquem.inria.fr
Cc: hump@caml.inria.fr
Subject: ANNOUNCE: Xmlm
Date: Tue, 27 Feb 2007 02:16:43 +0100
Message-ID: <C5A132EA-5E93-4AA9-8834-4312BF240F32@epfl.ch>

Xmlm is an OCaml module providing sequential XML input/output and
a persistent cursor. It aims to make non-validating XML processing
robust and painless.

The sequential interface can be used to process documents without
building an in-memory representation. It also lets programmers
translate their own data structures to an XML representation and
vice-versa.
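The announcement does not show Xmlm's sequential API, so the following is only a self-contained sketch of the idea under stated assumptions: the `signal` type, `source_of_list`, `decode_point` and `encode_point` are hypothetical names, not part of Xmlm. It shows what processing "without building an in-memory representation" means in practice: the decoder pulls one event at a time and builds a user record directly, with no intermediate document tree.

```ocaml
(* Hypothetical SAX-like event type; NOT Xmlm's actual API. *)
type signal = El_start of string | El_end | Data of string

exception Unexpected of string

(* Pull-style source over a list of signals. A real parser would
   produce these lazily from an input channel instead. *)
let source_of_list l =
  let rest = ref l in
  fun () ->
    match !rest with
    | [] -> None
    | s :: tl -> rest := tl; Some s

(* A user data structure to translate to and from XML. *)
type point = { x : int; y : int }

(* Decode <point><x>..</x><y>..</y></point> by consuming signals one at
   a time; no document tree is ever built. *)
let decode_point next =
  let expect s =
    if next () <> Some s then raise (Unexpected "unexpected structure") in
  let int_field name =
    expect (El_start name);
    let v =
      match next () with
      | Some (Data d) -> int_of_string (String.trim d)
      | _ -> raise (Unexpected ("no data in " ^ name))
    in
    expect El_end;
    v
  in
  expect (El_start "point");
  let x = int_field "x" in
  let y = int_field "y" in
  expect El_end;
  { x; y }

(* The opposite direction: user record -> signal sequence. *)
let encode_point p =
  [ El_start "point";
    El_start "x"; Data (string_of_int p.x); El_end;
    El_start "y"; Data (string_of_int p.y); El_end;
    El_end ]
```

Feeding `encode_point { x = 3; y = -4 }` through `source_of_list` into `decode_point` gives back the same record. In a streaming setting, `source_of_list` would be replaced by a function that reads events from a channel, and memory use stays bounded by the nesting depth rather than the document size.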

The cursor lets you navigate and update a simple in-memory tree
representation of XML documents. Updates performed by the cursor
are persistent (non-destructive).
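The announcement describes the cursor as a persistent zipper but does not show its API, so what follows is a generic Huet-style zipper over a hypothetical `tree` type; `tree`, `cursor`, `down`, `right`, `up`, `replace` and `root` are illustrative names, not Xmlm's. It shows why cursor updates can be non-destructive: an edit only allocates a new focus and path, sharing every untouched subtree, so the original cursor still sees the original document.

```ocaml
(* Hypothetical minimal document tree: an element with children, or text. *)
type tree = Element of string * tree list | Text of string

(* Path from the focus back to the root. A Node records the parent's
   name, the left siblings (nearest first), the parent's own path, and
   the right siblings. *)
type path = Top | Node of string * tree list * path * tree list

type cursor = { focus : tree; path : path }

let of_tree t = { focus = t; path = Top }

(* Move to the first child of the focused element, if any. *)
let down c =
  match c.focus with
  | Element (name, first :: rest) ->
      Some { focus = first; path = Node (name, [], c.path, rest) }
  | _ -> None

(* Move to the next sibling, if any. *)
let right c =
  match c.path with
  | Node (name, left, parent, r :: rs) ->
      Some { focus = r; path = Node (name, c.focus :: left, parent, rs) }
  | _ -> None

(* Move to the parent, rebuilding it from the focus and its siblings. *)
let up c =
  match c.path with
  | Top -> None
  | Node (name, left, parent, rs) ->
      Some { focus = Element (name, List.rev_append left (c.focus :: rs));
             path = parent }

(* Replace the focused subtree. This allocates a new cursor; the old one
   is unchanged, which is what makes the update persistent. *)
let replace t c = { c with focus = t }

(* Zip all the way up and return the whole document. *)
let rec root c = match up c with None -> c.focus | Some p -> root p
```

For example, starting from `<a><b/>old</a>`, moving `down` then `right` and calling `replace (Text "new")` yields a cursor whose `root` is the updated document, while `root` of the cursor obtained before the replacement still returns the unmodified one.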

To facilitate direct integration into projects, Xmlm consists of a
single module and is distributed under a BSD license.

Project home page: <http://code.google.com/p/xmlm>

Your feedback is welcome,

Daniel

P.S.

Why another XML parser?

Dissatisfaction with existing solutions, which are either too complete
and complex or too brittle and restrictive. Besides, it seems that all
existing parsers force you to read the whole document into
memory. Here are some points that motivated the design of Xmlm.

1. Easy to integrate into projects without introducing external
    dependencies. A single module provides everything, including
    documentation (ocamldoc) and the license.

2. Well documented. Features and limitations of the parser are precisely
    documented.

3. Easy to use yet flexible API.
   - Choice between sequential (SAX-like) and tree (DOM-like) processing.
   - Construction/deconstruction of user data structures from/to XML
     documents.
   - Tree processing with persistent cursor (zipper).
   - Simple white space handling options for character data.
   - Character encodings are translated to UTF-8.
     UTF-8 is the only encoding the programmer needs to handle.
   - Character references and predefined entities are resolved.
     Other entity references can be resolved via a user-provided
     callback.
   - Early access to data to allow parse time data transformations.
   - Parse time element pruning.

4. Robust parsing. Does not assume an XML subset.
   - Supports major encodings: ASCII, UTF-8, UTF-16 (LE and BE),
     ISO-8859-1.
   - Parses qualified names (namespaces).
   - Tail-recursive.

5. Limitations. If you need any of these, use PXP.
   - Comments, processing instructions and the standalone declaration
     are dropped by the parser (this is a feature).
   - No DTD support (but the DTD can be extracted and written as a raw
     string).
   - No validation support.
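Points 3 and 4 above say that character references and predefined entities are resolved, that other entity references can go to a user-provided callback, and that inputs such as UTF-16 are translated to UTF-8. The sketch below illustrates those two translation steps in plain OCaml. `add_utf8`, `resolve`, `expand` and `utf16_to_utf8` are hypothetical helpers, not Xmlm's internals or API, and they assume well-formed input instead of reporting errors the way a real parser would.

```ocaml
(* Append the UTF-8 encoding of code point [u] to buffer [b].
   Surrogate code points are not rejected in this sketch. *)
let add_utf8 b u =
  let add i = Buffer.add_char b (Char.chr i) in
  if u < 0 || u > 0x10FFFF then invalid_arg "add_utf8: out of range"
  else if u < 0x80 then add u
  else if u < 0x800 then begin
    add (0xC0 lor (u lsr 6)); add (0x80 lor (u land 0x3F))
  end else if u < 0x10000 then begin
    add (0xE0 lor (u lsr 12));
    add (0x80 lor ((u lsr 6) land 0x3F)); add (0x80 lor (u land 0x3F))
  end else begin
    add (0xF0 lor (u lsr 18)); add (0x80 lor ((u lsr 12) land 0x3F));
    add (0x80 lor ((u lsr 6) land 0x3F)); add (0x80 lor (u land 0x3F))
  end

(* The five entities predefined by XML. *)
let predefined = function
  | "lt" -> Some "<" | "gt" -> Some ">" | "amp" -> Some "&"
  | "apos" -> Some "'" | "quot" -> Some "\""
  | _ -> None

(* Resolve the text between '&' and ';': character references become
   UTF-8, predefined entities are built in, anything else is handed to
   the user-provided [entity] callback. *)
let resolve ~entity name =
  let n = String.length name in
  if n > 1 && name.[0] = '#' then begin
    let code =
      if name.[1] = 'x' then int_of_string ("0x" ^ String.sub name 2 (n - 2))
      else int_of_string (String.sub name 1 (n - 1))
    in
    let b = Buffer.create 4 in
    add_utf8 b code;
    Buffer.contents b
  end else
    match predefined name with
    | Some s -> s
    | None -> entity name

(* Expand every "&...;" reference in character data [s]. *)
let expand ~entity s =
  let len = String.length s in
  let b = Buffer.create len in
  let rec loop i =
    if i >= len then ()
    else if s.[i] = '&' then begin
      match String.index_from_opt s i ';' with
      | None -> invalid_arg "expand: unterminated reference"
      | Some j ->
          Buffer.add_string b (resolve ~entity (String.sub s (i + 1) (j - i - 1)));
          loop (j + 1)
    end else begin
      Buffer.add_char b s.[i];
      loop (i + 1)
    end
  in
  loop 0;
  Buffer.contents b

(* Transcode UTF-16 (BE or LE) to UTF-8. Assumes an even length and
   properly paired surrogates. *)
let utf16_to_utf8 ~big_endian s =
  let b = Buffer.create (String.length s) in
  let unit_at i =
    let b0 = Char.code s.[i] and b1 = Char.code s.[i + 1] in
    if big_endian then (b0 lsl 8) lor b1 else (b1 lsl 8) lor b0
  in
  let rec loop i =
    if i + 1 < String.length s then begin
      let w1 = unit_at i in
      if w1 >= 0xD800 && w1 <= 0xDBFF then begin
        (* High surrogate: combine with the following low surrogate. *)
        let w2 = unit_at (i + 2) in
        add_utf8 b (0x10000 + ((w1 - 0xD800) lsl 10) + (w2 - 0xDC00));
        loop (i + 4)
      end else begin
        add_utf8 b w1;
        loop (i + 2)
      end
    end
  in
  loop 0;
  Buffer.contents b
```

With a callback that maps `copy` to the UTF-8 bytes of the copyright sign, `expand` turns `a &lt; b &amp;&#65;&#x42;&copy;` into `a < b &AB` followed by that sign. Because every path ends in UTF-8, downstream code only ever handles one encoding, which is the design point the announcement makes.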
