From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=AWL autolearn=disabled version=3.1.3 Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38]) by yquem.inria.fr (Postfix) with ESMTP id 4C6D7BC0A for ; Tue, 27 Feb 2007 02:14:59 +0100 (CET) Received: from mail18.bluewin.ch (mail18.bluewin.ch [195.186.19.64]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l1R1EwEA018481 for ; Tue, 27 Feb 2007 02:14:59 +0100 Received: from [10.0.1.2] (85.2.106.83) by mail18.bluewin.ch (Bluewin 7.3.121) id 45C31EFA005999AB; Tue, 27 Feb 2007 01:14:56 +0000 Mime-Version: 1.0 (Apple Message framework v752.2) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: Cc: hump@caml.inria.fr Content-Transfer-Encoding: 7bit From: =?ISO-8859-1?Q?Daniel_B=FCnzli?= Subject: ANNOUNCE: Xmlm Date: Tue, 27 Feb 2007 02:16:43 +0100 To: caml-list@yquem.inria.fr X-Mailer: Apple Mail (2.752.2) X-Miltered: at discorde with ID 45E38612.002 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; bunzli:01 buenzli:01 ocaml:01 in-memory:01 vice-versa:01 in-memory:01 parser:01 restrictive:01 parsers:01 dependencies:01 ocamldoc:01 parser:01 sax-like:01 encodings:01 translated:01 Xmlm is an OCaml module providing sequential XML input/output and a persistent cursor. It aims at making non valid XML processing robust and painless. The sequential interface can be used to process documents without building an in-memory representation. It also lets the programmer translate its own data structures to an XML representation and vice-versa. The cursor allows to navigate and update a simple in-memory tree representation of XML documents. Updates performed by the cursor are persistent (non destructive). To facilitate direct integration into projects, Xmlm is made of a single module and distributed under a BSD license. Project home page : Your feedback is welcome, Daniel P.S. Why another XML parser ? Dissatisfaction about existing solutions either too complete and complex or too britlle and restrictive. Besides it seems all existing parsers force you to read the whole document in memory. Here are some points that motivated the design of Xmlm. 1. Easy to integrate into projects without introducing external dependencies. A single module provides everything including documentation (ocamldoc) and the license. 2. Well documented. Features and limitations of the parser are precisely documented. 3. Easy to use yet flexible api. - Choice between sequential (SAX-like) or tree (DOM-like) processing. - Construction/deconstruction of user data structures from/to xml documents. - Tree processing with persistent cursor (zipper). - Simple white space handling options for character data. - Character encodings are translated to UTF-8. UTF-8 is the only encoding the programmer needs to handle. - Character references and predefined entities are resolved. Other entity references can be resolved via a user provided callback. - Early access to data to allow parse time data transformations. - Parse time element pruning. 4. Robust parsing. Does not assume an xml subset. - Supports major encodings : ASCII, UTF-8, UTF-16 (LE and BE), ISO-8559-1. - Parses qualified names (namespaces). - Tail-recursive. 5. Limitations. If you need one of these things use PXP. - Comments, processing instructions and standalone declaration are dropped by the parser (it is a feature). - No DTD support (but it can be extracted and written as a raw string). - No validity support.