From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=AWL autolearn=disabled version=3.1.3 Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by yquem.inria.fr (Postfix) with ESMTP id 1205ABBAF for ; Thu, 18 Sep 2008 10:38:47 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApoEAMWw0UjUVZsV/2dsb2JhbAC5ZIFn X-IronPort-AV: E=Sophos;i="4.32,420,1217800800"; d="scan'208";a="29316572" Received: from cerberus.snarc.org ([212.85.155.21]) by mail4-smtp-sop.national.inria.fr with ESMTP; 18 Sep 2008 10:38:41 +0200 Received: by cerberus.snarc.org (Postfix, from userid 1000) id D623E129EC; Thu, 18 Sep 2008 09:38:53 +0100 (BST) Date: Thu, 18 Sep 2008 09:38:53 +0100 From: Vincent Hanquez To: Dario Teixeira Cc: caml-list@yquem.inria.fr Subject: Re: [Caml-list] XML library for validating MathML Message-ID: <20080918083853.GA15219@snarc.org> References: <103293.54569.qm@web54606.mail.re2.yahoo.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <103293.54569.qm@web54606.mail.re2.yahoo.com> X-Warning: Email may contain unsmilyfied humor and/or satire. User-Agent: Mutt/1.5.18 (2008-05-17) X-Spam: no; 0.00; dtds:01 subset:01 pxp:01 1.0:98 ffff:98 char:01 wrote:01 parsing:01 caml-list:01 expression:02 expression:02 parse:02 supported:02 string:02 library:03 On Wed, Sep 17, 2008 at 11:58:05AM -0700, Dario Teixeira wrote: > Given a string containing a mathematical expression in the MathML > markup, I need to verify that the expression is indeed valid MathML. > I am therefore looking for an XML library that can verify an expression > against a given DTD. > > Now, I have tried Xml-light, and the code I used is listed below. > Unfortunately, it fails when trying to parse MathML's DTD (it's the > standard DTD from the W3C). I have tried simpler DTDs, and it does work > with them; am I therefore correct in assuming that Xml-light can only > handle a particular version/subset of DTD features? I don't know about validation (i'll probably suggest looking at PXP tho), but xml-light is very bad for XML compliance. the library is (happily) parsing XML files that it shouldn't, which tell a lots concerning its validation abilities ... for example, the XML supported character range is not even checked: Xml 1.0 specification -- 2.2 Characters Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] others problems include (uncomplete list): - complete unicode un-awareness - funny & wrong entities handling -- Vincent