Mailing list for all users of the OCaml language and system.
From: Gerd Stolpmann <info@gerd-stolpmann.de>
To: Dario Teixeira <darioteixeira@yahoo.com>
Cc: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] XML library for validating MathML
Date: Thu, 18 Sep 2008 20:28:03 +0200
Message-ID: <1221762483.17456.42.camel@flake.lan.gerd-stolpmann.de>
In-Reply-To: <359336.28901.qm@web54601.mail.re2.yahoo.com>


On Thursday, 18.09.2008, at 10:58 -0700, Dario Teixeira wrote:
> Hi,
> 
> Well, as it turns out, building a basic "Hello World" in PXP is relatively
> simple (I followed the manual which is very helpful in the beginning).
> However, though the DTD validation works fine with the simple examples I tried,
> it fails for a MathML document.  Note that I am using the DTD as provided
> by the W3C, available from here:  http://www.w3.org/Math/DTD/mathml2.tgz
> 
> When processing the MathML DTD, PXP outputs a few warnings about entities
> declared twice, about names reserved for future extensions, and quite a
> lot of warnings about code points that cannot be represented.  I can ignore
> those for now.

Code points: Note that PXP defaults to ISO-8859-1 as its character set, so
code points outside that range cannot be represented. Use PXP in UTF-8
mode to get rid of these warnings.
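
A minimal sketch of the UTF-8 switch, reusing the setup of the test program quoted further down. It assumes that the PXP config record has an `encoding` field and that `` `Enc_utf8 `` is the value selecting UTF-8; both names are assumptions to check against the installed PXP version. The UTF-8 lexer also has to be linked into the program, which depends on the build setup.

```ocaml
(* Sketch only: this needs the PXP library and is not standalone.
   Same configuration as the quoted test program, plus UTF-8 as the
   character set instead of the ISO-8859-1 default. *)
open Pxp_yacc

class warner =
object
  method warn w = print_endline ("WARNING: " ^ w)
end

let config =
  { default_config with
      warner = new warner;
      encoding = `Enc_utf8 }  (* assumed field and value names *)
```

The resulting `config` would be passed to `parse_document_entity` in place of the one built in the quoted program.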

> When it does fail, this is the error produced:
> 
> In entity ent-isonum = PUBLIC "-//W3C//ENTITIES Numeric and Special Graphic for MathML 2.0//EN" "isonum.ent", at line 28, position 44:
> Called from entity [dtd] = SYSTEM "mathml2.dtd", line 1969, position 0:
> ERROR (Well-formedness constraint): The character '&' must be written as '&amp;'
> 
> 
> Looking at the "isonum.ent" file (packaged with the W3C zip), these are
> the contents of line 28, where the error occurs:
> 
> <!ENTITY amp              "&#x26;&#x00026;" ><!--=ampersand -->

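Well, an entity's value is processed twice: the character references in
it are expanded when the entity is defined, and the result is parsed again
when the entity is used. The correct way to define &amp; is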

<!ENTITY amp "&#x26;#x26;">

i.e. without the second &. At _definition_ time this gives "&#x26;" (only
the leading &#x26; is expanded), and at _use_ time you finally get &. With
the wrong definition you get && at definition time, and this is simply an
illegal character sequence.

By default PXP defines &amp; as "&#38;#38;", which is the same thing in
decimal notation and is also what the XML spec recommends.
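
The two expansion steps can be imitated with a small standalone OCaml sketch. It is not PXP code: `expand_once` and `define_and_use` are hypothetical helpers that handle only numeric character references, and a named reference such as &lt; is simply reported as a bare '&'.

```ocaml
(* Expand numeric character references (&#NN; and &#xHH;) in [s] exactly
   once. Characters produced by the expansion are not rescanned. Any '&'
   that does not start a numeric character reference is an error.
   Number parsing is lenient (it relies on int_of_string_opt). *)
let expand_once (s : string) : (string, string) result =
  let len = String.length s in
  let buf = Buffer.create len in
  let rec go i =
    if i >= len then Ok (Buffer.contents buf)
    else if s.[i] <> '&' then begin
      Buffer.add_char buf s.[i];
      go (i + 1)
    end else
      match String.index_from_opt s i ';' with
      | Some j when j > i + 2 && s.[i + 1] = '#' ->
          let digits = String.sub s (i + 2) (j - i - 2) in
          (* "x26" is hexadecimal and becomes "0x26"; "38" stays decimal. *)
          let literal = if digits.[0] = 'x' then "0" ^ digits else digits in
          (match int_of_string_opt literal with
           | Some n when n >= 0 && Uchar.is_valid n ->
               Buffer.add_utf_8_uchar buf (Uchar.of_int n);
               go (j + 1)
           | _ -> Error "malformed character reference")
      | _ -> Error "the character '&' must be written as '&amp;'"
  in
  go 0

(* First pass: the entity literal becomes the replacement text.
   Second pass: the replacement text is parsed again when the entity
   is used. *)
let define_and_use (literal : string) : (string, string) result =
  match expand_once literal with
  | Error e -> Error ("in literal: " ^ e)
  | Ok replacement ->
      (match expand_once replacement with
       | Error e -> Error ("in replacement text: " ^ e)
       | Ok text -> Ok text)

let () =
  let show literal =
    match define_and_use literal with
    | Ok text -> Printf.printf "%-20s -> %S\n" literal text
    | Error e -> Printf.printf "%-20s -> ERROR %s\n" literal e
  in
  List.iter show [ "&#x26;#x26;"; "&#38;#38;"; "&#x26;&#x00026;" ]
```

In this sketch the first two literals end up as a single &, while the literal from isonum.ent becomes && after the first pass and is rejected by the second.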

Errors in W3C documents are nothing new, although it is a bit
surprising that they cannot even stick to the basics of their own
formalism. I suppose they used a hacked SGML parser to develop
MathML, since SGML is more liberal about lexical details.

Gerd

> 
> 
> Though 0x26 is indeed the codepoint for the ampersand character, I don't
> get why it appears twice.  Is this a case of double escaping?  Could this
> be the reason PXP chokes?
> 
> Any thoughts?
> 
> Best regards,
> Dario Teixeira
> 
> P.S.  This is the programme I used for testing.  Its code is pretty much
>       lifted from the PXP manual:
> 
> 
> open Pxp_document
> open Pxp_yacc
> 
> class warner =
> object
>         method warn w = print_endline ("WARNING: " ^ w)
> end
> 
> let rec print_structure n =
>         let ntype = n#node_type
>         in match ntype with
>                 | T_element name ->
>                         print_endline ("Element of type " ^ name);
>                         let children = n # sub_nodes
>                         in List.iter print_structure children
>                 | T_data ->
>                         print_endline "Data"
>                 | _ ->
>                         assert false
> 
> let () =
>         try
>                 let config = {default_config with warner = new warner} in
>                 let doc = parse_document_entity config (from_file "test.xml") default_spec
>                 in print_structure (doc#root)
>         with
>                 exc -> print_endline (Pxp_types.string_of_exn exc)
> 
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------


