* xmlm and names(paces)
@ 2008-02-06 20:44 Bünzli Daniel
2008-02-06 20:59 ` [Caml-list] " David Teller
2008-02-06 21:52 ` Alain Frisch
0 siblings, 2 replies; 8+ messages in thread
From: Bünzli Daniel @ 2008-02-06 20:44 UTC
To: caml-list List
Hello,
As I previously said on this list I'm adding better namespace support
to xmlm. Up to now xmlm just parsed qualified names into their prefix
and local part (prefix, local). Now I'd like to provide the client
with expanded names (uri, local).
Initially I planned to give the client the choice between getting
qualified names or expanded names. However the prefix of qualified
names is really meaningless (it can be alpha converted) and thus
cannot be used to recognize anything in a document. One of the aims of
xmlm is simplicity; as such, I think xmlm should only provide expanded
names.
However, maybe I'm missing something, so I'd like to ask the list whether
anyone thinks there is any use for clients to get qualified names. If
you do, please tell me.
Best,
Daniel
P.S. There is no distinction between qualified and expanded names if
you parse documents that have no prefixes and no default namespace
declarations.
P.P.S. Name expansion has a performance cost but if I support only
expanded names I can better reduce it.
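A minimal sketch of the expansion step discussed in this message, assuming an environment of in-scope prefix bindings is available. The names `qname`, `expanded`, `env` and `expand_element` are hypothetical and are not xmlm's API; the sketch only illustrates why two prefixes bound to the same URI yield the same expanded name.

```ocaml
(* Hypothetical types, for illustration only. *)
type qname = string * string      (* (prefix, local); "" means no prefix *)
type expanded = string * string   (* (uri, local); "" means no namespace *)

(* In-scope bindings, innermost first: prefix -> namespace URI.
   A default namespace, if any, is stored under the empty prefix. *)
type env = (string * string) list

(* Expand an element name. Returns None when a non-empty prefix is unbound.
   An unprefixed name with no default namespace has the empty namespace. *)
let expand_element (env : env) ((prefix, local) : qname) : expanded option =
  match List.assoc_opt prefix env with
  | Some uri -> Some (uri, local)
  | None -> if prefix = "" then Some ("", local) else None

let () =
  let xhtml = "http://www.w3.org/1999/xhtml" in
  (* Renaming the prefix leaves the expanded name unchanged. *)
  assert (expand_element [ ("h", xhtml) ] ("h", "p")
          = expand_element [ ("x", xhtml) ] ("x", "p"));
  (* No prefixes and no default namespace: both forms coincide. *)
  assert (expand_element [] ("", "p") = Some ("", "p"))
```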
* Re: [Caml-list] xmlm and names(paces)
From: David Teller @ 2008-02-06 20:59 UTC
To: Bünzli Daniel; +Cc: caml-list List
As far as I know, the only difference is when you try and produce
human-readable XML documents. In this case, there are often ad-hoc
conventions regarding which prefix maps to what namespace (e.g. for the
same namespace, xhtml: is more readable than, say, ns1:) -- which might
be useful for people writing, say, editors.
There might also be a difference when browsers are attempting to recover
broken xml, but that's probably not an issue here.
Cheers,
David
On Wed, 2008-02-06 at 21:44 +0100, Bünzli Daniel wrote:
> However maybe I'm missing something so I'd like to ask the list if
> someone think there is any use for clients to get qualified names ? If
> I you do please tell me.
--
David Teller
Security of Distributed Systems
http://www.univ-orleans.fr/lifo/Members/David.Teller
Angry researcher: French Universities need reforms, but the LRU act
brings liquidations.
* Re: [Caml-list] xmlm and names(paces)
From: Bünzli Daniel @ 2008-02-06 21:26 UTC
To: caml-list List
On 6 Feb 2008, at 21:59, David Teller wrote:
> As far as I know, the only difference is when you try and produce
> human-readable XML documents. In this case, there are often ad-hoc
> conventions regarding which prefix maps to what namespace (e.g. for the
> same namespace, xhtml: is more readable than, say, ns1:) -- which might
> be useful for people writing, say, editors.
As a side note, I'll add that xmlm's handling of namespaces is
going to keep attributes of the xmlns namespace instead of removing
them. This makes it possible to know which prefix bindings were made in
the document. Given how output is going to work, chaining xmlm's input
with output will give you the same prefixes that were present on input
(provided the same namespace is not bound to two different prefixes in a
given context, in which case there may be differences).
Best,
Daniel
* Re: [Caml-list] xmlm and names(paces)
From: Alain Frisch @ 2008-02-06 21:52 UTC
To: Bünzli Daniel; +Cc: caml-list List
Bünzli Daniel wrote:
> As I previously said on this list I'm adding better namespace support to
> xmlm. Up to now xmlm just parsed qualified names into their prefix and
> local part (prefix, local). Now I'd like to provide the client with
> expanded names (uri, local).
>
> Initially I planned to give the client choice between getting qualified
> names or expanded names. However the prefix of qualified names is really
> meaningless (it can be alpha converted) and thus cannot be used to
> recognize anything in a document. One of the aim of xmlm is simplicity,
> as such I think xmlm should only provide expanded names.
The problem with expanded names is that they make it quite tedious to
pattern-match on element/attribute names (URIs are long!). Of course, it
is a trivial exercise in Camlp4 to create a nice syntax for that.
Another option is to let the client provide a mapping from uri to fixed
prefixes. (PXP can do that kind of prefix normalization.)
It is also a good idea to be able to parse XML documents that conform
to the XML spec but not the XML Namespaces spec.
What about something like this:
type name = string * [`N of string * string|`U of string * string|`X]
The first component of name gives the full unparsed name from the XML
document. The second component gives the qname decomposition; it can be
either a known normalized prefix (relative to a dictionary provided by
the client) or an unknown URI. Or it can be an error (the document does
not conform to the XML Namespaces spec). If the client does not provide
any dictionary of known prefixes, there will be no `N node. If the
parser is run in a non-namespace mode, there will be only `X nodes.
examples:
("html:p", `N ("xhtml", "p"))
the prefix html refers to the known xhtml namespace
("foo:x", `U ("http://unknownnamespaceuri", "x"))
the prefix foo refers to an unknown uri
("x:y", `X)
the prefix x is not bound to any namespace
("x::z", `X)
name is ill-formed w.r.t. the XML Namespaces spec.
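The four examples above can be reproduced with a small sketch. Everything except the `name` type is an assumption made for illustration: `split_qname` uses a simplified well-formedness check (at most one colon, no empty part), `known` maps URIs to client-chosen normalized prefixes, `bindings` maps document prefixes to URIs, and an unprefixed name with no default namespace is reported as `U with an empty URI, a case the proposal does not spell out.

```ocaml
type name = string * [ `N of string * string | `U of string * string | `X ]

(* Split "prefix:local". Returns None for names that are ill-formed with
   respect to XML Namespaces (simplified check, see the lead-in). *)
let split_qname (s : string) : (string * string) option =
  match String.split_on_char ':' s with
  | [ local ] when local <> "" -> Some ("", local)
  | [ prefix; local ] when prefix <> "" && local <> "" -> Some (prefix, local)
  | _ -> None

(* known    : URI -> normalized prefix (dictionary provided by the client)
   bindings : document prefix -> URI (in-scope declarations) *)
let classify ~known ~bindings (raw : string) : name =
  let decomposition =
    match split_qname raw with
    | None -> `X
    | Some (prefix, local) -> (
        match List.assoc_opt prefix bindings with
        | None when prefix = "" -> `U ("", local)
        | None -> `X
        | Some uri -> (
            match List.assoc_opt uri known with
            | Some normalized -> `N (normalized, local)
            | None -> `U (uri, local)))
  in
  (raw, decomposition)

let () =
  let known = [ ("http://www.w3.org/1999/xhtml", "xhtml") ] in
  let bindings =
    [ ("html", "http://www.w3.org/1999/xhtml");
      ("foo", "http://unknownnamespaceuri") ]
  in
  assert (classify ~known ~bindings "html:p" = ("html:p", `N ("xhtml", "p")));
  assert (classify ~known ~bindings "x::z" = ("x::z", `X))
```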
Also, it is necessary to give the client a way to know the namespace
bindings in scope at any node. Some XML languages like XML-Schema need
this information. A possible way to do it is just to keep the xmlns
declarations as regular attributes.
As a minor alternative, in order to reduce the syntactic overhead, it is
possible to use a single string to encode the three possible cases. E.g.:
"p:xhtml" Known uri
"x::http://unknownnamespaceuri" Unknown uri
"" Ill-formed qname
-- Alain
* Re: [Caml-list] xmlm and names(paces)
From: Bünzli Daniel @ 2008-02-06 22:56 UTC
To: Alain Frisch; +Cc: caml-list List
On 6 Feb 2008, at 22:52, Alain Frisch wrote:
> The problem with expanded names is that it makes it quite tedious to
> pattern-match on element/attribute names (uri are long!).
Agreed.
> Another option is to let the client provide a mapping from uri to
> fixed prefixes. (PXP can do that kind of prefix normalization.)
In xmlm you can do that yourself on input, when your callbacks are
invoked, and do the reverse translation just before outputting start
tags. Of course this means more work for the client, but it makes the
basic interface simpler and it allows the client to use variants
instead of simply shorter prefixes.
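A minimal sketch of that client-side translation, assuming the parser hands the client expanded names as (uri, local) pairs. The variant type and the two functions are hypothetical client code, not part of xmlm.

```ocaml
let xhtml = "http://www.w3.org/1999/xhtml"

(* Client-chosen variants for the names it cares about, plus a catch-all. *)
type name = P | Div | Other of string * string

(* Applied on input, when a tag reaches the client. *)
let of_expanded : string * string -> name = function
  | uri, "p" when uri = xhtml -> P
  | uri, "div" when uri = xhtml -> Div
  | uri, local -> Other (uri, local)

(* The reverse translation, applied just before output. *)
let to_expanded : name -> string * string = function
  | P -> (xhtml, "p")
  | Div -> (xhtml, "div")
  | Other (uri, local) -> (uri, local)

let () =
  assert (of_expanded (xhtml, "p") = P);
  assert (to_expanded (of_expanded ("", "p")) = ("", "p"))
```

Pattern matching on `P` or `Div` is then as short as matching on a normalized prefix, and unrecognized names survive the round trip through `Other`.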
> It is also a good idea to be able to parse XML documents that conform
> to the XML spec but not the XML Namespaces spec.
But don't you automatically get that?
A document that has no xmlns namespace declarations and no prefixes,
if parsed according to the xmlns spec, will result in names with empty
namespace names.
The other problem I see is if there are external prefix declarations,
but for that, as I did for external entity references, I have a
callback that allows you to bind an undeclared prefix to a URI.
> What about something like that:
>
> type name = string * [`N of string * string|`U of string * string|`X]
[...]
Too heavyweight for my taste. With xmlm I try to give a reasonable
default for XML IO, not the full-blown complexity. So I think going
with expanded names only is ok; the client can transform them its own
way if it wishes (e.g. uri replacement).
> Also, it is necessary to give the client a way to know the namespace
> bindings in scope at any node. Some XML languages like XML-Schema
> need this information. A possible way to do it is just to keep the
> xmlns declarations as regular attributes.
I was planning on keeping the xmlns declarations, but I was unaware that
some languages actually *need* this information. What is it used for in
XML Schema?
Thanks for the comments,
Daniel
* Re: [Caml-list] xmlm and names(paces)
From: Bünzli Daniel @ 2008-02-06 23:51 UTC
To: Alain Frisch; +Cc: caml-list List
On 6 Feb 2008, at 23:56, Bünzli Daniel wrote:
> Of course this means more work for the client, but it makes the
> basic interface simpler and it allows the client
> to use variants instead of simply shorter prefixes.
Now that I think of it, instead of what I have now:
> type name = string * string
> type attribute = name * string
> type tag = name * attribute list
>
> val input : ?enc:encoding option -> ?strip:bool ->
> ?ns: (string -> string option) ->
> ?entity: (string -> string option) ->
> ?prolog: (dtd -> unit) ->
> ?prune:(tag -> 'a -> bool) ->
> ?s:(tag -> 'a -> 'a) ->
> ?e:(tag -> 'a -> 'a) ->
> ?d:(string -> 'a -> 'a) -> 'a -> input ->
> [ `Value of 'a | `Error of (int * int) * error ]
(~s is for start tags, ~e is for end tags (the full start tag is given
again), ~d is for data)
Why not have a callback ~name, which is given the expanded name
(uri, local) and lets the client do whatever it wishes, and
a callback ~att to build attributes:
> val input : ?enc:encoding option -> ?strip:bool ->
> ?ns: (string -> string option) ->
> ?entity: (string -> string option) ->
> name: (string -> string -> 'n) ->
> att: ('n -> string -> 'att) ->
> ?prolog: (dtd -> unit) ->
> ?prune:('n * 'att list -> 'a -> bool) ->
> ?s:('n * 'att list -> 'a -> 'a) ->
> ?e:('n * 'att list -> 'a -> 'a) ->
> ?d:(string -> 'a -> 'a) -> 'a -> input ->
> [ `Value of 'a | `Error of (int * int) * error ]
This allows you to give precise variant cases for the things you
process and have a catch-all case for what you are not interested in.
Output would be polymorphised accordingly and the client would provide
inverses of ~name ('n -> string * string) and ~att ('att -> 'n *
string).
On the other hand the work performed by ~name and ~att can be done by
the client in ~s or ~e. The only thing the latter solution brings is
to avoid folding over the tag type if you want to transform it. So
finally I don't think it is worth it (especially because the
polymorphised output feels cumbersome).
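To illustrate the point that the conversion can live in ~s, here is a sketch that folds over an already-parsed list of events. The `event` type and `fold` are hypothetical stand-ins for the input function (they do no parsing); only the types `name`, `attribute` and `tag` follow the signature quoted above.

```ocaml
type name = string * string          (* expanded name: (uri, local) *)
type attribute = name * string
type tag = name * attribute list

(* Stand-in for the parser: start tag, end tag, character data. *)
type event = S of tag | E of tag | D of string

(* Stand-in for the callback-driven input function. *)
let fold ~s ~e ~d acc events =
  List.fold_left
    (fun acc ev ->
      match ev with S t -> s t acc | E t -> e t acc | D str -> d str acc)
    acc events

let xhtml = "http://www.w3.org/1999/xhtml"

(* Client-side variants, built inside ~s rather than by a ~name callback. *)
type kind = Para | Unknown

let kind_of_name ((uri, local) : name) : kind =
  if uri = xhtml && local = "p" then Para else Unknown

(* Count XHTML paragraphs; names in other namespaces fall in the catch-all. *)
let count_paras (events : event list) : int =
  fold
    ~s:(fun (n, _) acc ->
      match kind_of_name n with Para -> acc + 1 | Unknown -> acc)
    ~e:(fun _ acc -> acc)
    ~d:(fun _ acc -> acc)
    0 events
```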
Any comment ?
Daniel
* Re: [Caml-list] xmlm and names(paces)
From: Alain Frisch @ 2008-02-07 22:03 UTC
To: Bünzli Daniel; +Cc: caml-list
Bünzli Daniel wrote:
> On 6 Feb 2008, at 22:52, Alain Frisch wrote:
>> It is also a good idea to be able to parse XML documents that conform
>> to the XML spec but not the XML Namespaces spec.
>
> But don't you automatically get that ?
>
> A document that has no xmlns namespace declarations and no prefixes if
> parsed according to the xmlns spec will result in names with empty
> namespace names.
The following documents are well-formed w.r.t. the XML spec:
<a::::x/> (syntactically invalid QName w.r.t. XML Namespaces)
<a:x/> (unbound prefix)
>> type name = string * [`N of string * string|`U of string * string|`X]
> [...]
>
> Too heavy weight for my taste.
Actually, I agree. What about doing nothing about namespaces? For the
event-based API at least, and maybe for the tree-based one as well, it
doesn't seem too bad to let the client manage it (maybe you can simply
provide a small module to help manage namespace dictionaries and
resolution).
-- Alain
* xmlm and names(paces)
From: oleg @ 2008-02-07 8:13 UTC
To: caml-list
Buenzli Daniel wrote:
> As I previously said on this list I'm adding better namespace support to
> xmlm. Up to now xmlm just parsed qualified names into their prefix and
> local part (prefix, local). Now I'd like to provide the client with
> expanded names (uri, local).
>
> Initially I planned to give the client choice between getting qualified
> names or expanded names. However the prefix of qualified names is really
> meaningless (it can be alpha converted) and thus cannot be used to
> recognize anything in a document. One of the aim of xmlm is simplicity,
> as such I think xmlm should only provide expanded names.
It should be mentioned that the prefixes of qualified names cannot
just be alpha-converted. It is quite common to see the following,
quoted from http://www.w3.org/TR/xmlschema-0/
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
...
<xsd:element name="comment" type="xsd:string"/>
One can plainly see that the prefix, xsd, appears inside a quoted
string! If one wishes to rename the prefix xsd to just 's', one has
to look inside quoted strings. (Of course, not every occurrence of xsd
inside a quoted string is the prefix: a quoted string, the content of an
attribute, may just as well be an opaque quoted string.)
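A sketch of what a client has to do for such attribute values, assuming the parser keeps xmlns declarations as regular attributes. Two assumptions are made about how they are reported: declarations carry the reserved namespace name http://www.w3.org/2000/xmlns/, and a default declaration has the local name "xmlns". The function names are hypothetical, and unprefixed values are deliberately left unresolved.

```ocaml
let xmlns_ns = "http://www.w3.org/2000/xmlns/"

type name = string * string          (* expanded name: (uri, local) *)
type attribute = name * string

(* Collect the prefix bindings declared on one element: prefix -> URI.
   A default namespace declaration is stored under the empty prefix. *)
let bindings_of_atts (atts : attribute list) : (string * string) list =
  List.filter_map
    (fun ((uri, local), value) ->
      if uri = xmlns_ns then
        Some ((if local = "xmlns" then "" else local), value)
      else None)
    atts

(* Resolve a prefixed QName found in attribute content, e.g. "xsd:string".
   Returns None for unprefixed values and for unbound prefixes. *)
let resolve_qname_value env (value : string) : name option =
  match String.split_on_char ':' value with
  | [ prefix; local ] when prefix <> "" && local <> "" ->
      Option.map (fun uri -> (uri, local)) (List.assoc_opt prefix env)
  | _ -> None

let () =
  let xsd = "http://www.w3.org/2001/XMLSchema" in
  let env = bindings_of_atts [ ((xmlns_ns, "xsd"), xsd) ] in
  assert (resolve_qname_value env "xsd:string" = Some (xsd, "string"));
  (* After renaming the prefix to "s", the untouched value no longer resolves. *)
  assert (resolve_qname_value [ ("s", xsd) ] "xsd:string" = None)
```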
One may really wonder what kind of people wrote all those voluminous
XML recommendations.
So, ideally one may wish to keep the original prefix (in addition to
its corresponding URL). It is also reasonable for a user to specify a
`shortcut'. Unlike the prefix, which is chosen by the author of the
document, a shortcut is chosen by the person who invokes a parser. In
the SSAX parser, the user specifies the association of URI with
shortcuts. The parser, having resolved the QName prefix to a URI, maps
that URI to the user-specified shortcut, if present. The shortcuts are
extensively discussed in
http://okmij.org/ftp/Scheme/SXML.html#Namespaces
Incidentally, some of the design decisions of SSAX (despite being
produced by an enemy) might be pertinent to this discussion. SSAX is
actually a SAX parser, or a big macro that builds a parser out of
user-provided callbacks and reasonable defaults. One can use SSAX to
parse XML on the fly or to convert XML to anything one chooses. There
is also an instantiation of SSAX with reasonable callbacks that make
SSAX a DOM parser, converting XML into one particular output format,
SXML. Experience shows that this particular instantiation satisfies
most of the users. Still I have come across several users who needed
the full SSAX (e.g., for streaming conversion of XML into something
else).
* Re: [Caml-list] xmlm and names(paces)
From: Bünzli Daniel @ 2008-02-07 8:59 UTC
To: caml-list List
On 7 Feb 2008, at 09:13, oleg@okmij.org wrote:
> It should be mentioned that the prefixes of qualified names cannot
> just be alpha-converted. It is quite common to see the following,
> quoted from http://www.w3.org/TR/xmlschema-0/
>
> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
> ...
> <xsd:element name="comment" type="xsd:string"/>
Argh! I now understand Alain's comment about the need to keep that
information. This is a complete misuse of the namespace recommendation,
which says nothing about binding prefixes in attribute and character
data. The W3C is really hopeless.
> One may really wonder what kind of people wrote all those voluminous
> XML recommendations.
You tell me.
> So, ideally one may wish to keep the original prefix (in addition to
> its corresponding URL). It is also reasonable for a user to specify a
> `shortcut'. Unlike the prefix, which is chosen by the author of the
> document, a shortcut is chosen by the person who invokes a parser.
As mentioned in my previous email, with xmlm you can do that by
yourself since all the info is there and you have full control over the
parsing result. However, for the aforementioned case this may mean a
lot of work. On the other hand, XML Schema seems to be seen as a broken
technology (even the XML spec editor says so, iirc). So the question
is: is it worth complicating the interface to facilitate the parsing
of this obviously broken and marginal (is it?) case?
> Incidentally, some of the design decisions of SSAX (despite being
> produced by an enemy) might be pertinent to this discussion.
Thanks for the link, I will have a look at it (functional programming
languages are no enemies).
Best,
Daniel