From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id p27KISqb009736 for ; Mon, 7 Mar 2011 21:18:28 +0100 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApkEAM7LdE3AbSoIe2dsb2JhbACEK6IoFQEBFiIEIa1RkG8NgRqDRXYE X-IronPort-AV: E=Sophos;i="4.62,278,1297033200"; d="scan'208";a="77298435" Received: from einhorn.in-berlin.de ([192.109.42.8]) by mail3-smtp-sop.national.inria.fr with ESMTP/TLS/DHE-RSA-AES256-SHA; 07 Mar 2011 21:18:23 +0100 X-Envelope-From: oliver@first.in-berlin.de X-Envelope-To: Received: from first (e178006068.adsl.alicedsl.de [85.178.6.68]) (authenticated bits=0) by einhorn.in-berlin.de (8.13.6/8.13.6/Debian-1) with ESMTP id p27KILow027549 for ; Mon, 7 Mar 2011 21:18:21 +0100 Received: by first (Postfix, from userid 1000) id 6374A15408DE; Mon, 7 Mar 2011 21:18:21 +0100 (CET) Date: Mon, 7 Mar 2011 21:18:21 +0100 From: oliver To: caml-list@inria.fr Message-ID: <20110307201821.GA1639@siouxsie> References: <20110306225242.GA9087@siouxsie> <1299500875.30035.31.camel@thinkpad> <20110307125748.GA4977@siouxsie> <1299505237.30035.82.camel@thinkpad> <20110307144415.GA1600@siouxsie> <1299510858.30035.95.camel@thinkpad> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <1299510858.30035.95.camel@thinkpad> User-Agent: Mutt/1.5.20 (2009-06-14) X-Scanned-By: MIMEDefang_at_IN-Berlin_e.V. on 192.109.42.8 Subject: Re: [Caml-list] ocamlnet: Netheml: simple-dtd: how does this work? On Mon, Mar 07, 2011 at 04:14:18PM +0100, Gerd Stolpmann wrote: > Am Montag, den 07.03.2011, 15:44 +0100 schrieb oliver: > > On Mon, Mar 07, 2011 at 02:40:37PM +0100, Gerd Stolpmann wrote: > > > Am Montag, den 07.03.2011, 13:57 +0100 schrieb oliver: > > > > On Mon, Mar 07, 2011 at 01:27:55PM +0100, Gerd Stolpmann wrote: > > > > > Am Sonntag, den 06.03.2011, 23:52 +0100 schrieb oliver: > > > > > > Hello, > > > > > > > > > > > > tried around using the simple-dtd argument > > > > > > for Nethtme.parse. > > > > > > > > > > > > It changes the behaviour compared to > > > > > > the default behaviour, but I could not find out > > > > > > how this works. > > > > > > > > > > > > Someone here who can explain me this > > > > > > argument and describe, how it can be used? > > > > > > > > > > Maybe the HTML specification would be a good reference here: > > > > > http://www.w3.org/TR/1999/REC-html401-19991224. You will see there that > > > > > most HTML elements are either an inline element, a block element, or > > > > > both ("flow" element). The grammar of HTML is described in terms of > > > > > these classes. For instance, a P tag (paragraph) is a block element and > > > > > contains block elements whereas B (bold) is an inline element and > > > > > contains inline elements. From this follows that you cannot put a P > > > > > inside a B:

something

is illegal. > > > > > > > > > > The parser needs this information to resolve such input, i.e. do > > > > > something with bad HTML. As HTML allows tag minimization (many end tags > > > > > can be omitted), the parser can read this as:

something

> > > > > (and the in the input is ignored). > > > > > > > > > > If all start and all end tags are written out, changing the > > > > > simplified_dtd does not make any difference. > > > > > > > > > > There is no normative text that says how to read bad HTML. Because of > > > > > this, it is - to a large degree - an interpretation of HTML what you put > > > > > into simplified_dtd. > > > > > > > > > > > The description IMHO is not sufficient to explain > > > > > > this feature. > > > > > > > > > > I'd say your formal knowledge about HTML is insufficient. > > > > [...] > > > > > > > > If formal HTML spec is sufficient to know the behaviour of the module, > > > > there would no need to have the dtd-argument, which seems, follwoinjg your > > > > explanations, to change the behavior in a way that it does NOT follow > > > > the formal specifications. > > > > > > There is no standard regarding that (except that HTML is also SGML, and > > > there are some rules for that in SGML). You could also reject bad HTML. > > > But this has not become common practice (unlike for XML, for instance). > > > > > > So, it depends on your HTML documents how you want to fix bad HTML. > > > That's the reason why you can configure it. > > [...] > > > > But it's not mentioned how the dtd-Argument works. > > > > Does it change the behaviour only for those tags that > > are mentioned in the dtd-argument? > > What about the other args? Will they stay as before (default)? > > I think this is pretty clear: if you set the dtd arg, you pass a > completely new dtd in, overriding any default. [...] Thanks for the answer, that was what I wanted to know. > This is how optional > arguments work in Ocaml. No, this is how you have implemented it. An optional argument also could be used to just overwrite those entries of the default dtd that are passed in. Like in a record, where just a changed part of it needs to be mentioned and the rest stays, or in an association list, where newly added items will be found because they will be found first, which means only newly added items are changed. [...] > Btw, you could have easily answered that yourself by looking at the > source. I mean just for the case that you do not see the obvious. Yes, I could have looked into the sources, maybe next time I will do it, even I prefer docs that are clear enough to avoid the need to look into the sources of a library. Ciao, Oliver