* Serialisation of PXP DTDs
From: Dario Teixeira @ 2008-10-22 20:11 UTC
To: caml-list

Hi,

I am using PXP to parse the MathML2 DTD. This is a fairly large DTD, which
even on a fast machine takes several seconds to parse. I am therefore
looking at ways to serialise a parsed DTD in such a way that it can be
reused by other processes.

Does PXP already offer primitives for (un)serialising DTDs? (I couldn't
find any.) Note that using Marshal is out of the question, because DTDs
are stored as objects, and we all know that objects cannot be serialised
across process boundaries. But are there alternative solutions I'm
overlooking?

On a more general but related note, I think we should start an OSP
discussion about standardising serialisation methods. The rationale
should be obvious. Myself, I am partial to Sexplib, since it is
reasonably fast, very simple to use, human-readable, and future-proof.
I reckon that bin-prot could also be considered, as long as at some
point the binary format is "set in stone", or at least deserialisers
are always backwards compatible. Any other opinions?

Thanks for your time!

Cheers,
Dario Teixeira

* Re: Serialisation of PXP DTDs
From: Sylvain Le Gall @ 2008-10-22 23:05 UTC
To: caml-list

On 22-10-2008, Dario Teixeira <darioteixeira@yahoo.com> wrote:
> On a more general but related note, I think we should start an OSP
> discussion about standardising serialisation methods. The rationale
> should be obvious. Myself, I am partial to Sexplib, since it is
> reasonably fast, very simple to use, human-readable, and future-proof.
> I reckon that bin-prot could also be considered, as long as at some
> point the binary format is "set in stone", or at least deserialisers
> are always backwards compatible. Any other opinions?

You already seem to have some ideas. The best thing to do before any
discussion on this topic is to implement and benchmark the different
solutions, at least partially. Sexplib, bin-prot, JSON and Marshal need
to be compared on a real example.

You seem to need this for a particular task. Could you implement the
different approaches on your particular example and report benchmarks,
ease of use and ease of implementation?

Without these numbers, I think an OSP discussion is pointless. (But with
these numbers, at least for a small example, and if your use case is not
trivial, I think an OSP discussion would be very interesting.)

Regards,
Sylvain Le Gall

* Re: [Caml-list] Re: Serialisation of PXP DTDs
From: Dario Teixeira @ 2008-10-23 15:34 UTC
To: caml-list, Sylvain Le Gall

Hi,

First, concerning the more general problem of serialisation: one often
comes across a situation where a library encodes a very complex and
opaque value of type Foobar.t, but offers no dedicated (de)serialisation
functions. The assumption is of course that users can just use Marshal
and be done with it.

There are however situations for which Marshal is badly suited. Long-term
storage is one, and portability is another. Moreover, if the value is an
object and you wish to carry it across process boundaries, using Marshal
is just not possible.

Take also into consideration that the Sexplib syntax extension makes it
trivial to add (de)serialisers to a data structure. The litmus test for
considering this task should be "is there any reasonable situation where
users of my Foobar library would like to serialise Foobar.t values in a
portable and long-term manner?". If the answer is yes, I reckon it
wouldn't be too much to ask that the library add support for Sexplib.

Note that in the paragraph above, "Sexplib" may be substituted by another
serialisation mechanism. And this gets us to the question of the
performance numbers you speak of. In fact, besides performance I would
like to bring other variables to the table:

 - ease of use
 - "future-proofness"
 - portability
 - human-readability

Sexplib scores very well on ease of use, future-proofness, and
portability, and reasonably well on performance and human-readability.
My guess is that bin-prot has better performance but worse portability
and future-proofness, and nil human-readability. Marshal gets top scores
in performance and ease of use, but fails miserably in future-proofness,
human-readability, and portability.

As for my particular problem with PXP DTDs, I will look at writing a
(de)serialiser by hand. According to Gerd it shouldn't be too much
trouble.

Best regards,
Dario Teixeira

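For an opaque value such as the Foobar.t discussed above, a hand-written S-expression (de)serialiser needs little code. The following is only a minimal sketch using the standard library. The `sexp` type, the `element_decl` record and every function name are hypothetical stand-ins, not the API of Sexplib or PXP, and atoms are assumed to be non-empty and free of spaces and parentheses.

```ocaml
(* Minimal S-expression type: an atom or a list of S-expressions. *)
type sexp = Atom of string | List of sexp list

(* Print an S-expression; atoms are assumed to need no quoting. *)
let rec to_string = function
  | Atom a -> a
  | List l -> "(" ^ String.concat " " (List.map to_string l) ^ ")"

(* Skip spaces starting at position [i]. *)
let rec skip s i =
  if i < String.length s && s.[i] = ' ' then skip s (i + 1) else i

(* Parse one S-expression at position [i]; return it with the next position. *)
let rec parse s i =
  let n = String.length s in
  let i = skip s i in
  if i >= n then failwith "unexpected end of input"
  else if s.[i] = '(' then parse_list s (i + 1) []
  else if s.[i] = ')' then failwith "unexpected ')'"
  else begin
    let j = ref i in
    while !j < n && s.[!j] <> ' ' && s.[!j] <> '(' && s.[!j] <> ')' do
      incr j
    done;
    (Atom (String.sub s i (!j - i)), !j)
  end

(* Parse list elements until the closing parenthesis. *)
and parse_list s i acc =
  let i = skip s i in
  if i >= String.length s then failwith "missing ')'"
  else if s.[i] = ')' then (List (List.rev acc), i + 1)
  else
    let (x, next) = parse s i in
    parse_list s next (x :: acc)

let of_string s = fst (parse s 0)

(* Hypothetical stand-in for a library value such as a DTD element declaration. *)
type element_decl = { name : string; children : string list; mixed : bool }

let sexp_of_decl d =
  List [ Atom d.name;
         List (List.map (fun c -> Atom c) d.children);
         Atom (string_of_bool d.mixed) ]

let decl_of_sexp = function
  | List [ Atom name; List cs; Atom mixed ] ->
      let atom = function Atom c -> c | List _ -> failwith "expected atom" in
      { name; children = List.map atom cs; mixed = bool_of_string mixed }
  | _ -> failwith "malformed element_decl"
```

A value is written with `to_string (sexp_of_decl d)` and read back with `decl_of_sexp (of_string text)`. The textual form stays inspectable by hand, which is the human-readability criterion listed above.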
* Re: [Caml-list] Re: Serialisation of PXP DTDs
From: Stefano Zacchiroli @ 2008-10-23 16:37 UTC
To: caml-list

On Thu, Oct 23, 2008 at 08:34:21AM -0700, Dario Teixeira wrote:
> - ease of use
> - "future-proofness"
> - portability
> - human-readability
>
> Sexplib scores very good on ease of use, future-proofness, and
                                           ^^^^^^^^^^^^^^^^
Does it?

I mean, as long as types are as simple as pairs we will probably write
down the very same S-expression, but for more complex types you end up
having to choose how to encode them in S-expressions. Such design choices
may need to change in the future as more types are supported. I fail to
see why the future-proofness of such choices should be better than that
of bin-prot.

Yes, in case of changes you can imagine writing converters from the old
format to the new one, but you can do that for binary representations
too. In fact, doing that in OCaml using bitmatch would lead to much the
same code as for S-expressions, I believe.

Besides this comment, thanks for the nice analysis.

-- 
Stefano Zacchiroli -*- PhD in Computer Science \ PostDoc @ Univ. Paris 7
zack@{upsilon.cc,pps.jussieu.fr,debian.org} -<>- http://upsilon.cc/zack/
Dietro un grande uomo c'è sempre /oo\ All one has to do is hit the right
uno zaino   -- A.Bergonzoni      \__/ keys at the right time  -- J.S.Bach

* Re: [Caml-list] Re: Serialisation of PXP DTDs
From: Markus Mottl @ 2008-10-23 16:53 UTC
To: caml-list

On Thu, Oct 23, 2008 at 12:37 PM, Stefano Zacchiroli <zack@upsilon.cc> wrote:
> I mean, as long as types are as simple as pairs we will probably
> write down the very same S-expression, but for more complex types you
> end up having to choose how to encode them in S-expressions. Such
> design choices may need to change in the future as more types are
> supported. I fail to see why the future-proofness of such choices
> should be better than that of bin-prot.

Both the S-expression converters and the binary protocol already
support all extensionally defined datatypes in OCaml, and there are no
plans to change their representation. I think it is fair to say that
both of them are reasonably future-safe.

Regards,
Markus

-- 
Markus Mottl http://www.ocaml.info markus.mottl@gmail.com

* Re: [Caml-list] Re: Serialisation of PXP DTDs
From: Dario Teixeira @ 2008-10-23 19:26 UTC
To: caml-list, Stefano Zacchiroli

> I mean, as long as types are as simple as pairs we will
> probably write down the very same S-expression, but for more
> complex types you end up having to choose how to encode them
> in S-expressions. Such design choices may need to change
> in the future as more types are supported. I fail to see
> why the future-proofness of such choices
> should be better than that of bin-prot.

Hi,

Well, there are several kinds of "future-proofness". If in the far future
I were faced with the task of reverse-engineering and deserialising a
structure about whose contents I only had a rough idea, then a
human-readable text format like that of S-expressions would simplify
things enormously. In a more down-to-earth scenario, bear in mind that
S-expressions offer forward compatibility as long as you are only adding
to a structure.

For example, suppose I have a type foobar_t with two constructors:

    type foobar_t = One | Two

If later on I add a third constructor "Three" to this type, the
deserialiser for the new version can still read S-expressions written
with the serialiser for the old version.

Cheers,
Dario Teixeira

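The foobar_t example can be spelled out as a small sketch. It assumes that each constant constructor is written as an atom carrying the constructor's name, and it uses plain strings in place of a real S-expression library. The module names `V1` and `V2` and the `to_sexp`/`of_sexp` functions are hypothetical, chosen only to keep the two versions side by side.

```ocaml
(* Version 1 of the type with its (de)serialiser. *)
module V1 = struct
  type foobar_t = One | Two

  let to_sexp = function One -> "One" | Two -> "Two"

  let of_sexp = function
    | "One" -> Some One
    | "Two" -> Some Two
    | _ -> None
end

(* Version 2 only adds a constructor, so every atom V1 writes is still valid. *)
module V2 = struct
  type foobar_t = One | Two | Three

  let to_sexp = function One -> "One" | Two -> "Two" | Three -> "Three"

  let of_sexp = function
    | "One" -> Some One
    | "Two" -> Some Two
    | "Three" -> Some Three
    | _ -> None
end
```

Under these assumptions `V2.of_sexp (V1.to_sexp V1.Two)` succeeds, while the old reader rejects new data: `V1.of_sexp (V2.to_sexp V2.Three)` yields `None`. Adding a constructor therefore lets new readers accept old data, but not the other way round.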
* Re: [Caml-list] Re: Serialisation of PXP DTDs
From: Mauricio Fernandez @ 2008-10-23 21:05 UTC
To: caml-list

On Thu, Oct 23, 2008 at 12:26:54PM -0700, Dario Teixeira wrote:
> Well, there's several types of "future-proofness". If in the far-future I
> was faced with the task of reverse-engineering and deserialising a structure
> about whose contents I only had a rough idea, then a human-readable
> text-format like that of S-expressions would simplify things enormously. On
> a more down-to-earth scenario, bear in mind that S-expressions offer
> forward-compatibility as long as you are only adding to a structure.
>
> For example, suppose I have a type foobar_t with two
> constructors:
>
> type foobar_t = One | Two
>
> If later on I add a third constructor "Three" to this type,
> the deserialiser for the new version can still read S-expressions
> written with the serialiser for the old version.

I have been working for a while on a self-describing, compact, extensible
binary protocol, along with an OCaml implementation which I intend to
release before too long.

It differs from sexplib and bin-prot in two main ways:
 * the data model is deliberately more limited, as the format is meant to
   be de/encodable in multiple languages.
 * it is extensible at several levels, achieving both forward and backward
   compatibility across changes in the data type.

You can think of it as an extensible Protocol Buffers[1] with a richer
data model (albeit not in 1:1 accordance with OCaml's, for the
above-mentioned reason).

In the criteria you gave in another message, namely
 (1) ease of use
 (2) "future-proofness"
 (3) portability
 (4) human-readability,

it does fairly well at the first three, especially at (2) and (3), which
were poorly supported by existing solutions (I looked into bin-prot,
sexplib, Google's Protocol Buffers, Thrift and XDR; I also referred to
IIOP and ITU-T X.690 DER during the design). Being a binary format, it
obviously doesn't do that well at (4), but it is possible to get a
human-readable dump of the binary data even in the absence of the
interface definition, making reverse-engineering no harder than with
sexplib (and arguably easier in some ways).

For example, here's a bogus message definition to illustrate (2) and (4).
This protocol definition is fed to the compiler, which generates the
OCaml type definitions, as well as the encoders/decoders and
pretty-printers (as you can see, the specification uses a mix of OCaml,
Haskell and C++ syntax, but it's pretty clear IMO):

    type sum_type 'a 'b 'c = A 'a | B 'b | C 'c

    message complex_rtt =
        A {
          a1 : [(int * [|bool|])];
          a2 : [ sum_type<int, string, long> ]
        }
      | B {
          b1 : bool;
          b2 : (string * [int])
        }

The protocol is extensible in the sense that you can add new constructors
to a sum or message type, add new elements to a tuple, and replace any
primitive type by a sum type that includes the original type. For
instance, if at some point in time we find that the b1 field should have
a different type, we can do

    type bool_or_something 'a = Orig unboxed_bool | New_constructor 'a

and then

    ...
    | B { b1 : bool_or_something<some_type>; ... }

This, along with a way to specify default values, allows both forward and
backward compatibility.

The compiler generates a pretty-printer for these structures, useful for
debugging. Here's a message generated randomly:

    {
      Complex_rtt.a1 =
        [ ((-5378), [| false; false; false; true; true |]);
          (3942717140522000971, [| false; true; true; true; false |]);
          ((-6535386320450295), [| false |]); ((-238860767206), [| |]);
          (1810196202, [| false; false; true; true |]) ];
      Complex_rtt.a2 =
        [ Sum_type.A (-13830); Sum_type.A 369334576; Sum_type.A 83;
          Sum_type.A (-3746796577167465774); Sum_type.A (-1602586945) ] }

Now, this is the information decoded in the absence of the above
definitions (in other words, what you'd have to work with if you were
reverse-engineering the protocol):

    T0 {
      T0 [
        T0 { Vint_t0 (-5378);
             T0 [ Vint_t0 0; Vint_t0 0; Vint_t0 0; Vint_t0 (-1);
                  Vint_t0 (-1)]};
        T0 { Vint_t0 3942717140522000971;
             T0 [ Vint_t0 0; Vint_t0 (-1); Vint_t0 (-1); Vint_t0 (-1);
                  Vint_t0 0]};
        T0 { Vint_t0 (-6535386320450295); T0 [ Vint_t0 0]};
        T0 { Vint_t0 (-238860767206); T0 [ ]};
        T0 { Vint_t0 1810196202;
             T0 [ Vint_t0 0; Vint_t0 0; Vint_t0 (-1); Vint_t0 (-1)]}];
      T0 [ T0 { Vint_t0 (-13830)}; T0 { Vint_t0 369334576}; T0 { Vint_t0 83};
           T0 { Vint_t0 (-3746796577167465774)}; T0 { Vint_t0 (-1602586945)}]}

(I'm still changing some details so it might look better than this
shortly.)

It's not a drop-in solution like sexplib's "with sexp", by design (since
it is meant to allow interoperability between different languages), but
it's still fairly easy to use.

If you're interested in this, tell me and I'll let you know when it's
ready for serious usage.

[1] http://code.google.com/p/protobuf/

-- 
Mauricio Fernandez - http://eigenclass.org

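The idea of replacing a primitive type by a sum type that includes the original can be illustrated with a small sketch. This is not the actual wire format or API of the protocol described above, which the message does not specify. The `value` type, the tag numbers and the function names are all assumptions made for illustration, loosely modelled on the schema-free dump shown in the message, where booleans appear as the integers 0 and -1.

```ocaml
(* Hypothetical schema-free view of decoded data. *)
type value =
  | Vint of int                 (* primitive; booleans show up as 0 or -1 *)
  | Tagged of int * value list  (* constructor tag followed by its fields *)

(* New-version type: the original bool, or a newly added alternative. *)
type 'a bool_or_something = Orig of bool | New_constructor of 'a

(* Decode field b1. A bare primitive, as an old writer would have produced,
   is promoted to [Orig]; tagged forms come from writers using the new type.
   Tags 0 and 1 are assumed here, not taken from the real protocol. *)
let decode_b1 decode_new = function
  | Vint n -> Orig (n <> 0)
  | Tagged (0, [ Vint n ]) -> Orig (n <> 0)
  | Tagged (1, [ v ]) -> New_constructor (decode_new v)
  | _ -> failwith "b1: unexpected shape"

(* Human-readable dump that needs no schema, in the spirit of the T0 output. *)
let rec dump = function
  | Vint n -> Printf.sprintf "Vint %d" n
  | Tagged (t, vs) ->
      Printf.sprintf "T%d { %s }" t (String.concat "; " (List.map dump vs))
```

Because every value carries enough structure to be walked without a schema, a reader built for the new type can still interpret data produced for the old one, and `dump` can print either.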
* Re: [Caml-list] Re: Serialisation of PXP DTDs
From: Gerd Stolpmann @ 2008-10-23 22:18 UTC
To: Mauricio Fernandez; +Cc: caml-list

On Thursday, 23.10.2008 at 23:05 +0200, Mauricio Fernandez wrote:
> I have been working for a while on a self-describing, compact, extensible
> binary protocol, along with an OCaml implementation which I intend to
> release before too long.
>
> It differs from sexplib and bin-prot in two main ways:
> * the data model is deliberately more limited, as the format is meant to be
>   de/encodable in multiple languages.
> * it is extensible at several levels, achieving both forward and backward
>   compatibility across changes in the data type
>
> You can think of it as an extensible Protocol Buffers[1] with a richer data
> model (albeit not in 1:1 accordance with OCaml's for the above mentioned
> reason).

Have you looked at ICEP (see zeroc.com)? It has bindings for many
languages, even for OCaml (http://oss.wink.com/hydro/).

It is, however, not self-describing. Anyway, you may find ideas for
portability there.

Gerd

-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------

* Re: [Caml-list] Re: Serialisation of PXP DTDs
From: Mauricio Fernandez @ 2008-10-23 22:50 UTC
To: Gerd Stolpmann

On Fri, Oct 24, 2008 at 12:18:50AM +0200, Gerd Stolpmann wrote:
> Have you looked at ICEP (see zeroc.com)? It has bindings for many
> languages, even for Ocaml (http://oss.wink.com/hydro/).
>
> It is, however, not self-describing. Anyway, you may find there ideas
> for portability.

I've just taken a quick look at the manual (in particular, the definition
of the Slice language and the Data Encoding section of the Ice protocol).
Even though it solves a different problem, it looks very interesting,
both as a source of inspiration, as you say, and for its intended use as
a middleware technology.

Thanks a lot for the reference.

Regards,
-- 
Mauricio Fernandez - http://eigenclass.org

* Re: [Caml-list] Re: Serialisation of PXP DTDs
From: Dario Teixeira @ 2008-10-23 22:21 UTC
To: caml-list, Mauricio Fernandez

Hi,

> This protocol definition is fed to the compiler, which
> generates the OCaml type definitions, as well as the
> encoders/decoders and pretty-printers (as you can see,
> the specification uses a mix of OCaml, Haskell and C++
> syntax, but it's pretty clear IMO)

Basically the XDR approach, but with a syntax inspired by more modern,
functional languages, right?

> It's not a drop-in solution like sexplib's "with sexp",
> by design (since it is meant to allow interoperability between
> different languages), but it's still fairly easy to use.

Personally, I think that a sexplib-like syntax extension is the killer
feature for serialisation libraries, and the reason why I was immediately
swayed by sexplib. However, writing a sexplib-like syntax extension for
your serialisation library would entail solving the reverse of the
problem now handled by your compiler. This might not always be possible,
because some features of OCaml's type system might not map neatly into
your format.

Nevertheless, the sheer convenience of the syntax extension approach
makes it worthwhile, even if on occasion the preprocessor were to produce
an error message stating that it could not convert a certain structure.
For reference purposes, you could even have the syntax extension output
the inferred structure definition, in your language format, to an
external file! (I know this would be a very complex project, but it does
illustrate the power of Camlp4.)

Anyway, what you described looks very interesting. Keep us posted!

Cheers,
Dario Teixeira

* Re: [Caml-list] Re: Serialisation of PXP DTDs
From: Mauricio Fernandez @ 2008-10-23 23:36 UTC
To: Dario Teixeira; +Cc: caml-list

On Thu, Oct 23, 2008 at 03:21:01PM -0700, Dario Teixeira wrote:
> Basically the XDR approach, but with a syntax inspired
> by more modern, functional languages, right?

Yes, something like XDR (and Google's Protocol Buffers, and Facebook's
Thrift, and, and... :) with richer data types (algebraic and polymorphic
types, etc.) and a self-describing encoding that allows you to extend the
type definitions while ensuring interoperability.

> Personally, I think that a sexplib-like syntax extension is the killer
> feature for serialisation libraries, and the reason why I was immediately
> swayed by sexplib. However, writing a sexplib-like syntax extension for
> your serialisation library would entail solving the reverse problem now
> handled by your compiler. This might not always be possible because some
> features of Ocaml's type system might not map neatly into your format.
> Nevertheless, the sheer convenience of the syntax extension approach makes
> it worth while having, even if on occasion the preprocessor were to produce
> an error message stating that it could not convert a certain structure. For
> reference purposes, you could even have the syntax extension output to an
> external file the inferred structure definition in your language format! (I
> know this would be a very complex project, but it does illustrate the power
> of Camlp4).

In fact, the wire format easily supports all of OCaml's type system
(bin-prot does, after all, and this is essentially a self-describing,
extensible bin-prot). I introduced limitations in the data schema to
ensure extensibility and portability. Any OCaml type can be encoded
easily, but not all possible changes to an OCaml type are safe with
regard to protocol compatibility. Using a separate language makes it
easier either to prevent such errors altogether (by making them
impossible to express) or to catch them.

Leaving unsafe protocol modifications aside (which just means that you
have to be careful when you change a type), the approach you suggest
(supporting only a subset of OCaml's type system in a "with
protocol"-style syntax extension) seems very doable. However, sexplib
seems to be the safest option for convenient, more or less future-proof
serialization in OCaml, for the time being.

Cheers,
-- 
Mauricio Fernandez - http://eigenclass.org

* Re: [Caml-list] Re: Serialisation of PXP DTDs
From: Mikkel Fahnøe Jørgensen @ 2008-10-24 9:11 UTC
To: Dario Teixeira, caml-list

I guess this discussion is overkill for the problem at hand, but
speaking of binary extensible protocols, have you looked at ASN.1? It
is an abstraction over any number of encodings. At least one binary
encoding has extension bits to allow future growth of object
collections and similar.

Mikkel

* Re: [Caml-list] Re: Serialisation of PXP DTDs 2008-10-24 9:11 ` Mikkel Fahnøe Jørgensen @ 2008-10-24 14:03 ` Markus Mottl 2008-10-25 18:58 ` Mauricio Fernandez 2008-10-24 21:39 ` Mauricio Fernandez 1 sibling, 1 reply; 24+ messages in thread From: Markus Mottl @ 2008-10-24 14:03 UTC (permalink / raw) To: Mikkel Fahnøe Jørgensen; +Cc: Dario Teixeira, caml-list On Fri, Oct 24, 2008 at 5:11 AM, Mikkel Fahnøe Jørgensen <mikkel@dvide.com> wrote: > I guess this discussion is an overkill for the problem at hand, but > speaking of binary extensible protocols, have you looked at ASN.1? It > is an abstraction over any number of encodings. At least one binary > encoding has extension bits to allow future growth of object > collections and similar. Note that it is perfectly safe to grow sum types with bin-prot. It was designed that way intentionally. It's just not safe to reorder or remove elements. Nobody needs to reorder elements, because it doesn't make any operational difference in the program. Backward compatibility of protocols you define necessarily requires the presence of old constructors in sum types anyway so you may not want to remove those in any case. There is hardly any harm from the protocol perspective in leaving old constructors in there. Note, too, that polymorphic variants even allow reordering with bin-prot. They are also generally safer, because they are always encoded as 32bit integers, thus making it extremely unlikely to get accidental "good" matches when reading incompatible protocols (at the expense of space and a tiny bit of performance). Except for human-readability, I think bin-prot should scale very well on the other requirements of serialization protocols once it has been ported to architectures with unusual endianness (almost all machines are little endian nowadays so hardly anybody on this list should be affected). 
Regards, Markus -- Markus Mottl http://www.ocaml.info markus.mottl@gmail.com ^ permalink raw reply [flat|nested] 24+ messages in thread
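The point about growing versus reordering sum types can be illustrated with a toy model. The sketch below is not bin-prot's actual encoding; it only assumes that a constructor is written as its position in the type definition. All type and function names are hypothetical.

```ocaml
(* Toy wire format: a constructor is written as its position in the
   type definition, followed by its argument (if any). Illustration
   only, not bin-prot's real encoding. *)

(* Version 1 of a hypothetical protocol. *)
type msg_v1 = Ping | Data of int

(* Version 2 appends a constructor; existing positions are unchanged. *)
type msg_v2 = Ping2 | Data2 of int | Close2

let encode_v1 = function
  | Ping -> [0]
  | Data n -> [1; n]

let decode_v2 = function
  | [0] -> Some Ping2
  | [1; n] -> Some (Data2 n)
  | [2] -> Some Close2
  | _ -> None

(* The same constructors as version 1, but in a different order. *)
type msg_reordered = DataR of int | PingR

let decode_reordered = function
  | [0; n] -> Some (DataR n)
  | [1] -> Some PingR
  | _ -> None

let () =
  (* New reader, old data: decodes to the matching constructor. *)
  assert (decode_v2 (encode_v1 (Data 42)) = Some (Data2 42));
  (* Reordered reader, old data: the positions no longer line up. *)
  assert (decode_reordered (encode_v1 Ping) <> Some PingR)
```

Under this assumption, appending a constructor leaves every old message readable, while reordering silently changes what each tag means.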
* Re: [Caml-list] Re: Serialisation of PXP DTDs 2008-10-24 14:03 ` Markus Mottl @ 2008-10-25 18:58 ` Mauricio Fernandez 2008-10-26 18:15 ` Markus Mottl 0 siblings, 1 reply; 24+ messages in thread From: Mauricio Fernandez @ 2008-10-25 18:58 UTC (permalink / raw) To: caml-list On Fri, Oct 24, 2008 at 10:03:47AM -0400, Markus Mottl wrote: > On Fri, Oct 24, 2008 at 5:11 AM, Mikkel Fahnøe Jørgensen > <mikkel@dvide.com> wrote: > > I guess this discussion is an overkill for the problem at hand, but > > speaking of binary extensible protocols, have you looked at ASN.1? It > > is an abstraction over any number of encodings. At least one binary > > encoding has extension bits to allow future growth of object > > collections and similar. > > Note that it is perfectly safe to grow sum types with bin-prot. It > was designed that way intentionally. It's just not safe to reorder or > remove elements. Nobody needs to reorder elements, because it doesn't > make any operational difference in the program. Backward > compatibility of protocols you define necessarily requires the > presence of old constructors in sum types anyway so you may not want > to remove those in any case. There is hardly any harm from the > protocol perspective in leaving old constructors in there. > > Note, too, that polymorphic variants even allow reordering with > bin-prot. (...) > > Except for human-readability, I think bin-prot should scale very well > on the other requirements of serialization protocols once it has been > ported to architectures with unusual endianness (almost all machines > are little endian nowadays so hardly anybody on this list should be > affected). Unfortunately, growing sum types is far from being the only protocol extension of interest. There's a trivial extension which, I suspect, will be at least as common in practice, namely adding new fields to a record (or new elements to a tuple). 
bin-prot is unable to handle it adequately --- a self-describing format like
the one I'm working on is required.

You might argue that this extension is subsumed by the ability to grow sum
types, since you can go from

    type record = { a : int } with bin_io
    type msg = A of record

to

    type record1 = { a : int } with bin_io
    type record2 = { a' : int; b : int } with bin_io
    type msg = A of record1 | B of record2

(Note how special care has to be taken to tag the record --- "explicit
tagging" in ASN.1 parlance.)

However, this merely solves a part of a problem: that all serializations
according to an old type belong to the possible serializations for an updated
type, or, in other words, that new consumers be able to read data written by
old producers. Even with the above encoding (not with any arbitrary type
definition, but with a carefully constructed one), with bin-prot, this implies
that producers not be updated before consumers.

My design lifts that restriction and allows an old consumer to read the data
from a new producer when new fields have been added to a record or a tuple.
It even allows a node to operate on data it doesn't understand completely
(e.g., when a new constructor is used): it can for instance update one field
it does know while leaving those it is unable to interpret (or doesn't even
know about!) unmodified. I think this is very important in many of the
scenarios where one would need an extensible binary protocol. Google's
Protocol Buffers support this; I'm not sure this is explicitly supported by
Facebook's Thrift compiler, but IIRC the protocol should allow it.

AFAICS the ability to process data not understood in full requires the use of
a self-describing format like the one I'm working on.

--
Mauricio Fernandez - http://eigenclass.org

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Caml-list] Re: Serialisation of PXP DTDs
2008-10-25 18:58 ` Mauricio Fernandez
@ 2008-10-26 18:15 ` Markus Mottl
2008-10-26 19:47 ` Mauricio Fernandez
0 siblings, 1 reply; 24+ messages in thread
From: Markus Mottl @ 2008-10-26 18:15 UTC (permalink / raw)
To: caml-list

On Sat, Oct 25, 2008 at 2:58 PM, Mauricio Fernandez <mfp@acm.org> wrote:
> Unfortunately, growing sum types is far from being the only protocol extension
> of interest. There's a trivial extension which, I suspect, will be at
> least as common in practice, namely adding new fields to a record (or new
> elements to a tuple). bin-prot is unable to handle it adequately --- a
> self-describing format like the one I'm working on is required.

If you add a tag to a sum type, previous protocol implementations
cannot read these values, whereas new implementations will be able to
read both protocols. With records / tuples it is exactly the other
way round: you could, in principle, read both in the old
implementation, which just needs to drop new, unknown fields, whereas
the new implementation requires these fields and hence cannot parse
old protocols.

I don't see how any approach could "handle" the respective unsolvable
case. If a receiver doesn't know how to handle a tag, or if it
requires data that is not there, you'll be stuck.

Note, too, that even if you created an implementation which allows
handling extended records in old protocols, this would undoubtedly come
at a pretty hefty cost. The only efficient way to do that would be to
exchange protocols and generate code at runtime to translate quickly
between protocols. I don't think it's worth it.
> You might argue that this extension is subsumed by the ability to grow sum types, > since you can go from > > type record = { a : int } with bin_io > type msg = A of record > > to > > type record1 = { a : int } with bin_io > type record2 = { a' : int; b : int } with bin_io > type msg = A of record1 | B of record2 > > (Note how special care has to be taken to tag the record --- "explicit > tagging" in ASN.1 parlance.) This is surely a clean way to extend protocols without losing backward compatibility. > My design lifts that restriction and allows an old consumer to read the data > from a new producer when new fields have been added to a record or a tuple. I'd probably bet that simply putting a protocol translator in front of some old application you don't want to / cannot recompile would be about as efficient. Unless, of course, you go for the "generate efficient translation code from a new protocol specifications at runtime" approach, which seems very hard to implement. And it wouldn't even be as general, since an intermediate translator could translate between previously completely unrelated, arbitrary protocols (as long as you can define a meaningful translation). It's hard to imagine that anybody wouldn't want to use a type safe language with pattern matching (like OCaml) to specify that part... > AFAICS the ability to process data not understood in full requires the use of > a self-describing format like the one I'm working on. I'd go for the protocol translator. Especially if two protocols share a lot of structure, it should be trivial to define translations. Another very reasonable approach, which does not diminish performance, would be to exchange protocol versions. Assuming that one side is always more recent than the other, they should be able to support old protocols directly. Regards, Markus -- Markus Mottl http://www.ocaml.info markus.mottl@gmail.com ^ permalink raw reply [flat|nested] 24+ messages in thread
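A protocol translator of the kind suggested here could be sketched as a pair of pattern-matching functions. This is a minimal sketch assuming the record1/record2 example from the thread; the names `record_v1`, `record_v2`, `downgrade` and `upgrade` are hypothetical, and the serialization step itself is left out.

```ocaml
(* Hypothetical protocol versions, following the record1/record2
   example quoted above. *)
type record_v1 = { a : int }
type record_v2 = { a' : int; b : int }

type msg_v1 = A1 of record_v1
type msg_v2 = A of record_v1 | B of record_v2

(* New -> old: drop the field the old application does not know. *)
let downgrade : msg_v2 -> msg_v1 = function
  | A r -> A1 r
  | B { a'; b = _ } -> A1 { a = a' }

(* Old -> new: every old message is already a valid new message. *)
let upgrade : msg_v1 -> msg_v2 = function
  | A1 r -> A r

let () =
  assert (downgrade (B { a' = 1; b = 2 }) = A1 { a = 1 });
  assert (upgrade (A1 { a = 7 }) = A { a = 7 })
```

The compiler's exhaustiveness check flags any constructor the translator forgets to handle, which is presumably the benefit of writing this part in a language with pattern matching.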
* Re: [Caml-list] Re: Serialisation of PXP DTDs 2008-10-26 18:15 ` Markus Mottl @ 2008-10-26 19:47 ` Mauricio Fernandez 0 siblings, 0 replies; 24+ messages in thread From: Mauricio Fernandez @ 2008-10-26 19:47 UTC (permalink / raw) To: caml-list On Sun, Oct 26, 2008 at 02:15:18PM -0400, Markus Mottl wrote: > On Sat, Oct 25, 2008 at 2:58 PM, Mauricio Fernandez <mfp@acm.org> wrote: > > Unfortunately, growing sum types is far from being the only protocol extension > > of interest. There's a trivial extension which, I suspect, will be at > > least as common in practice, namely adding new fields to a record (or new > > elements to a tuple). bin-prot is unable to handle it adequately --- a > > self-describing format like the one I'm working on is required. (...) > With records / tuples it is exactly the other way round: you could, in > principle, read both in the old implementation, which just needs to drop > new, unknown fields, whereas the new implementation requires these fields > and hence cannot parse old protocols. This (having old consumers ignore extra fields) is what bin-prot doesn't support because records/tuples aren't self-delimited. It can only be done at the outermost level if you prepend the length of the message, and breaks as soon as you have a nested record or tuple type. In my format, records and tuples are self-delimited, so this is supported trivially. Note that it is possible for a new implementation to read old protocols by specifying default values for missing fields. This basically amounts to turning newly added fields into generalized option types (the only diff being whether the 'a option -> 'a conversion is controlled at the level of the type definition or distributed throughout the code). New code has to cope with the possibility that the fields might be None, that's all. Old code never sees those fields and works unmodified. > I don't see how any approach could "hande" the respective unsolvable > case. 
If a receiver doesn't know how to handle a tag, or if it > requires data that is not there, you'll be stuck. The former case is indeed unsolvable if the reader is to operate with that field in a specific (not polymorphic) way. (It can still do things involving only other fields, though.) In the second case, however, the receiver has got the advantage of hindsight: it knows that the extra data might not be present, and the code can cope with that. > Note, too, that even if you created an implementation which allows > handling extended records in old protocols, this would undoubtly come > at a pretty hefty cost. The only efficient way to do that would be to > exchange protocols and generate code at runtime to translate quickly > between protocols. I don't think it's worth it. ? I haven't optimized the generated code yet, but I'm seeing only a 25% drop in decoding speed compared to Marshal in my preliminary tests. Extra fields aren't even decoded, just saved in encoded form and appended to the output when serializing again. > > You might argue that this extension is subsumed by the ability to grow sum types, > > since you can go from > > > > type record = { a : int } with bin_io > > type msg = A of record > > > > to > > > > type record1 = { a : int } with bin_io > > type record2 = { a' : int; b : int } with bin_io > > type msg = A of record1 | B of record2 > > > > (Note how special care has to be taken to tag the record --- "explicit > > tagging" in ASN.1 parlance.) > > This is surely a clean way to extend protocols without losing backward > compatibility. It's bothersome for the programmer (picture type msg = ... | F of record6 and record6 = { a''''' : int; b'''': int; c''': float; d'': foo; e': bar; f : baz), and arguably worse than extending the record directly, because, as you said above, the receiver will not know how to handle the "B" tag, even though it would be perfectly able to decode the subset of the record it understands. 
It's safe only in one direction (new code can read old data). > > My design lifts that restriction and allows an old consumer to read the data > > from a new producer when new fields have been added to a record or a tuple. > > I'd probably bet that simply putting a protocol translator in front of > some old application you don't want to / cannot recompile would be > about as efficient. It's not always a matter of not recompiling the application, but rather of not having recompiled it *yet*: in a system with multiple nodes, it is hard to migrate them all to the updated code atomically... Putting a protocol translator in front of the old code is just as hard as updating it: it also means that all exchanges have to stop while the protocol translators are put in place --- hardly any advantage over just migrating to updated code. > > AFAICS the ability to process data not understood in full requires the use of > > a self-describing format like the one I'm working on. > > I'd go for the protocol translator. Especially if two protocols share > a lot of structure, it should be trivial to define translations. > Another very reasonable approach, which does not diminish performance, > would be to exchange protocol versions. Assuming that one side is > always more recent than the other, they should be able to support old > protocols directly. Protocol negotiation is not always possible. Consider the case of data stored on disk (or on any dummy server that only knows about files, not protocols) and accessed directly without an intermediate translation layer. -- Mauricio Fernandez - http://eigenclass.org ^ permalink raw reply [flat|nested] 24+ messages in thread
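The two behaviours described in this message (an old consumer carrying unknown fields through untouched, and a new consumer supplying defaults for missing fields) can be modelled in a few lines. The real wire format is not specified in the thread, so the sketch below assumes a toy representation: a record is a list of (field id, encoded value) pairs. Every name in it is hypothetical.

```ocaml
(* A toy self-delimited record: a list of (field id, encoded value)
   pairs. This is only a model, not the format discussed above. *)
type wire = (int * string) list

(* Old consumer: knows only field 0 and keeps the rest undecoded. *)
type old_rec = { a : int; unknown : wire }

let decode_old (w : wire) : old_rec =
  { a = int_of_string (List.assoc 0 w);
    unknown = List.filter (fun (id, _) -> id <> 0) w }

(* Re-encoding appends the unknown fields exactly as received. *)
let encode_old (r : old_rec) : wire =
  (0, string_of_int r.a) :: r.unknown

(* New consumer: field 1 was added later, so it gets a default. *)
type new_rec = { a' : int; b : int }

let decode_new ?(default_b = 0) (w : wire) : new_rec =
  { a' = int_of_string (List.assoc 0 w);
    b = (match List.assoc_opt 1 w with
         | Some s -> int_of_string s
         | None -> default_b) }

let () =
  let from_new_producer = [ (0, "1"); (1, "2") ] in
  (* The old node updates the field it knows and re-emits the rest. *)
  let r = decode_old from_new_producer in
  let out = encode_old { r with a = r.a + 10 } in
  assert (decode_new out = { a' = 11; b = 2 });
  (* The new node reads data written before field 1 existed. *)
  assert (decode_new [ (0, "5") ] = { a' = 5; b = 0 })
```

Here the default lives in the decoder; the message also mentions the alternative of exposing the new field as an option and handling `None` throughout the code.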
* Re: [Caml-list] Re: Serialisation of PXP DTDs 2008-10-24 9:11 ` Mikkel Fahnøe Jørgensen 2008-10-24 14:03 ` Markus Mottl @ 2008-10-24 21:39 ` Mauricio Fernandez 2008-10-24 22:27 ` Mikkel Fahnøe Jørgensen 1 sibling, 1 reply; 24+ messages in thread From: Mauricio Fernandez @ 2008-10-24 21:39 UTC (permalink / raw) To: caml-list On Fri, Oct 24, 2008 at 11:11:10AM +0200, Mikkel Fahnøe Jørgensen wrote: > I guess this discussion is an overkill for the problem at hand, but > speaking of binary extensible protocols, have you looked at ASN.1? It > is an abstraction over any number of encodings. At least one binary > encoding has extension bits to allow future growth of object > collections and similar. Yes, I referred to it indirectly in my previous message. Indeed, ASN.1 supports disjoint unions ("tagged types") that would allow to extend a type. It is obviously possible to build extensible protocols with ASN.1, but if I understand it correctly, not all protocols expressed in ASN.1's abstract syntax are automatically extensible --- it requires some care when designing them (i.e., tagging). My main problem with ASN.1 is that even the distinguished encoding rules are fairly complex; also, explicit tagging results in relatively heavy serialization too. My protocol family is both substantially simpler and better adapted for extensibility. For example, the generic pretty-printer (able to decode any message) takes ~40 lines of code. -- Mauricio Fernandez - http://eigenclass.org ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Caml-list] Re: Serialisation of PXP DTDs 2008-10-24 21:39 ` Mauricio Fernandez @ 2008-10-24 22:27 ` Mikkel Fahnøe Jørgensen 2008-10-25 19:19 ` Mauricio Fernandez 0 siblings, 1 reply; 24+ messages in thread From: Mikkel Fahnøe Jørgensen @ 2008-10-24 22:27 UTC (permalink / raw) To: caml-list > serialization too. My protocol family is both substantially simpler and better > adapted for extensibility. For example, the generic pretty-printer (able to > decode any message) takes ~40 lines of code. > I see - somehow it reminds me of stackish - kind of S expressions backwards I guess - apparently with good performance, but also tag'ed I reckon. http://www.zedshaw.com/essays/stackish_xml_alternative.html More specifically regarding DTD's: Since I have been playing around with Ragel: http://www.complang.org/ragel/ I was also wondering about converting DTD's to state-machines with a stack, then feed them to a Ragel input file and have Ragel produce a table that can be run by a small interpreter. I did something similar for an XML parser as a kind of DTD replacement, although I manually wrote the state-machines and compiled to C, not a table. For OCaml you would link in the C interpreter, or rewrite it in OCaml. Mikkel ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Caml-list] Re: Serialisation of PXP DTDs
2008-10-24 22:27 ` Mikkel Fahnøe Jørgensen
@ 2008-10-25 19:19 ` Mauricio Fernandez
0 siblings, 0 replies; 24+ messages in thread
From: Mauricio Fernandez @ 2008-10-25 19:19 UTC (permalink / raw)
To: caml-list

On Sat, Oct 25, 2008 at 12:27:08AM +0200, Mikkel Fahnøe Jørgensen wrote:
> > serialization too. My protocol family is both substantially simpler and better
> > adapted for extensibility. For example, the generic pretty-printer (able to
> > decode any message) takes ~40 lines of code.
> >
>
> I see - somehow it reminds me of stackish - kind of S expressions
> backwards I guess - apparently with good performance, but also tag'ed
> I reckon.
>
> http://www.zedshaw.com/essays/stackish_xml_alternative.html

Heh, I read about Stackish a while ago (a few years?). Besides being
human-readable, Stackish uses tags in a different way. Whereas Stackish uses
tags for the node names (behaving like Google's Protocol Buffers or Facebook's
Thrift in this regard), in my design tags are like OCaml's: a way to encode
different constructors for a given field.

For instance, if you have a field

    ...
    length : float
    ...

and later decide that a mere float is not enough, and that it should actually
be

    type len = Cm of float | Inch of float

    ...
    length : len
    ...

my system assigns a tag to each constructor, the way OCaml does (the original
type definition carries a default tag which corresponds to the Cm constructor).
AFAICS this can only be encoded in a roundabout way in Stackish, since it
doesn't have sum types.

> More specifically regarding DTD's:
> Since I have been playing around with Ragel: http://www.complang.org/ragel/
> I was also wondering about converting DTD's to state-machines with a
> stack, then feed them to a Ragel input file and have Ragel produce a
> table that can be run by a small interpreter.
> I did something similar for an XML parser as a kind of DTD > replacement, although I manually wrote the state-machines and compiled > to C, not a table. > For OCaml you would link in the C interpreter, or rewrite it in OCaml. Turning each data schema into a state-machine sounds like a fair amount of work. What I was looking for and ended up implementing is similar in spirit to bin-prot's "with bin_io" extension, with the difference that the type is specified using a language-independent abstract syntax instead of OCaml's type language, and that the wire format is designed to allow extensions happening in both producers and consumers non-atomically. -- Mauricio Fernandez - http://eigenclass.org ^ permalink raw reply [flat|nested] 24+ messages in thread
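The `length` example from this message, where a plain float field is later widened to a sum type with a default tag, can be sketched as follows. The wire representation is an assumption made for illustration: the default tag is modelled as an absent tag, which may not match how the actual format stores it.

```ocaml
type len = Cm of float | Inch of float

(* Toy wire value: an optional constructor tag plus a float payload.
   Data written under the old "length : float" schema is modelled
   here as carrying no tag (an assumption, not the real format). *)
type wire_len = { tag : int option; value : float }

let encode_len = function
  | Cm x -> { tag = Some 0; value = x }
  | Inch x -> { tag = Some 1; value = x }

let decode_len (w : wire_len) : len option =
  match w.tag with
  | None | Some 0 -> Some (Cm w.value)  (* untagged data defaults to Cm *)
  | Some 1 -> Some (Inch w.value)
  | Some _ -> None                      (* constructor from a newer schema *)

let () =
  (* Old data, written as a bare float, reads back as centimetres. *)
  assert (decode_len { tag = None; value = 2.5 } = Some (Cm 2.5));
  assert (decode_len (encode_len (Inch 1.0)) = Some (Inch 1.0))
```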
* Re: [Caml-list] Re: Serialisation of PXP DTDs
2008-10-23 15:34 ` [Caml-list] " Dario Teixeira
2008-10-23 16:37 ` Stefano Zacchiroli
@ 2008-10-23 16:46 ` Markus Mottl
1 sibling, 0 replies; 24+ messages in thread
From: Markus Mottl @ 2008-10-23 16:46 UTC (permalink / raw)
To: Dario Teixeira; +Cc: caml-list, Sylvain Le Gall

On Thu, Oct 23, 2008 at 11:34 AM, Dario Teixeira
<darioteixeira@yahoo.com> wrote:
> Sexplib scores very good on ease of use, future-proofness, and
> portability, and reasonably good on performance and human-readability.
> My guess is that bin-prot has better performance but worse portability
> and future-proofness, and nill human-readability. Marshal gets
> top scores in performance and ease of use, but fails miserably in
> future-proofness, human-readability, and portability.

Bin-prot is settled in its design. We heavily rely on it here at Jane
Street and store TBs of data in it, so there is no way it's going to
change. I would say it is future-proof. Portability could be
improved, of course, e.g. to big-endian architectures, but that's
not hard to do. Performance is definitely competitive with Marshal:
writing is noticeably faster, and reading only marginally slower. It
also requires a little less storage space. The main problem here is
actually that it doesn't support shared / cyclic data structures. I
don't think anybody would blame it for not being human-readable,
because that's the nature of binary protocols ;-)

Regards,
Markus

--
Markus Mottl http://www.ocaml.info markus.mottl@gmail.com

^ permalink raw reply [flat|nested] 24+ messages in thread
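For readers unsure what "shared / cyclic" means in practice, the sketch below shows what the standard library's Marshal preserves with its default flags and what is lost with `Marshal.No_sharing`. It does not exercise bin-prot at all; `roundtrip` is a hypothetical helper introduced only for this illustration.

```ocaml
(* Serialize a value to a string and read it back. Marshal is not
   type-safe: the result type is whatever the caller claims it is. *)
let roundtrip (flags : Marshal.extern_flags list) (v : 'a) : 'a =
  Marshal.from_string (Marshal.to_string v flags) 0

let () =
  let cell = ref 0 in
  let pair = (cell, cell) in  (* both components are the same block *)
  (* Default flags: sharing is recorded and restored. *)
  let (x, y) = roundtrip [] pair in
  assert (x == y);
  (* No_sharing: each occurrence is written out separately. *)
  let (x', y') = roundtrip [Marshal.No_sharing] pair in
  assert (x' != y');
  (* A cyclic list survives a round trip with the default flags.
     (With No_sharing, marshalling a cycle would not terminate.) *)
  let rec cyc = 1 :: 2 :: cyc in
  let cyc' = roundtrip [] cyc in
  assert (List.nth cyc' 2 = 1)
```

A format without sharing support behaves like the `No_sharing` case for shared values and cannot represent the cyclic list at all.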
* Re: [Caml-list] Serialisation of PXP DTDs
2008-10-22 20:11 Serialisation of PXP DTDs Dario Teixeira
2008-10-22 23:05 ` Sylvain Le Gall
@ 2008-10-23 14:55 ` Gerd Stolpmann
1 sibling, 0 replies; 24+ messages in thread
From: Gerd Stolpmann @ 2008-10-23 14:55 UTC (permalink / raw)
To: Dario Teixeira; +Cc: caml-list

On Wednesday, 22.10.2008, at 13:11 -0700, Dario Teixeira wrote:
> Hi,
>
> I am using PXP to parse the MathML2 DTD. This is a fairly large DTD,
> which even on a fast machine takes several seconds to parse. I am
> therefore looking at ways to serialise a parsed DTD, in a such a way
> that it can be reused by other processes.
>
> Does PXP already offer primitives for (un)serialising DTDs? (I couldn't
> find any). Note that using Marshal is out of the question, because DTDs
> are stored as objects, and we all know that objects cannot be serialised
> across process boundaries. But are there alternative solutions I'm
> overlooking?

No, there is currently no built-in function to serialize DTDs. The DTD
objects are, however, mostly containers, and you can get all their
properties by invoking methods of the object interface. That allows you
to do your own serialization. You are a bit dependent on the PXP
version then, but I don't think the interface of DTDs will change
anytime soon.

Gerd

>
> On a more general but related note, I think we should start an OSP
> discussion about standardising serialisation methods. The rationale
> should be obvious. Myself, I am partial to Sexplib, since it is
> reasonably fast, very simple to use, human-readable, and future-proof.
> I reckon that bin-prot could also be considered, as long as at some
> point the binary format is "set in stone", or at least deserialisers
> are always backwards compatible. Any other opinions?
>
> Thanks for your time!
> Cheers,
> Dario Teixeira

--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------

^ permalink raw reply [flat|nested] 24+ messages in thread
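The do-it-yourself approach described above could look roughly like this: copy the DTD's properties into plain data, which Marshal can handle, and cache that. This is a sketch under strong assumptions. None of the type or field names below come from PXP; they stand in for whatever the DTD object's methods return, and the steps that fill the snapshot from a PXP DTD and rebuild a DTD from it are deliberately left out.

```ocaml
(* Hypothetical plain-data mirror of a parsed DTD. These names are
   invented for the example and are not part of PXP. *)
type attribute =
  { att_name : string; att_type : string; att_default : string option }
type element =
  { el_name : string; content_model : string; attributes : attribute list }
type dtd_snapshot = { root : string option; elements : element list }

(* Plain records, lists and strings contain no objects or closures,
   so the standard Marshal module can write them out. *)
let save (file : string) (d : dtd_snapshot) : unit =
  let oc = open_out_bin file in
  Marshal.to_channel oc d [];
  close_out oc

let load (file : string) : dtd_snapshot =
  let ic = open_in_bin file in
  let d : dtd_snapshot = Marshal.from_channel ic in
  close_in ic;
  d

let () =
  let snapshot =
    { root = Some "math";
      elements =
        [ { el_name = "mi"; content_model = "(#PCDATA)";
            attributes = [ { att_name = "mathvariant";
                             att_type = "CDATA";
                             att_default = None } ] } ] } in
  let file = Filename.temp_file "dtd" ".bin" in
  save file snapshot;
  assert (load file = snapshot);
  Sys.remove file
```

Marshal performs no type checking on input, so a cache written by a different version of the snapshot type must be invalidated by other means (for instance a version stamp in the file name).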
* Re: [Caml-list] Re: Serialisation of PXP DTDs
@ 2008-10-23 18:41 Dario Teixeira
2008-10-23 18:58 ` Markus Mottl
0 siblings, 1 reply; 24+ messages in thread
From: Dario Teixeira @ 2008-10-23 18:41 UTC (permalink / raw)
To: Markus Mottl; +Cc: caml-list
Hi,
> Bin-prot is settled in its design. We heavily rely on it here
> at Jane Street and store TBs of data in it so there is no way
> it's going to change. I would say it is future-proof.
Thanks for the clarification, Markus, and I will take a closer
look at bin-prot. One question, however: is it possible to use
*both* the sexplib and bin-prot syntax extensions on the same
structure? That way convenience for the developer is preserved,
and users can choose which side of the performance vs readability
trade-off they prefer.
Cheers,
Dario
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Caml-list] Re: Serialisation of PXP DTDs 2008-10-23 18:41 [Caml-list] " Dario Teixeira @ 2008-10-23 18:58 ` Markus Mottl 2008-10-23 20:04 ` Dario Teixeira 0 siblings, 1 reply; 24+ messages in thread From: Markus Mottl @ 2008-10-23 18:58 UTC (permalink / raw) To: Dario Teixeira; +Cc: caml-list On Thu, Oct 23, 2008 at 2:41 PM, Dario Teixeira <darioteixeira@yahoo.com> wrote: > Thanks for the clarification, Markus, and I will take a closer > look at bin-prot. One question, however: is it possible to use > *both* the sexplib and bin-prot syntax extensions on the same > structure? That way convenience for the developer is preserved, > and users can choose which side of the performance vs readability > trade-off they prefer. Absolutely, both converters are supported simultaneously. That's why we had to factor out the type-conv package, because some code needs to be shared. Regards, Markus -- Markus Mottl http://www.ocaml.info markus.mottl@gmail.com ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Caml-list] Re: Serialisation of PXP DTDs 2008-10-23 18:58 ` Markus Mottl @ 2008-10-23 20:04 ` Dario Teixeira 0 siblings, 0 replies; 24+ messages in thread From: Dario Teixeira @ 2008-10-23 20:04 UTC (permalink / raw) To: Markus Mottl; +Cc: caml-list Hi, > Absolutely, both converters are supported simultaneously. > That's why we had to factor out the type-conv package, because > some code needs to be shared. Excellent! I asked because I remember giving it a go some months ago and running into preprocessor errors. But now I realise what the mistake was: instead of "with sexp with bin_io", one should write "with sexp, bin_io"... Cheers, Dario Teixeira ^ permalink raw reply [flat|nested] 24+ messages in thread