From: Brian Hurt <bhurt@spnz.org>
To: Dario Teixeira <darioteixeira@yahoo.com>
Cc: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Long-term storage of values
Date: Thu, 28 Feb 2008 20:14:27 -0500 (EST)
Message-ID: <Pine.LNX.4.64.0802281922420.16618@localhost>
In-Reply-To: <191751.36007.qm@web54607.mail.re2.yahoo.com>
On Thu, 28 Feb 2008, Dario Teixeira wrote:
> Hi,
>
> Suppose I have a value of type Story.t, fairly complex in its definition.
> I wish to store this value in a DB (like Postgresql) for posterity.
> At the moment, I am storing in the DB the marshalled representation
> of the data; whenever I need to use it again in the Ocaml programme
> I simply fetch it from the DB and unmarshal it.
The following is just my opinion, not that of my employer.
You're making two mistakes.
Mistake #1: treating a database as a dumb object store. This is a really
popular idea right now: Hibernate does this, as does Ruby on Rails, and a
number of other ORM packages take effectively this approach. On the other
hand, dynamically typed languages are also really popular.
A database is an incredibly powerful tool, used correctly. Used
correctly, databases allow you to handle huge amounts of data shared between
multiple different clients with great flexibility and good performance.
Used incorrectly, they tend to be bloated, slow pigs. There are a lot of
things databases aren't good at: multidimensional data, for example, or
recursive ("tree-structured") data. Databases have some significant
limitations. Every single element in a relation (aka table) has to be
exactly the same type- no superclasses, no variant types. Worse yet, SQL
isn't even Turing-complete. It's the world's oldest, most popular, DSL.
So "used correctly" is tricky to define, because relational databases are
a paradigm, not unlike functional programming or object oriented
programming. But the trick is that you're designing, and coding, to the
database, and you can't hide that or ignore it. Some things are easy:
databases are really good at filtering, joining, some simple mapping and
aggregation. The first few "levels" of data handling should be done in
SQL in the database- you should never be sucking whole tables down. If
you do try to hide the essential nature of the database, you'll run right
into the meatgrinder of its limitations. Used correctly, you get the
advantages and avoid the disadvantages.
So, mistake number one: either use the database, and structure your data (at
that layer) to take advantage of it, or don't use a database.
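
As a rough illustration of "structuring your data at that layer", here is a minimal sketch of mapping a value onto relational rows instead of storing a marshalled blob. The real Story.t is not shown in this thread, so the `story` and `chapter` types, the row types, and the two tables they mirror are all assumptions made up for the example.

```ocaml
(* Hypothetical stand-in for Story.t; the real type is not shown in the thread. *)
type chapter = { heading : string; body : string }
type story = { id : int; title : string; author : string; chapters : chapter list }

(* Row types mirroring two assumed tables:
   stories(id, title, author) and chapters(story_id, position, heading, body). *)
type story_row = { s_id : int; s_title : string; s_author : string }
type chapter_row =
  { c_story_id : int; c_position : int; c_heading : string; c_body : string }

(* Flatten a story into one stories row plus one chapters row per chapter.
   The list order is made explicit as a position column. *)
let to_rows (s : story) : story_row * chapter_row list =
  let story_row = { s_id = s.id; s_title = s.title; s_author = s.author } in
  let chapter_rows =
    List.mapi
      (fun i (c : chapter) ->
        { c_story_id = s.id; c_position = i;
          c_heading = c.heading; c_body = c.body })
      s.chapters
  in
  (story_row, chapter_rows)

(* Rebuild a story from its rows; rows may arrive in any order. *)
let of_rows (r : story_row) (cs : chapter_row list) : story =
  let sorted = List.sort (fun a b -> compare a.c_position b.c_position) cs in
  { id = r.s_id; title = r.s_title; author = r.s_author;
    chapters = List.map (fun c -> { heading = c.c_heading; body = c.c_body }) sorted }
```

With the fields in columns, questions such as "all stories by one author" can be answered by the database itself rather than by fetching and unmarshalling every value. The actual binding of these rows to SQL statements is left out, since it depends on the database library in use.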
Mistake number two: file formats (and this includes marshalled data
structures), are wire protocols, and need to be designed to be as abstract
as possible- to reveal as little about the internal structure of the
program as possible (preferably none at all).
This is an idea that gets reinvented time after time, and it always ends
in tears and recriminations: have some magic protocol that allows programs
to communicate directly- just have program X call a function or pass an
object to program Y directly, and have the protocol handle all the mucking
about with serializing/deserializing data, converting function calls into
request/response messages, etc. Sun RPC, COM, CORBA, OLE, XML-RPC,
and SOAP are the implementations that spring to mind. Object
serialization hits the exact same problem: it doesn't matter whether
program X and Y are communicating via TCP/IP sockets, files, or
quantum-tachyon entanglement.
Sooner or later (and generally sooner), it'll happen: program X will ask
to call some function, or pass some type of data, that program Y doesn't have
any knowledge of. It may be because X is a newer version of the
program/protocol, and the function/data type has been added. It may be
because X is an older version, and the function/data type has since been
removed. In any case, the first time this happens is when the tears and
recriminations start.
Versioning simply makes it more painfully obvious that you're shackled to
the past. You want to get rid of that pesky function? You can't, because
older versions of the protocol require it to be there. Don't need a piece
of data anymore? Tough, older versions of the protocol still require it.
The best thing versioning gives you is the ability to error out early, and
make a more sensible error message ("Sorry, but protocol support >= 2.14
is required!"), but it doesn't solve the problem.
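
To make the "error out early" point concrete, here is a minimal sketch of a version check performed before any payload is read. The header line `story-format MAJOR.MINOR` and the names `min_supported`, `parse_version` and `check_version` are invented for this example; only the error message is taken from the text above.

```ocaml
(* Oldest format version this (hypothetical) reader accepts. *)
let min_supported = (2, 14)

(* Parse an assumed header line of the form "story-format MAJOR.MINOR".
   Returns None if the line does not have that shape. *)
let parse_version (line : string) : (int * int) option =
  try
    Scanf.sscanf line "story-format %d.%d%!" (fun major minor ->
        Some (major, minor))
  with Scanf.Scan_failure _ | Failure _ | End_of_file -> None

(* Fail early with a readable message instead of choking on the payload.
   Pairs of ints compare lexicographically, so (2, 9) < (2, 14). *)
let check_version (line : string) : (unit, string) result =
  match parse_version line with
  | None -> Error "Unrecognised header; expected \"story-format MAJOR.MINOR\""
  | Some v when compare v min_supported < 0 ->
      let (major, minor) = min_supported in
      Error
        (Printf.sprintf "Sorry, but protocol support >= %d.%d is required!"
           major minor)
  | Some _ -> Ok ()
```

As said above, this only improves the error message. The reader is still tied to whatever the older versions put in the file.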
The best solution I've found is to be aware that, when you're
communicating with the outside world, you're implementing a *protocol*.
And that protocol should be, as I said, as abstract as possible and reveal
as little about the structure of the program as possible. So I can change
the program enormously, even reimplement it from scratch in a different
language, without great difficulty. Consider SMTP, HTTP, and YAML as
examples of protocols or generic file formats done right.
Note that you can do protocol design, and then implement it in CORBA or
XML. A sure sign that you've done this is the existence of a "translation
layer" - comments like "OK, now we translate the XML data structure into
our internal data structure" and such like. The successful projects I've
seen that used these technologies did this (or got lucky and grew into
this).
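
A minimal sketch of such a translation layer, under made-up assumptions: the internal `story` record and the external line-oriented "key: value" format are both invented for illustration, and a real format would also need escaping for values that contain newlines.

```ocaml
(* Internal representation: free to change between versions of the program. *)
type story = { title : string; author : string; tags : string list }

(* External representation: an assumed line-oriented "key: value" text format.
   Nothing about the OCaml record layout leaks into it. *)
let encode (s : story) : string =
  String.concat "\n"
    ([ "title: " ^ s.title; "author: " ^ s.author ]
     @ List.map (fun t -> "tag: " ^ t) s.tags)

(* Split "key: value" into (key, value); None for lines not of that shape. *)
let split_field (line : string) : (string * string) option =
  match String.index_opt line ':' with
  | Some i when i + 1 < String.length line && line.[i + 1] = ' ' ->
      Some (String.sub line 0 i,
            String.sub line (i + 2) (String.length line - i - 2))
  | _ -> None

(* The translation layer: external text -> internal data structure.
   Unknown keys and malformed lines are ignored, so data written by a newer
   program with extra fields can still be read; missing required fields
   are reported as errors. *)
let decode (text : string) : (story, string) result =
  let fields = List.filter_map split_field (String.split_on_char '\n' text) in
  match List.assoc_opt "title" fields, List.assoc_opt "author" fields with
  | Some title, Some author ->
      let tags =
        List.filter_map (fun (k, v) -> if k = "tag" then Some v else None) fields
      in
      Ok { title; author; tags }
  | None, _ -> Error "missing required field: title"
  | _, None -> Error "missing required field: author"
```

The internal record can be renamed, reordered or extended without touching stored data, as long as `encode` and `decode` keep speaking the same external format. That is the property Marshal output lacks.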
So that's mistake number two: you're communicating between different
versions of the program with an ill-defined (at best) and non-generic
protocol/file format.
Fix these two problems, and I'm willing to bet most of the rest of the
problems go away too.
Brian