Mailing list for all users of the OCaml language and system.
 help / color / mirror / Atom feed
From: Christophe TROESTLER <debian00@tiscali.be>
To: Richard Jones <rich@annexia.org>
Cc: caml-list@inria.fr
Subject: [Caml-list] Re: OcamlDBI
Date: Mon, 01 Mar 2004 03:10:21 +0100 (CET)	[thread overview]
Message-ID: <20040301.031021.51509654.debian00@tiscali.be> (raw)
In-Reply-To: <20040229210015.GA31888@redhat.com>

Dear Rich,

I am sorry, I realized after the fact that the following is a bit long.

On Sun, 29 Feb 2004, Richard Jones <<rich@annexia.org> wrote:
> 
> What I would _really_ like to do is to use some Obj.magic to build
> tuples of the correct type, but that risks a segmentation fault at
> run time if the programmer makes a mistake. [...] I was thinking
> something along the lines of
>
> let data = sth#map (fun (i, s) ->
>                      do something with int i and string s) ...
>
> The tuple of type int * string could be generated magically?

That would be nice but is not quite possible.  Indeed what would be
the type of #map ?  So this is only feasible with Camlp4 and some
annotations to indicate to Camlp4 the type of the tuple.  The problem
is that it would be nice to add the types to the prepare[_cached]
statement but then they must be transported to the appropriate #map,
#iter,...  places.  As Camlp4 is just about syntax, I do not see how
to do that -- since, at worst, the #map, #iter,... may be in a
different file!  But since Camlp4 can only do what Caml can do (only
with a more terse and convenient syntax), this still begs the question
of what pure Caml should do.

So my point is first to try to understand what Caml should do for
DYNAMIC queries (where nothing is [supposed to be] known at compile
time: neither the types, nor the length, nor even the DB,...) and only
later see whether one can add syntactic sugar.  I'd like, as much as
you do, to find an efficient and convenient manner to address that
problem.

The code you send me is good food for thought.  Thanks for sharing it
(as I said I am only a casual user of DB).  I got why you want lists
on the input.  But this code also shows shortcomings that one should
deal with.  For example

  clickstream_is_null = row.(3) = ""

suggest you identify a NULL string to an empty string!!!

Because of the dynamic nature of the queries, the result cannot be
[fully] typed.  Hence strings.  But, I am sorry to insist, strings
tend to make the code UNPORTABLE because encodings are different from
one DB engine to another.  This goes against the purpose if DBI --
database independence!  The solution, then, I beg, is to create a new
type [sql_t] (see db.ml attached to the previous mail and the P.S.)
that "mimics" SQL types and can then be used to make automatic
conversions between the two.  In fact, we do not need to care so much
about SQL types but rather that we have enough types on the Caml side
to make the translations.  (1) I think it is not convenient for the
user to have

    [`Tinyint of int | `Int of int | ...]

as all we want to know on the Caml side is that we got an int.  Maybe
another type should be defined to report the column types more finely,
e.g. [`TinyInt | `SmallInt | `Int | ...] ?  (2) Also, another decision
that must be made is how to handle NULL values.  I think it is better
to have one global `Null instead of duplicating every type into [`X of
x] and [`Xopt of x option].  This is closer to the way SQL works.
Moreover, if the programmer knows that a column is not null, he can
match the type with [`X x] instead of having to worry whether the
column has been declared NOT NULL (in which case the matching should
be [`X x]) or not (in which case the matching should be [`Xopt (Some
x)]).  Finally, polymorphic variants make it convenient (see below).

Let me change some of your code to illustrate how translating to
[sqt_t array] works (see at the bottom of this mail for the types used).

>          (* Convert out, ready for grouping by days. *)
>          let hits = sth#map (fun row ->
>                                let t = row.(0) in
>                                let day = printable_day t in
>                                let time = printable_time t in
>                                let addr = row.(1) in
>                                let url = row.(2) in
>                                let clickstream = row.(3) in
>                                let clickstream_is_null = row.(3) = "" in
>                                let sessionid = row.(4) in
>                                let sessionid_is_null = row.(4) = "" in
>                                (day, (time, addr, url,
>                                       clickstream, clickstream_is_null,
>                                       sessionid, sessionid_is_null))) in

<CODE>
(* I am not sure what you do with "hits" -- whether you process them
   after of print them -- so I decided to keep the structures
   (believing formatting the data for output should happen in a single
   place -- or since you may want different back-ends: HTML, TeX,...). *)
let hits =
  sth#map (function
           | [|`Timestamp t; `String addr; `String url;
               (`Null | `String _) as clickstream;
               (`Null | `Int _) as sessionid |] ->
               (t.day, (t.time, addr, url, clickstream, sessionid))
           | _ -> assert(false))
    (* REMARK: The assert() line is supposed not to happen.  Assert
       was chosen because it will report line no if triggered. *)
</CODE>

Note that clickstream and sessionid do not have the same type as
before.  They now contain their nullity:

  clickstream : [> `Null | `String of string ]
  sessionid   : [> `Int of int | `Null ]

This is good as trying to access the value without caring about its
possible nullity will result into a type error.  This should be as
convenient to use as the couples (clickstream, clickstream_is_null) :
string * bool.  At least so I think.  If possible, I will be glad to
see the code using clickstream and sessionid to indeed be able to
judge.

Note that empty strings are now different from NULL values.

>   let sessions = sth#map (fun row ->
>                             let sessionid = int_of_string row.(0) in
>                             let url = row.(1) in
>                             let t = int_of_string row.(2) in
>                             sessionid, (url, t)) in
> 

let sessions =
  sth#map (function
           | [| `Int sessionid; `String url; `Int t |] -> (sessionid, (url, t))
           | _ -> assert(false))

> > * I feel one could use polymorphic variants in a more terse way to
> >   bind variables. [...]
> I'm not sure I understand this point.  Can you explain some more?

I just did not understand why you use `IntRef instead of simply `Int
to bind variables.  I was trying to explain that I think this is safe.

> > * method finish : unit
>
> Hmm ... #finish is not very well-defined. [...] Not clear if it's
> better or worse to do this using a finalizer instead.

Well, I thought that's what finalizers are for.  But maybe somebody on
the ML will raise points I have not thought about.

Cheers,
ChriS


P.S.  Feature request: it would be nice to have a standard way to work
with time and dates.  A module in the standard library would be nice.

---
P.S.2.  Here is the part of db.ml I sent Rich that allows to understand
the above.

type date = {
  year : int;
  month : int;
  day : int;
}

type time = {
  hour : int;
  min : int;
  sec : int;
}

type datetime = {
  date : date;
  time : time;
  microsecs : int;
  utc_offset : int;
}

module type Decimal =
sig
  type t
  val to_string : t -> string
  val to_float : t -> float
  val of_string : t -> t
    (* Operations on Decimal values ??? *)
end

module type Blob =
sig
  type t
  val length : t -> int64
  val get_chunk : pos:int64 -> len:int -> string
  val to_file : t -> string -> unit
  val of_file : string -> t
  val of_string : string -> t
end

type sql_t = [ `Null
             | `Int of int
             | `Float of float
             | `String of string
             | `Byte of char
             | `Bool of bool
             | `Bigint of Big_int.big_int
             | `Decimal of Decimal.t
             | `Date of date
             | `Time of time
             | `Timestamp of datetime
             | `Interval of datetime (* utc_offset irrelevant *)
             | `Blob of Blob.t
             ]

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


       reply	other threads:[~2004-03-01  2:10 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20040229.202452.53575435.Christophe.Troestler@umh.ac.be>
     [not found] ` <20040229210015.GA31888@redhat.com>
2004-03-01  2:10   ` Christophe TROESTLER [this message]
     [not found]     ` <20040301092842.GC7705@redhat.com>
     [not found]       ` <20040301.195138.25223321.Christophe.Troestler@umh.ac.be>
2004-03-01 19:20         ` Richard Jones
2004-03-01 19:32           ` Jeremy Chatfield

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20040301.031021.51509654.debian00@tiscali.be \
    --to=debian00@tiscali.be \
    --cc=caml-list@inria.fr \
    --cc=rich@annexia.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox