Mailing list for all users of the OCaml language and system.
 help / color / mirror / Atom feed
From: Pierre Weis <pierre.weis@inria.fr>
To: frisch@clipper.ens.fr (Alain Frisch)
Cc: pierre.weis@inria.fr, caml-list@inria.fr
Subject: Re: [Caml-list] Num library
Date: Fri, 11 Oct 2002 22:01:07 +0200 (MET DST)	[thread overview]
Message-ID: <200210112001.WAA15870@pauillac.inria.fr> (raw)
In-Reply-To: <Pine.SOL.4.44.0210102047030.7956-100000@clipper.ens.fr> from Alain Frisch at "Oct 10, 102 08:53:57 pm"

> On Thu, 10 Oct 2002, Pierre Weis wrote:
> 
> > Once more, just one question to you: which layer of the Num library
> > had you benchmarked ?
> 
> Mmmh, I did not intend to launch a new flamewar here ...
> 
> Maybe you can give some information yourself about the library. If I want
> to manipulate only integers (no rational numbers), most of which will be
> small integers, is it better to use Num or Big_int ?  Would Num benefit to
> be specialized to (small+big) integers (that is, throwing away the Ratio
> constructor) ?  Because int values and Big_int values can be distinguished
> at runtime, one could imagine implementing the union "by hand" (without
> explicit tagging, and thus avoiding a lot of memory allocation and
> garbage collection). What would be the gain ?

The Num library is fairly sophisticated and fairly well integrated
into the Caml language. We still need some lexical additions to be
quite confortable to input numbers, but still it is a really mature
and useful library.

Now, some quick info to understand why it is so difficult to bench
unubounded precision arithmetic packages.

Our Num library has 4 arithmetic layers:

- nat: positive integers. Allocation is explicit, operations are pure
side effects (result is stored in operands).

- big_int: signed integers. Allocation is implicit, operations are
functional.

- ratio: rational numbers. Allocation is implicit, operations are
functional, but normalization is a side-effect. Normalization is
either automatic or explicitely required by the user. You have to know
that this sophisticated normalization discipline is just mandatory to
obtain good performance with this layer: just changing the
normalization regime of rational arithmetic primitives can have an
exponential impact on the runtime (either systematic normalization or
no normalization at all may exponentiallly win or loose depending on
the computation at hand!)

- num: the higer layer. Numbers are small integers, big integers, or
rational numbers. Necessary coercion between those types are handled
automatically by the library.

The num layer is the more convenient to write programs. The nat layer
is the most efficient if you just need integers. The big_int layer is
generally a bit faster that the num layer, but if you want to be
really efficient you need to use the nat layer. The programmation in
the nat layer is a bit difficult, but the efficiency is (normally)
what you gain: you end with no garbage collection at all, the numbers
are allocated from the beginning of the computation with the right
size to contain the final result, the operations are minimized (both
in the size of operands and in the number of operations required to
obtain the final result). We use to say that nat programming was the
algorithmic proof level: you must state and prove your invariants
while programming; a very profitable and refreshing exercice.

Also the Num library is optimized for computation, not for
printing. The idea being that normally you compute a lot and just
print the final result. If your bench is just printing a lot of big
numbers that are easily obtained with almost no computation (for
instance factorial numbers), you will think the Num library to be very
slow. However, we know that in the Num library the printing of big
numbers is too slow and has to be revisited to go faster.

As for the suggested improvement on the num type ``by hand'', some
Caml compilers were (once) able to automatically do the optimization
you just described (by the way not only for small ints but for the 3
constructors of the type). As you may suspect, the gain depends a lot
on the application at hand. In some cases, the speedup was impressive
(I remember a matrices package that almost never use big numbers,
except in case of ill-conditioned problems: the improvement of num
arithmetic versus big_int arithmetic was impressive). However, you
cannot have the same arithmetic efficiency with small integers
represented as num values since you need to check for potential
overflows for each operation (meaning at least 5 to 10 instructions
instead of 1).

Also this optimization does not fit very well with the pattern
matching compiler technology of the current compiler. In addition, it
is not clear that it will be a big win in general (I mean out of the
big number arithmetic domain, where it is clearly a win).

> General question: the library seems to be extremely stable since a several
> years. Does it mean you consider that it is just good as it is, or does it
> mean you don't want to continue working on it  ?

Yes, the library is extremely stable, and this is an extremely
desirable property: bugs are also extremely well chased.

Concerning the development of the library, I once wrote a new
implementation of the library from scratch. Unfortunately, I stopped
when facing the problem of fast printing which is not so trivial. We
may also consider to reimplement the Num library on top of another big
numbers implementation (GMP or Numerix, if only some volonteers added
assembly code for as many processors as necessary to run on all
the platforms where Objective Caml is available).

Best regards,

Pierre Weis

INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://pauillac.inria.fr/~weis/


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


  reply	other threads:[~2002-10-11 20:01 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-10-10  8:42 Alain Frisch
2002-10-10  9:42 ` sebastien FURIC
2002-10-10 14:56   ` Yaron M. Minsky
2002-10-10 17:14     ` Pierre Weis
2002-10-10 18:53       ` Alain Frisch
2002-10-11 20:01         ` Pierre Weis [this message]
2002-10-10 19:45       ` Yaron M. Minsky
2002-10-10 17:08   ` Pierre Weis
2002-10-11  9:26     ` Sebastien Furic
2002-10-11 20:17       ` Pierre Weis
2002-10-11 10:22     ` Alessandro Baretta
2002-10-11 13:23 ` Claude Marche
2002-10-11 16:14   ` Sebastien Furic
2002-10-11 14:08 ` Jean-Christophe Filliatre
2002-10-11 18:35   ` "custom" operators in caml (was: Re: [Caml-list] Num library) Chris Hecker
2002-10-11 20:30   ` [Caml-list] Num library Pierre Weis

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200210112001.WAA15870@pauillac.inria.fr \
    --to=pierre.weis@inria.fr \
    --cc=caml-list@inria.fr \
    --cc=frisch@clipper.ens.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox