From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id JAA10491 for caml-redistribution; Thu, 7 Oct 1999 09:59:25 +0200 (MET DST)
Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id CAA17731 for <caml-list@pauillac.inria.fr>; Thu, 7 Oct 1999 02:19:56 +0200 (MET DST)
Received: from beach.frankfurt.netsurf.de (beach.frankfurt.netsurf.de [194.64.181.2])
	by nez-perce.inria.fr (8.8.7/8.8.7) with ESMTP id CAA20076
	for <caml-list@inria.fr>; Thu, 7 Oct 1999 02:19:54 +0200 (MET DST)
Received: from ice.darmstadt.netsurf.de (board-96.darmstadt.netsurf.de [194.163.86.224])
	by beach.frankfurt.netsurf.de (8.8.5/8.8.5) with ESMTP id CAA26254;
	Thu, 7 Oct 1999 02:19:53 +0200 (MET DST)
Received: from localhost (localhost [[UNIX: localhost]])
	by ice.darmstadt.netsurf.de (8.9.3/8.9.3) id CAA21664;
	Thu, 7 Oct 1999 02:16:33 +0200
From: Gerd Stolpmann <Gerd.Stolpmann@darmstadt.netsurf.de>
Reply-To: Gerd.Stolpmann@darmstadt.netsurf.de
Organization: privat
To: William Chesters <williamc@dai.ed.ac.uk>
Subject: Re: speed versus C
Date: Thu, 7 Oct 1999 00:49:10 +0200
X-Mailer: KMail [version 1.0.21]
Content-Type: text/plain
References: <199910061521.QAA02123@toy.william.bogus>
Cc: caml-list@inria.fr
MIME-Version: 1.0
Message-Id: <99100702163301.21475@ice>
Content-Transfer-Encoding: 8bit
Sender: weis

On Wed, 06 Oct 1999, William Chesters wrote:
>Gerd Stolpmann writes:
> > - The recursive data types of Ocaml are often more problem-oriented than
> >   "imperative" data structures. Consider you want to fill an (C) array with an
> >   unknown number of elements. The typical solution puts the elements into the
> >   array until it is full, and then enlarges the array by reallocating new
> >   memory. If you use the Ocaml lists instead, you do not have this problem.
>
>   But you do have N little problems (N memory allocations and
>structure initialisations); you waste large amounts of memory in
>headers and pointers, thereby filling up your caches; and you end up
>with a data structure which takes at just as long to traverse and
>entirely fails to support any other kind of access.  If you do the
>sums, you will find that an array-based data structure (`Vector')
>works out cheaper any way you care to look at it---as long as you
>increase its capacity exponentially, e.g. doubling it each time it
>runs out.  This stipulation is not a problem because the space you
>thereby waste is still less than the amount you would lose to the
>spine of a list (and is less pernicious since it doesn't have to churn
>through the cache).

The basis of your argumentation is that the array solution works better with
current hardware and typical problem sizes. For example, memcpy is very fast
because it is specially supported by the hardware, much faster than
initializing the same number of elements of a list. This is absolutely true,
but I think that it only demonstrates that my example was too simple.

Caml can only be faster than C if you take into account that programmers are not
perfect. In C, they often do not choose the most efficient data structure
because it would be complicated and/or require an enormous programming
discipline. An example are C strings which are far from being optimal, and the
alternative would be to use a library which implements strings with a length
field. In order to be compatible with the existing kind of strings these
improved strings are typically not abstract, and simply define a struct such
that you can anywhere access their components: just another source of errors.
There are a lot more problems with sophisticated data structures, such as
uninitialized pointers, the question who is responsible for memory deallocation,
and indexes out of the bounds of arrays. Some of these problems can be solved
by additional layers, but then C is not faster any longer. This is always the
same bad alternative in C: Write an efficient program and accept the risk of
errors, or write an average program and have some more protection against your
own imperfectness.

In Caml, choosing sophisticated data structures is less error-prone, and
programmers are more willing to implement and use them. Even the basic data
types (i.e. without abstraction layers) provide often a good protection against
stupid faults, and you can use them directly without wrapping them in functions
or modules. This simply means that a comparable programming style which tries
to avoid errors leads to faster code in Caml than in C, because you do not need
to encapsulate the details of accessing the structure. For example, list
construction x::l leads to very efficient code in Caml, and the comparable
implementation in C requires an additional function abstraction which
guarantees that the new list cell is properly initialized.

I know that there are some hackers who can live without abstraction but the
avarage programmer is happy if he/she has a means to avoid unnecessary errors.

My first example looks now far better, because you have forgotten to count the
function calls to hide the array implementation. In Caml, they are not
necessary.

>   Incidentally, implementing a Vector in ocaml is slightly fiddly,
>because you have to keep valid pointers of the right type in even the
>unused part all the time.  This means o.a. delaying the creation of
>the underlying array until you have at least one element to put in it.

Caml is simply not good at arrays. I think it is not a good idea to adopt too
much imperative style.

Gerd

--
----------------------------------------------------------------------------
Gerd Stolpmann      Telefon: +49 6151 997705 (privat)
Viktoriastr. 100             
64293 Darmstadt     EMail:   Gerd.Stolpmann@darmstadt.netsurf.de (privat)
Germany                     
----------------------------------------------------------------------------