From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id JAA10491 for caml-redistribution; Thu, 7 Oct 1999 09:59:25 +0200 (MET DST) Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id CAA17731 for ; Thu, 7 Oct 1999 02:19:56 +0200 (MET DST) Received: from beach.frankfurt.netsurf.de (beach.frankfurt.netsurf.de [194.64.181.2]) by nez-perce.inria.fr (8.8.7/8.8.7) with ESMTP id CAA20076 for ; Thu, 7 Oct 1999 02:19:54 +0200 (MET DST) Received: from ice.darmstadt.netsurf.de (board-96.darmstadt.netsurf.de [194.163.86.224]) by beach.frankfurt.netsurf.de (8.8.5/8.8.5) with ESMTP id CAA26254; Thu, 7 Oct 1999 02:19:53 +0200 (MET DST) Received: from localhost (localhost [[UNIX: localhost]]) by ice.darmstadt.netsurf.de (8.9.3/8.9.3) id CAA21664; Thu, 7 Oct 1999 02:16:33 +0200 From: Gerd Stolpmann Reply-To: Gerd.Stolpmann@darmstadt.netsurf.de Organization: privat To: William Chesters Subject: Re: speed versus C Date: Thu, 7 Oct 1999 00:49:10 +0200 X-Mailer: KMail [version 1.0.21] Content-Type: text/plain References: <199910061521.QAA02123@toy.william.bogus> Cc: caml-list@inria.fr MIME-Version: 1.0 Message-Id: <99100702163301.21475@ice> Content-Transfer-Encoding: 8bit Sender: weis On Wed, 06 Oct 1999, William Chesters wrote: >Gerd Stolpmann writes: > > - The recursive data types of Ocaml are often more problem-oriented than > > "imperative" data structures. Consider you want to fill an (C) array with an > > unknown number of elements. The typical solution puts the elements into the > > array until it is full, and then enlarges the array by reallocating new > > memory. If you use the Ocaml lists instead, you do not have this problem. > > But you do have N little problems (N memory allocations and >structure initialisations); you waste large amounts of memory in >headers and pointers, thereby filling up your caches; and you end up >with a data structure which takes at just as long to traverse and >entirely fails to support any other kind of access. If you do the >sums, you will find that an array-based data structure (`Vector') >works out cheaper any way you care to look at it---as long as you >increase its capacity exponentially, e.g. doubling it each time it >runs out. This stipulation is not a problem because the space you >thereby waste is still less than the amount you would lose to the >spine of a list (and is less pernicious since it doesn't have to churn >through the cache). The basis of your argumentation is that the array solution works better with current hardware and typical problem sizes. For example, memcpy is very fast because it is specially supported by the hardware, much faster than initializing the same number of elements of a list. This is absolutely true, but I think that it only demonstrates that my example was too simple. Caml can only be faster than C if you take into account that programmers are not perfect. In C, they often do not choose the most efficient data structure because it would be complicated and/or require an enormous programming discipline. An example are C strings which are far from being optimal, and the alternative would be to use a library which implements strings with a length field. In order to be compatible with the existing kind of strings these improved strings are typically not abstract, and simply define a struct such that you can anywhere access their components: just another source of errors. There are a lot more problems with sophisticated data structures, such as uninitialized pointers, the question who is responsible for memory deallocation, and indexes out of the bounds of arrays. Some of these problems can be solved by additional layers, but then C is not faster any longer. This is always the same bad alternative in C: Write an efficient program and accept the risk of errors, or write an average program and have some more protection against your own imperfectness. In Caml, choosing sophisticated data structures is less error-prone, and programmers are more willing to implement and use them. Even the basic data types (i.e. without abstraction layers) provide often a good protection against stupid faults, and you can use them directly without wrapping them in functions or modules. This simply means that a comparable programming style which tries to avoid errors leads to faster code in Caml than in C, because you do not need to encapsulate the details of accessing the structure. For example, list construction x::l leads to very efficient code in Caml, and the comparable implementation in C requires an additional function abstraction which guarantees that the new list cell is properly initialized. I know that there are some hackers who can live without abstraction but the avarage programmer is happy if he/she has a means to avoid unnecessary errors. My first example looks now far better, because you have forgotten to count the function calls to hide the array implementation. In Caml, they are not necessary. > Incidentally, implementing a Vector in ocaml is slightly fiddly, >because you have to keep valid pointers of the right type in even the >unused part all the time. This means o.a. delaying the creation of >the underlying array until you have at least one element to put in it. Caml is simply not good at arrays. I think it is not a good idea to adopt too much imperative style. Gerd -- ---------------------------------------------------------------------------- Gerd Stolpmann Telefon: +49 6151 997705 (privat) Viktoriastr. 100 64293 Darmstadt EMail: Gerd.Stolpmann@darmstadt.netsurf.de (privat) Germany ----------------------------------------------------------------------------