Mailing list for all users of the OCaml language and system.
From: John Prevost <j.prevost@gmail.com>
To: caml-list@yquem.inria.fr, Hal Daume III <hdaume@isi.edu>
Subject: Re: [Caml-list] bigarrays much lower than normal ones
Date: Sun, 31 Oct 2004 12:26:34 -0500
Message-ID: <d849ad2a0410310926466aae40@mail.gmail.com>
In-Reply-To: <Pine.LNX.4.44.0410310750180.22156-100000@albini.isi.edu>

On Sun, 31 Oct 2004 08:05:46 -0800 (PST), Hal Daume III <hdaume@isi.edu> wrote:
> I've been hitting the limiting size of normal float arrays and was having
> a look at the Bigarray module.  Unfortunately, it seems roughly 3-4 times
> *slower* than the standard array, which is pretty much unacceptable for
> me.  Am I doing something naively wrong, or are the Bigarrays truly this
> slow?

It is indeed possible to speed things up by quite a lot.  There are a
few factors at play that make your code slower than it has to be, but
the dominant factor is that your Bigarray version of normalize is much
more polymorphic than it needs to be:

val normalize : (float, 'a, 'b) Bigarray.Array1.t -> unit = <fun>

Compared to this for the normal array version:

val normalize : float array -> unit = <fun>

In the Array version, there's only one type parameter to worry about
at compile time--what goes into the array.  That defines everything
you need to know.  This is important because the compiler optimizes
accesses to arrays of floating point numbers when it knows the element
type statically.  When a function is polymorphic, on the other hand,
the compiler has to generate more generic code.

You noted that float32 was slower than float64: That's because
OCaml's native float representation is always a 64-bit value, so every
float32 access involves a conversion.  In the polymorphic version of
normalize, the code also has to figure out at run time whether it's
working with a float32 or a float64 representation when it pulls the
values out.  The other type variable, which defines the array layout
(C or Fortran), needs to be pinned down as well to avoid over-generic
code.
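
To make the difference concrete, here is a minimal sketch (not taken
from the original code) of the same loop written twice: once with the
element kind and layout left as type variables, and once with both
pinned down.  The function names sum_generic and sum_f64 are made up
for illustration.  Both return the same result; only the code the
compiler can generate for the element accesses differs.

```ocaml
open Bigarray

(* Element kind and layout are left open ('k and 'l), so the inferred
   type stays polymorphic and each access uses the generic path.
   Indexing from 0 assumes the caller passes a C-layout array. *)
let sum_generic (a : (float, 'k, 'l) Array1.t) =
  let s = ref 0. in
  for i = 0 to Array1.dim a - 1 do
    s := !s +. Array1.get a i
  done;
  !s

(* Same loop, but the annotation fixes the kind to float64_elt and the
   layout to c_layout, so the accesses can be specialized. *)
let sum_f64 (a : (float, float64_elt, c_layout) Array1.t) =
  let s = ref 0. in
  for i = 0 to Array1.dim a - 1 do
    s := !s +. Array1.get a i
  done;
  !s

let () =
  let a = Array1.create float64 c_layout 4 in
  Array1.fill a 0.25;
  Printf.printf "%g %g\n" (sum_generic a) (sum_f64 a)
```

How large the gap is depends on the compiler version and flags, so it
is worth timing both on your own machine.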

I tried some other modifications, trying to remove overhead from
bounds checking--but it turns out that those modifications actually
slowed things down.  :)  In any case, the version with restricted
polymorphism on normalize sped things up a *lot*.
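
The bounds-checking modifications mentioned above are not shown in
this message.  As an assumption about what such a change could look
like, one common approach is to replace Array1.get with
Array1.unsafe_get, which skips the index check.  The sketch below
uses made-up names (vec, sum_checked, sum_unchecked) and only shows
the two forms side by side; it makes no claim about which is faster.

```ocaml
open Bigarray

(* Hypothetical alias for the monomorphic array type used here. *)
type vec = (float, float64_elt, c_layout) Array1.t

(* Bounds-checked accesses: an out-of-range index raises
   Invalid_argument. *)
let sum_checked (a : vec) =
  let s = ref 0. in
  for i = 0 to Array1.dim a - 1 do
    s := !s +. Array1.get a i
  done;
  !s

(* Unchecked accesses: the loop bounds must guarantee
   0 <= i < Array1.dim a, otherwise behaviour is undefined. *)
let sum_unchecked (a : vec) =
  let s = ref 0. in
  for i = 0 to Array1.dim a - 1 do
    s := !s +. Array1.unsafe_get a i
  done;
  !s

let () =
  let a = Array1.create float64 c_layout 8 in
  Array1.fill a 0.5;
  Printf.printf "%g %g\n" (sum_checked a) (sum_unchecked a)
```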

Unmodified Array:
real    1m30.292s
user    1m30.190s
sys     0m0.110s

Unmodified Bigarray:
real    3m31.446s
user    3m31.310s
sys     0m0.130s

Modified Bigarray (restricted polymorphism):
real    1m37.916s
user    1m37.810s
sys     0m0.120s

------------
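open Bigarray

(* Pinning the element kind (float64_elt) and the layout (c_layout) in
   this annotation is the only change from the original version. *)
let normalize
 (a : (float, Bigarray.float64_elt, Bigarray.c_layout) Bigarray.Array1.t) =
 let _N = Array1.dim a in
 (* Tail-recursive sum of all elements. *)
 let rec sum n acc =
   if n >= _N then acc
   else sum (n+1) (acc +. Array1.get a n) in
 let s = sum 0 0. in
   (* Divide every element by the total, in place. *)
   for i = 0 to _N - 1 do
     Array1.set a i (Array1.get a i /. s);
   done;
   ()

(* Benchmark driver: refill a 1,000,000-element array and normalize
   it, 100 times over. *)
let _ =
 let a = Array1.create float64 c_layout 1000000 in
   for iter = 1 to 100 do
     for i = 0 to 999999 do
       let i' = float_of_int i in
         Array1.set a i (log (0.01 *. i' *. i' +. 3. *. i' +. 4.));
     done;
     normalize a;
   done;
   ()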
------------

You see that the one thing I changed here was to add the type
constraint in the definition of normalize, and it became almost as
fast as the normal array version.

The other thing I'll point out is that you can write:

Array1.set a i x; Array1.get a i

as

a.{i} <- x; a.{i}

This can be quite a bit easier to read.  If I recall right, it
works for arrays of more than one dimension as well, with the indices
separated by commas (a.{i,j}).  I can't seem to find the
documentation for this feature, however.
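
For what it's worth, the a.{i} notation is described alongside the
get and set functions in the Bigarray module documentation, and the
multi-dimensional form takes comma-separated indices.  A small
sketch, with made-up values v and m:

```ocaml
open Bigarray

(* A 3-element vector and a 2x3 matrix of 64-bit floats, C layout. *)
let v = Array1.create float64 c_layout 3
let m = Array2.create float64 c_layout 2 3

let () =
  Array1.fill v 0.;
  Array2.fill m 0.;
  v.{0} <- 1.5;               (* same as Array1.set v 0 1.5 *)
  m.{1, 2} <- v.{0} *. 2.;    (* same as Array2.set m 1 2 (...) *)
  Printf.printf "%g %g\n" v.{0} m.{1, 2}
```

Array3 values take three indices in the same way, as in a.{i, j, k}.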

John.

