[Caml-list] optimizing numerical code

Mailing list for all users of the OCaml language and system.
 help / color / mirror / Atom feed

* [Caml-list] optimizing numerical code
@ 2011-05-18 18:35 Joel Reymont
  2011-05-18 18:50 ` Alain Frisch
  0 siblings, 1 reply; 7+ messages in thread
From: Joel Reymont @ 2011-05-18 18:35 UTC (permalink / raw)
  To: caml-list

Consider the following two functions that I'm trying to optimize.

Why is there an allocation for every iteration of the loop in divergence1 but not in divergence2?

What else can I do to make divergence2 faster, apart from using -unsafe?

	Thanks in advance, Joel

---

let js_divergence1 v1 v2 =
  let acc = ref 0. in
  for i = 0 to (Array.length v1) - 1 do
    let x = v1.(i)
    and y = v2.(i) in
    let m = 0.5 *. (x +. y) in
    let d1 = x *. log (x /. m) 
    and d2 = y *. log (y /. m) in
    acc := !acc +. d1 +. d2
  done;
  (!acc)

let js_divergence2 v1 v2 =
  let sum1 = ref 0.
  and sum2 = ref 0. in
  for i = 0 to (Array.length v1) - 1 do
    let x = v1.(i)
    and y = v2.(i) in
    let m = 2. /. (x +. y) in
    let d1 = x *. log (x *. m) 
    and d2 = y *. log (y *. m) in
    sum1 := !sum1 +. d1;
    sum2 := !sum2 +. d2;
  done;
  !sum1 +. !sum2

ocamlopt -dcmm

(function camlFoo__js_divergence1_1030 (v1/1031: addr v2/1032: addr)
 (let acc/1033 "camlFoo__3"
   (let (i/1034 1 bound/1066 (+ (or (>>u (load (+a v1/1031 -8)) 9) 1) -2))
     (catch
       (if (> i/1034 bound/1066) (exit 17)
         (loop
           (let
             (x/1067
                (seq (checkbound (>>u (load (+a v1/1031 -8)) 9) i/1034)
                  (load float64u (+a (+a v1/1031 (<< i/1034 2)) -4)))
              y/1068
                (seq (checkbound (>>u (load (+a v2/1032 -8)) 9) i/1034)
                  (load float64u (+a (+a v2/1032 (<< i/1034 2)) -4)))
              m/1069 (*f 0.5 (+f x/1067 y/1068))
              d1/1070 (*f x/1067 (extcall "log" (/f x/1067 m/1069) float))
              d2/1071 (*f y/1068 (extcall "log" (/f y/1068 m/1069) float)))
             (assign acc/1033
                       (alloc 1277
                         (+f (+f (load float64u acc/1033) d1/1070) d2/1071))))
           (let i/1065 i/1034 (assign i/1034 (+ i/1034 2))
             (if (== i/1065 bound/1066) (exit 17) []))))
     with(17) []))
   acc/1033))

(function camlFoo__js_divergence2_1040 (v1/1041: addr v2/1042: addr)
 (let (sum1/1055 0. sum2/1056 0.)
   (let (i/1045 1 bound/1058 (+ (or (>>u (load (+a v1/1041 -8)) 9) 1) -2))
     (catch
       (if (> i/1045 bound/1058) (exit 16)
         (loop
           (let
             (x/1059
                (seq (checkbound (>>u (load (+a v1/1041 -8)) 9) i/1045)
                  (load float64u (+a (+a v1/1041 (<< i/1045 2)) -4)))
              y/1060
                (seq (checkbound (>>u (load (+a v2/1042 -8)) 9) i/1045)
                  (load float64u (+a (+a v2/1042 (<< i/1045 2)) -4)))
              m/1061 (/f 2. (+f x/1059 y/1060))
              d1/1062 (*f x/1059 (extcall "log" (*f x/1059 m/1061) float))
              d2/1063 (*f y/1060 (extcall "log" (*f y/1060 m/1061) float)))
             (assign sum1/1055 (+f sum1/1055 d1/1062))
             (assign sum2/1056 (+f sum2/1056 d2/1063)))
           (let i/1057 i/1045 (assign i/1045 (+ i/1045 2))
             (if (== i/1057 bound/1058) (exit 16) []))))
     with(16) []))
   (alloc 1277 (+f sum1/1055 sum2/1056))))

--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------





^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] optimizing numerical code
  2011-05-18 18:35 [Caml-list] optimizing numerical code Joel Reymont
@ 2011-05-18 18:50 ` Alain Frisch
  2011-05-19  8:24   ` Alain Frisch
  0 siblings, 1 reply; 7+ messages in thread
From: Alain Frisch @ 2011-05-18 18:50 UTC (permalink / raw)
  To: Joel Reymont; +Cc: caml-list

On 5/18/2011 8:35 PM, Joel Reymont wrote:
> Consider the following two functions that I'm trying to optimize.
>
> Why is there an allocation for every iteration of the loop in divergence1 but not in divergence2?

First, note that the compiler does indeed unbox the reference cell acc 
as a local variable. It turns out that the float contained in the 
reference is not itself unboxed. This is due to the heuristic used by 
ocamlopt to unbox floats. Currently, a float variable is unboxed only if 
all the places where its value is used are unboxing contexts. A way to 
force unboxing in divergence1 is to replace (!acc) with (!acc +. 0) at 
the end. Without this change, the compilers can find a use of !acc as a 
boxed float (because this is what the function needs to return) and so 
it decides not to unbox acc at all.

See also:
http://caml.inria.fr/mantis/view.php?id=5204

(The more_unboxing branch in the OCaml SVN implements a different 
unboxing strategy.)

-- Alain

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] optimizing numerical code
  2011-05-18 18:50 ` Alain Frisch
@ 2011-05-19  8:24   ` Alain Frisch
  2011-05-19  8:37     ` Joel Reymont
  0 siblings, 1 reply; 7+ messages in thread
From: Alain Frisch @ 2011-05-19  8:24 UTC (permalink / raw)
  To: Joel Reymont; +Cc: caml-list

On 05/18/2011 08:50 PM, Alain Frisch wrote:
> On 5/18/2011 8:35 PM, Joel Reymont wrote:
>> Consider the following two functions that I'm trying to optimize.
>>
>> Why is there an allocation for every iteration of the loop in
>> divergence1 but not in divergence2?
>
> First, note that the compiler does indeed unbox the reference cell acc
> as a local variable. It turns out that the float contained in the
> reference is not itself unboxed. This is due to the heuristic used by
> ocamlopt to unbox floats. Currently, a float variable is unboxed only if
> all the places where its value is used are unboxing contexts. A way to
> force unboxing in divergence1 is to replace (!acc) with (!acc +. 0) at
> the end. Without this change, the compilers can find a use of !acc as a
> boxed float (because this is what the function needs to return) and so
> it decides not to unbox acc at all.
>
>
> See also:
> http://caml.inria.fr/mantis/view.php?id=5204
>
> (The more_unboxing branch in the OCaml SVN implements a different
> unboxing strategy.)

Actually, even in this branch, acc would not be unboxed. The new 
heuristic unboxes float variables unless they are both needed in boxed 
form AND mutated, which is the case for acc in your function.

-- Alain

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] optimizing numerical code
  2011-05-19  8:24   ` Alain Frisch
@ 2011-05-19  8:37     ` Joel Reymont
  2011-05-19 10:59       ` Alain Frisch
  0 siblings, 1 reply; 7+ messages in thread
From: Joel Reymont @ 2011-05-19  8:37 UTC (permalink / raw)
  To: Alain Frisch; +Cc: caml-list


On May 19, 2011, at 10:24 AM, Alain Frisch wrote:

> Actually, even in this branch, acc would not be unboxed. The new heuristic unboxes float variables unless they are both needed in boxed form AND mutated, which is the case for acc in your function.

Shouldn't it be unboxed, though?

--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------





^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] optimizing numerical code
  2011-05-19  8:37     ` Joel Reymont
@ 2011-05-19 10:59       ` Alain Frisch
  2011-05-19 12:40         ` Gerd Stolpmann
  0 siblings, 1 reply; 7+ messages in thread
From: Alain Frisch @ 2011-05-19 10:59 UTC (permalink / raw)
  To: Joel Reymont; +Cc: caml-list

On 05/19/2011 10:37 AM, Joel Reymont wrote:
>
> On May 19, 2011, at 10:24 AM, Alain Frisch wrote:
>
>> Actually, even in this branch, acc would not be unboxed. The new heuristic unboxes float variables unless they are both needed in boxed form AND mutated, which is the case for acc in your function.
>
> Shouldn't it be unboxed, though?

Yes, in this case, it would make a lot of sense to work with an unboxed 
variable and only box it when needed. But finding a good heuristic is 
not trivial. Always applying this approach of "on-demand" boxing might 
have drawbacks: more allocations (if the same value is used several 
times in boxed form) and less sharing of allocations (the compiler 
combines adjacent allocations to avoid some overhead). In the example, 
it is easy to find out that the variable is used only once in boxed form 
(in a context executed at most once).

Here is another "lazy" approach that could be worth trying. Consider a 
float variable x which is assigned and used both in boxed and unboxed 
form in the same function. Internally, we keep two variants of it: 
x_boxed and x_unboxed. When x is assigned the result of some float 
expression e:

- if e can be computed directly in unboxed form (e.g. it is the result 
of some numerical operation), assign x_unboxed only;

- if e comes in boxed form (e.g. it is the result of some function 
call), assign x_boxed and x_unboxed (by dereferencing x_boxed);

When x is used an an unboxing context, use x_unboxed. When x is used as 
a boxed value (e.g. as an argument to a function call, or as the return 
value of the current function), check (at runtime) if the content of 
x_boxed is equal to x_unboxed; if yes, return x_boxed; otherwise, box 
x_unboxed, store the new block in x_boxed and returns it.

This scheme guarantees that at most one allocation happens between two 
successive assignment to the variable, and that no allocation happens if 
it turns out that the current value is never used in boxed form. This 
comes at the price of some extra equality checks between floats and 
conditional jumps, but it might be worth trying.

-- Alain

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] optimizing numerical code
  2011-05-19 10:59       ` Alain Frisch
@ 2011-05-19 12:40         ` Gerd Stolpmann
  2011-06-09  9:02           ` Alain Frisch
  0 siblings, 1 reply; 7+ messages in thread
From: Gerd Stolpmann @ 2011-05-19 12:40 UTC (permalink / raw)
  To: Alain Frisch; +Cc: Joel Reymont, caml-list

Am Donnerstag, den 19.05.2011, 12:59 +0200 schrieb Alain Frisch:
> On 05/19/2011 10:37 AM, Joel Reymont wrote:
> >
> > On May 19, 2011, at 10:24 AM, Alain Frisch wrote:
> >
> >> Actually, even in this branch, acc would not be unboxed. The new heuristic unboxes float variables unless they are both needed in boxed form AND mutated, which is the case for acc in your function.
> >
> > Shouldn't it be unboxed, though?
> 
> Yes, in this case, it would make a lot of sense to work with an unboxed 
> variable and only box it when needed. But finding a good heuristic is 
> not trivial. Always applying this approach of "on-demand" boxing might 
> have drawbacks: more allocations (if the same value is used several 
> times in boxed form) and less sharing of allocations (the compiler 
> combines adjacent allocations to avoid some overhead). In the example, 
> it is easy to find out that the variable is used only once in boxed form 
> (in a context executed at most once).

Would it make sense to let the developer help when deciding which
occurrences of a variable are important and which not? I'm thinking here
of a function

let rare = identity

with the additional pragma that the argument of rare is considered as an
expression that is rarely evaluated. In our example:

let js_divergence1 v1 v2 =
  let acc = ref 0. in
  for i = 0 to (Array.length v1) - 1 do
    let x = v1.(i)
    and y = v2.(i) in
    let m = 0.5 *. (x +. y) in
    let d1 = x *. log (x /. m) 
    and d2 = y *. log (y /. m) in
    acc := !acc +. d1 +. d2
  done;
  rare (!acc)              (* HERE *)

For the check whether a float can be unboxed, rare occurrences are
simply ignored. Maybe it is also helpful for other decisions, e.g.
register allocation.

> Here is another "lazy" approach that could be worth trying. 

In some sense, this goes into the same direction, only that rare-ness is
determined at runtime.

Gerd

> Consider a 
> float variable x which is assigned and used both in boxed and unboxed 
> form in the same function. Internally, we keep two variants of it: 
> x_boxed and x_unboxed. When x is assigned the result of some float 
> expression e:
> 
> - if e can be computed directly in unboxed form (e.g. it is the result 
> of some numerical operation), assign x_unboxed only;
> 
> - if e comes in boxed form (e.g. it is the result of some function 
> call), assign x_boxed and x_unboxed (by dereferencing x_boxed);
> 
> 
> When x is used an an unboxing context, use x_unboxed. When x is used as 
> a boxed value (e.g. as an argument to a function call, or as the return 
> value of the current function), check (at runtime) if the content of 
> x_boxed is equal to x_unboxed; if yes, return x_boxed; otherwise, box 
> x_unboxed, store the new block in x_boxed and returns it.
> 
> This scheme guarantees that at most one allocation happens between two 
> successive assignment to the variable, and that no allocation happens if 
> it turns out that the current value is never used in boxed form. This 
> comes at the price of some extra equality checks between floats and 
> conditional jumps, but it might be worth trying.
> 
> 
> -- Alain
> 


-- 
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] optimizing numerical code
  2011-05-19 12:40         ` Gerd Stolpmann
@ 2011-06-09  9:02           ` Alain Frisch
  0 siblings, 0 replies; 7+ messages in thread
From: Alain Frisch @ 2011-06-09  9:02 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: Joel Reymont, caml-list

On 05/19/2011 02:40 PM, Gerd Stolpmann wrote:
> Would it make sense to let the developer help when deciding which
> occurrences of a variable are important and which not? I'm thinking here
> of a function
>
> let rare = identity
>
> with the additional pragma that the argument of rare is considered as an
> expression that is rarely evaluated. In our example:
>
> let js_divergence1 v1 v2 =
>    let acc = ref 0. in
>    for i = 0 to (Array.length v1) - 1 do
>      let x = v1.(i)
>      and y = v2.(i) in
>      let m = 0.5 *. (x +. y) in
>      let d1 = x *. log (x /. m)
>      and d2 = y *. log (y /. m) in
>      acc := !acc +. d1 +. d2
>    done;
>    rare (!acc)              (* HERE *)
>
> For the check whether a float can be unboxed, rare occurrences are
> simply ignored. Maybe it is also helpful for other decisions, e.g.
> register allocation.
>
>> Here is another "lazy" approach that could be worth trying.
>
> In some sense, this goes into the same direction, only that rare-ness is
> determined at runtime.

I think it's good to explore more efficient "generic" compilation 
schemes first before introducing pragma to drive optimization heuristics 
(the choice of pragma would anyway depend on the generic compilation 
scheme).

For instance, it could very well be the case that a compilation scheme 
which guarantees that "float" variables are always represented in 
unboxed form within a function's body would improve performance a lot 
in most common cases and would already be good enough. It remains to 
decide what to do when the variable needs to be accessed in "boxed" form 
as well (e.g. when the float is passed to a function which cannot be 
inlined). Possible choices:

  - Always create a new boxed value at the place the boxed version is 
needed.

  - Variant: keep a cache of the boxed value; check that the cache is 
up-to-date when the boxed version is needed (no overhead for 
assignment). (Optionally, if the assigned value comes in boxed form, 
e.g. as the result of a function call, keep that as the new value for 
the cache.)

This can also be combined with a simple analysis that simplifies e.g. 
cases where the variable is assigned and then necessarily used in boxed 
form (we can then box early and avoid extra checks when the boxed value 
is needed).

All these variants and optimizations are rather straightforward to 
implement and I suspect they would give very good results. The only 
tricky point is actually to detect "float" variables at the level of the 
"Cmm" intermediate language. Currently, ocamlopt only performs inlining 
when all uses of the variable are in "float unboxing" contexts. The 
"more_unboxing" branch assumes that a variable is a float
when it is accessed at least once in a "float unboxing" context. 
Unfortunately, this is unsafe in presence of GADTs (because a variable 
can then have different types in different pattern matching branches 
within the same function body). A clean solution would be to propagate 
some (limited) type information from higher-level intermediate languages 
down to the cmm level. This require some work but is not very difficult 
(and we need a very limited form of type information here).

Btw, the same kind of unboxing would also be useful for bigarrays. One 
could "unbox" the underlying data pointer and thus avoid some overhead 
when accessing individual cells of the bigarray.

Alain

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2011-06-09  9:02 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-18 18:35 [Caml-list] optimizing numerical code Joel Reymont
2011-05-18 18:50 ` Alain Frisch
2011-05-19  8:24   ` Alain Frisch
2011-05-19  8:37     ` Joel Reymont
2011-05-19 10:59       ` Alain Frisch
2011-05-19 12:40         ` Gerd Stolpmann
2011-06-09  9:02           ` Alain Frisch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox