From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ober.14@osu.edu>
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr
X-Spam-Level: 
X-Spam-Status: No, score=0.4 required=5.0 tests=AWL,DNS_FROM_RFC_ABUSE 
	autolearn=disabled version=3.1.3
Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82])
	by yquem.inria.fr (Postfix) with ESMTP id D878DBC6B
	for <caml-list@yquem.inria.fr>; Mon,  7 Jan 2008 14:48:43 +0100 (CET)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Aq4HAGi8gUdDWxLC/2dsb2JhbACBWKYH
X-IronPort-AV: E=Sophos;i="4.24,253,1196636400"; 
   d="scan'208";a="6363867"
Received: from ip67-91-18-194.z18-91-67.customer.algx.net (HELO server1.bertec.net) ([67.91.18.194])
  by mail1-smtp-roc.national.inria.fr with ESMTP; 07 Jan 2008 14:48:43 +0100
Received: from kuba.bertec.net (kuba.bertec.net [192.168.2.16])
	by server1.bertec.net (Postfix) with ESMTP id E7B2A105830
	for <caml-list@yquem.inria.fr>; Mon,  7 Jan 2008 08:48:41 -0500 (EST)
From: Kuba Ober <ober.14@osu.edu>
To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Performance questions, -inline, ...
Date: Mon, 7 Jan 2008 08:48:40 -0500
User-Agent: KMail/1.9.6 (enterprise 0.20071123.740460)
References: <200801031128.30183.ober.14@osu.edu> <200801051936.23521.jon@ffconsultancy.com>
In-Reply-To: <200801051936.23521.jon@ffconsultancy.com>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200801070848.40809.ober.14@osu.edu>
X-Spam: no; 0.00; -inline:01 ocaml:01 hofs:01 polymorphism:01 avoided:01 ocaml:01 rewrites:01 gcc:01 ocamlopt:01 -inline:01 -unsafe:01 command-line:01 -unsafe:01 vectors:01 const:01 

On Saturday 05 January 2008, Jon Harrop wrote:
> Optimizing numerical code is discussed in detail in my book OCaml for
> Scientists. You may also be interested in a very similar thread where I
> optimized someone's almost identical code on the beginners list recently.
> There is also this relevant blog post of mine:
>
>   http://ocamlnews.blogspot.com/2007/09/spectral-norm.html
>
> Essentially, your benchmark has rediscovered the fact that abstractions
> (HOFs, polymorphism etc.) are prohibitively slow for high-performance
> numerics and must be avoided.

In the case of the particular OCaml implementation - yes. In general - no. I 
hope we agree here.

> On Thursday 03 January 2008 16:28:30 Kuba Ober wrote:
> > I haven't looked at assembly output yet, but I've run into some
> > unexpected behavior in my benchmarks.
>
> Your benchmarks aren't sufficiently well defined to convey information
> about anything specific, so you're highly likely to misinterpret what you
> see.

They are straight rewrites from C code and are used to compare how gcc and 
OCaml stack up.

> > This was compiled by ocamlopt -inline 100 -unsafe,
>
> You should use Array.unsafe_get and _set rather than the command-line
> option -unsafe because the latter breaks correct code.

This option is not affecting the execution speed in my case and thus can be 
dropped.

> > What I wonder is why vector-to-vector add is so much faster than
> > (constant) scalar to vector add. Vectors are preinitialized each time
> > with a 1.0000, 1.0001, ... sequence.
>
> This is also highly likely to be platform and architecture dependent.

The benchmark was run on the same machine and on the same day as the C code it 
was rewritten from :)

> > (* generic scalar operation *)
> > let op1 op const nloop =
> > 	let accum = ref start in
> > 	for i = 1 to nloop do
> > 		accum := op !accum const
> > 	done
>
> You probably meant to return "!accum".

The return value is ignored anyway. It's a benchmark, noone cares what
the result is. Or is it for performance reasons??

> > (* generic vector operation *)
> > let op2 op const a b (nloop : int) =
> > 	let len = Array.length a in
> > 	for j = 0 to len-1 do
> > 		for i = 0 to len-1 do
> > 			b.(i) <- op a.(i) b.(i)
> > 		done;
> > 	done
> >
> > (** addition **)
> > let add1 nloop =
> > 	let accum = ref start in
> > 	for i = 1 to nloop do
> > 		accum := !accum +. addconst
> > 	done
>
> Again, should probably return "!accum". However, you can encourage OCaml to
> unbox the float by returning "!accum + 0.0" instead.

OK, I don't quite get it. Are you talking about what the function should 
return? If so, are you implying that the function body will be compiled 
differently (better?) if a different type is returned?

> > let add2 = op1 ( +. ) addconst
>
> This should be slower than "add1".

But shouldn't. The way I write the code in this case should not affect the 
assembly the least bit. The loop with explicit operand, functional recursion 
and generic function-as-an-argument approach should all generate same 
assembly. Obviously enough they don't :)

> > let add3 a b nloop =
> > 	let len = Array.length a in
> > 	for j = 0 to len-1 do
> > 		for i = 0 to len-1 do
> > 			b.(i) <- a.(i) +. addconst
> > 		done;
> > 	done
>
> The loop over "j" can be removed.

Well, the goal was to iterate Array.length^2 times :)

> > let add4 = op2 ( +. ) addconst
>
> This will be slow because "op2" is polymorphic and "+." will not be
> inlined.

Yeah, I see that, but that shouldn't be the case if OCaml were to be serious 
in that department :)

> > let add5 a b nloop =
> > 	let len = Array.length a in
> > 	for j = 0 to len-1 do
> > 		for i = 0 to len-1 do
> > 			b.(i) <- a.(i) +. b.(i)
> > 		done;
> > 	done
>
> This increments "b" by "a" many times. Replace the repeated adds with a
> single multiply for each element.

It's benchmark code. It's supposed to check performance of 
increment-vector-by-a-vector operation. I was hoping it would be obvious 
enough :(

Cheers, Kuba