From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.4 required=5.0 tests=AWL,DNS_FROM_RFC_ABUSE autolearn=disabled version=3.1.3 Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by yquem.inria.fr (Postfix) with ESMTP id D878DBC6B for ; Mon, 7 Jan 2008 14:48:43 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Aq4HAGi8gUdDWxLC/2dsb2JhbACBWKYH X-IronPort-AV: E=Sophos;i="4.24,253,1196636400"; d="scan'208";a="6363867" Received: from ip67-91-18-194.z18-91-67.customer.algx.net (HELO server1.bertec.net) ([67.91.18.194]) by mail1-smtp-roc.national.inria.fr with ESMTP; 07 Jan 2008 14:48:43 +0100 Received: from kuba.bertec.net (kuba.bertec.net [192.168.2.16]) by server1.bertec.net (Postfix) with ESMTP id E7B2A105830 for ; Mon, 7 Jan 2008 08:48:41 -0500 (EST) From: Kuba Ober To: caml-list@yquem.inria.fr Subject: Re: [Caml-list] Performance questions, -inline, ... Date: Mon, 7 Jan 2008 08:48:40 -0500 User-Agent: KMail/1.9.6 (enterprise 0.20071123.740460) References: <200801031128.30183.ober.14@osu.edu> <200801051936.23521.jon@ffconsultancy.com> In-Reply-To: <200801051936.23521.jon@ffconsultancy.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200801070848.40809.ober.14@osu.edu> X-Spam: no; 0.00; -inline:01 ocaml:01 hofs:01 polymorphism:01 avoided:01 ocaml:01 rewrites:01 gcc:01 ocamlopt:01 -inline:01 -unsafe:01 command-line:01 -unsafe:01 vectors:01 const:01 On Saturday 05 January 2008, Jon Harrop wrote: > Optimizing numerical code is discussed in detail in my book OCaml for > Scientists. You may also be interested in a very similar thread where I > optimized someone's almost identical code on the beginners list recently. > There is also this relevant blog post of mine: > > http://ocamlnews.blogspot.com/2007/09/spectral-norm.html > > Essentially, your benchmark has rediscovered the fact that abstractions > (HOFs, polymorphism etc.) are prohibitively slow for high-performance > numerics and must be avoided. In the case of the particular OCaml implementation - yes. In general - no. I hope we agree here. > On Thursday 03 January 2008 16:28:30 Kuba Ober wrote: > > I haven't looked at assembly output yet, but I've run into some > > unexpected behavior in my benchmarks. > > Your benchmarks aren't sufficiently well defined to convey information > about anything specific, so you're highly likely to misinterpret what you > see. They are straight rewrites from C code and are used to compare how gcc and OCaml stack up. > > This was compiled by ocamlopt -inline 100 -unsafe, > > You should use Array.unsafe_get and _set rather than the command-line > option -unsafe because the latter breaks correct code. This option is not affecting the execution speed in my case and thus can be dropped. > > What I wonder is why vector-to-vector add is so much faster than > > (constant) scalar to vector add. Vectors are preinitialized each time > > with a 1.0000, 1.0001, ... sequence. > > This is also highly likely to be platform and architecture dependent. The benchmark was run on the same machine and on the same day as the C code it was rewritten from :) > > (* generic scalar operation *) > > let op1 op const nloop = > > let accum = ref start in > > for i = 1 to nloop do > > accum := op !accum const > > done > > You probably meant to return "!accum". The return value is ignored anyway. It's a benchmark, noone cares what the result is. Or is it for performance reasons?? > > (* generic vector operation *) > > let op2 op const a b (nloop : int) = > > let len = Array.length a in > > for j = 0 to len-1 do > > for i = 0 to len-1 do > > b.(i) <- op a.(i) b.(i) > > done; > > done > > > > (** addition **) > > let add1 nloop = > > let accum = ref start in > > for i = 1 to nloop do > > accum := !accum +. addconst > > done > > Again, should probably return "!accum". However, you can encourage OCaml to > unbox the float by returning "!accum + 0.0" instead. OK, I don't quite get it. Are you talking about what the function should return? If so, are you implying that the function body will be compiled differently (better?) if a different type is returned? > > let add2 = op1 ( +. ) addconst > > This should be slower than "add1". But shouldn't. The way I write the code in this case should not affect the assembly the least bit. The loop with explicit operand, functional recursion and generic function-as-an-argument approach should all generate same assembly. Obviously enough they don't :) > > let add3 a b nloop = > > let len = Array.length a in > > for j = 0 to len-1 do > > for i = 0 to len-1 do > > b.(i) <- a.(i) +. addconst > > done; > > done > > The loop over "j" can be removed. Well, the goal was to iterate Array.length^2 times :) > > let add4 = op2 ( +. ) addconst > > This will be slow because "op2" is polymorphic and "+." will not be > inlined. Yeah, I see that, but that shouldn't be the case if OCaml were to be serious in that department :) > > let add5 a b nloop = > > let len = Array.length a in > > for j = 0 to len-1 do > > for i = 0 to len-1 do > > b.(i) <- a.(i) +. b.(i) > > done; > > done > > This increments "b" by "a" many times. Replace the repeated adds with a > single multiply for each element. It's benchmark code. It's supposed to check performance of increment-vector-by-a-vector operation. I was hoping it would be obvious enough :( Cheers, Kuba