From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.4 required=5.0 tests=AWL,DNS_FROM_RFC_ABUSE autolearn=disabled version=3.1.3 Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by yquem.inria.fr (Postfix) with ESMTP id D472BBC69 for ; Thu, 3 Jan 2008 17:28:31 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ah4FAGSbfEdDWxLC/2dsb2JhbACBV5AKmFY X-IronPort-AV: E=Sophos;i="4.24,240,1196636400"; d="scan'208";a="6251412" Received: from ip67-91-18-194.z18-91-67.customer.algx.net (HELO server1.bertec.net) ([67.91.18.194]) by mail1-smtp-roc.national.inria.fr with ESMTP; 03 Jan 2008 17:28:31 +0100 Received: from kuba.bertec.net (kuba.bertec.net [192.168.2.16]) by server1.bertec.net (Postfix) with ESMTP id E94FFCDFBC for ; Thu, 3 Jan 2008 11:28:30 -0500 (EST) From: Kuba Ober To: caml-list@yquem.inria.fr Subject: Performance questions, -inline, ... Date: Thu, 3 Jan 2008 11:28:30 -0500 User-Agent: KMail/1.9.6 (enterprise 0.20071123.740460) MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200801031128.30183.ober.14@osu.edu> X-Spam: no; 0.00; -inline:01 ocamlopt:01 -inline:01 -unsafe:01 mips:01 utime:01 vectors:01 inlining:01 cheers:01 mips:01 const:01 accum:01 accum:01 const:01 1.3:98 I haven't looked at assembly output yet, but I've run into some unexpected behavior in my benchmarks. This was compiled by ocamlopt -inline 100 -unsafe, the results and code are below (MIPS is obtained by dividing 50 million iterations by (Unix.times ()) . Unix.tms_utime it took to run). I haven't included the timing etc. code (it's part of a larger benchmark). What I wonder is why vector-to-vector add is so much faster than (constant) scalar to vector add. Vectors are preinitialized each time with a 1.0000, 1.0001, ... sequence. Also, the very bad performance from generic vector-to-vector *with* inlining is another puzzler, whereas generic add of scalar-to-scalar performs similarly to straight-coded one. Cheers, Kuba * add1: add scalar to scalar 120 MIPS * add3: add scalar to vector 250 MIPS * add5: add vector to vector 320 MIPS * add2: generic add scalar to scalar 100 MIPS * add4: generic add vector to vector 38 MIPS let start = 1.3 (* generic scalar operation *) let op1 op const nloop = let accum = ref start in for i = 1 to nloop do accum := op !accum const done (* generic vector operation *) let op2 op const a b (nloop : int) = let len = Array.length a in for j = 0 to len-1 do for i = 0 to len-1 do b.(i) <- op a.(i) b.(i) done; done (** addition **) let add1 nloop = let accum = ref start in for i = 1 to nloop do accum := !accum +. addconst done let add2 = op1 ( +. ) addconst let add3 a b nloop = let len = Array.length a in for j = 0 to len-1 do for i = 0 to len-1 do b.(i) <- a.(i) +. addconst done; done let add4 = op2 ( +. ) addconst let add5 a b nloop = let len = Array.length a in for j = 0 to len-1 do for i = 0 to len-1 do b.(i) <- a.(i) +. b.(i) done; done