From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id PAA07662; Thu, 27 Mar 2003 15:32:57 +0100 (MET) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id PAA06837 for ; Thu, 27 Mar 2003 15:32:56 +0100 (MET) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.11.1/8.11.1) with ESMTP id h2REWl516944; Thu, 27 Mar 2003 15:32:47 +0100 (MET) Received: (from xleroy@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id PAA07953; Thu, 27 Mar 2003 15:32:47 +0100 (MET) Date: Thu, 27 Mar 2003 15:32:47 +0100 From: Xavier Leroy To: Issac Trotts Cc: OCaml List Subject: Re: [Caml-list] DFT in OCaml vs. C Message-ID: <20030327153247.A5015@pauillac.inria.fr> References: <3E82A960.2070707@ucdavis.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: <3E82A960.2070707@ucdavis.edu>; from ijtrotts@ucdavis.edu on Wed, Mar 26, 2003 at 11:33:52PM -0800 X-Spam: no; 0.00; caml-list:01 fourier:01 timings:01 achieves:01 notoriously:01 gcc:01 feats:01 ocamlopt:01 camlcvs:01 cvsweb:01 delivers:99 ocaml:01 variants:01 arithmetic:01 variant:02 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk > Here's a numerical mini-benchmark comparing C to OCaml > on a simple implementation of the Discrete Fourier Transform: > [...] > So the C version was about five times as fast. > I'd be interested if anyone on this list knows of a way > to make it perform as well as the C version (without using the FFT.) It can be done, but not on a Pentium 3. Here are my timings: Pentium 4 Pentium 4 SSE2 Alpha 21264 (2 GHz) (2 GHz) (500 MHz) C 20 20 36 OCaml (your code) 113 40 52 OCaml (variant 1) 90 26 40 OCaml (variant 2) 72 38 100 Variants 1 and 2 differ on the complex multiply step: Your code: let a2=c*. !a -. s*. !b and b2=c*. !b +. s*. !a in a := a2; b := b2; Variant 1: let x = s *. !a in a := c*. !a -. s*. !b; b := c*. !b +. x Variant 2: let olda = !a and oldb = !b in a := c *. olda -. s *. oldb; b := c *. oldb +. s *. olda The "Pentium 4 SSE2" column is an experimental code generator for the Pentium 4 that uses SSE2 instructions and registers for floating-point computations. (Before you ask: no, it's not publically available, but will be the basis for the x86_64 code generator as soon as the hardware becomes available.) As you can see above, variant 1 achieves almost the performance of C on platforms that have a regular register-based FP arithmetic unit. However, the x86 floating-point stack (what OCaml uses for compatibility with Pentium 3 and earlier processors) is notoriously cranky and hard to generate efficient code for. gcc manages to exploit instruction-level parallelism between the "re" and "im" computations via amazing feats (fxch instructions, etc), but the ocamlopt x86 code generator just generates very sequential code... So, unless you have an Alpha at hand, you'd better consider FFT. There's an FFT implementation that I use as a benchmark here: http://camlcvs.inria.fr/cgi-bin/cvsweb.cgi/ocaml/test/fft.ml and it delivers about 2/3 of the performances of C, even on the Pentium. - Xavier Leroy ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners