Date: Fri, 25 Feb 2005 18:57:21 +0100
From: Xavier Leroy
To: Christophe TROESTLER
Cc: "O'Caml Mailing List"
Subject: Re: [Caml-list] NBody (one more question)
Message-ID: <20050225175721.GB25527@yquem.inria.fr>
References: <20050207.195724.87945401.Christophe.Troestler@umh.ac.be> <20050208104312.GA10035@yquem.inria.fr> <20050224.231855.40627447.debian00@tiscali.be>
In-Reply-To: <20050224.231855.40627447.debian00@tiscali.be>

> When I compile the C code with -O0 (with gcc -o nbody.gcc -Wall
> --fast-math nbody.c -lm), I get a time of 1.513s which is comparable
> to OCaml (1.607s).  But as soon as I turn on -O options (as with gcc
> -o nbody.gcc -Wall -O1 --fast-math nbody.c -lm), the running time
> drops down to 0.871s (about 58% of the -O0 time).  Can somebody tell
> me what is the optimization that has such an effect and whether it
> could be applied to OCaml ?

First, make sure the Caml code is compiled with bounds checking off
(ocamlopt -unsafe), otherwise the comparison isn't quite fair.

But even with -unsafe, you are correct that the Caml code is
significantly slower than the gcc -O1 code.  This is especially
surprising because the assembly code generated by ocamlopt and gcc
looks very similar.

So, I don't think you can pin the speed difference on a particular
optimization.  My current guess would be alignment issues:

- data alignment: float arrays are 4-aligned in OCaml, 8-aligned in C,
  so if you're unlucky you can end up with slower unaligned accesses
  on every Caml float.

- code alignment: it could be that OCaml doesn't perform sufficient
  alignment on function entry points and loop points.  The proper
  alignments for various implementations of the x86 architecture
  are a mystery to me.

Again, these are just wild guesses.  To understand what is going on
inside the chip, one would need to use performance monitoring
counters.  Unfortunately, I never felt motivated enough to shell out
the $$$ for Intel's VTune analyzer...

- Xavier Leroy
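
To make the data-alignment guess above a bit more concrete, here is a minimal C sketch of how one could check whether a double sits on a 4-byte boundary that is not also an 8-byte boundary. The helper names (is_aligned, unlucky_slot, roundtrip_double) are made up for illustration; they come neither from nbody.c nor from the OCaml runtime. The sketch only inspects addresses and does not measure any speed penalty.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if address p is a multiple of `align`, 0 otherwise.
   `align` must be a power of two.  Relies on the usual flat mapping
   of pointers to integers (implementation-defined, but true on x86). */
int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical helper: returns the first address in buf that is
   4-aligned but NOT 8-aligned, i.e. the "unlucky" placement a
   4-aligned allocator may hand out.  buf must hold at least 16 bytes
   so that a double still fits at the returned address. */
unsigned char *unlucky_slot(unsigned char *buf)
{
    unsigned char *p = buf;
    while (!is_aligned(p, 4))   /* at most 3 steps */
        p++;
    if (is_aligned(p, 8))       /* too well aligned: shift by one word */
        p += 4;
    return p;
}

/* Stores x at a possibly unaligned address and reads it back.
   memcpy is used so the access is well-defined C even when the
   slot is not 8-aligned. */
double roundtrip_double(unsigned char *slot, double x)
{
    double y;
    memcpy(slot, &x, sizeof x);
    memcpy(&y, slot, sizeof y);
    return y;
}
```

For the address returned by unlucky_slot, is_aligned(p, 4) holds while is_aligned(p, 8) does not, which is the situation described in the first bullet. Whether loads and stores at such an address are actually slower on a given x86 chip is exactly the part that would have to be measured.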