From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id IAA14461; Sat, 14 Jun 2003 08:11:46 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id IAA14652 for ; Sat, 14 Jun 2003 08:11:44 +0200 (MET DST) Received: from mail.exomi.com (mail.exomi.com [217.169.64.72]) by nez-perce.inria.fr (8.11.1/8.11.1) with ESMTP id h5E6BhT17746 for ; Sat, 14 Jun 2003 08:11:43 +0200 (MET DST) Received: from exomi.com (unknown [10.0.5.5]) by mail.exomi.com (Postfix) with ESMTP id ABF5A5D03; Sat, 14 Jun 2003 09:11:42 +0300 (EEST) Date: Sat, 14 Jun 2003 09:11:41 +0300 Subject: Re: [Caml-list] FP's and HyperThreading Processors Content-Type: text/plain; charset=US-ASCII; format=flowed Mime-Version: 1.0 (Apple Message framework v552) Cc: To: "David McClain" From: Ville-Pertti Keinonen In-Reply-To: <00ad01c331f2$1bbacee0$0201a8c0@dylan> Message-Id: <10E87304-9E2F-11D7-9895-000393863F70@exomi.com> Content-Transfer-Encoding: 7bit X-Mailer: Apple Mail (2.552) X-Spam: no; 0.00; caml-list:01 wrappers:01 alignment:01 allocations:01 386,:01 suboptimal:01 aligned:01 arrays:01 linked:01 -bit:01 ocaml:01 align:98 routines:02 uniform:02 float:02 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk > P4 linked into the program. On the old P2 system, this code spend > roughly > 60% of its time inside that vendor code for FFT's. This is no longer > true, > as other tests show very high performance of just those vendor > routines. Does calling from OCaml affect the performance of the FFTs? You presumably use your own C wrappers to access the vendor code from your OCaml code - do you align the stack? For floating point intensive code, lack of alignment can cause significant differences in performance. I remember seeing 40% differences (worst case) in performance when I ran into this several years ago, and I don't think the situation has improved with more recent cpus. OCaml currently doesn't ensure more than word alignment for stack or allocations on i386, so you are going to have suboptimal performance for any floating point code written in OCaml (as well as float arrays allocated in OCaml but manipulated elsewhere). However, if you're calling external code (presumably compiled in an alignment-preserving way), you should at least try ensuring that the stack is aligned to a 64-bit boundary (or 128-bit boundary if SSE is used for anything) in case that affects performance. If you're actually making heavy use of hyperthreading, it may make memory access patterns less uniform and performance highly unpredictable. Have you tried comparing performance with and without hyperthreading enabled? Anyway, it sounds like you might be running into several issues. The only certain way to find out which are the most significant ones is to do large amounts of profiling and testing. ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners