Date: Sun, 7 Dec 2003 13:30:45 -0600 (CST)
From: Brian Hurt
To: Nuutti Kotivuori
cc: caml-list@inria.fr
Subject: Re: [Caml-list] Object-oriented access bottleneck
In-Reply-To: <87n0a4cx6e.fsf@naked.iki.fi>

On Sun, 7 Dec 2003, Nuutti Kotivuori wrote:

> Well sure, that will help and is a good idea in general. But it will
> never allow for inlining of the function body into the calling
> function, and as such will never solve the underlying problem.

I actually question the value of inlining as a performance improvement,
unless it leads to other significant optimizations. Function calls simply
aren't that expensive anymore on today's out-of-order, superscalar,
speculative-execution CPUs. A direct call, i.e. one not through a
function pointer, benchmarked at about 1.5 clocks when I measured it on
an AMD K6-3, and is probably cheaper on a more advanced CPU. Indirect
calls, i.e. calls through a function pointer, are slower only because of
the load-to-use penalty: if the pointer is in L1 cache, an indirect call
probably costs only 3-8 clocks.

Cache misses are the big cost. Hitting L1 cache, the cheapest memory
access, is generally 2-4 clocks; L2 cache is generally 6-30 clocks; and
missing cache entirely and going out to main memory is 100-300+ clocks.
Inlining expands the code size, and so makes expensive cache misses more
likely. At 300 clocks per miss -- the cost of roughly 200 direct calls --
it doesn't take many extra cache misses to overwhelm the small advantage
gained by inlining functions.

-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
  - Gene Spafford

Brian

-------------------
To unsubscribe, mail caml-list-request@inria.fr
Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs
FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
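
To make the direct-vs-indirect comparison above concrete in OCaml, here
is a rough micro-benchmark sketch. The names, the 10-million-iteration
loop, and the use of Sys.time are illustrative assumptions, not anything
measured in the mail; it simply compares a direct call, a call through a
function value held in a ref, and a method dispatch.

(* Rough, hypothetical micro-benchmark sketch: the names, the 10M
   iteration count, and the use of Sys.time are illustrative
   assumptions, not a definitive measurement.  It compares a direct
   call, a call through a function value held in a ref (an indirect
   call), and a method call. *)

let f x = x + 1                          (* direct call target *)
let f_ref = ref (fun x -> x + 1)         (* indirect: called through a pointer *)

class adder = object
  method add x = x + 1                   (* dispatched through the method table *)
end

(* Run [g] repeatedly and print the elapsed processor time. *)
let time name g =
  let t0 = Sys.time () in
  let acc = ref 0 in
  for _ = 1 to 10_000_000 do
    acc := g !acc
  done;
  Printf.printf "%-10s %.3fs (acc=%d)\n" name (Sys.time () -. t0) !acc

let () =
  let o = new adder in
  time "direct"   (fun x -> f x);
  time "indirect" (fun x -> !f_ref x);
  time "method"   (fun x -> o#add x)

Built with ocamlopt, the absolute times mean little; the point is only to
compare the per-call overheads against each other, which, per the
argument above, should be small next to the cache effects.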