Date: Sun, 7 Dec 2003 13:30:45 -0600 (CST)
From: Brian Hurt
To: Nuutti Kotivuori
cc: caml-list@inria.fr
Subject: Re: [Caml-list] Object-oriented access bottleneck
In-Reply-To: <87n0a4cx6e.fsf@naked.iki.fi>

On Sun, 7 Dec 2003, Nuutti Kotivuori wrote:

> Well sure, that will help and is a good idea in general. But it will
> never allow for inlining of the function body into the calling
> function, and as such will never solve the underlying problem.

I actually question the value of inlining as a performance improvement,
unless it leads to other significant optimizations. Function calls simply
aren't that expensive anymore on today's out-of-order, superscalar,
speculative-execution CPUs. A direct call, i.e. one not through a
function pointer, benchmarked at about 1.5 clocks when I measured it on
an AMD K6-3, and is probably cheaper on a more advanced CPU. Indirect
calls, i.e. calls through a function pointer, are slower only because of
the load-to-use penalty: if the pointer is in L1 cache, an indirect call
probably costs only 3-8 clocks.

Cache misses are the big cost. Hitting L1 cache, the cheapest memory
access, is generally 2-4 clocks; L2 cache is generally 6-30 clocks; and
missing cache entirely and going out to main memory is 100-300+ clocks.
Inlining expands the code size, and so makes expensive cache misses more
likely. At 300 clocks per miss -- the cost of roughly 200 direct calls --
it doesn't take many extra cache misses to overwhelm the small advantage
gained by inlining functions.

-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
  - Gene Spafford

Brian

-------------------
To unsubscribe, mail caml-list-request@inria.fr
Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs
FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
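
To make the direct-vs-indirect comparison above concrete in OCaml, here
is a rough micro-benchmark sketch. The names, the 10-million-iteration
loop, and the use of Sys.time are illustrative assumptions, not anything
measured in the mail; it simply compares a direct call, a call through a
function value held in a ref, and a method dispatch.

(* Rough, hypothetical micro-benchmark sketch: the names, the 10M
   iteration count, and the use of Sys.time are illustrative
   assumptions, not a definitive measurement.  It compares a direct
   call, a call through a function value held in a ref (an indirect
   call), and a method call. *)

let f x = x + 1                          (* direct call target *)
let f_ref = ref (fun x -> x + 1)         (* indirect: called through a pointer *)

class adder = object
  method add x = x + 1                   (* dispatched through the method table *)
end

(* Run [g] repeatedly and print the elapsed processor time. *)
let time name g =
  let t0 = Sys.time () in
  let acc = ref 0 in
  for _ = 1 to 10_000_000 do
    acc := g !acc
  done;
  Printf.printf "%-10s %.3fs (acc=%d)\n" name (Sys.time () -. t0) !acc

let () =
  let o = new adder in
  time "direct"   (fun x -> f x);
  time "indirect" (fun x -> !f_ref x);
  time "method"   (fun x -> o#add x)

Built with ocamlopt, the absolute times mean little; the point is only to
compare the per-call overheads against each other, which, per the
argument above, should be small next to the cache effects.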