From: Brian Hurt <bhurt@spnz.org>
To: Xavier Leroy <xavier.leroy@inria.fr>
Cc: Ocaml Mailing List <caml-list@inria.fr>
Subject: Re: [Caml-list] Object-oriented access bottleneck
Date: Mon, 8 Dec 2003 15:37:17 -0600 (CST)
Message-ID: <Pine.LNX.4.44.0312081514270.5009-101000@localhost.localdomain>
In-Reply-To: <20031208200228.A26466@pauillac.inria.fr>
On Mon, 8 Dec 2003, Xavier Leroy wrote:
> > How much slower is a member function call compared to a non-member
> > function call? I was given the impression that it was significantly
> > slower, but you're making it sound like it isn't.
>
> In the non-member case, it depends very much whether the function
> being called is statically known or is computed at run-time. In the
> first case, a call to a static address is generated. In the latter
> case, a call to a function pointer is generated.
>
> Here are some figures I collected circa 1998 on a PowerPC 603 processor:
>
> type of call cost of call
>
> inlined function 0 cycles
> call to known function 4 cycles
> call to unknown function 11 cycles
> method invocation 18 cycles
Attached is some microbenchmarking code I just whomped together. It gives
both C and Ocaml timings on tight loops of function calls (fcall2 below is
the Ocaml binary, fcall the C one). The results:
$ ./fcall2
time0: 1000000000 loops in 1.680000 seconds = 1.680000 nanoseconds/loop
time1: 1000000000 loops in 1.680000 seconds = 1.680000 nanoseconds/loop
time2: 1000000000 loops in 5.700000 seconds = 5.700000 nanoseconds/loop
time3: 1000000000 loops in 10.110000 seconds = 10.110000 nanoseconds/loop
$ ./fcall
time0(): 1000000000 loops in 2.570000 seconds = 2.570000 nanoseconds/loops
time1(): 1000000000 loops in 5.170000 seconds = 5.170000 nanoseconds/loops
time2(): 1000000000 loops in 5.860000 seconds = 5.860000 nanoseconds/loops
$ gcc -v
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/3.2.2/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --enable-shared --enable-threads=posix
--disable-checking --with-system-zlib --enable-__cxa_atexit
--host=i386-redhat-linux
Thread model: posix
gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)
$ ocamlopt -v
The Objective Caml native-code compiler, version 3.07
Standard library directory: /usr/local/lib/ocaml
$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 8
model name : AMD Athlon(tm) XP 2200+
stepping : 1
cpu MHz : 1800.109
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips : 3591.37
$
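Since the tarball won't survive a plain-text archive, here is a rough sketch
of the shape of the Ocaml side of such a benchmark (not the actual
attachment; the names other than foo, the loop count, and the use of
Sys.time are just illustrative):

let iterations = 100_000_000

(* A statically known function; as noted in the conclusions below,
   ocamlopt inlines calls like this. *)
let foo x = x + 1

(* Loop around a call to a known function. *)
let time_known () =
  let r = ref 0 in
  for _i = 1 to iterations do
    r := foo !r
  done;
  !r

(* Loop around a call to an "unknown" function received as an argument;
   the call has to go through the closure's code pointer. *)
let time_unknown f =
  let r = ref 0 in
  for _i = 1 to iterations do
    r := f !r
  done;
  !r

(* Loop around a method invocation; dispatch goes through the object's
   method table, hence the extra loads discussed below. *)
class stepper = object
  method step x = x + 1
end

let time_method (o : stepper) =
  let r = ref 0 in
  for _i = 1 to iterations do
    r := o#step !r
  done;
  !r

let run name f =
  let t0 = Sys.time () in
  let result = f () in
  let dt = Sys.time () -. t0 in
  Printf.printf "%s: %d loops in %f seconds = %f nanoseconds/loop\n"
    name iterations dt (dt *. 1e9 /. float_of_int iterations);
  ignore result

let () =
  run "known  " time_known;
  run "unknown" (fun () -> time_unknown foo);
  run "method " (fun () -> time_method (new stepper))

Compile with ocamlopt and run; the absolute numbers will of course differ
from the ones above.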
My conclusions:
- Despite my best efforts, Ocaml inlined (and removed) the call to foo.
In general this is good, but it did defeat my efforts to time how long a
direct function call took. For some reason it didn't also eliminate the
for loops. If I had to guess, Ocaml's call to a known function is about
the same speed as C's.
- Ocaml loops faster than C. Go figure.
- Ocaml's call to an unknown function (passed in via argument) is a little
slower than C's call via function pointer, by maybe 2 clocks. Not
significant.
- Ocaml's call to a member function is slower than a call to an unknown
function by a factor of about 2.
- All of these times are small enough to be insignificant. By comparison,
on this CPU a mispredicted branch costs about 8ns (~14 clocks). Outside
of a microbenchmark, they are unlikely to be noticeable, as other factors
are likely to overwhelm the minor costs shown here (garbage collection,
cache performance, etc.).
- 4 loads at 2 clocks apiece (they hit cache) is about 4.4ns, which is the
vast majority of the difference between a call to an unknown function and a
call to a member function. As such, significant improvements are highly
unlikely. Even my design for a hash table uses three loads, and would
only be 1.1ns or so faster (at best).
- Ocaml is surprisingly fast.
- This thread devolved into pedantry several posts ago :-).
>
> I'm pretty sure that more modern processors have a bigger gap between
> the "call to known function" and "call to unknown function" cases.
Actually, no. The 603 did a little bit of speculation, but not much.
Modern CPUs speculate aggressively. The biggest problem with a call via
pointer is the load-to-use penalty: in a direct call, the address it's
jumping to is right there in the instruction. With a call through a
pointer, it's got to load the pointer before it knows where to jump. But
even then, some CPUs do data speculation.
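To make the known/unknown distinction concrete at the source level, a toy
example (mine, not anything from the attachment):

(* Known call: incr_int's address is visible at compile time, so
   ocamlopt can emit a call to a fixed label, or inline it outright. *)
let incr_int x = x + 1
let apply_known x = incr_int x

(* Unknown call: f arrives as an argument, so the generated code must
   first load the code pointer out of f's closure and only then jump --
   the load-to-use dependency described above. *)
let apply_unknown f x = f x

let () =
  Printf.printf "%d %d\n" (apply_known 1) (apply_unknown incr_int 1)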
--
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
- Gene Spafford
Brian
[-- Attachment #2: Type: APPLICATION/x-gzip, Size: 1002 bytes --]