* amd64 code gen
@ 2007-02-26 2:22 skaller
0 siblings, 0 replies; only message in thread
From: skaller @ 2007-02-26 2:22 UTC (permalink / raw)
To: caml-list; +Cc: felix-impl
Has anything changed in ocaml amd64 code generator?
I'm using this:
Objective Caml version 3.09.3+rc1
gcc (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5)
GNAT 4.0.3 (Ubuntu 4.0.3-1ubuntu5)
Felix 1.1.3_rc1
gcc uses: gcc -O3 -fomit-frame-pointer -funroll-loops
Felix uses: -O3 -fomit-frame-pointer --inline
Ocamlopt isn't given any special flags.
My latest performance test shows:
Rankings for ack on rosella
felix 15 162.46 [N= 1 SD= 0%]
felix 14 31.25 [N= 4 SD= 0%]
gccopt 14 45.42 [N= 7 SD= 0%]
ocamlopt 14 74.03 [N= 6 SD= 0%]
felix 13 5.31 [N= 9 SD= 0%]
gnat 13 9.68 [N= 11 SD= 0%]
gccopt 13 9.78 [N= 10 SD= 0%]
ocamlopt 13 13.73 [N= 11 SD= 0%]
felix 12 1.22 [N= 21 SD= 0%]
gccopt 12 2.37 [N= 22 SD= 0%]
The second column is the y in ack(x,y), the third
the time in seconds on my box, N is the number of samples,
and SD (is supposed to be) the standard deviation.
Felix always outperformed other languages on this test,
possibly due to some lucky combination. A quick check
that something in my test harness isn't going astray:
skaller@rosella:/work/felix/svn/felix/felix/trunk$ time
speed/exes/felix/ack 13
Ack(3,13): 65533
real 0m5.256s
user 0m5.244s
sys 0m0.012s
skaller@rosella:/work/felix/svn/felix/felix/trunk$ time
speed/exes/ocamlopt/ack 13
Ack(3,13): 65533
real 0m13.664s
user 0m13.625s
sys 0m0.040s
Felix is more that twice as fast as Ocamlopt, and trashes gcc
as well. The thing is, I haven't done any significant work on the Felix
optimiser.
But the last time I checked, whilst Felix did beat both gcc
and Ocaml, it was only by a reasonable margin: 2-10% or something.
I may have pointed out the graphs (but those old graphs are gone
with the latest upload).
The actual C++ code generated by Felix is this
(with comments elided for clarity).
int FLX_REGPARM ack(FLX_APAR_DECL int x, int y){
_us2 ack_mv_986;
_us2 ack_mv_996;
start_5803:;
ack_mv_986 = x == 0 ;
ifnot(ack_mv_986==1) goto _5798;
return y + 1 ;
_5798:;
ack_mv_996 = y == 0 ;
ifnot(ack_mv_996==1) goto _5799;
y = 1;
x = x - 1 ;
goto start_5803;
_5799:;
y = ack(FLX_FPAR_PASS x, y - 1 );
x = x - 1 ;
goto start_5803;
}
C is using:
int Ack(int M, int N) {
if (M==0) return N +1;
else if(N==0) return Ack(M-1,1);
else return Ack(M-1, Ack(M,N-1));
}
and Ocaml is using:
let rec ack m n = match m,n with
| 0,n -> n + 1
| m,0 -> ack (m - 1) 1
| m,n -> ack (m - 1) (ack m (n - 1))
;;
What REALLY worries me is that Felix has this:
int FLX_REGPARM ack(FLX_APAR_DECL int x, int y){
Well that FLX_APAR_DECL is passing a pointer to the
global object .. which is not in fact used .. that's
a BUG in the optimiser (it did not previously do this!)
which WOULD significantly slow the Felix code down
if it were actually passed -- it actually looks like gcc
is optimising this argument away.
Performance on this test, given reasonable optimisations
such as self-tail rec calls (and in Felix case proper
inlining of typeclass virtual functions .. which I mention
because for the Takfp test it is NOT being done ..)
is dependent entirely on the number of words of machine
stack used to implement the non-tail function call.
The minimal number of words is 3 given a dumb optimiser.
(It can actually be reduced to 1 in this case through a couple
of clever optimisations).
Loop unrolling by 3 reduces the number of return addresses to 1/3
per call giving 7/9 performance improvement.
--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
^ permalink raw reply [flat|nested] only message in thread
only message in thread, other threads:[~2007-02-26 2:22 UTC | newest]
Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-26 2:22 amd64 code gen skaller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox