* [Caml-list] Interesting optimization
@ 2004-07-22 21:07 Daniel Bünzli
2004-07-22 21:40 ` John Prevost
0 siblings, 1 reply; 4+ messages in thread
From: Daniel Bünzli @ 2004-07-22 21:07 UTC (permalink / raw)
To: caml-list
Hello,
I usually try not to be too much obsessed with speed, but I had the
following interesting experience. While rearanging some checksum code I
thought that I had rewritten it in a more efficient way. However I
turned out not to be the case.
I can boil the example to the following (n.b. loops don't compute
anything usefull). Basically I rewrote update into update'.
--- test.ml ---
let update c =
let c' = ref !c in
for n = 0 to max_int do
c' := !c' land 0xff
done;
c := !c'
let update' c =
for n = 0 to max_int do
c := !c land 0xff;
done
let compute use_ref =
let x = ref 2 in
if use_ref then update' x else update x;
print_int !x
let main () =
let use_ref = ref false in
let args = [("-ref", (Arg.Set use_ref), "use reference directly")] in
Arg.parse args (fun _ -> ()) "";
compute !use_ref
let () = main ()
---------------
> ocamlopt -o test.opt test.ml
> time ./test.opt
2
real 0m3.500s
user 0m3.230s
sys 0m0.010s
> time ./test.opt -ref
2
real 0m7.599s
user 0m7.550s
sys 0m0.030s
The few that I can read of ppc assembly tells me that in update the
value of c' is directly stored in a register whearas update' accesses
memory on each iteration.
Note that this is not restricted to int's, it occured to me with an
int32. I guess it should work with anything that gets into a register.
Cheers,
Daniel
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Caml-list] Interesting optimization
2004-07-22 21:07 [Caml-list] Interesting optimization Daniel Bünzli
@ 2004-07-22 21:40 ` John Prevost
2004-07-23 0:10 ` Brian Hurt
0 siblings, 1 reply; 4+ messages in thread
From: John Prevost @ 2004-07-22 21:40 UTC (permalink / raw)
To: Ocaml Mailing List
On Thu, 22 Jul 2004 23:07:03 +0200, Daniel Bünzli
<daniel.buenzli@epfl.ch> wrote:
> I usually try not to be too much obsessed with speed, but I had the
> following interesting experience. While rearanging some checksum code I
> thought that I had rewritten it in a more efficient way. However I
> turned out not to be the case.
>
> I can boil the example to the following (n.b. loops don't compute
> anything usefull). Basically I rewrote update into update'.
Here's a slight addition to your comparison, which may or may not be
of interest to you. I added this function:
let update'' c =
let rec loop c' a =
if a >= 0
then loop (c' land 0xff) (succ a)
else c'
in
c := (loop !c 0)
Basically, the same thing as your update, only using a tail-recursive
loop with an argument accumulator instead of using a for loop with a
reference accumulator. Note that I had to dodge a little on the loop
end condition, since a will never be > max_int, which would be the
normal test for the end of the loop.
While I'll pretty much always prefer the tail-call version (which
avoids references altogether), the fact that the reference has so
little impact when used in the way you described is very nice to know.
using update:
real 0m9.707s
user 0m9.710s
sys 0m0.010s
using update':
real 0m16.172s
user 0m16.140s
sys 0m0.030s
using update'':
real 0m8.321s
user 0m8.320s
sys 0m0.000s
John.
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Caml-list] Interesting optimization
2004-07-22 21:40 ` John Prevost
@ 2004-07-23 0:10 ` Brian Hurt
2004-07-23 1:22 ` John Prevost
0 siblings, 1 reply; 4+ messages in thread
From: Brian Hurt @ 2004-07-23 0:10 UTC (permalink / raw)
To: John Prevost; +Cc: Ocaml Mailing List
On Thu, 22 Jul 2004, John Prevost wrote:
> let update'' c =
> let rec loop c' a =
> if a >= 0
> then loop (c' land 0xff) (succ a)
> else c'
> in
> c := (loop !c 0)
let update''' c =
let rec loop c i =
let c = c land 0xff in
if (i < max_int) then
loop c (i+1)
else
c
in
loop c 0
;;
Haven't tried timing it. But the core code on the x86 becomes:
Temp__loop_59:
.L101:
andl $511, %eax
movl $2147483647, %ecx
cmpl %ecx, %ebx
jge .L100
addl $2, %ebx
jmp .L101
.align 16
I'm slightly disappointed that Ocaml didn't just
cmpl $2147483647, %ebx
and not use ecx at all. Not that this is that big a problem. The only
other possible optimizations would be to recognize that the entire loop is
pointless, and the function can be replaced by:
let update c = c land 0xff;;
I feel comfortable leaving that particular optimization in the hands of
the programmer, however.
--
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
- Gene Spafford
Brian
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Caml-list] Interesting optimization
2004-07-23 0:10 ` Brian Hurt
@ 2004-07-23 1:22 ` John Prevost
0 siblings, 0 replies; 4+ messages in thread
From: John Prevost @ 2004-07-23 1:22 UTC (permalink / raw)
To: Ocaml Mailing List
update''' is a bit slower than update'' is on my system--midway
between the two reference versions:
using update''':
real 0m12.940s
user 0m12.930s
sys 0m0.010s
Just for reference, here's the generated intel assembly for the loop
part of each version:
update:
.L106:
andl $511, %esi
movl %ecx, %edx
addl $2, %ecx
movl $2147483647, %ebx (1)
cmpl %ebx, %edx
jne .L106
update':
.L109:
movl (%eax), %ebx
andl $511, %ebx
movl %ebx, (%eax)
movl %ecx, %edx
addl $2, %ecx
movl $2147483647, %ebx
cmpl %ebx, %edx
jne .L109
update'':
camlTest__loop_72:
.L101:
cmpl $1, %ebx
jl .L100
addl $2, %ebx
andl $511, %eax
jmp .L101
update''':
camlTest__loop_77:
.L103:
andl $511, %eax
movl $2147483647, %ecx
cmpl %ecx, %ebx
jge .L102
addl $2, %ebx
jmp .L103
Looking at the above, and comparing to two more versions:
let update'4 c =
let rec loop c' a =
if a = max_int
then c'
else loop (c' land 0xff) (succ a)
in
c := (loop !c 0)
camlTest__loop_83:
.L105:
movl $2147483647, %ecx
cmpl %ecx, %ebx
jne .L104
ret
.align 16
.L104:
addl $2, %ebx
andl $511, %eax
jmp .L105
let update'5 c =
let c' = ref !c in
for n = max_int downto 0 do
c' := !c' land 0xff
done;
c := !c'
.L108:
andl $511, %edx
movl %ebx, %ecx (3)
subl $2, %ebx
cmpl $1, %ecx (2)
jne .L108
which has about the same performance as loop''', I feel justified in
saying the following:
The main interest here is what Daniel Bünzli originally noted: if a
reference is defined and then used in the local scope of a function,
you avoid ever actually working with the reference itself, which
removes that level of overhead.
Everything else is really all about squeezing the last ounce of
performance out, by as little as a single instruction. There's really
not much profit in this, unless the loop in question is very important
and is known to be a bottleneck. In fact, don't read anything more I
say unless you're really interested in that--unless you plan to look
at your own assembly code by hand, it's better to trust the compiler
than to second-guess it.
Specific notes from the above and from a couple of other exploratory functions:
a) if ... then ... else ... expressions bounding a tail loop should
have the result case in the "else" clause. Putting the result case in
the "then" clause results in a little skip. In essence, the compiler
assumes that the true case is the expected case.
b) match expressions seem to be man-handled more by the compiler. You
can still predict what's going to happen, however. As a general rule,
pattern matching works well when at least one case has an actual value
to match against. If every pattern is a don't care and some have
guards, expect it to behave somewhat like the "if" bit above: the
first case is assumed to be the expected case. When it matters and
you're not sure, look at the assembly.
c) as a special warning:
match e with true -> e1 | false -> e2
is much less efficient than
if e then e1 else e2
In short, if you know which branch is more expected, put it in the
"then" clause. And trust that pattern matching is pretty smart unless
you do something really silly (like case c, or like using only guarded
don't-care patterns.)
John.
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2004-07-23 1:23 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-07-22 21:07 [Caml-list] Interesting optimization Daniel Bünzli
2004-07-22 21:40 ` John Prevost
2004-07-23 0:10 ` Brian Hurt
2004-07-23 1:22 ` John Prevost
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox