* [Caml-list] Interesting optimization @ 2004-07-22 21:07 Daniel Bünzli 2004-07-22 21:40 ` John Prevost 0 siblings, 1 reply; 4+ messages in thread From: Daniel Bünzli @ 2004-07-22 21:07 UTC (permalink / raw) To: caml-list Hello, I usually try not to be too much obsessed with speed, but I had the following interesting experience. While rearanging some checksum code I thought that I had rewritten it in a more efficient way. However I turned out not to be the case. I can boil the example to the following (n.b. loops don't compute anything usefull). Basically I rewrote update into update'. --- test.ml --- let update c = let c' = ref !c in for n = 0 to max_int do c' := !c' land 0xff done; c := !c' let update' c = for n = 0 to max_int do c := !c land 0xff; done let compute use_ref = let x = ref 2 in if use_ref then update' x else update x; print_int !x let main () = let use_ref = ref false in let args = [("-ref", (Arg.Set use_ref), "use reference directly")] in Arg.parse args (fun _ -> ()) ""; compute !use_ref let () = main () --------------- > ocamlopt -o test.opt test.ml > time ./test.opt 2 real 0m3.500s user 0m3.230s sys 0m0.010s > time ./test.opt -ref 2 real 0m7.599s user 0m7.550s sys 0m0.030s The few that I can read of ppc assembly tells me that in update the value of c' is directly stored in a register whearas update' accesses memory on each iteration. Note that this is not restricted to int's, it occured to me with an int32. I guess it should work with anything that gets into a register. Cheers, Daniel ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Caml-list] Interesting optimization 2004-07-22 21:07 [Caml-list] Interesting optimization Daniel Bünzli @ 2004-07-22 21:40 ` John Prevost 2004-07-23 0:10 ` Brian Hurt 0 siblings, 1 reply; 4+ messages in thread From: John Prevost @ 2004-07-22 21:40 UTC (permalink / raw) To: Ocaml Mailing List On Thu, 22 Jul 2004 23:07:03 +0200, Daniel Bünzli <daniel.buenzli@epfl.ch> wrote: > I usually try not to be too much obsessed with speed, but I had the > following interesting experience. While rearanging some checksum code I > thought that I had rewritten it in a more efficient way. However I > turned out not to be the case. > > I can boil the example to the following (n.b. loops don't compute > anything usefull). Basically I rewrote update into update'. Here's a slight addition to your comparison, which may or may not be of interest to you. I added this function: let update'' c = let rec loop c' a = if a >= 0 then loop (c' land 0xff) (succ a) else c' in c := (loop !c 0) Basically, the same thing as your update, only using a tail-recursive loop with an argument accumulator instead of using a for loop with a reference accumulator. Note that I had to dodge a little on the loop end condition, since a will never be > max_int, which would be the normal test for the end of the loop. While I'll pretty much always prefer the tail-call version (which avoids references altogether), the fact that the reference has so little impact when used in the way you described is very nice to know. using update: real 0m9.707s user 0m9.710s sys 0m0.010s using update': real 0m16.172s user 0m16.140s sys 0m0.030s using update'': real 0m8.321s user 0m8.320s sys 0m0.000s John. ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Caml-list] Interesting optimization 2004-07-22 21:40 ` John Prevost @ 2004-07-23 0:10 ` Brian Hurt 2004-07-23 1:22 ` John Prevost 0 siblings, 1 reply; 4+ messages in thread From: Brian Hurt @ 2004-07-23 0:10 UTC (permalink / raw) To: John Prevost; +Cc: Ocaml Mailing List On Thu, 22 Jul 2004, John Prevost wrote: > let update'' c = > let rec loop c' a = > if a >= 0 > then loop (c' land 0xff) (succ a) > else c' > in > c := (loop !c 0) let update''' c = let rec loop c i = let c = c land 0xff in if (i < max_int) then loop c (i+1) else c in loop c 0 ;; Haven't tried timing it. But the core code on the x86 becomes: Temp__loop_59: .L101: andl $511, %eax movl $2147483647, %ecx cmpl %ecx, %ebx jge .L100 addl $2, %ebx jmp .L101 .align 16 I'm slightly disappointed that Ocaml didn't just cmpl $2147483647, %ebx and not use ecx at all. Not that this is that big a problem. The only other possible optimizations would be to recognize that the entire loop is pointless, and the function can be replaced by: let update c = c land 0xff;; I feel comfortable leaving that particular optimization in the hands of the programmer, however. -- "Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." - Gene Spafford Brian ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Caml-list] Interesting optimization 2004-07-23 0:10 ` Brian Hurt @ 2004-07-23 1:22 ` John Prevost 0 siblings, 0 replies; 4+ messages in thread From: John Prevost @ 2004-07-23 1:22 UTC (permalink / raw) To: Ocaml Mailing List update''' is a bit slower than update'' is on my system--midway between the two reference versions: using update''': real 0m12.940s user 0m12.930s sys 0m0.010s Just for reference, here's the generated intel assembly for the loop part of each version: update: .L106: andl $511, %esi movl %ecx, %edx addl $2, %ecx movl $2147483647, %ebx (1) cmpl %ebx, %edx jne .L106 update': .L109: movl (%eax), %ebx andl $511, %ebx movl %ebx, (%eax) movl %ecx, %edx addl $2, %ecx movl $2147483647, %ebx cmpl %ebx, %edx jne .L109 update'': camlTest__loop_72: .L101: cmpl $1, %ebx jl .L100 addl $2, %ebx andl $511, %eax jmp .L101 update''': camlTest__loop_77: .L103: andl $511, %eax movl $2147483647, %ecx cmpl %ecx, %ebx jge .L102 addl $2, %ebx jmp .L103 Looking at the above, and comparing to two more versions: let update'4 c = let rec loop c' a = if a = max_int then c' else loop (c' land 0xff) (succ a) in c := (loop !c 0) camlTest__loop_83: .L105: movl $2147483647, %ecx cmpl %ecx, %ebx jne .L104 ret .align 16 .L104: addl $2, %ebx andl $511, %eax jmp .L105 let update'5 c = let c' = ref !c in for n = max_int downto 0 do c' := !c' land 0xff done; c := !c' .L108: andl $511, %edx movl %ebx, %ecx (3) subl $2, %ebx cmpl $1, %ecx (2) jne .L108 which has about the same performance as loop''', I feel justified in saying the following: The main interest here is what Daniel Bünzli originally noted: if a reference is defined and then used in the local scope of a function, you avoid ever actually working with the reference itself, which removes that level of overhead. Everything else is really all about squeezing the last ounce of performance out, by as little as a single instruction. There's really not much profit in this, unless the loop in question is very important and is known to be a bottleneck. In fact, don't read anything more I say unless you're really interested in that--unless you plan to look at your own assembly code by hand, it's better to trust the compiler than to second-guess it. Specific notes from the above and from a couple of other exploratory functions: a) if ... then ... else ... expressions bounding a tail loop should have the result case in the "else" clause. Putting the result case in the "then" clause results in a little skip. In essence, the compiler assumes that the true case is the expected case. b) match expressions seem to be man-handled more by the compiler. You can still predict what's going to happen, however. As a general rule, pattern matching works well when at least one case has an actual value to match against. If every pattern is a don't care and some have guards, expect it to behave somewhat like the "if" bit above: the first case is assumed to be the expected case. When it matters and you're not sure, look at the assembly. c) as a special warning: match e with true -> e1 | false -> e2 is much less efficient than if e then e1 else e2 In short, if you know which branch is more expected, put it in the "then" clause. And trust that pattern matching is pretty smart unless you do something really silly (like case c, or like using only guarded don't-care patterns.) John. ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2004-07-23 1:23 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2004-07-22 21:07 [Caml-list] Interesting optimization Daniel Bünzli 2004-07-22 21:40 ` John Prevost 2004-07-23 0:10 ` Brian Hurt 2004-07-23 1:22 ` John Prevost
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox