* OCaml image blending performance
@ 2008-02-06 20:29 Ilmari Heikkinen
2008-02-06 22:12 ` [Caml-list] " Pal-Kristian Engstad
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Ilmari Heikkinen @ 2008-02-06 20:29 UTC (permalink / raw)
To: caml-list
Hi,
I was writing some image blending operations to get to grips with OCaml,
and wrote the same code in C as well. Asking (and receiving) advice for
optimizing the code on freenode #ocaml, I was told to post the code here
as it might be an interesting compiler test.
The C and Caml versions don't produce the same results, but should
have the same amount of computation (don't take my word for it though,
I don't know why the results differ.)
The source files are:
http://glimr.rubyforge.org/cake/blend.ml
http://glimr.rubyforge.org/cake/blend2.ml
http://glimr.rubyforge.org/cake/blend.c
Or as a tarball:
wget http://glimr.rubyforge.org/cake/blend_test.tar.gz
tar zxf blend_test.tar.gz
cd blend_test
./build.sh
cblend
real 0m1.466s
user 0m1.456s
sys 0m0.008s
blend
real 0m5.463s
user 0m5.456s
sys 0m0.012s
blend2
real 0m3.423s
user 0m3.404s
sys 0m0.012s
Use them as you wish.
--
Ilmari Heikkinen
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] OCaml image blending performance
2008-02-06 20:29 OCaml image blending performance Ilmari Heikkinen
@ 2008-02-06 22:12 ` Pal-Kristian Engstad
2008-02-06 23:30 ` Jon Harrop
2008-02-06 23:34 ` Jon Harrop
[not found] ` <854c25eb0802070607t4b2ed641xa3bc19e8384ce23@mail.gmail.com>
2 siblings, 1 reply; 10+ messages in thread
From: Pal-Kristian Engstad @ 2008-02-06 22:12 UTC (permalink / raw)
To: Ilmari Heikkinen; +Cc: caml-list
If you are looking for speed, this should be done in assembly...:
http://ompf.org/forum/viewtopic.php?f=11&t=494
PKE.
Ilmari Heikkinen wrote:
> Hi,
>
> I was writing some image blending operations to get to grips with OCaml,
> and wrote the same code in C as well. Asking (and receiving) advice for
> optimizing the code on freenode #ocaml, I was told to post the code here
> as it might be an interesting compiler test.
>
> The C and Caml versions don't produce the same results, but should
> have the same amount of computation (don't take my word for it though,
> I don't know why the results differ.)
>
> The source files are:
> http://glimr.rubyforge.org/cake/blend.ml
> http://glimr.rubyforge.org/cake/blend2.ml
> http://glimr.rubyforge.org/cake/blend.c
>
> Or as a tarball:
>
> wget http://glimr.rubyforge.org/cake/blend_test.tar.gz
> tar zxf blend_test.tar.gz
> cd blend_test
> ./build.sh
>
> cblend
>
> real 0m1.466s
> user 0m1.456s
> sys 0m0.008s
>
> blend
>
> real 0m5.463s
> user 0m5.456s
> sys 0m0.012s
>
> blend2
>
> real 0m3.423s
> user 0m3.404s
> sys 0m0.012s
>
> Use them as you wish.
>
> --
> Ilmari Heikkinen
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
>
--
Pål-Kristian Engstad (engstad@naughtydog.com),
Lead Graphics & Engine Programmer,
Naughty Dog, Inc., 1601 Cloverfield Blvd, 6000 North,
Santa Monica, CA 90404, USA. Ph.: (310) 633-9112.
"Most of us would do well to remember that there is a reason Carmack
is Carmack, and we are not Carmack.",
Jonathan Blow, 2/1/2006, GD Algo Mailing List
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] OCaml image blending performance
2008-02-06 20:29 OCaml image blending performance Ilmari Heikkinen
2008-02-06 22:12 ` [Caml-list] " Pal-Kristian Engstad
@ 2008-02-06 23:34 ` Jon Harrop
2008-02-07 11:01 ` Richard Jones
[not found] ` <854c25eb0802070607t4b2ed641xa3bc19e8384ce23@mail.gmail.com>
2 siblings, 1 reply; 10+ messages in thread
From: Jon Harrop @ 2008-02-06 23:34 UTC (permalink / raw)
To: caml-list
On Wednesday 06 February 2008 20:29:05 Ilmari Heikkinen wrote:
> The C and Caml versions don't produce the same results, but should
> have the same amount of computation (don't take my word for it though,
> I don't know why the results differ.)
You should post working code if you want people to optimize it. I suspect
you've incorrectly assumed that OCaml has full-width ints in this case
because it does work here on my 64-bit machine.
> ./build.sh
>
> cblend
>
> real 0m1.466s
> user 0m1.456s
> sys 0m0.008s
>
> blend
>
> real 0m5.463s
> user 0m5.456s
> sys 0m0.012s
>
> blend2
>
> real 0m3.423s
> user 0m3.404s
> sys 0m0.012s
On 2.2GHz AMD64, I get:
$ ./build.sh
cblend
real 0m6.337s
user 0m6.048s
sys 0m0.032s
blend
real 0m14.965s
user 0m14.281s
sys 0m0.096s
blend2
real 0m9.639s
user 0m9.333s
sys 0m0.056s
OCaml is not competitive here for two main reasons:
. Full-size integer arithmetic is not very fast in OCaml.
. Abstractions often cost performance in OCaml.
In this case, most of the speed loss can be regained by simply aggressively
inlining everything, which is exactly what you (ab)used C macros for in the C
code.
If you want to get more interesting then you can look at metaprogramming in
OCaml, either by generating OCaml code, using MetaOCaml or JIT compiling
using LLVM. You should be able to meet or beat C that way. At the very least,
you can write OCaml programs to take the pain away from writing C programs by
hand... :-)
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] OCaml image blending performance
2008-02-06 23:34 ` Jon Harrop
@ 2008-02-07 11:01 ` Richard Jones
2008-02-07 11:13 ` Dominique Martinet
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Richard Jones @ 2008-02-07 11:01 UTC (permalink / raw)
To: caml-list
On Wed, Feb 06, 2008 at 11:34:02PM +0000, Jon Harrop wrote:
> In this case, most of the speed loss can be regained by simply
> aggressively inlining everything, which is exactly what you (ab)used
> C macros for in the C code.
I don't understand this. In 'blend2.ml' (which I was responsible for)
C macros are used to inline all the OCaml functions the same as in the
C version. Yet it's still 70% slower than the C version.
My suspicion was that it was to do with his use of a string as a byte
array.
Rich.
--
Richard Jones
Red Hat
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] OCaml image blending performance
2008-02-07 11:01 ` Richard Jones
@ 2008-02-07 11:13 ` Dominique Martinet
2008-02-07 12:29 ` Mauricio Fernandez
2008-02-07 15:04 ` Jon Harrop
2 siblings, 0 replies; 10+ messages in thread
From: Dominique Martinet @ 2008-02-07 11:13 UTC (permalink / raw)
To: Richard Jones; +Cc: caml-list
I haven't looked at the code nor the build options, but you might want
to try the unsafe get and set functions if you did your own bundary
checks before, instead of having caml check it everytime it access
some data.
It saved us roughtly 20% of running time on a project involving ALOT
of array accesses
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] OCaml image blending performance
2008-02-07 11:01 ` Richard Jones
2008-02-07 11:13 ` Dominique Martinet
@ 2008-02-07 12:29 ` Mauricio Fernandez
2008-02-07 15:04 ` Jon Harrop
2 siblings, 0 replies; 10+ messages in thread
From: Mauricio Fernandez @ 2008-02-07 12:29 UTC (permalink / raw)
To: caml-list
On Thu, Feb 07, 2008 at 11:01:54AM +0000, Richard Jones wrote:
> On Wed, Feb 06, 2008 at 11:34:02PM +0000, Jon Harrop wrote:
> > In this case, most of the speed loss can be regained by simply
> > aggressively inlining everything, which is exactly what you (ab)used
> > C macros for in the C code.
>
> I don't understand this. In 'blend2.ml' (which I was responsible for)
> C macros are used to inline all the OCaml functions the same as in the
> C version. Yet it's still 70% slower than the C version.
>
> My suspicion was that it was to do with his use of a string as a byte
> array.
I get a 30% speedup by unrolling the BLEND loop and performing some additional
CSE (just getting rid of that extra (j_+3), actually).
I think the major culprit is 31-bit integer arithmetic: there are lots of
"orl $1, ...", "sarl $1, ...", decl and incl in the generated code.
If Bigarray.Array1 had unsafe_get and unsafe_set, operating on int32 values
could be faster (ocamlopt is often smart enough to do without boxed int32s).
Currently, bigarray bound checking is performed even with -unsafe, however.
--
Mauricio Fernandez - http://eigenclass.org
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] OCaml image blending performance
2008-02-07 11:01 ` Richard Jones
2008-02-07 11:13 ` Dominique Martinet
2008-02-07 12:29 ` Mauricio Fernandez
@ 2008-02-07 15:04 ` Jon Harrop
2 siblings, 0 replies; 10+ messages in thread
From: Jon Harrop @ 2008-02-07 15:04 UTC (permalink / raw)
To: caml-list
On Thursday 07 February 2008 11:01:54 Richard Jones wrote:
> On Wed, Feb 06, 2008 at 11:34:02PM +0000, Jon Harrop wrote:
> > In this case, most of the speed loss can be regained by simply
> > aggressively inlining everything, which is exactly what you (ab)used
> > C macros for in the C code.
>
> I don't understand this. In 'blend2.ml' (which I was responsible for)
> C macros are used to inline all the OCaml functions the same as in the
> C version. Yet it's still 70% slower than the C version.
>
> My suspicion was that it was to do with his use of a string as a byte
> array.
I suspect OCaml's 31-bit integers are responsible for the remaining slowdown.
You might try using 32-bit integers (Int32) but I'm not sure ocamlopt will
unbox them as aggressively as it does floats (I've never tried it).
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
^ permalink raw reply [flat|nested] 10+ messages in thread
[parent not found: <854c25eb0802070607t4b2ed641xa3bc19e8384ce23@mail.gmail.com>]
* Re: [Caml-list] OCaml image blending performance
[not found] ` <854c25eb0802070607t4b2ed641xa3bc19e8384ce23@mail.gmail.com>
@ 2008-02-07 14:10 ` Ilmari Heikkinen
2008-02-07 14:08 ` Jon Harrop
0 siblings, 1 reply; 10+ messages in thread
From: Ilmari Heikkinen @ 2008-02-07 14:10 UTC (permalink / raw)
To: caml-list
On Feb 7, 2008 1:34 AM, Jon Harrop <jon@ffconsultancy.com> wrote:
> On Wednesday 06 February 2008 20:29:05 Ilmari Heikkinen wrote:
> > The C and Caml versions don't produce the same results, but should
> > have the same amount of computation (don't take my word for it though,
> > I don't know why the results differ.)
>
> You should post working code if you want people to optimize it. I suspect
> you've incorrectly assumed that OCaml has full-width ints in this case
> because it does work here on my 64-bit machine.
Actually, it does work here as well, I probably tested earlier with extra
blend modes enabled in C. Sorry about the confusion, should've done a
more thorough test before posting.
> OCaml is not competitive here for two main reasons:
>
> . Full-size integer arithmetic is not very fast in OCaml.
> . Abstractions often cost performance in OCaml.
>
> In this case, most of the speed loss can be regained by simply aggressively
> inlining everything, which is exactly what you (ab)used C macros for in the C
> code.
The C macros were more about making the code shorter by emulating HOFs,
blend2.ml does inline everything. Coming within 2x of C with no major hackery
on this sort of code is impressive though.
--
Ilmari Heikkinen
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] OCaml image blending performance
2008-02-07 14:10 ` Ilmari Heikkinen
@ 2008-02-07 14:08 ` Jon Harrop
0 siblings, 0 replies; 10+ messages in thread
From: Jon Harrop @ 2008-02-07 14:08 UTC (permalink / raw)
To: caml-list
On Thursday 07 February 2008 14:10:22 Ilmari Heikkinen wrote:
> On Feb 7, 2008 1:34 AM, Jon Harrop <jon@ffconsultancy.com> wrote:
> > In this case, most of the speed loss can be regained by simply
> > aggressively inlining everything, which is exactly what you (ab)used C
> > macros for in the C code.
>
> The C macros were more about making the code shorter by emulating HOFs,
> blend2.ml does inline everything. Coming within 2x of C with no major
> hackery on this sort of code is impressive though.
Getting within 2x of C is often regarded as disappointing in the OCaml
community. :-)
I've played with some similar int-intensive algorithms and never managed to
get close to C. OCaml's float performance is where it really shines.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2008-02-07 15:08 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-02-06 20:29 OCaml image blending performance Ilmari Heikkinen
2008-02-06 22:12 ` [Caml-list] " Pal-Kristian Engstad
2008-02-06 23:30 ` Jon Harrop
2008-02-06 23:34 ` Jon Harrop
2008-02-07 11:01 ` Richard Jones
2008-02-07 11:13 ` Dominique Martinet
2008-02-07 12:29 ` Mauricio Fernandez
2008-02-07 15:04 ` Jon Harrop
[not found] ` <854c25eb0802070607t4b2ed641xa3bc19e8384ce23@mail.gmail.com>
2008-02-07 14:10 ` Ilmari Heikkinen
2008-02-07 14:08 ` Jon Harrop
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox