TWIMC, 

I've played a little with different optimization options in flambda (4.04), and all three versions of the loop (curried, uncurried, and the for-loop) now have the same performance, though they still lose about 30% to the C version, due to integer tagging.

Basically, this means that flambda was able to get rid of the allocation. I don't actually know which of the options made the difference, but this is how I compiled it:

ocamlopt.opt -c -S -inlining-report -unbox-closures -O3 -rounds 8 -inline-max-depth 256 -inline-max-unroll 1024 -o loop.cmx loop.ml
ocamlopt.opt loop.cmx -o loop.native
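
For readers without the earlier messages at hand, here is a minimal sketch of what three such loop variants could look like. It is a hypothetical reconstruction, not the benchmarked loop.ml: the function names and the summation body are assumptions made purely for illustration.

```ocaml
(* Hypothetical reconstruction; NOT the loop.ml that was benchmarked. *)

(* Curried: the loop state is passed as separate arguments. *)
let sum_curried n =
  let rec go i acc = if i > n then acc else go (i + 1) (acc + i) in
  go 1 0

(* Uncurried: the loop state travels as a pair. Whether the pair is
   actually allocated depends on the compiler's optimizations. *)
let sum_uncurried n =
  let rec go (i, acc) = if i > n then acc else go (i + 1, acc + i) in
  go (1, 0)

(* Imperative for-loop over a mutable accumulator. *)
let sum_for n =
  let acc = ref 0 in
  for i = 1 to n do
    acc := !acc + i
  done;
  !acc

let () =
  let n = 1_000 in
  (* All three variants must agree on the result. *)
  assert (sum_curried n = sum_uncurried n);
  assert (sum_curried n = sum_for n);
  Printf.printf "%d\n" (sum_for n)
```

Compiling a file like this with the flags above and reading the generated assembly (the -S output) is one way to check whether any allocation remains in the hot loop.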

Regards,
Ivan

On Tue, Jul 11, 2017 at 8:54 AM, Simon Cruanes <simon.cruanes.2007@m4x.org> wrote:
Hello,

Iterators in OCaml have been the topic of many discussions. Another
option for fast iterators is https://github.com/c-cube/sequence ,
which (with flambda) should compile down to loops and tests on this kind
of benchmark. With the attached additional file on 4.04.0+flambda,
I obtain the following (where sequence is test-seq):

$ for i in test-* ; do echo $i ; time ./$i ; done
test-c_loop
5000000100000000
./$i  0.08s user 0.00s system 97% cpu 0.085 total
test-f_loop
5000000100000000
./$i  0.10s user 0.00s system 96% cpu 0.100 total
test-loop
5000000100000000
./$i  0.18s user 0.00s system 97% cpu 0.184 total
test-seq
5000000100000000
./$i  0.11s user 0.00s system 97% cpu 0.113 total
test-stream
5000000100000000
./$i  0.44s user 0.00s system 98% cpu 0.449 total


Note that sequence is imperative underneath, but can be safely used as a
functional structure.
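
The "imperative underneath, functional on the surface" point can be illustrated with a small self-contained sketch. It assumes an iterator represented as a function that pushes each element to a callback, which is my understanding of the general shape used by sequence; the names iter, range, map and fold below are local definitions for illustration, not the library's API.

```ocaml
(* Sketch of a callback-style iterator: a function that feeds each
   element to a consumer. All names here are illustrative. *)
type 'a iter = ('a -> unit) -> unit

(* Integers from lo to hi inclusive, produced by a plain for-loop. *)
let range lo hi : int iter = fun k ->
  for i = lo to hi do
    k i
  done

(* Transform every element before handing it to the consumer. *)
let map f (it : 'a iter) : 'b iter = fun k -> it (fun x -> k (f x))

(* The fold mutates a local reference, but the reference never
   escapes, so callers observe an ordinary pure function. *)
let fold f init (it : 'a iter) =
  let acc = ref init in
  it (fun x -> acc := f !acc x);
  !acc

let () =
  let total = range 1 10 |> map (fun x -> x * 2) |> fold ( + ) 0 in
  Printf.printf "%d\n" total
```

Once the closures are inlined, a pipeline like this has a chance of reducing to a single loop with an accumulator, which is the behaviour hoped for under flambda.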

--
Simon Cruanes

http://weusepgp.info/
key 49AA62B6, fingerprint 949F EB87 8F06 59C6 D7D3  7D8D 4AC0 1D08 49AA 62B6