Re: [Caml-list] Optimizing pure-functional streams

Mailing list for all users of the OCaml language and system.
 help / color / mirror / Atom feed

From: Alexey Egorov <alex.only.d@gmail.com>
To: Ivan Gotovchits <ivg@ieee.org>
Cc: caml users <caml-list@inria.fr>
Subject: Re: [Caml-list] Optimizing pure-functional streams
Date: Tue, 11 Jul 2017 03:28:16 +0500	[thread overview]
Message-ID: <CAJannG7JW6kfEP+myW49Sc39Kng-Np-ex0ChO-5KuM0E-Acc2Q@mail.gmail.com> (raw)
In-Reply-To: <CALdWJ+wFOTRDtLxAVwqbox-6YHL8OK0t5atBKnEmK9pHHBDMig@mail.gmail.com>

Hi Ivan,

thanks for your analysis!

Which version of OCaml you are using? I'm tried on 4.04.1 and trunk
(with and without flambda) and still getting the same (poor)
performance for recursive version, no matter if curried or uncurried
loop is used.

2017-07-11 2:29 GMT+05:00 Ivan Gotovchits <ivg@ieee.org>:
> Hi Alexey,
>
> The recursion version is slower because you're allocating a tuple on each
> recursive call. Try to use a curried form of a function, and you will a
> performance that is even better than the for-loop (10% better in my case).
> E.g.,
>
>
>
> let loop high =
>   let rec loop i = function
>     | t when i > high -> t
>     | t when i mod 2 = 0 -> loop (i + 1) (t + i * 2)
>     | t -> loop (i + 1) t in
>   loop 1 0
>
>
> The for-loop version is indeed slower than the C version because of tagging,
> c.f., the OCaml implementation:
>
> movq $1, %rbx
> movq $3, %rdi
> cmpq %rax, %rdi
> jg .L100
>
>
> .L101:
> movq %rdi, %rsi
> sarq $1, %rsi
> movq %rsi, %rdx
> shr movq $1, %rbx
> movq $3, %rdi
> cmpq %rax, %rdi
> jg .L100
>
> .L101:
> movq %rdi, %rsi
> sarq $1, %rsi
> moq $63, %rdx
> movq %rsi, %rcx
> addq %rdx, %rcx
> andq $-2, %rcx
> subq %rcx, %rsi
> leaq 1(%rsi,%rsi), %rsi
> cmpq $1, %rsi
> jne .L102
> leaq -2(%rbx,%rdi,2), %rbx
> .L102:
> movq %rdi, %rsi
> addq $2, %rdi
> cmpq %rax, %rsi
> jne .L101
> .L100:
> movq %rbx, %rax
> ret
>
>
> With the GCC output:
>
>
> testl %edi, %edi
> jle .L6
> addl $1, %edi
> movl $4, %ecx
> movl $1, %edx
> xorl %eax, %eax
> jmp .L3
> .L5:
> leaq (%rax,%rcx), %rsi
> testb $1, %dl
> cmove %rsi, %rax
> addq $2, %rcx
> .L3:
> addl $1, %edx
> cmpl %edi, %edx
> jne .L5
> rep ret
> .L6:
> xorl %eax, %eax
> ret
>
>
> Notice this `sar` and `shr` instructions, they are here due to the tagging.
> The GCC compiler was also able to leverage the conditional move instruction
> `cmove`, though it looks nice I'm not sure how much it actually won. There
> is no need to use flambda for the recursive and for-loop as they are already
> optimized to the maximum.
>
> Concerning your stream implementation, you may compare it to the Core
> Kernel's Sequence. It is basically the same, so you can play the
> benchmarking game :)
>
> The general way to implement a stream with a zero performance penalty (i.e.,
> as fast as a for loop) is to use staging. This topic is explored in detail
> in the Oleg Kiselyov et alas Strymonad [1] paper. The implementation, as
> well as other interesting discussions about streams, are available on his
> homepage [2] (there are tons of very interesting stuff there!).
>
> Best,
> Ivan
>
> [1]: http://okmij.org/ftp/meta-programming/strymonas.pdf
> [2]: http://okmij.org/ftp/Streams.html
>
>
>
>
>
> On Mon, Jul 10, 2017 at 4:26 PM, Alexey Egorov <alex.only.d@gmail.com>
> wrote:
>>
>> Hello,
>>
>> Michael Snoyman recently posted very interesting article about Haskell
>> streams and Rust iterators (
>> https://www.fpcomplete.com/blog/2017/07/iterators-streams-rust-haskell
>> ) and I tried to do the same in OCaml.
>>
>> You can find my code here -
>> https://gist.github.com/anonymous/127e9116b362d561c5dfa9cce6b453f3
>>
>> There is four different versions:
>>
>> 1) c_loop.c - plain C loop
>> 2) f_loop.ml - for-loop in OCaml
>> 3) loop.ml - OCaml loop using recursion
>> 4) stream.ml - OCaml loop using streams
>>
>> Here is the results:
>>
>> c_loop.c:   0m0.105s
>> f_loop.ml:  0m0.184s
>> loop.ml:     0m0.207s
>> stream.ml: 0m0.605s
>>
>> It's not suprise that C version is fastest - at least, it uses untagged
>> ints.
>> What surprises me is that recursion is slower than for-loop - can't
>> OCaml compile it down to the same imperative loop?
>>
>> And it's very dissapointing that stream version is 3 times slower than
>> recursion-based.
>> The problem, I believe, is that OCaml is unable to optimize
>> intermediate stream constructors away (as GHC do).
>>
>> I tried to add inline annotations, which helps a little - is there
>> something I can do to improve it further?
>>
>> --
>> Caml-list mailing list.  Subscription management and archives:
>> https://sympa.inria.fr/sympa/arc/caml-list
>> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
>

next prev parent reply	other threads:[~2017-07-10 22:28 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-10 20:26 Alexey Egorov
2017-07-10 21:29 ` Ivan Gotovchits
2017-07-10 22:28   ` Alexey Egorov [this message]
2017-07-11 12:40     ` Ivan Gotovchits
2017-07-11 12:54 ` Simon Cruanes
2017-07-11 17:37   ` Ivan Gotovchits
2017-07-11 17:46     ` Yotam Barnoy
2017-07-11 18:04       ` Gabriel Scherer
2017-07-11 18:15         ` Yotam Barnoy
2017-07-11 18:55           ` Ivan Gotovchits

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJannG7JW6kfEP+myW49Sc39Kng-Np-ex0ChO-5KuM0E-Acc2Q@mail.gmail.com \
    --to=alex.only.d@gmail.com \
    --cc=caml-list@inria.fr \
    --cc=ivg@ieee.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox