From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ivg@ieee.org>
Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104])
	by sympa.inria.fr (Postfix) with ESMTPS id 16D8280198
	for <caml-list@sympa.inria.fr>; Tue, 11 Jul 2017 14:40:11 +0200 (CEST)
Authentication-Results: mail3-smtp-sop.national.inria.fr; spf=None smtp.pra=ivg@ieee.org; spf=Pass smtp.mailfrom=ivg@ieee.org; spf=None smtp.helo=postmaster@mail-yw0-f169.google.com
Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender
  authenticity information available from domain of
  ivg@ieee.org) identity=pra; client-ip=209.85.161.169;
  receiver=mail3-smtp-sop.national.inria.fr;
  envelope-from="ivg@ieee.org"; x-sender="ivg@ieee.org";
  x-conformance=sidf_compatible
Received-SPF: Pass (mail3-smtp-sop.national.inria.fr: domain of
  ivg@ieee.org designates 209.85.161.169 as permitted sender)
  identity=mailfrom; client-ip=209.85.161.169;
  receiver=mail3-smtp-sop.national.inria.fr;
  envelope-from="ivg@ieee.org"; x-sender="ivg@ieee.org";
  x-conformance=sidf_compatible; x-record-type="v=spf1"
Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender
  authenticity information available from domain of
  postmaster@mail-yw0-f169.google.com) identity=helo;
  client-ip=209.85.161.169;
  receiver=mail3-smtp-sop.national.inria.fr;
  envelope-from="ivg@ieee.org";
  x-sender="postmaster@mail-yw0-f169.google.com";
  x-conformance=sidf_compatible
IronPort-PHdr: =?us-ascii?q?9a23=3A9gDUgxZyzjgY3gIn4VQ8Wvf/LSx+4OfEezUN459i?=
 =?us-ascii?q?sYplN5qZrsyzbnLW6fgltlLVR4KTs6sC0LuJ9fi4EUU7or+5+EgYd5JNUxJXwe?=
 =?us-ascii?q?43pCcHRPC/NEvgMfTxZDY7FskRHHVs/nW8LFQHUJ2mPw6arXK99yMdFQviPgRp?=
 =?us-ascii?q?OOv1BpTSj8Oq3Oyu5pHfeQtFiT6/bL9oMBm6sRjau9ULj4dlNqs/0AbCrGFSe+?=
 =?us-ascii?q?RRy2NoJFaTkAj568yt4pNt8Dletuw4+cJYXqr0Y6o3TbpDDDQ7KG81/9HktQPC?=
 =?us-ascii?q?TQSU+HQRVHgdnwdSDAjE6BH6WYrxsjf/u+Fg1iSWIdH6QLYpUjul8qlrVQToiD?=
 =?us-ascii?q?8ZODEl7GHZhMtwjKdBrxKgoRx03orYbY6ROfZ7eK7WYNEUSndbXstJVSNBDIOy?=
 =?us-ascii?q?YYUMAeQcI+hXs5LwqEESoRakHwSgGP/jxz1Oi3Tr3aM6yeMhEQTe0QIjAdIBqn?=
 =?us-ascii?q?LUp8j0OqcVTeC1y7fIwinDb/NXxTf985XDfxcgofGSUrJ9asvRxlcxGAzblFmQ?=
 =?us-ascii?q?rpblPzyM2+kLrmOV4e1gVee1hG4mrQF8ujmvxsE2ionInI0Z0F7E9T9hzIYxIt?=
 =?us-ascii?q?24T1Z7bcShEJtUry2aLJd2Qtk8TG5yvSY20LgGuZqjcCgFyZQn2x7fa+GcfISS?=
 =?us-ascii?q?/h3jU+ORLS94hX1/eLK/gBGy/VK8xe37U8m4yFhKoTJBktnLsXANzwbf6s2DSv?=
 =?us-ascii?q?dl8EehwSqA1wfW6uFcJUA7i7bbJIA7zrEskZoTtFzPHij1mEXzja+WdF8o+u+y?=
 =?us-ascii?q?6+ToZLjtu5ySN5dshw3gLqgjntazDOc4PwQUQmSW+Pmw2Kf+8UD4RLhHiOA9nL?=
 =?us-ascii?q?PDv5DAP8sbo7a0Aw9L3YYn7BayFzKm384ZnXkDNV5EeByGg5TwN1HAPfz1DPOy?=
 =?us-ascii?q?j06jkDdswPDGMbnhDYvXInffl7fheK5x609ayAUt0dBS/4xYBq0FLf7pWUL8tM?=
 =?us-ascii?q?bUAgI4PgCp2errFdRw24cGVWKKGKCZMafSsVGS5uIoJumBfI4VuCjyK/U+5v7h?=
 =?us-ascii?q?k2E2lkEHcamux5sXZ2i0Hu56LEWBfXrsntABHH8WsQUkSezqjESOUTpSZ3apQ6?=
 =?us-ascii?q?Ix/So7CYKjDYfbXI+hmr2B3CGhHp1XfG9KEF6MEW27P7mDDt4IZTKfM4dMnTse?=
 =?us-ascii?q?UqbpH4Yl2AHoswn+2vxrBuXR8ywc85nk0Y4myffUkEQT6zVyR+uaz2aTRGF1gn?=
 =?us-ascii?q?hAEz4o04h+rEFwjFCZ3v4r0LRjCdVP6qYRAU8BPpnGwrkmWt0=3D?=
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: =?us-ascii?q?A0D0AADGxmRZhqmhVdFdHAEBBAEBCgEBF?=
 =?us-ascii?q?wEBBAEBCgEBhBOBFAeFH5pQgkKFbI8HA1wshUoCgzEHQxQBAQEBAQEBAQEBARI?=
 =?us-ascii?q?BAQEICwsIKC+CMwyCWQEBAQEBAQEjBBkBASwLAQQLCwsDCg0dAgIhARIBBQEKE?=
 =?us-ascii?q?gYTEg2JeAMIBQgQnww/ix9rgWw6gwUBAQWEJQMKg3gBAQEBAQUBAQEBAQEBGQg?=
 =?us-ascii?q?SgxYvgx6DeYEMglcpgheCZoJhiAEMjzOHLjuHSIcLS4RugiU+j0KLfYgCFB+BF?=
 =?us-ascii?q?Q8ngSuBAghJGgaEOiofgg8kNgEEiCkBAQE?=
X-IPAS-Result: =?us-ascii?q?A0D0AADGxmRZhqmhVdFdHAEBBAEBCgEBFwEBBAEBCgEBhBO?=
 =?us-ascii?q?BFAeFH5pQgkKFbI8HA1wshUoCgzEHQxQBAQEBAQEBAQEBARIBAQEICwsIKC+CM?=
 =?us-ascii?q?wyCWQEBAQEBAQEjBBkBASwLAQQLCwsDCg0dAgIhARIBBQEKEgYTEg2JeAMIBQg?=
 =?us-ascii?q?Qnww/ix9rgWw6gwUBAQWEJQMKg3gBAQEBAQUBAQEBAQEBGQgSgxYvgx6DeYEMg?=
 =?us-ascii?q?lcpgheCZoJhiAEMjzOHLjuHSIcLS4RugiU+j0KLfYgCFB+BFQ8ngSuBAghJGga?=
 =?us-ascii?q?EOiofgg8kNgEEiCkBAQE?=
X-IronPort-AV: E=Sophos;i="5.40,346,1496095200"; 
   d="scan'208,217";a="231206112"
Received: from mail-yw0-f169.google.com ([209.85.161.169])
  by mail3-smtp-sop.national.inria.fr with ESMTP/TLS/AES128-GCM-SHA256; 11 Jul 2017 14:40:09 +0200
Received: by mail-yw0-f169.google.com with SMTP id v193so48167541ywg.2
        for <caml-list@inria.fr>; Tue, 11 Jul 2017 05:40:08 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=ieee-org.20150623.gappssmtp.com; s=20150623;
        h=mime-version:in-reply-to:references:from:date:message-id:subject:to
         :cc;
        bh=+z9OBlKpQMprXI6l6F9hoSN4MRWNcu07tlcJKGwkL/s=;
        b=nEzSzhVMlkcuneoeNOg8WdC2boUwhEUhN7dgrnujJ8Q3bMCIfpn74/oHY80HzFaOic
         NpP3SJ5bMuJlSJnU50bYifV9gU7qmfUZaTmFA5Xv4kjsL7yYeomwEWa4l7WTC+YHBwyU
         j+4W7JIml11M5ovzDtsppq0zJf6fFFvcrWPJctN00icKfICBh6qpLSbN0EylwqZu0eKm
         5oR9CpN4KAEDO8BGPdKp0ldJliroOfrrpmoorNIIwa7djJyrh6KhffrB+I1QgQpNZF0A
         mldZEkZ4HoPLMNyjslRvw/069pN3unmA3N824fr4yiyNJiklM2IkO4fwDCRyJAwwaHpM
         KepQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:mime-version:in-reply-to:references:from:date
         :message-id:subject:to:cc;
        bh=+z9OBlKpQMprXI6l6F9hoSN4MRWNcu07tlcJKGwkL/s=;
        b=ILeuk7ceRQNdyw5pKwSXGb5CsNYyjO08/pAcUVh4VEI9bCI4q5r0liljX9H9bzarwF
         ZlFMP3Yg5AXVq9qNwqBAZLX9EpwYvh/qjrgsrEYFWvq89ryAurlmBnVZBZQhDbIRaVuH
         F947nghIZzYMa3cgcG6DSYWg/toDTJPT/4nQOYqGqPfrHbZ55Eaf1WrPLtTxK3fkR1r+
         UJ5Z6tKec303YBMBqXYVkukI7XMeWGcZWHQkR4UhyH7gnlR6XOK7MtNi5WR2n9yG5AmB
         dwRWFmsrdhIKTPTGxuiKQD53ft1JOv74MpA+XdceMIP79YWTsk5qhwz31JabMHv4gbj0
         OF7w==
X-Gm-Message-State: AIVw111WnzhNe1nSCFZGemeptXppcYSIU/QjsEB/7h+vXUNmsGLqB6Sc
	jy2oZgnkKrxbhwokfkGFRIJeZh/4Ye/J
X-Received: by 10.129.107.198 with SMTP id g189mr4314891ywc.314.1499776807279;
 Tue, 11 Jul 2017 05:40:07 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.129.39.7 with HTTP; Tue, 11 Jul 2017 05:40:06 -0700 (PDT)
In-Reply-To: <CAJannG7JW6kfEP+myW49Sc39Kng-Np-ex0ChO-5KuM0E-Acc2Q@mail.gmail.com>
References: <CAJannG4mcHu6rDsb+LBfydCmhhN1F+OW7pUWDyWTnkvRP6r46w@mail.gmail.com>
 <CALdWJ+wFOTRDtLxAVwqbox-6YHL8OK0t5atBKnEmK9pHHBDMig@mail.gmail.com> <CAJannG7JW6kfEP+myW49Sc39Kng-Np-ex0ChO-5KuM0E-Acc2Q@mail.gmail.com>
From: Ivan Gotovchits <ivg@ieee.org>
Date: Tue, 11 Jul 2017 08:40:06 -0400
Message-ID: <CALdWJ+wUBU+A+c4vy476HKXdP-16NxP8uGtjygRidfN8rXGXdw@mail.gmail.com>
To: Alexey Egorov <alex.only.d@gmail.com>
Cc: caml users <caml-list@inria.fr>
Content-Type: multipart/alternative; boundary="001a11473b36d76cc8055409ff6a"
Subject: Re: [Caml-list] Optimizing pure-functional streams

--001a11473b36d76cc8055409ff6a
Content-Type: text/plain; charset="UTF-8"

On Mon, Jul 10, 2017 at 6:28 PM, Alexey Egorov <alex.only.d@gmail.com>
wrote:

> Hi Ivan,
>
> thanks for your analysis!
>
> Which version of OCaml you are using? I'm tried on 4.04.1 and trunk
> (with and without flambda) and still getting the same (poor)
> performance for recursive version, no matter if curried or uncurried
> loop is used.
>

It works the same on any version of OCaml. Are you sure, that you're
actually recompiling the code?
But if you insist, then I was trying on 4.03.0 and 4.04.0+flambda.

P.S. I would like to retract my statement that the recursive version is 10%
faster, in fact it is the same, it was a measurement error.


>
> 2017-07-11 2:29 GMT+05:00 Ivan Gotovchits <ivg@ieee.org>:
> > Hi Alexey,
> >
> > The recursion version is slower because you're allocating a tuple on each
> > recursive call. Try to use a curried form of a function, and you will a
> > performance that is even better than the for-loop (10% better in my
> case).
> > E.g.,
> >
> >
> >
> > let loop high =
> >   let rec loop i = function
> >     | t when i > high -> t
> >     | t when i mod 2 = 0 -> loop (i + 1) (t + i * 2)
> >     | t -> loop (i + 1) t in
> >   loop 1 0
> >
> >
> > The for-loop version is indeed slower than the C version because of
> tagging,
> > c.f., the OCaml implementation:
> >
> > movq $1, %rbx
> > movq $3, %rdi
> > cmpq %rax, %rdi
> > jg .L100
> >
> >
> > .L101:
> > movq %rdi, %rsi
> > sarq $1, %rsi
> > movq %rsi, %rdx
> > shr movq $1, %rbx
> > movq $3, %rdi
> > cmpq %rax, %rdi
> > jg .L100
> >
> > .L101:
> > movq %rdi, %rsi
> > sarq $1, %rsi
> > moq $63, %rdx
> > movq %rsi, %rcx
> > addq %rdx, %rcx
> > andq $-2, %rcx
> > subq %rcx, %rsi
> > leaq 1(%rsi,%rsi), %rsi
> > cmpq $1, %rsi
> > jne .L102
> > leaq -2(%rbx,%rdi,2), %rbx
> > .L102:
> > movq %rdi, %rsi
> > addq $2, %rdi
> > cmpq %rax, %rsi
> > jne .L101
> > .L100:
> > movq %rbx, %rax
> > ret
> >
> >
> > With the GCC output:
> >
> >
> > testl %edi, %edi
> > jle .L6
> > addl $1, %edi
> > movl $4, %ecx
> > movl $1, %edx
> > xorl %eax, %eax
> > jmp .L3
> > .L5:
> > leaq (%rax,%rcx), %rsi
> > testb $1, %dl
> > cmove %rsi, %rax
> > addq $2, %rcx
> > .L3:
> > addl $1, %edx
> > cmpl %edi, %edx
> > jne .L5
> > rep ret
> > .L6:
> > xorl %eax, %eax
> > ret
> >
> >
> > Notice this `sar` and `shr` instructions, they are here due to the
> tagging.
> > The GCC compiler was also able to leverage the conditional move
> instruction
> > `cmove`, though it looks nice I'm not sure how much it actually won.
> There
> > is no need to use flambda for the recursive and for-loop as they are
> already
> > optimized to the maximum.
> >
> > Concerning your stream implementation, you may compare it to the Core
> > Kernel's Sequence. It is basically the same, so you can play the
> > benchmarking game :)
> >
> > The general way to implement a stream with a zero performance penalty
> (i.e.,
> > as fast as a for loop) is to use staging. This topic is explored in
> detail
> > in the Oleg Kiselyov et alas Strymonad [1] paper. The implementation, as
> > well as other interesting discussions about streams, are available on his
> > homepage [2] (there are tons of very interesting stuff there!).
> >
> > Best,
> > Ivan
> >
> > [1]: http://okmij.org/ftp/meta-programming/strymonas.pdf
> > [2]: http://okmij.org/ftp/Streams.html
> >
> >
> >
> >
> >
> > On Mon, Jul 10, 2017 at 4:26 PM, Alexey Egorov <alex.only.d@gmail.com>
> > wrote:
> >>
> >> Hello,
> >>
> >> Michael Snoyman recently posted very interesting article about Haskell
> >> streams and Rust iterators (
> >> https://www.fpcomplete.com/blog/2017/07/iterators-streams-rust-haskell
> >> ) and I tried to do the same in OCaml.
> >>
> >> You can find my code here -
> >> https://gist.github.com/anonymous/127e9116b362d561c5dfa9cce6b453f3
> >>
> >> There is four different versions:
> >>
> >> 1) c_loop.c - plain C loop
> >> 2) f_loop.ml - for-loop in OCaml
> >> 3) loop.ml - OCaml loop using recursion
> >> 4) stream.ml - OCaml loop using streams
> >>
> >> Here is the results:
> >>
> >> c_loop.c:   0m0.105s
> >> f_loop.ml:  0m0.184s
> >> loop.ml:     0m0.207s
> >> stream.ml: 0m0.605s
> >>
> >> It's not suprise that C version is fastest - at least, it uses untagged
> >> ints.
> >> What surprises me is that recursion is slower than for-loop - can't
> >> OCaml compile it down to the same imperative loop?
> >>
> >> And it's very dissapointing that stream version is 3 times slower than
> >> recursion-based.
> >> The problem, I believe, is that OCaml is unable to optimize
> >> intermediate stream constructors away (as GHC do).
> >>
> >> I tried to add inline annotations, which helps a little - is there
> >> something I can do to improve it further?
> >>
> >> --
> >> Caml-list mailing list.  Subscription management and archives:
> >> https://sympa.inria.fr/sympa/arc/caml-list
> >> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> >> Bug reports: http://caml.inria.fr/bin/caml-bugs
> >
> >
>

--001a11473b36d76cc8055409ff6a
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div class=3D"gmail_extra"><div class=3D"gmail_quote">On M=
on, Jul 10, 2017 at 6:28 PM, Alexey Egorov <span dir=3D"ltr">&lt;<a href=3D=
"mailto:alex.only.d@gmail.com" target=3D"_blank">alex.only.d@gmail.com</a>&=
gt;</span> wrote:<br><blockquote class=3D"gmail_quote" style=3D"margin:0 0 =
0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=3D"im HOEnZ=
b">Hi Ivan,<br>
<br>
thanks for your analysis!<br>
<br>
Which version of OCaml you are using? I&#39;m tried on 4.04.1 and trunk<br>
(with and without flambda) and still getting the same (poor)<br>
performance for recursive version, no matter if curried or uncurried<br>
loop is used.<br></span></blockquote><div><br></div><div>It works the same =
on any version of OCaml. Are you sure, that you&#39;re actually recompiling=
 the code?=C2=A0</div><div>But if you insist, then I was trying on 4.03.0 a=
nd 4.04.0+flambda.=C2=A0</div><div><br></div><div>P.S. I would like to retr=
act my statement that the recursive version is 10% faster, in fact it is th=
e same, it was a measurement error.=C2=A0</div><div>=C2=A0</div><blockquote=
 class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc soli=
d;padding-left:1ex"><span class=3D"im HOEnZb">
<br>
2017-07-11 2:29 GMT+05:00 Ivan Gotovchits &lt;<a href=3D"mailto:ivg@ieee.or=
g">ivg@ieee.org</a>&gt;:<br>
</span><div class=3D"HOEnZb"><div class=3D"h5">&gt; Hi Alexey,<br>
&gt;<br>
&gt; The recursion version is slower because you&#39;re allocating a tuple =
on each<br>
&gt; recursive call. Try to use a curried form of a function, and you will =
a<br>
&gt; performance that is even better than the for-loop (10% better in my ca=
se).<br>
&gt; E.g.,<br>
&gt;<br>
&gt;<br>
&gt;<br>
&gt; let loop high =3D<br>
&gt;=C2=A0 =C2=A0let rec loop i =3D function<br>
&gt;=C2=A0 =C2=A0 =C2=A0| t when i &gt; high -&gt; t<br>
&gt;=C2=A0 =C2=A0 =C2=A0| t when i mod 2 =3D 0 -&gt; loop (i + 1) (t + i * =
2)<br>
&gt;=C2=A0 =C2=A0 =C2=A0| t -&gt; loop (i + 1) t in<br>
&gt;=C2=A0 =C2=A0loop 1 0<br>
&gt;<br>
&gt;<br>
&gt; The for-loop version is indeed slower than the C version because of ta=
gging,<br>
&gt; c.f., the OCaml implementation:<br>
&gt;<br>
&gt; movq $1, %rbx<br>
&gt; movq $3, %rdi<br>
&gt; cmpq %rax, %rdi<br>
&gt; jg .L100<br>
&gt;<br>
&gt;<br>
&gt; .L101:<br>
&gt; movq %rdi, %rsi<br>
&gt; sarq $1, %rsi<br>
&gt; movq %rsi, %rdx<br>
&gt; shr movq $1, %rbx<br>
&gt; movq $3, %rdi<br>
&gt; cmpq %rax, %rdi<br>
&gt; jg .L100<br>
&gt;<br>
&gt; .L101:<br>
&gt; movq %rdi, %rsi<br>
&gt; sarq $1, %rsi<br>
&gt; moq $63, %rdx<br>
&gt; movq %rsi, %rcx<br>
&gt; addq %rdx, %rcx<br>
&gt; andq $-2, %rcx<br>
&gt; subq %rcx, %rsi<br>
&gt; leaq 1(%rsi,%rsi), %rsi<br>
&gt; cmpq $1, %rsi<br>
&gt; jne .L102<br>
&gt; leaq -2(%rbx,%rdi,2), %rbx<br>
&gt; .L102:<br>
&gt; movq %rdi, %rsi<br>
&gt; addq $2, %rdi<br>
&gt; cmpq %rax, %rsi<br>
&gt; jne .L101<br>
&gt; .L100:<br>
&gt; movq %rbx, %rax<br>
&gt; ret<br>
&gt;<br>
&gt;<br>
&gt; With the GCC output:<br>
&gt;<br>
&gt;<br>
&gt; testl %edi, %edi<br>
&gt; jle .L6<br>
&gt; addl $1, %edi<br>
&gt; movl $4, %ecx<br>
&gt; movl $1, %edx<br>
&gt; xorl %eax, %eax<br>
&gt; jmp .L3<br>
&gt; .L5:<br>
&gt; leaq (%rax,%rcx), %rsi<br>
&gt; testb $1, %dl<br>
&gt; cmove %rsi, %rax<br>
&gt; addq $2, %rcx<br>
&gt; .L3:<br>
&gt; addl $1, %edx<br>
&gt; cmpl %edi, %edx<br>
&gt; jne .L5<br>
&gt; rep ret<br>
&gt; .L6:<br>
&gt; xorl %eax, %eax<br>
&gt; ret<br>
&gt;<br>
&gt;<br>
&gt; Notice this `sar` and `shr` instructions, they are here due to the tag=
ging.<br>
&gt; The GCC compiler was also able to leverage the conditional move instru=
ction<br>
&gt; `cmove`, though it looks nice I&#39;m not sure how much it actually wo=
n. There<br>
&gt; is no need to use flambda for the recursive and for-loop as they are a=
lready<br>
&gt; optimized to the maximum.<br>
&gt;<br>
&gt; Concerning your stream implementation, you may compare it to the Core<=
br>
&gt; Kernel&#39;s Sequence. It is basically the same, so you can play the<b=
r>
&gt; benchmarking game :)<br>
&gt;<br>
&gt; The general way to implement a stream with a zero performance penalty =
(i.e.,<br>
&gt; as fast as a for loop) is to use staging. This topic is explored in de=
tail<br>
&gt; in the Oleg Kiselyov et alas Strymonad [1] paper. The implementation, =
as<br>
&gt; well as other interesting discussions about streams, are available on =
his<br>
&gt; homepage [2] (there are tons of very interesting stuff there!).<br>
&gt;<br>
&gt; Best,<br>
&gt; Ivan<br>
&gt;<br>
&gt; [1]: <a href=3D"http://okmij.org/ftp/meta-programming/strymonas.pdf" r=
el=3D"noreferrer" target=3D"_blank">http://okmij.org/ftp/meta-<wbr>programm=
ing/strymonas.pdf</a><br>
&gt; [2]: <a href=3D"http://okmij.org/ftp/Streams.html" rel=3D"noreferrer" =
target=3D"_blank">http://okmij.org/ftp/Streams.<wbr>html</a><br>
&gt;<br>
&gt;<br>
&gt;<br>
&gt;<br>
&gt;<br>
&gt; On Mon, Jul 10, 2017 at 4:26 PM, Alexey Egorov &lt;<a href=3D"mailto:a=
lex.only.d@gmail.com">alex.only.d@gmail.com</a>&gt;<br>
&gt; wrote:<br>
&gt;&gt;<br>
&gt;&gt; Hello,<br>
&gt;&gt;<br>
&gt;&gt; Michael Snoyman recently posted very interesting article about Has=
kell<br>
&gt;&gt; streams and Rust iterators (<br>
&gt;&gt; <a href=3D"https://www.fpcomplete.com/blog/2017/07/iterators-strea=
ms-rust-haskell" rel=3D"noreferrer" target=3D"_blank">https://www.fpcomplet=
e.com/<wbr>blog/2017/07/iterators-<wbr>streams-rust-haskell</a><br>
&gt;&gt; ) and I tried to do the same in OCaml.<br>
&gt;&gt;<br>
&gt;&gt; You can find my code here -<br>
&gt;&gt; <a href=3D"https://gist.github.com/anonymous/127e9116b362d561c5dfa=
9cce6b453f3" rel=3D"noreferrer" target=3D"_blank">https://gist.github.com/<=
wbr>anonymous/<wbr>127e9116b362d561c5dfa9cce6b453<wbr>f3</a><br>
&gt;&gt;<br>
&gt;&gt; There is four different versions:<br>
&gt;&gt;<br>
&gt;&gt; 1) c_loop.c - plain C loop<br>
&gt;&gt; 2) <a href=3D"http://f_loop.ml" rel=3D"noreferrer" target=3D"_blan=
k">f_loop.ml</a> - for-loop in OCaml<br>
&gt;&gt; 3) <a href=3D"http://loop.ml" rel=3D"noreferrer" target=3D"_blank"=
>loop.ml</a> - OCaml loop using recursion<br>
&gt;&gt; 4) <a href=3D"http://stream.ml" rel=3D"noreferrer" target=3D"_blan=
k">stream.ml</a> - OCaml loop using streams<br>
&gt;&gt;<br>
&gt;&gt; Here is the results:<br>
&gt;&gt;<br>
&gt;&gt; c_loop.c:=C2=A0 =C2=A00m0.105s<br>
&gt;&gt; <a href=3D"http://f_loop.ml" rel=3D"noreferrer" target=3D"_blank">=
f_loop.ml</a>:=C2=A0 0m0.184s<br>
&gt;&gt; <a href=3D"http://loop.ml" rel=3D"noreferrer" target=3D"_blank">lo=
op.ml</a>:=C2=A0 =C2=A0 =C2=A00m0.207s<br>
&gt;&gt; <a href=3D"http://stream.ml" rel=3D"noreferrer" target=3D"_blank">=
stream.ml</a>: 0m0.605s<br>
&gt;&gt;<br>
&gt;&gt; It&#39;s not suprise that C version is fastest - at least, it uses=
 untagged<br>
&gt;&gt; ints.<br>
&gt;&gt; What surprises me is that recursion is slower than for-loop - can&=
#39;t<br>
&gt;&gt; OCaml compile it down to the same imperative loop?<br>
&gt;&gt;<br>
&gt;&gt; And it&#39;s very dissapointing that stream version is 3 times slo=
wer than<br>
&gt;&gt; recursion-based.<br>
&gt;&gt; The problem, I believe, is that OCaml is unable to optimize<br>
&gt;&gt; intermediate stream constructors away (as GHC do).<br>
&gt;&gt;<br>
&gt;&gt; I tried to add inline annotations, which helps a little - is there=
<br>
&gt;&gt; something I can do to improve it further?<br>
&gt;&gt;<br>
&gt;&gt; --<br>
&gt;&gt; Caml-list mailing list.=C2=A0 Subscription management and archives=
:<br>
&gt;&gt; <a href=3D"https://sympa.inria.fr/sympa/arc/caml-list" rel=3D"nore=
ferrer" target=3D"_blank">https://sympa.inria.fr/sympa/<wbr>arc/caml-list</=
a><br>
&gt;&gt; Beginner&#39;s list: <a href=3D"http://groups.yahoo.com/group/ocam=
l_beginners" rel=3D"noreferrer" target=3D"_blank">http://groups.yahoo.com/g=
roup/<wbr>ocaml_beginners</a><br>
&gt;&gt; Bug reports: <a href=3D"http://caml.inria.fr/bin/caml-bugs" rel=3D=
"noreferrer" target=3D"_blank">http://caml.inria.fr/bin/caml-<wbr>bugs</a><=
br>
&gt;<br>
&gt;<br>
</div></div></blockquote></div><br></div></div>

--001a11473b36d76cc8055409ff6a--