Re: [Caml-list] Optimizing pure-functional streams

Mailing list for all users of the OCaml language and system.
 help / color / mirror / Atom feed

From: Ivan Gotovchits <ivg@ieee.org>
To: Yotam Barnoy <yotambarnoy@gmail.com>
Cc: Gabriel Scherer <gabriel.scherer@gmail.com>,
	Simon Cruanes <simon.cruanes.2007@m4x.org>,
	 Alexey Egorov <alex.only.d@gmail.com>,
	caml users <caml-list@inria.fr>
Subject: Re: [Caml-list] Optimizing pure-functional streams
Date: Tue, 11 Jul 2017 14:55:26 -0400	[thread overview]
Message-ID: <CALdWJ+ypjkGUvbHs_WrvPyF__q+PQZ8mTEicLWr-cuTNvhqoYA@mail.gmail.com> (raw)
In-Reply-To: <CAN6ygO=k-GP6HwgpdsNgr2dU794oz_SpDcqND+LB07LQjUTfUw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5843 bytes --]

Yep, moving to boxing increased the time by ten times. However, I found a
strange behavior, that might be a bug, the following program:

open Int64

let (=) = equal
let (+) = add
let (mod) = rem
let ( * ) = mul

let loop high =
  let rec loop i = function
    | t when i > high -> t
    | t when i mod 2L = 0L -> loop (i + 1L) (t + i * 2L)
    | t -> loop (i + 1L) t in
  loop 1L 0L

let high = 1000L * 1000L * 1000L
let _ = Printf.printf "%Ld\n" (loop high)


Didn't compile with the following _tags file:

true : optimize(3), unbox_closures, optimization_rounds(8),
inline_max_depth(8), inline_max_unroll(2)



And terminates with the "Fatal error: exception Stack overflow" message.
With higher values of the inline_max_unroll parameter, it doesn't terminate
on mine machine in reasonable time and space.


The most surprising is not that, but that the offending line is `let (=) =
equal`. Without this line, the program compiles without any issues.

Best,
Ivan

P.S. Actually, I'm playing with flambda and our monads library, that is
highly functorized and abstracted. I reimplemented this loop example using
the State monad and got about 10 times better performance with flambda in
comparison to a compiler without flambda enabled (from 40 seconds to 4
seconds). That makes my happy panda :) It is still 4 times slower than the
regular version, but I'm ready to pay this cost.




On Tue, Jul 11, 2017 at 2:15 PM, Yotam Barnoy <yotambarnoy@gmail.com> wrote:

> > Moving from tagging to boxing does not make much sense to me, because
> the techniques to avoid either are similar and boxing is more
> expensive. It might be the case that the compiler today is better at
> unboxing than untagging (it currently tries to do both of these
> locally), but that would only be because boxing is more expensive and
> thus more effort was spent on the unboxing, but medium-term one could
> hope for a uniform approach to both transformations.
>
> Absolutely. I believe there was some skepticism expressed about the
> need for untagging in the past by some members of the dev team, which
> is why I brought it up. I suggested Int64 only because it's possible
> that Flambda will *currently* do better with it than with tagged
> integers.
>
> On Tue, Jul 11, 2017 at 2:04 PM, Gabriel Scherer
> <gabriel.scherer@gmail.com> wrote:
> > Moving from tagging to boxing does not make much sense to me, because
> > the techniques to avoid either are similar and boxing is more
> > expensive. It might be the case that the compiler today is better at
> > unboxing than untagging (it currently tries to do both of these
> > locally), but that would only be because boxing is more expensive and
> > thus more effort was spent on the unboxing, but medium-term one could
> > hope for a uniform approach to both transformations.
> >
> >
> > On Tue, Jul 11, 2017 at 1:46 PM, Yotam Barnoy <yotambarnoy@gmail.com>
> wrote:
> >>> I've played a little bit with different optimization options in
> flambda 4.04, and finally, all three versions of the loop: curried,
> uncurried, and the for-loop, have the same performance, though they still
> loose about 30% to the C version, due to tagging.
> >>
> >> Would it perhaps make sense to try Int64 in order to avoid the tagging?
> >>
> >> Also, I believe it would make sense to have Flambda try to switch to
> >> an untagged type in tight loops to avoid this tagging cost.
> >>
> >> On Tue, Jul 11, 2017 at 1:37 PM, Ivan Gotovchits <ivg@ieee.org> wrote:
> >>> TWIMC,
> >>>
> >>> I've played a little bit with different optimization options in flambda
> >>> 4.04, and finally, all three versions of the loop: curried, uncurried,
> and
> >>> the for-loop, have the same performance, though they still loose about
> 30%
> >>> to the C version, due to tagging.
> >>>
> >>> Basically, this means, that flambda was able to get rid of the
> allocation. I
> >>> don't actually know which of the options finally made the difference,
> but
> >>> this is how I compiled it.
> >>>
> >>> ocamlopt.opt -c -S -inlining-report -unbox-closures -O3 -rounds 8
> >>> -inline-max-depth 256 -inline-max-unroll 1024 -o loop.cmx loop.ml
> >>> ocamlopt.opt loop.cmx -o loop.native
> >>>
> >>>
> >>> Regards,
> >>> Ivan
> >>>
> >>>
> >>>
> >>>
> >>> On Tue, Jul 11, 2017 at 8:54 AM, Simon Cruanes <
> simon.cruanes.2007@m4x.org>
> >>> wrote:
> >>>>
> >>>> Hello,
> >>>>
> >>>> Iterators in OCaml have been the topic of many discussions. Another
> >>>> option for fast iterators is https://github.com/c-cube/sequence ,
> >>>> which (with flambda) should compile down to loops and tests on this
> kind
> >>>> of benchmark. With the attached additional file on 4.04.0+flambda,
> >>>> I obtain the following (where sequence is test-seq):
> >>>>
> >>>> $ for i in test-* ; do echo $i ; time ./$i ; done
> >>>> test-c_loop
> >>>> 5000000100000000
> >>>> ./$i  0.08s user 0.00s system 97% cpu 0.085 total
> >>>> test-f_loop
> >>>> 5000000100000000
> >>>> ./$i  0.10s user 0.00s system 96% cpu 0.100 total
> >>>> test-loop
> >>>> 5000000100000000
> >>>> ./$i  0.18s user 0.00s system 97% cpu 0.184 total
> >>>> test-seq
> >>>> 5000000100000000
> >>>> ./$i  0.11s user 0.00s system 97% cpu 0.113 total
> >>>> test-stream
> >>>> 5000000100000000
> >>>> ./$i  0.44s user 0.00s system 98% cpu 0.449 total
> >>>>
> >>>>
> >>>> Note that sequence is imperative underneath, but can be safely used
> as a
> >>>> functional structure.
> >>>>
> >>>> --
> >>>> Simon Cruanes
> >>>>
> >>>> http://weusepgp.info/
> >>>> key 49AA62B6, fingerprint 949F EB87 8F06 59C6 D7D3  7D8D 4AC0 1D08
> 49AA
> >>>> 62B6
> >>>
> >>>
> >>
> >> --
> >> Caml-list mailing list.  Subscription management and archives:
> >> https://sympa.inria.fr/sympa/arc/caml-list
> >> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> >> Bug reports: http://caml.inria.fr/bin/caml-bugs
>

[-- Attachment #2: Type: text/html, Size: 9506 bytes --]

     prev parent reply	other threads:[~2017-07-11 18:55 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-10 20:26 Alexey Egorov
2017-07-10 21:29 ` Ivan Gotovchits
2017-07-10 22:28   ` Alexey Egorov
2017-07-11 12:40     ` Ivan Gotovchits
2017-07-11 12:54 ` Simon Cruanes
2017-07-11 17:37   ` Ivan Gotovchits
2017-07-11 17:46     ` Yotam Barnoy
2017-07-11 18:04       ` Gabriel Scherer
2017-07-11 18:15         ` Yotam Barnoy
2017-07-11 18:55           ` Ivan Gotovchits [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALdWJ+ypjkGUvbHs_WrvPyF__q+PQZ8mTEicLWr-cuTNvhqoYA@mail.gmail.com \
    --to=ivg@ieee.org \
    --cc=alex.only.d@gmail.com \
    --cc=caml-list@inria.fr \
    --cc=gabriel.scherer@gmail.com \
    --cc=simon.cruanes.2007@m4x.org \
    --cc=yotambarnoy@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox