From: Kuba Ober <ober.14@osu.edu>
To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Performance questions, -inline, ...
Date: Mon, 7 Jan 2008 10:22:40 -0500 [thread overview]
Message-ID: <200801071022.41044.ober.14@osu.edu> (raw)
In-Reply-To: <200801071441.48212.jon@ffconsultancy.com>
On Monday 07 January 2008, Jon Harrop wrote:
> On Monday 07 January 2008 13:48:40 Kuba Ober wrote:
> > On Saturday 05 January 2008, Jon Harrop wrote:
> > > Optimizing numerical code is discussed in detail in my book OCaml for
> > > Scientists. You may also be interested in a very similar thread where I
> > > optimized someone's almost identical code on the beginners list
> > > recently. There is also this relevant blog post of mine:
> > >
> > > http://ocamlnews.blogspot.com/2007/09/spectral-norm.html
> > >
> > > Essentially, your benchmark has rediscovered the fact that abstractions
> > > (HOFs, polymorphism etc.) are prohibitively slow for high-performance
> > > numerics and must be avoided.
> >
> > In the case of the particular OCaml implementation - yes. In general -
> > no. I hope we agree here.
>
> You mean it might be possible to recover the performance of C from
> numerical code with high-level abstractions? Yes. Indeed, I would like to
> see this done. However, I've never heard of an implementation of any
> language that can do this.
g++ does that reasonably well. Heck, I have a severely hacked in-house LISP
system which had grown out of an assembler-with-LISP-macros, and it does it
all the time. It's a LISP with static types at runtime, but general LISPiness
at compile time (macros just run in gcl). In fact I'm looking to port it to
OCaml just because type inference and commonplace LISP syntax for type
declarations don't mix too well - the code starts looking really ugly. Maybe
I stuck too much with LISP's usual declaim-this-and-that approach, but that's
what you get when you reuse most of underlying LISP's implementation.
> > They are straight rewrites from C code and are used to compare how gcc
> > and OCaml stack up.
>
> I assume you didn't mimic run-time polymorphism, currying, closures and
> higher-order functions in the C though?
No. But while those are very useful features to have, the code is messy if
you've got two languages to work with (hi-perf stuff in C, everything else in
OCaml).
> > > > (* generic scalar operation *)
> > > > let op1 op const nloop =
> > > > let accum = ref start in
> > > > for i = 1 to nloop do
> > > > accum := op !accum const
> > > > done
> > >
> > > You probably meant to return "!accum".
> >
> > The return value is ignored anyway. It's a benchmark, noone cares what
> > the result is.
>
> If you do not care what the result of calls to this function are then you
> must also not care how quickly it runs. I cannot overemphasize how
> important this distinction is.
I assumed (correctly) that OCaml's optimizer isn't all that clever. This is
something you definitely have to watch for when using gcc/g++.
> > > > (** addition **)
> > > > let add1 nloop =
> > > > let accum = ref start in
> > > > for i = 1 to nloop do
> > > > accum := !accum +. addconst
> > > > done
> > >
> > > Again, should probably return "!accum". However, you can encourage
> > > OCaml to unbox the float by returning "!accum + 0.0" instead.
> >
> > OK, I don't quite get it. Are you talking about what the function should
> > return?
>
> Yes. Your "add1" function is entirely dead code and, therefore, is
> trivially reducible. This is not true of real programs and, therefore, your
> results will not be representative of real programs. Moreover, there is no
> sense in trying to optimize a program that does nothing anyway.
If you code such a thing in gcc, it correctly becomes a no-op. OCaml isn't all
that clever ;)
You're 100% academically-correct of course.
> > If so, are you implying that the function body will be compiled
> > differently (better?) if a different type is returned?
>
> The change of type isn't the reason for the performance improvement. If you
> return "!accum" here, so "add1" returns "float" then the performance will
> probably not change. However, if you then change the return expression
> to "!accum + 0.0" then you're likely to see a significant improvement in
> performance because the OCaml compiler will unbox the floating point
> value "!accum" throughout this function, avoiding costly allocations.
Ah-ha, I have to try that!
> > > > let add2 = op1 ( +. ) addconst
> > >
> > > This should be slower than "add1".
> >
> > But shouldn't. The way I write the code in this case should not affect
> > the assembly the least bit. The loop with explicit operand, functional
> > recursion and generic function-as-an-argument approach should all
> > generate same assembly. Obviously enough they don't :)
>
> The simplest route to recovering C performance here is:
>
> . Inline "( +. )".
> . Inline "op1".
> . Type-specialize "op1".
> . Hoist bounds checks.
You mean "the programmer has to do this"? OCaml compiler is *really* bad,
then.
> > > > let add4 = op2 ( +. ) addconst
> > >
> > > This will be slow because "op2" is polymorphic and "+." will not be
> > > inlined.
> >
> > Yeah, I see that, but that shouldn't be the case if OCaml were to be
> > serious in that department :)
>
> AFAIK, there are no languages with implementations that are "serious in
> that department" by this definition, although I agree that it would be very
> useful in practice.
I have a very broken inlining statically-typed LISP implementation which does
that all the time. For code which runs on 8-bit hardware and is better than
stock C compiler that came with it (Zilog's Encore! platform). It's really
not all that hard to do. It's hardly a full-fledged implementation, and it's
buggy, but it's only goal is to compile some very specific codebase. A hack,
but it works, and I write everything functional-style, with functions passed
left and right, etc. In fact, the compiler guarantees that a stateless
function, if passed as an argument, is equivalent to coding it in-line. It's
all DSP code.
> There are some trade-offs here though. Stalin-compiled Scheme,
> MLton-compiled SML and GHC-compiled Haskell do a lot more than OCaml in
> theory but, in practice, they are much less useful for high-performance
> numerics because their performance is so unpredictable.
Good to know.
> I've already spent a long time meticulously contructing benchmarks in
> various different languages along these lines and the most important
> optimizations missing from OCaml are:
>
> . Hoisting bounds checks.
> . Common subexpression elimination.
> . Type specialization.
> . More aggressive inlining.
> . Static optimization of div and mod.
>
> The nice thing about OCaml is that it has a superb 64-bit code generator
> so, once you've done those optimizations on the hot paths manually, OCaml
> lets you run code as quickly as C but with all of the high-level benefits
> everywhere else.
>
> A perfect language implementation would certainly do these optimizations
> for you and I wish OCaml did but I still haven't found anything better.
After I gain some more experience I will hopefully add an OCaml-esque front
end to my hackish compiler. It's completely unportable, only works on
12-bit-PIC-like architecture (SX from Parallax) and on Z8 Encore!, but hey,
maybe it can be a good starting point to something bigger and better. The
backend still has to be ported from Lisp, and I'm too time- and
knowledge-constrained to do it just yet.
The code is 70% Lisp macros, reuses every bit of underlying LISP platform
it can, and as of yet I don't know how clean the ported OCaml would look.
Needless to say, I'd never get this done without having the macros available,
as in "this would take me forever in C/C++". The way it currenly runs is as
follows: the source code is run via a LISP program which creates another LISP
program which then generates the linked binary to be downloaded to the
target.
I was originally hoping to reuse most of OCaml's compiler/optimizer and just
write a platform-specific backend. It now seems that it's easier for me to
add a hacked OCaml front-end to my existing system. Probably it will just
translate OCaml to LISP and then process it as it originally was. I really
like OCaml syntax better, *although* some of my sources also use LISP macros.
They could probably stay as-is for now if the OCaml parts will just get
translated to LISP.
Cheers, Kuba
next prev parent reply other threads:[~2008-01-07 15:22 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-01-03 16:28 Kuba Ober
2008-01-03 17:11 ` [Caml-list] " Edgar Friendly
2008-01-05 18:09 ` Kuba Ober
2008-01-05 18:44 ` Kuba Ober
2008-01-05 19:36 ` Jon Harrop
2008-01-05 20:31 ` Bünzli Daniel
2008-01-07 13:48 ` Kuba Ober
2008-01-07 14:41 ` Jon Harrop
2008-01-07 15:22 ` Kuba Ober [this message]
2008-01-07 19:58 ` Jon Harrop
2008-01-08 14:20 ` Kuba Ober
2008-01-12 14:22 ` Jon Harrop
2008-01-12 16:18 ` Dario Teixeira
2008-01-12 23:50 ` Jon Harrop
2008-01-07 15:31 ` Christophe Raffalli
2008-01-07 17:00 ` Jacques Carette
2008-01-07 17:07 ` Till Varoquaux
2008-01-07 17:20 ` Jacques Carette
2008-01-07 17:31 ` Kuba Ober
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200801071022.41044.ober.14@osu.edu \
--to=ober.14@osu.edu \
--cc=caml-list@yquem.inria.fr \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox