* Performance questions, -inline, ...
@ 2008-01-03 16:28 Kuba Ober
2008-01-03 17:11 ` [Caml-list] " Edgar Friendly
2008-01-05 19:36 ` Jon Harrop
0 siblings, 2 replies; 19+ messages in thread
From: Kuba Ober @ 2008-01-03 16:28 UTC (permalink / raw)
To: caml-list
I haven't looked at assembly output yet, but I've run into some unexpected
behavior in my benchmarks.
This was compiled by ocamlopt -inline 100 -unsafe, the results and code are
below (MIPS is obtained by dividing 50 million iterations by (Unix.times
()) . Unix.tms_utime it took to run). I haven't included the timing etc. code
(it's part of a larger benchmark).
What I wonder is why vector-to-vector add is so much faster than (constant)
scalar to vector add. Vectors are preinitialized each time with a 1.0000,
1.0001, ... sequence.
Also, the very bad performance from generic vector-to-vector *with* inlining
is another puzzler, whereas generic add of scalar-to-scalar performs
similarly to straight-coded one.
Cheers, Kuba
* add1: add scalar to scalar 120 MIPS
* add3: add scalar to vector 250 MIPS
* add5: add vector to vector 320 MIPS
* add2: generic add scalar to scalar 100 MIPS
* add4: generic add vector to vector 38 MIPS
let start = 1.3
(* generic scalar operation *)
let op1 op const nloop =
let accum = ref start in
for i = 1 to nloop do
accum := op !accum const
done
(* generic vector operation *)
let op2 op const a b (nloop : int) =
let len = Array.length a in
for j = 0 to len-1 do
for i = 0 to len-1 do
b.(i) <- op a.(i) b.(i)
done;
done
(** addition **)
let add1 nloop =
let accum = ref start in
for i = 1 to nloop do
accum := !accum +. addconst
done
let add2 = op1 ( +. ) addconst
let add3 a b nloop =
let len = Array.length a in
for j = 0 to len-1 do
for i = 0 to len-1 do
b.(i) <- a.(i) +. addconst
done;
done
let add4 = op2 ( +. ) addconst
let add5 a b nloop =
let len = Array.length a in
for j = 0 to len-1 do
for i = 0 to len-1 do
b.(i) <- a.(i) +. b.(i)
done;
done
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Caml-list] Performance questions, -inline, ...
2008-01-03 16:28 Performance questions, -inline, Kuba Ober
@ 2008-01-03 17:11 ` Edgar Friendly
2008-01-05 18:09 ` Kuba Ober
2008-01-05 19:36 ` Jon Harrop
1 sibling, 1 reply; 19+ messages in thread
From: Edgar Friendly @ 2008-01-03 17:11 UTC (permalink / raw)
To: Kuba Ober; +Cc: caml-list
On Thu, 2008-01-03 at 11:28 -0500, Kuba Ober wrote:
> I haven't looked at assembly output yet, but I've run into some unexpected
> behavior in my benchmarks.
>
> This was compiled by ocamlopt -inline 100 -unsafe, the results and code are
> below (MIPS is obtained by dividing 50 million iterations by (Unix.times
> ()) . Unix.tms_utime it took to run). I haven't included the timing etc. code
> (it's part of a larger benchmark).
>
> What I wonder is why vector-to-vector add is so much faster than (constant)
> scalar to vector add. Vectors are preinitialized each time with a 1.0000,
> 1.0001, ... sequence.
>
> Also, the very bad performance from generic vector-to-vector *with* inlining
> is another puzzler, whereas generic add of scalar-to-scalar performs
> similarly to straight-coded one.
>
> Cheers, Kuba
>
> * add1: add scalar to scalar 120 MIPS
> * add3: add scalar to vector 250 MIPS
> * add5: add vector to vector 320 MIPS
> * add2: generic add scalar to scalar 100 MIPS
> * add4: generic add vector to vector 38 MIPS
>
> let start = 1.3
>
> (* generic scalar operation *)
> let op1 op const nloop =
> let accum = ref start in
> for i = 1 to nloop do
> accum := op !accum const
> done
>
> (* generic vector operation *)
> let op2 op const a b (nloop : int) =
> let len = Array.length a in
> for j = 0 to len-1 do
> for i = 0 to len-1 do
> b.(i) <- op a.(i) b.(i)
> done;
> done
>
> (** addition **)
> let add1 nloop =
> let accum = ref start in
> for i = 1 to nloop do
> accum := !accum +. addconst
> done
> let add2 = op1 ( +. ) addconst
> let add3 a b nloop =
> let len = Array.length a in
> for j = 0 to len-1 do
> for i = 0 to len-1 do
> b.(i) <- a.(i) +. addconst
> done;
> done
> let add4 = op2 ( +. ) addconst
> let add5 a b nloop =
> let len = Array.length a in
> for j = 0 to len-1 do
> for i = 0 to len-1 do
> b.(i) <- a.(i) +. b.(i)
> done;
> done
>
how about:
(* generic vector operation *)
let op2 op a b nloop =
let len = Array.length a in
for j = 0 to nloop do
for i = 0 to len-1 do
b.(i) <- op a.(i) b.(i)
done;
done
let add4 = op2 (+.)
Why does your code have the j loops? You add a constant (or vector
element) a number of times equal to the length of your vector? Do you
mean for j = 1 to nloop do ... And I'd move that out of the test
function if I could, into the testing harness.
Arrays of floats have some optimizations built in to the compiler (no
boxing, even though they're not 31-bit values), so you should get as
good performance as you'll get.
E.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Caml-list] Performance questions, -inline, ...
2008-01-03 17:11 ` [Caml-list] " Edgar Friendly
@ 2008-01-05 18:09 ` Kuba Ober
2008-01-05 18:44 ` Kuba Ober
0 siblings, 1 reply; 19+ messages in thread
From: Kuba Ober @ 2008-01-05 18:09 UTC (permalink / raw)
To: caml-list
> how about:
>
> (* generic vector operation *)
> let op2 op a b nloop =
> let len = Array.length a in
> for j = 0 to nloop do
> for i = 0 to len-1 do
> b.(i) <- op a.(i) b.(i)
> done;
> done
>
> let add4 = op2 (+.)
This doesn't change a thing, as I've said the code is part
of a larger benchmark, that's why there are some unused parameters
etc. They have no effect on performance.
> Why does your code have the j loops?
Simply to execute the operation Array.length ^ 2 times :)
> You add a constant (or vector
> element) a number of times equal to the length of your vector?
No, equal to the square of it. I didn't want to have too big vectors. The
stuff executes floor(sqrt(50e6)) times.
> Arrays of floats have some optimizations built in to the compiler (no
> boxing, even though they're not 31-bit values), so you should get as
> good performance as you'll get.
Are you saying that it may be faster to use a one-element array than a ref?
That'd be curious at best, and a WTF otherwise ;)
Cheers, Kuba
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Caml-list] Performance questions, -inline, ...
2008-01-05 18:09 ` Kuba Ober
@ 2008-01-05 18:44 ` Kuba Ober
0 siblings, 0 replies; 19+ messages in thread
From: Kuba Ober @ 2008-01-05 18:44 UTC (permalink / raw)
To: caml-list
> > You add a constant (or vector
> > element) a number of times equal to the length of your vector?
>
> No, equal to the square of it. I didn't want to have too big vectors. The
> stuff executes floor(sqrt(50e6)) times.
I meant floor(sqrt(50e6))^2 times.
Cheers, Kuba
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Caml-list] Performance questions, -inline, ...
2008-01-03 16:28 Performance questions, -inline, Kuba Ober
2008-01-03 17:11 ` [Caml-list] " Edgar Friendly
@ 2008-01-05 19:36 ` Jon Harrop
2008-01-05 20:31 ` Bünzli Daniel
2008-01-07 13:48 ` Kuba Ober
1 sibling, 2 replies; 19+ messages in thread
From: Jon Harrop @ 2008-01-05 19:36 UTC (permalink / raw)
To: caml-list
Optimizing numerical code is discussed in detail in my book OCaml for
Scientists. You may also be interested in a very similar thread where I
optimized someone's almost identical code on the beginners list recently.
There is also this relevant blog post of mine:
http://ocamlnews.blogspot.com/2007/09/spectral-norm.html
Essentially, your benchmark has rediscovered the fact that abstractions (HOFs,
polymorphism etc.) are prohibitively slow for high-performance numerics and
must be avoided. There are also some minor boxing issues at play.
On Thursday 03 January 2008 16:28:30 Kuba Ober wrote:
> I haven't looked at assembly output yet, but I've run into some unexpected
> behavior in my benchmarks.
Your benchmarks aren't sufficiently well defined to convey information about
anything specific, so you're highly likely to misinterpret what you see.
> This was compiled by ocamlopt -inline 100 -unsafe,
You should use Array.unsafe_get and _set rather than the command-line
option -unsafe because the latter breaks correct code.
> What I wonder is why vector-to-vector add is so much faster than (constant)
> scalar to vector add. Vectors are preinitialized each time with a 1.0000,
> 1.0001, ... sequence.
This is also highly likely to be platform and architecture dependent.
> Also, the very bad performance from generic vector-to-vector *with*
> inlining is another puzzler, whereas generic add of scalar-to-scalar
> performs similarly to straight-coded one.
>
> Cheers, Kuba
>
> * add1: add scalar to scalar 120 MIPS
> * add3: add scalar to vector 250 MIPS
> * add5: add vector to vector 320 MIPS
> * add2: generic add scalar to scalar 100 MIPS
> * add4: generic add vector to vector 38 MIPS
>
> let start = 1.3
>
> (* generic scalar operation *)
> let op1 op const nloop =
> let accum = ref start in
> for i = 1 to nloop do
> accum := op !accum const
> done
You probably meant to return "!accum".
> (* generic vector operation *)
> let op2 op const a b (nloop : int) =
> let len = Array.length a in
> for j = 0 to len-1 do
> for i = 0 to len-1 do
> b.(i) <- op a.(i) b.(i)
> done;
> done
>
> (** addition **)
> let add1 nloop =
> let accum = ref start in
> for i = 1 to nloop do
> accum := !accum +. addconst
> done
Again, should probably return "!accum". However, you can encourage OCaml to
unbox the float by returning "!accum + 0.0" instead.
> let add2 = op1 ( +. ) addconst
This should be slower than "add1".
> let add3 a b nloop =
> let len = Array.length a in
> for j = 0 to len-1 do
> for i = 0 to len-1 do
> b.(i) <- a.(i) +. addconst
> done;
> done
The loop over "j" can be removed.
> let add4 = op2 ( +. ) addconst
This will be slow because "op2" is polymorphic and "+." will not be inlined.
> let add5 a b nloop =
> let len = Array.length a in
> for j = 0 to len-1 do
> for i = 0 to len-1 do
> b.(i) <- a.(i) +. b.(i)
> done;
> done
This increments "b" by "a" many times. Replace the repeated adds with a single
multiply for each element.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Caml-list] Performance questions, -inline, ...
2008-01-05 19:36 ` Jon Harrop
@ 2008-01-05 20:31 ` Bünzli Daniel
2008-01-07 13:48 ` Kuba Ober
1 sibling, 0 replies; 19+ messages in thread
From: Bünzli Daniel @ 2008-01-05 20:31 UTC (permalink / raw)
To: Kuba Ober; +Cc: caml-list caml-list
Have also a look at this page
http://caml.inria.fr/pub/old_caml_site/ocaml/numerical.html
Best,
Daniel
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Caml-list] Performance questions, -inline, ...
2008-01-05 19:36 ` Jon Harrop
2008-01-05 20:31 ` Bünzli Daniel
@ 2008-01-07 13:48 ` Kuba Ober
2008-01-07 14:41 ` Jon Harrop
1 sibling, 1 reply; 19+ messages in thread
From: Kuba Ober @ 2008-01-07 13:48 UTC (permalink / raw)
To: caml-list
On Saturday 05 January 2008, Jon Harrop wrote:
> Optimizing numerical code is discussed in detail in my book OCaml for
> Scientists. You may also be interested in a very similar thread where I
> optimized someone's almost identical code on the beginners list recently.
> There is also this relevant blog post of mine:
>
> http://ocamlnews.blogspot.com/2007/09/spectral-norm.html
>
> Essentially, your benchmark has rediscovered the fact that abstractions
> (HOFs, polymorphism etc.) are prohibitively slow for high-performance
> numerics and must be avoided.
In the case of the particular OCaml implementation - yes. In general - no. I
hope we agree here.
> On Thursday 03 January 2008 16:28:30 Kuba Ober wrote:
> > I haven't looked at assembly output yet, but I've run into some
> > unexpected behavior in my benchmarks.
>
> Your benchmarks aren't sufficiently well defined to convey information
> about anything specific, so you're highly likely to misinterpret what you
> see.
They are straight rewrites from C code and are used to compare how gcc and
OCaml stack up.
> > This was compiled by ocamlopt -inline 100 -unsafe,
>
> You should use Array.unsafe_get and _set rather than the command-line
> option -unsafe because the latter breaks correct code.
This option is not affecting the execution speed in my case and thus can be
dropped.
> > What I wonder is why vector-to-vector add is so much faster than
> > (constant) scalar to vector add. Vectors are preinitialized each time
> > with a 1.0000, 1.0001, ... sequence.
>
> This is also highly likely to be platform and architecture dependent.
The benchmark was run on the same machine and on the same day as the C code it
was rewritten from :)
> > (* generic scalar operation *)
> > let op1 op const nloop =
> > let accum = ref start in
> > for i = 1 to nloop do
> > accum := op !accum const
> > done
>
> You probably meant to return "!accum".
The return value is ignored anyway. It's a benchmark, noone cares what
the result is. Or is it for performance reasons??
> > (* generic vector operation *)
> > let op2 op const a b (nloop : int) =
> > let len = Array.length a in
> > for j = 0 to len-1 do
> > for i = 0 to len-1 do
> > b.(i) <- op a.(i) b.(i)
> > done;
> > done
> >
> > (** addition **)
> > let add1 nloop =
> > let accum = ref start in
> > for i = 1 to nloop do
> > accum := !accum +. addconst
> > done
>
> Again, should probably return "!accum". However, you can encourage OCaml to
> unbox the float by returning "!accum + 0.0" instead.
OK, I don't quite get it. Are you talking about what the function should
return? If so, are you implying that the function body will be compiled
differently (better?) if a different type is returned?
> > let add2 = op1 ( +. ) addconst
>
> This should be slower than "add1".
But shouldn't. The way I write the code in this case should not affect the
assembly the least bit. The loop with explicit operand, functional recursion
and generic function-as-an-argument approach should all generate same
assembly. Obviously enough they don't :)
> > let add3 a b nloop =
> > let len = Array.length a in
> > for j = 0 to len-1 do
> > for i = 0 to len-1 do
> > b.(i) <- a.(i) +. addconst
> > done;
> > done
>
> The loop over "j" can be removed.
Well, the goal was to iterate Array.length^2 times :)
> > let add4 = op2 ( +. ) addconst
>
> This will be slow because "op2" is polymorphic and "+." will not be
> inlined.
Yeah, I see that, but that shouldn't be the case if OCaml were to be serious
in that department :)
> > let add5 a b nloop =
> > let len = Array.length a in
> > for j = 0 to len-1 do
> > for i = 0 to len-1 do
> > b.(i) <- a.(i) +. b.(i)
> > done;
> > done
>
> This increments "b" by "a" many times. Replace the repeated adds with a
> single multiply for each element.
It's benchmark code. It's supposed to check performance of
increment-vector-by-a-vector operation. I was hoping it would be obvious
enough :(
Cheers, Kuba
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Caml-list] Performance questions, -inline, ...
2008-01-07 13:48 ` Kuba Ober
@ 2008-01-07 14:41 ` Jon Harrop
2008-01-07 15:22 ` Kuba Ober
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Jon Harrop @ 2008-01-07 14:41 UTC (permalink / raw)
To: caml-list
On Monday 07 January 2008 13:48:40 Kuba Ober wrote:
> On Saturday 05 January 2008, Jon Harrop wrote:
> > Optimizing numerical code is discussed in detail in my book OCaml for
> > Scientists. You may also be interested in a very similar thread where I
> > optimized someone's almost identical code on the beginners list recently.
> > There is also this relevant blog post of mine:
> >
> > http://ocamlnews.blogspot.com/2007/09/spectral-norm.html
> >
> > Essentially, your benchmark has rediscovered the fact that abstractions
> > (HOFs, polymorphism etc.) are prohibitively slow for high-performance
> > numerics and must be avoided.
>
> In the case of the particular OCaml implementation - yes. In general - no.
> I hope we agree here.
You mean it might be possible to recover the performance of C from numerical
code with high-level abstractions? Yes. Indeed, I would like to see this
done. However, I've never heard of an implementation of any language that can
do this.
> > On Thursday 03 January 2008 16:28:30 Kuba Ober wrote:
> > > I haven't looked at assembly output yet, but I've run into some
> > > unexpected behavior in my benchmarks.
> >
> > Your benchmarks aren't sufficiently well defined to convey information
> > about anything specific, so you're highly likely to misinterpret what you
> > see.
>
> They are straight rewrites from C code and are used to compare how gcc and
> OCaml stack up.
I assume you didn't mimic run-time polymorphism, currying, closures and
higher-order functions in the C though?
> > > This was compiled by ocamlopt -inline 100 -unsafe,
> >
> > You should use Array.unsafe_get and _set rather than the command-line
> > option -unsafe because the latter breaks correct code.
>
> This option is not affecting the execution speed in my case and thus can be
> dropped.
I find that surprising.
> > > What I wonder is why vector-to-vector add is so much faster than
> > > (constant) scalar to vector add. Vectors are preinitialized each time
> > > with a 1.0000, 1.0001, ... sequence.
> >
> > This is also highly likely to be platform and architecture dependent.
>
> The benchmark was run on the same machine and on the same day as the C code
> it was rewritten from :)
OCaml is only likely to compete with C in terms of numerical performance on
modern CPUs, e.g. AMD64. If you're using an old CPU or running in 32-bit then
OCaml can be up to 4x slower.
> > > (* generic scalar operation *)
> > > let op1 op const nloop =
> > > let accum = ref start in
> > > for i = 1 to nloop do
> > > accum := op !accum const
> > > done
> >
> > You probably meant to return "!accum".
>
> The return value is ignored anyway. It's a benchmark, noone cares what
> the result is.
If you do not care what the result of calls to this function are then you must
also not care how quickly it runs. I cannot overemphasize how important this
distinction is.
> Or is it for performance reasons??
In this case, I believe that will not affect performance because "accum" will
always be boxed because it is polymorphic.
> > > (* generic vector operation *)
> > > let op2 op const a b (nloop : int) =
> > > let len = Array.length a in
> > > for j = 0 to len-1 do
> > > for i = 0 to len-1 do
> > > b.(i) <- op a.(i) b.(i)
> > > done;
> > > done
> > >
> > > (** addition **)
> > > let add1 nloop =
> > > let accum = ref start in
> > > for i = 1 to nloop do
> > > accum := !accum +. addconst
> > > done
> >
> > Again, should probably return "!accum". However, you can encourage OCaml
> > to unbox the float by returning "!accum + 0.0" instead.
>
> OK, I don't quite get it. Are you talking about what the function should
> return?
Yes. Your "add1" function is entirely dead code and, therefore, is trivially
reducible. This is not true of real programs and, therefore, your results
will not be representative of real programs. Moreover, there is no sense in
trying to optimize a program that does nothing anyway.
> If so, are you implying that the function body will be compiled
> differently (better?) if a different type is returned?
The change of type isn't the reason for the performance improvement. If you
return "!accum" here, so "add1" returns "float" then the performance will
probably not change. However, if you then change the return expression
to "!accum + 0.0" then you're likely to see a significant improvement in
performance because the OCaml compiler will unbox the floating point
value "!accum" throughout this function, avoiding costly allocations.
You can't appreciate any of that without fixing your function first though.
> > > let add2 = op1 ( +. ) addconst
> >
> > This should be slower than "add1".
>
> But shouldn't. The way I write the code in this case should not affect the
> assembly the least bit. The loop with explicit operand, functional
> recursion and generic function-as-an-argument approach should all generate
> same assembly. Obviously enough they don't :)
The simplest route to recovering C performance here is:
. Inline "( +. )".
. Inline "op1".
. Type-specialize "op1".
. Hoist bounds checks.
> > > let add3 a b nloop =
> > > let len = Array.length a in
> > > for j = 0 to len-1 do
> > > for i = 0 to len-1 do
> > > b.(i) <- a.(i) +. addconst
> > > done;
> > > done
> >
> > The loop over "j" can be removed.
>
> Well, the goal was to iterate Array.length^2 times :)
Again, that makes your benchmark trivially reducible and, consequently, your
results cannot be related to real code.
> > > let add4 = op2 ( +. ) addconst
> >
> > This will be slow because "op2" is polymorphic and "+." will not be
> > inlined.
>
> Yeah, I see that, but that shouldn't be the case if OCaml were to be
> serious in that department :)
AFAIK, there are no languages with implementations that are "serious in that
department" by this definition, although I agree that it would be very useful
in practice.
There are some trade-offs here though. Stalin-compiled Scheme, MLton-compiled
SML and GHC-compiled Haskell do a lot more than OCaml in theory but, in
practice, they are much less useful for high-performance numerics because
their performance is so unpredictable.
> > > let add5 a b nloop =
> > > let len = Array.length a in
> > > for j = 0 to len-1 do
> > > for i = 0 to len-1 do
> > > b.(i) <- a.(i) +. b.(i)
> > > done;
> > > done
> >
> > This increments "b" by "a" many times. Replace the repeated adds with a
> > single multiply for each element.
>
> It's benchmark code. It's supposed to check performance of
> increment-vector-by-a-vector operation. I was hoping it would be obvious
> enough :(
Good benchmark code should not be trivially reducible in ways that real
programs aren't, i.e. unless you're specifically testing the compiler's
dead-code elimination. As you've written these functions, there is a lot of
dead code. Some compilers (particularly C/C++) might eliminate the dead code
and look better on this benchmark as a consequence but that is of little
practical relevance.
I've already spent a long time meticulously contructing benchmarks in various
different languages along these lines and the most important optimizations
missing from OCaml are:
. Hoisting bounds checks.
. Common subexpression elimination.
. Type specialization.
. More aggressive inlining.
. Static optimization of div and mod.
The nice thing about OCaml is that it has a superb 64-bit code generator so,
once you've done those optimizations on the hot paths manually, OCaml lets
you run code as quickly as C but with all of the high-level benefits
everywhere else.
A perfect language implementation would certainly do these optimizations for
you and I wish OCaml did but I still haven't found anything better.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Caml-list] Performance questions, -inline, ...
2008-01-07 14:41 ` Jon Harrop
@ 2008-01-07 15:22 ` Kuba Ober
2008-01-07 19:58 ` Jon Harrop
2008-01-07 15:31 ` Christophe Raffalli
2008-01-07 17:00 ` Jacques Carette
2 siblings, 1 reply; 19+ messages in thread
From: Kuba Ober @ 2008-01-07 15:22 UTC (permalink / raw)
To: caml-list
On Monday 07 January 2008, Jon Harrop wrote:
> On Monday 07 January 2008 13:48:40 Kuba Ober wrote:
> > On Saturday 05 January 2008, Jon Harrop wrote:
> > > Optimizing numerical code is discussed in detail in my book OCaml for
> > > Scientists. You may also be interested in a very similar thread where I
> > > optimized someone's almost identical code on the beginners list
> > > recently. There is also this relevant blog post of mine:
> > >
> > > http://ocamlnews.blogspot.com/2007/09/spectral-norm.html
> > >
> > > Essentially, your benchmark has rediscovered the fact that abstractions
> > > (HOFs, polymorphism etc.) are prohibitively slow for high-performance
> > > numerics and must be avoided.
> >
> > In the case of the particular OCaml implementation - yes. In general -
> > no. I hope we agree here.
>
> You mean it might be possible to recover the performance of C from
> numerical code with high-level abstractions? Yes. Indeed, I would like to
> see this done. However, I've never heard of an implementation of any
> language that can do this.
g++ does that reasonably well. Heck, I have a severely hacked in-house LISP
system which had grown out of an assembler-with-LISP-macros, and it does it
all the time. It's a LISP with static types at runtime, but general LISPiness
at compile time (macros just run in gcl). In fact I'm looking to port it to
OCaml just because type inference and commonplace LISP syntax for type
declarations don't mix too well - the code starts looking really ugly. Maybe
I stuck too much with LISP's usual declaim-this-and-that approach, but that's
what you get when you reuse most of underlying LISP's implementation.
> > They are straight rewrites from C code and are used to compare how gcc
> > and OCaml stack up.
>
> I assume you didn't mimic run-time polymorphism, currying, closures and
> higher-order functions in the C though?
No. But while those are very useful features to have, the code is messy if
you've got two languages to work with (hi-perf stuff in C, everything else in
OCaml).
> > > > (* generic scalar operation *)
> > > > let op1 op const nloop =
> > > > let accum = ref start in
> > > > for i = 1 to nloop do
> > > > accum := op !accum const
> > > > done
> > >
> > > You probably meant to return "!accum".
> >
> > The return value is ignored anyway. It's a benchmark, noone cares what
> > the result is.
>
> If you do not care what the result of calls to this function are then you
> must also not care how quickly it runs. I cannot overemphasize how
> important this distinction is.
I assumed (correctly) that OCaml's optimizer isn't all that clever. This is
something you definitely have to watch for when using gcc/g++.
> > > > (** addition **)
> > > > let add1 nloop =
> > > > let accum = ref start in
> > > > for i = 1 to nloop do
> > > > accum := !accum +. addconst
> > > > done
> > >
> > > Again, should probably return "!accum". However, you can encourage
> > > OCaml to unbox the float by returning "!accum + 0.0" instead.
> >
> > OK, I don't quite get it. Are you talking about what the function should
> > return?
>
> Yes. Your "add1" function is entirely dead code and, therefore, is
> trivially reducible. This is not true of real programs and, therefore, your
> results will not be representative of real programs. Moreover, there is no
> sense in trying to optimize a program that does nothing anyway.
If you code such a thing in gcc, it correctly becomes a no-op. OCaml isn't all
that clever ;)
You're 100% academically-correct of course.
> > If so, are you implying that the function body will be compiled
> > differently (better?) if a different type is returned?
>
> The change of type isn't the reason for the performance improvement. If you
> return "!accum" here, so "add1" returns "float" then the performance will
> probably not change. However, if you then change the return expression
> to "!accum + 0.0" then you're likely to see a significant improvement in
> performance because the OCaml compiler will unbox the floating point
> value "!accum" throughout this function, avoiding costly allocations.
Ah-ha, I have to try that!
> > > > let add2 = op1 ( +. ) addconst
> > >
> > > This should be slower than "add1".
> >
> > But shouldn't. The way I write the code in this case should not affect
> > the assembly the least bit. The loop with explicit operand, functional
> > recursion and generic function-as-an-argument approach should all
> > generate same assembly. Obviously enough they don't :)
>
> The simplest route to recovering C performance here is:
>
> . Inline "( +. )".
> . Inline "op1".
> . Type-specialize "op1".
> . Hoist bounds checks.
You mean "the programmer has to do this"? OCaml compiler is *really* bad,
then.
> > > > let add4 = op2 ( +. ) addconst
> > >
> > > This will be slow because "op2" is polymorphic and "+." will not be
> > > inlined.
> >
> > Yeah, I see that, but that shouldn't be the case if OCaml were to be
> > serious in that department :)
>
> AFAIK, there are no languages with implementations that are "serious in
> that department" by this definition, although I agree that it would be very
> useful in practice.
I have a very broken inlining statically-typed LISP implementation which does
that all the time. For code which runs on 8-bit hardware and is better than
stock C compiler that came with it (Zilog's Encore! platform). It's really
not all that hard to do. It's hardly a full-fledged implementation, and it's
buggy, but it's only goal is to compile some very specific codebase. A hack,
but it works, and I write everything functional-style, with functions passed
left and right, etc. In fact, the compiler guarantees that a stateless
function, if passed as an argument, is equivalent to coding it in-line. It's
all DSP code.
> There are some trade-offs here though. Stalin-compiled Scheme,
> MLton-compiled SML and GHC-compiled Haskell do a lot more than OCaml in
> theory but, in practice, they are much less useful for high-performance
> numerics because their performance is so unpredictable.
Good to know.
> I've already spent a long time meticulously contructing benchmarks in
> various different languages along these lines and the most important
> optimizations missing from OCaml are:
>
> . Hoisting bounds checks.
> . Common subexpression elimination.
> . Type specialization.
> . More aggressive inlining.
> . Static optimization of div and mod.
>
> The nice thing about OCaml is that it has a superb 64-bit code generator
> so, once you've done those optimizations on the hot paths manually, OCaml
> lets you run code as quickly as C but with all of the high-level benefits
> everywhere else.
>
> A perfect language implementation would certainly do these optimizations
> for you and I wish OCaml did but I still haven't found anything better.
After I gain some more experience I will hopefully add an OCaml-esque front
end to my hackish compiler. It's completely unportable, only works on
12-bit-PIC-like architecture (SX from Parallax) and on Z8 Encore!, but hey,
maybe it can be a good starting point to something bigger and better. The
backend still has to be ported from Lisp, and I'm too time- and
knowledge-constrained to do it just yet.
The code is 70% Lisp macros, reuses every bit of underlying LISP platform
it can, and as of yet I don't know how clean the ported OCaml would look.
Needless to say, I'd never get this done without having the macros available,
as in "this would take me forever in C/C++". The way it currenly runs is as
follows: the source code is run via a LISP program which creates another LISP
program which then generates the linked binary to be downloaded to the
target.
I was originally hoping to reuse most of OCaml's compiler/optimizer and just
write a platform-specific backend. It now seems that it's easier for me to
add a hacked OCaml front-end to my existing system. Probably it will just
translate OCaml to LISP and then process it as it originally was. I really
like OCaml syntax better, *although* some of my sources also use LISP macros.
They could probably stay as-is for now if the OCaml parts will just get
translated to LISP.
Cheers, Kuba
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Caml-list] Performance questions, -inline, ...
2008-01-07 14:41 ` Jon Harrop
2008-01-07 15:22 ` Kuba Ober
@ 2008-01-07 15:31 ` Christophe Raffalli
2008-01-07 17:00 ` Jacques Carette
2 siblings, 0 replies; 19+ messages in thread
From: Christophe Raffalli @ 2008-01-07 15:31 UTC (permalink / raw)
To: Jon Harrop; +Cc: caml-list
[-- Attachment #1: Type: text/plain, Size: 990 bytes --]
>
> You mean it might be possible to recover the performance of C from numerical
> code with high-level abstractions? Yes. Indeed, I would like to see this
> done. However, I've never heard of an implementation of any language that can
> do this.
>
That's what MLTon tries to do ... I do not know how much they succeed, but monomorphisation and
defunctorization shoud do the job better than OCaml (look at the specific numerical benchmark for
MLTon ?)
--
Christophe Raffalli
Universite de Savoie
Batiment Le Chablais, bureau 21
73376 Le Bourget-du-Lac Cedex
tel: (33) 4 79 75 81 03
fax: (33) 4 79 75 87 42
mail: Christophe.Raffalli@univ-savoie.fr
www: http://www.lama.univ-savoie.fr/~RAFFALLI
---------------------------------------------
IMPORTANT: this mail is signed using PGP/MIME
At least Enigmail/Mozilla, mutt or evolution
can check this signature. The public key is
stored on www.keyserver.net
---------------------------------------------
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Caml-list] Performance questions, -inline, ...
2008-01-07 14:41 ` Jon Harrop
2008-01-07 15:22 ` Kuba Ober
2008-01-07 15:31 ` Christophe Raffalli
@ 2008-01-07 17:00 ` Jacques Carette
2008-01-07 17:07 ` Till Varoquaux
2008-01-07 17:31 ` Kuba Ober
2 siblings, 2 replies; 19+ messages in thread
From: Jacques Carette @ 2008-01-07 17:00 UTC (permalink / raw)
To: Jon Harrop; +Cc: caml-list
Jon Harrop wrote:
> You mean it might be possible to recover the performance of C from numerical
> code with high-level abstractions? Yes. Indeed, I would like to see this
> done. However, I've never heard of an implementation of any language that can
> do this.
>
<shameless plug>
With MetaOCaml you can -- see either the long version
http://www.cas.mcmaster.ca/~carette/scp_metamonads.pdf
or the more condensed version
http://www.cas.mcmaster.ca/~carette/metamonads/index.html
</shameless plug>
With a little bit of work, you can achieve all of
> The simplest route to recovering C performance here is:
>
> . Inline "( +. )".
> . Inline "op1".
> . Type-specialize "op1".
> . Hoist bounds checks.
>
automatically. There are three drawbacks:
1) the code you write no longer looks like O'Caml but Lisp instead [can
be fixed with enough campl4 hacking]
2) the error messages can be very difficult to figure out [could be
improved a lot if monads were integrated in O'Caml]
3) metaocaml is not as well supported as ocaml
Jacques
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Caml-list] Performance questions, -inline, ...
2008-01-07 17:00 ` Jacques Carette
@ 2008-01-07 17:07 ` Till Varoquaux
2008-01-07 17:20 ` Jacques Carette
2008-01-07 17:31 ` Kuba Ober
1 sibling, 1 reply; 19+ messages in thread
From: Till Varoquaux @ 2008-01-07 17:07 UTC (permalink / raw)
To: Jacques Carette; +Cc: Jon Harrop, caml-list
First link is dead link... which a shame because any article with
metaocaml monads and oleg is bound to be very interesting..
Till
On Jan 7, 2008 5:00 PM, Jacques Carette <carette@mcmaster.ca> wrote:
> Jon Harrop wrote:
> > You mean it might be possible to recover the performance of C from numerical
> > code with high-level abstractions? Yes. Indeed, I would like to see this
> > done. However, I've never heard of an implementation of any language that can
> > do this.
> >
> <shameless plug>
> With MetaOCaml you can -- see either the long version
> http://www.cas.mcmaster.ca/~carette/scp_metamonads.pdf
> or the more condensed version
> http://www.cas.mcmaster.ca/~carette/metamonads/index.html
> </shameless plug>
>
> With a little bit of work, you can achieve all of
> > The simplest route to recovering C performance here is:
> >
> > . Inline "( +. )".
> > . Inline "op1".
> > . Type-specialize "op1".
> > . Hoist bounds checks.
> >
> automatically. There are three drawbacks:
> 1) the code you write no longer looks like O'Caml but Lisp instead [can
> be fixed with enough campl4 hacking]
> 2) the error messages can be very difficult to figure out [could be
> improved a lot if monads were integrated in O'Caml]
> 3) metaocaml is not as well supported as ocaml
>
> Jacques
>
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
--
http://till-varoquaux.blogspot.com/
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Caml-list] Performance questions, -inline, ...
2008-01-07 17:07 ` Till Varoquaux
@ 2008-01-07 17:20 ` Jacques Carette
0 siblings, 0 replies; 19+ messages in thread
From: Jacques Carette @ 2008-01-07 17:20 UTC (permalink / raw)
To: Till Varoquaux; +Cc: caml-list
Oops, sorry, the link should have been
http://www.cas.mcmaster.ca/~carette/publications/scp_metamonads.pdf
Jacques
Till Varoquaux wrote:
> First link is dead link... which a shame because any article with
> metaocaml monads and oleg is bound to be very interesting..
> Till
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Caml-list] Performance questions, -inline, ...
2008-01-07 17:00 ` Jacques Carette
2008-01-07 17:07 ` Till Varoquaux
@ 2008-01-07 17:31 ` Kuba Ober
1 sibling, 0 replies; 19+ messages in thread
From: Kuba Ober @ 2008-01-07 17:31 UTC (permalink / raw)
To: caml-list
On Monday 07 January 2008, Jacques Carette wrote:
> Jon Harrop wrote:
> > You mean it might be possible to recover the performance of C from
> > numerical code with high-level abstractions? Yes. Indeed, I would like to
> > see this done. However, I've never heard of an implementation of any
> > language that can do this.
>
> <shameless plug>
> With MetaOCaml you can -- see either the long version
> http://www.cas.mcmaster.ca/~carette/scp_metamonads.pdf
> or the more condensed version
> http://www.cas.mcmaster.ca/~carette/metamonads/index.html
> </shameless plug>
Yup, that's what my homegrown "compiler" does, although it is all LISP, and
not really nicely structured - it has grown out of an assembler with LISP
macros. Eventually the macros became powerful enough to do register
allocation and optimizations, it was a short road to getting the "assembler"
to dig generic expression input in LISP syntax too. Function inlining
and "hoisting" are mostly trivial syntax tree operations -- once you have the
predicates written to tell you whether it's safe to optimize in a given way.
From day 1 it was a LISP-syntax assembler, of course. Which may look funny,
but I never had to write a lexer for it, and "parsing" LISP isn't much :) In
about 12klocs, for my application it generates assembly mostly on par with
what I could ever write - 95% is DSP operations in fixed point (matrix by
vector multiplies, vector adds, MACs (FIR/IIR)). It also handles saturations
where necessary - I've added a "saturated integer" type which gracefully
saturates on overflow. Kinda necessary in control loops.
To perform well, some of my applications can't even be coded in C unless you'd
use some (AFAIK nonexistent) C-extensions, or resort to relatively unclean
techniques like inlineable assembly functions. That's what gcc digs very
well, but it becomes mostly unmaintainable crap for someone who only knows a
higher-level language and assembly, but has no gcc experience. Most languages
are also perfect at hiding the processor's state/flags register from you, and
it's nigh impossible to expect the optimizer to deduce that all you want to
do is long addition or multiplication. I can see (if it gets there) my OCaml
front-end by default having the result type of + being an (int, bool) tuple
that coerces to int by default, with the bool being the carry status. Thank
$deity almost all CPUs know what carry means, so this could be pretty
portable if I were after that -- which I'm not.
I gave up on working with gcc codebase mostly because C is a very unexpressive
language for anything scientific-related (I wholeheartedly agree with Jon
Harrop here). And I don't like the C++ "bolted-on" functional template
metaprogramming either - it reads like a Turing-machine program to me. It is
pretty nice in theory, but in practice I found it distracting. And the error
messages are a yet another story.
But enough rambling, I have some thinking to do :)
Cheers, Kuba
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Caml-list] Performance questions, -inline, ...
2008-01-07 15:22 ` Kuba Ober
@ 2008-01-07 19:58 ` Jon Harrop
2008-01-08 14:20 ` Kuba Ober
0 siblings, 1 reply; 19+ messages in thread
From: Jon Harrop @ 2008-01-07 19:58 UTC (permalink / raw)
To: caml-list
On Monday 07 January 2008 15:22:40 Kuba Ober wrote:
> On Monday 07 January 2008, Jon Harrop wrote:
> > You mean it might be possible to recover the performance of C from
> > numerical code with high-level abstractions? Yes. Indeed, I would like to
> > see this done. However, I've never heard of an implementation of any
> > language that can do this.
>
> g++ does that reasonably well. Heck, I have a severely hacked in-house LISP
> system which had grown out of an assembler-with-LISP-macros, and it does it
> all the time. It's a LISP with static types at runtime, but general
> LISPiness at compile time (macros just run in gcl). In fact I'm looking to
> port it to OCaml just because type inference and commonplace LISP syntax
> for type declarations don't mix too well - the code starts looking really
> ugly. Maybe I stuck too much with LISP's usual declaim-this-and-that
> approach, but that's what you get when you reuse most of underlying LISP's
> implementation.
Yes. If you're willing to do that kind of hacking then you could get a long
way with camlp4 without too much trouble.
> > I assume you didn't mimic run-time polymorphism, currying, closures and
> > higher-order functions in the C though?
>
> No. But while those are very useful features to have, the code is messy if
> you've got two languages to work with (hi-perf stuff in C, everything else
> in OCaml).
On 64-bit there is no performance benefit in dropping to C.
> > Yes. Your "add1" function is entirely dead code and, therefore, is
> > trivially reducible. This is not true of real programs and, therefore,
> > your results will not be representative of real programs. Moreover, there
> > is no sense in trying to optimize a program that does nothing anyway.
>
> If you code such a thing in gcc, it correctly becomes a no-op. OCaml isn't
> all that clever ;)
>
> You're 100% academically-correct of course.
As Xavier says, "the OCaml compiler is designed to compile good code".
> > The simplest route to recovering C performance here is:
> >
> > . Inline "( +. )".
> > . Inline "op1".
> > . Type-specialize "op1".
> > . Hoist bounds checks.
>
> You mean "the programmer has to do this"? OCaml compiler is *really* bad,
> then.
You're 100% academically-correct of course.
In practice, the combination of OCaml's awesome native-code generation on
AMD64 and the ease with which you can work around those missing optimizations
mean that OCaml is *really* good compared to all other language
implementations with comparable qualities. This is precisely why it is so
popular.
> > There are some trade-offs here though. Stalin-compiled Scheme,
> > MLton-compiled SML and GHC-compiled Haskell do a lot more than OCaml in
> > theory but, in practice, they are much less useful for high-performance
> > numerics because their performance is so unpredictable.
>
> Good to know.
OCaml's code generator is also much better than MLton's and GHC's (even 6.8).
> > I've already spent a long time meticulously contructing benchmarks in
> > various different languages along these lines and the most important
> > optimizations missing from OCaml are:
> >
> > . Hoisting bounds checks.
> > . Common subexpression elimination.
> > . Type specialization.
> > . More aggressive inlining.
> > . Static optimization of div and mod.
> >
> > The nice thing about OCaml is that it has a superb 64-bit code generator
> > so, once you've done those optimizations on the hot paths manually, OCaml
> > lets you run code as quickly as C but with all of the high-level benefits
> > everywhere else.
> >
> > A perfect language implementation would certainly do these optimizations
> > for you and I wish OCaml did but I still haven't found anything better.
>
> After I gain some more experience I will hopefully add an OCaml-esque front
> end to my hackish compiler. It's completely unportable, only works on
> 12-bit-PIC-like architecture (SX from Parallax) and on Z8 Encore!, but hey,
> maybe it can be a good starting point to something bigger and better. The
> backend still has to be ported from Lisp, and I'm too time- and
> knowledge-constrained to do it just yet.
> ...
I see. If you're not using OCaml as a general purpose language then you don't
need its other benefits (e.g. GUI libraries).
I'm in a similar situation. I'm toying with the idea of building a better FPL
implementation for high-performance numerics that is commerce friendly using
LLVM. If you take the path of least resistance then I believe that is
tractable but you lose nice things like LablGTK2...
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Caml-list] Performance questions, -inline, ...
2008-01-07 19:58 ` Jon Harrop
@ 2008-01-08 14:20 ` Kuba Ober
2008-01-12 14:22 ` Jon Harrop
0 siblings, 1 reply; 19+ messages in thread
From: Kuba Ober @ 2008-01-08 14:20 UTC (permalink / raw)
To: caml-list
On Monday 07 January 2008, Jon Harrop wrote:
> On Monday 07 January 2008 15:22:40 Kuba Ober wrote:
> > On Monday 07 January 2008, Jon Harrop wrote:
> > > You mean it might be possible to recover the performance of C from
> > > numerical code with high-level abstractions? Yes. Indeed, I would like
> > > to see this done. However, I've never heard of an implementation of any
> > > language that can do this.
> >
> > g++ does that reasonably well. Heck, I have a severely hacked in-house
> > LISP system which had grown out of an assembler-with-LISP-macros, and it
> > does it all the time. It's a LISP with static types at runtime, but
> > general LISPiness at compile time (macros just run in gcl). In fact I'm
> > looking to port it to OCaml just because type inference and commonplace
> > LISP syntax for type declarations don't mix too well - the code starts
> > looking really ugly. Maybe I stuck too much with LISP's usual
> > declaim-this-and-that approach, but that's what you get when you reuse
> > most of underlying LISP's implementation.
>
> Yes. If you're willing to do that kind of hacking then you could get a long
> way with camlp4 without too much trouble.
>
> > > I assume you didn't mimic run-time polymorphism, currying, closures and
> > > higher-order functions in the C though?
> >
> > No. But while those are very useful features to have, the code is messy
> > if you've got two languages to work with (hi-perf stuff in C, everything
> > else in OCaml).
>
> On 64-bit there is no performance benefit in dropping to C.
>
> > > Yes. Your "add1" function is entirely dead code and, therefore, is
> > > trivially reducible. This is not true of real programs and, therefore,
> > > your results will not be representative of real programs. Moreover,
> > > there is no sense in trying to optimize a program that does nothing
> > > anyway.
> >
> > If you code such a thing in gcc, it correctly becomes a no-op. OCaml
> > isn't all that clever ;)
> >
> > You're 100% academically-correct of course.
>
> As Xavier says, "the OCaml compiler is designed to compile good code".
>
> > > The simplest route to recovering C performance here is:
> > >
> > > . Inline "( +. )".
> > > . Inline "op1".
> > > . Type-specialize "op1".
> > > . Hoist bounds checks.
> >
> > You mean "the programmer has to do this"? OCaml compiler is *really* bad,
> > then.
>
> You're 100% academically-correct of course.
>
> In practice, the combination of OCaml's awesome native-code generation on
> AMD64 and the ease with which you can work around those missing
> optimizations mean that OCaml is *really* good compared to all other
> language
> implementations with comparable qualities. This is precisely why it is so
> popular.
>
> > > There are some trade-offs here though. Stalin-compiled Scheme,
> > > MLton-compiled SML and GHC-compiled Haskell do a lot more than OCaml in
> > > theory but, in practice, they are much less useful for high-performance
> > > numerics because their performance is so unpredictable.
> >
> > Good to know.
>
> OCaml's code generator is also much better than MLton's and GHC's (even
> 6.8).
>
> > > I've already spent a long time meticulously contructing benchmarks in
> > > various different languages along these lines and the most important
> > > optimizations missing from OCaml are:
> > >
> > > . Hoisting bounds checks.
> > > . Common subexpression elimination.
> > > . Type specialization.
> > > . More aggressive inlining.
> > > . Static optimization of div and mod.
> > >
> > > The nice thing about OCaml is that it has a superb 64-bit code
> > > generator so, once you've done those optimizations on the hot paths
> > > manually, OCaml lets you run code as quickly as C but with all of the
> > > high-level benefits everywhere else.
> > >
> > > A perfect language implementation would certainly do these
> > > optimizations for you and I wish OCaml did but I still haven't found
> > > anything better.
> >
> > After I gain some more experience I will hopefully add an OCaml-esque
> > front end to my hackish compiler. It's completely unportable, only works
> > on 12-bit-PIC-like architecture (SX from Parallax) and on Z8 Encore!, but
> > hey, maybe it can be a good starting point to something bigger and
> > better. The backend still has to be ported from Lisp, and I'm too time-
> > and
> > knowledge-constrained to do it just yet.
> > ...
>
> I see. If you're not using OCaml as a general purpose language then you
> don't need its other benefits (e.g. GUI libraries).
>
> I'm in a similar situation. I'm toying with the idea of building a better
> FPL implementation for high-performance numerics that is commerce friendly
> using LLVM. If you take the path of least resistance then I believe that is
> tractable but you lose nice things like LablGTK2...
If you can do some code generation, it shouldn't be a big deal to implement
even some very complex ABIs, say interfacing to C++ libraries. As long as you
can cleanly run your language at compile time (like lisp does), and as long
as the compiler provides a documented way to pop assembly into the code, you
can have a nice language that can, in "natively compiled" output, interface
say with Qt. IMHO, the latter is now a few years ahead of GTK, and is only
gaining the advantage as time passes. Languages such as OCaml sorely lack in
nontrivial ABI department - there's no reason to have to write (or even
generate) C-style binding code for say Qt just to use it in OCaml. Both
bytecode compiler and native code compiler should have a pluggable
ABI-generator scheme where they can interface with C++ (at least). Another
way to go about it would be to machine-translate C++ libraries to OCaml. This
should be doable for Qt, but I don't think I'm up for that one just yet, even
though I think about it.
If you want to generate code for a VM, the backend should allow selectable
VMs - at least CLR and JVM, if the thing is to be of any relevance.
The language should, IMHO, also be flexible enough that features could be
reasonably added to have it support resource-constrained architectures.
I'd expect the language to have a core feature set which doesn't do automated
memory management beyond what can be inferred for fixed-lifetime data. This
would be akin to working with good old C++, and should be easily portable to
tiny targets. Then you have "level 2" feature set which is supposed to use
garbage collecting virtual machines. I'd even add something like "level 1.5"
which uses say C++ ABI but also links-in a garbage collector. So that you get
automated memory management but no VM per se. Heck, for all I care you could
have that thing emit code in C++ ABI by default - it's a reasonable-enough
choice.
I find it annoying that writing good "PC" code requires a wholly different
toolset from writing embedded code, the latter usually limited to really
broken vendor C (or sometimes C++).
Cheers, Kuba
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Caml-list] Performance questions, -inline, ...
2008-01-08 14:20 ` Kuba Ober
@ 2008-01-12 14:22 ` Jon Harrop
2008-01-12 16:18 ` Dario Teixeira
0 siblings, 1 reply; 19+ messages in thread
From: Jon Harrop @ 2008-01-12 14:22 UTC (permalink / raw)
To: caml-list
On Tuesday 08 January 2008 14:20:03 Kuba Ober wrote:
> If you can do some code generation, it shouldn't be a big deal to implement
> even some very complex ABIs, say interfacing to C++ libraries. As long as
> you can cleanly run your language at compile time (like lisp does), and as
> long as the compiler provides a documented way to pop assembly into the
> code, you can have a nice language that can, in "natively compiled" output,
> interface say with Qt.
Yes. This is one of the features that I would dearly love but it is also
another one of the features that doesn't count as research so you're never
likely to find it in a research implementation of a language like OCaml. F#
has a fantastic FFI in comparison, for example.
Perhaps the best solution for OCaml is to interface with a popular dynamic
language and borrow its bindings. I believe Richard Jones' Perl interface
does this, although I've never used it myself.
The obvious flaw is that you end up with a very fast compiled language with a
very slow interface boundary. That's fine for many cases, of course, but I'm
particularly interested in high-performance numerics and visualization which
really benefit from a high-performance FFI.
> IMHO, the latter is now a few years ahead of GTK, and is only gaining the
> advantage as time passes.
May I ask what features Qt has that GTK does not?
My only gripe with GTK is that it is very slow and, consequently, I always
seem to end up migrating my GUI code to OpenGL. I still think an OpenGL-based
GUI written in OCaml would be great...
> Languages such as OCaml
> sorely lack in nontrivial ABI department - there's no reason to have to
> write (or even generate) C-style binding code for say Qt just to use it in
> OCaml. Both bytecode compiler and native code compiler should have a
> pluggable ABI-generator scheme where they can interface with C++ (at
> least). Another way to go about it would be to machine-translate C++
> libraries to OCaml.
I'd be happy to interface with C and have no real preference about C++ myself.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Caml-list] Performance questions, -inline, ...
2008-01-12 14:22 ` Jon Harrop
@ 2008-01-12 16:18 ` Dario Teixeira
2008-01-12 23:50 ` Jon Harrop
0 siblings, 1 reply; 19+ messages in thread
From: Dario Teixeira @ 2008-01-12 16:18 UTC (permalink / raw)
To: Jon Harrop, caml-list
Hi,
> > IMHO, the latter is now a few years ahead of GTK, and is only gaining the
> > advantage as time passes.
>
> May I ask what features Qt has that GTK does not?
Though some would argue this is a matter of taste, Qt feels like a
much more elegant API. And yes, feature-wise is also a far more
comprehensive library. It includes modules not only for the expected
GUI widgets, but also for database connectivity, XML processing,
network programming, easy integration with openGL, generation and
visualisation of SVG and PDF, etc, etc. Moreover, the various modules
are well integrated and go well together. To achieve the same degree
of functionality in Gtk-land, you need to mix in several independent
libraries (Gtk+Cairo+...), which not always feel like part of a coherent
whole.
You could of course argue that in the Ocaml world we have better solutions
for some of the modules present in Qt. Ocamlnet is top-notch, and the
facilities for XML processing (such as Cduce and allies) are so good you
probably will find the similarly-purposed Qt modules unnecessary.
Nevertheless, just the graphics facilities present in Qt would more
than justify Ocaml bindings.
Incidentally, the Haskell folks are working on bindings:
http://qthaskell.sourceforge.net/
Does Haskell's FFI make this an easier task than Ocaml's?
Cheers,
Dario Teixeira
___________________________________________________________
Support the World Aids Awareness campaign this month with Yahoo! For Good http://uk.promotions.yahoo.com/forgood/
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Caml-list] Performance questions, -inline, ...
2008-01-12 16:18 ` Dario Teixeira
@ 2008-01-12 23:50 ` Jon Harrop
0 siblings, 0 replies; 19+ messages in thread
From: Jon Harrop @ 2008-01-12 23:50 UTC (permalink / raw)
To: caml-list
On Saturday 12 January 2008 16:18:46 Dario Teixeira wrote:
> > > IMHO, the latter is now a few years ahead of GTK, and is only gaining
> > > the advantage as time passes.
> >
> > May I ask what features Qt has that GTK does not?
>
> Though some would argue this is a matter of taste, Qt feels like a
> much more elegant API. And yes, feature-wise is also a far more
> comprehensive library. It includes modules not only for the expected
> GUI widgets, but also for database connectivity, XML processing,
> network programming, easy integration with openGL, generation and
> visualisation of SVG and PDF, etc, etc. Moreover, the various modules
> are well integrated and go well together. To achieve the same degree
> of functionality in Gtk-land, you need to mix in several independent
> libraries (Gtk+Cairo+...), which not always feel like part of a coherent
> whole.
I see. That's very interesting.
> You could of course argue that in the Ocaml world we have better solutions
> for some of the modules present in Qt. Ocamlnet is top-notch, and the
> facilities for XML processing (such as Cduce and allies) are so good you
> probably will find the similarly-purposed Qt modules unnecessary.
> Nevertheless, just the graphics facilities present in Qt would more
> than justify Ocaml bindings.
I'm surprised to hear that. The last time I looked (a long time ago) I thought
OCaml's support for such things (and unicode) was not that good but we do
seem to be hearing about web-related successes with OCaml all the time and
they must require working implementations of these kinds of libraries. Gerd
always seems to be involved. ;-)
> Incidentally, the Haskell folks are working on bindings:
> http://qthaskell.sourceforge.net/
Yes. I noticed those Qt bindings for GHC being advertised recently but it
turns out they are pre-alpha release have no applications using them and no
Debian or Ubuntu packages providing them.
> Does Haskell's FFI make this an easier task than Ocaml's?
Haskell certainly has far more bindings to libraries than OCaml but it also
has far fewer working applications using those bindings, which makes me
suspicious. ;-)
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2008-01-12 23:57 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-01-03 16:28 Performance questions, -inline, Kuba Ober
2008-01-03 17:11 ` [Caml-list] " Edgar Friendly
2008-01-05 18:09 ` Kuba Ober
2008-01-05 18:44 ` Kuba Ober
2008-01-05 19:36 ` Jon Harrop
2008-01-05 20:31 ` Bünzli Daniel
2008-01-07 13:48 ` Kuba Ober
2008-01-07 14:41 ` Jon Harrop
2008-01-07 15:22 ` Kuba Ober
2008-01-07 19:58 ` Jon Harrop
2008-01-08 14:20 ` Kuba Ober
2008-01-12 14:22 ` Jon Harrop
2008-01-12 16:18 ` Dario Teixeira
2008-01-12 23:50 ` Jon Harrop
2008-01-07 15:31 ` Christophe Raffalli
2008-01-07 17:00 ` Jacques Carette
2008-01-07 17:07 ` Till Varoquaux
2008-01-07 17:20 ` Jacques Carette
2008-01-07 17:31 ` Kuba Ober
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox