* Odd performance result with HLVM
@ 2009-02-28 1:12 Jon Harrop
2009-02-28 20:18 ` [Caml-list] " Kuba Ober
0 siblings, 1 reply; 15+ messages in thread
From: Jon Harrop @ 2009-02-28 1:12 UTC (permalink / raw)
To: caml-list
I am developing a high-level virtual machine built upon LLVM and written in
OCaml that uses JIT compilation to execute OCaml-like code at break-neck
speeds.
I just stumbled upon a weird performance result: compiling my VM with ocamlc
and ocamlopt produces very different benchmark results for the performance of
the generated code (which should be identical). For example, my float-based
Fibonacci function becomes 70% slower if I use ocamlopt, and some other
float-based functions also see big performance drops.
What is ocamlopt doing that might explain this? I assume it is fiddling with
the floating point state somehow...
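The benchmark itself is not shown in the thread; a plain OCaml float-based Fibonacci of the kind usually used for this test might look like the following (a guess at the benchmark's shape, not Jon's actual code):

```ocaml
(* Hypothetical float-based Fibonacci benchmark; everything is a float, so
   the compiler's float handling dominates the run time. *)
let rec fib (n : float) : float =
  if n < 2.0 then n
  else fib (n -. 1.0) +. fib (n -. 2.0)

let () = Printf.printf "fib 30. = %.0f\n" (fib 30.0)  (* prints "fib 30. = 832040" *)
```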
--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Caml-list] Odd performance result with HLVM
2009-02-28 1:12 Odd performance result with HLVM Jon Harrop
@ 2009-02-28 20:18 ` Kuba Ober
2009-02-28 21:21 ` Richard Jones
2009-02-28 21:52 ` Jon Harrop
0 siblings, 2 replies; 15+ messages in thread
From: Kuba Ober @ 2009-02-28 20:18 UTC (permalink / raw)
To: caml-list
> I am developing a high-level virtual machine built upon LLVM and written
> in OCaml that uses JIT compilation to execute OCaml-like code at
> break-neck speeds.
>
> I just stumbled upon a weird performance result: compiling my VM with
> ocamlc and ocamlopt produces very different benchmark results for the
> performance of the generated code (which should be identical). For
> example, my float-based Fibonacci function becomes 70% slower if I use
> ocamlopt and some other float-based functions also see big performance
> drops.
>
> What is ocamlopt doing that might explain this? I assume it is fiddling
> with the floating point state somehow...
You didn't let us in on how it really works. You said "high-level virtual
machine built upon LLVM and written in OCaml". LLVM means too many things to
be able to decipher what you mean, and your statement is too general.
I'm assuming, but that's forced, so don't shoot if I make an asinus out of
myself ;)
So, it's a VM and it runs native jit-ted code like, say, .NET would. So
presumably you have some OCaml code that then invokes the jit and some native
functions to dispatch to jit-ted code? Do you interpret any bytecode, or
always compile? Do you even have to have any OCaml code running in the
process where the jit-ted code runs?
I presume you use the LLVM infrastructure to do the jit-ting, but without
knowing exactly what code runs in the process space of the application, it's
hard to tell what's going on.
There's no floating point state to change that would slow things down that
much. At least I have never seen anything like that. Maybe the FP exceptions
are being fired somehow? You can always set a breakpoint in the exception
handler(s).
Cheers, Kuba
* Re: [Caml-list] Odd performance result with HLVM
2009-02-28 20:18 ` [Caml-list] " Kuba Ober
@ 2009-02-28 21:21 ` Richard Jones
2009-02-28 21:59 ` Jon Harrop
2009-02-28 21:52 ` Jon Harrop
1 sibling, 1 reply; 15+ messages in thread
From: Richard Jones @ 2009-02-28 21:21 UTC (permalink / raw)
To: Kuba Ober; +Cc: caml-list
On Sat, Feb 28, 2009 at 03:18:40PM -0500, Kuba Ober wrote:
> You didn't let us in on how it really works. You said "high-level virtual
> machine built upon LLVM and written in OCaml". LLVM means too many things
> to be able to decipher what you mean, and your statement is too general.
I'm still waiting for the open source project ...
Rich.
--
Richard Jones
Red Hat
* Re: [Caml-list] Odd performance result with HLVM
2009-02-28 20:18 ` [Caml-list] " Kuba Ober
2009-02-28 21:21 ` Richard Jones
@ 2009-02-28 21:52 ` Jon Harrop
2009-03-02 3:20 ` Jon Harrop
2009-03-02 14:28 ` Florent Ouchet
1 sibling, 2 replies; 15+ messages in thread
From: Jon Harrop @ 2009-02-28 21:52 UTC (permalink / raw)
To: caml-list
On Saturday 28 February 2009 20:18:40 Kuba Ober wrote:
> You didn't let us in on how it really works. You said "high-level virtual
> machine built upon LLVM and written in OCaml". LLVM means too many things
> to be able to decipher what you mean, and your statement is too general.
I'm referring to my HLVM project that I described here:
http://flyingfrogblog.blogspot.com/2008/12/building-better-future-high-level.html
That is built upon the Low-Level Virtual Machine here:
http://llvm.org
Basically, my idea is to build a better VM for OCaml-like languages. My work
is still at a very early stage of development but the results are already
promising: faster than ocamlopt on all of my benchmarks and several times
faster on some numerical benchmarks.
There are really two major advantages over the current ocamlopt design and
both stem from the use of JIT compilation:
. Run-time types allow per-type functions like generic pretty printers and
comparison.
. Monomorphisation during JIT compilation completely removes the performance
cost of polymorphism, e.g. floats, tuples and records are never boxed.
Algebraic datatypes were the latest addition and I expected them to be very
slow because I am calling thread-safe malloc every time one is created.
However, a simple benchmark that creates and folds over a huge list actually
runs faster on my HLVM than it does in OCaml.
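The list benchmark isn't shown in the thread either; a minimal create-and-fold sketch in plain OCaml (hypothetical, just to indicate the shape of such a test) could be:

```ocaml
(* Build a large float list tail-recursively, then fold over it.
   Both make and List.fold_left are tail recursive, so no stack overflow. *)
let rec make n acc =
  if n = 0 then acc else make (n - 1) (float_of_int n :: acc)

let () =
  let xs = make 1_000_000 [] in
  Printf.printf "sum = %.0f\n" (List.fold_left ( +. ) 0.0 xs)
  (* prints "sum = 500000500000" *)
```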
> I'm assuming, but that's forced, so don't shoot if I make an asinus out
> of myself ;)
>
> So, it's a VM and it runs native jit-ted code like say .net would. So
> presumably you have some OCaml code that then invokes jit and some native
> functions to dispatch to jit-ted code?
Yes.
> Do you interpret any bytecode, or always compile?
Always compile.
> Do you even have to have any OCaml code running in the process where the
> jit-ted code runs?
I believe the OCaml bindings to LLVM JIT compile a function, get a pointer to
it and call into that function from the current thread. So there is no
process separation.
> I presume you use the LLVM infrastructure to do the jit-ting, but without
> knowing exactly what code runs in the process space of the application,
> it's hard to tell what's going on.
You just gave me another idea: use LLVM to compile the same IR code to a
standalone executable and benchmark that. I'll get back to you...
> There's no floating point state to change that would slow things up that
> much. At least I have never seen anything like that.
That's what I thought.
> Maybe the FP exceptions are being fired somehow?
Maybe something like that. I have no idea.
This definitely only affects float performance so it is not a difference in
the efficiency of calling into JIT compiled code from OCaml bytecode or
native code.
--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
* Re: [Caml-list] Odd performance result with HLVM
2009-02-28 21:21 ` Richard Jones
@ 2009-02-28 21:59 ` Jon Harrop
2009-03-02 0:38 ` Jon Harrop
0 siblings, 1 reply; 15+ messages in thread
From: Jon Harrop @ 2009-02-28 21:59 UTC (permalink / raw)
To: caml-list
On Saturday 28 February 2009 21:21:18 Richard Jones wrote:
> On Sat, Feb 28, 2009 at 03:18:40PM -0500, Kuba Ober wrote:
> > You didn't let us in on how it really works. You said "high-level
> > virtual machine built upon LLVM and written in OCaml". LLVM means too
> > many things to be able to decipher what you mean, and your statement is
> > too general.
>
> I'm still waiting for the open source project ...
I'm working on it. I now have:
. unit, bool, int, float.
. tuples.
. arrays.
. algebraic datatypes.
. function pointers.
. tail calls.
. generic printing.
. catching stack overflows.
. C FFI.
. JIT compilation to high performance native code.
I need to implement closures and a GC before I release anything. Then I'll add
parametric polymorphism, exceptions and a front end.
--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
* Re: [Caml-list] Odd performance result with HLVM
2009-02-28 21:59 ` Jon Harrop
@ 2009-03-02 0:38 ` Jon Harrop
0 siblings, 0 replies; 15+ messages in thread
From: Jon Harrop @ 2009-03-02 0:38 UTC (permalink / raw)
To: caml-list
On Saturday 28 February 2009 21:59:29 Jon Harrop wrote:
> On Saturday 28 February 2009 21:21:18 Richard Jones wrote:
> > On Sat, Feb 28, 2009 at 03:18:40PM -0500, Kuba Ober wrote:
> > > You didn't let us in on how it really works. You said "high-level
> > > virtual machine built upon LLVM and written in OCaml". LLVM means too
> > > many things to be able to decipher what you mean, and your statement
> > > is too general.
> >
> > I'm still waiting for the open source project ...
>
> I'm working on it. I now have:
>
> . unit, bool, int, float.
> . tuples.
> . arrays.
> . algebraic datatypes.
> . function pointers.
> . tail calls.
> . generic printing.
> . catching stack overflows.
> . C FFI.
> . JIT compilation to high performance native code.
>
> I need to implement closures...
Now that I come to try it, I don't think I do need to implement closures.
Check out the following sample:
let curry : t list =
  let ty_closure (ty1, ty2) =
    `Struct [`Function ([`Reference; ty1], ty2); `Reference] in
  let apply (f, x) =
    Apply (GetValue (f, 0), [GetValue (f, 1); x]) in
  let ty_ret = `Struct [`Int; `Float] in
  [ `Function ("f_uncurried", ["x", `Int; "y", `Float], ty_ret,
               Struct [Var "x"; Var "y"]);
    `Type ("env_int", ["Int", `Int]);
    `Function ("f_apply_2", ["env", `Reference; "y", `Float], ty_ret,
               Apply (Var "f_uncurried", [Cast (Var "env", "Int"); Var "y"]));
    `Function ("f_apply_1", ["x", `Int], ty_closure (`Float, ty_ret),
               Struct [Var "f_apply_2"; Construct ("Int", Var "x")]);
    `Expr (Let ("g", Apply (Var "f_apply_1", [Int 3]),
                Struct [apply (Var "g", Float 2.3);
                        apply (Var "g", Float 3.4)]),
           `Struct [ty_ret; ty_ret]) ]
Unions are represented by the type "`Reference" and constructed
with "Construct". Their type can be tested with "IsType" and their argument
can be extracted with "Cast".
The above code is an intermediate representation of the curried OCaml
function:
let f (x: int) (y: float) = (x, y)
and the example application of a closure created by partially applying the
first argument "x":
let g = f 3 in (g 2.3, g 3.4)
The raw representation is an uncurried function "f_uncurried". That should
make recursive closures a lot faster than they are in OCaml.
The type constructor "Int" is used to represent a closure environment that
conveys an int. Such types can be generated on-the-fly whenever the compiler
sees code that would generate a closure that required an environment of this
type.
The "f_apply_2" function extracts "x" from the environment and applies "x"
and "y" to "f".
The "f_apply_1" function partially applies "x" to "f", creating a closure
represented by a struct containing a pointer to "f_apply_2" and the boxed
environment containing "x".
The final expression partially applies "f" to "x=3" and then applies the
resulting closure twice.
Running this program in HLVM produces:
Writing bitcode
Evaluating
- : <type> = ((3, 2.3), (3, 3.4))
Took 0.027330s
Note that the structs are pretty printed as tuples using generic printers. You
can call the internal Print function on any value of any type and it will be
pretty printed.
A front-end for a functional language can, therefore, already handle
first-class functions with the current HLVM by representing all functional
values as a struct containing a function pointer and a boxed environment.
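That representation can be sketched in plain OCaml by doing the closure conversion by hand for the running example `let f x y = (x, y)`; the record and constructor names below are illustrative, not HLVM's:

```ocaml
(* A closure is a struct of a code pointer and a boxed environment. *)
type env = Int_env of int                       (* env type generated on the fly *)
type ('a, 'b) closure = { code : env -> 'a -> 'b; env : env }

let apply c x = c.code c.env x                  (* generic closure application *)

let f_uncurried x y = (x, y)                    (* the raw uncurried function *)

let f_apply_2 (Int_env x) y = f_uncurried x y   (* unpack env, finish the call *)
let f_apply_1 x = { code = f_apply_2; env = Int_env x }  (* partial application *)

let () =
  let g = f_apply_1 3 in                        (* g = f 3 *)
  let (a, b) = apply g 2.3 and (c, d) = apply g 3.4 in
  Printf.printf "((%d, %g), (%d, %g))\n" a b c d  (* ((3, 2.3), (3, 3.4)) *)
```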
Moreover, HLVM could include an optimization pass to remove redundant closure
constructions and deconstructions. The nice aspect of this is that it reduces
complexity and makes it much easier to write a GC: only algebraic datatypes
are non-trivial.
So I'll probably just reimplement my GC and then publish HLVM as open source
software on the OCaml Forge.
--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
* Re: [Caml-list] Odd performance result with HLVM
2009-02-28 21:52 ` Jon Harrop
@ 2009-03-02 3:20 ` Jon Harrop
2009-03-02 14:28 ` Florent Ouchet
1 sibling, 0 replies; 15+ messages in thread
From: Jon Harrop @ 2009-03-02 3:20 UTC (permalink / raw)
To: caml-list
On Saturday 28 February 2009 21:52:18 Jon Harrop wrote:
> However, a simple benchmark that creates and folds over a huge list
> actually runs faster on my HLVM than it does in OCaml.
I just finished implementing a list-based solution to the n-queens problem and
that does run 5-14x faster in OCaml than on my HLVM:
n=8
OCaml: 0.012s
HLVM: 0.190s
n=9
OCaml: 0.088s
HLVM: 0.591s
n=10
OCaml: 0.740s
HLVM: 3.713s
n=11
OCaml: 6.68s
HLVM: Out of memory
HLVM ran out of memory in the last case because I have not yet reintegrated
the GC.
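For reference, a standard list-based n-queens counter in plain OCaml (not necessarily the code benchmarked above) looks like this; it allocates a fresh list per placement, which is exactly the allocation pattern that stresses the missing GC:

```ocaml
(* Is a queen in column x safe against queens qs placed in the rows above?
   d is the row distance to the head of qs. *)
let rec safe x d = function
  | [] -> true
  | q :: qs -> x <> q && x <> q + d && x <> q - d && safe x (d + 1) qs

(* Count solutions by placing one queen per row, carrying the columns of
   already-placed queens as a list. *)
let queens n =
  let rec count qs row =
    if row = n then 1
    else begin
      let total = ref 0 in
      for col = 0 to n - 1 do
        if safe col 1 qs then total := !total + count (col :: qs) (row + 1)
      done;
      !total
    end
  in
  count [] 0

let () = Printf.printf "queens 8 = %d\n" (queens 8)  (* prints "queens 8 = 92" *)
```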
--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
* Re: [Caml-list] Odd performance result with HLVM
2009-02-28 21:52 ` Jon Harrop
2009-03-02 3:20 ` Jon Harrop
@ 2009-03-02 14:28 ` Florent Ouchet
2009-03-02 16:18 ` Jon Harrop
2009-03-02 20:09 ` Kuba Ober
1 sibling, 2 replies; 15+ messages in thread
From: Florent Ouchet @ 2009-03-02 14:28 UTC (permalink / raw)
To: caml-list
Jon Harrop wrote:
> There are really two major advantages over the current ocamlopt design and
> both stem from the use of JIT compilation:
>
> . Run-time types allow per-type functions like generic pretty printers and
> comparison.
>
> . Monomorphisation during JIT compilation completely removes the performance
> cost of polymorphism, e.g. floats, tuples and records are never boxed.
Do you mean that each polymorphic function is compiled into a different
native piece of code each time it is called with different parameter
types? How does the JIT'ed code size compare to ocamlopt'ed code size?
- Florent
* Re: [Caml-list] Odd performance result with HLVM
2009-03-02 14:28 ` Florent Ouchet
@ 2009-03-02 16:18 ` Jon Harrop
2009-03-02 20:09 ` Kuba Ober
1 sibling, 0 replies; 15+ messages in thread
From: Jon Harrop @ 2009-03-02 16:18 UTC (permalink / raw)
To: caml-list
On Monday 02 March 2009 14:28:23 Florent Ouchet wrote:
> Jon Harrop wrote:
> > There are really two major advantages over the current ocamlopt design
> > and both stem from the use of JIT compilation:
> >
> > . Run-time types allow per-type functions like generic pretty printers
> > and comparison.
> >
> > . Monomorphisation during JIT compilation completely removes the
> > performance cost of polymorphism, e.g. floats, tuples and records are
> > never boxed.
>
> Do you mean that each polymorphic function is compiled into a different
> native piece of code each time it is called with different parameter
> types?
Yes.
> How does the JIT'ed code size compare to ocamlopt'ed code size?
No idea. Without a front end I have only compiled the smallest pieces of test
code so far, just to make sure that the functionality works.
.NET does the same thing and it offers substantial performance improvements
over OCaml for polymorphic code.
Note that there is no reason to distinguish between reference types for they
can all be treated equivalently with respect to instantiating polymorphic
code. My type system is as follows:
type t =
[ `Unit
| `Bool
| `Int
| `Float
| `Struct of t list
| `Array of t
| `Function of t list * t
| `Reference ]
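As a small illustration of the per-type generic functions that run-time types enable, here is a hypothetical pretty-printer written over that variant type (repeated so the sketch is self-contained; the helper name `string_of_ty` is mine, not HLVM's):

```ocaml
type t =
  [ `Unit | `Bool | `Int | `Float
  | `Struct of t list
  | `Array of t
  | `Function of t list * t
  | `Reference ]

(* Walk the type representation and render it; any function defined by
   cases over t like this is automatically per-type. *)
let rec string_of_ty : t -> string = function
  | `Unit -> "unit"
  | `Bool -> "bool"
  | `Int -> "int"
  | `Float -> "float"
  | `Struct ts -> "(" ^ String.concat " * " (List.map string_of_ty ts) ^ ")"
  | `Array t -> string_of_ty t ^ " array"
  | `Function (args, ret) ->
      "(" ^ String.concat ", " (List.map string_of_ty args) ^ ") -> "
      ^ string_of_ty ret
  | `Reference -> "ref"

let () = print_endline (string_of_ty (`Function ([`Array `Float], `Float)))
(* prints "(float array) -> float" *)
```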
--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
* Re: [Caml-list] Odd performance result with HLVM
2009-03-02 14:28 ` Florent Ouchet
2009-03-02 16:18 ` Jon Harrop
@ 2009-03-02 20:09 ` Kuba Ober
2009-03-04 16:17 ` Mikkel Fahnøe Jørgensen
1 sibling, 1 reply; 15+ messages in thread
From: Kuba Ober @ 2009-03-02 20:09 UTC (permalink / raw)
To: caml-list
> Jon Harrop wrote:
>> There are really two major advantages over the current ocamlopt
>> design and both stem from the use of JIT compilation:
>>
>> . Run-time types allow per-type functions like generic pretty
>> printers and comparison.
>>
>> . Monomorphisation during JIT compilation completely removes the
>> performance cost of polymorphism, e.g. floats, tuples and records
>> are never boxed.
>
> Do you mean that each polymorphic function is compiled into a different
> native piece of code each time it is called with different parameter
> types? How does the JIT'ed code size compare to ocamlopt'ed code size?
Having done it, although not in a JIT but in your plain-old whole-project
compiler, for my use cases the code size actually shrinks. The functions
usually end up inlined and sometimes reduce to a few machine instructions.
Most of the runtime library is written using polymorphic functions. Case in
point: all sorts of string-processing functions which can take as arguments
either strings stored in RAM or stored in ROM, and those data types are very
much orthogonal on my platform. An invocation of a tail-recursive "strlen"
reduces to about as many bytes of code as it'd take to push the arguments on
the stack and call a non-polymorphic version of itself.
That's how I initially got a statically typed LISP to compile for "tiny"
8-bit microcontrollers without using all of the whopping 1 kB of RAM and
16 kB of program flash on a Z8F162x device.
Right now I'm hacking away to get rid of the last traces of LISPiness and to
get the project fully working in OCaml, using ML-like syntax for user code. I
like it much better than LISP's.
I have also found that by doing whole-project compilation with aggressive
constant propagation and compile-time execution of functions that depend only
on known constants, I could get rid of about 85% of the LISP macros in my
code. The other macros ended up being rewritten to just invoke
ct_eval : string -> function, which is a compile-time eval function. It's
just like LISP macros, but since in the ML family code isn't data, it was
easier to just generate strings and feed them into the compiler, rather than
expose all of the AST machinery to "userland".
Cheers, Kuba
* Re: [Caml-list] Odd performance result with HLVM
2009-03-02 20:09 ` Kuba Ober
@ 2009-03-04 16:17 ` Mikkel Fahnøe Jørgensen
2009-03-04 16:30 ` Kuba Ober
2009-03-04 19:05 ` Jon Harrop
0 siblings, 2 replies; 15+ messages in thread
From: Mikkel Fahnøe Jørgensen @ 2009-03-04 16:17 UTC (permalink / raw)
To: Kuba Ober; +Cc: caml-list
When looking at the benchmark game and other benchmarks I have seen, I
noticed that Haskell is almost as fast as OCaml and sometimes faster.
Some Lisp implementations are also pretty fast.
However, when you look at memory consumption OCaml uses considerably
less memory, except for languages in the C family.
I suspect that many real-world performance scenarios, such as heavily loaded
web servers and complex simulations, depend very much on memory consumption.
This is both because of GC overhead and because the memory pipeline gets
slower the more cache levels are involved.
So in case of a new JIT solution for OCaml, I believe it is important
to observe this aspect as well.
Mikkel
* Re: [Caml-list] Odd performance result with HLVM
2009-03-04 16:17 ` Mikkel Fahnøe Jørgensen
@ 2009-03-04 16:30 ` Kuba Ober
2009-03-04 18:15 ` Mikkel Fahnøe Jørgensen
2009-03-04 19:05 ` Jon Harrop
1 sibling, 1 reply; 15+ messages in thread
From: Kuba Ober @ 2009-03-04 16:30 UTC (permalink / raw)
To: caml-list
On Mar 4, 2009, at 11:17 AM, Mikkel Fahnøe Jørgensen wrote:
> When looking at the benchmark game and other benchmarks I have seen, I
> noticed that Haskell is almost as fast as OCaml and sometimes faster.
> Some Lisp implementations are also pretty fast.
>
> However, when you look at memory consumption OCaml uses considerably
> less memory, except for languages in the C family.
>
> I suspect that many real world performance scenarios, such as heavily
> loaded web servers and complex simulations, depend very much on memory
> consumption. This is both because of GC overhead and because of the
> slower memory pipeline the more cache levels are involved.
>
> So in case of a new JIT solution for OCaml, I believe it is important
> to observe this aspect as well.
I believe it is also important not to dynamically allocate memory for no good
reason.
All of my realtime numerical code uses statically allocated memory with
overlaying based on the execution flow of basic blocks. That has zero runtime
overhead: the produced machine code has fixed addresses for data (not all
data, of course).
It reduces to whether a "basic block" can be re-entered from its future
(downstream) or not. If it can, you have to use the stack or heap. If it
can't, you can do static allocation. The potential cost is if a given
function is entered from many points. At that point you can get some
overhead, since the overlaying has to take into account all possible ways the
code is reached. This can be mitigated by generating more than one copy of
the function. It makes sense when you have some free code ROM but your RAM is
almost full.
This of course can only be done when you do whole-project compilation. If you
compile "modules" separately, you have to fall back on doing it in the
linker, where all you have is the function call graph and the available
granularity is much worse, at a bigger RAM overhead. The code ROM overhead is
then none, since the linker can hardly generate copies of functions; at the
point where you copy functions you may as well do other optimizations, so the
linker is way too late to do that efficiently.
There's no reason not to use these techniques in code that runs on "large"
platforms. It'd at least artificially boost some benchmark results ;)
Cheers, Kuba
* Re: [Caml-list] Odd performance result with HLVM
2009-03-04 16:30 ` Kuba Ober
@ 2009-03-04 18:15 ` Mikkel Fahnøe Jørgensen
2009-03-04 18:32 ` Jon Harrop
0 siblings, 1 reply; 15+ messages in thread
From: Mikkel Fahnøe Jørgensen @ 2009-03-04 18:15 UTC (permalink / raw)
To: caml-list
oh, and another thing:
I'd like the VM to be small enough to link into an executable like the
OCaml runtime so you don't have some version and dispatch nightmare.
Mikkel
* Re: [Caml-list] Odd performance result with HLVM
2009-03-04 18:15 ` Mikkel Fahnøe Jørgensen
@ 2009-03-04 18:32 ` Jon Harrop
0 siblings, 0 replies; 15+ messages in thread
From: Jon Harrop @ 2009-03-04 18:32 UTC (permalink / raw)
To: caml-list
On Wednesday 04 March 2009 18:15:50 Mikkel Fahnøe Jørgensen wrote:
> oh, and another thing:
>
> I'd like the VM to be small enough to link into an executable like the
> OCaml runtime so you don't have some version and dispatch nightmare.
You can generate executables, but it takes 10 seconds to compile HLVM to
OCaml bytecode and the result is a 60 MB file, so I doubt it will
satisfy "small".
--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
* Re: [Caml-list] Odd performance result with HLVM
2009-03-04 16:17 ` Mikkel Fahnøe Jørgensen
2009-03-04 16:30 ` Kuba Ober
@ 2009-03-04 19:05 ` Jon Harrop
1 sibling, 0 replies; 15+ messages in thread
From: Jon Harrop @ 2009-03-04 19:05 UTC (permalink / raw)
To: caml-list
On Wednesday 04 March 2009 16:17:55 Mikkel Fahnøe Jørgensen wrote:
> When looking at the benchmark game and other benchmarks I have seen, I
> noticed that Haskell is almost as fast as OCaml and sometimes faster.
> Some Lisp implementations are also pretty fast.
From my ray tracer language comparison, my OCaml is ~50% faster than the
Haskell written by Lennart Augustsson and the Lisp written by Juho Snellman,
both of whom have extensive experience writing optimizing compilers for those
languages, whereas I did not:
http://www.ffconsultancy.com/languages/ray_tracer/results.html
Moreover, I received dozens of implementations in Haskell and Lisp and these
were the only vaguely competitive ones: most programmers are not able to
write fast code in Haskell or Lisp primarily because their performance is so
wildly unpredictable.
The Burrows-Wheeler block sorting data compression algorithm implemented in
Haskell and discussed extensively for weeks in the context of performance is
a good example of this: they never got within 10,000x the performance of C.
There are many other examples where nobody was able to get within orders of
magnitude of the performance of common languages.
GHC does have rudimentary support for parallelism and that makes it much
easier to leverage 2-6 cores in Haskell compared to OCaml. However, that is
merely a deficiency in the current OCaml implementation and is something that
can be addressed. Moreover, the current Haskell implementation scales very
poorly and is easily maxed out even on today's 8 core computers. For example,
on a recent Mandelbrot benchmark from comp.lang.functional the Haskell is
faster for 1-6 cores but stops seeing improvements beyond that, whereas OCaml
with process forking continues to see improvements up to all 8 cores and is
faster overall on this machine as a consequence.
Although efficient concurrent garbage collectors are hard to write, parallel
ones like the one in GHC are comparatively easy to write and still very
useful.
> However, when you look at memory consumption OCaml uses considerably
> less memory, except for languages in the C family.
>
> I suspect that many real world performance scenarios, such as heavily
> loaded web servers and complex simulations, depend very much on memory
> consumption. This is both because of GC overhead and because of the
> slower memory pipeline the more cache levels are involved.
>
> So in case of a new JIT solution for OCaml, I believe it is important
> to observe this aspect as well.
OCaml's memory efficiency is certainly extremely good and it may be
theoretically possible to preserve that in a new implementation that supports
parallelism. That is absolutely not the goal of my work though: I only intend
to get the simplest possible parallel GC working because I am interested
primarily in high-performance numerics, string processing and visualization,
and not web servers.
However, I will endeavour to make the implementation as extensible as possible
so that other people can create drop-in replacements that provide this kind
of functionality. Improving upon my GC should be quite easy for anyone versed
in the subject. Interestingly, my GC is written entirely in my own
intermediate representation.
--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e