* HLVM
@ 2009-09-26 17:21 David McClain
From: David McClain @ 2009-09-26 17:21 UTC (permalink / raw)
To: caml-list
Yes, I saw those references already. Still not enough information...
What, in particular, sets HLVM apart? Surely not just the native
machine types?
Are you handling array references in some unusually efficient manner?
Are you avoiding unnecessary copy-on-writes of large arrays by some
form of whole-program analysis? I still don't have a handle on HLVM...
>
> http://www.ffconsultancy.com/ocaml/hlvm/
> http://flyingfrogblog.blogspot.com/2009/03/performance-ocaml-vs-hlvm-beta-04.html
Dr. David McClain
dbm@refined-audiometrics.com
* Re: [Caml-list] HLVM
@ 2009-09-26 21:41 ` Jon Harrop
From: Jon Harrop @ 2009-09-26 21:41 UTC (permalink / raw)
To: caml-list
On Saturday 26 September 2009 18:21:21 David McClain wrote:
> What, in particular, sets HLVM apart? Surely not just the native
> machine types?
JIT compilation opens up several hugely productive optimizations:
1. Polymorphism no longer persists to run time, so there is no need for a
uniform representation. Whereas OCaml tags integers and boxes floats and
tuples, with a couple of hacks for special cases like all-float records and
float arrays, HLVM never boxes any of them (including 32-bit ints, 64-bit
floats and complex numbers). Moreover, this massively reduces the stress on
the garbage collector. For example, although OCaml has a heavily optimized
GC, it is already 50% slower than the current HLVM at computing a complex FFT
where the complex numbers are represented by the type float * float, because
OCaml allocates every intermediate tuple on the minor heap.
2. The garbage collector is partially specialized with respect to user-defined
types. Whereas OCaml's GC blindly traverses the heap, testing almost every
word to see whether or not it is a pointer, HLVM's GC JIT-compiles traversal
code for every type it sees, and that code jumps directly to the pointers in a
value.
3. The code generator can target the exact machine being used so things like
SSE are trivial to implement and use. For example, the integer-based
Monte-Carlo test from the SciMark2 benchmark is up to 6x faster with LLVM
than with OCaml.
4. You get a native code REPL for free.
5. FFI stubs are JIT compiled for you: no more hand-written C stubs!
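Point 1 can be sketched in OCaml (an illustrative fragment, not HLVM code): with complex numbers represented as float * float, every intermediate result allocates a fresh boxed tuple on the minor heap, which is exactly the traffic an unboxed representation avoids.

```ocaml
(* Illustrative sketch: with complex numbers as float pairs, every
   intermediate result allocates a short-lived tuple on OCaml's
   minor heap.  HLVM would keep these values unboxed. *)
type complex = float * float

let cmul (a, b) (c, d) : complex =
  (a *. c -. b *. d, a *. d +. b *. c)   (* allocates one tuple *)

let csquare z = cmul z z                 (* another short-lived tuple *)
```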
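For point 2, a rough sketch of what type-specialized traversal means (type and names invented here; HLVM generates the equivalent at the LLVM level, not in OCaml): for a record type with a single pointer field, the traversal visits only that field instead of testing every heap word.

```ocaml
(* Hypothetical sketch: for the type below, a type-specialized
   collector need only follow the [next] field of each node; the
   float field is known not to be a pointer and is never inspected. *)
type node = { x : float; next : node option }

let rec visit mark n =
  mark n;                        (* act on this node *)
  match n.next with
  | Some m -> visit mark m       (* jump directly to the only pointer *)
  | None -> ()
```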
On top of that, LLVM handles aggregate values very efficiently so HLVM lets
you use them for tuples. That gives you value types (a feature that OCaml
lacks, which can dramatically improve performance and interoperability) that
work with tail calls (structs break tail calls on the CLR).
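A small OCaml sketch of the tail-call point: a loop whose accumulator is a pair. In OCaml the pair is a boxed heap value; an unboxed struct representation would pass it in registers, and on the CLR returning such a struct would break the tail call.

```ocaml
(* Sketch: a tail-recursive loop carrying a pair accumulator.  With
   value-type tuples this runs allocation-free; OCaml boxes the pair
   on each iteration. *)
let rec sum_count xs (sum, n) =
  match xs with
  | [] -> (sum, n)
  | x :: rest -> sum_count rest (sum +. x, n + 1)   (* tail call *)
```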
> Are you handling array references in some unusually efficient manner?
Not really. The main oddity is that references in HLVM are currently 3-word
structs: one word pointing to the run-time type, one word of metadata (e.g.
array length) and one word pointing to the real data (or NULL). That
has three main advantages:
1. You can pass arrays of value types to and from C easily and efficiently.
2. You have a typed "null" pointer. So empty options, lists and arrays are
represented efficiently as an unboxed value (unlike F# where empty lists and
arrays are inefficient heap-allocated values) yet they can still be pretty
printed correctly (which OCaml cannot even handle and F# gets wrong on
optimized types like options which use the CLR's typeless null).
3. You can get to the run-time type directly with a single indirection, rather
than loading the value and then the run-time type from its header (two
indirections).
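The described layout can be mimicked as an OCaml record (field names invented; in HLVM this is a flat three-word struct, and the run-time-type word is a real pointer rather than an int):

```ocaml
(* Sketch of the described 3-word reference, with invented field names.
   [None] models the typed NULL used for empty options/lists/arrays. *)
type 'a hlvm_ref = {
  rtti : int;          (* stands in for the pointer to the run-time type *)
  meta : int;          (* metadata, e.g. array length *)
  data : 'a option;    (* pointer to the real data, or NULL *)
}

let is_empty r = r.data = None   (* typed "null" test, no heap access *)
```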
> Are you avoiding unnecessary copy-on-writes of large arrays by some
> form of whole-program analysis?
No. HLVM does not yet do any program analysis at all. It is currently only
trying to map ML-style code to assembler (via LLVM IR) differently, in order
to obtain a different performance profile: specifically, the performance
profile I would like. I am not a fan of non-trivial, larger-scale analysis and
optimization because it renders performance unpredictable. Predictable
performance is one of the (many) nice aspects of OCaml that I would like to
keep.
> I still don't have a handle on HLVM...
HLVM is an experiment to see what a next-generation implementation of a
language like OCaml might look like, addressing all of OCaml's major
shortcomings (interoperability, parallelism and some performance issues). My
preliminary results proved that there is indeed a lot to be gained by doing
this but more work is required to give HLVM all of the basic features.
--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
* HLVM
@ 2009-09-27 16:57 David McClain
From: David McClain @ 2009-09-27 16:57 UTC (permalink / raw)
To: caml-list
Wow! Thanks for that much more substantive feedback on HLVM.
I am not intimately familiar with LLVM. I am surprised that JIT can
offer speedups over statically compiled code.
And forgive me for asking what may seem a question with an obvious
answer... but don't you now also have to change the OCaml compiler
back end to target HLVM?
Dr. David McClain
dbm@refined-audiometrics.com
* HLVM...
@ 2009-09-27 18:06 David McClain
From: David McClain @ 2009-09-27 18:06 UTC (permalink / raw)
To: caml-list
Interesting quote from a Wikipedia article on JIT compiling...
"...For instance, most Common Lisp systems have a compile function
which can compile new functions created during the run. This provides
many of the advantages of JIT, but the programmer, rather than the
runtime, is in control of what parts of the code are compiled. This
can also compile dynamically generated code, which can, in many
scenarios, provide substantial performance advantages over statically
compiled code, as well as over most JIT systems."
Dr. David McClain
dbm@refined-audiometrics.com
* HLVM...
@ 2009-09-27 22:09 David McClain
From: David McClain @ 2009-09-27 22:09 UTC (permalink / raw)
To: caml-list
... remember too, in signal and image processing applications,
converting to raw machine integers and plowing ahead is often
counterproductive.
Rather, we need saturating arithmetic to avoid abrupt transitions on
overflow, or modulo addressing. Neither of these is native
to SSE, and both have to be synthesized. DSP chips, on the other hand,
almost always offer these variations implicitly or explicitly.
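For illustration, a saturating 16-bit add sketched in OCaml (DSPs provide this in hardware; on general-purpose SIMD it must be synthesized from compare-and-select operations):

```ocaml
(* Saturating signed 16-bit addition: the sum is clamped to the
   representable range instead of wrapping around on overflow. *)
let sat_add16 a b =
  let s = a + b in
  if s > 32767 then 32767
  else if s < -32768 then -32768
  else s
```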
Dr. David McClain
dbm@refined-audiometrics.com