* Odd performance result with HLVM @ 2009-02-28 1:12 Jon Harrop 2009-02-28 20:18 ` [Caml-list] " Kuba Ober 0 siblings, 1 reply; 15+ messages in thread From: Jon Harrop @ 2009-02-28 1:12 UTC (permalink / raw) To: caml-list I am developing a high-level virtual machine built upon LLVM and written in OCaml that uses JIT compilation to execute OCaml-like code at break-neck speeds. I just stumbled upon a weird performance result: compiling my VM with ocamlc and ocamlopt produces very different benchmark results for the performance of the generated code (which should be identical). For example, my float-based Fibonacci function becomes 70% slower if I use ocamlopt and some other float-based functions also see big performance drops. What is ocamlopt doing that might explain this? I assume it is fiddling with the floating point state somehow... -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e ^ permalink raw reply [flat|nested] 15+ messages in thread
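[Editorial note: the float-based Fibonacci benchmark itself is not shown in the thread. A hypothetical stand-in of the kind described, useful for reproducing the ocamlc/ocamlopt comparison, might look like this — the function body and timing harness are assumptions, not Jon's actual code:]

```ocaml
(* Hypothetical float-based Fibonacci benchmark of the kind described
   above; the actual HLVM benchmark source is not given in the thread. *)
let rec fib x =
  if x < 2.0 then x else fib (x -. 2.0) +. fib (x -. 1.0)

let () =
  let t = Sys.time () in
  let r = fib 30.0 in
  Printf.printf "fib 30.0 = %.0f in %.3fs\n" r (Sys.time () -. t)
```

Compiling the same file once with `ocamlc` and once with `ocamlopt` and timing the result is enough to expose the kind of discrepancy reported here.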
* Re: [Caml-list] Odd performance result with HLVM 2009-02-28 1:12 Odd performance result with HLVM Jon Harrop @ 2009-02-28 20:18 ` Kuba Ober 2009-02-28 21:21 ` Richard Jones 2009-02-28 21:52 ` Jon Harrop 0 siblings, 2 replies; 15+ messages in thread From: Kuba Ober @ 2009-02-28 20:18 UTC (permalink / raw) To: caml-list

> I am developing a high-level virtual machine built upon LLVM and written in
> OCaml that uses JIT compilation to execute OCaml-like code at break-neck
> speeds.
>
> I just stumbled upon a weird performance result: compiling my VM with ocamlc
> and ocamlopt produces very different benchmark results for the performance
> of the generated code (which should be identical). For example, my
> float-based Fibonacci function becomes 70% slower if I use ocamlopt and
> some other float-based functions also see big performance drops.
>
> What is ocamlopt doing that might explain this? I assume it is fiddling
> with the floating point state somehow...

You didn't let us in on how it really works. You said "high-level virtual
machine built upon LLVM and written in OCaml". LLVM means too many things to
be able to decipher what you mean, and your statement is too general.

I'm assuming, but that's forced, so don't shoot if I make an asinus out of
myself ;) So, it's a VM and it runs native jit-ted code like say .net would.
So presumably you have some OCaml code that then invokes jit and some native
functions to dispatch to jit-ted code? Do you interpret any bytecode, or
always compile? Do you even have to have any OCaml code running in the
process where the jit-ted code runs?

I presume you use the LLVM infrastructure to do the jit-ting, but without
knowing exactly what code runs in the process space of the application, it's
hard to tell what's going on.

There's no floating point state to change that would slow things up that
much. At least I have never seen anything like that. Maybe the FP exceptions
are being fired somehow?
You can always set a breakpoint in exception handler(s).

Cheers, Kuba
* Re: [Caml-list] Odd performance result with HLVM 2009-02-28 20:18 ` [Caml-list] " Kuba Ober @ 2009-02-28 21:21 ` Richard Jones 2009-02-28 21:59 ` Jon Harrop 2009-02-28 21:52 ` Jon Harrop 1 sibling, 1 reply; 15+ messages in thread From: Richard Jones @ 2009-02-28 21:21 UTC (permalink / raw) To: Kuba Ober; +Cc: caml-list

On Sat, Feb 28, 2009 at 03:18:40PM -0500, Kuba Ober wrote:
> You didn't let us in on how it really works. You said "high-level virtual
> machine built upon LLVM and written in OCaml". LLVM means too many things
> to be able to decipher what you mean, and your statement is too general.

I'm still waiting for the open source project ...

Rich.

--
Richard Jones
Red Hat
* Re: [Caml-list] Odd performance result with HLVM 2009-02-28 21:21 ` Richard Jones @ 2009-02-28 21:59 ` Jon Harrop 2009-03-02 0:38 ` Jon Harrop 0 siblings, 1 reply; 15+ messages in thread From: Jon Harrop @ 2009-02-28 21:59 UTC (permalink / raw) To: caml-list

On Saturday 28 February 2009 21:21:18 Richard Jones wrote:
> On Sat, Feb 28, 2009 at 03:18:40PM -0500, Kuba Ober wrote:
> > You didn't let us in on how it really works. You said "high-level
> > virtual machine built upon LLVM and written in OCaml". LLVM means too
> > many things to be able to decipher what you mean, and your statement is
> > too general.
>
> I'm still waiting for the open source project ...

I'm working on it. I now have:

. unit, bool, int, float.
. tuples.
. arrays.
. algebraic datatypes.
. function pointers.
. tail calls.
. generic printing.
. catching stack overflows.
. C FFI.
. JIT compilation to high performance native code.

I need to implement closures and a GC before I release anything. Then I'll
add parametric polymorphism, exceptions and a front end.

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
* Re: [Caml-list] Odd performance result with HLVM 2009-02-28 21:59 ` Jon Harrop @ 2009-03-02 0:38 ` Jon Harrop 0 siblings, 0 replies; 15+ messages in thread From: Jon Harrop @ 2009-03-02 0:38 UTC (permalink / raw) To: caml-list

On Saturday 28 February 2009 21:59:29 Jon Harrop wrote:
> On Saturday 28 February 2009 21:21:18 Richard Jones wrote:
> > I'm still waiting for the open source project ...
>
> I'm working on it. I now have:
>
> . unit, bool, int, float.
> . tuples.
> . arrays.
> . algebraic datatypes.
> . function pointers.
> . tail calls.
> . generic printing.
> . catching stack overflows.
> . C FFI.
> . JIT compilation to high performance native code.
>
> I need to implement closures...

Now that I come to try it, I don't think I do need to implement closures.
Check out the following sample:

  let curry : t list =
    let ty_closure(ty1, ty2) =
      `Struct[`Function([`Reference; ty1], ty2); `Reference] in
    let apply(f, x) = Apply(GetValue(f, 0), [GetValue(f, 1); x]) in
    let ty_ret = `Struct[`Int; `Float] in
    [`Function("f_uncurried", ["x", `Int; "y", `Float], ty_ret,
               Struct[Var "x"; Var "y"]);
     `Type("env_int", ["Int", `Int]);
     `Function("f_apply_2", ["env", `Reference; "y", `Float], ty_ret,
               Apply(Var "f_uncurried", [Cast(Var "env", "Int"); Var "y"]));
     `Function("f_apply_1", ["x", `Int], ty_closure(`Float, ty_ret),
               Struct[Var "f_apply_2"; Construct("Int", Var "x")]);
     `Expr(Let("g", Apply(Var "f_apply_1", [Int 3]),
               Struct[apply(Var "g", Float 2.3); apply(Var "g", Float 3.4)]),
           `Struct[ty_ret; ty_ret])]

Unions are represented by the type "`Reference" and constructed with
"Construct". Their type can be tested with "IsType" and their argument can
be extracted with "Cast".

The above code is an intermediate representation of the curried OCaml
function:

  let f (x: int) (y: float) = (x, y)

and the example application of a closure created by partially applying the
first argument "x":

  let g = f 3 in
  (g 2.3, g 3.4)

The raw representation is an uncurried function "f_uncurried". That should
make recursive closures a lot faster than they are in OCaml.

The type constructor "Int" is used to represent a closure environment that
conveys an int. Such types can be generated on-the-fly whenever the compiler
sees code that would generate a closure that required an environment of this
type.

The "f_apply_2" function extracts "x" from the environment and applies "x"
and "y" to "f". The "f_apply_1" function partially applies "x" to "f",
creating a closure represented by a struct containing a pointer to
"f_apply_2" and the boxed environment containing "x". The final expression
partially applies "f" to "x=3" and then applies the resulting closure twice.

Running this program in HLVM produces:

  Writing bitcode
  Evaluating
  - : <type> = ((3, 2.3), (3, 3.4))
  Took 0.027330s

Note that the structs are pretty printed as tuples using generic printers.
You can call the internal Print function on any value of any type and it
will be pretty printed.

A front-end for a functional language can, therefore, already handle
first-class functions with the current HLVM by representing all functional
values as a struct containing a function pointer and a boxed environment.
Moreover, HLVM could include an optimization pass to remove redundant
closure constructions and deconstructions.

The nice aspect of this is that it reduces complexity and makes it much
easier to write a GC: only algebraic datatypes are non-trivial. So I'll
probably just reimplement my GC and then publish HLVM as open source
software on the OCaml Forge.

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
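[Editorial note: the closure representation that the IR sample above encodes can be sketched in ordinary OCaml. The names mirror the IR, but this plain-OCaml analogue is only an illustration of the technique, not HLVM code:]

```ocaml
(* Plain-OCaml sketch of the representation in the IR sample above:
   a partially applied function becomes a struct of a code pointer and
   a boxed environment. Names mirror the IR; illustrative only. *)
type env = Int of int                       (* "env_int" in the IR *)
type ('a, 'b) closure = { code : env -> 'a -> 'b; env : env }

let f_uncurried x y = (x, y)
let f_apply_2 (Int x) y = f_uncurried x y   (* unpack env, call f *)
let f_apply_1 x = { code = f_apply_2; env = Int x }
let apply c y = c.code c.env y

let () =
  let g = f_apply_1 3 in
  assert ((apply g 2.3, apply g 3.4) = ((3, 2.3), (3, 3.4)))
```

A front end that compiles every functional value to this shape never needs first-class closures in the VM itself, which is exactly the simplification described above.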
* Re: [Caml-list] Odd performance result with HLVM 2009-02-28 20:18 ` [Caml-list] " Kuba Ober 2009-02-28 21:21 ` Richard Jones @ 2009-02-28 21:52 ` Jon Harrop 2009-03-02 3:20 ` Jon Harrop 2009-03-02 14:28 ` Florent Ouchet 1 sibling, 2 replies; 15+ messages in thread From: Jon Harrop @ 2009-02-28 21:52 UTC (permalink / raw) To: caml-list

On Saturday 28 February 2009 20:18:40 Kuba Ober wrote:
> You didn't let us in on how it really works. You said "high-level virtual
> machine built upon LLVM and written in OCaml". LLVM means too many things
> to be able to decipher what you mean, and your statement is too general.

I'm referring to my HLVM project that I described here:

http://flyingfrogblog.blogspot.com/2008/12/building-better-future-high-level.html

That is built upon the Low-Level Virtual Machine here:

http://llvm.org

Basically, my idea is to build a better VM for OCaml-like languages. My work
is still at a very early stage of development but the results are already
promising: faster than ocamlopt on all of my benchmarks and several times
faster on some numerical benchmarks.

There are really two major advantages over the current ocamlopt design and
both stem from the use of JIT compilation:

. Run-time types allow per-type functions like generic pretty printers and
comparison.

. Monomorphisation during JIT compilation completely removes the performance
cost of polymorphism, e.g. floats, tuples and records are never boxed.

Algebraic datatypes were the latest addition and I expected them to be very
slow because I am calling thread-safe malloc every time one is created.
However, a simple benchmark that creates and folds over a huge list actually
runs faster on my HLVM than it does in OCaml.

> I'm assuming, but that's forced, so don't shoot if I make an asinus out of
> myself ;)
>
> So, it's a VM and it runs native jit-ted code like say .net would. So
> presumably you have some OCaml code that then invokes jit and some native
> functions to dispatch to jit-ted code?

Yes.

> Do you interpret any bytecode, or always compile?

Always compile.

> Do you even have to have any OCaml code running in the process where the
> jit-ted code runs?

I believe the OCaml bindings to LLVM JIT compile a function, get a pointer
to it and call into that function from the current thread. So there is no
process separation.

> I presume you use the LLVM infrastructure to do the jit-ting, but without
> knowing exactly what code runs in the process space of the application,
> it's hard to tell what's going on.

You just gave me another idea: use LLVM to compile the same IR code to a
standalone executable and benchmark that. I'll get back to you...

> There's no floating point state to change that would slow things up that
> much. At least I have never seen anything like that.

That's what I thought.

> Maybe the FP exceptions are being fired somehow?

Maybe something like that. I have no idea. This definitely only affects
float performance so it is not a difference in the efficiency of calling
into JIT compiled code from OCaml bytecode or native code.

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
* Re: [Caml-list] Odd performance result with HLVM 2009-02-28 21:52 ` Jon Harrop @ 2009-03-02 3:20 ` Jon Harrop 2009-03-02 14:28 ` Florent Ouchet 1 sibling, 0 replies; 15+ messages in thread From: Jon Harrop @ 2009-03-02 3:20 UTC (permalink / raw) To: caml-list

On Saturday 28 February 2009 21:52:18 Jon Harrop wrote:
> However, a simple benchmark that creates and folds over a huge list
> actually runs faster on my HLVM than it does in OCaml.

I just finished implementing a list-based solution to the n-queens problem
and that does run 5-14x faster in OCaml than on my HLVM:

n=8   OCaml: 0.012s   HLVM: 0.190s
n=9   OCaml: 0.088s   HLVM: 0.591s
n=10  OCaml: 0.740s   HLVM: 3.713s
n=11  OCaml: 6.68s    HLVM: out of memory

HLVM ran out of memory in the last case because I have not yet reintegrated
the GC.

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
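[Editorial note: the n-queens code benchmarked above is not included in the thread. A typical list-based OCaml formulation of the kind being compared — a representative sketch, not the actual benchmark source — is:]

```ocaml
(* A typical list-based n-queens counter; the benchmark source used
   above is not included in the thread, so this is representative only. *)
let rec safe x d = function
  | [] -> true
  | q :: qs -> x <> q && abs (x - q) <> d && safe x (d + 1) qs

let queens n =
  (* qs holds the columns of already-placed queens, nearest row first;
     k is the number of rows still to fill. *)
  let rec place qs k =
    if k = 0 then 1
    else
      let rec cols x acc =
        if x > n then acc
        else
          cols (x + 1)
            (acc + if safe x 1 qs then place (x :: qs) (k - 1) else 0)
      in
      cols 1 0
  in
  place [] n

let () = Printf.printf "queens 8 = %d\n" (queens 8)
```

The heavy allocation of short-lived cons cells in `x :: qs` is exactly the pattern that stresses a malloc-per-cell VM without a generational GC.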
* Re: [Caml-list] Odd performance result with HLVM 2009-02-28 21:52 ` Jon Harrop 2009-03-02 3:20 ` Jon Harrop @ 2009-03-02 14:28 ` Florent Ouchet 2009-03-02 16:18 ` Jon Harrop 2009-03-02 20:09 ` Kuba Ober 1 sibling, 2 replies; 15+ messages in thread From: Florent Ouchet @ 2009-03-02 14:28 UTC (permalink / raw) To: caml-list

Jon Harrop wrote:
> There are really two major advantages over the current ocamlopt design
> and both stem from the use of JIT compilation:
>
> . Run-time types allow per-type functions like generic pretty printers
> and comparison.
>
> . Monomorphisation during JIT compilation completely removes the
> performance cost of polymorphism, e.g. floats, tuples and records are
> never boxed.

Do you mean that each polymorphic function is compiled into a different
native piece of code each time it is called with different parameter types?
How does the JIT'ed code size compare to ocamlopt'ed code size?

- Florent
* Re: [Caml-list] Odd performance result with HLVM 2009-03-02 14:28 ` Florent Ouchet @ 2009-03-02 16:18 ` Jon Harrop 2009-03-02 20:09 ` Kuba Ober 1 sibling, 0 replies; 15+ messages in thread From: Jon Harrop @ 2009-03-02 16:18 UTC (permalink / raw) To: caml-list

On Monday 02 March 2009 14:28:23 Florent Ouchet wrote:
> Jon Harrop wrote:
> > There are really two major advantages over the current ocamlopt design
> > and both stem from the use of JIT compilation:
> >
> > . Run-time types allow per-type functions like generic pretty printers
> > and comparison.
> >
> > . Monomorphisation during JIT compilation completely removes the
> > performance cost of polymorphism, e.g. floats, tuples and records are
> > never boxed.
>
> Do you mean that each polymorphic function is compiled into a different
> native piece of code each time it is called with different parameter
> types?

Yes.

> How does the JIT'ed code size compare to ocamlopt'ed code size?

No idea. Without a front end I have only compiled the smallest pieces of
test code so far, just to make sure that the functionality works. .NET does
the same thing and it offers substantial performance improvements over
OCaml for polymorphic code.

Note that there is no reason to distinguish between reference types for
they can all be treated equivalently with respect to instantiating
polymorphic code. My type system is as follows:

  type t = [
    | `Unit
    | `Bool
    | `Int
    | `Float
    | `Struct of t list
    | `Array of t
    | `Function of t list * t
    | `Reference ]

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
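[Editorial note: because the type grammar above is closed, a monomorphising JIT can compute a concrete, unboxed layout for every instantiated type. As an illustration — the `words` helper below is hypothetical and not part of HLVM — a flat word count per monomorphised type might be computed like this:]

```ocaml
(* The type system quoted above. *)
type t = [ `Unit | `Bool | `Int | `Float
         | `Struct of t list | `Array of t
         | `Function of t list * t | `Reference ]

(* Hypothetical helper: flattened word count of a monomorphised value,
   showing why `Struct[`Int; `Float] needs no boxing at all. *)
let rec words : t -> int = function
  | `Unit | `Bool | `Int | `Float -> 1           (* unboxed scalars *)
  | `Struct ts -> List.fold_left (fun n ty -> n + words ty) 0 ts
  | `Array _ | `Function _ | `Reference -> 1     (* one pointer each *)

let () = assert (words (`Struct [`Int; `Float]) = 2)
```

Under this scheme a `float * float` pair occupies two flat words in its specialised instantiation, whereas ocamlopt's single compiled instance of a polymorphic function must box it behind a pointer.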
* Re: [Caml-list] Odd performance result with HLVM 2009-03-02 14:28 ` Florent Ouchet 2009-03-02 16:18 ` Jon Harrop @ 2009-03-02 20:09 ` Kuba Ober 2009-03-04 16:17 ` Mikkel Fahnøe Jørgensen 1 sibling, 1 reply; 15+ messages in thread From: Kuba Ober @ 2009-03-02 20:09 UTC (permalink / raw) To: caml-list

> Jon Harrop wrote:
>> There are really two major advantages over the current ocamlopt design
>> and both stem from the use of JIT compilation:
>>
>> . Run-time types allow per-type functions like generic pretty printers
>> and comparison.
>>
>> . Monomorphisation during JIT compilation completely removes the
>> performance cost of polymorphism, e.g. floats, tuples and records are
>> never boxed.
>
> Do you mean that each polymorphic function is compiled into a different
> native piece of code each time it is called with different parameter
> types? How does the JIT'ed code size compare to ocamlopt'ed code size?

Having done it, although not in a JIT but in your plain-old whole-project
compiler, for my use cases the code size actually shrinks. The functions
usually end up inlined and sometimes reduce to a few machine instructions.
Most of the runtime library is written using polymorphic functions. Case in
point: all sorts of string-processing functions which can take as arguments
either strings stored in RAM or stored in ROM, and those data types are
very much orthogonal on my platform. An invocation of a tail-recursive
"strlen" reduces to about as many bytes of code as it'd take to push the
arguments on the stack and call a non-polymorphic version of itself.

That's how I initially got a statically typed LISP to compile for "tiny"
8-bit microcontrollers without using all of the whopping 1kb of RAM and
16kb of program flash on a Z8F162x device.

Right now I'm hacking away to get rid of the last traces of LISPiness and
to get the project fully working in OCaml, using ML-like syntax for user
code. I like it much better than LISP's.

I have also found that by doing whole-project compilation with aggressive
constant propagation and compile-time execution of functions that depend
only on known constants, I could get rid of about 85% of the LISP macros in
my code. The other macros ended up being rewritten to just invoke ct_eval:
string -> function, which is a compile-time eval function. It's just like
LISP macros, but since in the ML family code isn't data, it was easier to
just generate strings and feed them into the compiler, rather than expose
all of the AST machinery to "userland".

Cheers, Kuba
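[Editorial note: the compile-time evaluation Kuba describes can be illustrated with a toy constant folder — hypothetical code, not from his compiler: subexpressions whose operands are all known constants are reduced during compilation.]

```ocaml
(* Toy constant folder: a hypothetical sketch of compile-time execution
   of functions of known constants, as described above. *)
type expr =
  | Const of int
  | Var of string
  | Add of expr * expr
  | Mul of expr * expr

let rec fold e =
  match e with
  | Add (a, b) ->
      (match fold a, fold b with
       | Const x, Const y -> Const (x + y)   (* both known: evaluate now *)
       | a', b' -> Add (a', b'))
  | Mul (a, b) ->
      (match fold a, fold b with
       | Const x, Const y -> Const (x * y)
       | a', b' -> Mul (a', b'))
  | Const _ | Var _ -> e

let () = assert (fold (Add (Const 2, Mul (Const 3, Const 4))) = Const 14)
```

Expressions containing free variables are left intact, so the same pass is safe to run over a whole program; extending the idea to whole user-defined functions of constants gives the macro-replacement effect described.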
* Re: [Caml-list] Odd performance result with HLVM 2009-03-02 20:09 ` Kuba Ober @ 2009-03-04 16:17 ` Mikkel Fahnøe Jørgensen 2009-03-04 16:30 ` Kuba Ober 2009-03-04 19:05 ` Jon Harrop 1 sibling, 2 replies; 15+ messages in thread From: Mikkel Fahnøe Jørgensen @ 2009-03-04 16:17 UTC (permalink / raw) To: Kuba Ober; +Cc: caml-list

When looking at the benchmark game and other benchmarks I have seen, I
noticed that Haskell is almost as fast as OCaml and sometimes faster. Some
Lisp implementations are also pretty fast.

However, when you look at memory consumption OCaml uses considerably less
memory, except for languages in the C family.

I suspect that many real world performance scenarios, such as heavily
loaded web servers and complex simulations, depend very much on memory
consumption. This is both because of GC overhead and because of the slower
memory pipeline the more cache levels are involved.

So in case of a new JIT solution for OCaml, I believe it is important to
observe this aspect as well.

Mikkel

2009/3/2 Kuba Ober <ober.14@osu.edu>:
> Having done it, although not in a JIT but in your plain-old whole-project
> compiler, for my use cases the code size actually shrinks. The functions
> usually end up inlined and sometimes reduce to a few machine
> instructions. Most of the runtime library is written using polymorphic
> functions. Case in point: all sorts of string-processing functions which
> can take as arguments either strings stored in RAM or stored in ROM, and
> those data types are very much orthogonal on my platform. An invocation
> of a tail-recursive "strlen" reduces to about as many bytes of code as
> it'd take to push the arguments on the stack and call a non-polymorphic
> version of itself.
>
> That's how I initially got a statically typed LISP to compile for "tiny"
> 8 bit microcontrollers without using all of the whopping 1kb of RAM and
> 16kb of program flash on a Z8F162x device.
>
> Right now I'm hacking away to get rid of last traces of LISPiness and to
> get the project fully working in OCaml, using ML-like syntax for user
> code. I like it much better than LISP's.
>
> I have also found that by doing whole-project compilation with aggressive
> constant propagation and compile-time execution of functions that depend
> only on known constants, I could get rid of about 85% of LISP macros in
> my code. The other macros ended up being rewritten to just invoke
> ct_eval: string -> function, which is a compile-time eval function. It's
> just like LISP macros, but since in ML family code isn't data, it was
> easier to just generate strings and feed them into compiler, rather than
> expose all of the AST machinery to "userland".
>
> Cheers, Kuba
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
* Re: [Caml-list] Odd performance result with HLVM 2009-03-04 16:17 ` Mikkel Fahnøe Jørgensen @ 2009-03-04 16:30 ` Kuba Ober 2009-03-04 18:15 ` Mikkel Fahnøe Jørgensen 2009-03-04 19:05 ` Jon Harrop 1 sibling, 1 reply; 15+ messages in thread From: Kuba Ober @ 2009-03-04 16:30 UTC (permalink / raw) To: caml-list

On Mar 4, 2009, at 11:17 AM, Mikkel Fahnøe Jørgensen wrote:

> When looking at the benchmark game and other benchmarks I have seen, I
> noticed that Haskell is almost as fast as OCaml and sometimes faster.
> Some Lisp implementations are also pretty fast.
>
> However, when you look at memory consumption OCaml uses considerably
> less memory, except for languages in the C family.
>
> I suspect that many real world performance scenarios, such as heavily
> loaded web servers and complex simulations, depend very much on memory
> consumption. This is both because of GC overhead and because of the
> slower memory pipeline the more cache levels are involved.
>
> So in case of a new JIT solution for OCaml, I believe it is important
> to observe this aspect as well.

I believe it is also important not to dynamically allocate memory for no
good reason. All of my realtime numerical code uses statically allocated
memory with overlaying based on the execution flow of basic blocks. That
has zero runtime overhead: the produced machine code has fixed addresses
for data (not all data, of course).

It reduces to whether a "basic block" can be re-entered from its future
(downstream) or not. If it can, you have to use the stack or the heap. If
it can't, then you can do static allocation. The potential cost comes when
a given function is entered from many points: you can get some overhead
since the overlaying has to take into account all possible ways the code is
reached. This can be mitigated by generating more than one copy of the
function, which makes sense when you have some free code ROM but your RAM
is almost full.

This of course can only be done when you do whole-project compilation. If
you compile "modules" separately, you have to fall back on doing it in the
linker, where all you have is the function call graph and the available
granularity is much worse, at a bigger RAM overhead. The code ROM overhead
is then none, since the linker can hardly generate copies of functions; at
the point where you can copy functions you may as well do other
optimizations, so the linker is way too late to do that efficiently.

There's no reason not to use those techniques in code that runs on "large"
platforms. It'd at least artificially boost some benchmark results ;)

Cheers, Kuba
* Re: [Caml-list] Odd performance result with HLVM 2009-03-04 16:30 ` Kuba Ober @ 2009-03-04 18:15 ` Mikkel Fahnøe Jørgensen 2009-03-04 18:32 ` Jon Harrop 0 siblings, 1 reply; 15+ messages in thread From: Mikkel Fahnøe Jørgensen @ 2009-03-04 18:15 UTC (permalink / raw) To: caml-list

oh, and another thing:

I'd like the VM to be small enough to link into an executable like the
OCaml runtime so you don't have some version and dispatch nightmare.

Mikkel
* Re: [Caml-list] Odd performance result with HLVM 2009-03-04 18:15 ` Mikkel Fahnøe Jørgensen @ 2009-03-04 18:32 ` Jon Harrop 0 siblings, 0 replies; 15+ messages in thread From: Jon Harrop @ 2009-03-04 18:32 UTC (permalink / raw) To: caml-list

On Wednesday 04 March 2009 18:15:50 Mikkel Fahnøe Jørgensen wrote:
> oh, and another thing:
>
> I'd like the VM to be small enough to link into an executable like the
> OCaml runtime so you don't have some version and dispatch nightmare.

You can generate executables but it takes 10s to compile HLVM to OCaml
bytecode and the result is a 60Mb file, so I doubt it will satisfy "small".

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
* Re: [Caml-list] Odd performance result with HLVM 2009-03-04 16:17 ` Mikkel Fahnøe Jørgensen 2009-03-04 16:30 ` Kuba Ober @ 2009-03-04 19:05 ` Jon Harrop 1 sibling, 0 replies; 15+ messages in thread From: Jon Harrop @ 2009-03-04 19:05 UTC (permalink / raw) To: caml-list

On Wednesday 04 March 2009 16:17:55 Mikkel Fahnøe Jørgensen wrote:
> When looking at the benchmark game and other benchmarks I have seen, I
> noticed that Haskell is almost as fast as OCaml and sometimes faster.
> Some Lisp implementations are also pretty fast.

From my ray tracer language comparison, my OCaml is ~50% faster than the
Haskell written by Lennart Augustsson and the Lisp written by Juho
Snellman, both of whom have extensive experience writing optimizing
compilers for those languages whereas I did not:

http://www.ffconsultancy.com/languages/ray_tracer/results.html

Moreover, I received dozens of implementations in Haskell and Lisp and
these were the only vaguely competitive ones: most programmers are not able
to write fast code in Haskell or Lisp primarily because their performance
is so wildly unpredictable. The Burrows-Wheeler block sorting data
compression algorithm implemented in Haskell and discussed extensively for
weeks in the context of performance is a good example of this: they never
got within 10,000x the performance of C. There are many other examples
where nobody was able to get within orders of magnitude of the performance
of common languages.

GHC does have rudimentary support for parallelism and that makes it much
easier to leverage 2-6 cores in Haskell compared to OCaml. However, that is
merely a deficiency in the current OCaml implementation and is something
that can be addressed. Moreover, the current Haskell implementation scales
very poorly and is easily maxed out even on today's 8 core computers. For
example, on a recent Mandelbrot benchmark from comp.lang.functional the
Haskell is faster for 1-6 cores but stops seeing improvements beyond that,
whereas OCaml with process forking continues to see improvements up to all
8 cores and is faster overall on this machine as a consequence.

Although efficient concurrent garbage collectors are hard to write,
parallel ones like the one in GHC are comparatively easy to write and still
very useful.

> However, when you look at memory consumption OCaml uses considerably
> less memory, except for languages in the C family.
>
> I suspect that many real world performance scenarios, such as heavily
> loaded web servers and complex simulations, depend very much on memory
> consumption. This is both because of GC overhead and because of the
> slower memory pipeline the more cache levels are involved.
>
> So in case of a new JIT solution for OCaml, I believe it is important
> to observe this aspect as well.

OCaml's memory efficiency is certainly extremely good and it may be
theoretically possible to preserve that in a new implementation that
supports parallelism. That is absolutely not the goal of my work though: I
only intend to get the simplest possible parallel GC working because I am
interested primarily in high-performance numerics, string processing and
visualization and not web servers. However, I will endeavour to make the
implementation as extensible as possible so that other people can create
drop-in replacements that provide this kind of functionality. Improving
upon my GC should be quite easy for anyone versed in the subject.
Interestingly, my GC is written entirely in my own intermediate
representation.

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
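[Editorial note: the "process forking" approach mentioned for the OCaml Mandelbrot result can be sketched with the standard Unix module. This is a minimal illustration, not the benchmark code: fork one worker per task and marshal each result back over a pipe. Compile with the `unix` library linked in, e.g. `ocamlfind ocamlopt -package unix -linkpkg`.]

```ocaml
(* Minimal sketch of process-forking parallelism in OCaml: one child
   per task, results marshalled back through pipes. Illustrative only;
   not the Mandelbrot benchmark discussed above. *)
let parmap f xs =
  let spawn x =
    let r, w = Unix.pipe () in
    match Unix.fork () with
    | 0 ->
        (* Child: compute, send the result, exit. *)
        Unix.close r;
        let oc = Unix.out_channel_of_descr w in
        output_value oc (f x);
        close_out oc;
        exit 0
    | pid ->
        (* Parent: keep the read end, close the write end. *)
        Unix.close w;
        (pid, Unix.in_channel_of_descr r)
  in
  let readers = List.map spawn xs in
  List.map
    (fun (pid, ic) ->
      let v = input_value ic in
      close_in ic;
      ignore (Unix.waitpid [] pid);
      v)
    readers

let () = assert (parmap (fun x -> x * x) [1; 2; 3] = [1; 4; 9])
```

Because each worker is a separate process with its own heap and GC, this sidesteps the single-runtime-lock limitation referred to above, at the cost of marshalling the results.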
end of thread, other threads:[~2009-03-04 19:00 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-02-28  1:12 Odd performance result with HLVM Jon Harrop
2009-02-28 20:18 ` [Caml-list] " Kuba Ober
2009-02-28 21:21   ` Richard Jones
2009-02-28 21:59     ` Jon Harrop
2009-03-02  0:38       ` Jon Harrop
2009-02-28 21:52   ` Jon Harrop
2009-03-02  3:20     ` Jon Harrop
2009-03-02 14:28     ` Florent Ouchet
2009-03-02 16:18       ` Jon Harrop
2009-03-02 20:09       ` Kuba Ober
2009-03-04 16:17         ` Mikkel Fahnøe Jørgensen
2009-03-04 16:30           ` Kuba Ober
2009-03-04 18:15             ` Mikkel Fahnøe Jørgensen
2009-03-04 18:32               ` Jon Harrop
2009-03-04 19:05           ` Jon Harrop