From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.4 required=5.0 tests=AWL,DNS_FROM_RFC_ABUSE autolearn=disabled version=3.1.3 Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by yquem.inria.fr (Postfix) with ESMTP id 63F6DBC6B for ; Tue, 8 Jan 2008 15:20:07 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Aq4HAOYUg0dDWxLC/2dsb2JhbACBV6dw X-IronPort-AV: E=Sophos;i="4.24,258,1196636400"; d="scan'208";a="6415564" Received: from ip67-91-18-194.z18-91-67.customer.algx.net (HELO server1.bertec.net) ([67.91.18.194]) by mail1-smtp-roc.national.inria.fr with ESMTP; 08 Jan 2008 15:20:06 +0100 Received: from kuba.bertec.net (kuba.bertec.net [192.168.2.16]) by server1.bertec.net (Postfix) with ESMTP id 30F05105830 for ; Tue, 8 Jan 2008 09:20:05 -0500 (EST) From: Kuba Ober To: caml-list@yquem.inria.fr Subject: Re: [Caml-list] Performance questions, -inline, ... Date: Tue, 8 Jan 2008 09:20:03 -0500 User-Agent: KMail/1.9.6 (enterprise 0.20071123.740460) References: <200801031128.30183.ober.14@osu.edu> <200801071022.41044.ober.14@osu.edu> <200801071958.23681.jon@ffconsultancy.com> In-Reply-To: <200801071958.23681.jon@ffconsultancy.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200801080920.04187.ober.14@osu.edu> X-Spam: no; 0.00; -inline:01 high-level:01 hacked:01 runtime:01 ocaml:01 inference:01 syntax:01 camlp:01 run-time:01 polymorphism:01 higher-order:01 ocaml:01 gcc:01 compiler:01 compiler:01 On Monday 07 January 2008, Jon Harrop wrote: > On Monday 07 January 2008 15:22:40 Kuba Ober wrote: > > On Monday 07 January 2008, Jon Harrop wrote: > > > You mean it might be possible to recover the performance of C from > > > numerical code with high-level abstractions? Yes. Indeed, I would like > > > to see this done. However, I've never heard of an implementation of any > > > language that can do this. > > > > g++ does that reasonably well. Heck, I have a severely hacked in-house > > LISP system which had grown out of an assembler-with-LISP-macros, and it > > does it all the time. It's a LISP with static types at runtime, but > > general LISPiness at compile time (macros just run in gcl). In fact I'm > > looking to port it to OCaml just because type inference and commonplace > > LISP syntax for type declarations don't mix too well - the code starts > > looking really ugly. Maybe I stuck too much with LISP's usual > > declaim-this-and-that approach, but that's what you get when you reuse > > most of underlying LISP's implementation. > > Yes. If you're willing to do that kind of hacking then you could get a long > way with camlp4 without too much trouble. > > > > I assume you didn't mimic run-time polymorphism, currying, closures and > > > higher-order functions in the C though? > > > > No. But while those are very useful features to have, the code is messy > > if you've got two languages to work with (hi-perf stuff in C, everything > > else in OCaml). > > On 64-bit there is no performance benefit in dropping to C. > > > > Yes. Your "add1" function is entirely dead code and, therefore, is > > > trivially reducible. This is not true of real programs and, therefore, > > > your results will not be representative of real programs. Moreover, > > > there is no sense in trying to optimize a program that does nothing > > > anyway. > > > > If you code such a thing in gcc, it correctly becomes a no-op. OCaml > > isn't all that clever ;) > > > > You're 100% academically-correct of course. > > As Xavier says, "the OCaml compiler is designed to compile good code". > > > > The simplest route to recovering C performance here is: > > > > > > . Inline "( +. )". > > > . Inline "op1". > > > . Type-specialize "op1". > > > . Hoist bounds checks. > > > > You mean "the programmer has to do this"? OCaml compiler is *really* bad, > > then. > > You're 100% academically-correct of course. > > In practice, the combination of OCaml's awesome native-code generation on > AMD64 and the ease with which you can work around those missing > optimizations mean that OCaml is *really* good compared to all other > language > implementations with comparable qualities. This is precisely why it is so > popular. > > > > There are some trade-offs here though. Stalin-compiled Scheme, > > > MLton-compiled SML and GHC-compiled Haskell do a lot more than OCaml in > > > theory but, in practice, they are much less useful for high-performance > > > numerics because their performance is so unpredictable. > > > > Good to know. > > OCaml's code generator is also much better than MLton's and GHC's (even > 6.8). > > > > I've already spent a long time meticulously contructing benchmarks in > > > various different languages along these lines and the most important > > > optimizations missing from OCaml are: > > > > > > . Hoisting bounds checks. > > > . Common subexpression elimination. > > > . Type specialization. > > > . More aggressive inlining. > > > . Static optimization of div and mod. > > > > > > The nice thing about OCaml is that it has a superb 64-bit code > > > generator so, once you've done those optimizations on the hot paths > > > manually, OCaml lets you run code as quickly as C but with all of the > > > high-level benefits everywhere else. > > > > > > A perfect language implementation would certainly do these > > > optimizations for you and I wish OCaml did but I still haven't found > > > anything better. > > > > After I gain some more experience I will hopefully add an OCaml-esque > > front end to my hackish compiler. It's completely unportable, only works > > on 12-bit-PIC-like architecture (SX from Parallax) and on Z8 Encore!, but > > hey, maybe it can be a good starting point to something bigger and > > better. The backend still has to be ported from Lisp, and I'm too time- > > and > > knowledge-constrained to do it just yet. > > ... > > I see. If you're not using OCaml as a general purpose language then you > don't need its other benefits (e.g. GUI libraries). > > I'm in a similar situation. I'm toying with the idea of building a better > FPL implementation for high-performance numerics that is commerce friendly > using LLVM. If you take the path of least resistance then I believe that is > tractable but you lose nice things like LablGTK2... If you can do some code generation, it shouldn't be a big deal to implement even some very complex ABIs, say interfacing to C++ libraries. As long as you can cleanly run your language at compile time (like lisp does), and as long as the compiler provides a documented way to pop assembly into the code, you can have a nice language that can, in "natively compiled" output, interface say with Qt. IMHO, the latter is now a few years ahead of GTK, and is only gaining the advantage as time passes. Languages such as OCaml sorely lack in nontrivial ABI department - there's no reason to have to write (or even generate) C-style binding code for say Qt just to use it in OCaml. Both bytecode compiler and native code compiler should have a pluggable ABI-generator scheme where they can interface with C++ (at least). Another way to go about it would be to machine-translate C++ libraries to OCaml. This should be doable for Qt, but I don't think I'm up for that one just yet, even though I think about it. If you want to generate code for a VM, the backend should allow selectable VMs - at least CLR and JVM, if the thing is to be of any relevance. The language should, IMHO, also be flexible enough that features could be reasonably added to have it support resource-constrained architectures. I'd expect the language to have a core feature set which doesn't do automated memory management beyond what can be inferred for fixed-lifetime data. This would be akin to working with good old C++, and should be easily portable to tiny targets. Then you have "level 2" feature set which is supposed to use garbage collecting virtual machines. I'd even add something like "level 1.5" which uses say C++ ABI but also links-in a garbage collector. So that you get automated memory management but no VM per se. Heck, for all I care you could have that thing emit code in C++ ABI by default - it's a reasonable-enough choice. I find it annoying that writing good "PC" code requires a wholly different toolset from writing embedded code, the latter usually limited to really broken vendor C (or sometimes C++). Cheers, Kuba