From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <peng.zang@gmail.com>
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr
X-Spam-Level: *
X-Spam-Status: No, score=1.0 required=5.0 tests=AWL,SPF_NEUTRAL 
	autolearn=disabled version=3.1.3
Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105])
	by yquem.inria.fr (Postfix) with ESMTP id AE416BC6B
	for <caml-list@yquem.inria.fr>; Fri, 11 Jan 2008 01:42:12 +0100 (CET)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AgAAAOpJhkdC+VLui2dsb2JhbACQHgEBAQgEBCQFkwuGJQ
X-IronPort-AV: E=Sophos;i="4.24,269,1196636400"; 
   d="scan'208";a="21111185"
Received: from wx-out-0506.google.com ([66.249.82.238])
  by mail4-smtp-sop.national.inria.fr with ESMTP; 11 Jan 2008 01:42:15 +0100
Received: by wx-out-0506.google.com with SMTP id h27so795754wxd.15
        for <caml-list@yquem.inria.fr>; Thu, 10 Jan 2008 16:42:10 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=gamma;
        h=domainkey-signature:received:received:from:reply-to:to:subject:date:user-agent:cc:references:in-reply-to:content-type:content-transfer-encoding:content-disposition:message-id;
        bh=MIxsntsFn1WugSuOL+UJW7jZ2GsejtAGIbiQeNmH8hY=;
        b=IrPhzDJ876WvAvoVXhDRs/tMMQ+cmZcGSWCg7B2gfXLk6+HvC3PHVYZU4XgurUNj59oHJw4DlRIz019xcU10QNDoG1p7TZ412IdgLXWK6TbZY+N4gy0yc4LKHPfUAnDZbpPTQso1QN8HnPd9fIugt4HErKgDHXbuJt4OWWTJnb0=
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=gmail.com; s=gamma;
        h=from:reply-to:to:subject:date:user-agent:cc:references:in-reply-to:content-type:content-transfer-encoding:content-disposition:message-id;
        b=tKfplgcoApoVUtIVexRl03EZzOJ3SZDgExxarSj2acN8jZ5XW/WeSlW3Q+CdT0bVLec7ojkKFCHm6CVlzRwHoy8kIvLo2EyReAai6DK651bwWLOT0gEmadocd1ZsRgknEQyIyJ1NzxhXNEFSPDmkgeqfTX1S28mLeWcEXv4uTPs=
Received: by 10.70.17.1 with SMTP id 1mr1675741wxq.28.1200012130816;
        Thu, 10 Jan 2008 16:42:10 -0800 (PST)
Received: from tama-chan ( [24.99.180.210])
        by mx.google.com with ESMTPS id h14sm4214314wxd.26.2008.01.10.16.42.09
        (version=TLSv1/SSLv3 cipher=OTHER);
        Thu, 10 Jan 2008 16:42:09 -0800 (PST)
From: Peng Zang <peng.zang@gmail.com>
Reply-To: peng.zang@gmail.com
To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] ocaml doesn't need to optimize on amd64??
Date: Thu, 10 Jan 2008 19:42:05 -0500
User-Agent: KMail/1.9.7
Cc: Jon Harrop <jon@ffconsultancy.com>
References: <200801090922.00231.ober.14@osu.edu> <200801091714.52294.jon@ffconsultancy.com> <200801102256.42193.jon@ffconsultancy.com>
In-Reply-To: <200801102256.42193.jon@ffconsultancy.com>
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200801101942.08123.peng.zang@gmail.com>
X-Spam: no; 0.00; ocaml:01 hash:01 ocaml's:01 ocaml:01 ocaml's:01 high-level:01 compiler:01 implements:01 high-level:01 compiler:01 bigloo:01 implements:01 lacks:01 low-level:01 afaik:01 

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hey Jon,

I was wondering if you've had any experience with OCaml's speed and quality of 
code generate on Intel 64 platforms.  Eg. a Core 2 Duo or one of the more 
recent Xeons.  Thanks,

Peng

On Thursday 10 January 2008 05:56:42 pm Jon Harrop wrote:
> On Wednesday 09 January 2008 17:14:52 Jon Harrop wrote:
> > On Wednesday 09 January 2008 14:22:00 Kuba Ober wrote:
> > > Jon & al,
> > >
> > > why do you think that OCaml doesn't need to do certain
> > > optimizations on amd64?
> >
> > OCaml does a (much) better job of code generation on AMD64.
>
> I was in a bit of a rush when I wrote that and I'd like to explain what I
> meant in more detail.
>
> OCaml's AMD64 code gen is so good that I have never heard of a reason to
> drop to C for high performance code. That is simply no longer an issue, so
> we are free to stay in the land of safety and high-level concepts which is
> enormously valuable in practice because it saves so much developer time.
>
> In particular, this is more important than having a compiler that
> implements the high-level optimizations we discussed because such a
> compiler can never match the performance of C if its AMD64 code gen is not
> as good as OCaml's.
>
> This is one of the reasons why I continue to choose OCaml over SML/NJ,
> MLton, GHC6.8, SBCL, Bigloo and almost all other FPL compilers: they have
> worse AMD64 code gens and imposes a significantly lower speed limit upon
> their users.
>
> If you look at the ray tracer benchmark:
>
>   http://www.ffconsultancy.com/languages/ray_tracer/results.html
>
> Three of the five OCaml programs are faster than any SML compiled with
> MLton even though MLton implements lots of high-level optimizations that
> OCaml does not. You can draw two crucial conclusions from this:
>
> 1. Even though OCaml lacks some high-level optimizations, you don't have to
> put much effort in before OCaml beats MLton-compiled SML because OCaml's
> AMD64 code gen is so good. In particular, two of those three OCaml
> implementations don't even bother implementing any of the low-level
> optimizations that we've discussed at all!
>
> 2. OCaml lets you choose how much you want to optimize your code right up
> to the performance of C but MLton imposes quite a low speed limit: 55%
> slower than the front runners. Once you hit that limit with MLton your only
> option is to drop to C (or OCaml ;-). However, dropping into another
> language imposes its own performance hits and is even likely to undermine
> the compiler's optimization efforts.
>
> So I agree that it would be very nice if OCaml implemented more
> optimizations along these lines but I choose OCaml with its excellent code
> gen over a more optimizing compiler that didn't have such a good code gen
> every day of the week and twice on Sundays.
>
> AFAIK, these high-level optimizations are never likely to get implemented
> in OCaml. My first vote would actually go to a different (and more
> fundamentally important) optimization anyway: arbitrary unboxing. One of
> the nice things about F# and GHC is that you can specify in your type
> declaration that the type is to be stored unboxed. This can have profound
> performance
> implications.
>
> Even in standalone code, unboxing arrays of complex numbers (which OCaml
> does not do) makes FFTs 5x faster. In the context of FFIs, the performance
> improvement can be much bigger because you can completely avoid the cost of
> copying huge quantities of data (e.g. color/texcoord/normal/position struct
> arrays in vertex buffer objects for OpenGL). In contrast, the overhead of
> lambda abstraction in numerical code is "only" a factor of 2.
>
> PS: Kuba, your C code will most likely run a significantly faster in 64-bit
> as well.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.7 (GNU/Linux)

iD8DBQFHhrtgfIRcEFL/JewRAqgFAKCm1Px3uMP8nEq5lrIp/vHdbIvSsgCgiDSJ
NpRFs4RKR+W5b0S0n9SIkAo=
=Y81S
-----END PGP SIGNATURE-----