From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105])
	by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id p3NKIlLF016641
	for <caml-list@sympa-roc.inria.fr>; Sat, 23 Apr 2011 22:18:47 +0200
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AgwCAOwys03RVdQ2kGdsb2JhbACYHYYLAYc3CBQBAQEBCQkNBxQEIYhwngaKeIIng3U0iF4BAQMGhXAEjjWECIEGgmSCIDo
X-IronPort-AV: E=Sophos;i="4.64,259,1301868000"; 
   d="scan'208,217";a="93677583"
Received: from mail-vw0-f54.google.com ([209.85.212.54])
  by mail4-smtp-sop.national.inria.fr with ESMTP/TLS/RC4-SHA; 23 Apr 2011 22:18:41 +0200
Received: by vws18 with SMTP id 18so2170867vws.27
        for <caml-list@inria.fr>; Sat, 23 Apr 2011 13:18:40 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=gamma;
        h=domainkey-signature:subject:mime-version:content-type:from
         :in-reply-to:date:cc:message-id:references:to:x-mailer;
        bh=4B+f+gA9wF6I2UrFCgGZ9C+zuylHttsjhNLzQj+crv8=;
        b=C8sjlgMd2mjZn3jqBLjfe/bh4qpSBltQcuYiIfxhggyDgfo824OKgHwrjsTEQ9DcN4
         fFKPPUV7od/fsLbeNtcddG+HIlAcWkCVnXLnI8j4NDIZafFHJ6Vn7kOb14NaDpNfGTM5
         8XqgxjWg9FdJvY+G2Obk4MlG5e4LBDrFmf4EU=
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=gmail.com; s=gamma;
        h=subject:mime-version:content-type:from:in-reply-to:date:cc
         :message-id:references:to:x-mailer;
        b=qzENnBmcJ6wxf2Yf/9lpWl9xHVARrxSkoP6QFVRNEMnfr6+ZBIExOOv9UyAMxRHx2r
         5jykPCbZRf7r7Q/4BQdmhJgXJ5d3zfH9JzBe36PJ50bsITF7k56+/l+cXFFycab0vhMj
         4Qo228/5TtWTKgPUMF4/2V32licoNSB4aRaSU=
Received: by 10.52.108.8 with SMTP id hg8mr3782060vdb.100.1303589919508;
        Sat, 23 Apr 2011 13:18:39 -0700 (PDT)
Received: from [192.168.0.100] (216-107-215-230.KishHeaterRoad.nhvt.static.cust.seg.net [216.107.215.230])
        by mx.google.com with ESMTPS id p29sm740343vcr.31.2011.04.23.13.18.35
        (version=TLSv1/SSLv3 cipher=OTHER);
        Sat, 23 Apr 2011 13:18:38 -0700 (PDT)
Mime-Version: 1.0 (Apple Message framework v1082)
Content-Type: multipart/alternative; boundary=Apple-Mail-5-1012680577
From: Alexy Khrabrov <deliverable@gmail.com>
In-Reply-To: <BANLkTikHp1yu5fEuLf-=d_cMsshAcpFP5g@mail.gmail.com>
Date: Sat, 23 Apr 2011 16:18:31 -0400
Cc: Caml List <caml-list@inria.fr>
Message-Id: <7D1E1BE5-D04F-42FE-B36E-344E48792F8A@gmail.com>
References: <76544177.594058.1303341821437.JavaMail.root@zmbs4.inria.fr> <4DAFE141.7080003@inria.fr> <4DAFF442.8000806@lexifi.com> <799994864.610698.1303412613509.JavaMail.root@zmbs4.inria.fr> <4DB136FB.6050302@inria.fr> <1303463512.8429.1344.camel@thinkpad> <BANLkTi=1hkuu9XtMyH+c5RxpM4tvK_ro3A@mail.gmail.com> <97D08229-2871-42F1-A50D-8E85C6C2BE31@gmail.com> <BANLkTikHp1yu5fEuLf-=d_cMsshAcpFP5g@mail.gmail.com>
To: Eray Ozkural <examachine@gmail.com>
X-Mailer: Apple Mail (2.1082)
Subject: Re: [Caml-list] Efficient OCaml multicore -- roadmap?


--Apple-Mail-5-1012680577
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii


On Apr 23, 2011, at 1:39 PM, Eray Ozkural wrote:

> On Sat, Apr 23, 2011 at 4:47 PM, Alexy Khrabrov <deliverable@gmail.com> w=
rote:
>=20
> On Apr 23, 2011, at 6:17 AM, Eray Ozkural wrote:
>=20
> > I don't really care what others say, but to prove that this has any per=
formance value you should do the following:
> >
> >
> > Compare your most "parallel" algorithm with the performance of a corres=
ponding well-written MPI application using openmpi's shared memory transpor=
t. If there is a difference, then your system has some value.
> >
> > Of course openmpi's shared memory transport is terribly buggy, but it s=
hould give a baseline acceptable performance.
> >
> > If there is no comparison, we have no idea.
>=20
> The problem with "implement in MPI and compare" is that you have to rearc=
hitect a sequential program for a totally different model.  By contrast, us=
ing shared memory parallelism, it's often a question of using pmap.
>=20
> Incorrect. We always compare to sequential code in parallel computing. It=
's called "speedup".
>=20
> And doubly incorrect because we are not comparing to sequential code but =
a claimed shared memory parallelism. It's only logical to compare two appro=
aches on the same hardware.

I'm not claiming that something is correct or not -- I'm just saying that r=
eplacing map by pmap is easy, while rewriting in MPI style is complex.  Mak=
ing a shared memory program out of sequential one might be this trivial, wh=
ile MPI never will be; you have to program in message-passing style from th=
e get-go, and preferably in Erlang or Scala actors with or without the  AKK=
A kernel and such or 0MQ, etc.

>=20
> I really recommend everybody interested in parallelism to learn and try C=
lojure on a small problem.  You can replace a single map by pmap in a suita=
ble setting and observe a not-quite-linear, but proportional speedup.
>=20
> Of course functional programming fits such parallelism very well. It's a =
shame that ocaml does not have parallel functional primitives.
>=20=20
>=20
> I'd be really happy if OCaml gets the mechanisms from Clojure.=20
>=20
>=20
> It'd be even better if such explicit parallelism had good compiler suppor=
t, too :) I don't know much about Clojure, but I wouldn't use anything that=
 runs on JVM for a parallel program. That might be like first turning your =
computer to Commodore 64 and then getting some speedup.

This is an obsolete urban legend.  JVM has the most mature GC out there and=
 computational performance often on par with C when loaded and running.  In=
 my social networking benchmark, the largest data-churning test ever for fu=
nctional programming languages (http://functional.tv/), Clojure was only 2-=
3 times slower than OCaml and Haskell, and it's mostly due to slow Java ser=
ialization and deserialization.  My experience with Scala and Clojure tells=
 me these are the best ways now to do shared memory parallelism for perform=
ance gains in a real-world manner (using many libraries).  BTW, Haskell the=
n beat OCaml by a small margin, although using purely functional maps to OC=
aml's hash tables.  The Haskell folks keep improving their performance, alt=
hough the GC then originally crashed under such an unexpected volume as a T=
witter graph of 5 million users -- and was quickly fixed.  Still we had to =
strictify Haskell's core data structures, an exercise which made me go back=
 to OCaml.  I finished my Twitter data mining Ph.D. in OCaml as the most pr=
actical way to handle the graph, filling up a 64 GB RAM server, yet it was =
only one core out of eight running, which is a pity.

Clojure's performance improves by leaps and bounds, e.g. using primitives a=
s efficiently as in Java, and I think OCaml would benefit from a similar se=
t of primitives -- then it would be the most practical ML-style FP language=
, the prize now in fact held by Scala.

-- Alexy


--Apple-Mail-5-1012680577
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html;
	charset=us-ascii

<html><head></head><body style=3D"word-wrap: break-word; -webkit-nbsp-mode:=
 space; -webkit-line-break: after-white-space; "><br><div><div>On Apr 23, 2=
011, at 1:39 PM, Eray Ozkural wrote:</div><br class=3D"Apple-interchange-ne=
wline"><blockquote type=3D"cite">On Sat, Apr 23, 2011 at 4:47 PM, Alexy Khr=
abrov <span dir=3D"ltr">&lt;<a href=3D"mailto:deliverable@gmail.com">delive=
rable@gmail.com</a>&gt;</span> wrote:<br><div class=3D"gmail_quote"><blockq=
uote class=3D"gmail_quote" style=3D"margin-top: 0px; margin-right: 0px; mar=
gin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-co=
lor: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; posit=
ion: static; z-index: auto; ">
<div class=3D"im"><br>
On Apr 23, 2011, at 6:17 AM, Eray Ozkural wrote:<br>
<br>
&gt; I don't really care what others say, but to prove that this has any pe=
rformance value you should do the following:<br>
&gt;<br>
&gt;<br>
&gt; Compare your most "parallel" algorithm with the performance of a corre=
sponding well-written MPI application using openmpi's shared memory transpo=
rt. If there is a difference, then your system has some value.<br>

&gt;<br>
&gt; Of course openmpi's shared memory transport is terribly buggy, but it =
should give a baseline acceptable performance.<br>
&gt;<br>
&gt; If there is no comparison, we have no idea.<br>
<br>
</div>The problem with "implement in MPI and compare" is that you have to r=
earchitect a sequential program for a totally different model. &nbsp;By con=
trast, using shared memory parallelism, it's often a question of using pmap=
.<br>
</blockquote><div><br></div><div>Incorrect. We always compare to sequential=
 code in parallel computing. It's called "speedup".</div><div><br></div><di=
v>And doubly incorrect because we are not comparing to sequential code but =
a claimed shared memory parallelism. It's only logical to compare two appro=
aches on the same hardware.</div></div></blockquote><div><br></div>I'm not =
claiming that something is correct or not -- I'm just saying that replacing=
 map by pmap is easy, while rewriting in MPI style is complex. &nbsp;Making=
 a shared memory program out of sequential one might be this trivial, while=
 MPI never will be; you have to program in message-passing style from the g=
et-go, and preferably in Erlang or Scala actors with or without the &nbsp;A=
KKA kernel and such or 0MQ, etc.</div><div><br><blockquote type=3D"cite"><d=
iv class=3D"gmail_quote">
<blockquote class=3D"gmail_quote" style=3D"margin-top: 0px; margin-right: 0=
px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-=
left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex=
; position: static; z-index: auto; ">
<br>
I really recommend everybody interested in parallelism to learn and try Clo=
jure on a small problem. &nbsp;You can replace a single map by pmap in a su=
itable setting and observe a not-quite-linear, but proportional speedup.<br>
</blockquote><div><br></div><div>Of course functional programming fits such=
 parallelism very well. It's a shame that ocaml does not have parallel func=
tional primitives.</div><div>&nbsp;</div><blockquote class=3D"gmail_quote" =
style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<br>
I'd be really happy if OCaml gets the mechanisms from Clojure.&nbsp;</block=
quote><div><br></div><div><br></div><div>It'd be even better if such explic=
it parallelism had good compiler support, too :) I don't know much about Cl=
ojure, but I wouldn't use anything that runs on JVM for a parallel program.=
 That might be like first turning your computer to Commodore 64 and then ge=
tting some speedup.</div>
</div></blockquote><br></div><div>This is an obsolete urban legend. &nbsp;J=
VM has the most mature GC out there and computational performance often on =
par with C when loaded and running. &nbsp;In my social networking benchmark=
, the largest data-churning test ever for functional programming languages =
(<a href=3D"http://functional.tv/">http://functional.tv/</a>), Clojure was =
only 2-3 times slower than OCaml and Haskell, and it's mostly due to slow J=
ava serialization and deserialization. &nbsp;My experience with Scala and C=
lojure tells me these are the best ways now to do shared memory parallelism=
 for performance gains in a real-world manner (using many libraries). &nbsp=
;BTW, Haskell then beat OCaml by a small margin, although using purely func=
tional maps to OCaml's hash tables. &nbsp;The Haskell folks keep improving =
their performance, although the GC then originally crashed under such an un=
expected volume as a Twitter graph of 5 million users -- and was quickly fi=
xed. &nbsp;Still we had to strictify Haskell's core data structures, an exe=
rcise which made me go back to OCaml. &nbsp;I finished my Twitter data mini=
ng Ph.D. in OCaml as the most practical way to handle the graph, filling up=
 a 64 GB RAM server, yet it was only one core out of eight running, which i=
s a pity.</div><div><br></div><div>Clojure's performance improves by leaps =
and bounds, e.g. using primitives as efficiently as in Java, and I think OC=
aml would benefit from a similar set of primitives -- then it would be the =
most practical ML-style FP language, the prize now in fact held by Scala.</=
div><div><br></div><div>-- Alexy</div><br></body></html>=

--Apple-Mail-5-1012680577--