From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexy Khrabrov
Date: Sat, 23 Apr 2011 16:18:31 -0400
To: Eray Ozkural
Cc: Caml List
Subject: Re: [Caml-list] Efficient OCaml multicore -- roadmap?
Message-Id: <7D1E1BE5-D04F-42FE-B36E-344E48792F8A@gmail.com>
References: <76544177.594058.1303341821437.JavaMail.root@zmbs4.inria.fr>
 <4DAFE141.7080003@inria.fr> <4DAFF442.8000806@lexifi.com>
 <799994864.610698.1303412613509.JavaMail.root@zmbs4.inria.fr>
 <4DB136FB.6050302@inria.fr> <1303463512.8429.1344.camel@thinkpad>
 <97D08229-2871-42F1-A50D-8E85C6C2BE31@gmail.com>

On Apr 23, 2011, at 1:39 PM, Eray Ozkural wrote:

> On Sat, Apr 23, 2011 at 4:47 PM, Alexy Khrabrov wrote:
>
> On Apr 23, 2011, at 6:17 AM, Eray Ozkural wrote:
>
> > I don't really care what others say, but to prove that this has any performance value you should do the following:
> >
> > Compare your most "parallel" algorithm with the performance of a corresponding well-written MPI application using openmpi's shared memory transport. If there is a difference, then your system has some value.
> >
> > Of course openmpi's shared memory transport is terribly buggy, but it should give a baseline acceptable performance.
> >
> > If there is no comparison, we have no idea.
>
> The problem with "implement in MPI and compare" is that you have to rearchitect a sequential program for a totally different model.  By contrast, using shared memory parallelism, it's often a question of using pmap.
>
> Incorrect. We always compare to sequential code in parallel computing. It's called "speedup".
>
> And doubly incorrect because we are not comparing to sequential code but a claimed shared memory parallelism. It's only logical to compare two approaches on the same hardware.

I'm not claiming that something is correct or not -- I'm just saying that replacing map by pmap is easy, while rewriting in MPI style is complex.  Making a shared memory program out of a sequential one might be this trivial, while MPI never will be; you have to program in message-passing style from the get-go, preferably in Erlang or with Scala actors (with or without the AKKA kernel and such), or 0MQ, etc.

> I really recommend everybody interested in parallelism to learn and try Clojure on a small problem.  You can replace a single map by pmap in a suitable setting and observe a not-quite-linear, but proportional speedup.
>
> Of course functional programming fits such parallelism very well. It's a shame that ocaml does not have parallel functional primitives.
>
> I'd be really happy if OCaml gets the mechanisms from Clojure.
>
> It'd be even better if such explicit parallelism had good compiler support, too :) I don't know much about Clojure, but I wouldn't use anything that runs on JVM for a parallel program. That might be like first turning your computer to Commodore 64 and then getting some speedup.

This is an obsolete urban legend.  The JVM has the most mature GC out there, and its computational performance is often on par with C once loaded and running.  In my social networking benchmark, the largest data-churning test ever for functional programming languages (http://functional.tv/), Clojure was only 2-3 times slower than OCaml and Haskell, and that was mostly due to slow Java serialization and deserialization.  My experience with Scala and Clojure tells me these are the best ways now to do shared memory parallelism for performance gains in a real-world manner (using many libraries).  BTW, Haskell then beat OCaml by a small margin, even though it used purely functional maps against OCaml's hash tables.  The Haskell folks keep improving their performance, although the GC originally crashed under such an unexpected volume as a Twitter graph of 5 million users -- and was quickly fixed.  Still, we had to strictify Haskell's core data structures, an exercise which made me go back to OCaml.  I finished my Twitter data mining Ph.D. in OCaml as the most practical way to handle the graph, filling up a 64 GB RAM server, yet only one core out of eight was running, which is a pity.

Clojure's performance improves by leaps and bounds, e.g. using primitives as efficiently as in Java, and I think OCaml would benefit from a similar set of primitives -- then it would be the most practical ML-style FP language, the prize now in fact held by Scala.

-- Alexy