Mailing list for all users of the OCaml language and system.
 help / color / mirror / Atom feed
From: Jon Harrop <jon@ffconsultancy.com>
To: David Teller <David.Teller@univ-orleans.fr>
Cc: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Re: Why OCaml rocks
Date: Sat, 10 May 2008 20:59:54 +0100	[thread overview]
Message-ID: <200805102059.54856.jon@ffconsultancy.com> (raw)
In-Reply-To: <1210371949.6399.62.camel@Blefuscu>

On Friday 09 May 2008 23:25:49 David Teller wrote:
> On Fri, 2008-05-09 at 19:10 +0100, Jon Harrop wrote:
> > Parallelism is easy in F#.
>
> Now, that's a cliffhanger. Could you elaborate?

Sure. Review the concerns cited regarding parallel programming in OCaml:

1. When do we fork? Earlier to amortize the overhead or later to enable 
sharing of data structures?

2. How do we avoid excessive copying? What if each parallel thread only 
requires part of a data structure, should we dissect the data structure to 
alleviate message passing?

3. How do we pass values like heap allocated custom blocks?

4. Do we use the Ancient module and manage our memory manually so that we can 
mutate shared data?

These all make parallel programming much harder in OCaml than it needs to be. 
All of these concerns simply disappear when you have a concurrent GC. F# 
already does. Hopefully OCaml will soon.

Addressing each point in turn:

1. Use the high-performance thread pool provided by the Task Parallel Library.

2. There is no copying.

3. You can pass any value to another thread because everything is uniformly 
represented.

4. Memory management is fully automated and shared mutable data is easy.

Consider the embarassingly-parallel task of matrix-matrix multiplication. Note 
that this is absolutely best-case for OCaml because the overheads are as 
small as possible thanks to the complete absence of fine-grained parallelism. 
Most real applications require much more fine-grained parallelism and OCaml's 
performance is much worse as a consequence.

In F#, this is really easy to write:

    let c = Array2.zero_create am bn
    Parallel.For(0, n, fun i ->
      for j = 0 to n - 1 do
        let mutable r = 0.0
        for k = 0 to n - 1 do
          r <- r + a.[i,k] * b.[k,j]
        c.[i,j] <- r)

There are several things to notice about this:

. The input matrices are immediately accessible from parallel threads without 
us having to do anything. There is no copying, so memory consumption is lower 
and there is no overhead to worry about.

. The output matrix is filled directly using mutation. Again, there is no 
copying, no subsequent single-threaded collation of results and no overhead.

. No manual memory management is required.

There is no easy solution in OCaml but you might:

. Fork a process for each CPU at the start of the multiply in order to avoid 
copying the input matrices, use message passing to return the result and 
collate it serially.

. Prefork a process for each CPU and use message passing to queue work items, 
pass the input matrices using message passing, compute results, pass results 
back using message passing and collate them serially.

So we might have an O(1) hit from forking, an O(n^2) hit from message passing 
and collation and an O(n^3) actual algorithm. Like Ulf said, message passing 
scales well. However, its raw performance is awful compared to leveraging 
shared memory.

Here is an OCaml implementation of the above F#:

  let invoke (f : 'a -> 'b) x : unit -> 'b =
    let input, output = Unix.pipe() in
    match Unix.fork() with
    | -1 -> (let v = f x in fun () -> v)
    | 0 ->
	Unix.close input;
	let output = Unix.out_channel_of_descr output in
	Marshal.to_channel output (try `Res(f x) with e -> `Exn e) [];
	close_out output;
	exit 0
    | pid ->
	Unix.close output;
	let input = Unix.in_channel_of_descr input in
	fun () ->
	  let v = Marshal.from_channel input in
	  ignore (Unix.waitpid [] pid);
	  close_in input;
	  match v with
	  | `Res x -> x
	  | `Exn e -> raise e
  
  let parmul_aux i0 i1 n a b =
    let c = Array.make_matrix (i1 - i0) n 0. in
    for i = i0 to i1 - 1 do
      let ai = a.(i) in
      for j = 0 to n - 1 do
	let r = ref 0.0 in
	for k = 0 to n - 1 do
	  r := !r +. Array.unsafe_get ai k *. Array.unsafe_get (Array.unsafe_get b k) 
j
	done;
	c.(i - i0).(j) <- !r
      done;
    done;
    c
  
  let parmul n a b =
    let c1 = invoke (fun () -> parmul_aux 0 (n/2) n a b) () in
    let c2 = parmul_aux (n/2) n n a b in
    Array.concat [c1(); c2]

Note that we must manually insert unsafe array accesses and hoist loop 
invariants in order to obtain comparable performance. This code is clearly 
much more complicated than the F# and that complexity is completely 
unnecessary.

> > > I think that the cost of copying data is totally overrated. We are
> > > doing this often, and even over the network, and hey, we are breaking
> > > every speed limit.
> >
> > You cannot afford to pay that price for parallel implementations of most
> > numerical algorithms.
>
> Er... Not being a specialist, I may be wrong, but I seem to remember
> that you can afford that, as long as you're also doing something else
> during that copy.

Here are some actual performance results for this task on my machine:

 n        OCaml              F#
      Serial Parallel   Serial Parallel
 16  0.00005 0.00022   0.00002 0.000045
 32  0.00040 0.00067   0.00015 0.00015
 64  0.0033  0.0021    0.00090 0.00068
512  4.0     2.4       2.5     1.4

As you can see, F# is much faster in every case. Larger "n" approaches the 
limit where F# is only 1.6x faster than OCaml. For smaller "n" the gap 
quickly widens with F# being 5x faster for n=16.

> > On the contrary, that is not a theoretical statement at all: it
> > already happened. F# already makes it much easier to write high
> > performance parallel algorithms and its concurrent GC is the crux of that
> > capability. 
>
> Examples ? Pretty please ?

The above examples are a good start. The latest F#.NET Journal article covers 
parallelism using Microsoft's TPL. A future OCaml Journal article will cover 
the use of fork-based parallelism. I think you can see what I mean from the 
results I've presented in this post though.

To put this another way, multicore computing provides hardware accelerated 
shared memory and a concurrent GC makes it easy to exploit that. Message 
passing is fine for concurrent applications that are not CPU bound or for 
distributed computing but it is not competitive on today's multicore machines 
and will not become competitive in the next decade.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e


  parent reply	other threads:[~2008-05-10 20:04 UTC|newest]

Thread overview: 89+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-09  0:39 Why OCaml sucks Jon Harrop
2008-05-09  1:11 ` [Caml-list] " Matthew William Cox
2008-05-09  5:10   ` [Caml-list] Re: Why OCaml **cks Jon Harrop
2008-05-09  4:45 ` [Caml-list] Re: Why OCaml sucks Arthur Chan
2008-05-09  5:09   ` Jon Harrop
2008-05-09 11:12     ` [Caml-list] Re: Why OCaml rocks Gerd Stolpmann
2008-05-09 11:58       ` Gabriel Kerneis
2008-05-09 12:10         ` Concurrency [was Re: [Caml-list] Re: Why OCaml rocks] Robert Fischer
2008-05-09 12:41         ` [Caml-list] Re: Why OCaml rocks Gerd Stolpmann
2008-05-09 12:49         ` David Teller
2008-05-09 18:10       ` Jon Harrop
2008-05-09 20:40         ` Gerd Stolpmann
2008-05-09 20:55           ` Berke Durak
2008-05-10 10:56             ` Gerd Stolpmann
2008-05-09 21:00           ` Till Varoquaux
2008-05-09 21:13             ` Berke Durak
2008-05-09 22:26               ` Richard Jones
2008-05-09 23:01                 ` Berke Durak
2008-05-10  7:52                   ` Richard Jones
2008-05-10  8:24                     ` Berke Durak
2008-05-10  8:51                       ` Richard Jones
2008-05-13  3:47           ` Jon Harrop
2008-05-09 22:25         ` David Teller
2008-05-09 22:57           ` Vincent Hanquez
2008-05-10 19:59           ` Jon Harrop [this message]
2008-05-10 21:39             ` Charles Forsyth
2008-05-11  3:58               ` Jon Harrop
2008-05-11  9:41                 ` Charles Forsyth
2008-05-12 13:22             ` Richard Jones
2008-05-12 18:07               ` Jon Harrop
2008-05-12 20:05                 ` Arthur Chan
2008-05-13  0:42               ` Gerd Stolpmann
2008-05-13  1:19                 ` Jon Harrop
2008-05-13  2:03                   ` Gerd Stolpmann
2008-05-13  3:13                     ` Jon Harrop
2008-05-12 20:33             ` Arthur Chan
2008-05-12 21:22               ` Till Varoquaux
2008-05-09 13:00     ` [Caml-list] Re: Why OCaml sucks Ulf Wiger (TN/EAB)
2008-05-09 17:46       ` Jon Harrop
2008-05-09 18:17         ` Ulf Wiger (TN/EAB)
2008-05-10  1:29           ` Jon Harrop
2008-05-10 14:51             ` [Caml-list] Re: Why OCaml **cks Ulf Wiger (TN/EAB)
2008-05-10 18:19               ` Jon Harrop
2008-05-10 21:58                 ` Ulf Wiger (TN/EAB)
2008-05-10 18:39               ` Mike Lin
2008-05-12 13:31           ` [Caml-list] Re: Why OCaml sucks Kuba Ober
2008-05-12 18:18             ` Jon Harrop
2008-05-12 13:13   ` Kuba Ober
2008-05-12 19:32     ` Arthur Chan
2008-05-09  6:31 ` Tom Primožič
2008-05-09  6:46 ` Elliott Oti
2008-05-09  7:53   ` Till Varoquaux
2008-05-09  7:45 ` Richard Jones
2008-05-09  8:10   ` Jon Harrop
2008-05-09  9:31     ` Richard Jones
2008-05-09  7:58 ` [Caml-list] Re: Why OCaml rocks David Teller
2008-05-09 10:29   ` Jon Harrop
2008-05-09 13:08     ` David Teller
2008-05-09 15:38     ` Jeff Polakow
2008-05-09 18:09       ` Jon Harrop
2008-05-09 20:36         ` Berke Durak
2008-05-09 22:34         ` Richard Jones
2008-05-14 13:44           ` Kuba Ober
2008-05-09  8:29 ` constructive criticism about Ocaml Ulf Wiger (TN/EAB)
2008-05-09  9:45 ` [Caml-list] Re: Why OCaml sucks Vincent Hanquez
2008-05-09 10:23   ` [Caml-list] Re: Why OCaml **cks Jon Harrop
2008-05-09 22:01     ` Vincent Hanquez
2008-05-09 22:23       ` David Teller
2008-05-10  8:36       ` Christophe TROESTLER
2008-05-10  9:18         ` Vincent Hanquez
2008-05-09 11:37   ` [Caml-list] Re: Why OCaml sucks Ralph Douglass
2008-05-09 13:02     ` [Caml-list] Re: Why OCaml rocks David Teller
2008-05-09 12:33 ` not all functional languages lack parallelism Ulf Wiger (TN/EAB)
2008-05-09 18:10   ` Jon Harrop
2008-05-09 20:26     ` Ulf Wiger (TN/EAB)
2008-05-12 12:54 ` [Caml-list] Re: Why OCaml sucks Kuba Ober
2008-05-12 14:16   ` Jon Harrop
2008-05-13 13:33     ` Kuba Ober
2008-05-13 13:49       ` Robert Fischer
2008-05-13 14:01         ` Brian Hurt
2008-05-13 14:13           ` Robert Fischer
2008-05-13 15:18             ` Berke Durak
2008-05-14  4:40             ` Kuba Ober
2008-05-13 14:25           ` Gerd Stolpmann
2008-05-14  4:29           ` Kuba Ober
2008-05-12 13:01 ` Kuba Ober
2008-05-12 19:18   ` Arthur Chan
2008-05-12 19:41     ` Karl Zilles
2008-05-13 13:17     ` Kuba Ober

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200805102059.54856.jon@ffconsultancy.com \
    --to=jon@ffconsultancy.com \
    --cc=David.Teller@univ-orleans.fr \
    --cc=caml-list@yquem.inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox