* [Caml-list] Parallel n-queens solver @ 2011-04-24 23:32 Jon Harrop 2011-04-25 10:36 ` Gerd Stolpmann 0 siblings, 1 reply; 5+ messages in thread From: Jon Harrop @ 2011-04-24 23:32 UTC (permalink / raw) To: Caml List Gerd recently posted this article about parallelizing an n-queens solver in OCaml: http://blog.camlcity.org/blog/multicore3.html The idea is to reuse the same basic definitions and parallelize the program simply by changing the "run" function. I have tried to translate this OCaml to F# without benefit of being able to run the original OCaml code but I have checked the results against the known solutions to this problem. Here is Gerd's sequential OCaml: module Sequential = struct let run n = let t0 = Unix.gettimeofday() in let ht = Hashtbl.create 91 in for k = 0 to n-1 do solve k n (fun b -> if not (Hashtbl.mem ht b) then ( let b = Array.copy b in List.iter (fun b' -> Hashtbl.add ht b' () ) (transformations b); print b ) ) done; let t1 = Unix.gettimeofday() in printf "Number solutions: %n\n%!" (Hashtbl.length ht / 8); printf "Time: %.3f\n%!" (t1-.t0) end My equivalent F#: module Sequential = let run n = let timer = System.Diagnostics.Stopwatch.StartNew() let solve k = let sols = ResizeArray() solve k n (transformations >> Seq.min >> sols.Add) sols.ToArray() let sols = Array.init n solve |> Seq.concat |> Seq.distinct |> Seq.length printfn "Number solutions: %n" sols printfn "Time: %.3f" timer.Elapsed.TotalSeconds Gerd's sequential OCaml run on his 4-core Opteron and my sequential F# run on a 4-core W3520 have very similar performance characteristics. Now for parallelization. Gerd's fastest parallel OCaml solution (aka MP2) is 287 lines long. In contrast, the F# can be parallelized by adding just 12 characters to the source code of the sequential implementation: module Parallel = let run n = let timer = System.Diagnostics.Stopwatch.StartNew() let solve k = let sols = ResizeArray() solve k n (transformations >> Seq.min >> sols.Add) sols.ToArray() let sols = Array.Parallel.init n solve |> PSeq.concat |> PSeq.distinct |> PSeq.length printfn "Number solutions: %n" sols printfn "Time: %.3f" timer.Elapsed.TotalSeconds I just replaced Array.init with Array.Parallel.init and Seq with PSeq. Moreover, this trivially parallelized F# implementation also achieves performance on this machine comparable to Gerd's parallel OCaml on his machine. In particular, the absolute performance results for my parallel F# are better in every single case. However, it is not clear how scalable these parallel solutions are. On an 8-core E5405 Xeon I get only 5x speedup compared to 3.8x speedup on 4 cores. There can be little doubt that this solution is vastly more readable and maintainable though. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Caml-list] Parallel n-queens solver 2011-04-24 23:32 [Caml-list] Parallel n-queens solver Jon Harrop @ 2011-04-25 10:36 ` Gerd Stolpmann 2011-04-25 16:36 ` Frédéric Gava 0 siblings, 1 reply; 5+ messages in thread From: Gerd Stolpmann @ 2011-04-25 10:36 UTC (permalink / raw) To: Jon Harrop; +Cc: Caml List Am Montag, den 25.04.2011, 00:32 +0100 schrieb Jon Harrop: > Gerd recently posted this article about parallelizing an n-queens solver in > OCaml: > > http://blog.camlcity.org/blog/multicore3.html > > The idea is to reuse the same basic definitions and parallelize the program > simply by changing the "run" function. I have tried to translate this OCaml > to F# without benefit of being able to run the original OCaml code but I > have checked the results against the known solutions to this problem. > > Here is Gerd's sequential OCaml: > > module Sequential = struct > let run n = > let t0 = Unix.gettimeofday() in > let ht = Hashtbl.create 91 in > for k = 0 to n-1 do > solve k n > (fun b -> > if not (Hashtbl.mem ht b) then ( > let b = Array.copy b in > List.iter > (fun b' -> > Hashtbl.add ht b' () > ) > (transformations b); > print b > ) > ) > done; > let t1 = Unix.gettimeofday() in > printf "Number solutions: %n\n%!" (Hashtbl.length ht / 8); > printf "Time: %.3f\n%!" (t1-.t0) > end > > My equivalent F#: > > module Sequential = > let run n = > let timer = System.Diagnostics.Stopwatch.StartNew() > let solve k = > let sols = ResizeArray() > solve k n (transformations >> Seq.min >> sols.Add) > sols.ToArray() > let sols = > Array.init n solve > |> Seq.concat > |> Seq.distinct > |> Seq.length > printfn "Number solutions: %n" sols > printfn "Time: %.3f" timer.Elapsed.TotalSeconds > > Gerd's sequential OCaml run on his 4-core Opteron and my sequential F# run > on a 4-core W3520 have very similar performance characteristics. > > Now for parallelization. Gerd's fastest parallel OCaml solution (aka MP2) is > 287 lines long. In contrast, the F# can be parallelized by adding just 12 > characters to the source code of the sequential implementation: > > module Parallel = > let run n = > let timer = System.Diagnostics.Stopwatch.StartNew() > let solve k = > let sols = ResizeArray() > solve k n (transformations >> Seq.min >> sols.Add) > sols.ToArray() > let sols = > Array.Parallel.init n solve > |> PSeq.concat > |> PSeq.distinct > |> PSeq.length > printfn "Number solutions: %n" sols > printfn "Time: %.3f" timer.Elapsed.TotalSeconds > > I just replaced Array.init with Array.Parallel.init and Seq with PSeq. > > Moreover, this trivially parallelized F# implementation also achieves > performance on this machine comparable to Gerd's parallel OCaml on his > machine. In particular, the absolute performance results for my parallel F# > are better in every single case. However, it is not clear how scalable these > parallel solutions are. On an 8-core E5405 Xeon I get only 5x speedup > compared to 3.8x speedup on 4 cores. > > There can be little doubt that this solution is vastly more readable and > maintainable though. Well admitted. Netmulticore is an add-on to an existing compiler and runtime, which explains a lot of the additional verbosity. Also, it is right now focusing on the core mechanisms for parallelization, so there is nothing like a parallel array initializer. Actually, I'm quite happy that a quite complex approach like MP2 (using 3 processes communicating with each other, and a bit over-engineered) can be worked out in only 287 lines. Without having checked in detail, I'm quite sure a number of parallel design patterns can be supported by higher-level constructs. Because of the "fork-style" process creation, not everything is well-suited for this, e.g. a parallel Array.init makes only sense when the array lives in shared memory (which exists as Netmcore_array). In some sense, the challenge is how well three ideas can be brought together: - fork-style multi-processing - value heaps in shared memory that support mutation - the right mix of functional and imperative programming that is best for the problem to solve Generally, I think FP data structures are easier to support in this context, e.g. with parallel list or stream operations (imagine parallelized map and filter operations). Let's see how this evolves - Netmulticore is at its beginning, and there is still a lot of room for improvement. Gerd > > -- > Dr Jon Harrop, Flying Frog Consultancy Ltd. > http://www.ffconsultancy.com > > -- ------------------------------------------------------------ Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de Phone: +49-6151-153855 Fax: +49-6151-997714 ------------------------------------------------------------ ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Caml-list] Parallel n-queens solver 2011-04-25 10:36 ` Gerd Stolpmann @ 2011-04-25 16:36 ` Frédéric Gava 2011-04-25 19:30 ` Jon Harrop 0 siblings, 1 reply; 5+ messages in thread From: Frédéric Gava @ 2011-04-25 16:36 UTC (permalink / raw) To: caml-list Dear all, > Without having checked in detail, I'm quite sure a number of parallel > design patterns can be supported by higher-level constructs. It is the well known "skeleton paradigm" and BSP model (twice, since 1990) Skeletons: http://en.wikipedia.org/wiki/Algorithmic_skeleton http://homepages.inf.ed.ac.uk/mic/Skeletons/index.html BSP: http://en.wikipedia.org/wiki/Bulk_Synchronous_Parallel http://www.bsp-worldwide.org/ Frédéric Gava ^ permalink raw reply [flat|nested] 5+ messages in thread
* RE: [Caml-list] Parallel n-queens solver 2011-04-25 16:36 ` Frédéric Gava @ 2011-04-25 19:30 ` Jon Harrop 2011-04-25 23:33 ` Eray Ozkural 0 siblings, 1 reply; 5+ messages in thread From: Jon Harrop @ 2011-04-25 19:30 UTC (permalink / raw) To: frederic.gava, caml-list Frédéric wrote: > It is the well known "skeleton paradigm" and BSP model (twice, since 1990) > > Skeletons: > http://en.wikipedia.org/wiki/Algorithmic_skeleton > http://homepages.inf.ed.ac.uk/mic/Skeletons/index.html > > BSP: > http://en.wikipedia.org/wiki/Bulk_Synchronous_Parallel > http://www.bsp-worldwide.org/ Very true. Subsequent fiddling turned up the following F# solution that uses a sequential pair of calls to Parallel.init (i.e. direct bulk synchronous parallelism) instead of the data flow parallelism: let run n = let timer = System.Diagnostics.Stopwatch.StartNew() let solve k = let sols = Array.init (n*n) (fun _ -> ResizeArray()) solve k n (fun b -> let b = transformations b sols.[b.[1]*n + b.[0]].Add b) sols |> Array.map (fun sols -> sols.ToArray()) let sols = let sols = Array.Parallel.init n solve Array.Parallel.init (n*n) (fun i -> let s = System.Collections.Generic.HashSet(HashIdentity.Structural) for sols in sols do let sols = sols.[i] for i=0 to sols.Length-1 do s.Add sols.[i] |> ignore s.Count) |> Array.sum printfn "%d solutions %d-queens problem in %gs" sols n timer.Elapsed.TotalSeconds That old "invoke" combinator that evaluates a function application on a forked process and marshals the result back could be used to implement a naïve Parallel.init and, therefore, this solution could be translated to OCaml quite easily. However, it would still lack load balancing. That turns out to be ok in this particular case because the generated data is quite evenly distributed (and we assume that nothing else is running on the machine) but real applications might not be so lucky... Turns out you can also use data flow parallelism entirely by replacing Array.Parallel.init with PSeq.init. In theory, this allows the second phase to occur concurrently with the first phase, being updated as new results are generated. In practice, I don't see a speedup. Cheers, Jon. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Caml-list] Parallel n-queens solver 2011-04-25 19:30 ` Jon Harrop @ 2011-04-25 23:33 ` Eray Ozkural 0 siblings, 0 replies; 5+ messages in thread From: Eray Ozkural @ 2011-04-25 23:33 UTC (permalink / raw) To: Jon Harrop; +Cc: frederic.gava, caml-list [-- Attachment #1: Type: text/plain, Size: 1033 bytes --] On Mon, Apr 25, 2011 at 10:30 PM, Jon Harrop <jon@ffconsultancy.com> wrote: > Frédéric wrote: > That old "invoke" combinator that evaluates a function application on a > forked process and marshals the result back could be used to implement a > naïve Parallel.init and, therefore, this solution could be translated to > OCaml quite easily. However, it would still lack load balancing. That turns > out to be ok in this particular case because the generated data is quite > evenly distributed (and we assume that nothing else is running on the > machine) but real applications might not be so lucky... > > Turns out you can also use data flow parallelism entirely by r > > Compare implementations of the adaptive quadrature algorithm. That doesn't balance so easily :) http://en.wikipedia.org/wiki/Adaptive_quadrature Cheers, -- Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara http://groups.yahoo.com/group/ai-philosophy http://myspace.com/arizanesil http://myspace.com/malfunct [-- Attachment #2: Type: text/html, Size: 1557 bytes --] ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2011-04-25 23:33 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2011-04-24 23:32 [Caml-list] Parallel n-queens solver Jon Harrop 2011-04-25 10:36 ` Gerd Stolpmann 2011-04-25 16:36 ` Frédéric Gava 2011-04-25 19:30 ` Jon Harrop 2011-04-25 23:33 ` Eray Ozkural
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox