From mboxrd@z Thu Jan 1 00:00:00 1970
From: Gerd Stolpmann
To: Alexy Khrabrov
Cc: caml-list
In-Reply-To: <3CE6E368-B103-472C-B622-672616E7CAB8@gmail.com>
References: <3CE6E368-B103-472C-B622-672616E7CAB8@gmail.com>
Content-Type: text/plain; charset="UTF-8"
Date: Wed, 30 Mar 2011 02:35:14 +0200
Message-ID: <1301445314.1754.139.camel@gps-desktop>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: Re: [Caml-list] walking a graph in parallel

On Tuesday, 29 March 2011, at 18:56 -0400, Alexy Khrabrov wrote:
> I have a giant graph of Twitter data which takes several gigabytes in RAM, as a Hashtbl. I need to walk it, collecting various statistics, and building equally huge data structures under each node.
> Currently I do it all in a single OCaml program, which uses up to 60 GB of RAM and works fine. However, out of the 8 powerful CPUs the box has, only 1 is used.
>
> Having seen Joel's tasty bites of ZeroMQ and Thrift and Piqi, I'm thinking of exploring 0MQ as a parallel MPI/Erlang-like way to walk the graph. I'd move the graph into a server, and walkers would be separate processes. I only need inter-process communication, IPC, for the box. I could do threads and inter-thread in 0MQ if OCaml would allow real parallel threads.
>
> How would you manage 7 identical worker processes and 1 server, so that in the end, the results of the workers are all reduced together? What's the best way to set up the server? Some ideas:

Ocamlnet contains a manager for worker processes called Netplex. Here is an example of how to parallelize a matrix multiplication:

http://blog.camlcity.org/blog/parallelmm.html

In that example, communication between the processes is done via SunRPC (well, I'm not a fan of hyped new protocols like Thrift).

> -- hold the graph in MongoDB, it allows for parallel queries
> -- keep the graph in an OCaml process, it allows for custom queries; but will 0MQ try to fork and copy it when replying to several workers? Copying is impossible, too big

If the graph is highly interconnected, you can practically only store it in RAM (that's what all social network sites do). For read-only access, I suggest you simply allocate a big block of shared memory and move the graph into it. Ocamlnet also contains helpers for this:

http://projects.camlcity.org/projects/dl/ocamlnet-3.2.1/doc/html-main/Netsys_mem.html

Look for init_value. Shared memory can be allocated using shm_open, in

http://projects.camlcity.org/projects/dl/ocamlnet-3.2.1/doc/html-main/Netsys_posix.html

I'm also working on extending Ocamlnet with more shared memory functionality, but so far this is only partially available (Netmulticore library).
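To make the worker-and-reduce pattern discussed above concrete, here is a minimal sketch. It does not use Netplex, SunRPC, or Netsys_mem; as a simpler stand-in it relies only on the standard Unix and Marshal modules, with fork giving each worker a copy-on-write view of the graph and a pipe carrying each partial result back. The toy graph, the out-degree statistic, and the names build_graph, walk_partition, spawn, and parallel_walk are all hypothetical and chosen purely for illustration.

```ocaml
(* Sketch only: fork workers over a read-only graph, then reduce.
   Needs the unix library, e.g.
   ocamlfind ocamlopt -package unix -linkpkg walk.ml *)

(* Hypothetical toy graph: node id -> list of successor ids. *)
let build_graph n =
  let g = Hashtbl.create n in
  for i = 0 to n - 1 do
    Hashtbl.replace g i [ (i + 1) mod n; (i * 7) mod n ]
  done;
  g

(* Work done by one worker: sum the out-degrees of the nodes
   assigned to it (node id modulo number of workers = rank). *)
let walk_partition g ~workers ~rank =
  Hashtbl.fold
    (fun node succs acc ->
      if node mod workers = rank then acc + List.length succs else acc)
    g 0

(* Fork one worker. The child computes its partial result, marshals
   it into the pipe and exits; the parent keeps the read end. *)
let spawn g ~workers ~rank =
  let rd, wr = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
      Unix.close rd;
      let oc = Unix.out_channel_of_descr wr in
      Marshal.to_channel oc (walk_partition g ~workers ~rank : int) [];
      close_out oc;
      exit 0
  | pid ->
      Unix.close wr;
      (pid, Unix.in_channel_of_descr rd)

(* Start all workers first, then read and add up their results. *)
let parallel_walk g ~workers =
  let children = List.init workers (fun rank -> spawn g ~workers ~rank) in
  List.fold_left
    (fun total (pid, ic) ->
      let (partial : int) = Marshal.from_channel ic in
      close_in ic;
      ignore (Unix.waitpid [] pid);
      total + partial)
    0 children

let () =
  let g = build_graph 1000 in
  (* %! flushes stdout so later forks do not inherit buffered output. *)
  Printf.printf "total out-degree: %d\n%!" (parallel_walk g ~workers:7)
```

With 1000 nodes of two successors each, this prints "total out-degree: 2000". One caveat on the design: the graph here still lives in the ordinary OCaml heap, so the garbage collector in each child may write to heap pages and force the kernel to copy them. For a graph of tens of gigabytes that is a reason to prefer placing it in shared memory outside the heap, as suggested above.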
> Or, is it possible to use a huge chunk of shared memory, to place the read-only graph there and query it somehow separately from each worker, then use 0MQ for the reduce communication phase?

If you need help with this, I can also offer commercial support for Ocamlnet. (Provided your company is not a search engine, or a social network, for contractual reasons.)

Gerd

>
> -- Alexy
>