From: Gerd Stolpmann
To: caml-list@inria.fr
Cc: plasma-list@ocaml-programming.de
Date: Wed, 12 Oct 2011 18:19:33 +0200
Message-ID: <1318436373.16477.216.camel@thinkpad>
Subject: [Caml-list] [ANN] Plasma MapReduce, PlasmaFS, version 0.4

Hi,

I've just released Plasma-0.4. Plasma consists of two parts (for now),
namely Plasma MapReduce, a map/reduce compute framework, and PlasmaFS,
the underlying distributed filesystem.

Major changes in version 0.4:

 * Added a security system (including strong authentication and
   authorization). This is quite a big change, and makes PlasmaFS a
   highly secure DFS.
 * Datanodes are now monitored, and failed nodes are automatically
   treated as unavailable. The monitoring system uses multicast
   messaging.
 * The namenode can now profit from multi-processing, removing a
   potential bottleneck.
 * Improved the caching subsystem.
 * Better management of file buffers in map/reduce jobs.

Of course, there are also numerous bug fixes and performance
improvements.

Plasma MapReduce is a distributed implementation of the map/reduce
algorithm scheme. In a sentence, map/reduce performs a parallel List.map
on an input file, sorts and splits the output by some criterion into
partitions, and runs a List.fold_left on each partition. The difference
is that it does not do this sequentially, but in a distributed way, and
chunk by chunk. Because of this, Plasma MapReduce can process very large
files and, if run on enough computers, can do so in reasonable time. Of
course, map and reduce are OCaml functions here.

This all works on top of a distributed filesystem, PlasmaFS. This is a
user-space filesystem that is primarily accessed over RPC (but it is
also mountable as an NFS volume). Actually, most of the effort went
here. PlasmaFS focuses on reliability and on speed for big block sizes.
To achieve this, it implements ACID transactions, replicates data and
metadata with two-phase commit, uses a shared memory data channel where
possible, and monitors itself. Unlike other filesystems for map/reduce,
PlasmaFS implements the complete set of usual file operations, including
random reads and writes. It can also be used as an unspecialized global
filesystem.

Both pieces of software are bundled together in one download. The
project page with further links is

http://projects.camlcity.org/projects/plasma.html

There is now also a homepage at

http://plasma.camlcity.org

This is an early alpha release (0.4).
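
To illustrate the map/reduce scheme described earlier (a List.map over
the input, sorting and splitting into partitions, then a List.fold_left
per partition), here is a minimal sequential, in-memory sketch in plain
OCaml. It does not use the Plasma API and says nothing about how Plasma
distributes the work; the names map_reduce and word_count and all of
their parameters are hypothetical and chosen only for this example.

```ocaml
(* Sequential, in-memory sketch of the map/reduce scheme.
   NOT the Plasma API: every name below is hypothetical. *)

let map_reduce ~map ~partition_of ~compare_keys ~reduce ~init ~partitions
    records =
  (* 1. "map": turn each input record into a list of (key, value) pairs *)
  let mapped = List.concat (List.map map records) in
  (* 2. sort the pairs by key *)
  let sorted =
    List.stable_sort (fun (k1, _) (k2, _) -> compare_keys k1 k2) mapped in
  (* 3. split into partitions; iterating over the reversed list while
        prepending keeps each partition in sorted order *)
  let parts = Array.make partitions [] in
  List.iter
    (fun ((k, _) as kv) ->
       let p = partition_of k mod partitions in
       parts.(p) <- kv :: parts.(p))
    (List.rev sorted);
  (* 4. "reduce": one List.fold_left per partition *)
  Array.to_list (Array.map (List.fold_left reduce init) parts)

(* Example job: word count. Equal words are adjacent after sorting,
   so the reducer only has to look at the head of its accumulator.
   String.split_on_char needs OCaml 4.04 or later. *)
let word_count ~partitions lines =
  let split_words line =
    List.filter (fun w -> w <> "") (String.split_on_char ' ' line) in
  let count acc (word, n) =
    match acc with
    | (w, total) :: rest when w = word -> (w, total + n) :: rest
    | _ -> (word, n) :: acc in
  map_reduce
    ~map:(fun line -> List.map (fun w -> (w, 1)) (split_words line))
    ~partition_of:Hashtbl.hash
    ~compare_keys:compare
    ~reduce:count
    ~init:[]
    ~partitions
    lines
  |> List.map List.rev
```

Calling word_count ~partitions:1 ["a b a"; "b a"] yields a single
partition containing ("a", 3) and ("b", 2). With more partitions, each
word still lands in exactly one partition, because the partition is
derived from the key alone.
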
A lot of things work already, and you can already run distributed
map/reduce jobs. However, it is in no way complete.

Plasma is installable via GODI for OCaml 3.12.

There is now a chart comparing Plasma with Hadoop. In one sentence,
PlasmaFS is based on a superior filesystem design, and now has to prove
that the implementation really works. Plasma map/reduce generalizes the
algorithm scheme compared with Hadoop, but still has some shortcomings
in the implementation:

http://plasma.camlcity.org/plasma/dl/plasma-0.4/doc/html/Plasmafs_and_hdfs.html
http://plasma.camlcity.org/plasma/dl/plasma-0.4/doc/html/Plasmamr_and_hadoop.html

For discussions on specifics of Plasma there is a separate mailing list:

https://godirepo.camlcity.org/mailman/listinfo/plasma-list

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
*** Searching for new projects! Need consulting for system
*** programming in Ocaml? Gerd Stolpmann can help you.
------------------------------------------------------------