From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id p4C1J76S003330 for ; Thu, 12 May 2011 03:19:07 +0200 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AuIGALw0y03RVdI2kGdsb2JhbAA/lz6NawgUAQEBAQkJDQcUBCGqI4JCjBmCMoUzN4hfAQEDBoYLBIZEiS+FOIU/O4NS X-IronPort-AV: E=Sophos;i="4.64,355,1301868000"; d="scan'208";a="108257862" Received: from mail-pz0-f54.google.com ([209.85.210.54]) by mail1-smtp-roc.national.inria.fr with ESMTP/TLS/RC4-SHA; 12 May 2011 03:19:01 +0200 Received: by pzk27 with SMTP id 27so766567pzk.27 for ; Wed, 11 May 2011 18:19:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:date:from:to:subject:message-id:mime-version :content-type:content-disposition:user-agent; bh=ig3FjqQo/Z8GuZUj49bptq9luQ+wsm+f/5cfDpfgHRY=; b=pvbakijuK01LWqRJ9h+Jej76N+LgnZg5YGJqsf/0TkHI2qr/pW1lWQXrUir77Ho9Q2 a/t8sXI1vIrlImQi37sxsW/f98DH/cwOwt5LGHbPCvtRXMeOrzLzs0Nfyc8arCfX7Ocp gDiyNn/WfJahaEbWfg4mYdZ9hlOSSWdwYQK0w= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=date:from:to:subject:message-id:mime-version:content-type :content-disposition:user-agent; b=X55JGJBk7W88ysejn8Hki5haur8vvRpdAQ6an10QH3OBnmHcz2UAoRafm4GGA9E5fh 4/5YsL7VGvnpo202FMm2gwu9dvvNiWzLirfNeLRzI5UUQlmNDNyn7o4bo/N9SEpa1ug2 tGUJ6VnnkECwgWJElbci00/G65FoW4Nmek5R0= Received: by 10.68.40.98 with SMTP id w2mr5531494pbk.144.1305163139663; Wed, 11 May 2011 18:18:59 -0700 (PDT) Received: from damage.nokiapaloalto.com ([67.98.70.230]) by mx.google.com with ESMTPS id a1sm319979pbo.91.2011.05.11.18.18.57 (version=TLSv1/SSLv3 cipher=OTHER); Wed, 11 May 2011 18:18:58 -0700 (PDT) Date: Wed, 11 May 2011 11:16:05 -0700 From: Prashanth Mundkur To: caml-list@inria.fr Message-ID: <20110511181605.GA26425@damage.nokiapaloalto.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) Subject: [Caml-list] [ANNOUNCE] ODisco, for large-scale data processing in OCaml Hello, The Disco team is pleased to announce the possibility of doing large-scale data analysis (ala map-reduce) in OCaml. Disco [1] is an open-source distributed computing framework inspired by the map-reduce paradigm. It includes a distributed replicating tag-based filesystem that allows you to store your datasets in a fault-tolerant manner. Disco comes with additional tools: DiscoDB [2] for implementing efficient mapping objects and Discodex [3] for distributed indices for querying large datasets. Disco has been in production use at Nokia for two years, and is used to process terabytes of data daily [4]. The core job scheduling, cluster monitoring and filesystem logic of Disco is written in Erlang, leveraging the strengths of Erlang in concurrency and distribution. The primary language for writing compute jobs is currently Python; however, the latest Disco 0.4 release [5] has opened up the Disco worker interface, allowing jobs written to be written in any language. ODisco is the first available non-Python implementation of this Disco worker interface, and allows distributed processing of large-scale datasets in OCaml. The computation is not restricted to a record-oriented key-value style interface; the OCaml task directly gets access to the input data source and writes the output data in whatever format it chooses. The overall computation however currently still follows the traditional map-reduce dataflow, with map/shuffle/reduce stages. ODisco is available at https://github.com/pmundkur/odisco and also in the 3.12 section of Godi as the godi-odisco package. Please let us know if you have any issues with either ODisco or Disco on the Disco mailing list. Happy hacking! [1] Disco Project, http://discoproject.org [2] DiscoDB, http://discoproject.org/doc/contrib/discodb/discodb.html [3] Discodex, http://discoproject.org/doc/contrib/discodex/discodex.html#discodex [4] Disco at Nokia, http://www.erlang-factory.com/conference/SFBay2011/speakers/VilleTuulos [5] Disco 0.4 release, http://disco.posterous.com/disco-04 --prashanth