From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id p0RJAn1m015682 for ; Thu, 27 Jan 2011 20:10:49 +0100 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AlYBAE9SQU3RVaG2kGdsb2JhbACWMIY2AYgOCBUBAQEBCQkMBxEEIKIOiX+CF4USLohZAQEDBYVKBIwn X-IronPort-AV: E=Sophos;i="4.60,387,1291590000"; d="scan'208";a="86514102" Received: from mail-gx0-f182.google.com ([209.85.161.182]) by mail4-smtp-sop.national.inria.fr with ESMTP/TLS/RC4-MD5; 27 Jan 2011 20:10:43 +0100 Received: by gxk8 with SMTP id 8so797088gxk.27 for ; Thu, 27 Jan 2011 11:10:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:date:message-id:subject:from:to :content-type; bh=OPzA07lkE8J2oXkC/0OiFGJgtP88lht4ud6qrVz9+p8=; b=tuzjl6zQGaI9k5YOm8n4mppuDg5bLHQDlPyOqtkIx119+dPFrSL/RxhEdwCjVxO/Fq k8KCfcrvqMR8m4wH+qsws42Qwmt6Gmx34ySJutfoTuu4qATgnUGnAHHTnBD2JQ3urRvc JlNuLWYHxfMqsOYvJ+0ByzxllJgji19AUYbJ4= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:content-type; b=X/BChJSRScoJ2K5HZxVKiCydQr/baqJ8O5ilN66M9kuM6RmKEt752+lDRmk2+TnXRT DUIFm5D9qO4urxp71uvd7oT+S2E/jpAQH0HAvDcqkdjsFohpEIeoja5gLrXx0G2k9Ltm k6qJypWfrAtJJwl7ZtfNJf6g9HiEOJ5kLIBAw= MIME-Version: 1.0 Received: by 10.91.2.20 with SMTP id e20mr3481267agi.146.1296155442023; Thu, 27 Jan 2011 11:10:42 -0800 (PST) Received: by 10.90.89.4 with HTTP; Thu, 27 Jan 2011 11:10:42 -0800 (PST) Date: Thu, 27 Jan 2011 21:10:42 +0200 Message-ID: From: Eray Ozkural To: Caml List Content-Type: multipart/alternative; boundary=0016363b8f40b20efc049ad8b490 Subject: [Caml-list] Using threads for overlapping communication/computation --0016363b8f40b20efc049ad8b490 Content-Type: text/plain; charset=ISO-8859-1 Hi there, Has anybody used OcamlMPI in conjunction with multi-threading to overlap computation and communication? During the writing of a parallel information-retrieval code, I have implemented such a feature spawning a new thread for the communications routines that correspond to a block of computation. While that proceeds, I start processing the next block (and I yield the main thread so that some context switching occurs). I think this would not meet any problems had I used a single blocking call in the spawned threads. However, this is not the case. My program performs a quite non-trivial kind of collective communication: a multi-node accumulation with a complex reduction operator, and another complex collective communication with output-partitioning. Naturally, I have to invoke several MPI calls (at least 2*logp presently). And no matter where I yield, my code slows down compared to the single-threaded version. I am suspecting this is partially due to the single-physical-thread feature of the ocaml runtime (and partially due to stress on the local memory). Do you have any recommendations on handling such scenarios? It occurred to me that I might be missing a known solution. Best Regards, -- Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara http://groups.yahoo.com/group/ai-philosophy --0016363b8f40b20efc049ad8b490 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Hi there,

Has anybody used OcamlMPI in conjunction with = multi-threading to overlap computation and communication?

During the writing of a parallel information-retrieval code, I have= implemented such a feature spawning a new thread for the communications ro= utines that correspond to a block of computation. While that proceeds, I st= art processing the next block (and I yield the main thread so that some con= text switching occurs). I think this would not meet any problems had I used= a single blocking call in the spawned threads. However, this is not the ca= se. My program performs a quite non-trivial kind of collective communicatio= n: a multi-node accumulation with a complex reduction operator, and another= complex collective communication with output-partitioning. Naturally, I ha= ve to invoke several MPI calls (at least 2*logp presently). And no matter w= here I yield, my code slows down compared to the single-threaded version. I= am suspecting this is partially due to the single-physical-thread feature = of the ocaml runtime (and partially due to stress on the local memory). Do = you have any recommendations on handling such scenarios? It occurred to me = that I might be missing a known solution.

Best Regards,

--
Eray Ozkural, PhD candida= te.=A0 Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philos= ophy

--0016363b8f40b20efc049ad8b490--