From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id A2158BB9C for ; Fri, 9 Dec 2005 20:36:00 +0100 (CET) Received: from cs.columbia.edu (cs.columbia.edu [128.59.16.20]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id jB9JZwoN017639 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Fri, 9 Dec 2005 20:36:00 +0100 Received: from lion.cs.columbia.edu (IDENT:TeKZlg2cr/6fFiXSXxtiX9sAlbX2CfZo@lion.cs.columbia.edu [128.59.16.120]) by cs.columbia.edu (8.12.10/8.12.10) with ESMTP id jB9JZgiP017817 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NOT) for ; Fri, 9 Dec 2005 14:35:48 -0500 (EST) Received: from [192.168.2.8] (pool-70-107-156-125.ny325.east.verizon.net [70.107.156.125]) (authenticated bits=0) by lion.cs.columbia.edu (8.12.9/8.12.9) with ESMTP id jB9JZgnE011736 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NOT) for ; Fri, 9 Dec 2005 14:35:42 -0500 Message-ID: <4399DC7E.6000509@cs.columbia.edu> Date: Fri, 09 Dec 2005 14:35:26 -0500 From: Christopher Conway User-Agent: Mozilla Thunderbird 1.0.2 (X11/20050317) X-Accept-Language: en-us, en MIME-Version: 1.0 To: caml-list@yquem.inria.fr Subject: Unix.system returns "no child processes" Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-PerlMx-Spam: Gauge=XXII, Probability=22%, Report='RELAY_IN_NJABL_DYNABLOCK 2.5, __CT 0, __CTE 0, __CT_TEXT_PLAIN 0, __HAS_MSGID 0, __MIME_TEXT_ONLY 0, __MIME_VERSION 0, __RELAY_IN_NJABL_DYNABLOCK 0, __SANE_MSGID 0, __USER_AGENT 0' X-Miltered: at concorde with ID 4399DC9E.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; ocaml:01 3.09.0:01 ocaml:01 waitpid:01 terminating:01 execv:01 waitpid:01 execv:01 terminating:01 endline:01 endline:01 harmless:98 unix:01 unix:01 invoke:01 X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.3 My employer recently upgraded our development machines from relatively old Pentiums to AMD Opterons (x86_64, easily an order of magnitude faster). This naturally required a re-compile of my Ocaml installation (3.09.0) and the supporting libraries for my application. I am using Unix.system to invoke external commands from within Ocaml. On the old machines (with the 32-bit version of Ocaml), I would occasionally get the exception Unix_error(ECHILD,"waitpid","") from Unix.system. With the new machines, I'm seeing this at every call to Unix.system, every time. I have investigate the behavior of the sub-processes, and they are terminating normally, with no indication of any error. I have a theory why this is happening, but no supporting evidence. It goes like this: system is basically fork+execv+waitpid; on the new machines, fork+execv is terminating so quickly that the process is gone before the call to waitpid. (NOTE: The external commands are relatively small, and take on the order of 0.05s of user time.) There's one rather large piece of evidence cutting against this theory: I can't reproduce this behavior using the simple test case below, even using precisly the same command arguments that get used in my full app. I don't know how anything else going on in my code could interfere with the output of Unix.system. Maybe the context-switching time with a bigger executable makes a difference? This isn't the biggest deal in the world, but I'd like to know for sure that this behavior is harmless before I write code that ignores the exception. If my theory is correct, it would be nice to see this fixed (since this behavior will become more and more common with time). Thanks, Chris system.ml: let n = ref "echo 'no command'" in Arg.parse [] (fun s -> n := s) "Usage: system " ; try ignore (Unix.system !n); (* returns normally *) print_endline "Success" with | Unix.Unix_error(err,fn,arg) -> print_endline ("Unix_error: \"" ^ (Unix.error_message err) ^ "\" in " ^ fn ^ "(" ^ arg ^ ")")