From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by sympa.inria.fr (Postfix) with ESMTPS id 5CE6E80161 for ; Mon, 19 Jun 2017 23:19:15 +0200 (CEST) X-IronPort-AV: E=Sophos;i="5.39,362,1493676000"; d="scan'208";a="279687441" Received: from 198.67.28.109.rev.sfr.net (HELO [192.168.1.69]) ([109.28.67.198]) by mail2-relais-roc.national.inria.fr with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 19 Jun 2017 23:19:15 +0200 Date: Mon, 19 Jun 2017 23:19:14 +0200 (CEST) From: Julia Lawall X-X-Sender: jll@hadrien To: Josh Berdine cc: caml-list@inria.fr In-Reply-To: Message-ID: References: <88108a18-8044-bc46-bed3-aef241b4ff48@linux-france.org> User-Agent: Alpine 2.20 (DEB 67 2015-01-07) MIME-Version: 1.0 Content-Type: multipart/mixed; BOUNDARY="8323329-784363063-1497907155=:2076" Subject: Re: [Caml-list] segmentation fault --8323329-784363063-1497907155=:2076 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8BIT On Mon, 19 Jun 2017, Josh Berdine wrote: > On Mon, Jun 19 2017, Julia Lawall wrote: > > > On Mon, 19 Jun 2017, David MENTRÉ wrote: > > > >> Hello Julia, > >> > >> Le 2017-06-18 à 21:41, Julia Lawall a écrit : > >> > Over several > >> > runs on two different laptops, the backtraces have nothing obvious in > >> > common. The bytecode version does not seem to stack overflow. Adding > >> > Gc.print_stat() at a periodic quiescent point in the execution did not > >> > show a memory leak. > >> > >> A similar issue related to random crash in native code version was asked > >> by Alexey Egorov on this list 9 days ago. Daniel Bünzli advised to him > >> to frequently call Gc.full_major () to have a crash closer to the real > >> issue. > > > > OK, I will try this. > > Are you using Windows? I have only experienced segfaults from stack > overflow on Windows. No, Linux. > > >> In the case of Alexey, it was a non tail-recursive call that triggered a > >> stack overflow and that is not detected in OCaml native code. Apparently > >> this kind of issue is fixed in next to come OCaml 4.06.0. > > > > Why would something like this not be deterministic? In my case, it can > > happen after seconds or after tens of minutes. > > This sounds like an interaction with the GC to me. This was my thought as well. Also, of the four cores I have, two are in the Gc. But two are not. > >> Of course, your issue might be entirely different. But frequently > >> calling Gc.full_major () seems a sensible starting point to me. > >> > >> Otherwise you have the usual suspects: are you using C bidings? Threads? > > > > There are no threads. The software may use C bindings. I don't think > > they are involved in the failing execution, but I'm not 100% sure. > > I have found running under valgrind to be helpful in this sort of > situation, though it is slow and some valid code can trigger many > warnings. OK, I'll try that. thanks, julia > > Cheers, Josh > --8323329-784363063-1497907155=:2076--