On 11/28/2012 03:43 AM, Mike Lin wrote:
> Thanks, Gerd!
>
> FWIW I could not reproduce the crash by using ocaml-ssl's blocking operations directly.
>
> https://gist.github.com/4152047#file_ssl_threads.ml
>
> This works fine, so perhaps something nasty arises from using nonblocking I/O on ssl sockets from multiple threads. I'm not sure if there is any other critical difference with how netclient/equeue-ssl and my example use ocaml-ssl.
>
> I also don't have time to pursue this much further, so I will try to put all of my http operations on one thread as your example suggests.

I was debugging a similar bug, and found this old thread with a testcase.
I figured out a way to fix it; see the patch below. The stub didn't reacquire the OCaml master lock before raising an exception, so OCaml code was executed in parallel with other OCaml code, leading to all sorts of nasty situations.

Debugging thread-related bugs is hard, especially since none of the usual tools help here.
I modified st_posix.h a bit by adding an m->owner field and checking it against pthread_self() to make sure a thread attempts to release only a lock it acquired itself, but there is more that could be done (check in raise_with_arg/etc. that we do hold the master lock, check after returning from each C call that we do hold the lock, same for C callbacks, etc.).

Would it be possible to add checks like this with '-runtime-variant d', i.e. can the thread implementation be changed to a "checking" one in that case?

The patch:

Index: src/equeue-ssl/ssl_exts_stubs.c
===================================================================
--- src/equeue-ssl/ssl_exts_stubs.c	(revision 1913)
+++ src/equeue-ssl/ssl_exts_stubs.c	(working copy)
@@ -27,6 +27,7 @@
     caml_enter_blocking_section();
     ret = SSL_shutdown(ssl);
     if (ret == -1) {
+        caml_leave_blocking_section();
         raise_with_arg(*caml_named_value("ssl_exn_shutdown_error"),
                        Val_int(SSL_get_error(ssl, ret)));
     };
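
For illustration, the m->owner check described above could look roughly like the sketch below. This is not the actual st_posix.h change: the type and function names (checked_lock, checked_lock_acquire, and so on) are hypothetical, and the real master lock is more involved than a bare mutex. It only shows the idea of recording the acquiring thread and refusing a release from any other thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "checking" lock: a mutex plus a record of who holds it.
   Illustrative only; not the layout used in st_posix.h. */
typedef struct {
    pthread_mutex_t lock;
    pthread_t owner;   /* meaningful only while held is true */
    bool held;
} checked_lock;

static int checked_lock_init(checked_lock *m) {
    m->held = false;
    return pthread_mutex_init(&m->lock, NULL);
}

static void checked_lock_acquire(checked_lock *m) {
    pthread_mutex_lock(&m->lock);
    /* Record the owner only after the mutex is ours. */
    m->owner = pthread_self();
    m->held = true;
}

/* True if the calling thread is the recorded owner. A debugging aid:
   the answer is only reliable when the caller really is the owner or
   is otherwise synchronized with the owner. */
static bool checked_lock_held_by_self(checked_lock *m) {
    return m->held && pthread_equal(m->owner, pthread_self());
}

/* Returns 0 on success, -1 if the caller does not hold the lock
   (a real checking runtime would probably abort here instead). */
static int checked_lock_release(checked_lock *m) {
    if (!checked_lock_held_by_self(m))
        return -1;
    m->held = false;
    pthread_mutex_unlock(&m->lock);
    return 0;
}

/* Thread body that tries to release a lock it never acquired and
   passes the result back through the thread's return value. */
static void *foreign_release(void *arg) {
    checked_lock *m = arg;
    return (void *)(intptr_t)checked_lock_release(m);
}
```

The same checked_lock_held_by_self() test is what one would call before raising an exception or after returning from a C call, which is the kind of extra check suggested above.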