* Severe loss of performance due to new signal handling
From: Markus Mottl @ 2006-03-17 18:39 UTC
To: ocaml

Hi,

this report has also been posted to the OCaml bug tracker, but since it
is a surprising observation, it may be good if people on the list know
that it exists without having to search the bug tracker archive. Maybe
some assembler guru can repeat this result and explain to us what's
going on...

----------

It seems that changes to signal handling between OCaml 3.08.4 and 3.09.1
can lead to a very significant loss of performance (up to several orders
of magnitude!) in code that uses threads and performs I/O (tested on
Linux).

The attached file (slow.ml) demonstrates this: it prints a character to
stdout in a for-loop. The uploaded version will take approximately 600ms
in native code to complete this test when redirecting output to
/dev/null. If you comment out the line containing "module X = Thread"
and compile without thread support, then the test suddenly only takes
around 1.5ms, i.e. it runs 400 times faster.

Profiling using oprofile revealed that the function
"caml_process_pending_signals" seems to be responsible for that.
Annotated assembler output showed that the code was sampled an
astonishing number of times in the instruction "test %eax,%eax" as
obviously generated for "if (async_action != NULL)" in this function.
This is really weird, because everything else seems to be sampled a
sensible number of times, but it would surely explain the timings.

OCaml-3.08.4 does not exhibit any problems of that kind.

----------

Best regards,
Markus

--
Markus Mottl http://www.ocaml.info markus.mottl@gmail.com

[-- Attachment #2: slow.ml --]

    open Unix
    open Printf

    module X = Thread

    let () =
      let t1 = gettimeofday () in
      for i = 1 to 100000 do
        print_char '.';
      done;
      let t2 = gettimeofday () in
      eprintf "%f\n" (t2 -. t1);

* Re: [Caml-list] Severe loss of performance due to new signal handling 2006-03-17 18:39 Severe loss of performance due to new signal handling Markus Mottl @ 2006-03-17 19:10 ` Christophe TROESTLER 2006-03-20 9:29 ` Xavier Leroy 1 sibling, 0 replies; 13+ messages in thread From: Christophe TROESTLER @ 2006-03-17 19:10 UTC (permalink / raw) To: OCaml Mailing List Hi, On Fri, 17 Mar 2006, "Markus Mottl" <markus.mottl@gmail.com> wrote: > > Profiling using oprofile revealed that the function > "caml_process_pending_signals" seems to be responsible for that. An earlier related thread: http://caml.inria.fr/pub/ml-archives/caml-list/2006/02/2858f1e4532daae90d5b0762e3fff3cd.en.html But your code is even more striking! > OCaml-3.08.4 does not exhibit any problems of that kind. If somebody who has both OCaml 3.08 and 3.09 on his machine is willing to spend some time to check whether the same thing happens with the above mentioned program, that will be appreciated. Best regards, ChriS ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Caml-list] Severe loss of performance due to new signal handling 2006-03-17 18:39 Severe loss of performance due to new signal handling Markus Mottl 2006-03-17 19:10 ` [Caml-list] " Christophe TROESTLER @ 2006-03-20 9:29 ` Xavier Leroy 2006-03-20 10:39 ` Oliver Bandel ` (2 more replies) 1 sibling, 3 replies; 13+ messages in thread From: Xavier Leroy @ 2006-03-20 9:29 UTC (permalink / raw) To: Markus Mottl; +Cc: ocaml > It seems that changes to signal handling between OCaml 3.08.4 and 3.09.1 > can lead to a very significant loss of performance (up to several orders > of magnitude!) in code that uses threads and performs I/O (tested on Linux). > [...] > Maybe some assembler guru can repeat this result and explain to us > what's going on... Short explanation: atomic instructions are dog slow. Longer explanation: OCaml 3.09 fixed a number of long-standing bugs in signal handling that could cause signals to be "lost" (not acted upon). The fixes, located mostly in the code that polls for pending signals (caml_process_pending_signals), rely on an atomic "read-and-clear" operation, implemented using atomic processor instructions on x86, x86-64 and PPC. This makes signal handling correct (no signal can be lost) but I didn't realize that it has such an impact on performance, even on a uniprocessor machine. Thanks for pointing this out. (To prevent a number of well-meaning but irrelevant posts, keep in mind that we're using atomic instructions in a single-threaded program, to get atomicity w.r.t. signals, not w.r.t. concurrent threads. We don't need the latter kind of atomicity given OCaml's threading model.) Now, you may wonder why the problem appears mainly with threaded programs. The reason is that programs linked with the Thread library, even if they do not create threads, check for signals much more often, because they enter and leave blocking sections more often. 
In your example, each call to "print_char" needs to lock and unlock the stdout channel, causing two signal polls each time. So, it's time to go back to the drawing board. Fortunately, it appears that reliable polling of signals is possible without atomic processor instructions. Expect a fix in 3.09.2 at the latest, and probably within a couple of weeks in the CVS. Regards, - Xavier Leroy ^ permalink raw reply [flat|nested] 13+ messages in thread
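
A rough cost model for the explanation above, as a minimal C sketch rather than the runtime's actual code. It assumes one pending flag per signal number, polled with one atomic read-and-clear each, and 64 signal slots (the figure reported for Linux later in this thread). All names (SKETCH_NSIG, pending, poll_signals, simulate_print_chars) are hypothetical, and C11 atomic_exchange stands in for the processor-specific instruction.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical names throughout; this is not the OCaml runtime's code. */
#define SKETCH_NSIG 64                    /* assumed number of signal slots */

static atomic_int pending[SKETCH_NSIG];   /* one pending flag per signal */
static unsigned long exchanges;           /* atomic read-and-clear operations issued */
static unsigned long handled;             /* signals acted upon */

/* What a C-level signal handler would do: only record the arrival. */
static void record_signal(int signo)
{
    assert(signo >= 0 && signo < SKETCH_NSIG);
    atomic_store(&pending[signo], 1);
}

/* One poll: atomically read and clear every flag. */
static void poll_signals(void)
{
    for (int s = 0; s < SKETCH_NSIG; s++) {
        exchanges++;
        if (atomic_exchange(&pending[s], 0))
            handled++;
    }
}

/* Model of the benchmark loop: each print_char locks and unlocks the
   channel, and each of those two steps triggers one poll. */
static unsigned long simulate_print_chars(unsigned long n)
{
    exchanges = 0;
    for (unsigned long i = 0; i < n; i++) {
        poll_signals();   /* entering the blocking section (lock) */
        poll_signals();   /* leaving it (unlock) */
    }
    return exchanges;
}
```

Under these assumptions, simulate_print_chars(100000) issues 100000 * 2 * 64 = 12,800,000 atomic exchanges for a loop that delivers no signal at all, which gives a feel for how a cheap-looking poll can dominate the run time.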
* Re: [Caml-list] Severe loss of performance due to new signal handling
From: Oliver Bandel @ 2006-03-20 10:39 UTC
To: caml-list

On Mon, Mar 20, 2006 at 10:29:49AM +0100, Xavier Leroy wrote:
> > It seems that changes to signal handling between OCaml 3.08.4 and 3.09.1
> > can lead to a very significant loss of performance (up to several orders
> > of magnitude!) in code that uses threads and performs I/O (tested on
> > Linux).
> > [...]
> > Maybe some assembler guru can repeat this result and explain to us
> > what's going on...
>
> Short explanation: atomic instructions are dog slow.
>
> Longer explanation:
>
> OCaml 3.09 fixed a number of long-standing bugs in signal handling
> that could cause signals to be "lost" (not acted upon). The fixes,
[...]
> Now, you may wonder why the problem appears mainly with threaded
> programs. The reason is that programs linked with the Thread library,
> even if they do not create threads, check for signals much more
> often, because they enter and leave blocking sections more often. In
> your example, each call to "print_char" needs to lock and unlock the
> stdout channel, causing two signal polls each time.

Is this really necessary?
Doing a write to stdout with locking... if not explicitly wanted?!

> So, it's time to go back to the drawing board. Fortunately, it
> appears that reliable polling of signals is possible without atomic
> processor instructions. Expect a fix in 3.09.2 at the latest, and
> probably within a couple of weeks in the CVS.

I'm not clear about what your problem with lost signals is, but on
Unix/Linux systems you can use the UNIX API: with sigaction/sigprocmask
etc. you can do things well, whereas with the signal function that C
provides things are bad/worse.

The C-API function signal(3) traditionally clears the signal handler
once it has been called. With the sigaction/sigprocmask/... functions
the handler remains installed.

Whether this is what you have in mind (and how it would be done on
Windows or other systems) I don't know, but maybe this is a hint that
matters.

BTW: I saw that the Unix signalling functions are now included in the
Unix module... (they were not in older versions of OCaml).

Ciao,
Oliver

* Re: [Caml-list] Severe loss of performance due to new signal handling 2006-03-20 10:39 ` Oliver Bandel @ 2006-03-20 12:37 ` Gerd Stolpmann 2006-03-20 13:13 ` Oliver Bandel 2006-03-20 15:54 ` Xavier Leroy 0 siblings, 2 replies; 13+ messages in thread From: Gerd Stolpmann @ 2006-03-20 12:37 UTC (permalink / raw) To: Oliver Bandel; +Cc: caml-list Am Montag, den 20.03.2006, 11:39 +0100 schrieb Oliver Bandel: > On Mon, Mar 20, 2006 at 10:29:49AM +0100, Xavier Leroy wrote: > > > It seems that changes to signal handling between OCaml 3.08.4 and 3.09.1 > > > can lead to a very significant loss of performance (up to several orders > > > of magnitude!) in code that uses threads and performs I/O (tested on > > Linux). > > > [...] > > > Maybe some assembler guru can repeat this result and explain to us > > > what's going on... > > > > Short explanation: atomic instructions are dog slow. > > > > Longer explanation: > > > > OCaml 3.09 fixed a number of long-standing bugs in signal handling > > that could cause signals to be "lost" (not acted upon). The fixes, > > I'm not clear about what your proble is with lost signals, > but when using signals on Unix/Linux-systems, you can use > UNIX-API, with sigaction/sigprocmask etc. you can do things well, > and with the signal-function which C provides things are bad/worse. > The C-API signal-function signal(3) clears out the signal handler > after a call to it. In the sigaction/sigprocmask/... functions > the handler remains installed. The problem is the following: The O'Caml runtime cannot handle signals immediately because this would break memory management (e.g. imagine a signal happens when memory has just been allocated but not initialized). To get around this the signal handler sets just a flag, and the compiler emits instructions that regularly check this flag at safe points of execution (i.e. memory is known to be initialised). These instructions are now atomic in 3.09. 
In 3.08, you have basically

  if "flag is set" then (
    (*)
    "clear flag";
    "call the signal handler function"
  )

If another signal happens at (*) it will be lost.

As you mention sigprocmask: Of course, you can block signals before
checking the flag and allow them again after clearing it, but this would
be even _much_ slower than the solution in 3.09, because sigprocmask
needs a context switch to do its work (it is a kernel function).

I don't know what Xavier has in mind to solve the problem, but I would
think about reducing the frequency of the atomic check. This could work
as follows:

- Revert the check to the 3.08 solution
- Use the alarm clock timer to regularly call a signal_manager
  function at a certain frequency (i.e. the signal flag is set
  at a certain frequency)
- Only the alarm clock timer signal is left unblocked. The other
  signals are normally blocked.
- In signal_manager, it is checked whether there are other pending
  signals, and if so, their functions are called.

Of course, it is again possible that alarm clock signals are lost, but
this is harmless, because it is a repeatedly emitted signal. The other
signals cannot be lost, but their execution is deferred to the next
alarm clock event.

> But if this is what you think about (and how it will be done
> on windows or other systems) I don't know, but maybe this is
> a hint that matters.
>
> BTW: I saw that in the Unix-module the unix-signalling functions are
> now included... (they were not on older versions of Ocaml).

They have been included for a long time. New is Thread.sigmask.

Gerd
--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------

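
The sigprocmask variant mentioned in this message can be sketched in C as follows. This is only an illustration under stated assumptions: a single flag, SIGUSR1 as the example signal, and hypothetical names (on_signal, install_handler, poll_blocking). The syscalls counter is there to make the cost visible: every poll pays for two calls into the kernel, whether or not a signal is pending.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t flag;   /* set by the handler, cleared by the poll */
static unsigned long syscalls;       /* sigprocmask calls made while polling */
static unsigned long handled;        /* times the deferred action would run */

/* C-level handler: only records that the signal arrived. */
static void on_signal(int signo) { (void)signo; flag = 1; }

/* Install the flag-setting handler for SIGUSR1 via sigaction. */
static int install_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = on_signal;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Poll with all signals blocked around the check-and-clear, so nothing
   can arrive between reading the flag and clearing it. */
static int poll_blocking(void)
{
    sigset_t all, old;
    int was_set;

    sigfillset(&all);
    sigprocmask(SIG_BLOCK, &all, &old);
    syscalls++;
    was_set = flag;
    flag = 0;
    sigprocmask(SIG_SETMASK, &old, NULL);
    syscalls++;

    if (was_set)
        handled++;   /* stand-in for calling the OCaml-level handler */
    return was_set;
}
```

A caller would run install_handler() once and then call poll_blocking() at each safe point. A signal that arrives while the mask is in place stays pending in the kernel and is delivered when the old mask is restored, so it sets the flag for the next poll instead of being wiped.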
* Re: [Caml-list] Severe loss of performance due to new signal handling 2006-03-20 12:37 ` Gerd Stolpmann @ 2006-03-20 13:13 ` Oliver Bandel 2006-03-20 15:54 ` Xavier Leroy 1 sibling, 0 replies; 13+ messages in thread From: Oliver Bandel @ 2006-03-20 13:13 UTC (permalink / raw) To: caml-list On Mon, Mar 20, 2006 at 01:37:39PM +0100, Gerd Stolpmann wrote: > Am Montag, den 20.03.2006, 11:39 +0100 schrieb Oliver Bandel: > > On Mon, Mar 20, 2006 at 10:29:49AM +0100, Xavier Leroy wrote: > > > > It seems that changes to signal handling between OCaml 3.08.4 and 3.09.1 > > > > can lead to a very significant loss of performance (up to several orders > > > > of magnitude!) in code that uses threads and performs I/O (tested on > > > Linux). > > > > [...] > > > > Maybe some assembler guru can repeat this result and explain to us > > > > what's going on... > > > > > > Short explanation: atomic instructions are dog slow. > > > > > > Longer explanation: > > > > > > OCaml 3.09 fixed a number of long-standing bugs in signal handling > > > that could cause signals to be "lost" (not acted upon). The fixes, > > > > I'm not clear about what your proble is with lost signals, > > but when using signals on Unix/Linux-systems, you can use > > UNIX-API, with sigaction/sigprocmask etc. you can do things well, > > and with the signal-function which C provides things are bad/worse. > > The C-API signal-function signal(3) clears out the signal handler > > after a call to it. In the sigaction/sigprocmask/... functions > > the handler remains installed. > > The problem is the following: The O'Caml runtime cannot handle signals > immediately because this would break memory management (e.g. imagine a > signal happens when memory has just been allocated but not initialized). > To get around this the signal handler sets just a flag, and the compiler > emits instructions that regularly check this flag at safe points of > execution (i.e. memory is known to be initialised). 
> These instructions are now atomic in 3.09. In 3.08, you have basically
>
> if "flag is set" then (
>   (*)
>   "clear flag";
>   "call the signal handler function"
> )
>
> If another signal happens at (*) it will be lost.

Well, I'm not an OCaml-internals specialist, so I can't say if this
would be necessary...

At first glance it looks like the problem one has when using signal(3)
instead of sigprocmask(), sigaction() and Co.

> As you mention sigprocmask: Of course, you can block signals before
> checking the flag and allow them again after clearing it, but this would
> be even _much_ slower than the solution in 3.09, because sigprocmask
> needs a context switch to do its work (it is a kernel function).

Why call such functions so often?

You can use sigaction() to handle signals when you want to; even if
signals are blocked, their occurrence will be saved. When you want to
handle them, you can do so. It's too long ago for me to give details
here, but if wanted, I can look them up (not today, but tomorrow I will
have some time to do it).

(The only thing you can't find out with this mechanism is which of the
signals came first and which later.)

> I don't know what Xavier has in mind to solve the problem, but I would
> think about reducing the frequency of the atomic check.
> This could work as follows:
>
> - Revert the check to the 3.08 solution
> - Use the alarm clock timer to regularly call a signal_manager
>   function at a certain frequency (i.e. the signal flag is set
>   at a certain frequency)

Using alarm() is not reliable.

[...]
> > BTW: I saw that in the Unix-module the unix-signalling functions are
> > now included... (they were not on older versions of Ocaml).
>
> They have been included for a long time. New is Thread.sigmask.

Depends on the definition of "long time" ;-)

When I first came into contact with OCaml, which really is some years
ago, they were not included (I think 3.04?). I didn't look for these
functions, and just saw them while looking for other things at about
3.08 (?). So I was astounded.

This makes OCaml better suited for applications in the real world,
because C's signal(3) is unreliable. (When the signal is caught, the
handler is deactivated until it is re-established; that's the same
problem as you have mentioned above. So if a signal comes twice, you
lose one. On the other hand, this protects the system from recursive
loops, which could make it unreliable too. But only with
sigprocmask()/sigaction() and so on can you do it reliably, cleanly and
clearly.)

Ciao,
Oliver

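
The behaviour described in this message (a blocked signal is remembered, and a handler installed with sigaction stays installed) can be observed with a small C sketch. SIGUSR1 and the names count_delivery and blocked_then_delivered are choices made for this illustration only. It raises the signal twice while blocked, asks sigpending whether it is recorded, then unblocks it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t deliveries;   /* handler invocations so far */

static void count_delivery(int signo) { (void)signo; deliveries++; }

/* Raise SIGUSR1 twice while it is blocked, then unblock it.
   Returns the number of handler runs seen after unblocking (-1 on a
   setup failure); *was_pending reports what sigpending saw meanwhile. */
static int blocked_then_delivered(int *was_pending)
{
    struct sigaction sa;
    sigset_t usr1, pend;

    sa.sa_handler = count_delivery;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &usr1, NULL) != 0)
        return -1;

    deliveries = 0;
    raise(SIGUSR1);
    raise(SIGUSR1);

    sigemptyset(&pend);
    sigpending(&pend);
    *was_pending = sigismember(&pend, SIGUSR1);
    assert(deliveries == 0);            /* nothing runs while blocked */

    sigprocmask(SIG_UNBLOCK, &usr1, NULL);
    return (int)deliveries;
}
```

For an ordinary (non-realtime) signal the two raises are typically merged into a single delivery, so the kernel remembers that the signal occurred but not how often. Raising SIGUSR1 again afterwards runs count_delivery once more, because sigaction leaves the handler installed.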
* Re: [Caml-list] Severe loss of performance due to new signal handling
From: Xavier Leroy @ 2006-03-20 15:54 UTC
To: Gerd Stolpmann; +Cc: caml-list

> The problem is the following: [...] In 3.08, you have basically
>
> if "flag is set" then (
>   (*)
>   "clear flag";
>   "call the signal handler function"
> )
>
> If another signal happens at (*) it will be lost.

Actually, the problematic code in 3.08 is:

    tmp <- flag;
    (*)
    flag <- 0;
    if (tmp) { process the signal; }

and indeed a signal can be lost (never processed) if it occurs at (*).

The solution I have in mind is to implement exactly the pseudocode you
give above. If a signal occurs at (*), it is not lost (the signal
handler function will be called just afterwards!), just conflated with a
previous occurrence of that signal, but this is fair game: POSIX signals
have the same behaviour. (Yes, I'm ignoring the queueing behaviour of
realtime POSIX signals.)

Note however that in 3.09 and in my proposed fix, there is one flag per
signal, which still improves over 3.08 (which had only one shared flag)
and ensures that two occurrences of different signals are not conflated,
again as per POSIX.

> I don't know what Xavier has in mind to solve the problem, but I would
> think about reducing the frequency of the atomic check.

That would be plan C, plan B being making the check even more efficient.
I'd rather not introduce timer signals if at all possible, though, since
these mess up many function calls.

- Xavier Leroy

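
The difference between the two orderings can be made concrete with a deterministic C sketch. No real signal is involved: arrive_if_requested is a hypothetical stand-in that sets the flag exactly at the point marked (*), and all other names are likewise invented for this illustration.

```c
#include <assert.h>
#include <signal.h>

static volatile sig_atomic_t flag;   /* pending flag for one signal */
static int handled;                  /* times the deferred handler would run */
static int inject;                   /* nonzero: a signal "arrives" at the race point */

/* Stand-in for an asynchronous arrival: the C handler only sets the flag. */
static void arrive_if_requested(void)
{
    if (inject) {
        inject = 0;
        flag = 1;
    }
}

/* Ordering described for 3.08: read, clear, then test the saved value. */
static void poll_read_then_clear(void)
{
    int tmp = flag;
    arrive_if_requested();   /* (*) an arrival here is wiped by the clear below */
    flag = 0;
    if (tmp)
        handled++;
}

/* Proposed ordering: test first, clear only when the flag was seen set. */
static void poll_test_then_clear(void)
{
    if (flag) {
        arrive_if_requested();   /* (*) merged with the occurrence being handled */
        flag = 0;
        handled++;
    } else {
        arrive_if_requested();   /* after the test: flag stays set for the next poll */
    }
}

/* One poll with an arrival at the race point, then one quiet poll.
   Returns how many times the handler ran. */
static int run_scenario(void (*poll)(void))
{
    flag = 0;
    handled = 0;
    inject = 1;
    poll();
    poll();
    return handled;
}
```

With nothing pending beforehand, run_scenario(poll_read_then_clear) returns 0 (the signal is never processed), while run_scenario(poll_test_then_clear) returns 1. In the second ordering an arrival at (*) can only happen while an occurrence of the same signal is already being handled, which is the conflation the message calls fair game.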
* Re: [Caml-list] Severe loss of performance due to new signal handling 2006-03-20 9:29 ` Xavier Leroy 2006-03-20 10:39 ` Oliver Bandel @ 2006-03-20 16:15 ` Markus Mottl 2006-03-20 16:24 ` Will Farr 2006-03-21 1:33 ` Robert Roessler 2 siblings, 1 reply; 13+ messages in thread From: Markus Mottl @ 2006-03-20 16:15 UTC (permalink / raw) To: Xavier Leroy; +Cc: ocaml [-- Attachment #1: Type: text/plain, Size: 1087 bytes --] On 3/20/06, Xavier Leroy <Xavier.Leroy@inria.fr> wrote: > > Short explanation: atomic instructions are dog slow. Thanks for the explanation. I'd never have guessed that atomic instructions could be responsible for such a deterioration of performance. So, it's time to go back to the drawing board. Fortunately, it > appears that reliable polling of signals is possible without atomic > processor instructions. Expect a fix in 3.09.2 at the latest, and > probably within a couple of weeks in the CVS. > Great! Btw., since we are at it, you could also make us really happy by fixing issue 3906 in the next release, too. Now that the reason is clear this should be very straightforward, and would save people who write certain kinds of threaded code a lot of headaches, because this bug can cause all sorts of weird problems in long-running applications (freezes, crashes, execution of random code, etc.), and was extremely hard to trigger and track down. Best regards, Markus -- Markus Mottl http://www.ocaml.info markus.mottl@gmail.com [-- Attachment #2: Type: text/html, Size: 1729 bytes --] ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Caml-list] Severe loss of performance due to new signal handling
From: Will Farr @ 2006-03-20 16:24 UTC
To: ocaml

Hello all,

As an aside, if anyone is interested in techniques for making atomic
transactions fast with low latency, etc., the paper "Atomic heap
transactions and fine-grain interrupts" by Olin Shivers, James W. Clark
and Roland McGrath:

http://www-static.cc.gatech.edu/~shivers/papers/heap.ps

presents several *neat* hacks to do this efficiently. I'm sure that the
implementors on the list are already aware of this work, but I just
wanted to point it out as interesting reading for people (like myself)
who think this stuff is neat but don't necessarily have broad experience
with it.

Will

* Re: [Caml-list] Severe loss of performance due to new signal handling 2006-03-20 9:29 ` Xavier Leroy 2006-03-20 10:39 ` Oliver Bandel 2006-03-20 16:15 ` Markus Mottl @ 2006-03-21 1:33 ` Robert Roessler 2006-03-21 3:11 ` Markus Mottl 2 siblings, 1 reply; 13+ messages in thread From: Robert Roessler @ 2006-03-21 1:33 UTC (permalink / raw) To: Caml-list Xavier Leroy wrote: > > It seems that changes to signal handling between OCaml 3.08.4 and 3.09.1 > > can lead to a very significant loss of performance (up to several orders > > of magnitude!) in code that uses threads and performs I/O (tested on > Linux). > > [...] > > Maybe some assembler guru can repeat this result and explain to us > > what's going on... > > Short explanation: atomic instructions are dog slow. At the risk of being "irrelevant", I wanted to nail down exactly what assertion is being made here: are we talking about directly executing in assembly code the relevant x86[-64]/ppc/whatever instructions for "read-and-clear", or going through OS-dependent access routines like Windows' InterlockedExchange()? Or: is the source of the dog slow behavior because of OS overhead, or is it a low-level issue like memory barriers/cache lines getting flushed/something else? Robert Roessler roessler@rftp.com http://www.rftp.com ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Caml-list] Severe loss of performance due to new signal handling 2006-03-21 1:33 ` Robert Roessler @ 2006-03-21 3:11 ` Markus Mottl 2006-03-21 4:04 ` Brian Hurt 2006-03-21 12:54 ` Robert Roessler 0 siblings, 2 replies; 13+ messages in thread From: Markus Mottl @ 2006-03-21 3:11 UTC (permalink / raw) To: Robert Roessler; +Cc: Caml-list [-- Attachment #1: Type: text/plain, Size: 604 bytes --] On 3/20/06, Robert Roessler <roessler@rftp.com> wrote: > > At the risk of being "irrelevant", I wanted to nail down exactly what > assertion is being made here: are we talking about directly executing > in assembly code the relevant x86[-64]/ppc/whatever instructions for > "read-and-clear", or going through OS-dependent access routines like > Windows' InterlockedExchange()? We are talking of the assembly code. See file byterun/signals_machdep.h, which contains the corresponding macros. Regards, Markus -- Markus Mottl http://www.ocaml.info markus.mottl@gmail.com [-- Attachment #2: Type: text/html, Size: 1081 bytes --] ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Caml-list] Severe loss of performance due to new signal handling
From: Brian Hurt @ 2006-03-21 4:04 UTC
To: Markus Mottl; +Cc: Robert Roessler, Caml-list

On Mon, 20 Mar 2006, Markus Mottl wrote:

> On 3/20/06, Robert Roessler <roessler@rftp.com> wrote:
>>
>> At the risk of being "irrelevant", I wanted to nail down exactly what
>> assertion is being made here: are we talking about directly executing
>> in assembly code the relevant x86[-64]/ppc/whatever instructions for
>> "read-and-clear", or going through OS-dependent access routines like
>> Windows' InterlockedExchange()?
>
> We are talking of the assembly code. See file byterun/signals_machdep.h,
> which contains the corresponding macros.

OK, poking around a little bit in byterun, I'm seeing this piece of
code:

    for (signal_number = 0; signal_number < NSIG; signal_number++) {
      Read_and_clear(signal_state, caml_pending_signals[signal_number]);
      if (signal_state) caml_execute_signal(signal_number, 0);
    }

with Read_and_clear being defined as:

    #if defined(__GNUC__) && defined(__i386__)
    #define Read_and_clear(dst,src) \
      asm("xorl %0, %0; xchgl %0, %1" \
          : "=r" (dst), "=m" (src) \
          : "m" (src))

xchgl is the atomic operation (this is always atomic when referencing a
memory location, regardless of the presence or absence of a lock
prefix). Apropos of nothing, a better definition of that macro would be:

    #define Read_and_clear(dst,src) \
      asm volatile ("xchgl %0, %1" \
                    : "=r" (dst), "+m" (src) \
                    : "0" (0))

as this gives gcc the choice of how to move 0 into the register (using
an xor will still be a popular choice, but it'll occasionally do a movl
depending upon instruction scheduling choices).

Some more poking around tells me that NSIG is defined on Linux to be 64.
I think the problem is not doing an atomic operation, but doing 64 of
them. I'd be inclined to move to a bitset implementation, allowing you
to replace 64 atomic instructions with 2. On the x86, you can use the
lock bts instruction to set the bit. Some implementation like:

    #if defined(__GNUC__) && defined(__i386__)

    typedef unsigned long sigword_t;

    #define Read_and_clear(dst,src) \
      asm volatile ("xchgl %0, %1" \
                    : "=r" (dst), "+m" (src) \
                    : "0" (0))

    #define Set_sigflag(sigflags, NR) \
      asm volatile ("lock bts %1, %0" \
                    : "+m" (*sigflags) \
                    : "rN" (NR) \
                    : "cc")

    ...

    #include <limits.h>   /* for CHAR_BIT */
    #define SIGWORD_BITS (CHAR_BIT * sizeof(sigword_t))
    #define NR_SIGWORDS ((NSIG + SIGWORD_BITS - 1)/SIGWORD_BITS)

    extern sigword_t caml_pending_signals[NR_SIGWORDS];

    for (i = 0; i < NR_SIGWORDS; i++) {
      sigword_t temp;
      int j;

      Read_and_clear(temp, caml_pending_signals[i]);
      /* walk the set bits; bit j of word i is signal i*SIGWORD_BITS + j */
      for (j = 0; temp != 0; j++) {
        if ((temp & 1ul) != 0) {
          caml_execute_signal((i * SIGWORD_BITS) + j, 0);
        }
        temp >>= 1;
      }
    }

This is somewhat more code, but i, j, and temp would all end up in
registers, and it'd be two atomic instructions, not 64.

The x86 assembly code I can dash off from the top of my head. Similar
bits of assembly can be written for other CPUs; I just have to go dig
out the right books.

Brian

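
The same bitset idea can be sketched without inline assembly. The version below is an assumption-laden illustration, not the proposal's code: it swaps the x86 xchgl and lock bts instructions for C11 atomic_exchange and atomic_fetch_or, fixes the number of signal slots at a hypothetical SKETCH_NSIG of 64, and uses invented names (pending_words, set_pending, drain_pending, note_signal).

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_NSIG 64   /* assumed number of signal slots */
#define WORD_BITS   (CHAR_BIT * sizeof(unsigned long))
#define NR_WORDS    ((SKETCH_NSIG + WORD_BITS - 1) / WORD_BITS)

static atomic_ulong pending_words[NR_WORDS];   /* one bit per signal */
static int seen[SKETCH_NSIG];                  /* per-signal handler run count */

/* Handler side: atomically set the bit for signo (role of "lock bts"). */
static void set_pending(int signo)
{
    assert(signo >= 0 && signo < SKETCH_NSIG);
    atomic_fetch_or(&pending_words[signo / WORD_BITS],
                    1UL << (signo % WORD_BITS));
}

/* Example action for the drain loop: count runs per signal. */
static void note_signal(int signo) { seen[signo]++; }

/* Poll side: one atomic exchange per word, then walk the set bits in
   ordinary registers. Returns the number of signals dispatched. */
static int drain_pending(void (*handle)(int))
{
    int count = 0;
    for (size_t i = 0; i < NR_WORDS; i++) {
        unsigned long bits = atomic_exchange(&pending_words[i], 0UL);
        for (int j = 0; bits != 0; j++, bits >>= 1) {
            if (bits & 1UL) {
                handle((int)(i * WORD_BITS) + j);
                count++;
            }
        }
    }
    return count;
}
```

Setting the same bit twice before a drain yields one dispatch, which matches the one-flag-per-signal conflation discussed earlier in the thread. Calling set_pending from a real signal handler is only reasonable where atomic_ulong is lock-free, which this sketch assumes rather than checks.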
* Re: [Caml-list] Severe loss of performance due to new signal handling 2006-03-21 3:11 ` Markus Mottl 2006-03-21 4:04 ` Brian Hurt @ 2006-03-21 12:54 ` Robert Roessler 1 sibling, 0 replies; 13+ messages in thread From: Robert Roessler @ 2006-03-21 12:54 UTC (permalink / raw) To: Caml-list Markus Mottl wrote: > On 3/20/06, *Robert Roessler* <roessler@rftp.com > > At the risk of being "irrelevant", I wanted to nail down exactly what > assertion is being made here: are we talking about directly executing > in assembly code the relevant x86[-64]/ppc/whatever instructions for > "read-and-clear", or going through OS-dependent access routines like > Windows' InterlockedExchange()? > > > We are talking of the assembly code. See file > byterun/signals_machdep.h, which contains the corresponding macros. Thanks, Markus - in the case you cite (direct instruction use), I was hoping for some illumination on this huge cost... reviewing the Intel manuals, I note that: 1) there is *no* claim that cache lines are flushed just by doing the xchg 2) in fact, with the Pentium Pro on, the bus LOCK# operation will not even happen if the data is cached - everything is left to the cache coherency mechanism 3) there *is* mention of processor *cache locking*, but this is still just in the context of cache coherency with multiple processors... so nothing here is suggesting cache line flushing or anything else that sounds horrendously expensive, particularly in the single CPU case < 8 hours later, back to finish email :) > Finally, it is interesting that you bring up this file - it appears as if the msvc toolchain is no longer supported for doing "correct" (in terms of Xavier's "atomicity w.r.t. signals") builds... at least that is how I interpret the conditional compilation directives. Robert Roessler roessler@rftp.com http://www.rftp.com ^ permalink raw reply [flat|nested] 13+ messages in thread