* Re: [Caml-list] Severe loss of performance due to new signal handling (fwd)
@ 2006-03-21 23:53 Brian Hurt
2006-03-22 9:20 ` Alexander S. Usov
2006-03-22 10:56 ` Robert Roessler
From: Brian Hurt @ 2006-03-21 23:53 UTC
To: caml-list
[-- Attachment #1: Type: TEXT/PLAIN, Size: 968 bytes --]
Oops, I didn't intend this message to be off-list.
---------- Forwarded message ----------
Date: Tue, 21 Mar 2006 16:32:51 -0600 (CST)
From: Brian Hurt <bhurt@spnz.org>
To: Robert Roessler <roessler@rftp.com>
Subject: Re: [Caml-list] Severe loss of performance due to new signal handling
On Tue, 21 Mar 2006, Robert Roessler wrote:
> Well, I *thought* there was a marked absence of "bit-level parallelism" in
> the signal-handling... ;)
>
> So the "expense" of individual atomic operations is not really what is at the
> heart of this performance problem...
Hmm. Maybe not. I'm measuring a 4-clock-cycle cost for an xchgl, both with and
without a lock prefix, on my Athlon XP 1.8GHz. See the attached code. Naturally,
this is a uniprocessor machine, the memory location is in L1 cache (or soon will
be), and there is no contention, so this is definitely the best case. 4 clocks is
about right for a read and a write to L1 cache (each L1 cache access taking 2 clocks).
Brian
[-- Attachment #2: Type: TEXT/X-CSRC, Size: 1369 bytes --]
#include <stdio.h>

#if !defined(__GNUC__) || !defined(__i386__)
#error This code only works with GCC on i386.
#endif

/* The reason this only works under GNU C on 32-bit x86 is that we use the
 * rdtsc instruction through GCC inline assembly, and the "=A" constraint
 * only means the edx:eax register pair on 32-bit x86.
 */
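/* Read the CPU's time-stamp counter (clock cycles since reset). */
static inline unsigned long long rdtsc(void) {
    unsigned long long rval;
    asm volatile ("rdtsc" : "=A" (rval));
    return rval;
}

/* Atomically return *ptr and set it to 0.  On x86, an xchg with a memory
 * operand is locked implicitly, so the explicit lock prefix does not
 * change its behaviour.
 */
static inline unsigned long read_and_clear(unsigned long * ptr) {
    unsigned long rval;
    asm volatile ("lock xchgl %0, %1"
                  : "=r" (rval), "+m" (*ptr)
                  : "0" (0)
                  : "memory");
    return rval;
}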
int main(void) {
    int i;
    volatile unsigned long trash;
    unsigned long val = 1;
    unsigned long long start, stop, time, min;

    /* Measure the overhead of two back-to-back rdtsc instructions (plus one
     * volatile store).  We do this ten times and keep the cheapest run.
     */
    min = ~0ull;
    for (i = 0; i < 10; ++i) {
        start = rdtsc();
        trash = 0;
        stop = rdtsc();
        time = stop - start;
        if (time < min) {
            min = time;
        }
    }
    printf("Minimum time for a rdtsc instruction (in clocks): %llu\n", min);

    /* Same measurement with read_and_clear() between the two rdtsc calls;
     * subtracting the figure above estimates the cost of the xchg itself.
     */
    min = ~0ull;
    for (i = 0; i < 10; ++i) {
        val = 1;
        start = rdtsc();
        trash = read_and_clear(&val);
        stop = rdtsc();
        time = stop - start;
        if (time < min) {
            min = time;
        }
    }
    printf("Minimum time for a read_and_clear() + rdtsc (in clocks): %llu\n", min);
    return 0;
}

From: Alexander S. Usov @ 2006-03-22 9:20 UTC
To: caml-list
On Wednesday 22 March 2006 00:53, Brian Hurt wrote:
> Oops, I didn't intend this message to be off-list.
>
> On Tue, 21 Mar 2006, Robert Roessler wrote:
> > Well, I *thought* there was a marked absence of "bit-level parallelism"
> > in the signal-handling... ;)
> >
> > So the "expense" of individual atomic operations is not really what is at
> > the heart of this performance problem...
>
> Hmm. Maybe not. I'm measuring a 4-clock-cycle cost for an xchgl, both with
> and without a lock prefix, on my Athlon XP 1.8GHz. See the attached code.
> Naturally, this is a uniprocessor machine, the memory location is in L1
> cache (or soon will be), and there is no contention, so this is definitely
> the best case. 4 clocks is about right for a read and a write to L1 cache
> (each L1 cache access taking 2 clocks).
$ ./a.out
Minimum time for a rdtsc instruction (in clocks): 88
Minimum time for a read_and_clear() + rdtsc (in clocks): 248
$ grep 'model name' /proc/cpuinfo
model name : Intel(R) Pentium(R) 4 CPU 2.00GHz
--
Best regards,
Alexander.

From: Robert Roessler @ 2006-03-22 10:56 UTC
To: Caml-list
Brian Hurt wrote:
>
> On Tue, 21 Mar 2006, Robert Roessler wrote:
>
>> Well, I *thought* there was a marked absence of "bit-level
>> parallelism" in the signal-handling... ;)
>>
>> So the "expense" of individual atomic operations is not really what is
>> at the heart of this performance problem...
>
> Hmm. Maybe not. I'm measuring a 4-clock-cycle cost for an xchgl, both
> with and without a lock prefix, on my Athlon XP 1.8GHz. See the attached
> code. Naturally, this is a uniprocessor machine, the memory location is
> in L1 cache (or soon will be), and there is no contention, so this is
> definitely the best case. 4 clocks is about right for a read and a write
> to L1 cache (each L1 cache access taking 2 clocks).
After adjusting the inline assembly syntax for VC 7.1, I get:
Minimum time for a rdtsc instruction (in clocks): 38
Minimum time for a read_and_clear() + rdtsc (in clocks): 75
This is on a P-III S (Tualatin) @ 1.4GHz on Windows XP SP2.
Robert Roessler
roessler@rftp.com
http://www.rftp.com