Gc.compact surprisingly helpful

Mailing list for all users of the OCaml language and system.
 help / color / mirror / Atom feed

* Gc.compact surprisingly helpful
@ 2009-12-04 19:09 Aaron Bohannon
  2009-12-04 19:31 ` [Caml-list] " Florian Hars
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Aaron Bohannon @ 2009-12-04 19:09 UTC (permalink / raw)
  To: caml-list

Hi,

I am trying to write a soft real time signal-processing application in
OCaml.  Running in a "simulation" mode, it spends no time blocking,
and uses an outermost loop that takes about 1/10 sec to run.  In order
to prevent irregular GC pauses, I decided to try inserting a call to
Gc.compact once per loop.  I was hoping the overall throughput
wouldn't suffer too badly.  To my very pleasant surprise, I found the
throughput *increased* by about 2%!!  So in a 15 second run (with no
idle time, as I said), it now does about 130 heap compactions instead
of 3 and gets better total performance because of it, utterly defying
my GC intuition.

 - Aaron

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] Gc.compact surprisingly helpful
  2009-12-04 19:09 Gc.compact surprisingly helpful Aaron Bohannon
@ 2009-12-04 19:31 ` Florian Hars
  2009-12-05 17:06 ` Damien Doligez
  2009-12-05 17:35 ` Xavier Leroy
  2 siblings, 0 replies; 5+ messages in thread
From: Florian Hars @ 2009-12-04 19:31 UTC (permalink / raw)
  To: Aaron Bohannon; +Cc: caml-list

Aaron Bohannon schrieb:
> To my very pleasant surprise, I found the throughput *increased* by about 2%!! 
> [...] utterly defying my GC intuition.

Maybe you stay in L3 cache with the more compact heap, what is your resident set
size with and without the additional compactions? Performance on modern
processors is a riddle inside an enigma (someone once reported improved
performance after removing the -unsafe compile option, which has the single
function of making the code faster by removing array bounds checks).

- Florian
-- 
But our moral language is fragmented; our contemporaries reject the Kantian
hunch that choosing those things most admirable and plausible as ends in
themselves is the best practice; autonomous sources of the good are everywhere
brown and broken. Thus we have PHP. http://lambda-the-ultimate.org/node/1463


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] Gc.compact surprisingly helpful
  2009-12-04 19:09 Gc.compact surprisingly helpful Aaron Bohannon
  2009-12-04 19:31 ` [Caml-list] " Florian Hars
@ 2009-12-05 17:06 ` Damien Doligez
  2009-12-08  3:30   ` Aaron Bohannon
  2009-12-05 17:35 ` Xavier Leroy
  2 siblings, 1 reply; 5+ messages in thread
From: Damien Doligez @ 2009-12-05 17:06 UTC (permalink / raw)
  To: OCaml List


On 2009-12-04, at 20:09, Aaron Bohannon wrote:

> So in a 15 second run (with no
> idle time, as I said), it now does about 130 heap compactions instead
> of 3 and gets better total performance because of it, utterly defying
> my GC intuition.


What is the size of your heap?  Have you tried compacting only once
every 2, 3, 5, or 10 loops?

One possible explanation is that compaction will also compact the
free list into a few large blocks, which makes allocation faster.

-- Damien


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] Gc.compact surprisingly helpful
  2009-12-05 17:06 ` Damien Doligez
@ 2009-12-08  3:30   ` Aaron Bohannon
  0 siblings, 0 replies; 5+ messages in thread
From: Aaron Bohannon @ 2009-12-08  3:30 UTC (permalink / raw)
  To: Damien Doligez; +Cc: OCaml List

On Sat, Dec 5, 2009 at 12:06 PM, Damien Doligez <damien.doligez@inria.fr> wrote:
>
> On 2009-12-04, at 20:09, Aaron Bohannon wrote:
>
>> So in a 15 second run (with no
>> idle time, as I said), it now does about 130 heap compactions instead
>> of 3 and gets better total performance because of it, utterly defying
>> my GC intuition.
>
> What is the size of your heap?  Have you tried compacting only once
> every 2, 3, 5, or 10 loops?

In a 15 sec run, the total allocated memory is about 8GB (from
Gc.allocated_bytes()).  The max heap size reached using one compaction
per loop is about 7MB (on a 64-bit machine, based on the
"top_heap_words" field).  The max heap size reached with no
compactions at all is only about 10MB.  These numbers may seem wierd,
but they are really not too surprising since the program is a
signal-processing pipeline that does a huge number of (purely
functional) Array.map and Array.init operations to build short-lived
intermediate results, and it does the same operatations in each
(sub-)cycle regardless of the input.

Yeah, I can squeeze out a bit more performance by using calling
Gc.compact slightly less often, as suggested, but since we're only
talking about around 5%, it doesn't make much difference.  In theory,
I could also optimize away some allocation by replacing functional
Array ops with imperative Array updates, but the throughput seems good
enough right now---I'm actually just concerned about how to keep the
the maximum GC pause time as low as possible.

 - Aaron

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] Gc.compact surprisingly helpful
  2009-12-04 19:09 Gc.compact surprisingly helpful Aaron Bohannon
  2009-12-04 19:31 ` [Caml-list] " Florian Hars
  2009-12-05 17:06 ` Damien Doligez
@ 2009-12-05 17:35 ` Xavier Leroy
  2 siblings, 0 replies; 5+ messages in thread
From: Xavier Leroy @ 2009-12-05 17:35 UTC (permalink / raw)
  To: Aaron Bohannon; +Cc: caml-list

Aaron Bohannon wrote:

> In order to prevent irregular GC pauses, I decided to try inserting
> a call to Gc.compact once per loop.  I was hoping the overall
> throughput wouldn't suffer too badly.  To my very pleasant surprise,
> I found the throughput *increased* by about 2%!!  So in a 15 second
> run (with no idle time, as I said), it now does about 130 heap
> compactions instead of 3 and gets better total performance because
> of it, utterly defying my GC intuition.

As Damien said, maybe the original code ran into a bad case of free
list fragmentation which the compactor cured.  But maybe the 2% is
just measurement noise.  Some of my favorite horror stories about
timings:

http://www-plan.cs.colorado.edu/diwan/asplos09.pdf
  "We see that something external and orthogonal to the program,
   i.e., changing the size (in bytes) of an unused environment variable,
   can dramatically (frequently by about 33% and once by almost 300%)
   change the performance of our program."

http://compilers.iecc.com/comparch/article/96-02-165
  [ Execution speed for the same binary varies by a factor of 2
    depending on cache placement ]

I have also personally observed speed differences of 20% just from
inserting or deleting dead code in a program...

- Xavier Leroy


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2009-12-08  8:45 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-12-04 19:09 Gc.compact surprisingly helpful Aaron Bohannon
2009-12-04 19:31 ` [Caml-list] " Florian Hars
2009-12-05 17:06 ` Damien Doligez
2009-12-08  3:30   ` Aaron Bohannon
2009-12-05 17:35 ` Xavier Leroy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox