* Gc.compact surprisingly helpful @ 2009-12-04 19:09 Aaron Bohannon 2009-12-04 19:31 ` [Caml-list] " Florian Hars ` (2 more replies) 0 siblings, 3 replies; 5+ messages in thread From: Aaron Bohannon @ 2009-12-04 19:09 UTC (permalink / raw) To: caml-list Hi, I am trying to write a soft real time signal-processing application in OCaml. Running in a "simulation" mode, it spends no time blocking, and uses an outermost loop that takes about 1/10 sec to run. In order to prevent irregular GC pauses, I decided to try inserting a call to Gc.compact once per loop. I was hoping the overall throughput wouldn't suffer too badly. To my very pleasant surprise, I found the throughput *increased* by about 2%!! So in a 15 second run (with no idle time, as I said), it now does about 130 heap compactions instead of 3 and gets better total performance because of it, utterly defying my GC intuition. - Aaron ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Caml-list] Gc.compact surprisingly helpful 2009-12-04 19:09 Gc.compact surprisingly helpful Aaron Bohannon @ 2009-12-04 19:31 ` Florian Hars 2009-12-05 17:06 ` Damien Doligez 2009-12-05 17:35 ` Xavier Leroy 2 siblings, 0 replies; 5+ messages in thread From: Florian Hars @ 2009-12-04 19:31 UTC (permalink / raw) To: Aaron Bohannon; +Cc: caml-list Aaron Bohannon schrieb: > To my very pleasant surprise, I found the throughput *increased* by about 2%!! > [...] utterly defying my GC intuition. Maybe you stay in L3 cache with the more compact heap, what is your resident set size with and without the additional compactions? Performance on modern processors is a riddle inside an enigma (someone once reported improved performance after removing the -unsafe compile option, which has the single function of making the code faster by removing array bounds checks). - Florian -- But our moral language is fragmented; our contemporaries reject the Kantian hunch that choosing those things most admirable and plausible as ends in themselves is the best practice; autonomous sources of the good are everywhere brown and broken. Thus we have PHP. http://lambda-the-ultimate.org/node/1463 ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Caml-list] Gc.compact surprisingly helpful 2009-12-04 19:09 Gc.compact surprisingly helpful Aaron Bohannon 2009-12-04 19:31 ` [Caml-list] " Florian Hars @ 2009-12-05 17:06 ` Damien Doligez 2009-12-08 3:30 ` Aaron Bohannon 2009-12-05 17:35 ` Xavier Leroy 2 siblings, 1 reply; 5+ messages in thread From: Damien Doligez @ 2009-12-05 17:06 UTC (permalink / raw) To: OCaml List On 2009-12-04, at 20:09, Aaron Bohannon wrote: > So in a 15 second run (with no > idle time, as I said), it now does about 130 heap compactions instead > of 3 and gets better total performance because of it, utterly defying > my GC intuition. What is the size of your heap? Have you tried compacting only once every 2, 3, 5, or 10 loops? One possible explanation is that compaction will also compact the free list into a few large blocks, which makes allocation faster. -- Damien ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Caml-list] Gc.compact surprisingly helpful 2009-12-05 17:06 ` Damien Doligez @ 2009-12-08 3:30 ` Aaron Bohannon 0 siblings, 0 replies; 5+ messages in thread From: Aaron Bohannon @ 2009-12-08 3:30 UTC (permalink / raw) To: Damien Doligez; +Cc: OCaml List On Sat, Dec 5, 2009 at 12:06 PM, Damien Doligez <damien.doligez@inria.fr> wrote: > > On 2009-12-04, at 20:09, Aaron Bohannon wrote: > >> So in a 15 second run (with no >> idle time, as I said), it now does about 130 heap compactions instead >> of 3 and gets better total performance because of it, utterly defying >> my GC intuition. > > What is the size of your heap? Have you tried compacting only once > every 2, 3, 5, or 10 loops? In a 15 sec run, the total allocated memory is about 8GB (from Gc.allocated_bytes()). The max heap size reached using one compaction per loop is about 7MB (on a 64-bit machine, based on the "top_heap_words" field). The max heap size reached with no compactions at all is only about 10MB. These numbers may seem wierd, but they are really not too surprising since the program is a signal-processing pipeline that does a huge number of (purely functional) Array.map and Array.init operations to build short-lived intermediate results, and it does the same operatations in each (sub-)cycle regardless of the input. Yeah, I can squeeze out a bit more performance by using calling Gc.compact slightly less often, as suggested, but since we're only talking about around 5%, it doesn't make much difference. In theory, I could also optimize away some allocation by replacing functional Array ops with imperative Array updates, but the throughput seems good enough right now---I'm actually just concerned about how to keep the the maximum GC pause time as low as possible. - Aaron ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Caml-list] Gc.compact surprisingly helpful 2009-12-04 19:09 Gc.compact surprisingly helpful Aaron Bohannon 2009-12-04 19:31 ` [Caml-list] " Florian Hars 2009-12-05 17:06 ` Damien Doligez @ 2009-12-05 17:35 ` Xavier Leroy 2 siblings, 0 replies; 5+ messages in thread From: Xavier Leroy @ 2009-12-05 17:35 UTC (permalink / raw) To: Aaron Bohannon; +Cc: caml-list Aaron Bohannon wrote: > In order to prevent irregular GC pauses, I decided to try inserting > a call to Gc.compact once per loop. I was hoping the overall > throughput wouldn't suffer too badly. To my very pleasant surprise, > I found the throughput *increased* by about 2%!! So in a 15 second > run (with no idle time, as I said), it now does about 130 heap > compactions instead of 3 and gets better total performance because > of it, utterly defying my GC intuition. As Damien said, maybe the original code ran into a bad case of free list fragmentation which the compactor cured. But maybe the 2% is just measurement noise. Some of my favorite horror stories about timings: http://www-plan.cs.colorado.edu/diwan/asplos09.pdf "We see that something external and orthogonal to the program, i.e., changing the size (in bytes) of an unused environment variable, can dramatically (frequently by about 33% and once by almost 300%) change the performance of our program." http://compilers.iecc.com/comparch/article/96-02-165 [ Execution speed for the same binary varies by a factor of 2 depending on cache placement ] I have also personally observed speed differences of 20% just from inserting or deleting dead code in a program... - Xavier Leroy ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2009-12-08 8:45 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2009-12-04 19:09 Gc.compact surprisingly helpful Aaron Bohannon 2009-12-04 19:31 ` [Caml-list] " Florian Hars 2009-12-05 17:06 ` Damien Doligez 2009-12-08 3:30 ` Aaron Bohannon 2009-12-05 17:35 ` Xavier Leroy
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox