* [Caml-list] Memory management dominates running time @ 2003-01-03 21:58 Jörgen Gustavsson 2003-01-04 1:20 ` Norman Ramsey ` (2 more replies) 0 siblings, 3 replies; 6+ messages in thread From: Jörgen Gustavsson @ 2003-01-03 21:58 UTC (permalink / raw) To: caml-list Hi, I have a performance problem with one of my ocaml programs: it seems as if the memory management asymptoticly dominates the running time. I.e., as the program runs the fraction of the time that is spent on memory management seems to go towards 100%. I believe that the problem is in the function fl_allocate in the ocaml runtime. I profiled my program with gprof on a moderately sized input which gave the following top ten time consumers. % cumulative self self total time seconds seconds calls ms/call ms/call name 55.5 63.29 63.29 fl_allocate [1] 11.0 75.79 12.50 mark_slice [2] 5.7 82.32 6.53 sweep_slice [3] 4.0 86.91 4.59 _brk_unlocked [4] 2.5 89.80 2.89 oldify_one [5] 2.3 92.42 2.62 oldify_mopup [6] 1.6 94.23 1.81 modify [7] 1.5 95.90 1.67 alloc_shr [8] 0.9 96.91 1.01 allocate_block [9] 0.9 97.90 0.99 compare_val [10] Judging from their names all top ten functions except compare_val and modify has to do with the memory management. In total roughly 85% of the time is spent on memory management. Unfortunately it gets worse when we increase the input size. % cumulative self self total time seconds seconds calls ms/call ms/call name 62.2 612.56 612.56 fl_allocate [1] 8.9 699.91 87.35 _brk_unlocked [2] 7.3 772.09 72.18 fl_add_block [3] 5.0 821.43 49.34 mark_slice [4] 3.7 857.90 36.47 oldify_local_roots [5] 2.6 883.93 26.03 sweep_slice [6] 1.2 895.87 11.94 add_to_heap [7] 1.0 905.92 10.05 oldify_one [8] 1.0 915.39 9.47 oldify_mopup [9] 0.6 921.78 6.39 alloc_shr [10] Now all top ten functions concern memory management and 95% of the time is spent on it. If I increase the input size even further it seems as if it gets even worse. I am not an expert on garbage collection but the only explanation that I can see for this behaviour is that the free list that fl_allocate searches grows as the program runs. If there are lots of small blocks in the begining of the free list then fl_allocate potentially has to search very far when it allocates somewhat larger blocks. This situation can occur, for example, if the program allocates many small blocks in the begining which are then reclaimed by the garbage collector and then later the program allocates mostly larger blocks. Does this explanation make any sense? One reason for why I doubt it, is that if it is correct then the ocaml memory management can asymptoticly dominate the running time. Is it really so? Finally, what can I do about this problem? Regards, Jörgen Gustavsson. ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] Memory management dominates running time 2003-01-03 21:58 [Caml-list] Memory management dominates running time Jörgen Gustavsson @ 2003-01-04 1:20 ` Norman Ramsey 2003-01-05 11:35 ` Chet Murthy 2003-01-08 14:20 ` Damien Doligez 2 siblings, 0 replies; 6+ messages in thread From: Norman Ramsey @ 2003-01-04 1:20 UTC (permalink / raw) To: Jörgen Gustavsson; +Cc: caml-list For some time, I have been hoping for a heap profiler or other tool that will help me debug my memory-performance problems in OCaml. I had a student who was interested but ran out of time at the end of last summer. If anybody is working on such a tool, we would be pleased to help if we can. Norman ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] Memory management dominates running time 2003-01-03 21:58 [Caml-list] Memory management dominates running time Jörgen Gustavsson 2003-01-04 1:20 ` Norman Ramsey @ 2003-01-05 11:35 ` Chet Murthy 2003-01-08 12:58 ` Jörgen Gustavsson 2003-01-08 14:20 ` Damien Doligez 2 siblings, 1 reply; 6+ messages in thread From: Chet Murthy @ 2003-01-05 11:35 UTC (permalink / raw) To: Jörgen Gustavsson; +Cc: caml-list By using gprof, and doing a little careful counting, you can often find out where egregious allocations are happening, and eliminate them. I can't give you good pointers on this, because, well, it's a bit of an art, and takes a lot of experience. But it _is_ doable, and I have done it to nontrivial programs in Caml, as well as in Java, attaining speedups of 2x, 3x, without much code-restructuring. E.g., once, I took a Stream-based parser, and wrote a Stream module which only supported "char Stream", and only for parsing. By writing a few auxiliary functions in addition to this, I was able to reduce the consing associated with stream-based parsing (the constant creation of "Some _" blocks) significantly, and got a rather large speedup. I was able to see that this was important, basically by looking around in the gprof profile and trying to follow where the allocation were happening. --chet-- ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] Memory management dominates running time 2003-01-05 11:35 ` Chet Murthy @ 2003-01-08 12:58 ` Jörgen Gustavsson 0 siblings, 0 replies; 6+ messages in thread From: Jörgen Gustavsson @ 2003-01-08 12:58 UTC (permalink / raw) To: Chet Murthy; +Cc: caml-list On Sun, 5 Jan 2003, Chet Murthy wrote: > By using gprof, and doing a little careful counting, you can > often find out where egregious allocations are happening, and > eliminate them. I am afraid that is not my problem right now. I know where the allocations are and they are necessary in the algorithm we are currently implementing. The problem is (I believe) that the cost of each allocation is not atomic. Instead each allocation can under certain circumstances become more and more expensive as the program runs and eventually the program hardly does anything but searching for free blocks of memory to allocate. (I think this happens when data is moved from the minor heap to the major heap during the minor garbage collection. The free memory in the major heap is kept in a linked list which is searched linearly for a block of the right size.) It could actually help to introduce a space leak because it might lead to less fragmentation. Thus, when you reason about the asymptotic run time complexity of your ocaml programs you have to take the internals of the garbage collector into account which I think is unfortunate. Is it impossible to avoid this problem with a mark-sweep garbage collector? (With a simple two-space copying collector the problem disappear.) In any case we have found a temporary solution to the problem. By setting max_overhead in the gc control record to 0 one can force heap compaction to take place at the end of each major GC cycle. This makes the garbage collector behave like a compacting collector and it becomes very slow (90% of running time) but the asymptotic problem goes away so it is good enough for prototyping. Regards, Jörgen Gustavsson. ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] Memory management dominates running time 2003-01-03 21:58 [Caml-list] Memory management dominates running time Jörgen Gustavsson 2003-01-04 1:20 ` Norman Ramsey 2003-01-05 11:35 ` Chet Murthy @ 2003-01-08 14:20 ` Damien Doligez 2003-01-08 22:23 ` [Caml-list] GlSurf 1.2 available Christophe Raffalli 2 siblings, 1 reply; 6+ messages in thread From: Damien Doligez @ 2003-01-08 14:20 UTC (permalink / raw) To: Jörgen Gustavsson; +Cc: caml-list On Friday, January 3, 2003, at 10:58 PM, Jörgen Gustavsson wrote: > I profiled my program with gprof on a moderately sized input which gave > the following top ten time consumers. > > % cumulative self self total > time seconds seconds calls ms/call ms/call name > 55.5 63.29 63.29 fl_allocate [1] > 11.0 75.79 12.50 mark_slice [2] > 5.7 82.32 6.53 sweep_slice [3] > 4.0 86.91 4.59 _brk_unlocked [4] > 2.5 89.80 2.89 oldify_one [5] > 2.3 92.42 2.62 oldify_mopup [6] > 1.6 94.23 1.81 modify [7] > 1.5 95.90 1.67 alloc_shr [8] > 0.9 96.91 1.01 allocate_block [9] > 0.9 97.90 0.99 compare_val [10] fl_allocate, alloc_shr, and allocate_block are O'Caml allocation functions for the major heap. mark_slice and sweep_slice are the major collector. brk_unlocked is a C library function, I'm surprised to see it use so much run time. oldify_one and oldify_mopup are the minor collector. modify is the write barrier that is called when an assignment is done. > Unfortunately it gets worse when we increase the input size. This behaviour is a bit unusual. > I am not an expert on garbage collection but the only explanation that > I can see for this behaviour is that the free list that fl_allocate > searches grows as the program runs. If there are lots of small blocks > in > the begining of the free list then fl_allocate potentially has to > search > very far when it allocates somewhat larger blocks. > > This situation can occur, for example, if the program allocates many > small > blocks in the begining which are then reclaimed by the garbage > collector > and then later the program allocates mostly larger blocks. This phenomenon is called "fragmentation" and O'Caml does a few things to prevent fragmentation-related problems: 1. Whenever two free blocks are adjacent, they are merged into one bigger block. 2. The free list is not searched from the beginning every time, because that would lead to an accumulation of small blocks at the beginning of the list, even under normal conditions. Instead, each search starts where the previous one stopped, so that no block is examined more often than any other. 3. The heap compaction algorithm is designed to remove all fragmentation from the free list by moving objects around to make sure that all free blocks are adjacent. 4. When fragmentation is too high, the typical behaviour is that the allocation requests cannot be satisfied from the free list, and the heap size has to be increased indefinitely. The GC can detect when this is the case, and automatically trigger a heap compaction. It seems that you have found a case where this strategy fails to detect a situation of heavy fragmentation. > One reason for why I doubt it, is that if it is correct then the ocaml > memory management can asymptoticly dominate the running time. Is it > really > so? It looks surprising, but we don't have an analysis of the run-time cost of memory management, and such an analysis would be quite hard to do precisely. > Finally, what can I do about this problem? You can have good control of the GC parameters through the Gc module of the standard library. The things you should try are: 1. Increase the minor heap size. This will let the minor GC deallocate more blocks, and thus reduce the load of the major GC. For this parameter, the point of diminishing returns depends quite a lot on the allocation behaviour of your program. 2. Increase the space overhead. The major GC has a trade-off between space and time. If you increase the space overhead, your program will need more memory, but the major GC load will decrease. Reasonable values for this parameter are between 50 and 500. 3. Decrease max_overhead. This will trigger more automatic heap compactions, thus reducing fragmentation. Be aware, however, that heap compaction is costly and not incremental, so it will pause the execution of your program (for up to a few seconds). This can be a problem if your program is real-time or interactive. 4. Call Gc.compact from your program. This is a good idea if you can easily identify in your program the point where it switches from allocating small blocks to allocating big blocks. If none of the above helps, I would be interested in getting a copy of your program, to see what is going on in more detail. After all, the behaviour that you are observing might simply be caused by a bug. -- Damien ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 6+ messages in thread
* [Caml-list] GlSurf 1.2 available 2003-01-08 14:20 ` Damien Doligez @ 2003-01-08 22:23 ` Christophe Raffalli 0 siblings, 0 replies; 6+ messages in thread From: Christophe Raffalli @ 2003-01-08 22:23 UTC (permalink / raw) To: caml-list [-- Attachment #1: Type: text/plain, Size: 1794 bytes --] I am pleased to annouce that GlSurf 1.2 is available at <http://www.lama.univ-savoie.fr/~raffalli/glsurf.html> GlSurf is a program to draw surface (implicit surface) using OpenGl. There is now a "wishes list" on the web page ... so if OCaml programmer wants to contribute ! This kind of programming is fun :-) -- Version 1.2: - bug when free variables are used in "surface". - new algorithm to extract triangles from cubes. Between two and three times less triangles with the same cubes ! on test9: | version 1.1 | version 1.2 | ------------------------------------- time | 40.36 | 19.64 | nb tri | ~125000 | ~45000 | ------------------------------------- - simplification of the algorithm (a lot of useless tests removed. - more parameters can be set from the script (see the documentation. - added "p" key to print the eye position. - fixed a display bug when changing the size parameter between two drawings. Version 1.1: - Removed a last minute simplification which was in fact a bug producing wrong topology (visible in examples/test7); - Simplification of the test to divide cubes and the root_in_cube function (a little less triangles in general). - Fixed a bug in root_in_cube (the diagonals were missing). - Fixed typo in examples/test10. Version 1.0: Initial version -- Christophe Raffalli Université de Savoie Batiment Le Chablais, bureau 21 73376 Le Bourget-du-Lac Cedex tél: (33) 4 79 75 81 03 fax: (33) 4 79 75 87 42 mail: Christophe.Raffalli@univ-savoie.fr www: http://www.lama.univ-savoie.fr/~RAFFALLI --------------------------------------------- IMPORTANT: this mail is signed using PGP/MIME At least Enigmail/Mozilla, mutt or evolution can check this signature --------------------------------------------- [-- Attachment #2: Type: application/pgp-signature, Size: 252 bytes --] ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2003-01-09 6:42 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2003-01-03 21:58 [Caml-list] Memory management dominates running time Jörgen Gustavsson 2003-01-04 1:20 ` Norman Ramsey 2003-01-05 11:35 ` Chet Murthy 2003-01-08 12:58 ` Jörgen Gustavsson 2003-01-08 14:20 ` Damien Doligez 2003-01-08 22:23 ` [Caml-list] GlSurf 1.2 available Christophe Raffalli
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox