* [Caml-list] Strategies for finding memory leaks @ 2012-03-21 9:49 Hans Ole Rafaelsen 2012-03-22 15:03 ` Goswin von Brederlow 2012-04-01 19:57 ` Richard W.M. Jones 0 siblings, 2 replies; 16+ messages in thread From: Hans Ole Rafaelsen @ 2012-03-21 9:49 UTC (permalink / raw) To: caml-list [-- Attachment #1: Type: text/plain, Size: 3181 bytes --] Hello, Is there some tools / tricks that can be used to help find memory leaks? I have trouble with an application, that when running for a long time, starts to use a lot of CPU and consume more memory. It starts out by using about 20% CPU (reported by top) and after 24 hours it has increased to 80% usage. Also physical (RES) memory usage goes from 80M to 160M. The workload for the application is the same the whole time. Using OProfile (http://oprofile.sourceforge.net/news/) shows that that most of the time is being spent doing memory management. At startup: CPU: Core 2, speed 2667 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000 samples % image name symbol name 52764 22.3913 vc_client.native mark_slice 33580 14.2502 vc_client.native caml_page_table_lookup 25415 10.7853 vc_client.native sweep_slice 10119 4.2942 vc_client.native caml_fl_allocate 6423 2.7257 [vdso] (tgid:9015 range:0x7fff4256c000-0x7fff4256d000) [vdso] (tgid:9015 range:0x7fff4256c000-0x7fff4256d000) 5233 2.2207 vc_client.native camlLividisvc__Nalbuf_tools__replace_pattern_1033 2759 1.1708 vc_client.native caml_iterate_global_roots 2728 1.1577 vc_client.native caml_modify 2473 1.0495 vc_client.native caml_oldify_one 2204 0.9353 vc_client.native camlLividisvc__Nalbuf_bytestream__search_1047 2183 0.9264 vc_client.native caml_darken 1935 0.8212 vc_client.native caml_stash_backtrace 1843 0.7821 vc_client.native caml_do_roots 1838 0.7800 vc_client.native caml_delete_global_root After ca. 24 hours run: CPU: Core 2, speed 2667 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000 samples % image name symbol name 1137401 56.2697 vc_client.native mark_slice 405598 20.0658 vc_client.native sweep_slice 399832 19.7806 vc_client.native caml_page_table_lookup 10106 0.5000 vc_client.native caml_fl_allocate 3548 0.1755 vc_client.native caml_iterate_global_roots 3397 0.1681 [vdso] (tgid:26129 range:0x7fff747ff000-0x7fff74800000) [vdso] (tgid:26129 range:0x7fff747ff000-0x7fff74800000) 2797 0.1384 vc_client.native camlLividisvc__Nalbuf_tools__replace_ pattern_1033 2307 0.1141 vc_client.native camlLividisvc__Nalbuf_bytestream__sea rch_1047 2005 0.0992 vc_client.native caml_oldify_local_roots 1786 0.0884 vc_client.native caml_gc_stat 1441 0.0713 vc_client.native caml_oldify_one 1163 0.0575 vc_client.native caml_darken 1163 0.0575 vc_client.native caml_fl_merge_block 1032 0.0511 vc_client.native camlHashtbl__find_1093 The application uses several 3rd party libraries, including: LablGTK2, OCamlNet, LWT and others. Is there some clever trick that can by used to track down or get a hint of what is causing this? Thanks, Hans Ole Rafaelsen [-- Attachment #2: Type: text/html, Size: 4083 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Caml-list] Strategies for finding memory leaks 2012-03-21 9:49 [Caml-list] Strategies for finding memory leaks Hans Ole Rafaelsen @ 2012-03-22 15:03 ` Goswin von Brederlow 2012-03-23 9:32 ` Hans Ole Rafaelsen 2012-04-01 19:57 ` Richard W.M. Jones 1 sibling, 1 reply; 16+ messages in thread From: Goswin von Brederlow @ 2012-03-22 15:03 UTC (permalink / raw) To: Hans Ole Rafaelsen; +Cc: caml-list Hans Ole Rafaelsen <hrafaelsen@gmail.com> writes: > Hello, > > Is there some tools / tricks that can be used to help find memory leaks? > > I have trouble with an application, that when running for a long time, starts > to use a lot of CPU and consume more memory. It starts out by using about 20% > CPU (reported by top) and after 24 hours it has increased to 80% usage. Also > physical (RES) memory usage goes from 80M to 160M. The workload for the > application is the same the whole time. > > Using OProfile (http://oprofile.sourceforge.net/news/) shows that that most of > the time is being spent doing memory management. > > At startup: > CPU: Core 2, speed 2667 MHz (estimated) > Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask > of 0x00 (Unhalted core cycles) count 100000 > samples % image name symbol name > 52764 22.3913 vc_client.native mark_slice > 33580 14.2502 vc_client.native caml_page_table_lookup > 25415 10.7853 vc_client.native sweep_slice > 10119 4.2942 vc_client.native caml_fl_allocate > 6423 2.7257 [vdso] (tgid:9015 range:0x7fff4256c000-0x7fff4256d000) [vdso] > (tgid:9015 range:0x7fff4256c000-0x7fff4256d000) > 5233 2.2207 vc_client.native > camlLividisvc__Nalbuf_tools__replace_pattern_1033 > 2759 1.1708 vc_client.native caml_iterate_global_roots > 2728 1.1577 vc_client.native caml_modify > 2473 1.0495 vc_client.native caml_oldify_one > 2204 0.9353 vc_client.native > camlLividisvc__Nalbuf_bytestream__search_1047 > 2183 0.9264 vc_client.native caml_darken > 1935 0.8212 vc_client.native caml_stash_backtrace > 1843 0.7821 vc_client.native caml_do_roots > 1838 0.7800 vc_client.native caml_delete_global_root > > After ca. 24 hours run: > CPU: Core 2, speed 2667 MHz (estimated) > Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask > of 0x00 (Unhalted core cycles) count 100000 > samples % image name symbol name > 1137401 56.2697 vc_client.native mark_slice > 405598 20.0658 vc_client.native sweep_slice > 399832 19.7806 vc_client.native caml_page_table_lookup > 10106 0.5000 vc_client.native caml_fl_allocate > 3548 0.1755 vc_client.native caml_iterate_global_roots > 3397 0.1681 [vdso] (tgid:26129 range:0x7fff747ff000-0x7fff74800000) > [vdso] > (tgid:26129 range:0x7fff747ff000-0x7fff74800000) > 2797 0.1384 vc_client.native > camlLividisvc__Nalbuf_tools__replace_ > pattern_1033 > 2307 0.1141 vc_client.native > camlLividisvc__Nalbuf_bytestream__sea > rch_1047 > 2005 0.0992 vc_client.native caml_oldify_local_roots > 1786 0.0884 vc_client.native caml_gc_stat > 1441 0.0713 vc_client.native caml_oldify_one > 1163 0.0575 vc_client.native caml_darken > 1163 0.0575 vc_client.native caml_fl_merge_block > 1032 0.0511 vc_client.native camlHashtbl__find_1093 > > The application uses several 3rd party libraries, including: LablGTK2, > OCamlNet, LWT and others. > > Is there some clever trick that can by used to track down or get a hint of what > is causing this? > > Thanks, > > Hans Ole Rafaelsen Are you calling the GC manually somewhere in the code or in one of the libs? MfG Goswin ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Caml-list] Strategies for finding memory leaks 2012-03-22 15:03 ` Goswin von Brederlow @ 2012-03-23 9:32 ` Hans Ole Rafaelsen 2012-03-24 14:00 ` Goswin von Brederlow 0 siblings, 1 reply; 16+ messages in thread From: Hans Ole Rafaelsen @ 2012-03-23 9:32 UTC (permalink / raw) To: Goswin von Brederlow; +Cc: caml-list [-- Attachment #1: Type: text/plain, Size: 3822 bytes --] Hi, I have tried to trigger Gc.compact but that only have limited effect for a short time. Hans Ole On Thu, Mar 22, 2012 at 4:03 PM, Goswin von Brederlow <goswin-v-b@web.de>wrote: > Hans Ole Rafaelsen <hrafaelsen@gmail.com> writes: > > > Hello, > > > > Is there some tools / tricks that can be used to help find memory leaks? > > > > I have trouble with an application, that when running for a long time, > starts > > to use a lot of CPU and consume more memory. It starts out by using > about 20% > > CPU (reported by top) and after 24 hours it has increased to 80% usage. > Also > > physical (RES) memory usage goes from 80M to 160M. The workload for the > > application is the same the whole time. > > > > Using OProfile (http://oprofile.sourceforge.net/news/) shows that that > most of > > the time is being spent doing memory management. > > > > At startup: > > CPU: Core 2, speed 2667 MHz (estimated) > > Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a > unit mask > > of 0x00 (Unhalted core cycles) count 100000 > > samples % image name symbol name > > 52764 22.3913 vc_client.native mark_slice > > 33580 14.2502 vc_client.native caml_page_table_lookup > > 25415 10.7853 vc_client.native sweep_slice > > 10119 4.2942 vc_client.native caml_fl_allocate > > 6423 2.7257 [vdso] (tgid:9015 range:0x7fff4256c000-0x7fff4256d000) > [vdso] > > (tgid:9015 range:0x7fff4256c000-0x7fff4256d000) > > 5233 2.2207 vc_client.native > > camlLividisvc__Nalbuf_tools__replace_pattern_1033 > > 2759 1.1708 vc_client.native caml_iterate_global_roots > > 2728 1.1577 vc_client.native caml_modify > > 2473 1.0495 vc_client.native caml_oldify_one > > 2204 0.9353 vc_client.native > > camlLividisvc__Nalbuf_bytestream__search_1047 > > 2183 0.9264 vc_client.native caml_darken > > 1935 0.8212 vc_client.native caml_stash_backtrace > > 1843 0.7821 vc_client.native caml_do_roots > > 1838 0.7800 vc_client.native caml_delete_global_root > > > > After ca. 24 hours run: > > CPU: Core 2, speed 2667 MHz (estimated) > > Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a > unit mask > > of 0x00 (Unhalted core cycles) count 100000 > > samples % image name symbol name > > 1137401 56.2697 vc_client.native mark_slice > > 405598 20.0658 vc_client.native sweep_slice > > 399832 19.7806 vc_client.native caml_page_table_lookup > > 10106 0.5000 vc_client.native caml_fl_allocate > > 3548 0.1755 vc_client.native caml_iterate_global_roots > > 3397 0.1681 [vdso] (tgid:26129 > range:0x7fff747ff000-0x7fff74800000) > > [vdso] > > (tgid:26129 range:0x7fff747ff000-0x7fff74800000) > > 2797 0.1384 vc_client.native > > camlLividisvc__Nalbuf_tools__replace_ > > pattern_1033 > > 2307 0.1141 vc_client.native > > camlLividisvc__Nalbuf_bytestream__sea > > rch_1047 > > 2005 0.0992 vc_client.native caml_oldify_local_roots > > 1786 0.0884 vc_client.native caml_gc_stat > > 1441 0.0713 vc_client.native caml_oldify_one > > 1163 0.0575 vc_client.native caml_darken > > 1163 0.0575 vc_client.native caml_fl_merge_block > > 1032 0.0511 vc_client.native camlHashtbl__find_1093 > > > > The application uses several 3rd party libraries, including: LablGTK2, > > OCamlNet, LWT and others. > > > > Is there some clever trick that can by used to track down or get a hint > of what > > is causing this? > > > > Thanks, > > > > Hans Ole Rafaelsen > > Are you calling the GC manually somewhere in the code or in one of the > libs? > > MfG > Goswin > [-- Attachment #2: Type: text/html, Size: 5173 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Caml-list] Strategies for finding memory leaks 2012-03-23 9:32 ` Hans Ole Rafaelsen @ 2012-03-24 14:00 ` Goswin von Brederlow 0 siblings, 0 replies; 16+ messages in thread From: Goswin von Brederlow @ 2012-03-24 14:00 UTC (permalink / raw) To: caml-list Hans Ole Rafaelsen <hrafaelsen@gmail.com> writes: > Hi, > > I have tried to trigger Gc.compact but that only have limited effect for a > short time. > > Hans Ole My thinking was rather that something triggers the GC manually and over time starts to do that far too often so more and more time is spend in the GC. If it isn't triggered manually then custom types can also give the GC a wrong idea about how much memory is used/wasted. But that is also just an idea of where to look. MfG Goswin ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Caml-list] Strategies for finding memory leaks 2012-03-21 9:49 [Caml-list] Strategies for finding memory leaks Hans Ole Rafaelsen 2012-03-22 15:03 ` Goswin von Brederlow @ 2012-04-01 19:57 ` Richard W.M. Jones 2012-04-02 8:15 ` Hans Ole Rafaelsen 1 sibling, 1 reply; 16+ messages in thread From: Richard W.M. Jones @ 2012-04-01 19:57 UTC (permalink / raw) To: Hans Ole Rafaelsen; +Cc: caml-list On Wed, Mar 21, 2012 at 10:49:22AM +0100, Hans Ole Rafaelsen wrote: > Hello, > > Is there some tools / tricks that can be used to help find memory leaks? [...] > The application uses several 3rd party libraries, including: LablGTK2, > OCamlNet, LWT and others. If you suspect a leak in the *C* part of your program or these libraries, then I am very successfully using valgrind on my OCaml (native) programs to track such leaks. https://github.com/libguestfs/libguestfs/blob/95123a9144edc1ddb3ad67d75276ca3af70eddb8/tests/extra/Makefile.am Note that several of the programs tested are written in OCaml, and see also the 'suppressions' file in the same directory. However this will not find leaks in OCaml code. Although I was pleasantly surprised the other day when valgrind pointed correctly to a use of an uninitialized string in some OCaml code. Rich. -- Richard Jones Red Hat ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Caml-list] Strategies for finding memory leaks 2012-04-01 19:57 ` Richard W.M. Jones @ 2012-04-02 8:15 ` Hans Ole Rafaelsen 2012-04-02 10:13 ` Richard W.M. Jones ` (2 more replies) 0 siblings, 3 replies; 16+ messages in thread From: Hans Ole Rafaelsen @ 2012-04-02 8:15 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: caml-list [-- Attachment #1: Type: text/plain, Size: 1956 bytes --] Hi, We have done some further investigation, and it does not seem to be memory leakage. The application is a video streaming application, where we use external processes to encode and decode the video. By configuring it in different ways, we have found the problem to be in the code that is reading the decoded video from an external process and displaying it using LablGTK2. The previous thought memory leakage seems to be memory used for buffers when the system is under high load, and this is not reclaimed by the OS. When running on low frame rate the application does not use more memory than at startup. However, the application still consumes more and more CPU time. And it seems to happen in the GC. Apart from that, the application seems to be just fine. So it seems to be something in our code (or in LablGTK) that is making the GC spend more and more time. Anyone experienced this kind of problem? Best, Hans Ole Rafaelsen On Sun, Apr 1, 2012 at 9:57 PM, Richard W.M. Jones <rich@annexia.org> wrote: > > On Wed, Mar 21, 2012 at 10:49:22AM +0100, Hans Ole Rafaelsen wrote: > > Hello, > > > > Is there some tools / tricks that can be used to help find memory leaks? > [...] > > The application uses several 3rd party libraries, including: LablGTK2, > > OCamlNet, LWT and others. > > If you suspect a leak in the *C* part of your program or these > libraries, then I am very successfully using valgrind on my OCaml > (native) programs to track such leaks. > > > https://github.com/libguestfs/libguestfs/blob/95123a9144edc1ddb3ad67d75276ca3af70eddb8/tests/extra/Makefile.am > > Note that several of the programs tested are written in OCaml, and see > also the 'suppressions' file in the same directory. > > However this will not find leaks in OCaml code. Although I was > pleasantly surprised the other day when valgrind pointed correctly to > a use of an uninitialized string in some OCaml code. > > Rich. > > -- > Richard Jones > Red Hat > [-- Attachment #2: Type: text/html, Size: 2565 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Caml-list] Strategies for finding memory leaks 2012-04-02 8:15 ` Hans Ole Rafaelsen @ 2012-04-02 10:13 ` Richard W.M. Jones 2012-04-02 13:40 ` Hans Ole Rafaelsen 2012-04-02 11:26 ` John Carr 2012-04-03 10:42 ` Gerd Stolpmann 2 siblings, 1 reply; 16+ messages in thread From: Richard W.M. Jones @ 2012-04-02 10:13 UTC (permalink / raw) To: Hans Ole Rafaelsen; +Cc: caml-list On Mon, Apr 02, 2012 at 10:15:02AM +0200, Hans Ole Rafaelsen wrote: > However, the application still consumes more and more CPU time. And it > seems to happen in the GC. Apart from that, the application seems to be > just fine. So it seems to be something in our code (or in LablGTK) that is > making the GC spend more and more time. Anyone experienced this kind of > problem? You're not swapping are you? (Run 'vmstat 1' & look at the si/so columns) Rich. -- Richard Jones Red Hat ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Caml-list] Strategies for finding memory leaks 2012-04-02 10:13 ` Richard W.M. Jones @ 2012-04-02 13:40 ` Hans Ole Rafaelsen 0 siblings, 0 replies; 16+ messages in thread From: Hans Ole Rafaelsen @ 2012-04-02 13:40 UTC (permalink / raw) To: Richard W.M. Jones; +Cc: caml-list [-- Attachment #1: Type: text/plain, Size: 672 bytes --] Just did another run to be sure. It does not do any swapping. -- Hans Ole On Mon, Apr 2, 2012 at 12:13 PM, Richard W.M. Jones <rich@annexia.org>wrote: > > On Mon, Apr 02, 2012 at 10:15:02AM +0200, Hans Ole Rafaelsen wrote: > > However, the application still consumes more and more CPU time. And it > > seems to happen in the GC. Apart from that, the application seems to be > > just fine. So it seems to be something in our code (or in LablGTK) that > is > > making the GC spend more and more time. Anyone experienced this kind of > > problem? > > You're not swapping are you? (Run 'vmstat 1' & look at the si/so columns) > > Rich. > > -- > Richard Jones > Red Hat > [-- Attachment #2: Type: text/html, Size: 1057 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Caml-list] Strategies for finding memory leaks 2012-04-02 8:15 ` Hans Ole Rafaelsen 2012-04-02 10:13 ` Richard W.M. Jones @ 2012-04-02 11:26 ` John Carr 2012-04-03 10:42 ` Gerd Stolpmann 2 siblings, 0 replies; 16+ messages in thread From: John Carr @ 2012-04-02 11:26 UTC (permalink / raw) To: Hans Ole Rafaelsen; +Cc: caml-list Cache misses count as CPU time. I've seen people measure parallel speedup by looking at CPU usage. They get linear increase in CPU use to eight threads even though the task saturated memory bandwidth after the second thread. If you have a way to measure instruction count you can compare increase in instruction count in GC to increase in time in GC. If instructions are constant while time increases you probably have a memory access problem. If instructions increase the structure of references within your data may be forcing the collector to do more work. > However, the application still consumes more and more CPU time. And it > seems to happen in the GC. Apart from that, the application seems to > be just fine. So it seems to be something in our code (or in LablGTK) > that is making the GC spend more and more time. Anyone experienced > this kind of problem? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Caml-list] Strategies for finding memory leaks 2012-04-02 8:15 ` Hans Ole Rafaelsen 2012-04-02 10:13 ` Richard W.M. Jones 2012-04-02 11:26 ` John Carr @ 2012-04-03 10:42 ` Gerd Stolpmann 2012-04-03 12:13 ` Jerome Vouillon 2 siblings, 1 reply; 16+ messages in thread From: Gerd Stolpmann @ 2012-04-03 10:42 UTC (permalink / raw) To: Hans Ole Rafaelsen; +Cc: Richard W.M. Jones, caml-list > Hi, > > We have done some further investigation, and it does not seem to be memory > leakage. The application is a video streaming application, where we use > external processes to encode and decode the video. By configuring it in > different ways, we have found the problem to be in the code that is > reading > the decoded video from an external process and displaying it using > LablGTK2. The previous thought memory leakage seems to be memory used for > buffers when the system is under high load, and this is not reclaimed by > the OS. When running on low frame rate the application does not use more > memory than at startup. > > However, the application still consumes more and more CPU time. And it > seems to happen in the GC. Apart from that, the application seems to be > just fine. So it seems to be something in our code (or in LablGTK) that is > making the GC spend more and more time. Anyone experienced this kind of > problem? This reminds me of a problem I had with a specific C binding (for mysql), years ago. That binding allocated custom blocks with badly chosen parameters used/max (see the docs for caml_alloc_custom in http://caml.inria.fr/pub/docs/manual-ocaml/manual032.html#toc144). If the ratio used/max is > 0, these parameters accelerate the GC. If the custom blocks are frequently allocated, this can have a dramatic effect, even for quite small used/max ratios. The solution was to change the code, and to set used=0 and max=1. This type of problem would match your observation that the GC works more and more the longer the program runs, i.e. the more custom blocks have been allocated. The problem basically also exists with bigarrays - with used=<size_of_bigarary> and max=256M (hardcoded). Gerd > > Best, > > Hans Ole Rafaelsen > > On Sun, Apr 1, 2012 at 9:57 PM, Richard W.M. Jones <rich@annexia.org> > wrote: > >> >> On Wed, Mar 21, 2012 at 10:49:22AM +0100, Hans Ole Rafaelsen wrote: >> > Hello, >> > >> > Is there some tools / tricks that can be used to help find memory >> leaks? >> [...] >> > The application uses several 3rd party libraries, including: LablGTK2, >> > OCamlNet, LWT and others. >> >> If you suspect a leak in the *C* part of your program or these >> libraries, then I am very successfully using valgrind on my OCaml >> (native) programs to track such leaks. >> >> >> https://github.com/libguestfs/libguestfs/blob/95123a9144edc1ddb3ad67d75276ca3af70eddb8/tests/extra/Makefile.am >> >> Note that several of the programs tested are written in OCaml, and see >> also the 'suppressions' file in the same directory. >> >> However this will not find leaks in OCaml code. Although I was >> pleasantly surprised the other day when valgrind pointed correctly to >> a use of an uninitialized string in some OCaml code. >> >> Rich. >> >> -- >> Richard Jones >> Red Hat >> > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa-roc.inria.fr/wws/info/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > > -- Gerd Stolpmann, Darmstadt, Germany gerd@gerd-stolpmann.de Creator of GODI and camlcity.org. Contact details: http://www.camlcity.org/contact.html Company homepage: http://www.gerd-stolpmann.de *** Searching for new projects! Need consulting for system *** programming in Ocaml? Gerd Stolpmann can help you. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Caml-list] Strategies for finding memory leaks 2012-04-03 10:42 ` Gerd Stolpmann @ 2012-04-03 12:13 ` Jerome Vouillon 2012-04-04 10:53 ` Hans Ole Rafaelsen 0 siblings, 1 reply; 16+ messages in thread From: Jerome Vouillon @ 2012-04-03 12:13 UTC (permalink / raw) To: Gerd Stolpmann; +Cc: Hans Ole Rafaelsen, Richard W.M. Jones, caml-list On Tue, Apr 03, 2012 at 12:42:08PM +0200, Gerd Stolpmann wrote: > This reminds me of a problem I had with a specific C binding (for mysql), > years ago. That binding allocated custom blocks with badly chosen > parameters used/max (see the docs for caml_alloc_custom in > http://caml.inria.fr/pub/docs/manual-ocaml/manual032.html#toc144). If the > ratio used/max is > 0, these parameters accelerate the GC. If the custom > blocks are frequently allocated, this can have a dramatic effect, even for > quite small used/max ratios. The solution was to change the code, and to > set used=0 and max=1. > > This type of problem would match your observation that the GC works more > and more the longer the program runs, i.e. the more custom blocks have > been allocated. > > The problem basically also exists with bigarrays - with > used=<size_of_bigarary> and max=256M (hardcoded). I have also observed this with input-output channels (in_channel and out_channel), where used = 1 and max = 1000. A full major GC is performed every time a thousand files are opened, which can result on a significant overhead when you open lot of files and the heap is large. -- Jerome ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Caml-list] Strategies for finding memory leaks 2012-04-03 12:13 ` Jerome Vouillon @ 2012-04-04 10:53 ` Hans Ole Rafaelsen 2012-04-04 11:30 ` Gabriel Scherer 0 siblings, 1 reply; 16+ messages in thread From: Hans Ole Rafaelsen @ 2012-04-04 10:53 UTC (permalink / raw) To: Jerome Vouillon; +Cc: Gerd Stolpmann, Richard W.M. Jones, caml-list [-- Attachment #1.1: Type: text/plain, Size: 5561 bytes --] Hi, Thanks for your suggestions. I tried to patch lablgtk2 with: --- src/ml_gdkpixbuf.c.orig 2012-04-03 13:56:29.618264702 +0200 +++ src/ml_gdkpixbuf.c 2012-04-03 13:56:58.106263510 +0200 @@ -119,7 +119,7 @@ value ret; if (pb == NULL) ml_raise_null_pointer(); ret = alloc_custom (&ml_custom_GdkPixbuf, sizeof pb, - 100, 1000); + 0, 1); p = Data_custom_val (ret); *p = ref ? g_object_ref (pb) : pb; return ret; --- src/ml_gobject.c.orig 2012-04-03 15:40:11.002004506 +0200 +++ src/ml_gobject.c 2012-04-03 15:41:04.938002250 +0200 @@ -219,7 +219,7 @@ CAMLprim value ml_g_value_new(void) { value ret = alloc_custom(&ml_custom_GValue, sizeof(value)+sizeof(GValue), - 20, 1000); + 0, 1); /* create an MLPointer */ Field(ret,1) = (value)2; ((GValue*)&Field(ret,2))->g_type = 0; @@ -272,14 +272,14 @@ custom_serialize_default, custom_deserialize_default }; CAMLprim value Val_gboxed(GType t, gpointer p) { - value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 10, 1000); + value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 0, 1); Store_pointer(ret, g_boxed_copy (t,p)); Field(ret,2) = (value)t; return ret; } CAMLprim value Val_gboxed_new(GType t, gpointer p) { - value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 10, 1000); + value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 0, 1); Store_pointer(ret, p); Field(ret,2) = (value)t; return ret; At startup is uses top - 16:40:27 up 1 day, 7:01, 28 users, load average: 0.47, 0.50, 0.35 Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie Cpu(s): 4.8%us, 1.3%sy, 0.0%ni, 93.6%id, 0.2%wa, 0.0%hi, 0.1%si, 0.0%st Mem: 4004736k total, 3617960k used, 386776k free, 130704k buffers Swap: 4070396k total, 9244k used, 4061152k free, 1730344k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 10275 hans 20 0 529m 77m 13m S 14 2.0 0:01.66 vc_client.nativ and 12 hours later top - 04:40:07 up 1 day, 19:01, 35 users, load average: 0.00, 0.01, 0.05 Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie Cpu(s): 20.2%us, 3.4%sy, 0.0%ni, 76.1%id, 0.1%wa, 0.0%hi, 0.2%si, 0.0%st Mem: 4004736k total, 3828308k used, 176428k free, 143928k buffers Swap: 4070396k total, 10708k used, 4059688k free, 1756524k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 10275 hans 20 0 534m 82m 13m S 17 2.1 110:11.19 vc_client.nativ Without the patch top - 22:05:38 up 1 day, 12:26, 34 users, load average: 0.35, 0.16, 0.13 Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie Cpu(s): 5.6%us, 1.5%sy, 0.0%ni, 92.6%id, 0.2%wa, 0.0%hi, 0.1%si, 0.0%st Mem: 4004736k total, 3868136k used, 136600k free, 140900k buffers Swap: 4070396k total, 9680k used, 4060716k free, 1837500k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 25111 hans 20 0 453m 76m 13m S 14 2.0 0:13.68 vc_client_old.n top - 10:05:19 up 2 days, 26 min, 35 users, load average: 0.01, 0.04, 0.05 Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie Cpu(s): 20.4%us, 3.2%sy, 0.0%ni, 75.8%id, 0.4%wa, 0.0%hi, 0.2%si, 0.0%st Mem: 4004736k total, 3830596k used, 174140k free, 261692k buffers Swap: 4070396k total, 13640k used, 4056756k free, 1640452k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 25111 hans 20 0 453m 76m 13m S 49 2.0 263:05.34 vc_client_old.n So from this it seems that with the patch it still uses more and more CPU, but at a much lower rate. However, it seems to increase memory usage with the patch. I also tried to patch the wrappers.h file, but the memory consumption just exploded. So it is working better, but still not good enough. Is there some way to prevent this kind of behavior? That is, no extra memory usage and no extra CPU usage. I have attached some additional profiling if that would be of any interest. In short it seems to be that it is the GC that is consuming the CPU. Best, Hans Ole On Tue, Apr 3, 2012 at 2:13 PM, Jerome Vouillon <vouillon@pps.jussieu.fr>wrote: > On Tue, Apr 03, 2012 at 12:42:08PM +0200, Gerd Stolpmann wrote: > > This reminds me of a problem I had with a specific C binding (for mysql), > > years ago. That binding allocated custom blocks with badly chosen > > parameters used/max (see the docs for caml_alloc_custom in > > http://caml.inria.fr/pub/docs/manual-ocaml/manual032.html#toc144). If > the > > ratio used/max is > 0, these parameters accelerate the GC. If the custom > > blocks are frequently allocated, this can have a dramatic effect, even > for > > quite small used/max ratios. The solution was to change the code, and to > > set used=0 and max=1. > > > > This type of problem would match your observation that the GC works more > > and more the longer the program runs, i.e. the more custom blocks have > > been allocated. > > > > The problem basically also exists with bigarrays - with > > used=<size_of_bigarary> and max=256M (hardcoded). > > I have also observed this with input-output channels (in_channel and > out_channel), where used = 1 and max = 1000. A full major GC is > performed every time a thousand files are opened, which can result on > a significant overhead when you open lot of files and the heap is > large. > > -- Jerome > [-- Attachment #1.2: Type: text/html, Size: 6515 bytes --] [-- Attachment #2: MemoryProfiling.txt --] [-- Type: text/plain, Size: 13629 bytes --] vc_client.native has run for about 19 hours vc_client_old.native has for about 14 hours Tasks: 272 total, 3 running, 269 sleeping, 0 stopped, 0 zombie Cpu(s): 20.2%us, 3.2%sy, 0.0%ni, 76.3%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st Mem: 4004736k total, 3633672k used, 371064k free, 262916k buffers Swap: 4070396k total, 14160k used, 4056236k free, 1417088k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 25111 hans 20 0 453m 76m 13m R 50 2.0 319:04.86 vc_client_old.n 10275 hans 20 0 553m 99m 13m S 19 2.5 188:30.50 vc_client.nativ vc_client_old.native CPU: Intel Sandy Bridge microarchitecture, speed 2.201e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000 samples % image name symbol name 230093 51.8571 vc_client_old.native mark_slice 99659 22.4606 vc_client_old.native sweep_slice 76563 17.2553 vc_client_old.native caml_page_table_lookup 13215 2.9783 vc_client_old.native caml_fl_merge_block 1814 0.4088 vc_client_old.native truncate_flp 1013 0.2283 vc_client_old.native caml_gc_stat 784 0.1767 vc_client_old.native caml_iterate_global_roots 775 0.1747 vc_client_old.native caml_fl_allocate 564 0.1271 vc_client_old.native camlLividisvc__Nalbuf_tools__replace_pattern_1033 545 0.1228 vc_client_old.native caml_oldify_local_roots 503 0.1134 vc_client_old.native caml_modify 453 0.1021 vc_client_old.native camlLividisvc__Nalbuf_bytestream__search_1047 446 0.1005 vc_client_old.native caml_delete_global_root 412 0.0929 vc_client_old.native caml_do_roots 343 0.0773 vc_client_old.native caml_darken 306 0.0690 vc_client_old.native lwt_glib_get_sources 297 0.0669 vc_client_old.native caml_insert_global_root 250 0.0563 vc_client_old.native camlUq_gtk__fun_1484 213 0.0480 vc_client_old.native camlHashtbl__find_1093 205 0.0462 vc_client_old.native compare_val 195 0.0439 vc_client_old.native camlLwt_glib__enter_1079 191 0.0430 [vdso] (tgid:25111 range:0x7fff12dff000-0x7fff12e00000) [vdso] (tgid:25111 range:0x7fff12dff000-0x7fff12e00000) 190 0.0428 vc_client_old.native caml_stash_backtrace 188 0.0424 vc_client_old.native caml_oldify_one 185 0.0417 vc_client_old.native camlHashtbl__replace_1112 179 0.0403 vc_client_old.native caml_alloc_string 176 0.0397 vc_client_old.native camlMedia_ctrl__fun_3055 176 0.0397 vc_client_old.native caml_c_call 173 0.0390 vc_client_old.native camlList__iter_1074 171 0.0385 vc_client_old.native caml_apply2 164 0.0370 vc_client_old.native caml_obj_dup 159 0.0358 vc_client_old.native caml_send0 vc_client.native CPU: Intel Sandy Bridge microarchitecture, speed 2.201e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000 samples % image name symbol name 50295 45.3259 vc_client.native mark_slice 17964 16.1892 vc_client.native sweep_slice 14450 13.0224 vc_client.native caml_page_table_lookup 2137 1.9259 vc_client.native caml_fl_merge_block 1397 1.2590 vc_client.native caml_fl_allocate 1305 1.1761 vc_client.native caml_gc_stat 599 0.5398 vc_client.native camlLividisvc__Nalbuf_tools__replace_pattern_1033 546 0.4921 vc_client.native camlLividisvc__Nalbuf_bytestream__search_1047 544 0.4903 vc_client.native caml_modify 540 0.4866 vc_client.native caml_delete_global_root 422 0.3803 vc_client.native caml_iterate_global_roots 404 0.3641 vc_client.native truncate_flp 339 0.3055 vc_client.native camlUq_gtk__fun_1484 327 0.2947 vc_client.native camlHashtbl__find_1093 311 0.2803 vc_client.native caml_insert_global_root 290 0.2613 vc_client.native lwt_glib_get_sources 284 0.2559 vc_client.native caml_oldify_local_roots 260 0.2343 vc_client.native camlMedia_ctrl__fun_3055 260 0.2343 vc_client.native camlUnixqueue_select__fun_2366 248 0.2235 vc_client.native compare_val 239 0.2154 vc_client.native caml_stash_backtrace 220 0.1983 [vdso] (tgid:10275 range:0x7fffea534000-0x7fffea535000) [vdso] (tgid:10275 range:0x7fffea534000-0x7fffea535000) 219 0.1974 vc_client.native camlLwt_glib__enter_1079 206 0.1856 vc_client.native camlList__iter_1074 202 0.1820 vc_client.native caml_apply2 196 0.1766 vc_client.native caml_c_call 194 0.1748 vc_client.native caml_send0 180 0.1622 vc_client.native camlHashtbl__replace_1112 180 0.1622 vc_client.native caml_alloc_string 176 0.1586 vc_client.native caml_obj_dup 176 0.1586 vc_client.native caml_oldify_one 173 0.1559 vc_client.native caml_remove_generational_global_root 171 0.1541 vc_client.native caml_curry3_2 vc_client.native CPU: Intel Sandy Bridge microarchitecture, speed 2.201e+06 MHz (estimated) Counted LLC_MISSES events (Last level cache demand requests from this core that missed the LLC) with a unit mask of 0x41 (No unit mask) count 100000 samples % image name symbol name 598 53.3452 vc_client.native mark_slice 261 23.2828 vc_client.native caml_page_table_lookup 23 2.0517 vc_client.native sweep_slice 13 1.1597 vc_client.native caml_fl_allocate 6 0.5352 vc_client.native camlUq_gtk__fun_1484 5 0.4460 vc_client.native caml_delete_global_root 4 0.3568 vc_client.native camlEqueue__run_1119 4 0.3568 vc_client.native camlHashtbl__find_1093 4 0.3568 vc_client.native camlLividisvc__Nalbuf_bytestream__search_1047 4 0.3568 vc_client.native camlMedia_ctrl__fun_3055 4 0.3568 vc_client.native camlUnixqueue_select__method_run_1662 4 0.3568 vc_client.native caml_callback_exn 4 0.3568 vc_client.native caml_iterate_global_roots 4 0.3568 vc_client.native caml_modify 3 0.2676 vc_client.native camlAudio_encoder__transfer_audio_frame_1052 3 0.2676 vc_client.native camlBitstring_structs__string_of_rtp_ext_1030 3 0.2676 vc_client.native camlHashtbl__replace_1112 3 0.2676 vc_client.native camlLwt_glib__leave_1085 3 0.2676 vc_client.native camlSequencer__serialize_header_2231 3 0.2676 vc_client.native camlUnixqueue_select__fun_2366 3 0.2676 vc_client.native camlUq_gtk__code_begin 3 0.2676 vc_client.native caml_apply2 3 0.2676 vc_client.native caml_curry3_2 3 0.2676 vc_client.native caml_gc_stat 3 0.2676 vc_client.native caml_hash_univ_param 3 0.2676 vc_client.native caml_send0 3 0.2676 vc_client.native compare_val 2 0.1784 vc_client.native camlBuffer__create_1039 2 0.1784 vc_client.native camlList__iter_1074 2 0.1784 vc_client.native camlLwt__wait_1290 2 0.1784 vc_client.native camlLwt_sequence__loop_1098 2 0.1784 vc_client.native camlMedia_ctrl__fun_3085 2 0.1784 vc_client.native camlMedia_ctrl__receiver_cbfun_2123 2 0.1784 vc_client.native camlPervasives__min_1022 2 0.1784 vc_client.native camlProfiles__fec_percentage_1232 2 0.1784 vc_client.native camlScanf__scan_1702 2 0.1784 vc_client.native camlScanf__scan_format_1685 2 0.1784 vc_client.native camlSequencer__my_push_2260 2 0.1784 vc_client.native camlUnixqueue_select__fun_1864 2 0.1784 vc_client.native camlUnixqueue_select__method_protect_1800 2 0.1784 vc_client.native camlUnixqueue_select__method_uq_handler_1726 2 0.1784 vc_client.native caml_curry4_1 2 0.1784 vc_client.native caml_insert_global_root vc_client_old.native CPU: Intel Sandy Bridge microarchitecture, speed 2.201e+06 MHz (estimated) Counted LLC_MISSES events (Last level cache demand requests from this core that missed the LLC) with a unit mask of 0x41 (No unit mask) count 100000 samples % symbol name 2411 65.8563 mark_slice 923 25.2117 caml_page_table_lookup 101 2.7588 sweep_slice 10 0.2731 caml_iterate_global_roots 8 0.2185 caml_oldify_local_roots 7 0.1912 caml_fl_allocate 7 0.1912 caml_fl_merge_block 5 0.1366 camlHashtbl__replace_1112 5 0.1366 camlLividisvc__Nalbuf_bytestream__search_1047 5 0.1366 camlUq_gtk__fun_1484 5 0.1366 caml_delete_global_root 4 0.1093 camlLwt_glib__enter_1079 4 0.1093 caml_darken 4 0.1093 caml_send2 3 0.0819 camlBitstring___add_bits_1532 3 0.0819 camlEqueue__run_1119 3 0.0819 camlLwt_sequence__loop_1098 3 0.0819 camlSubstream_seqnum__code_begin 3 0.0819 caml_alloc_string 3 0.0819 compare_val 3 0.0819 lwt_glib_get_sources 2 0.0546 camlBitstring__add_byte_1515 2 0.0546 camlBuffer__create_1039 2 0.0546 camlGObj__method_draw_3325 2 0.0546 camlHashtbl__find_1093 2 0.0546 camlHashtbl__find_1198 2 0.0546 camlHashtbl__iter_1129 2 0.0546 camlList__iter_1074 2 0.0546 camlLividicore__Filter__eval_tree_1110 2 0.0546 camlMcsvc_types__add_header_1175 2 0.0546 camlMedia_ctrl__fun_3085 2 0.0546 camlPervasives__$5e_1112 2 0.0546 camlSubstream_seqnum__my_push_1197 2 0.0546 camlUnixqueue_select__fun_2314 2 0.0546 camlUnixqueue_util__dlogr_1060 2 0.0546 caml_apply2 2 0.0546 caml_apply3 2 0.0546 caml_c_call 2 0.0546 caml_curry3_1 2 0.0546 caml_do_roots 2 0.0546 caml_gc_stat 2 0.0546 caml_hash_univ_param 2 0.0546 caml_modify 2 0.0546 fdlist_to_fdset at startup it looks like this: top - 12:16:03 up 2 days, 2:37, 35 users, load average: 0.02, 0.02, 0.05 Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie Cpu(s): 19.7%us, 4.1%sy, 0.0%ni, 75.7%id, 0.2%wa, 0.0%hi, 0.3%si, 0.0%st Mem: 4004736k total, 3634484k used, 370252k free, 263976k buffers Swap: 4070396k total, 14160k used, 4056236k free, 1424216k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 12906 hans 20 0 453m 76m 13m S 14 2.0 0:14.62 vc_client_old.n CPU: Intel Sandy Bridge microarchitecture, speed 2.201e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000 samples % image name symbol name 12046 25.2045 vc_client_old.native mark_slice 8590 17.9733 vc_client_old.native caml_page_table_lookup 6311 13.2049 vc_client_old.native sweep_slice 718 1.5023 vc_client_old.native camlLividisvc__Nalbuf_tools__replace_pattern_1033 633 1.3245 vc_client_old.native caml_fl_allocate 536 1.1215 vc_client_old.native caml_modify 505 1.0566 vc_client_old.native caml_do_roots 484 1.0127 vc_client_old.native camlLividisvc__Nalbuf_bytestream__search_1047 483 1.0106 vc_client_old.native caml_iterate_global_roots 471 0.9855 vc_client_old.native caml_darken 412 0.8621 vc_client_old.native caml_delete_global_root 371 0.7763 vc_client_old.native caml_oldify_local_roots 323 0.6758 vc_client_old.native caml_insert_global_root 304 0.6361 vc_client_old.native caml_fl_merge_block 291 0.6089 vc_client_old.native lwt_glib_get_sources 266 0.5566 vc_client_old.native camlHashtbl__find_1093 216 0.4519 [vdso] (tgid:12906 range:0x7fffba5fc000-0x7fffba5fd000) [vdso] (tgid:12906 range:0x7fffba5fc000-0x7fffba5fd000) 216 0.4519 vc_client_old.native compare_val 215 0.4499 vc_client_old.native camlLwt_glib__enter_1079 205 0.4289 vc_client_old.native camlUq_gtk__fun_1484 202 0.4227 vc_client_old.native caml_alloc_string 200 0.4185 vc_client_old.native caml_c_call 193 0.4038 vc_client_old.native caml_oldify_one 192 0.4017 vc_client_old.native caml_stash_backtrace 187 0.3913 vc_client_old.native camlMedia_ctrl__fun_3055 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Caml-list] Strategies for finding memory leaks 2012-04-04 10:53 ` Hans Ole Rafaelsen @ 2012-04-04 11:30 ` Gabriel Scherer 2012-04-04 12:55 ` Gerd Stolpmann 2012-04-04 13:55 ` [Caml-list] GC speed for custom blocks, was: " Gerd Stolpmann 0 siblings, 2 replies; 16+ messages in thread From: Gabriel Scherer @ 2012-04-04 11:30 UTC (permalink / raw) To: Hans Ole Rafaelsen; +Cc: caml-list May your program leak one of those GTK resources? The effectiveness of your patch seems to indicate that you have a lot of one of these values allocated (and that they were requesting the GC much too frequently). The patch solves the CPU usage induced by additional GC, but does not change the reason why those GC were launched: apparently your code allocates a lot of those resources. If there indeed is a leak in your program, it will use more and more memory even if you fix the CPU-usage effect. An interesting side-effect of your patch is that you could, by selectively disabling some of the change you made (eg. by changing Val_g_boxed but not Val_g_boxed_new), isolate which of those resources were provoking the increased CPU usage, because it was allocated in high number. (Usual candidates that provoke leak are global data structures that store references to your data. A closure will also reference the data corresponding to the variables it captures, so storing closures in such tables can be an indirect cause for "leaks". Do you have global tables of callbacks or values for GTK-land?) On Wed, Apr 4, 2012 at 12:53 PM, Hans Ole Rafaelsen <hrafaelsen@gmail.com> wrote: > Hi, > > Thanks for your suggestions. I tried to patch lablgtk2 with: > > --- src/ml_gdkpixbuf.c.orig 2012-04-03 13:56:29.618264702 +0200 > +++ src/ml_gdkpixbuf.c 2012-04-03 13:56:58.106263510 +0200 > @@ -119,7 +119,7 @@ > value ret; > if (pb == NULL) ml_raise_null_pointer(); > ret = alloc_custom (&ml_custom_GdkPixbuf, sizeof pb, > - 100, 1000); > + 0, 1); > p = Data_custom_val (ret); > *p = ref ? g_object_ref (pb) : pb; > return ret; > > --- src/ml_gobject.c.orig 2012-04-03 15:40:11.002004506 +0200 > +++ src/ml_gobject.c 2012-04-03 15:41:04.938002250 +0200 > @@ -219,7 +219,7 @@ > CAMLprim value ml_g_value_new(void) > { > value ret = alloc_custom(&ml_custom_GValue, > sizeof(value)+sizeof(GValue), > - 20, 1000); > + 0, 1); > /* create an MLPointer */ > Field(ret,1) = (value)2; > ((GValue*)&Field(ret,2))->g_type = 0; > @@ -272,14 +272,14 @@ > custom_serialize_default, custom_deserialize_default }; > CAMLprim value Val_gboxed(GType t, gpointer p) > { > - value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 10, 1000); > + value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 0, 1); > Store_pointer(ret, g_boxed_copy (t,p)); > Field(ret,2) = (value)t; > return ret; > } > CAMLprim value Val_gboxed_new(GType t, gpointer p) > { > - value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 10, 1000); > + value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 0, 1); > Store_pointer(ret, p); > Field(ret,2) = (value)t; > return ret; > > > > At startup is uses > top - 16:40:27 up 1 day, 7:01, 28 users, load average: 0.47, 0.50, 0.35 > Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie > Cpu(s): 4.8%us, 1.3%sy, 0.0%ni, 93.6%id, 0.2%wa, 0.0%hi, 0.1%si, > 0.0%st > Mem: 4004736k total, 3617960k used, 386776k free, 130704k buffers > Swap: 4070396k total, 9244k used, 4061152k free, 1730344k cached > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ > COMMAND > 10275 hans 20 0 529m 77m 13m S 14 2.0 0:01.66 > vc_client.nativ > > and 12 hours later > top - 04:40:07 up 1 day, 19:01, 35 users, load average: 0.00, 0.01, 0.05 > Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie > Cpu(s): 20.2%us, 3.4%sy, 0.0%ni, 76.1%id, 0.1%wa, 0.0%hi, 0.2%si, > 0.0%st > Mem: 4004736k total, 3828308k used, 176428k free, 143928k buffers > Swap: 4070396k total, 10708k used, 4059688k free, 1756524k cached > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ > COMMAND > 10275 hans 20 0 534m 82m 13m S 17 2.1 110:11.19 > vc_client.nativ > > Without the patch > top - 22:05:38 up 1 day, 12:26, 34 users, load average: 0.35, 0.16, 0.13 > Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie > Cpu(s): 5.6%us, 1.5%sy, 0.0%ni, 92.6%id, 0.2%wa, 0.0%hi, 0.1%si, > 0.0%st > Mem: 4004736k total, 3868136k used, 136600k free, 140900k buffers > Swap: 4070396k total, 9680k used, 4060716k free, 1837500k cached > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ > COMMAND > 25111 hans 20 0 453m 76m 13m S 14 2.0 0:13.68 vc_client_old.n > > top - 10:05:19 up 2 days, 26 min, 35 users, load average: 0.01, 0.04, 0.05 > Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie > Cpu(s): 20.4%us, 3.2%sy, 0.0%ni, 75.8%id, 0.4%wa, 0.0%hi, 0.2%si, > 0.0%st > Mem: 4004736k total, 3830596k used, 174140k free, 261692k buffers > Swap: 4070396k total, 13640k used, 4056756k free, 1640452k cached > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ > COMMAND > 25111 hans 20 0 453m 76m 13m S 49 2.0 263:05.34 > vc_client_old.n > > So from this it seems that with the patch it still uses more and more CPU, > but at a much lower rate. However, it seems to increase memory usage with > the patch. I also tried to patch the wrappers.h file, but the memory > consumption just exploded. > > So it is working better, but still not good enough. Is there some way to > prevent this kind of behavior? That is, no extra memory usage and no extra > CPU usage. > > I have attached some additional profiling if that would be of any interest. > In short it seems to be that it is the GC that is consuming the CPU. > > Best, > > Hans Ole > > > On Tue, Apr 3, 2012 at 2:13 PM, Jerome Vouillon <vouillon@pps.jussieu.fr> > wrote: >> >> On Tue, Apr 03, 2012 at 12:42:08PM +0200, Gerd Stolpmann wrote: >> > This reminds me of a problem I had with a specific C binding (for >> > mysql), >> > years ago. That binding allocated custom blocks with badly chosen >> > parameters used/max (see the docs for caml_alloc_custom in >> > http://caml.inria.fr/pub/docs/manual-ocaml/manual032.html#toc144). If >> > the >> > ratio used/max is > 0, these parameters accelerate the GC. If the custom >> > blocks are frequently allocated, this can have a dramatic effect, even >> > for >> > quite small used/max ratios. The solution was to change the code, and to >> > set used=0 and max=1. >> > >> > This type of problem would match your observation that the GC works more >> > and more the longer the program runs, i.e. the more custom blocks have >> > been allocated. >> > >> > The problem basically also exists with bigarrays - with >> > used=<size_of_bigarary> and max=256M (hardcoded). >> >> I have also observed this with input-output channels (in_channel and >> out_channel), where used = 1 and max = 1000. A full major GC is >> performed every time a thousand files are opened, which can result on >> a significant overhead when you open lot of files and the heap is >> large. >> >> -- Jerome > > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Caml-list] Strategies for finding memory leaks 2012-04-04 11:30 ` Gabriel Scherer @ 2012-04-04 12:55 ` Gerd Stolpmann 2012-04-07 13:27 ` Hans Ole Rafaelsen 2012-04-04 13:55 ` [Caml-list] GC speed for custom blocks, was: " Gerd Stolpmann 1 sibling, 1 reply; 16+ messages in thread From: Gerd Stolpmann @ 2012-04-04 12:55 UTC (permalink / raw) To: Gabriel Scherer; +Cc: Hans Ole Rafaelsen, caml-list Am Mittwoch, den 04.04.2012, 13:30 +0200 schrieb Gabriel Scherer: > May your program leak one of those GTK resources? > > The effectiveness of your patch seems to indicate that you have a lot > of one of these values allocated (and that they were requesting the GC > much too frequently). The patch solves the CPU usage induced by > additional GC, but does not change the reason why those GC were > launched: apparently your code allocates a lot of those resources. If > there indeed is a leak in your program, it will use more and more > memory even if you fix the CPU-usage effect. > > An interesting side-effect of your patch is that you could, by > selectively disabling some of the change you made (eg. by changing > Val_g_boxed but not Val_g_boxed_new), isolate which of those resources > were provoking the increased CPU usage, because it was allocated in > high number. Or just increment a counter for each type. Gerd > (Usual candidates that provoke leak are global data structures that > store references to your data. A closure will also reference the data > corresponding to the variables it captures, so storing closures in > such tables can be an indirect cause for "leaks". Do you have global > tables of callbacks or values for GTK-land?) > > On Wed, Apr 4, 2012 at 12:53 PM, Hans Ole Rafaelsen > <hrafaelsen@gmail.com> wrote: > > Hi, > > > > Thanks for your suggestions. I tried to patch lablgtk2 with: > > > > --- src/ml_gdkpixbuf.c.orig 2012-04-03 13:56:29.618264702 +0200 > > +++ src/ml_gdkpixbuf.c 2012-04-03 13:56:58.106263510 +0200 > > @@ -119,7 +119,7 @@ > > value ret; > > if (pb == NULL) ml_raise_null_pointer(); > > ret = alloc_custom (&ml_custom_GdkPixbuf, sizeof pb, > > - 100, 1000); > > + 0, 1); > > p = Data_custom_val (ret); > > *p = ref ? g_object_ref (pb) : pb; > > return ret; > > > > --- src/ml_gobject.c.orig 2012-04-03 15:40:11.002004506 +0200 > > +++ src/ml_gobject.c 2012-04-03 15:41:04.938002250 +0200 > > @@ -219,7 +219,7 @@ > > CAMLprim value ml_g_value_new(void) > > { > > value ret = alloc_custom(&ml_custom_GValue, > > sizeof(value)+sizeof(GValue), > > - 20, 1000); > > + 0, 1); > > /* create an MLPointer */ > > Field(ret,1) = (value)2; > > ((GValue*)&Field(ret,2))->g_type = 0; > > @@ -272,14 +272,14 @@ > > custom_serialize_default, custom_deserialize_default }; > > CAMLprim value Val_gboxed(GType t, gpointer p) > > { > > - value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 10, 1000); > > + value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 0, 1); > > Store_pointer(ret, g_boxed_copy (t,p)); > > Field(ret,2) = (value)t; > > return ret; > > } > > CAMLprim value Val_gboxed_new(GType t, gpointer p) > > { > > - value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 10, 1000); > > + value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 0, 1); > > Store_pointer(ret, p); > > Field(ret,2) = (value)t; > > return ret; > > > > > > > > At startup is uses > > top - 16:40:27 up 1 day, 7:01, 28 users, load average: 0.47, 0.50, 0.35 > > Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie > > Cpu(s): 4.8%us, 1.3%sy, 0.0%ni, 93.6%id, 0.2%wa, 0.0%hi, 0.1%si, > > 0.0%st > > Mem: 4004736k total, 3617960k used, 386776k free, 130704k buffers > > Swap: 4070396k total, 9244k used, 4061152k free, 1730344k cached > > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ > > COMMAND > > 10275 hans 20 0 529m 77m 13m S 14 2.0 0:01.66 > > vc_client.nativ > > > > and 12 hours later > > top - 04:40:07 up 1 day, 19:01, 35 users, load average: 0.00, 0.01, 0.05 > > Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie > > Cpu(s): 20.2%us, 3.4%sy, 0.0%ni, 76.1%id, 0.1%wa, 0.0%hi, 0.2%si, > > 0.0%st > > Mem: 4004736k total, 3828308k used, 176428k free, 143928k buffers > > Swap: 4070396k total, 10708k used, 4059688k free, 1756524k cached > > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ > > COMMAND > > 10275 hans 20 0 534m 82m 13m S 17 2.1 110:11.19 > > vc_client.nativ > > > > Without the patch > > top - 22:05:38 up 1 day, 12:26, 34 users, load average: 0.35, 0.16, 0.13 > > Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie > > Cpu(s): 5.6%us, 1.5%sy, 0.0%ni, 92.6%id, 0.2%wa, 0.0%hi, 0.1%si, > > 0.0%st > > Mem: 4004736k total, 3868136k used, 136600k free, 140900k buffers > > Swap: 4070396k total, 9680k used, 4060716k free, 1837500k cached > > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ > > COMMAND > > 25111 hans 20 0 453m 76m 13m S 14 2.0 0:13.68 vc_client_old.n > > > > top - 10:05:19 up 2 days, 26 min, 35 users, load average: 0.01, 0.04, 0.05 > > Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie > > Cpu(s): 20.4%us, 3.2%sy, 0.0%ni, 75.8%id, 0.4%wa, 0.0%hi, 0.2%si, > > 0.0%st > > Mem: 4004736k total, 3830596k used, 174140k free, 261692k buffers > > Swap: 4070396k total, 13640k used, 4056756k free, 1640452k cached > > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ > > COMMAND > > 25111 hans 20 0 453m 76m 13m S 49 2.0 263:05.34 > > vc_client_old.n > > > > So from this it seems that with the patch it still uses more and more CPU, > > but at a much lower rate. However, it seems to increase memory usage with > > the patch. I also tried to patch the wrappers.h file, but the memory > > consumption just exploded. > > > > So it is working better, but still not good enough. Is there some way to > > prevent this kind of behavior? That is, no extra memory usage and no extra > > CPU usage. > > > > I have attached some additional profiling if that would be of any interest. > > In short it seems to be that it is the GC that is consuming the CPU. > > > > Best, > > > > Hans Ole > > > > > > On Tue, Apr 3, 2012 at 2:13 PM, Jerome Vouillon <vouillon@pps.jussieu.fr> > > wrote: > >> > >> On Tue, Apr 03, 2012 at 12:42:08PM +0200, Gerd Stolpmann wrote: > >> > This reminds me of a problem I had with a specific C binding (for > >> > mysql), > >> > years ago. That binding allocated custom blocks with badly chosen > >> > parameters used/max (see the docs for caml_alloc_custom in > >> > http://caml.inria.fr/pub/docs/manual-ocaml/manual032.html#toc144). If > >> > the > >> > ratio used/max is > 0, these parameters accelerate the GC. If the custom > >> > blocks are frequently allocated, this can have a dramatic effect, even > >> > for > >> > quite small used/max ratios. The solution was to change the code, and to > >> > set used=0 and max=1. > >> > > >> > This type of problem would match your observation that the GC works more > >> > and more the longer the program runs, i.e. the more custom blocks have > >> > been allocated. > >> > > >> > The problem basically also exists with bigarrays - with > >> > used=<size_of_bigarary> and max=256M (hardcoded). > >> > >> I have also observed this with input-output channels (in_channel and > >> out_channel), where used = 1 and max = 1000. A full major GC is > >> performed every time a thousand files are opened, which can result on > >> a significant overhead when you open lot of files and the heap is > >> large. > >> > >> -- Jerome > > > > > > -- ------------------------------------------------------------ Gerd Stolpmann, Darmstadt, Germany gerd@gerd-stolpmann.de Creator of GODI and camlcity.org. Contact details: http://www.camlcity.org/contact.html Company homepage: http://www.gerd-stolpmann.de *** Searching for new projects! Need consulting for system *** programming in Ocaml? Gerd Stolpmann can help you. ------------------------------------------------------------ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Caml-list] Strategies for finding memory leaks 2012-04-04 12:55 ` Gerd Stolpmann @ 2012-04-07 13:27 ` Hans Ole Rafaelsen 0 siblings, 0 replies; 16+ messages in thread From: Hans Ole Rafaelsen @ 2012-04-07 13:27 UTC (permalink / raw) To: Gerd Stolpmann; +Cc: Gabriel Scherer, caml-list [-- Attachment #1: Type: text/plain, Size: 8912 bytes --] So just to be clear, it seems like I'm allocating lots of objects of a kind that I don't free. I have been trying to tracking down this in my ML part that use the library. You suggest that trying to have a counter in the C binding part of the library and count each time a object is created (and maybe each time it is destroyed) might be a better option. I have not worked much with ML<->C binding code, just want to be sure that this might be a proper way to do it before I start. -- Hans Ole On Wed, Apr 4, 2012 at 2:55 PM, Gerd Stolpmann <info@gerd-stolpmann.de>wrote: > Am Mittwoch, den 04.04.2012, 13:30 +0200 schrieb Gabriel Scherer: > > May your program leak one of those GTK resources? > > > > The effectiveness of your patch seems to indicate that you have a lot > > of one of these values allocated (and that they were requesting the GC > > much too frequently). The patch solves the CPU usage induced by > > additional GC, but does not change the reason why those GC were > > launched: apparently your code allocates a lot of those resources. If > > there indeed is a leak in your program, it will use more and more > > memory even if you fix the CPU-usage effect. > > > > An interesting side-effect of your patch is that you could, by > > selectively disabling some of the change you made (eg. by changing > > Val_g_boxed but not Val_g_boxed_new), isolate which of those resources > > were provoking the increased CPU usage, because it was allocated in > > high number. > > Or just increment a counter for each type. > > Gerd > > > (Usual candidates that provoke leak are global data structures that > > store references to your data. A closure will also reference the data > > corresponding to the variables it captures, so storing closures in > > such tables can be an indirect cause for "leaks". Do you have global > > tables of callbacks or values for GTK-land?) > > > > On Wed, Apr 4, 2012 at 12:53 PM, Hans Ole Rafaelsen > > <hrafaelsen@gmail.com> wrote: > > > Hi, > > > > > > Thanks for your suggestions. I tried to patch lablgtk2 with: > > > > > > --- src/ml_gdkpixbuf.c.orig 2012-04-03 13:56:29.618264702 +0200 > > > +++ src/ml_gdkpixbuf.c 2012-04-03 13:56:58.106263510 +0200 > > > @@ -119,7 +119,7 @@ > > > value ret; > > > if (pb == NULL) ml_raise_null_pointer(); > > > ret = alloc_custom (&ml_custom_GdkPixbuf, sizeof pb, > > > - 100, 1000); > > > + 0, 1); > > > p = Data_custom_val (ret); > > > *p = ref ? g_object_ref (pb) : pb; > > > return ret; > > > > > > --- src/ml_gobject.c.orig 2012-04-03 15:40:11.002004506 +0200 > > > +++ src/ml_gobject.c 2012-04-03 15:41:04.938002250 +0200 > > > @@ -219,7 +219,7 @@ > > > CAMLprim value ml_g_value_new(void) > > > { > > > value ret = alloc_custom(&ml_custom_GValue, > > > sizeof(value)+sizeof(GValue), > > > - 20, 1000); > > > + 0, 1); > > > /* create an MLPointer */ > > > Field(ret,1) = (value)2; > > > ((GValue*)&Field(ret,2))->g_type = 0; > > > @@ -272,14 +272,14 @@ > > > custom_serialize_default, custom_deserialize_default }; > > > CAMLprim value Val_gboxed(GType t, gpointer p) > > > { > > > - value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 10, > 1000); > > > + value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 0, > 1); > > > Store_pointer(ret, g_boxed_copy (t,p)); > > > Field(ret,2) = (value)t; > > > return ret; > > > } > > > CAMLprim value Val_gboxed_new(GType t, gpointer p) > > > { > > > - value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 10, > 1000); > > > + value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 0, > 1); > > > Store_pointer(ret, p); > > > Field(ret,2) = (value)t; > > > return ret; > > > > > > > > > > > > At startup is uses > > > top - 16:40:27 up 1 day, 7:01, 28 users, load average: 0.47, 0.50, > 0.35 > > > Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie > > > Cpu(s): 4.8%us, 1.3%sy, 0.0%ni, 93.6%id, 0.2%wa, 0.0%hi, 0.1%si, > > > 0.0%st > > > Mem: 4004736k total, 3617960k used, 386776k free, 130704k > buffers > > > Swap: 4070396k total, 9244k used, 4061152k free, 1730344k cached > > > > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ > > > COMMAND > > > 10275 hans 20 0 529m 77m 13m S 14 2.0 0:01.66 > > > vc_client.nativ > > > > > > and 12 hours later > > > top - 04:40:07 up 1 day, 19:01, 35 users, load average: 0.00, 0.01, > 0.05 > > > Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie > > > Cpu(s): 20.2%us, 3.4%sy, 0.0%ni, 76.1%id, 0.1%wa, 0.0%hi, 0.2%si, > > > 0.0%st > > > Mem: 4004736k total, 3828308k used, 176428k free, 143928k > buffers > > > Swap: 4070396k total, 10708k used, 4059688k free, 1756524k cached > > > > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ > > > COMMAND > > > 10275 hans 20 0 534m 82m 13m S 17 2.1 110:11.19 > > > vc_client.nativ > > > > > > Without the patch > > > top - 22:05:38 up 1 day, 12:26, 34 users, load average: 0.35, 0.16, > 0.13 > > > Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie > > > Cpu(s): 5.6%us, 1.5%sy, 0.0%ni, 92.6%id, 0.2%wa, 0.0%hi, 0.1%si, > > > 0.0%st > > > Mem: 4004736k total, 3868136k used, 136600k free, 140900k > buffers > > > Swap: 4070396k total, 9680k used, 4060716k free, 1837500k cached > > > > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ > > > COMMAND > > > 25111 hans 20 0 453m 76m 13m S 14 2.0 0:13.68 > vc_client_old.n > > > > > > top - 10:05:19 up 2 days, 26 min, 35 users, load average: 0.01, 0.04, > 0.05 > > > Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie > > > Cpu(s): 20.4%us, 3.2%sy, 0.0%ni, 75.8%id, 0.4%wa, 0.0%hi, 0.2%si, > > > 0.0%st > > > Mem: 4004736k total, 3830596k used, 174140k free, 261692k > buffers > > > Swap: 4070396k total, 13640k used, 4056756k free, 1640452k cached > > > > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ > > > COMMAND > > > 25111 hans 20 0 453m 76m 13m S 49 2.0 263:05.34 > > > vc_client_old.n > > > > > > So from this it seems that with the patch it still uses more and more > CPU, > > > but at a much lower rate. However, it seems to increase memory usage > with > > > the patch. I also tried to patch the wrappers.h file, but the memory > > > consumption just exploded. > > > > > > So it is working better, but still not good enough. Is there some way > to > > > prevent this kind of behavior? That is, no extra memory usage and no > extra > > > CPU usage. > > > > > > I have attached some additional profiling if that would be of any > interest. > > > In short it seems to be that it is the GC that is consuming the CPU. > > > > > > Best, > > > > > > Hans Ole > > > > > > > > > On Tue, Apr 3, 2012 at 2:13 PM, Jerome Vouillon < > vouillon@pps.jussieu.fr> > > > wrote: > > >> > > >> On Tue, Apr 03, 2012 at 12:42:08PM +0200, Gerd Stolpmann wrote: > > >> > This reminds me of a problem I had with a specific C binding (for > > >> > mysql), > > >> > years ago. That binding allocated custom blocks with badly chosen > > >> > parameters used/max (see the docs for caml_alloc_custom in > > >> > http://caml.inria.fr/pub/docs/manual-ocaml/manual032.html#toc144). > If > > >> > the > > >> > ratio used/max is > 0, these parameters accelerate the GC. If the > custom > > >> > blocks are frequently allocated, this can have a dramatic effect, > even > > >> > for > > >> > quite small used/max ratios. The solution was to change the code, > and to > > >> > set used=0 and max=1. > > >> > > > >> > This type of problem would match your observation that the GC works > more > > >> > and more the longer the program runs, i.e. the more custom blocks > have > > >> > been allocated. > > >> > > > >> > The problem basically also exists with bigarrays - with > > >> > used=<size_of_bigarary> and max=256M (hardcoded). > > >> > > >> I have also observed this with input-output channels (in_channel and > > >> out_channel), where used = 1 and max = 1000. A full major GC is > > >> performed every time a thousand files are opened, which can result on > > >> a significant overhead when you open lot of files and the heap is > > >> large. > > >> > > >> -- Jerome > > > > > > > > > > > > -- > ------------------------------------------------------------ > Gerd Stolpmann, Darmstadt, Germany gerd@gerd-stolpmann.de > Creator of GODI and camlcity.org. > Contact details: http://www.camlcity.org/contact.html > Company homepage: http://www.gerd-stolpmann.de > *** Searching for new projects! Need consulting for system > *** programming in Ocaml? Gerd Stolpmann can help you. > ------------------------------------------------------------ > > [-- Attachment #2: Type: text/html, Size: 11385 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* [Caml-list] GC speed for custom blocks, was: Strategies for finding memory leaks 2012-04-04 11:30 ` Gabriel Scherer 2012-04-04 12:55 ` Gerd Stolpmann @ 2012-04-04 13:55 ` Gerd Stolpmann 1 sibling, 0 replies; 16+ messages in thread From: Gerd Stolpmann @ 2012-04-04 13:55 UTC (permalink / raw) To: Gabriel Scherer; +Cc: Hans Ole Rafaelsen, caml-list I'm trying to think beyond this particular problem, and it seems to me that the current "GC speed control" for custom blocks is too simple. Right now, we can practically only base the speed on either - the size of a custom object relative to the expected maximum size of all custom objects of the type, or - the number of a custom objects relative to the maximum number of objects, or - we don't control the speed, and hope that the normal GC speed is sufficient to collect the custom blocks fast enough. The problem of the first two strategies is that we normally cannot predict how many objects we will create in a program, or how much memory we will spend, and that most C bindings just choose constants for used/max. For instance, the Lablgtk bindings assume that there are normally no more than 10 gdk pixbufs (max/used = 1000/100 = 10), i.e. after every 10 pixbuf allocations one GC cycle is performed. What about a different scheme: For each custom block we record the number of bytes allocated outside the heap (which is often known or can be well guessed), and when calculating the GC speed, the size of this external memory is just added to the numbers for the heap memory (i.e. the size of allocations in the last GC slice to caml_allocated_words and the total to caml_stat_heap_size), so we "virtually" consider the external memory as an extension of the heap. The effect is that the same speed control is used as for the heap (i.e. overhead-based control). An implementation could store the number of external bytes in the custom block (one more word is allocated than requested) - you need it for updating the total. Let's look a bit closer at the effect: - If the size of the external memory is moderate, the GC speed is linearly increased the more external memory is allocated. The good thing is that the speed increase is now relative to the size of the heap. If the heap is large anyway, the external memory almost does not count, and can never dominate the speed controls (which is a main problem with the current scheme). If the heap is small, the speed-up can be substantial, but this is not as problematic, because a small heap is quickly GC'ed. - If the size of the external memory is large relative to the heap, it will still dominate the speed control. The user could, however, still influence the speed by setting the overhead parameter. So, still better than the current scheme. Well, it's just an idea. Any comments? Gerd Am Mittwoch, den 04.04.2012, 13:30 +0200 schrieb Gabriel Scherer: > May your program leak one of those GTK resources? > > The effectiveness of your patch seems to indicate that you have a lot > of one of these values allocated (and that they were requesting the GC > much too frequently). The patch solves the CPU usage induced by > additional GC, but does not change the reason why those GC were > launched: apparently your code allocates a lot of those resources. If > there indeed is a leak in your program, it will use more and more > memory even if you fix the CPU-usage effect. > > An interesting side-effect of your patch is that you could, by > selectively disabling some of the change you made (eg. by changing > Val_g_boxed but not Val_g_boxed_new), isolate which of those resources > were provoking the increased CPU usage, because it was allocated in > high number. > > (Usual candidates that provoke leak are global data structures that > store references to your data. A closure will also reference the data > corresponding to the variables it captures, so storing closures in > such tables can be an indirect cause for "leaks". Do you have global > tables of callbacks or values for GTK-land?) > > On Wed, Apr 4, 2012 at 12:53 PM, Hans Ole Rafaelsen > <hrafaelsen@gmail.com> wrote: > > Hi, > > > > Thanks for your suggestions. I tried to patch lablgtk2 with: > > > > --- src/ml_gdkpixbuf.c.orig 2012-04-03 13:56:29.618264702 +0200 > > +++ src/ml_gdkpixbuf.c 2012-04-03 13:56:58.106263510 +0200 > > @@ -119,7 +119,7 @@ > > value ret; > > if (pb == NULL) ml_raise_null_pointer(); > > ret = alloc_custom (&ml_custom_GdkPixbuf, sizeof pb, > > - 100, 1000); > > + 0, 1); > > p = Data_custom_val (ret); > > *p = ref ? g_object_ref (pb) : pb; > > return ret; > > > > --- src/ml_gobject.c.orig 2012-04-03 15:40:11.002004506 +0200 > > +++ src/ml_gobject.c 2012-04-03 15:41:04.938002250 +0200 > > @@ -219,7 +219,7 @@ > > CAMLprim value ml_g_value_new(void) > > { > > value ret = alloc_custom(&ml_custom_GValue, > > sizeof(value)+sizeof(GValue), > > - 20, 1000); > > + 0, 1); > > /* create an MLPointer */ > > Field(ret,1) = (value)2; > > ((GValue*)&Field(ret,2))->g_type = 0; > > @@ -272,14 +272,14 @@ > > custom_serialize_default, custom_deserialize_default }; > > CAMLprim value Val_gboxed(GType t, gpointer p) > > { > > - value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 10, 1000); > > + value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 0, 1); > > Store_pointer(ret, g_boxed_copy (t,p)); > > Field(ret,2) = (value)t; > > return ret; > > } > > CAMLprim value Val_gboxed_new(GType t, gpointer p) > > { > > - value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 10, 1000); > > + value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 0, 1); > > Store_pointer(ret, p); > > Field(ret,2) = (value)t; > > return ret; > > > > > > > > At startup is uses > > top - 16:40:27 up 1 day, 7:01, 28 users, load average: 0.47, 0.50, 0.35 > > Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie > > Cpu(s): 4.8%us, 1.3%sy, 0.0%ni, 93.6%id, 0.2%wa, 0.0%hi, 0.1%si, > > 0.0%st > > Mem: 4004736k total, 3617960k used, 386776k free, 130704k buffers > > Swap: 4070396k total, 9244k used, 4061152k free, 1730344k cached > > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ > > COMMAND > > 10275 hans 20 0 529m 77m 13m S 14 2.0 0:01.66 > > vc_client.nativ > > > > and 12 hours later > > top - 04:40:07 up 1 day, 19:01, 35 users, load average: 0.00, 0.01, 0.05 > > Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie > > Cpu(s): 20.2%us, 3.4%sy, 0.0%ni, 76.1%id, 0.1%wa, 0.0%hi, 0.2%si, > > 0.0%st > > Mem: 4004736k total, 3828308k used, 176428k free, 143928k buffers > > Swap: 4070396k total, 10708k used, 4059688k free, 1756524k cached > > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ > > COMMAND > > 10275 hans 20 0 534m 82m 13m S 17 2.1 110:11.19 > > vc_client.nativ > > > > Without the patch > > top - 22:05:38 up 1 day, 12:26, 34 users, load average: 0.35, 0.16, 0.13 > > Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie > > Cpu(s): 5.6%us, 1.5%sy, 0.0%ni, 92.6%id, 0.2%wa, 0.0%hi, 0.1%si, > > 0.0%st > > Mem: 4004736k total, 3868136k used, 136600k free, 140900k buffers > > Swap: 4070396k total, 9680k used, 4060716k free, 1837500k cached > > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ > > COMMAND > > 25111 hans 20 0 453m 76m 13m S 14 2.0 0:13.68 vc_client_old.n > > > > top - 10:05:19 up 2 days, 26 min, 35 users, load average: 0.01, 0.04, 0.05 > > Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie > > Cpu(s): 20.4%us, 3.2%sy, 0.0%ni, 75.8%id, 0.4%wa, 0.0%hi, 0.2%si, > > 0.0%st > > Mem: 4004736k total, 3830596k used, 174140k free, 261692k buffers > > Swap: 4070396k total, 13640k used, 4056756k free, 1640452k cached > > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ > > COMMAND > > 25111 hans 20 0 453m 76m 13m S 49 2.0 263:05.34 > > vc_client_old.n > > > > So from this it seems that with the patch it still uses more and more CPU, > > but at a much lower rate. However, it seems to increase memory usage with > > the patch. I also tried to patch the wrappers.h file, but the memory > > consumption just exploded. > > > > So it is working better, but still not good enough. Is there some way to > > prevent this kind of behavior? That is, no extra memory usage and no extra > > CPU usage. > > > > I have attached some additional profiling if that would be of any interest. > > In short it seems to be that it is the GC that is consuming the CPU. > > > > Best, > > > > Hans Ole > > > > > > On Tue, Apr 3, 2012 at 2:13 PM, Jerome Vouillon <vouillon@pps.jussieu.fr> > > wrote: > >> > >> On Tue, Apr 03, 2012 at 12:42:08PM +0200, Gerd Stolpmann wrote: > >> > This reminds me of a problem I had with a specific C binding (for > >> > mysql), > >> > years ago. That binding allocated custom blocks with badly chosen > >> > parameters used/max (see the docs for caml_alloc_custom in > >> > http://caml.inria.fr/pub/docs/manual-ocaml/manual032.html#toc144). If > >> > the > >> > ratio used/max is > 0, these parameters accelerate the GC. If the custom > >> > blocks are frequently allocated, this can have a dramatic effect, even > >> > for > >> > quite small used/max ratios. The solution was to change the code, and to > >> > set used=0 and max=1. > >> > > >> > This type of problem would match your observation that the GC works more > >> > and more the longer the program runs, i.e. the more custom blocks have > >> > been allocated. > >> > > >> > The problem basically also exists with bigarrays - with > >> > used=<size_of_bigarary> and max=256M (hardcoded). > >> > >> I have also observed this with input-output channels (in_channel and > >> out_channel), where used = 1 and max = 1000. A full major GC is > >> performed every time a thousand files are opened, which can result on > >> a significant overhead when you open lot of files and the heap is > >> large. > >> > >> -- Jerome > > > > > > -- ------------------------------------------------------------ Gerd Stolpmann, Darmstadt, Germany gerd@gerd-stolpmann.de Creator of GODI and camlcity.org. Contact details: http://www.camlcity.org/contact.html Company homepage: http://www.gerd-stolpmann.de *** Searching for new projects! Need consulting for system *** programming in Ocaml? Gerd Stolpmann can help you. ------------------------------------------------------------ ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2012-04-07 13:27 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-03-21 9:49 [Caml-list] Strategies for finding memory leaks Hans Ole Rafaelsen 2012-03-22 15:03 ` Goswin von Brederlow 2012-03-23 9:32 ` Hans Ole Rafaelsen 2012-03-24 14:00 ` Goswin von Brederlow 2012-04-01 19:57 ` Richard W.M. Jones 2012-04-02 8:15 ` Hans Ole Rafaelsen 2012-04-02 10:13 ` Richard W.M. Jones 2012-04-02 13:40 ` Hans Ole Rafaelsen 2012-04-02 11:26 ` John Carr 2012-04-03 10:42 ` Gerd Stolpmann 2012-04-03 12:13 ` Jerome Vouillon 2012-04-04 10:53 ` Hans Ole Rafaelsen 2012-04-04 11:30 ` Gabriel Scherer 2012-04-04 12:55 ` Gerd Stolpmann 2012-04-07 13:27 ` Hans Ole Rafaelsen 2012-04-04 13:55 ` [Caml-list] GC speed for custom blocks, was: " Gerd Stolpmann
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox