[Caml-list] Strategies for finding memory leaks

Mailing list for all users of the OCaml language and system.
 help / color / mirror / Atom feed

* [Caml-list] Strategies for finding memory leaks
@ 2012-03-21  9:49 Hans Ole Rafaelsen
  2012-03-22 15:03 ` Goswin von Brederlow
  2012-04-01 19:57 ` Richard W.M. Jones
  0 siblings, 2 replies; 16+ messages in thread
From: Hans Ole Rafaelsen @ 2012-03-21  9:49 UTC (permalink / raw)
  To: caml-list

[-- Attachment #1: Type: text/plain, Size: 3181 bytes --]

Hello,

Is there some tools / tricks that can be used to help find memory leaks?

I have trouble with an application, that when running for a long time,
starts to use a lot of CPU and consume more memory. It starts out by using
about 20% CPU (reported by top) and after 24 hours it has increased to 80%
usage. Also physical (RES) memory usage goes from 80M to 160M. The workload
for the application is the same the whole time.

Using OProfile (http://oprofile.sourceforge.net/news/) shows that that most
of the time is being spent doing memory management.

At startup:
CPU: Core 2, speed 2667 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
mask of 0x00 (Unhalted core cycles) count 100000
samples  %        image name               symbol name
52764    22.3913  vc_client.native         mark_slice
33580    14.2502  vc_client.native         caml_page_table_lookup
25415    10.7853  vc_client.native         sweep_slice
10119     4.2942  vc_client.native         caml_fl_allocate
6423      2.7257  [vdso] (tgid:9015 range:0x7fff4256c000-0x7fff4256d000)
[vdso] (tgid:9015 range:0x7fff4256c000-0x7fff4256d000)
5233      2.2207  vc_client.native
camlLividisvc__Nalbuf_tools__replace_pattern_1033
2759      1.1708  vc_client.native         caml_iterate_global_roots
2728      1.1577  vc_client.native         caml_modify
2473      1.0495  vc_client.native         caml_oldify_one
2204      0.9353  vc_client.native
camlLividisvc__Nalbuf_bytestream__search_1047
2183      0.9264  vc_client.native         caml_darken
1935      0.8212  vc_client.native         caml_stash_backtrace
1843      0.7821  vc_client.native         caml_do_roots
1838      0.7800  vc_client.native         caml_delete_global_root

After ca. 24 hours run:
CPU: Core 2, speed 2667 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
mask
of 0x00 (Unhalted core cycles) count 100000
samples  %        image name               symbol name
1137401  56.2697  vc_client.native         mark_slice
405598   20.0658  vc_client.native         sweep_slice
399832   19.7806  vc_client.native         caml_page_table_lookup
10106     0.5000  vc_client.native         caml_fl_allocate
3548      0.1755  vc_client.native         caml_iterate_global_roots
3397      0.1681  [vdso] (tgid:26129 range:0x7fff747ff000-0x7fff74800000)
[vdso]
 (tgid:26129 range:0x7fff747ff000-0x7fff74800000)
2797      0.1384  vc_client.native
camlLividisvc__Nalbuf_tools__replace_
pattern_1033
2307      0.1141  vc_client.native
camlLividisvc__Nalbuf_bytestream__sea
rch_1047
2005      0.0992  vc_client.native         caml_oldify_local_roots
1786      0.0884  vc_client.native         caml_gc_stat
1441      0.0713  vc_client.native         caml_oldify_one
1163      0.0575  vc_client.native         caml_darken
1163      0.0575  vc_client.native         caml_fl_merge_block
1032      0.0511  vc_client.native         camlHashtbl__find_1093

The application uses several 3rd party libraries, including: LablGTK2,
OCamlNet, LWT and others.

Is there some clever trick that can by used to track down or get a hint of
what is causing this?

Thanks,

Hans Ole Rafaelsen

[-- Attachment #2: Type: text/html, Size: 4083 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Caml-list] Strategies for finding memory leaks
  2012-03-21  9:49 [Caml-list] Strategies for finding memory leaks Hans Ole Rafaelsen
@ 2012-03-22 15:03 ` Goswin von Brederlow
  2012-03-23  9:32   ` Hans Ole Rafaelsen
  2012-04-01 19:57 ` Richard W.M. Jones
  1 sibling, 1 reply; 16+ messages in thread
From: Goswin von Brederlow @ 2012-03-22 15:03 UTC (permalink / raw)
  To: Hans Ole Rafaelsen; +Cc: caml-list

Hans Ole Rafaelsen <hrafaelsen@gmail.com> writes:

> Hello,
>
> Is there some tools / tricks that can be used to help find memory leaks?
>
> I have trouble with an application, that when running for a long time, starts
> to use a lot of CPU and consume more memory. It starts out by using about 20%
> CPU (reported by top) and after 24 hours it has increased to 80% usage. Also
> physical (RES) memory usage goes from 80M to 160M. The workload for the
> application is the same the whole time.
>
> Using OProfile (http://oprofile.sourceforge.net/news/) shows that that most of
> the time is being spent doing memory management.
>
> At startup:
> CPU: Core 2, speed 2667 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask
> of 0x00 (Unhalted core cycles) count 100000
> samples  %        image name               symbol name
> 52764    22.3913  vc_client.native         mark_slice
> 33580    14.2502  vc_client.native         caml_page_table_lookup
> 25415    10.7853  vc_client.native         sweep_slice
> 10119     4.2942  vc_client.native         caml_fl_allocate
> 6423      2.7257  [vdso] (tgid:9015 range:0x7fff4256c000-0x7fff4256d000) [vdso]
> (tgid:9015 range:0x7fff4256c000-0x7fff4256d000)
> 5233      2.2207  vc_client.native        
> camlLividisvc__Nalbuf_tools__replace_pattern_1033
> 2759      1.1708  vc_client.native         caml_iterate_global_roots
> 2728      1.1577  vc_client.native         caml_modify
> 2473      1.0495  vc_client.native         caml_oldify_one
> 2204      0.9353  vc_client.native        
> camlLividisvc__Nalbuf_bytestream__search_1047
> 2183      0.9264  vc_client.native         caml_darken
> 1935      0.8212  vc_client.native         caml_stash_backtrace
> 1843      0.7821  vc_client.native         caml_do_roots
> 1838      0.7800  vc_client.native         caml_delete_global_root
>
> After ca. 24 hours run:
> CPU: Core 2, speed 2667 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask
> of 0x00 (Unhalted core cycles) count 100000
> samples  %        image name               symbol name
> 1137401  56.2697  vc_client.native         mark_slice
> 405598   20.0658  vc_client.native         sweep_slice
> 399832   19.7806  vc_client.native         caml_page_table_lookup
> 10106     0.5000  vc_client.native         caml_fl_allocate
> 3548      0.1755  vc_client.native         caml_iterate_global_roots
> 3397      0.1681  [vdso] (tgid:26129 range:0x7fff747ff000-0x7fff74800000)
> [vdso]
>  (tgid:26129 range:0x7fff747ff000-0x7fff74800000)
> 2797      0.1384  vc_client.native        
> camlLividisvc__Nalbuf_tools__replace_
> pattern_1033
> 2307      0.1141  vc_client.native        
> camlLividisvc__Nalbuf_bytestream__sea
> rch_1047
> 2005      0.0992  vc_client.native         caml_oldify_local_roots
> 1786      0.0884  vc_client.native         caml_gc_stat
> 1441      0.0713  vc_client.native         caml_oldify_one
> 1163      0.0575  vc_client.native         caml_darken
> 1163      0.0575  vc_client.native         caml_fl_merge_block
> 1032      0.0511  vc_client.native         camlHashtbl__find_1093
>
> The application uses several 3rd party libraries, including: LablGTK2,
> OCamlNet, LWT and others.
>
> Is there some clever trick that can by used to track down or get a hint of what
> is causing this?
>
> Thanks,
>
> Hans Ole Rafaelsen

Are you calling the GC manually somewhere in the code or in one of the
libs?

MfG
        Goswin

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Caml-list] Strategies for finding memory leaks
  2012-03-22 15:03 ` Goswin von Brederlow
@ 2012-03-23  9:32   ` Hans Ole Rafaelsen
  2012-03-24 14:00     ` Goswin von Brederlow
  0 siblings, 1 reply; 16+ messages in thread
From: Hans Ole Rafaelsen @ 2012-03-23  9:32 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 3822 bytes --]

Hi,

I have tried to trigger Gc.compact but that only have limited effect for a
short time.

Hans Ole

On Thu, Mar 22, 2012 at 4:03 PM, Goswin von Brederlow <goswin-v-b@web.de>wrote:

> Hans Ole Rafaelsen <hrafaelsen@gmail.com> writes:
>
> > Hello,
> >
> > Is there some tools / tricks that can be used to help find memory leaks?
> >
> > I have trouble with an application, that when running for a long time,
> starts
> > to use a lot of CPU and consume more memory. It starts out by using
> about 20%
> > CPU (reported by top) and after 24 hours it has increased to 80% usage.
> Also
> > physical (RES) memory usage goes from 80M to 160M. The workload for the
> > application is the same the whole time.
> >
> > Using OProfile (http://oprofile.sourceforge.net/news/) shows that that
> most of
> > the time is being spent doing memory management.
> >
> > At startup:
> > CPU: Core 2, speed 2667 MHz (estimated)
> > Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a
> unit mask
> > of 0x00 (Unhalted core cycles) count 100000
> > samples  %        image name               symbol name
> > 52764    22.3913  vc_client.native         mark_slice
> > 33580    14.2502  vc_client.native         caml_page_table_lookup
> > 25415    10.7853  vc_client.native         sweep_slice
> > 10119     4.2942  vc_client.native         caml_fl_allocate
> > 6423      2.7257  [vdso] (tgid:9015 range:0x7fff4256c000-0x7fff4256d000)
> [vdso]
> > (tgid:9015 range:0x7fff4256c000-0x7fff4256d000)
> > 5233      2.2207  vc_client.native
> > camlLividisvc__Nalbuf_tools__replace_pattern_1033
> > 2759      1.1708  vc_client.native         caml_iterate_global_roots
> > 2728      1.1577  vc_client.native         caml_modify
> > 2473      1.0495  vc_client.native         caml_oldify_one
> > 2204      0.9353  vc_client.native
> > camlLividisvc__Nalbuf_bytestream__search_1047
> > 2183      0.9264  vc_client.native         caml_darken
> > 1935      0.8212  vc_client.native         caml_stash_backtrace
> > 1843      0.7821  vc_client.native         caml_do_roots
> > 1838      0.7800  vc_client.native         caml_delete_global_root
> >
> > After ca. 24 hours run:
> > CPU: Core 2, speed 2667 MHz (estimated)
> > Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a
> unit mask
> > of 0x00 (Unhalted core cycles) count 100000
> > samples  %        image name               symbol name
> > 1137401  56.2697  vc_client.native         mark_slice
> > 405598   20.0658  vc_client.native         sweep_slice
> > 399832   19.7806  vc_client.native         caml_page_table_lookup
> > 10106     0.5000  vc_client.native         caml_fl_allocate
> > 3548      0.1755  vc_client.native         caml_iterate_global_roots
> > 3397      0.1681  [vdso] (tgid:26129
> range:0x7fff747ff000-0x7fff74800000)
> > [vdso]
> >  (tgid:26129 range:0x7fff747ff000-0x7fff74800000)
> > 2797      0.1384  vc_client.native
> > camlLividisvc__Nalbuf_tools__replace_
> > pattern_1033
> > 2307      0.1141  vc_client.native
> > camlLividisvc__Nalbuf_bytestream__sea
> > rch_1047
> > 2005      0.0992  vc_client.native         caml_oldify_local_roots
> > 1786      0.0884  vc_client.native         caml_gc_stat
> > 1441      0.0713  vc_client.native         caml_oldify_one
> > 1163      0.0575  vc_client.native         caml_darken
> > 1163      0.0575  vc_client.native         caml_fl_merge_block
> > 1032      0.0511  vc_client.native         camlHashtbl__find_1093
> >
> > The application uses several 3rd party libraries, including: LablGTK2,
> > OCamlNet, LWT and others.
> >
> > Is there some clever trick that can by used to track down or get a hint
> of what
> > is causing this?
> >
> > Thanks,
> >
> > Hans Ole Rafaelsen
>
> Are you calling the GC manually somewhere in the code or in one of the
> libs?
>
> MfG
>         Goswin
>

[-- Attachment #2: Type: text/html, Size: 5173 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Caml-list] Strategies for finding memory leaks
  2012-03-23  9:32   ` Hans Ole Rafaelsen
@ 2012-03-24 14:00     ` Goswin von Brederlow
  0 siblings, 0 replies; 16+ messages in thread
From: Goswin von Brederlow @ 2012-03-24 14:00 UTC (permalink / raw)
  To: caml-list

Hans Ole Rafaelsen <hrafaelsen@gmail.com> writes:

> Hi,
>
> I have tried to trigger Gc.compact but that only have limited effect for a
> short time.
>
> Hans Ole

My thinking was rather that something triggers the GC manually and over
time starts to do that far too often so more and more time is spend in
the GC.

If it isn't triggered manually then custom types can also give the GC a
wrong idea about how much memory is used/wasted. But that is also just
an idea of where to look.

MfG
        Goswin

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Caml-list] Strategies for finding memory leaks
  2012-03-21  9:49 [Caml-list] Strategies for finding memory leaks Hans Ole Rafaelsen
  2012-03-22 15:03 ` Goswin von Brederlow
@ 2012-04-01 19:57 ` Richard W.M. Jones
  2012-04-02  8:15   ` Hans Ole Rafaelsen
  1 sibling, 1 reply; 16+ messages in thread
From: Richard W.M. Jones @ 2012-04-01 19:57 UTC (permalink / raw)
  To: Hans Ole Rafaelsen; +Cc: caml-list

On Wed, Mar 21, 2012 at 10:49:22AM +0100, Hans Ole Rafaelsen wrote:
> Hello,
> 
> Is there some tools / tricks that can be used to help find memory leaks?
[...]
> The application uses several 3rd party libraries, including: LablGTK2,
> OCamlNet, LWT and others.

If you suspect a leak in the *C* part of your program or these
libraries, then I am very successfully using valgrind on my OCaml
(native) programs to track such leaks.

https://github.com/libguestfs/libguestfs/blob/95123a9144edc1ddb3ad67d75276ca3af70eddb8/tests/extra/Makefile.am

Note that several of the programs tested are written in OCaml, and see
also the 'suppressions' file in the same directory.

However this will not find leaks in OCaml code.  Although I was
pleasantly surprised the other day when valgrind pointed correctly to
a use of an uninitialized string in some OCaml code.

Rich.

-- 
Richard Jones
Red Hat

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Caml-list] Strategies for finding memory leaks
  2012-04-01 19:57 ` Richard W.M. Jones
@ 2012-04-02  8:15   ` Hans Ole Rafaelsen
  2012-04-02 10:13     ` Richard W.M. Jones
                       ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Hans Ole Rafaelsen @ 2012-04-02  8:15 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 1956 bytes --]

Hi,

We have done some further investigation, and it does not seem to be memory
leakage. The application is a video streaming application, where we use
external processes to encode and decode the video. By configuring it in
different ways, we have found the problem to be in the code that is reading
the decoded video from an external process and displaying it using
LablGTK2. The previous thought memory leakage seems to be memory used for
buffers when the system is under high load, and this is not reclaimed by
the OS. When running on low frame rate the application does not use more
memory than at startup.

However, the application still consumes more and more CPU time. And it
seems to happen in the GC. Apart from that, the application seems to be
just fine. So it seems to be something in our code (or in LablGTK) that is
making the GC spend more and more time. Anyone experienced this kind of
problem?

Best,

Hans Ole Rafaelsen

On Sun, Apr 1, 2012 at 9:57 PM, Richard W.M. Jones <rich@annexia.org> wrote:

>
> On Wed, Mar 21, 2012 at 10:49:22AM +0100, Hans Ole Rafaelsen wrote:
> > Hello,
> >
> > Is there some tools / tricks that can be used to help find memory leaks?
> [...]
> > The application uses several 3rd party libraries, including: LablGTK2,
> > OCamlNet, LWT and others.
>
> If you suspect a leak in the *C* part of your program or these
> libraries, then I am very successfully using valgrind on my OCaml
> (native) programs to track such leaks.
>
>
> https://github.com/libguestfs/libguestfs/blob/95123a9144edc1ddb3ad67d75276ca3af70eddb8/tests/extra/Makefile.am
>
> Note that several of the programs tested are written in OCaml, and see
> also the 'suppressions' file in the same directory.
>
> However this will not find leaks in OCaml code.  Although I was
> pleasantly surprised the other day when valgrind pointed correctly to
> a use of an uninitialized string in some OCaml code.
>
> Rich.
>
> --
> Richard Jones
> Red Hat
>

[-- Attachment #2: Type: text/html, Size: 2565 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Caml-list] Strategies for finding memory leaks
  2012-04-02  8:15   ` Hans Ole Rafaelsen
@ 2012-04-02 10:13     ` Richard W.M. Jones
  2012-04-02 13:40       ` Hans Ole Rafaelsen
  2012-04-02 11:26     ` John Carr
  2012-04-03 10:42     ` Gerd Stolpmann
  2 siblings, 1 reply; 16+ messages in thread
From: Richard W.M. Jones @ 2012-04-02 10:13 UTC (permalink / raw)
  To: Hans Ole Rafaelsen; +Cc: caml-list


On Mon, Apr 02, 2012 at 10:15:02AM +0200, Hans Ole Rafaelsen wrote:
> However, the application still consumes more and more CPU time. And it
> seems to happen in the GC. Apart from that, the application seems to be
> just fine. So it seems to be something in our code (or in LablGTK) that is
> making the GC spend more and more time. Anyone experienced this kind of
> problem?

You're not swapping are you?  (Run 'vmstat 1' & look at the si/so columns)

Rich.

-- 
Richard Jones
Red Hat

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Caml-list] Strategies for finding memory leaks
  2012-04-02 10:13     ` Richard W.M. Jones
@ 2012-04-02 13:40       ` Hans Ole Rafaelsen
  0 siblings, 0 replies; 16+ messages in thread
From: Hans Ole Rafaelsen @ 2012-04-02 13:40 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 672 bytes --]

Just did another run to be sure. It does not do any swapping.

-- 
Hans Ole

On Mon, Apr 2, 2012 at 12:13 PM, Richard W.M. Jones <rich@annexia.org>wrote:

>
> On Mon, Apr 02, 2012 at 10:15:02AM +0200, Hans Ole Rafaelsen wrote:
> > However, the application still consumes more and more CPU time. And it
> > seems to happen in the GC. Apart from that, the application seems to be
> > just fine. So it seems to be something in our code (or in LablGTK) that
> is
> > making the GC spend more and more time. Anyone experienced this kind of
> > problem?
>
> You're not swapping are you?  (Run 'vmstat 1' & look at the si/so columns)
>
> Rich.
>
> --
> Richard Jones
> Red Hat
>

[-- Attachment #2: Type: text/html, Size: 1057 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Caml-list] Strategies for finding memory leaks
  2012-04-02  8:15   ` Hans Ole Rafaelsen
  2012-04-02 10:13     ` Richard W.M. Jones
@ 2012-04-02 11:26     ` John Carr
  2012-04-03 10:42     ` Gerd Stolpmann
  2 siblings, 0 replies; 16+ messages in thread
From: John Carr @ 2012-04-02 11:26 UTC (permalink / raw)
  To: Hans Ole Rafaelsen; +Cc: caml-list

Cache misses count as CPU time.  I've seen people measure
parallel speedup by looking at CPU usage.  They get linear
increase in CPU use to eight threads even though the task
saturated memory bandwidth after the second thread.

If you have a way to measure instruction count you can compare
increase in instruction count in GC to increase in time in GC.
If instructions are constant while time increases you probably
have a memory access problem.  If instructions increase the
structure of references within your data may be forcing the
collector to do more work.

> However, the application still consumes more and more CPU time. And it
> seems to happen in the GC. Apart from that, the application seems to
> be just fine. So it seems to be something in our code (or in LablGTK)
> that is making the GC spend more and more time. Anyone experienced
> this kind of problem?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Caml-list] Strategies for finding memory leaks
  2012-04-02  8:15   ` Hans Ole Rafaelsen
  2012-04-02 10:13     ` Richard W.M. Jones
  2012-04-02 11:26     ` John Carr
@ 2012-04-03 10:42     ` Gerd Stolpmann
  2012-04-03 12:13       ` Jerome Vouillon
  2 siblings, 1 reply; 16+ messages in thread
From: Gerd Stolpmann @ 2012-04-03 10:42 UTC (permalink / raw)
  To: Hans Ole Rafaelsen; +Cc: Richard W.M. Jones, caml-list


> Hi,
>
> We have done some further investigation, and it does not seem to be memory
> leakage. The application is a video streaming application, where we use
> external processes to encode and decode the video. By configuring it in
> different ways, we have found the problem to be in the code that is
> reading
> the decoded video from an external process and displaying it using
> LablGTK2. The previous thought memory leakage seems to be memory used for
> buffers when the system is under high load, and this is not reclaimed by
> the OS. When running on low frame rate the application does not use more
> memory than at startup.
>
> However, the application still consumes more and more CPU time. And it
> seems to happen in the GC. Apart from that, the application seems to be
> just fine. So it seems to be something in our code (or in LablGTK) that is
> making the GC spend more and more time. Anyone experienced this kind of
> problem?

This reminds me of a problem I had with a specific C binding (for mysql),
years ago. That binding allocated custom blocks with badly chosen
parameters used/max (see the docs for caml_alloc_custom in
http://caml.inria.fr/pub/docs/manual-ocaml/manual032.html#toc144). If the
ratio used/max is > 0, these parameters accelerate the GC. If the custom
blocks are frequently allocated, this can have a dramatic effect, even for
quite small used/max ratios. The solution was to change the code, and to
set used=0 and max=1.

This type of problem would match your observation that the GC works more
and more the longer the program runs, i.e. the more custom blocks have
been allocated.

The problem basically also exists with bigarrays - with
used=<size_of_bigarary> and max=256M (hardcoded).

Gerd


>
> Best,
>
> Hans Ole Rafaelsen
>
> On Sun, Apr 1, 2012 at 9:57 PM, Richard W.M. Jones <rich@annexia.org>
> wrote:
>
>>
>> On Wed, Mar 21, 2012 at 10:49:22AM +0100, Hans Ole Rafaelsen wrote:
>> > Hello,
>> >
>> > Is there some tools / tricks that can be used to help find memory
>> leaks?
>> [...]
>> > The application uses several 3rd party libraries, including: LablGTK2,
>> > OCamlNet, LWT and others.
>>
>> If you suspect a leak in the *C* part of your program or these
>> libraries, then I am very successfully using valgrind on my OCaml
>> (native) programs to track such leaks.
>>
>>
>> https://github.com/libguestfs/libguestfs/blob/95123a9144edc1ddb3ad67d75276ca3af70eddb8/tests/extra/Makefile.am
>>
>> Note that several of the programs tested are written in OCaml, and see
>> also the 'suppressions' file in the same directory.
>>
>> However this will not find leaks in OCaml code.  Although I was
>> pleasantly surprised the other day when valgrind pointed correctly to
>> a use of an uninitialized string in some OCaml code.
>>
>> Rich.
>>
>> --
>> Richard Jones
>> Red Hat
>>
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa-roc.inria.fr/wws/info/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
>


-- 
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
*** Searching for new projects! Need consulting for system
*** programming in Ocaml? Gerd Stolpmann can help you.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Caml-list] Strategies for finding memory leaks
  2012-04-03 10:42     ` Gerd Stolpmann
@ 2012-04-03 12:13       ` Jerome Vouillon
  2012-04-04 10:53         ` Hans Ole Rafaelsen
  0 siblings, 1 reply; 16+ messages in thread
From: Jerome Vouillon @ 2012-04-03 12:13 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: Hans Ole Rafaelsen, Richard W.M. Jones, caml-list

On Tue, Apr 03, 2012 at 12:42:08PM +0200, Gerd Stolpmann wrote:
> This reminds me of a problem I had with a specific C binding (for mysql),
> years ago. That binding allocated custom blocks with badly chosen
> parameters used/max (see the docs for caml_alloc_custom in
> http://caml.inria.fr/pub/docs/manual-ocaml/manual032.html#toc144). If the
> ratio used/max is > 0, these parameters accelerate the GC. If the custom
> blocks are frequently allocated, this can have a dramatic effect, even for
> quite small used/max ratios. The solution was to change the code, and to
> set used=0 and max=1.
> 
> This type of problem would match your observation that the GC works more
> and more the longer the program runs, i.e. the more custom blocks have
> been allocated.
> 
> The problem basically also exists with bigarrays - with
> used=<size_of_bigarary> and max=256M (hardcoded).

I have also observed this with input-output channels (in_channel and
out_channel), where used = 1 and max = 1000. A full major GC is
performed every time a thousand files are opened, which can result on
a significant overhead when you open lot of files and the heap is
large.

-- Jerome

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Caml-list] Strategies for finding memory leaks
  2012-04-03 12:13       ` Jerome Vouillon
@ 2012-04-04 10:53         ` Hans Ole Rafaelsen
  2012-04-04 11:30           ` Gabriel Scherer
  0 siblings, 1 reply; 16+ messages in thread
From: Hans Ole Rafaelsen @ 2012-04-04 10:53 UTC (permalink / raw)
  To: Jerome Vouillon; +Cc: Gerd Stolpmann, Richard W.M. Jones, caml-list


[-- Attachment #1.1: Type: text/plain, Size: 5561 bytes --]

Hi,

Thanks for your suggestions. I tried to patch lablgtk2 with:

--- src/ml_gdkpixbuf.c.orig     2012-04-03 13:56:29.618264702 +0200
+++ src/ml_gdkpixbuf.c  2012-04-03 13:56:58.106263510 +0200
@@ -119,7 +119,7 @@
   value ret;
   if (pb == NULL) ml_raise_null_pointer();
   ret = alloc_custom (&ml_custom_GdkPixbuf, sizeof pb,
-                     100, 1000);
+                     0, 1);
   p = Data_custom_val (ret);
   *p = ref ? g_object_ref (pb) : pb;
   return ret;

--- src/ml_gobject.c.orig       2012-04-03 15:40:11.002004506 +0200
+++ src/ml_gobject.c    2012-04-03 15:41:04.938002250 +0200
@@ -219,7 +219,7 @@
 CAMLprim value ml_g_value_new(void)
 {
     value ret = alloc_custom(&ml_custom_GValue,
sizeof(value)+sizeof(GValue),
-                             20, 1000);
+                             0, 1);
     /* create an MLPointer */
     Field(ret,1) = (value)2;
     ((GValue*)&Field(ret,2))->g_type = 0;
@@ -272,14 +272,14 @@
   custom_serialize_default, custom_deserialize_default };
 CAMLprim value Val_gboxed(GType t, gpointer p)
 {
-    value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 10, 1000);
+    value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 0, 1);
     Store_pointer(ret, g_boxed_copy (t,p));
     Field(ret,2) = (value)t;
     return ret;
 }
 CAMLprim value Val_gboxed_new(GType t, gpointer p)
 {
-    value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 10, 1000);
+    value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 0, 1);
     Store_pointer(ret, p);
     Field(ret,2) = (value)t;
     return ret;



At startup is uses
top - 16:40:27 up 1 day,  7:01, 28 users,  load average: 0.47, 0.50, 0.35
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s):  4.8%us,  1.3%sy,  0.0%ni, 93.6%id,  0.2%wa,  0.0%hi,  0.1%si,
0.0%st
Mem:   4004736k total,  3617960k used,   386776k free,   130704k buffers
Swap:  4070396k total,     9244k used,  4061152k free,  1730344k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
COMMAND
10275 hans      20   0  529m  77m  13m S   14  2.0   0:01.66
vc_client.nativ

and 12 hours later
top - 04:40:07 up 1 day, 19:01, 35 users,  load average: 0.00, 0.01, 0.05
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s): 20.2%us,  3.4%sy,  0.0%ni, 76.1%id,  0.1%wa,  0.0%hi,  0.2%si,
0.0%st
Mem:   4004736k total,  3828308k used,   176428k free,   143928k buffers
Swap:  4070396k total,    10708k used,  4059688k free,  1756524k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
COMMAND
10275 hans      20   0  534m  82m  13m S   17  2.1 110:11.19
vc_client.nativ

Without the patch
top - 22:05:38 up 1 day, 12:26, 34 users,  load average: 0.35, 0.16, 0.13
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s):  5.6%us,  1.5%sy,  0.0%ni, 92.6%id,  0.2%wa,  0.0%hi,  0.1%si,
0.0%st
Mem:   4004736k total,  3868136k used,   136600k free,   140900k buffers
Swap:  4070396k total,     9680k used,  4060716k free,  1837500k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
COMMAND
25111 hans      20   0  453m  76m  13m S   14  2.0   0:13.68
vc_client_old.n

top - 10:05:19 up 2 days, 26 min, 35 users,  load average: 0.01, 0.04, 0.05
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s): 20.4%us,  3.2%sy,  0.0%ni, 75.8%id,  0.4%wa,  0.0%hi,  0.2%si,
0.0%st
Mem:   4004736k total,  3830596k used,   174140k free,   261692k buffers
Swap:  4070396k total,    13640k used,  4056756k free,  1640452k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
COMMAND
25111 hans      20   0  453m  76m  13m S   49  2.0 263:05.34
vc_client_old.n

So from this it seems that with the patch it still uses more and more CPU,
but at a much lower rate. However, it seems to increase memory usage with
the patch. I also tried to patch the wrappers.h file, but the memory
consumption just exploded.

So it is working better, but still not good enough. Is there some way to
prevent this kind of behavior? That is, no extra memory usage and no extra
CPU usage.

I have attached some additional profiling if that would be of any interest.
In short it seems to be that it is the GC that is consuming the CPU.

Best,

Hans Ole

On Tue, Apr 3, 2012 at 2:13 PM, Jerome Vouillon <vouillon@pps.jussieu.fr>wrote:

> On Tue, Apr 03, 2012 at 12:42:08PM +0200, Gerd Stolpmann wrote:
> > This reminds me of a problem I had with a specific C binding (for mysql),
> > years ago. That binding allocated custom blocks with badly chosen
> > parameters used/max (see the docs for caml_alloc_custom in
> > http://caml.inria.fr/pub/docs/manual-ocaml/manual032.html#toc144). If
> the
> > ratio used/max is > 0, these parameters accelerate the GC. If the custom
> > blocks are frequently allocated, this can have a dramatic effect, even
> for
> > quite small used/max ratios. The solution was to change the code, and to
> > set used=0 and max=1.
> >
> > This type of problem would match your observation that the GC works more
> > and more the longer the program runs, i.e. the more custom blocks have
> > been allocated.
> >
> > The problem basically also exists with bigarrays - with
> > used=<size_of_bigarary> and max=256M (hardcoded).
>
> I have also observed this with input-output channels (in_channel and
> out_channel), where used = 1 and max = 1000. A full major GC is
> performed every time a thousand files are opened, which can result on
> a significant overhead when you open lot of files and the heap is
> large.
>
> -- Jerome
>

[-- Attachment #1.2: Type: text/html, Size: 6515 bytes --]

[-- Attachment #2: MemoryProfiling.txt --]
[-- Type: text/plain, Size: 13629 bytes --]

vc_client.native has run for about 19 hours
vc_client_old.native has for about 14 hours

Tasks: 272 total,   3 running, 269 sleeping,   0 stopped,   0 zombie
Cpu(s): 20.2%us,  3.2%sy,  0.0%ni, 76.3%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:   4004736k total,  3633672k used,   371064k free,   262916k buffers
Swap:  4070396k total,    14160k used,  4056236k free,  1417088k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                             
25111 hans      20   0  453m  76m  13m R   50  2.0 319:04.86 vc_client_old.n                                                                                                                      
10275 hans      20   0  553m  99m  13m S   19  2.5 188:30.50 
vc_client.nativ                                                                                                                     



vc_client_old.native
CPU: Intel Sandy Bridge microarchitecture, speed 2.201e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name               symbol name
230093   51.8571  vc_client_old.native     mark_slice
99659    22.4606  vc_client_old.native     sweep_slice
76563    17.2553  vc_client_old.native     caml_page_table_lookup
13215     2.9783  vc_client_old.native     caml_fl_merge_block
1814      0.4088  vc_client_old.native     truncate_flp
1013      0.2283  vc_client_old.native     caml_gc_stat
784       0.1767  vc_client_old.native     caml_iterate_global_roots
775       0.1747  vc_client_old.native     caml_fl_allocate
564       0.1271  vc_client_old.native     camlLividisvc__Nalbuf_tools__replace_pattern_1033
545       0.1228  vc_client_old.native     caml_oldify_local_roots
503       0.1134  vc_client_old.native     caml_modify
453       0.1021  vc_client_old.native     camlLividisvc__Nalbuf_bytestream__search_1047
446       0.1005  vc_client_old.native     caml_delete_global_root
412       0.0929  vc_client_old.native     caml_do_roots
343       0.0773  vc_client_old.native     caml_darken
306       0.0690  vc_client_old.native     lwt_glib_get_sources
297       0.0669  vc_client_old.native     caml_insert_global_root
250       0.0563  vc_client_old.native     camlUq_gtk__fun_1484
213       0.0480  vc_client_old.native     camlHashtbl__find_1093
205       0.0462  vc_client_old.native     compare_val
195       0.0439  vc_client_old.native     camlLwt_glib__enter_1079
191       0.0430  [vdso] (tgid:25111 range:0x7fff12dff000-0x7fff12e00000) [vdso] (tgid:25111 range:0x7fff12dff000-0x7fff12e00000)
190       0.0428  vc_client_old.native     caml_stash_backtrace
188       0.0424  vc_client_old.native     caml_oldify_one
185       0.0417  vc_client_old.native     camlHashtbl__replace_1112
179       0.0403  vc_client_old.native     caml_alloc_string
176       0.0397  vc_client_old.native     camlMedia_ctrl__fun_3055
176       0.0397  vc_client_old.native     caml_c_call
173       0.0390  vc_client_old.native     camlList__iter_1074
171       0.0385  vc_client_old.native     caml_apply2
164       0.0370  vc_client_old.native     caml_obj_dup
159       0.0358  vc_client_old.native     caml_send0


vc_client.native
CPU: Intel Sandy Bridge microarchitecture, speed 2.201e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name               symbol name
50295    45.3259  vc_client.native         mark_slice
17964    16.1892  vc_client.native         sweep_slice
14450    13.0224  vc_client.native         caml_page_table_lookup
2137      1.9259  vc_client.native         caml_fl_merge_block
1397      1.2590  vc_client.native         caml_fl_allocate
1305      1.1761  vc_client.native         caml_gc_stat
599       0.5398  vc_client.native         camlLividisvc__Nalbuf_tools__replace_pattern_1033
546       0.4921  vc_client.native         camlLividisvc__Nalbuf_bytestream__search_1047
544       0.4903  vc_client.native         caml_modify
540       0.4866  vc_client.native         caml_delete_global_root
422       0.3803  vc_client.native         caml_iterate_global_roots
404       0.3641  vc_client.native         truncate_flp
339       0.3055  vc_client.native         camlUq_gtk__fun_1484
327       0.2947  vc_client.native         camlHashtbl__find_1093
311       0.2803  vc_client.native         caml_insert_global_root
290       0.2613  vc_client.native         lwt_glib_get_sources
284       0.2559  vc_client.native         caml_oldify_local_roots
260       0.2343  vc_client.native         camlMedia_ctrl__fun_3055
260       0.2343  vc_client.native         camlUnixqueue_select__fun_2366
248       0.2235  vc_client.native         compare_val
239       0.2154  vc_client.native         caml_stash_backtrace
220       0.1983  [vdso] (tgid:10275 range:0x7fffea534000-0x7fffea535000) [vdso] (tgid:10275 range:0x7fffea534000-0x7fffea535000)
219       0.1974  vc_client.native         camlLwt_glib__enter_1079
206       0.1856  vc_client.native         camlList__iter_1074
202       0.1820  vc_client.native         caml_apply2
196       0.1766  vc_client.native         caml_c_call
194       0.1748  vc_client.native         caml_send0
180       0.1622  vc_client.native         camlHashtbl__replace_1112
180       0.1622  vc_client.native         caml_alloc_string
176       0.1586  vc_client.native         caml_obj_dup
176       0.1586  vc_client.native         caml_oldify_one
173       0.1559  vc_client.native         caml_remove_generational_global_root
171       0.1541  vc_client.native         caml_curry3_2




vc_client.native
CPU: Intel Sandy Bridge microarchitecture, speed 2.201e+06 MHz (estimated)
Counted LLC_MISSES events (Last level cache demand requests from this core that missed the LLC) with a unit 
mask of 0x41 (No unit mask) count 100000
samples  %        image name               symbol name
598      53.3452  vc_client.native         mark_slice
261      23.2828  vc_client.native         caml_page_table_lookup
23        2.0517  vc_client.native         sweep_slice
13        1.1597  vc_client.native         caml_fl_allocate
6         0.5352  vc_client.native         camlUq_gtk__fun_1484
5         0.4460  vc_client.native         caml_delete_global_root
4         0.3568  vc_client.native         camlEqueue__run_1119
4         0.3568  vc_client.native         camlHashtbl__find_1093
4         0.3568  vc_client.native         camlLividisvc__Nalbuf_bytestream__search_1047
4         0.3568  vc_client.native         camlMedia_ctrl__fun_3055
4         0.3568  vc_client.native         camlUnixqueue_select__method_run_1662
4         0.3568  vc_client.native         caml_callback_exn
4         0.3568  vc_client.native         caml_iterate_global_roots
4         0.3568  vc_client.native         caml_modify
3         0.2676  vc_client.native         camlAudio_encoder__transfer_audio_frame_1052
3         0.2676  vc_client.native         camlBitstring_structs__string_of_rtp_ext_1030
3         0.2676  vc_client.native         camlHashtbl__replace_1112
3         0.2676  vc_client.native         camlLwt_glib__leave_1085
3         0.2676  vc_client.native         camlSequencer__serialize_header_2231
3         0.2676  vc_client.native         camlUnixqueue_select__fun_2366
3         0.2676  vc_client.native         camlUq_gtk__code_begin
3         0.2676  vc_client.native         caml_apply2
3         0.2676  vc_client.native         caml_curry3_2
3         0.2676  vc_client.native         caml_gc_stat
3         0.2676  vc_client.native         caml_hash_univ_param
3         0.2676  vc_client.native         caml_send0
3         0.2676  vc_client.native         compare_val
2         0.1784  vc_client.native         camlBuffer__create_1039
2         0.1784  vc_client.native         camlList__iter_1074
2         0.1784  vc_client.native         camlLwt__wait_1290
2         0.1784  vc_client.native         camlLwt_sequence__loop_1098
2         0.1784  vc_client.native         camlMedia_ctrl__fun_3085
2         0.1784  vc_client.native         camlMedia_ctrl__receiver_cbfun_2123
2         0.1784  vc_client.native         camlPervasives__min_1022
2         0.1784  vc_client.native         camlProfiles__fec_percentage_1232
2         0.1784  vc_client.native         camlScanf__scan_1702
2         0.1784  vc_client.native         camlScanf__scan_format_1685
2         0.1784  vc_client.native         camlSequencer__my_push_2260
2         0.1784  vc_client.native         camlUnixqueue_select__fun_1864
2         0.1784  vc_client.native         camlUnixqueue_select__method_protect_1800
2         0.1784  vc_client.native         camlUnixqueue_select__method_uq_handler_1726
2         0.1784  vc_client.native         caml_curry4_1
2         0.1784  vc_client.native         caml_insert_global_root

vc_client_old.native
CPU: Intel Sandy Bridge microarchitecture, speed 2.201e+06 MHz (estimated)
Counted LLC_MISSES events (Last level cache demand requests from this core that missed the LLC) with a unit mask of 0x41 (No unit mask) count 100000
samples  %        symbol name
2411     65.8563  mark_slice
923      25.2117  caml_page_table_lookup
101       2.7588  sweep_slice
10        0.2731  caml_iterate_global_roots
8         0.2185  caml_oldify_local_roots
7         0.1912  caml_fl_allocate
7         0.1912  caml_fl_merge_block
5         0.1366  camlHashtbl__replace_1112
5         0.1366  camlLividisvc__Nalbuf_bytestream__search_1047
5         0.1366  camlUq_gtk__fun_1484
5         0.1366  caml_delete_global_root
4         0.1093  camlLwt_glib__enter_1079
4         0.1093  caml_darken
4         0.1093  caml_send2
3         0.0819  camlBitstring___add_bits_1532
3         0.0819  camlEqueue__run_1119
3         0.0819  camlLwt_sequence__loop_1098
3         0.0819  camlSubstream_seqnum__code_begin
3         0.0819  caml_alloc_string
3         0.0819  compare_val
3         0.0819  lwt_glib_get_sources
2         0.0546  camlBitstring__add_byte_1515
2         0.0546  camlBuffer__create_1039
2         0.0546  camlGObj__method_draw_3325
2         0.0546  camlHashtbl__find_1093
2         0.0546  camlHashtbl__find_1198
2         0.0546  camlHashtbl__iter_1129
2         0.0546  camlList__iter_1074
2         0.0546  camlLividicore__Filter__eval_tree_1110
2         0.0546  camlMcsvc_types__add_header_1175
2         0.0546  camlMedia_ctrl__fun_3085
2         0.0546  camlPervasives__$5e_1112
2         0.0546  camlSubstream_seqnum__my_push_1197
2         0.0546  camlUnixqueue_select__fun_2314
2         0.0546  camlUnixqueue_util__dlogr_1060
2         0.0546  caml_apply2
2         0.0546  caml_apply3
2         0.0546  caml_c_call
2         0.0546  caml_curry3_1
2         0.0546  caml_do_roots
2         0.0546  caml_gc_stat
2         0.0546  caml_hash_univ_param
2         0.0546  caml_modify
2         0.0546  fdlist_to_fdset


at startup it looks like this:
top - 12:16:03 up 2 days,  2:37, 35 users,  load average: 0.02, 0.02, 0.05
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s): 19.7%us,  4.1%sy,  0.0%ni, 75.7%id,  0.2%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:   4004736k total,  3634484k used,   370252k free,   263976k buffers
Swap:  4070396k total,    14160k used,  4056236k free,  1424216k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                              
12906 hans      20   0  453m  76m  13m S   14  2.0   0:14.62 vc_client_old.n    

CPU: Intel Sandy Bridge microarchitecture, speed 2.201e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name               symbol name
12046    25.2045  vc_client_old.native     mark_slice
8590     17.9733  vc_client_old.native     caml_page_table_lookup
6311     13.2049  vc_client_old.native     sweep_slice
718       1.5023  vc_client_old.native     camlLividisvc__Nalbuf_tools__replace_pattern_1033
633       1.3245  vc_client_old.native     caml_fl_allocate
536       1.1215  vc_client_old.native     caml_modify
505       1.0566  vc_client_old.native     caml_do_roots
484       1.0127  vc_client_old.native     camlLividisvc__Nalbuf_bytestream__search_1047
483       1.0106  vc_client_old.native     caml_iterate_global_roots
471       0.9855  vc_client_old.native     caml_darken
412       0.8621  vc_client_old.native     caml_delete_global_root
371       0.7763  vc_client_old.native     caml_oldify_local_roots
323       0.6758  vc_client_old.native     caml_insert_global_root
304       0.6361  vc_client_old.native     caml_fl_merge_block
291       0.6089  vc_client_old.native     lwt_glib_get_sources
266       0.5566  vc_client_old.native     camlHashtbl__find_1093
216       0.4519  [vdso] (tgid:12906 range:0x7fffba5fc000-0x7fffba5fd000) [vdso] (tgid:12906 range:0x7fffba5fc000-0x7fffba5fd000)
216       0.4519  vc_client_old.native     compare_val
215       0.4499  vc_client_old.native     camlLwt_glib__enter_1079
205       0.4289  vc_client_old.native     camlUq_gtk__fun_1484
202       0.4227  vc_client_old.native     caml_alloc_string
200       0.4185  vc_client_old.native     caml_c_call
193       0.4038  vc_client_old.native     caml_oldify_one
192       0.4017  vc_client_old.native     caml_stash_backtrace
187       0.3913  vc_client_old.native     camlMedia_ctrl__fun_3055



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Caml-list] Strategies for finding memory leaks
  2012-04-04 10:53         ` Hans Ole Rafaelsen
@ 2012-04-04 11:30           ` Gabriel Scherer
  2012-04-04 12:55             ` Gerd Stolpmann
  2012-04-04 13:55             ` [Caml-list] GC speed for custom blocks, was: " Gerd Stolpmann
  0 siblings, 2 replies; 16+ messages in thread
From: Gabriel Scherer @ 2012-04-04 11:30 UTC (permalink / raw)
  To: Hans Ole Rafaelsen; +Cc: caml-list

May your program leak one of those GTK resources?

The effectiveness of your patch seems to indicate that you have a lot
of one of these values allocated (and that they were requesting the GC
much too frequently). The patch solves the CPU usage induced by
additional GC, but does not change the reason why those GC were
launched: apparently your code allocates a lot of those resources. If
there indeed is a leak in your program, it will use more and more
memory even if you fix the CPU-usage effect.

An interesting side-effect of your patch is that you could, by
selectively disabling some of the change you made (eg. by changing
Val_g_boxed but not Val_g_boxed_new), isolate which of those resources
were provoking the increased CPU usage, because it was allocated in
high number.

(Usual candidates that provoke leak are global data structures that
store references to your data. A closure will also reference the data
corresponding to the variables it captures, so storing closures in
such tables can be an indirect cause for "leaks". Do you have global
tables of callbacks or values for GTK-land?)

On Wed, Apr 4, 2012 at 12:53 PM, Hans Ole Rafaelsen
<hrafaelsen@gmail.com> wrote:
> Hi,
>
> Thanks for your suggestions. I tried to patch lablgtk2 with:
>
> --- src/ml_gdkpixbuf.c.orig     2012-04-03 13:56:29.618264702 +0200
> +++ src/ml_gdkpixbuf.c  2012-04-03 13:56:58.106263510 +0200
> @@ -119,7 +119,7 @@
>    value ret;
>    if (pb == NULL) ml_raise_null_pointer();
>    ret = alloc_custom (&ml_custom_GdkPixbuf, sizeof pb,
> -                     100, 1000);
> +                     0, 1);
>    p = Data_custom_val (ret);
>    *p = ref ? g_object_ref (pb) : pb;
>    return ret;
>
> --- src/ml_gobject.c.orig       2012-04-03 15:40:11.002004506 +0200
> +++ src/ml_gobject.c    2012-04-03 15:41:04.938002250 +0200
> @@ -219,7 +219,7 @@
>  CAMLprim value ml_g_value_new(void)
>  {
>      value ret = alloc_custom(&ml_custom_GValue,
> sizeof(value)+sizeof(GValue),
> -                             20, 1000);
> +                             0, 1);
>      /* create an MLPointer */
>      Field(ret,1) = (value)2;
>      ((GValue*)&Field(ret,2))->g_type = 0;
> @@ -272,14 +272,14 @@
>    custom_serialize_default, custom_deserialize_default };
>  CAMLprim value Val_gboxed(GType t, gpointer p)
>  {
> -    value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 10, 1000);
> +    value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 0, 1);
>      Store_pointer(ret, g_boxed_copy (t,p));
>      Field(ret,2) = (value)t;
>      return ret;
>  }
>  CAMLprim value Val_gboxed_new(GType t, gpointer p)
>  {
> -    value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 10, 1000);
> +    value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 0, 1);
>      Store_pointer(ret, p);
>      Field(ret,2) = (value)t;
>      return ret;
>
>
>
> At startup is uses
> top - 16:40:27 up 1 day,  7:01, 28 users,  load average: 0.47, 0.50, 0.35
> Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
> Cpu(s):  4.8%us,  1.3%sy,  0.0%ni, 93.6%id,  0.2%wa,  0.0%hi,  0.1%si,
> 0.0%st
> Mem:   4004736k total,  3617960k used,   386776k free,   130704k buffers
> Swap:  4070396k total,     9244k used,  4061152k free,  1730344k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
> COMMAND
> 10275 hans      20   0  529m  77m  13m S   14  2.0   0:01.66
> vc_client.nativ
>
> and 12 hours later
> top - 04:40:07 up 1 day, 19:01, 35 users,  load average: 0.00, 0.01, 0.05
> Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
> Cpu(s): 20.2%us,  3.4%sy,  0.0%ni, 76.1%id,  0.1%wa,  0.0%hi,  0.2%si,
> 0.0%st
> Mem:   4004736k total,  3828308k used,   176428k free,   143928k buffers
> Swap:  4070396k total,    10708k used,  4059688k free,  1756524k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
> COMMAND
> 10275 hans      20   0  534m  82m  13m S   17  2.1 110:11.19
> vc_client.nativ
>
> Without the patch
> top - 22:05:38 up 1 day, 12:26, 34 users,  load average: 0.35, 0.16, 0.13
> Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
> Cpu(s):  5.6%us,  1.5%sy,  0.0%ni, 92.6%id,  0.2%wa,  0.0%hi,  0.1%si,
> 0.0%st
> Mem:   4004736k total,  3868136k used,   136600k free,   140900k buffers
> Swap:  4070396k total,     9680k used,  4060716k free,  1837500k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
> COMMAND
> 25111 hans      20   0  453m  76m  13m S   14  2.0   0:13.68 vc_client_old.n
>
> top - 10:05:19 up 2 days, 26 min, 35 users,  load average: 0.01, 0.04, 0.05
> Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
> Cpu(s): 20.4%us,  3.2%sy,  0.0%ni, 75.8%id,  0.4%wa,  0.0%hi,  0.2%si,
> 0.0%st
> Mem:   4004736k total,  3830596k used,   174140k free,   261692k buffers
> Swap:  4070396k total,    13640k used,  4056756k free,  1640452k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
> COMMAND
> 25111 hans      20   0  453m  76m  13m S   49  2.0 263:05.34
> vc_client_old.n
>
> So from this it seems that with the patch it still uses more and more CPU,
> but at a much lower rate. However, it seems to increase memory usage with
> the patch. I also tried to patch the wrappers.h file, but the memory
> consumption just exploded.
>
> So it is working better, but still not good enough. Is there some way to
> prevent this kind of behavior? That is, no extra memory usage and no extra
> CPU usage.
>
> I have attached some additional profiling if that would be of any interest.
> In short it seems to be that it is the GC that is consuming the CPU.
>
> Best,
>
> Hans Ole
>
>
> On Tue, Apr 3, 2012 at 2:13 PM, Jerome Vouillon <vouillon@pps.jussieu.fr>
> wrote:
>>
>> On Tue, Apr 03, 2012 at 12:42:08PM +0200, Gerd Stolpmann wrote:
>> > This reminds me of a problem I had with a specific C binding (for
>> > mysql),
>> > years ago. That binding allocated custom blocks with badly chosen
>> > parameters used/max (see the docs for caml_alloc_custom in
>> > http://caml.inria.fr/pub/docs/manual-ocaml/manual032.html#toc144). If
>> > the
>> > ratio used/max is > 0, these parameters accelerate the GC. If the custom
>> > blocks are frequently allocated, this can have a dramatic effect, even
>> > for
>> > quite small used/max ratios. The solution was to change the code, and to
>> > set used=0 and max=1.
>> >
>> > This type of problem would match your observation that the GC works more
>> > and more the longer the program runs, i.e. the more custom blocks have
>> > been allocated.
>> >
>> > The problem basically also exists with bigarrays - with
>> > used=<size_of_bigarary> and max=256M (hardcoded).
>>
>> I have also observed this with input-output channels (in_channel and
>> out_channel), where used = 1 and max = 1000. A full major GC is
>> performed every time a thousand files are opened, which can result on
>> a significant overhead when you open lot of files and the heap is
>> large.
>>
>> -- Jerome
>
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Caml-list] Strategies for finding memory leaks
  2012-04-04 11:30           ` Gabriel Scherer
@ 2012-04-04 12:55             ` Gerd Stolpmann
  2012-04-07 13:27               ` Hans Ole Rafaelsen
  2012-04-04 13:55             ` [Caml-list] GC speed for custom blocks, was: " Gerd Stolpmann
  1 sibling, 1 reply; 16+ messages in thread
From: Gerd Stolpmann @ 2012-04-04 12:55 UTC (permalink / raw)
  To: Gabriel Scherer; +Cc: Hans Ole Rafaelsen, caml-list

Am Mittwoch, den 04.04.2012, 13:30 +0200 schrieb Gabriel Scherer:
> May your program leak one of those GTK resources?
> 
> The effectiveness of your patch seems to indicate that you have a lot
> of one of these values allocated (and that they were requesting the GC
> much too frequently). The patch solves the CPU usage induced by
> additional GC, but does not change the reason why those GC were
> launched: apparently your code allocates a lot of those resources. If
> there indeed is a leak in your program, it will use more and more
> memory even if you fix the CPU-usage effect.
> 
> An interesting side-effect of your patch is that you could, by
> selectively disabling some of the change you made (eg. by changing
> Val_g_boxed but not Val_g_boxed_new), isolate which of those resources
> were provoking the increased CPU usage, because it was allocated in
> high number.

Or just increment a counter for each type.

Gerd

> (Usual candidates that provoke leak are global data structures that
> store references to your data. A closure will also reference the data
> corresponding to the variables it captures, so storing closures in
> such tables can be an indirect cause for "leaks". Do you have global
> tables of callbacks or values for GTK-land?)
> 
> On Wed, Apr 4, 2012 at 12:53 PM, Hans Ole Rafaelsen
> <hrafaelsen@gmail.com> wrote:
> > Hi,
> >
> > Thanks for your suggestions. I tried to patch lablgtk2 with:
> >
> > --- src/ml_gdkpixbuf.c.orig     2012-04-03 13:56:29.618264702 +0200
> > +++ src/ml_gdkpixbuf.c  2012-04-03 13:56:58.106263510 +0200
> > @@ -119,7 +119,7 @@
> >    value ret;
> >    if (pb == NULL) ml_raise_null_pointer();
> >    ret = alloc_custom (&ml_custom_GdkPixbuf, sizeof pb,
> > -                     100, 1000);
> > +                     0, 1);
> >    p = Data_custom_val (ret);
> >    *p = ref ? g_object_ref (pb) : pb;
> >    return ret;
> >
> > --- src/ml_gobject.c.orig       2012-04-03 15:40:11.002004506 +0200
> > +++ src/ml_gobject.c    2012-04-03 15:41:04.938002250 +0200
> > @@ -219,7 +219,7 @@
> >  CAMLprim value ml_g_value_new(void)
> >  {
> >      value ret = alloc_custom(&ml_custom_GValue,
> > sizeof(value)+sizeof(GValue),
> > -                             20, 1000);
> > +                             0, 1);
> >      /* create an MLPointer */
> >      Field(ret,1) = (value)2;
> >      ((GValue*)&Field(ret,2))->g_type = 0;
> > @@ -272,14 +272,14 @@
> >    custom_serialize_default, custom_deserialize_default };
> >  CAMLprim value Val_gboxed(GType t, gpointer p)
> >  {
> > -    value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 10, 1000);
> > +    value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 0, 1);
> >      Store_pointer(ret, g_boxed_copy (t,p));
> >      Field(ret,2) = (value)t;
> >      return ret;
> >  }
> >  CAMLprim value Val_gboxed_new(GType t, gpointer p)
> >  {
> > -    value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 10, 1000);
> > +    value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 0, 1);
> >      Store_pointer(ret, p);
> >      Field(ret,2) = (value)t;
> >      return ret;
> >
> >
> >
> > At startup is uses
> > top - 16:40:27 up 1 day,  7:01, 28 users,  load average: 0.47, 0.50, 0.35
> > Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
> > Cpu(s):  4.8%us,  1.3%sy,  0.0%ni, 93.6%id,  0.2%wa,  0.0%hi,  0.1%si,
> > 0.0%st
> > Mem:   4004736k total,  3617960k used,   386776k free,   130704k buffers
> > Swap:  4070396k total,     9244k used,  4061152k free,  1730344k cached
> >
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
> > COMMAND
> > 10275 hans      20   0  529m  77m  13m S   14  2.0   0:01.66
> > vc_client.nativ
> >
> > and 12 hours later
> > top - 04:40:07 up 1 day, 19:01, 35 users,  load average: 0.00, 0.01, 0.05
> > Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
> > Cpu(s): 20.2%us,  3.4%sy,  0.0%ni, 76.1%id,  0.1%wa,  0.0%hi,  0.2%si,
> > 0.0%st
> > Mem:   4004736k total,  3828308k used,   176428k free,   143928k buffers
> > Swap:  4070396k total,    10708k used,  4059688k free,  1756524k cached
> >
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
> > COMMAND
> > 10275 hans      20   0  534m  82m  13m S   17  2.1 110:11.19
> > vc_client.nativ
> >
> > Without the patch
> > top - 22:05:38 up 1 day, 12:26, 34 users,  load average: 0.35, 0.16, 0.13
> > Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
> > Cpu(s):  5.6%us,  1.5%sy,  0.0%ni, 92.6%id,  0.2%wa,  0.0%hi,  0.1%si,
> > 0.0%st
> > Mem:   4004736k total,  3868136k used,   136600k free,   140900k buffers
> > Swap:  4070396k total,     9680k used,  4060716k free,  1837500k cached
> >
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
> > COMMAND
> > 25111 hans      20   0  453m  76m  13m S   14  2.0   0:13.68 vc_client_old.n
> >
> > top - 10:05:19 up 2 days, 26 min, 35 users,  load average: 0.01, 0.04, 0.05
> > Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
> > Cpu(s): 20.4%us,  3.2%sy,  0.0%ni, 75.8%id,  0.4%wa,  0.0%hi,  0.2%si,
> > 0.0%st
> > Mem:   4004736k total,  3830596k used,   174140k free,   261692k buffers
> > Swap:  4070396k total,    13640k used,  4056756k free,  1640452k cached
> >
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
> > COMMAND
> > 25111 hans      20   0  453m  76m  13m S   49  2.0 263:05.34
> > vc_client_old.n
> >
> > So from this it seems that with the patch it still uses more and more CPU,
> > but at a much lower rate. However, it seems to increase memory usage with
> > the patch. I also tried to patch the wrappers.h file, but the memory
> > consumption just exploded.
> >
> > So it is working better, but still not good enough. Is there some way to
> > prevent this kind of behavior? That is, no extra memory usage and no extra
> > CPU usage.
> >
> > I have attached some additional profiling if that would be of any interest.
> > In short it seems to be that it is the GC that is consuming the CPU.
> >
> > Best,
> >
> > Hans Ole
> >
> >
> > On Tue, Apr 3, 2012 at 2:13 PM, Jerome Vouillon <vouillon@pps.jussieu.fr>
> > wrote:
> >>
> >> On Tue, Apr 03, 2012 at 12:42:08PM +0200, Gerd Stolpmann wrote:
> >> > This reminds me of a problem I had with a specific C binding (for
> >> > mysql),
> >> > years ago. That binding allocated custom blocks with badly chosen
> >> > parameters used/max (see the docs for caml_alloc_custom in
> >> > http://caml.inria.fr/pub/docs/manual-ocaml/manual032.html#toc144). If
> >> > the
> >> > ratio used/max is > 0, these parameters accelerate the GC. If the custom
> >> > blocks are frequently allocated, this can have a dramatic effect, even
> >> > for
> >> > quite small used/max ratios. The solution was to change the code, and to
> >> > set used=0 and max=1.
> >> >
> >> > This type of problem would match your observation that the GC works more
> >> > and more the longer the program runs, i.e. the more custom blocks have
> >> > been allocated.
> >> >
> >> > The problem basically also exists with bigarrays - with
> >> > used=<size_of_bigarary> and max=256M (hardcoded).
> >>
> >> I have also observed this with input-output channels (in_channel and
> >> out_channel), where used = 1 and max = 1000. A full major GC is
> >> performed every time a thousand files are opened, which can result on
> >> a significant overhead when you open lot of files and the heap is
> >> large.
> >>
> >> -- Jerome
> >
> >
> 
> 

-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
*** Searching for new projects! Need consulting for system
*** programming in Ocaml? Gerd Stolpmann can help you.
------------------------------------------------------------


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Caml-list] Strategies for finding memory leaks
  2012-04-04 12:55             ` Gerd Stolpmann
@ 2012-04-07 13:27               ` Hans Ole Rafaelsen
  0 siblings, 0 replies; 16+ messages in thread
From: Hans Ole Rafaelsen @ 2012-04-07 13:27 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: Gabriel Scherer, caml-list

[-- Attachment #1: Type: text/plain, Size: 8912 bytes --]

So just to be clear, it seems like I'm allocating lots of objects of a kind
that I don't free. I have been trying to tracking down this in my ML part
that use the library. You suggest that trying to have a counter in the C
binding part of the library and count each time a object is created (and
maybe each time it is destroyed) might be a better option. I have not
worked much with ML<->C binding code, just want to be sure that this might
be a proper way to do it before I start.

-- 
Hans Ole

On Wed, Apr 4, 2012 at 2:55 PM, Gerd Stolpmann <info@gerd-stolpmann.de>wrote:

> Am Mittwoch, den 04.04.2012, 13:30 +0200 schrieb Gabriel Scherer:
> > May your program leak one of those GTK resources?
> >
> > The effectiveness of your patch seems to indicate that you have a lot
> > of one of these values allocated (and that they were requesting the GC
> > much too frequently). The patch solves the CPU usage induced by
> > additional GC, but does not change the reason why those GC were
> > launched: apparently your code allocates a lot of those resources. If
> > there indeed is a leak in your program, it will use more and more
> > memory even if you fix the CPU-usage effect.
> >
> > An interesting side-effect of your patch is that you could, by
> > selectively disabling some of the change you made (eg. by changing
> > Val_g_boxed but not Val_g_boxed_new), isolate which of those resources
> > were provoking the increased CPU usage, because it was allocated in
> > high number.
>
> Or just increment a counter for each type.
>
> Gerd
>
> > (Usual candidates that provoke leak are global data structures that
> > store references to your data. A closure will also reference the data
> > corresponding to the variables it captures, so storing closures in
> > such tables can be an indirect cause for "leaks". Do you have global
> > tables of callbacks or values for GTK-land?)
> >
> > On Wed, Apr 4, 2012 at 12:53 PM, Hans Ole Rafaelsen
> > <hrafaelsen@gmail.com> wrote:
> > > Hi,
> > >
> > > Thanks for your suggestions. I tried to patch lablgtk2 with:
> > >
> > > --- src/ml_gdkpixbuf.c.orig     2012-04-03 13:56:29.618264702 +0200
> > > +++ src/ml_gdkpixbuf.c  2012-04-03 13:56:58.106263510 +0200
> > > @@ -119,7 +119,7 @@
> > >    value ret;
> > >    if (pb == NULL) ml_raise_null_pointer();
> > >    ret = alloc_custom (&ml_custom_GdkPixbuf, sizeof pb,
> > > -                     100, 1000);
> > > +                     0, 1);
> > >    p = Data_custom_val (ret);
> > >    *p = ref ? g_object_ref (pb) : pb;
> > >    return ret;
> > >
> > > --- src/ml_gobject.c.orig       2012-04-03 15:40:11.002004506 +0200
> > > +++ src/ml_gobject.c    2012-04-03 15:41:04.938002250 +0200
> > > @@ -219,7 +219,7 @@
> > >  CAMLprim value ml_g_value_new(void)
> > >  {
> > >      value ret = alloc_custom(&ml_custom_GValue,
> > > sizeof(value)+sizeof(GValue),
> > > -                             20, 1000);
> > > +                             0, 1);
> > >      /* create an MLPointer */
> > >      Field(ret,1) = (value)2;
> > >      ((GValue*)&Field(ret,2))->g_type = 0;
> > > @@ -272,14 +272,14 @@
> > >    custom_serialize_default, custom_deserialize_default };
> > >  CAMLprim value Val_gboxed(GType t, gpointer p)
> > >  {
> > > -    value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 10,
> 1000);
> > > +    value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 0,
> 1);
> > >      Store_pointer(ret, g_boxed_copy (t,p));
> > >      Field(ret,2) = (value)t;
> > >      return ret;
> > >  }
> > >  CAMLprim value Val_gboxed_new(GType t, gpointer p)
> > >  {
> > > -    value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 10,
> 1000);
> > > +    value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 0,
> 1);
> > >      Store_pointer(ret, p);
> > >      Field(ret,2) = (value)t;
> > >      return ret;
> > >
> > >
> > >
> > > At startup is uses
> > > top - 16:40:27 up 1 day,  7:01, 28 users,  load average: 0.47, 0.50,
> 0.35
> > > Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
> > > Cpu(s):  4.8%us,  1.3%sy,  0.0%ni, 93.6%id,  0.2%wa,  0.0%hi,  0.1%si,
> > > 0.0%st
> > > Mem:   4004736k total,  3617960k used,   386776k free,   130704k
> buffers
> > > Swap:  4070396k total,     9244k used,  4061152k free,  1730344k cached
> > >
> > >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
> > > COMMAND
> > > 10275 hans      20   0  529m  77m  13m S   14  2.0   0:01.66
> > > vc_client.nativ
> > >
> > > and 12 hours later
> > > top - 04:40:07 up 1 day, 19:01, 35 users,  load average: 0.00, 0.01,
> 0.05
> > > Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
> > > Cpu(s): 20.2%us,  3.4%sy,  0.0%ni, 76.1%id,  0.1%wa,  0.0%hi,  0.2%si,
> > > 0.0%st
> > > Mem:   4004736k total,  3828308k used,   176428k free,   143928k
> buffers
> > > Swap:  4070396k total,    10708k used,  4059688k free,  1756524k cached
> > >
> > >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
> > > COMMAND
> > > 10275 hans      20   0  534m  82m  13m S   17  2.1 110:11.19
> > > vc_client.nativ
> > >
> > > Without the patch
> > > top - 22:05:38 up 1 day, 12:26, 34 users,  load average: 0.35, 0.16,
> 0.13
> > > Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
> > > Cpu(s):  5.6%us,  1.5%sy,  0.0%ni, 92.6%id,  0.2%wa,  0.0%hi,  0.1%si,
> > > 0.0%st
> > > Mem:   4004736k total,  3868136k used,   136600k free,   140900k
> buffers
> > > Swap:  4070396k total,     9680k used,  4060716k free,  1837500k cached
> > >
> > >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
> > > COMMAND
> > > 25111 hans      20   0  453m  76m  13m S   14  2.0   0:13.68
> vc_client_old.n
> > >
> > > top - 10:05:19 up 2 days, 26 min, 35 users,  load average: 0.01, 0.04,
> 0.05
> > > Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
> > > Cpu(s): 20.4%us,  3.2%sy,  0.0%ni, 75.8%id,  0.4%wa,  0.0%hi,  0.2%si,
> > > 0.0%st
> > > Mem:   4004736k total,  3830596k used,   174140k free,   261692k
> buffers
> > > Swap:  4070396k total,    13640k used,  4056756k free,  1640452k cached
> > >
> > >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
> > > COMMAND
> > > 25111 hans      20   0  453m  76m  13m S   49  2.0 263:05.34
> > > vc_client_old.n
> > >
> > > So from this it seems that with the patch it still uses more and more
> CPU,
> > > but at a much lower rate. However, it seems to increase memory usage
> with
> > > the patch. I also tried to patch the wrappers.h file, but the memory
> > > consumption just exploded.
> > >
> > > So it is working better, but still not good enough. Is there some way
> to
> > > prevent this kind of behavior? That is, no extra memory usage and no
> extra
> > > CPU usage.
> > >
> > > I have attached some additional profiling if that would be of any
> interest.
> > > In short it seems to be that it is the GC that is consuming the CPU.
> > >
> > > Best,
> > >
> > > Hans Ole
> > >
> > >
> > > On Tue, Apr 3, 2012 at 2:13 PM, Jerome Vouillon <
> vouillon@pps.jussieu.fr>
> > > wrote:
> > >>
> > >> On Tue, Apr 03, 2012 at 12:42:08PM +0200, Gerd Stolpmann wrote:
> > >> > This reminds me of a problem I had with a specific C binding (for
> > >> > mysql),
> > >> > years ago. That binding allocated custom blocks with badly chosen
> > >> > parameters used/max (see the docs for caml_alloc_custom in
> > >> > http://caml.inria.fr/pub/docs/manual-ocaml/manual032.html#toc144).
> If
> > >> > the
> > >> > ratio used/max is > 0, these parameters accelerate the GC. If the
> custom
> > >> > blocks are frequently allocated, this can have a dramatic effect,
> even
> > >> > for
> > >> > quite small used/max ratios. The solution was to change the code,
> and to
> > >> > set used=0 and max=1.
> > >> >
> > >> > This type of problem would match your observation that the GC works
> more
> > >> > and more the longer the program runs, i.e. the more custom blocks
> have
> > >> > been allocated.
> > >> >
> > >> > The problem basically also exists with bigarrays - with
> > >> > used=<size_of_bigarary> and max=256M (hardcoded).
> > >>
> > >> I have also observed this with input-output channels (in_channel and
> > >> out_channel), where used = 1 and max = 1000. A full major GC is
> > >> performed every time a thousand files are opened, which can result on
> > >> a significant overhead when you open lot of files and the heap is
> > >> large.
> > >>
> > >> -- Jerome
> > >
> > >
> >
> >
>
> --
> ------------------------------------------------------------
> Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
> Creator of GODI and camlcity.org.
> Contact details:        http://www.camlcity.org/contact.html
> Company homepage:       http://www.gerd-stolpmann.de
> *** Searching for new projects! Need consulting for system
> *** programming in Ocaml? Gerd Stolpmann can help you.
> ------------------------------------------------------------
>
>

[-- Attachment #2: Type: text/html, Size: 11385 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Caml-list] GC speed for custom blocks, was: Strategies for finding memory leaks
  2012-04-04 11:30           ` Gabriel Scherer
  2012-04-04 12:55             ` Gerd Stolpmann
@ 2012-04-04 13:55             ` Gerd Stolpmann
  1 sibling, 0 replies; 16+ messages in thread
From: Gerd Stolpmann @ 2012-04-04 13:55 UTC (permalink / raw)
  To: Gabriel Scherer; +Cc: Hans Ole Rafaelsen, caml-list

I'm trying to think beyond this particular problem, and it seems to me
that the current "GC speed control" for custom blocks is too simple.
Right now, we can practically only base the speed on either

 - the size of a custom object relative to the expected maximum
   size of all custom objects of the type, or
 - the number of a custom objects relative to the maximum number
   of objects, or
 - we don't control the speed, and hope that the normal GC speed
   is sufficient to collect the custom blocks fast enough.

The problem of the first two strategies is that we normally cannot
predict how many objects we will create in a program, or how much memory
we will spend, and that most C bindings just choose constants for
used/max. For instance, the Lablgtk bindings assume that there are
normally no more than 10 gdk pixbufs (max/used = 1000/100 = 10), i.e.
after every 10 pixbuf allocations one GC cycle is performed.

What about a different scheme: For each custom block we record the
number of bytes allocated outside the heap (which is often known or can
be well guessed), and when calculating the GC speed, the size of this
external memory is just added to the numbers for the heap memory (i.e.
the size of allocations in the last GC slice to caml_allocated_words and
the total to caml_stat_heap_size), so we "virtually" consider the
external memory as an extension of the heap. The effect is that the same
speed control is used as for the heap (i.e. overhead-based control).

An implementation could store the number of external bytes in the custom
block (one more word is allocated than requested) - you need it for
updating the total.

Let's look a bit closer at the effect:

- If the size of the external memory is moderate, the GC speed is 
  linearly increased the more external memory is allocated. The
  good thing is that the speed increase is now relative to the
  size of the heap. If the heap is large anyway, the external memory
  almost does not count, and can never dominate the speed controls
  (which is a main problem with the current scheme). If the heap
  is small, the speed-up can be substantial, but this is not as
  problematic, because a small heap is quickly GC'ed.
- If the size of the external memory is large relative to the heap,
  it will still dominate the speed control. The user could, however,
  still influence the speed by setting the overhead parameter. So,
  still better than the current scheme.

Well, it's just an idea. Any comments?

Gerd

Am Mittwoch, den 04.04.2012, 13:30 +0200 schrieb Gabriel Scherer:
> May your program leak one of those GTK resources?
> 
> The effectiveness of your patch seems to indicate that you have a lot
> of one of these values allocated (and that they were requesting the GC
> much too frequently). The patch solves the CPU usage induced by
> additional GC, but does not change the reason why those GC were
> launched: apparently your code allocates a lot of those resources. If
> there indeed is a leak in your program, it will use more and more
> memory even if you fix the CPU-usage effect.
> 
> An interesting side-effect of your patch is that you could, by
> selectively disabling some of the change you made (eg. by changing
> Val_g_boxed but not Val_g_boxed_new), isolate which of those resources
> were provoking the increased CPU usage, because it was allocated in
> high number.
> 
> (Usual candidates that provoke leak are global data structures that
> store references to your data. A closure will also reference the data
> corresponding to the variables it captures, so storing closures in
> such tables can be an indirect cause for "leaks". Do you have global
> tables of callbacks or values for GTK-land?)
> 
> On Wed, Apr 4, 2012 at 12:53 PM, Hans Ole Rafaelsen
> <hrafaelsen@gmail.com> wrote:
> > Hi,
> >
> > Thanks for your suggestions. I tried to patch lablgtk2 with:
> >
> > --- src/ml_gdkpixbuf.c.orig     2012-04-03 13:56:29.618264702 +0200
> > +++ src/ml_gdkpixbuf.c  2012-04-03 13:56:58.106263510 +0200
> > @@ -119,7 +119,7 @@
> >    value ret;
> >    if (pb == NULL) ml_raise_null_pointer();
> >    ret = alloc_custom (&ml_custom_GdkPixbuf, sizeof pb,
> > -                     100, 1000);
> > +                     0, 1);
> >    p = Data_custom_val (ret);
> >    *p = ref ? g_object_ref (pb) : pb;
> >    return ret;
> >
> > --- src/ml_gobject.c.orig       2012-04-03 15:40:11.002004506 +0200
> > +++ src/ml_gobject.c    2012-04-03 15:41:04.938002250 +0200
> > @@ -219,7 +219,7 @@
> >  CAMLprim value ml_g_value_new(void)
> >  {
> >      value ret = alloc_custom(&ml_custom_GValue,
> > sizeof(value)+sizeof(GValue),
> > -                             20, 1000);
> > +                             0, 1);
> >      /* create an MLPointer */
> >      Field(ret,1) = (value)2;
> >      ((GValue*)&Field(ret,2))->g_type = 0;
> > @@ -272,14 +272,14 @@
> >    custom_serialize_default, custom_deserialize_default };
> >  CAMLprim value Val_gboxed(GType t, gpointer p)
> >  {
> > -    value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 10, 1000);
> > +    value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 0, 1);
> >      Store_pointer(ret, g_boxed_copy (t,p));
> >      Field(ret,2) = (value)t;
> >      return ret;
> >  }
> >  CAMLprim value Val_gboxed_new(GType t, gpointer p)
> >  {
> > -    value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 10, 1000);
> > +    value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 0, 1);
> >      Store_pointer(ret, p);
> >      Field(ret,2) = (value)t;
> >      return ret;
> >
> >
> >
> > At startup is uses
> > top - 16:40:27 up 1 day,  7:01, 28 users,  load average: 0.47, 0.50, 0.35
> > Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
> > Cpu(s):  4.8%us,  1.3%sy,  0.0%ni, 93.6%id,  0.2%wa,  0.0%hi,  0.1%si,
> > 0.0%st
> > Mem:   4004736k total,  3617960k used,   386776k free,   130704k buffers
> > Swap:  4070396k total,     9244k used,  4061152k free,  1730344k cached
> >
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
> > COMMAND
> > 10275 hans      20   0  529m  77m  13m S   14  2.0   0:01.66
> > vc_client.nativ
> >
> > and 12 hours later
> > top - 04:40:07 up 1 day, 19:01, 35 users,  load average: 0.00, 0.01, 0.05
> > Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
> > Cpu(s): 20.2%us,  3.4%sy,  0.0%ni, 76.1%id,  0.1%wa,  0.0%hi,  0.2%si,
> > 0.0%st
> > Mem:   4004736k total,  3828308k used,   176428k free,   143928k buffers
> > Swap:  4070396k total,    10708k used,  4059688k free,  1756524k cached
> >
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
> > COMMAND
> > 10275 hans      20   0  534m  82m  13m S   17  2.1 110:11.19
> > vc_client.nativ
> >
> > Without the patch
> > top - 22:05:38 up 1 day, 12:26, 34 users,  load average: 0.35, 0.16, 0.13
> > Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
> > Cpu(s):  5.6%us,  1.5%sy,  0.0%ni, 92.6%id,  0.2%wa,  0.0%hi,  0.1%si,
> > 0.0%st
> > Mem:   4004736k total,  3868136k used,   136600k free,   140900k buffers
> > Swap:  4070396k total,     9680k used,  4060716k free,  1837500k cached
> >
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
> > COMMAND
> > 25111 hans      20   0  453m  76m  13m S   14  2.0   0:13.68 vc_client_old.n
> >
> > top - 10:05:19 up 2 days, 26 min, 35 users,  load average: 0.01, 0.04, 0.05
> > Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
> > Cpu(s): 20.4%us,  3.2%sy,  0.0%ni, 75.8%id,  0.4%wa,  0.0%hi,  0.2%si,
> > 0.0%st
> > Mem:   4004736k total,  3830596k used,   174140k free,   261692k buffers
> > Swap:  4070396k total,    13640k used,  4056756k free,  1640452k cached
> >
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
> > COMMAND
> > 25111 hans      20   0  453m  76m  13m S   49  2.0 263:05.34
> > vc_client_old.n
> >
> > So from this it seems that with the patch it still uses more and more CPU,
> > but at a much lower rate. However, it seems to increase memory usage with
> > the patch. I also tried to patch the wrappers.h file, but the memory
> > consumption just exploded.
> >
> > So it is working better, but still not good enough. Is there some way to
> > prevent this kind of behavior? That is, no extra memory usage and no extra
> > CPU usage.
> >
> > I have attached some additional profiling if that would be of any interest.
> > In short it seems to be that it is the GC that is consuming the CPU.
> >
> > Best,
> >
> > Hans Ole
> >
> >
> > On Tue, Apr 3, 2012 at 2:13 PM, Jerome Vouillon <vouillon@pps.jussieu.fr>
> > wrote:
> >>
> >> On Tue, Apr 03, 2012 at 12:42:08PM +0200, Gerd Stolpmann wrote:
> >> > This reminds me of a problem I had with a specific C binding (for
> >> > mysql),
> >> > years ago. That binding allocated custom blocks with badly chosen
> >> > parameters used/max (see the docs for caml_alloc_custom in
> >> > http://caml.inria.fr/pub/docs/manual-ocaml/manual032.html#toc144). If
> >> > the
> >> > ratio used/max is > 0, these parameters accelerate the GC. If the custom
> >> > blocks are frequently allocated, this can have a dramatic effect, even
> >> > for
> >> > quite small used/max ratios. The solution was to change the code, and to
> >> > set used=0 and max=1.
> >> >
> >> > This type of problem would match your observation that the GC works more
> >> > and more the longer the program runs, i.e. the more custom blocks have
> >> > been allocated.
> >> >
> >> > The problem basically also exists with bigarrays - with
> >> > used=<size_of_bigarary> and max=256M (hardcoded).
> >>
> >> I have also observed this with input-output channels (in_channel and
> >> out_channel), where used = 1 and max = 1000. A full major GC is
> >> performed every time a thousand files are opened, which can result on
> >> a significant overhead when you open lot of files and the heap is
> >> large.
> >>
> >> -- Jerome
> >
> >
> 
> 

-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
*** Searching for new projects! Need consulting for system
*** programming in Ocaml? Gerd Stolpmann can help you.
------------------------------------------------------------


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2012-04-07 13:27 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-21  9:49 [Caml-list] Strategies for finding memory leaks Hans Ole Rafaelsen
2012-03-22 15:03 ` Goswin von Brederlow
2012-03-23  9:32   ` Hans Ole Rafaelsen
2012-03-24 14:00     ` Goswin von Brederlow
2012-04-01 19:57 ` Richard W.M. Jones
2012-04-02  8:15   ` Hans Ole Rafaelsen
2012-04-02 10:13     ` Richard W.M. Jones
2012-04-02 13:40       ` Hans Ole Rafaelsen
2012-04-02 11:26     ` John Carr
2012-04-03 10:42     ` Gerd Stolpmann
2012-04-03 12:13       ` Jerome Vouillon
2012-04-04 10:53         ` Hans Ole Rafaelsen
2012-04-04 11:30           ` Gabriel Scherer
2012-04-04 12:55             ` Gerd Stolpmann
2012-04-07 13:27               ` Hans Ole Rafaelsen
2012-04-04 13:55             ` [Caml-list] GC speed for custom blocks, was: " Gerd Stolpmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox