Mailing list for all users of the OCaml language and system.
 help / color / mirror / Atom feed
From: Hans Ole Rafaelsen <hrafaelsen@gmail.com>
To: Jerome Vouillon <vouillon@pps.jussieu.fr>
Cc: Gerd Stolpmann <info@gerd-stolpmann.de>,
	"Richard W.M. Jones" <rich@annexia.org>,
	caml-list@inria.fr
Subject: Re: [Caml-list] Strategies for finding memory leaks
Date: Wed, 4 Apr 2012 12:53:21 +0200	[thread overview]
Message-ID: <CALs4vDYYAjv0RN0kFx8wPqNzFW4t8LZNfaqE0-B5pzS69xB02g@mail.gmail.com> (raw)
In-Reply-To: <20120403121327.GA29159@pps.jussieu.fr>


[-- Attachment #1.1: Type: text/plain, Size: 5561 bytes --]

Hi,

Thanks for your suggestions. I tried to patch lablgtk2 with:

--- src/ml_gdkpixbuf.c.orig     2012-04-03 13:56:29.618264702 +0200
+++ src/ml_gdkpixbuf.c  2012-04-03 13:56:58.106263510 +0200
@@ -119,7 +119,7 @@
   value ret;
   if (pb == NULL) ml_raise_null_pointer();
   ret = alloc_custom (&ml_custom_GdkPixbuf, sizeof pb,
-                     100, 1000);
+                     0, 1);
   p = Data_custom_val (ret);
   *p = ref ? g_object_ref (pb) : pb;
   return ret;

--- src/ml_gobject.c.orig       2012-04-03 15:40:11.002004506 +0200
+++ src/ml_gobject.c    2012-04-03 15:41:04.938002250 +0200
@@ -219,7 +219,7 @@
 CAMLprim value ml_g_value_new(void)
 {
     value ret = alloc_custom(&ml_custom_GValue,
sizeof(value)+sizeof(GValue),
-                             20, 1000);
+                             0, 1);
     /* create an MLPointer */
     Field(ret,1) = (value)2;
     ((GValue*)&Field(ret,2))->g_type = 0;
@@ -272,14 +272,14 @@
   custom_serialize_default, custom_deserialize_default };
 CAMLprim value Val_gboxed(GType t, gpointer p)
 {
-    value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 10, 1000);
+    value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 0, 1);
     Store_pointer(ret, g_boxed_copy (t,p));
     Field(ret,2) = (value)t;
     return ret;
 }
 CAMLprim value Val_gboxed_new(GType t, gpointer p)
 {
-    value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 10, 1000);
+    value ret = alloc_custom(&ml_custom_gboxed, 2*sizeof(value), 0, 1);
     Store_pointer(ret, p);
     Field(ret,2) = (value)t;
     return ret;



At startup is uses
top - 16:40:27 up 1 day,  7:01, 28 users,  load average: 0.47, 0.50, 0.35
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s):  4.8%us,  1.3%sy,  0.0%ni, 93.6%id,  0.2%wa,  0.0%hi,  0.1%si,
0.0%st
Mem:   4004736k total,  3617960k used,   386776k free,   130704k buffers
Swap:  4070396k total,     9244k used,  4061152k free,  1730344k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
COMMAND
10275 hans      20   0  529m  77m  13m S   14  2.0   0:01.66
vc_client.nativ

and 12 hours later
top - 04:40:07 up 1 day, 19:01, 35 users,  load average: 0.00, 0.01, 0.05
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s): 20.2%us,  3.4%sy,  0.0%ni, 76.1%id,  0.1%wa,  0.0%hi,  0.2%si,
0.0%st
Mem:   4004736k total,  3828308k used,   176428k free,   143928k buffers
Swap:  4070396k total,    10708k used,  4059688k free,  1756524k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
COMMAND
10275 hans      20   0  534m  82m  13m S   17  2.1 110:11.19
vc_client.nativ

Without the patch
top - 22:05:38 up 1 day, 12:26, 34 users,  load average: 0.35, 0.16, 0.13
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s):  5.6%us,  1.5%sy,  0.0%ni, 92.6%id,  0.2%wa,  0.0%hi,  0.1%si,
0.0%st
Mem:   4004736k total,  3868136k used,   136600k free,   140900k buffers
Swap:  4070396k total,     9680k used,  4060716k free,  1837500k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
COMMAND
25111 hans      20   0  453m  76m  13m S   14  2.0   0:13.68
vc_client_old.n

top - 10:05:19 up 2 days, 26 min, 35 users,  load average: 0.01, 0.04, 0.05
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s): 20.4%us,  3.2%sy,  0.0%ni, 75.8%id,  0.4%wa,  0.0%hi,  0.2%si,
0.0%st
Mem:   4004736k total,  3830596k used,   174140k free,   261692k buffers
Swap:  4070396k total,    13640k used,  4056756k free,  1640452k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
COMMAND
25111 hans      20   0  453m  76m  13m S   49  2.0 263:05.34
vc_client_old.n

So from this it seems that with the patch it still uses more and more CPU,
but at a much lower rate. However, it seems to increase memory usage with
the patch. I also tried to patch the wrappers.h file, but the memory
consumption just exploded.

So it is working better, but still not good enough. Is there some way to
prevent this kind of behavior? That is, no extra memory usage and no extra
CPU usage.

I have attached some additional profiling if that would be of any interest.
In short it seems to be that it is the GC that is consuming the CPU.

Best,

Hans Ole

On Tue, Apr 3, 2012 at 2:13 PM, Jerome Vouillon <vouillon@pps.jussieu.fr>wrote:

> On Tue, Apr 03, 2012 at 12:42:08PM +0200, Gerd Stolpmann wrote:
> > This reminds me of a problem I had with a specific C binding (for mysql),
> > years ago. That binding allocated custom blocks with badly chosen
> > parameters used/max (see the docs for caml_alloc_custom in
> > http://caml.inria.fr/pub/docs/manual-ocaml/manual032.html#toc144). If
> the
> > ratio used/max is > 0, these parameters accelerate the GC. If the custom
> > blocks are frequently allocated, this can have a dramatic effect, even
> for
> > quite small used/max ratios. The solution was to change the code, and to
> > set used=0 and max=1.
> >
> > This type of problem would match your observation that the GC works more
> > and more the longer the program runs, i.e. the more custom blocks have
> > been allocated.
> >
> > The problem basically also exists with bigarrays - with
> > used=<size_of_bigarary> and max=256M (hardcoded).
>
> I have also observed this with input-output channels (in_channel and
> out_channel), where used = 1 and max = 1000. A full major GC is
> performed every time a thousand files are opened, which can result on
> a significant overhead when you open lot of files and the heap is
> large.
>
> -- Jerome
>

[-- Attachment #1.2: Type: text/html, Size: 6515 bytes --]

[-- Attachment #2: MemoryProfiling.txt --]
[-- Type: text/plain, Size: 13629 bytes --]

vc_client.native has run for about 19 hours
vc_client_old.native has for about 14 hours

Tasks: 272 total,   3 running, 269 sleeping,   0 stopped,   0 zombie
Cpu(s): 20.2%us,  3.2%sy,  0.0%ni, 76.3%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:   4004736k total,  3633672k used,   371064k free,   262916k buffers
Swap:  4070396k total,    14160k used,  4056236k free,  1417088k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                             
25111 hans      20   0  453m  76m  13m R   50  2.0 319:04.86 vc_client_old.n                                                                                                                      
10275 hans      20   0  553m  99m  13m S   19  2.5 188:30.50 
vc_client.nativ                                                                                                                     



vc_client_old.native
CPU: Intel Sandy Bridge microarchitecture, speed 2.201e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name               symbol name
230093   51.8571  vc_client_old.native     mark_slice
99659    22.4606  vc_client_old.native     sweep_slice
76563    17.2553  vc_client_old.native     caml_page_table_lookup
13215     2.9783  vc_client_old.native     caml_fl_merge_block
1814      0.4088  vc_client_old.native     truncate_flp
1013      0.2283  vc_client_old.native     caml_gc_stat
784       0.1767  vc_client_old.native     caml_iterate_global_roots
775       0.1747  vc_client_old.native     caml_fl_allocate
564       0.1271  vc_client_old.native     camlLividisvc__Nalbuf_tools__replace_pattern_1033
545       0.1228  vc_client_old.native     caml_oldify_local_roots
503       0.1134  vc_client_old.native     caml_modify
453       0.1021  vc_client_old.native     camlLividisvc__Nalbuf_bytestream__search_1047
446       0.1005  vc_client_old.native     caml_delete_global_root
412       0.0929  vc_client_old.native     caml_do_roots
343       0.0773  vc_client_old.native     caml_darken
306       0.0690  vc_client_old.native     lwt_glib_get_sources
297       0.0669  vc_client_old.native     caml_insert_global_root
250       0.0563  vc_client_old.native     camlUq_gtk__fun_1484
213       0.0480  vc_client_old.native     camlHashtbl__find_1093
205       0.0462  vc_client_old.native     compare_val
195       0.0439  vc_client_old.native     camlLwt_glib__enter_1079
191       0.0430  [vdso] (tgid:25111 range:0x7fff12dff000-0x7fff12e00000) [vdso] (tgid:25111 range:0x7fff12dff000-0x7fff12e00000)
190       0.0428  vc_client_old.native     caml_stash_backtrace
188       0.0424  vc_client_old.native     caml_oldify_one
185       0.0417  vc_client_old.native     camlHashtbl__replace_1112
179       0.0403  vc_client_old.native     caml_alloc_string
176       0.0397  vc_client_old.native     camlMedia_ctrl__fun_3055
176       0.0397  vc_client_old.native     caml_c_call
173       0.0390  vc_client_old.native     camlList__iter_1074
171       0.0385  vc_client_old.native     caml_apply2
164       0.0370  vc_client_old.native     caml_obj_dup
159       0.0358  vc_client_old.native     caml_send0


vc_client.native
CPU: Intel Sandy Bridge microarchitecture, speed 2.201e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name               symbol name
50295    45.3259  vc_client.native         mark_slice
17964    16.1892  vc_client.native         sweep_slice
14450    13.0224  vc_client.native         caml_page_table_lookup
2137      1.9259  vc_client.native         caml_fl_merge_block
1397      1.2590  vc_client.native         caml_fl_allocate
1305      1.1761  vc_client.native         caml_gc_stat
599       0.5398  vc_client.native         camlLividisvc__Nalbuf_tools__replace_pattern_1033
546       0.4921  vc_client.native         camlLividisvc__Nalbuf_bytestream__search_1047
544       0.4903  vc_client.native         caml_modify
540       0.4866  vc_client.native         caml_delete_global_root
422       0.3803  vc_client.native         caml_iterate_global_roots
404       0.3641  vc_client.native         truncate_flp
339       0.3055  vc_client.native         camlUq_gtk__fun_1484
327       0.2947  vc_client.native         camlHashtbl__find_1093
311       0.2803  vc_client.native         caml_insert_global_root
290       0.2613  vc_client.native         lwt_glib_get_sources
284       0.2559  vc_client.native         caml_oldify_local_roots
260       0.2343  vc_client.native         camlMedia_ctrl__fun_3055
260       0.2343  vc_client.native         camlUnixqueue_select__fun_2366
248       0.2235  vc_client.native         compare_val
239       0.2154  vc_client.native         caml_stash_backtrace
220       0.1983  [vdso] (tgid:10275 range:0x7fffea534000-0x7fffea535000) [vdso] (tgid:10275 range:0x7fffea534000-0x7fffea535000)
219       0.1974  vc_client.native         camlLwt_glib__enter_1079
206       0.1856  vc_client.native         camlList__iter_1074
202       0.1820  vc_client.native         caml_apply2
196       0.1766  vc_client.native         caml_c_call
194       0.1748  vc_client.native         caml_send0
180       0.1622  vc_client.native         camlHashtbl__replace_1112
180       0.1622  vc_client.native         caml_alloc_string
176       0.1586  vc_client.native         caml_obj_dup
176       0.1586  vc_client.native         caml_oldify_one
173       0.1559  vc_client.native         caml_remove_generational_global_root
171       0.1541  vc_client.native         caml_curry3_2




vc_client.native
CPU: Intel Sandy Bridge microarchitecture, speed 2.201e+06 MHz (estimated)
Counted LLC_MISSES events (Last level cache demand requests from this core that missed the LLC) with a unit 
mask of 0x41 (No unit mask) count 100000
samples  %        image name               symbol name
598      53.3452  vc_client.native         mark_slice
261      23.2828  vc_client.native         caml_page_table_lookup
23        2.0517  vc_client.native         sweep_slice
13        1.1597  vc_client.native         caml_fl_allocate
6         0.5352  vc_client.native         camlUq_gtk__fun_1484
5         0.4460  vc_client.native         caml_delete_global_root
4         0.3568  vc_client.native         camlEqueue__run_1119
4         0.3568  vc_client.native         camlHashtbl__find_1093
4         0.3568  vc_client.native         camlLividisvc__Nalbuf_bytestream__search_1047
4         0.3568  vc_client.native         camlMedia_ctrl__fun_3055
4         0.3568  vc_client.native         camlUnixqueue_select__method_run_1662
4         0.3568  vc_client.native         caml_callback_exn
4         0.3568  vc_client.native         caml_iterate_global_roots
4         0.3568  vc_client.native         caml_modify
3         0.2676  vc_client.native         camlAudio_encoder__transfer_audio_frame_1052
3         0.2676  vc_client.native         camlBitstring_structs__string_of_rtp_ext_1030
3         0.2676  vc_client.native         camlHashtbl__replace_1112
3         0.2676  vc_client.native         camlLwt_glib__leave_1085
3         0.2676  vc_client.native         camlSequencer__serialize_header_2231
3         0.2676  vc_client.native         camlUnixqueue_select__fun_2366
3         0.2676  vc_client.native         camlUq_gtk__code_begin
3         0.2676  vc_client.native         caml_apply2
3         0.2676  vc_client.native         caml_curry3_2
3         0.2676  vc_client.native         caml_gc_stat
3         0.2676  vc_client.native         caml_hash_univ_param
3         0.2676  vc_client.native         caml_send0
3         0.2676  vc_client.native         compare_val
2         0.1784  vc_client.native         camlBuffer__create_1039
2         0.1784  vc_client.native         camlList__iter_1074
2         0.1784  vc_client.native         camlLwt__wait_1290
2         0.1784  vc_client.native         camlLwt_sequence__loop_1098
2         0.1784  vc_client.native         camlMedia_ctrl__fun_3085
2         0.1784  vc_client.native         camlMedia_ctrl__receiver_cbfun_2123
2         0.1784  vc_client.native         camlPervasives__min_1022
2         0.1784  vc_client.native         camlProfiles__fec_percentage_1232
2         0.1784  vc_client.native         camlScanf__scan_1702
2         0.1784  vc_client.native         camlScanf__scan_format_1685
2         0.1784  vc_client.native         camlSequencer__my_push_2260
2         0.1784  vc_client.native         camlUnixqueue_select__fun_1864
2         0.1784  vc_client.native         camlUnixqueue_select__method_protect_1800
2         0.1784  vc_client.native         camlUnixqueue_select__method_uq_handler_1726
2         0.1784  vc_client.native         caml_curry4_1
2         0.1784  vc_client.native         caml_insert_global_root

vc_client_old.native
CPU: Intel Sandy Bridge microarchitecture, speed 2.201e+06 MHz (estimated)
Counted LLC_MISSES events (Last level cache demand requests from this core that missed the LLC) with a unit mask of 0x41 (No unit mask) count 100000
samples  %        symbol name
2411     65.8563  mark_slice
923      25.2117  caml_page_table_lookup
101       2.7588  sweep_slice
10        0.2731  caml_iterate_global_roots
8         0.2185  caml_oldify_local_roots
7         0.1912  caml_fl_allocate
7         0.1912  caml_fl_merge_block
5         0.1366  camlHashtbl__replace_1112
5         0.1366  camlLividisvc__Nalbuf_bytestream__search_1047
5         0.1366  camlUq_gtk__fun_1484
5         0.1366  caml_delete_global_root
4         0.1093  camlLwt_glib__enter_1079
4         0.1093  caml_darken
4         0.1093  caml_send2
3         0.0819  camlBitstring___add_bits_1532
3         0.0819  camlEqueue__run_1119
3         0.0819  camlLwt_sequence__loop_1098
3         0.0819  camlSubstream_seqnum__code_begin
3         0.0819  caml_alloc_string
3         0.0819  compare_val
3         0.0819  lwt_glib_get_sources
2         0.0546  camlBitstring__add_byte_1515
2         0.0546  camlBuffer__create_1039
2         0.0546  camlGObj__method_draw_3325
2         0.0546  camlHashtbl__find_1093
2         0.0546  camlHashtbl__find_1198
2         0.0546  camlHashtbl__iter_1129
2         0.0546  camlList__iter_1074
2         0.0546  camlLividicore__Filter__eval_tree_1110
2         0.0546  camlMcsvc_types__add_header_1175
2         0.0546  camlMedia_ctrl__fun_3085
2         0.0546  camlPervasives__$5e_1112
2         0.0546  camlSubstream_seqnum__my_push_1197
2         0.0546  camlUnixqueue_select__fun_2314
2         0.0546  camlUnixqueue_util__dlogr_1060
2         0.0546  caml_apply2
2         0.0546  caml_apply3
2         0.0546  caml_c_call
2         0.0546  caml_curry3_1
2         0.0546  caml_do_roots
2         0.0546  caml_gc_stat
2         0.0546  caml_hash_univ_param
2         0.0546  caml_modify
2         0.0546  fdlist_to_fdset


at startup it looks like this:
top - 12:16:03 up 2 days,  2:37, 35 users,  load average: 0.02, 0.02, 0.05
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s): 19.7%us,  4.1%sy,  0.0%ni, 75.7%id,  0.2%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:   4004736k total,  3634484k used,   370252k free,   263976k buffers
Swap:  4070396k total,    14160k used,  4056236k free,  1424216k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                              
12906 hans      20   0  453m  76m  13m S   14  2.0   0:14.62 vc_client_old.n    

CPU: Intel Sandy Bridge microarchitecture, speed 2.201e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name               symbol name
12046    25.2045  vc_client_old.native     mark_slice
8590     17.9733  vc_client_old.native     caml_page_table_lookup
6311     13.2049  vc_client_old.native     sweep_slice
718       1.5023  vc_client_old.native     camlLividisvc__Nalbuf_tools__replace_pattern_1033
633       1.3245  vc_client_old.native     caml_fl_allocate
536       1.1215  vc_client_old.native     caml_modify
505       1.0566  vc_client_old.native     caml_do_roots
484       1.0127  vc_client_old.native     camlLividisvc__Nalbuf_bytestream__search_1047
483       1.0106  vc_client_old.native     caml_iterate_global_roots
471       0.9855  vc_client_old.native     caml_darken
412       0.8621  vc_client_old.native     caml_delete_global_root
371       0.7763  vc_client_old.native     caml_oldify_local_roots
323       0.6758  vc_client_old.native     caml_insert_global_root
304       0.6361  vc_client_old.native     caml_fl_merge_block
291       0.6089  vc_client_old.native     lwt_glib_get_sources
266       0.5566  vc_client_old.native     camlHashtbl__find_1093
216       0.4519  [vdso] (tgid:12906 range:0x7fffba5fc000-0x7fffba5fd000) [vdso] (tgid:12906 range:0x7fffba5fc000-0x7fffba5fd000)
216       0.4519  vc_client_old.native     compare_val
215       0.4499  vc_client_old.native     camlLwt_glib__enter_1079
205       0.4289  vc_client_old.native     camlUq_gtk__fun_1484
202       0.4227  vc_client_old.native     caml_alloc_string
200       0.4185  vc_client_old.native     caml_c_call
193       0.4038  vc_client_old.native     caml_oldify_one
192       0.4017  vc_client_old.native     caml_stash_backtrace
187       0.3913  vc_client_old.native     camlMedia_ctrl__fun_3055



  reply	other threads:[~2012-04-04 10:53 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-03-21  9:49 Hans Ole Rafaelsen
2012-03-22 15:03 ` Goswin von Brederlow
2012-03-23  9:32   ` Hans Ole Rafaelsen
2012-03-24 14:00     ` Goswin von Brederlow
2012-04-01 19:57 ` Richard W.M. Jones
2012-04-02  8:15   ` Hans Ole Rafaelsen
2012-04-02 10:13     ` Richard W.M. Jones
2012-04-02 13:40       ` Hans Ole Rafaelsen
2012-04-02 11:26     ` John Carr
2012-04-03 10:42     ` Gerd Stolpmann
2012-04-03 12:13       ` Jerome Vouillon
2012-04-04 10:53         ` Hans Ole Rafaelsen [this message]
2012-04-04 11:30           ` Gabriel Scherer
2012-04-04 12:55             ` Gerd Stolpmann
2012-04-07 13:27               ` Hans Ole Rafaelsen
2012-04-04 13:55             ` [Caml-list] GC speed for custom blocks, was: " Gerd Stolpmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALs4vDYYAjv0RN0kFx8wPqNzFW4t8LZNfaqE0-B5pzS69xB02g@mail.gmail.com \
    --to=hrafaelsen@gmail.com \
    --cc=caml-list@inria.fr \
    --cc=info@gerd-stolpmann.de \
    --cc=rich@annexia.org \
    --cc=vouillon@pps.jussieu.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox