* Average cost of the OCaml GC
From: Jianzhou Zhao @ 2010-11-11 3:59 UTC
To: caml-list

Hi,

What is the average cost of the OCaml GC? I have a program that calls
'mark_slice' in 57% of the total execution time, and calls
'sweep_slice' in 21% of the total time, reported by Callgrind, which
is a profiling tool in Valgrind. 57% and 21% are the 'self cost' ---
the cost of the function itself ('Self Cost'), rather than the cost
including all called functions ('Inclusive Cost'). I guess
'mark_slice' and 'sweep_slice' are functions from OCaml GC. Are
these numbers normal?

My program calls both OCaml and C, which passes around C data types in
between. I also suspect that I defined the interface in an 'inefficient'
way that slows down the GC. Are there any rules in mind to make GC
work more efficiently?

Thanks.
--
Jianzhou

* Re: [Caml-list] Average cost of the OCaml GC
From: Goswin von Brederlow @ 2010-11-11 9:08 UTC
To: Jianzhou Zhao; +Cc: caml-list

Jianzhou Zhao <jianzhou@seas.upenn.edu> writes:

> Hi,
>
> What is the average cost of the OCaml GC? I have a program that calls
> 'mark_slice' in 57% of the total execution time, and calls
> 'sweep_slice' in 21% of the total time, reported by Callgrind, which
> is a profiling tool in Valgrind. 57% and 21% are the 'self cost' ---
> the cost of the function itself ('Self Cost'), rather than the cost
> including all called functions ('Inclusive Cost'). I guess
> 'mark_slice' and 'sweep_slice' are functions from OCaml GC. Are
> these numbers normal?

Those numbers sound rather high to me.

> My program calls both OCaml and C, which passes around C data types in
> between. I also suspect that I defined the interface in an 'inefficient'
> way that slows down the GC. Are there any rules in mind to make GC
> work more efficiently?

You can tune some of the GC parameters to suit your use case.

Do you allocate custom types from C? In caml_alloc_custom(ops, size,
used, max), used and max influence how often the GC runs. If you
set them wrong you might trigger the GC too often.

MfG
        Goswin

* Re: [Caml-list] Average cost of the OCaml GC
From: Jianzhou Zhao @ 2010-11-11 13:52 UTC
To: Goswin von Brederlow; +Cc: caml-list

On Thu, Nov 11, 2010 at 4:08 AM, Goswin von Brederlow <goswin-v-b@web.de> wrote:
> Jianzhou Zhao <jianzhou@seas.upenn.edu> writes:
>
>> Hi,
>>
>> What is the average cost of the OCaml GC? I have a program that calls
>> 'mark_slice' in 57% of the total execution time, and calls
>> 'sweep_slice' in 21% of the total time, reported by Callgrind, which
>> is a profiling tool in Valgrind. 57% and 21% are the 'self cost' ---
>> the cost of the function itself ('Self Cost'), rather than the cost
>> including all called functions ('Inclusive Cost'). I guess
>> 'mark_slice' and 'sweep_slice' are functions from OCaml GC. Are
>> these numbers normal?
>
> Those numbers sound rather high to me.
>
>> My program calls both OCaml and C, which passes around C data types in
>> between. I also suspect that I defined the interface in an 'inefficient'
>> way that slows down the GC. Are there any rules in mind to make GC
>> work more efficiently?
>
> You can tune some of the GC parameters to suit your use case.
>
> Do you allocate custom types from C? In caml_alloc_custom(ops, size,
> used, max), used and max influence how often the GC runs.

Yes. The code uses caml_alloc_custom to create a lot of small objects
(less than 8 bytes) frequently. The used and max are set to the
defaults, 0 and 1. The manual says
http://caml.inria.fr/pub/docs/manual-ocaml/manual032.html#toc140

/////////////////////
If your finalized blocks contain no pointers to out-of-heap resources,
or if the previous discussion made little sense to you, just take used
= 0 and max = 1. But if you later find that the finalization functions
are not called “often enough”, consider increasing the used / max
ratio.
//////////////////////

Does this mean the default used and max let GC do finalization 'as
slow as possible'? This does not seem to be the case if the costs 57%
and 21% are too high.

> If you set them wrong you might trigger the GC too often.

In which case could they be set 'wrong'? For example, if 'used' is not
equal to the real amount of allocated data; or is there a range of
'max' given a used?

>
> MfG
>        Goswin
>

--
Jianzhou

* Re: [Caml-list] Average cost of the OCaml GC
From: Michael Ekstrand @ 2010-11-11 14:14 UTC
To: caml-list

On 11/11/2010 07:52 AM, Jianzhou Zhao wrote:
> On Thu, Nov 11, 2010 at 4:08 AM, Goswin von Brederlow <goswin-v-b@web.de> wrote:
>> Jianzhou Zhao <jianzhou@seas.upenn.edu> writes:
>>
>>> Hi,
>>>
>>> What is the average cost of the OCaml GC? I have a program that calls
>>> 'mark_slice' in 57% of the total execution time, and calls
>>> 'sweep_slice' in 21% of the total time, reported by Callgrind, which
>>> is a profiling tool in Valgrind. 57% and 21% are the 'self cost' ---
>>> the cost of the function itself ('Self Cost'), rather than the cost
>>> including all called functions ('Inclusive Cost'). I guess
>>> 'mark_slice' and 'sweep_slice' are functions from OCaml GC. Are
>>> these numbers normal?
>>
>> Those numbers sound rather high to me.

They sound high to me as well, but not unheard of - I sometimes measure
a lot of time in the GC.

>>> My program calls both OCaml and C, which passes around C data types in
>>> between. I also suspect that I defined the interface in an 'inefficient'
>>> way that slows down the GC. Are there any rules in mind to make GC
>>> work more efficiently?
>>
>> You can tune some of the GC parameters to suit your use case.
>>
>> Do you allocate custom types from C? In caml_alloc_custom(ops, size,
>> used, max), used and max influence how often the GC runs.
>
> Yes. The code uses caml_alloc_custom to create a lot of small objects
> (less than 8 bytes) frequently. The used and max are set to the
> defaults, 0 and 1. The manual says
> http://caml.inria.fr/pub/docs/manual-ocaml/manual032.html#toc140
>
> /////////////////////
> If your finalized blocks contain no pointers to out-of-heap resources,
> or if the previous discussion made little sense to you, just take used
> = 0 and max = 1. But if you later find that the finalization functions
> are not called “often enough”, consider increasing the used / max
> ratio.
> //////////////////////
>
> Does this mean the default used and max let GC do finalization 'as
> slow as possible'? This does not seem to be the case if the costs 57%
> and 21% are too high.

Yes, with respect to GC cycles triggered by "too much" custom data
allocation.

There are a variety of things that can cause GC thrashing. One of them
is the GC "space overhead" parameter, which controls how aggressive the
GC is at reclaiming memory. Another is your minor heap size - if your
minor heap is too small, it can cause excess GC activity.

I documented the parameter tuning I have done to reduce GC cost on my
blog[1], but here's a short summary:

 * Increase minor heap size. I usually use 1M or 4M words; my general
   rule of thumb is that I want one "work unit" with its temporary
   storage requirements to fit in a minor heap. This decreases the
   frequency both of minor collections and major slices.
 * Increase space_overhead; I increase this to 100 or 200 (the default
   is 80), as I typically run my large codes on machines with lots of
   spare RAM and can accept a space-speed tradeoff.
 * Increase the heap increment. If your process will require lots of
   RAM, this lets it allocate that memory in bigger chunks, further
   decreasing the memory overhead.

I also use a patched Bigarray that allows me to set the "max" parameter
it uses in its invocations of caml_alloc_custom, but if you are not
using bigarray that shouldn't be impacting your program's performance.
It's quite critical when allocating large bigarrays, though! Having
custom blocks allocated near or above the "max" param is a sure-fire
recipe for GC thrashing. It sounds like you're avoiding that pitfall,
though.

So, the short short story: you're doing many of the right things
(measuring, not letting custom allocations thrash the GC). Some more
parameter tuning will hopefully help you decrease your GC overhead.

1. http://elehack.net/michael/blog/2010/06/ocaml-memory-tuning

- Michael

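[Editor's sketch] The three tuning knobs listed in the message above map onto fields of the standard `Gc.control` record. The following is a minimal illustration, assuming the values are set once at program start; the function name `tune_gc` and the concrete numbers (1M words, 200, 32M words) are illustrative assumptions taken loosely from the ranges mentioned, not measured recommendations.

```ocaml
(* Hypothetical helper: apply the tuning described above at startup.
   All sizes are in words, as the Gc module expects. *)
let tune_gc () =
  let mword = 1024 * 1024 in
  Gc.set
    { (Gc.get ()) with
      Gc.minor_heap_size = 1 * mword;        (* larger minor heap: fewer minor GCs *)
      Gc.space_overhead = 200;               (* trade memory for less major GC work *)
      Gc.major_heap_increment = 32 * mword } (* grow the major heap in bigger steps *)

let () =
  tune_gc ();
  let c = Gc.get () in
  Printf.printf "minor_heap_size=%d words, space_overhead=%d\n"
    c.Gc.minor_heap_size c.Gc.space_overhead
```

On the runtime of that era the same settings can probably be applied without recompiling through the environment, for example `OCAMLRUNPARAM='s=1M,o=200,i=32M'`; newer runtimes may ignore the heap increment, so check the Gc documentation for the version in use.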
* Re: [Caml-list] Average cost of the OCaml GC
From: Goswin von Brederlow @ 2010-11-11 20:11 UTC
To: Jianzhou Zhao; +Cc: Goswin von Brederlow, caml-list

Jianzhou Zhao <jianzhou@seas.upenn.edu> writes:

> On Thu, Nov 11, 2010 at 4:08 AM, Goswin von Brederlow <goswin-v-b@web.de> wrote:
>> Jianzhou Zhao <jianzhou@seas.upenn.edu> writes:
>>
>>> Hi,
>>>
>>> What is the average cost of the OCaml GC? I have a program that calls
>>> 'mark_slice' in 57% of the total execution time, and calls
>>> 'sweep_slice' in 21% of the total time, reported by Callgrind, which
>>> is a profiling tool in Valgrind. 57% and 21% are the 'self cost' ---
>>> the cost of the function itself ('Self Cost'), rather than the cost
>>> including all called functions ('Inclusive Cost'). I guess
>>> 'mark_slice' and 'sweep_slice' are functions from OCaml GC. Are
>>> these numbers normal?
>>
>> Those numbers sound rather high to me.
>>
>>> My program calls both OCaml and C, which passes around C data types in
>>> between. I also suspect that I defined the interface in an 'inefficient'
>>> way that slows down the GC. Are there any rules in mind to make GC
>>> work more efficiently?
>>
>> You can tune some of the GC parameters to suit your use case.
>>
>> Do you allocate custom types from C? In caml_alloc_custom(ops, size,
>> used, max), used and max influence how often the GC runs.
>
> Yes. The code uses caml_alloc_custom to create a lot of small objects
> (less than 8 bytes) frequently. The used and max are set to the
> defaults, 0 and 1. The manual says
> http://caml.inria.fr/pub/docs/manual-ocaml/manual032.html#toc140
>
> /////////////////////
> If your finalized blocks contain no pointers to out-of-heap resources,
> or if the previous discussion made little sense to you, just take used
> = 0 and max = 1. But if you later find that the finalization functions
> are not called “often enough”, consider increasing the used / max
> ratio.
> //////////////////////
>
> Does this mean the default used and max let GC do finalization 'as
> slow as possible'? This does not seem to be the case if the costs 57%
> and 21% are too high.

I think 0/1 gives you the fewest GC runs.

>> If you set them wrong you might trigger the GC too often.
>
> In which case could they be set 'wrong'? For example, if 'used' is not
> equal to the real amount of allocated data; or is there a range of
> 'max' given a used?

A used = 1000000 would be wrong here. Your 0/1 setting looks fine to me.

MfG
        Goswin

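[Editor's sketch] To make the used / max discussion above concrete: the manual describes the effect in terms of full GC cycles, roughly one full cycle every max / used custom-block allocations. The snippet below is a simplified arithmetic model of that statement only. It is not the runtime's code, the names `allocations_per_forced_cycle` and `describe` are hypothetical, and ordinary heap allocation drives the major GC independently of these parameters.

```ocaml
(* Simplified model (an assumption, not the runtime's implementation):
   each custom block allocated with used/max_ advances the major GC by
   that fraction of a full cycle, so a full cycle is forced about every
   ceil (max_ / used) custom allocations. Assumes max_ > 0. *)
let allocations_per_forced_cycle ~used ~max_ =
  if used <= 0 then None                    (* custom blocks add no GC pressure *)
  else Some ((max_ + used - 1) / used)      (* ceiling division *)

let describe ~used ~max_ =
  match allocations_per_forced_cycle ~used ~max_ with
  | None ->
    Printf.sprintf "used=%d max=%d: no extra major GC work" used max_
  | Some n ->
    Printf.sprintf "used=%d max=%d: full cycle about every %d allocation(s)"
      used max_ n

let () =
  List.iter print_endline
    [ describe ~used:0 ~max_:1;            (* the default discussed above *)
      describe ~used:1 ~max_:1000;
      describe ~used:1_000_000 ~max_:1 ]   (* the 'wrong' setting mentioned above *)
```

Under this model the 0/1 default contributes nothing, which fits the remark that it yields the fewest GC runs from custom allocation, while used = 1000000 with max = 1 would force a cycle on every allocation.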
* Re: [Caml-list] Average cost of the OCaml GC
From: Jianzhou Zhao @ 2010-11-12 17:27 UTC
To: Goswin von Brederlow; +Cc: caml-list

On Thu, Nov 11, 2010 at 3:11 PM, Goswin von Brederlow <goswin-v-b@web.de> wrote:
> Jianzhou Zhao <jianzhou@seas.upenn.edu> writes:
>
>> On Thu, Nov 11, 2010 at 4:08 AM, Goswin von Brederlow <goswin-v-b@web.de> wrote:
>>> Jianzhou Zhao <jianzhou@seas.upenn.edu> writes:
>>>
>>>> Hi,
>>>>
>>>> What is the average cost of the OCaml GC? I have a program that calls
>>>> 'mark_slice' in 57% of the total execution time, and calls
>>>> 'sweep_slice' in 21% of the total time, reported by Callgrind, which
>>>> is a profiling tool in Valgrind. 57% and 21% are the 'self cost' ---
>>>> the cost of the function itself ('Self Cost'), rather than the cost
>>>> including all called functions ('Inclusive Cost'). I guess
>>>> 'mark_slice' and 'sweep_slice' are functions from OCaml GC. Are
>>>> these numbers normal?
>>>
>>> Those numbers sound rather high to me.
>>>
>>>> My program calls both OCaml and C, which passes around C data types in
>>>> between. I also suspect that I defined the interface in an 'inefficient'
>>>> way that slows down the GC. Are there any rules in mind to make GC
>>>> work more efficiently?
>>>
>>> You can tune some of the GC parameters to suit your use case.
>>>
>>> Do you allocate custom types from C? In caml_alloc_custom(ops, size,
>>> used, max), used and max influence how often the GC runs.
>>
>> Yes. The code uses caml_alloc_custom to create a lot of small objects
>> (less than 8 bytes) frequently. The used and max are set to the
>> defaults, 0 and 1. The manual says
>> http://caml.inria.fr/pub/docs/manual-ocaml/manual032.html#toc140
>>
>> /////////////////////
>> If your finalized blocks contain no pointers to out-of-heap resources,
>> or if the previous discussion made little sense to you, just take used
>> = 0 and max = 1. But if you later find that the finalization functions
>> are not called “often enough”, consider increasing the used / max
>> ratio.
>> //////////////////////
>>
>> Does this mean the default used and max let GC do finalization 'as
>> slow as possible'? This does not seem to be the case if the costs 57%
>> and 21% are too high.
>
> I think 0/1 gives you the fewest GC runs.
>
>>> If you set them wrong you might trigger the GC too often.
>>
>> In which case could they be set 'wrong'? For example, if 'used' is not
>> equal to the real amount of allocated data; or is there a range of
>> 'max' given a used?
>
> A used = 1000000 would be wrong here. Your 0/1 setting looks fine to me.

Do we still have other methods to debug such problems? Is it possible
to know when and where GC runs, say, the number of times GC works
after a particular user-defined function? If this is possible, I was
wondering if we can see which function in my code behaves wrongly.

>
> MfG
>        Goswin
>

--
Jianzhou

* Re: [Caml-list] Average cost of the OCaml GC
From: ygrek @ 2010-11-12 21:54 UTC
To: caml-list

On Fri, 12 Nov 2010 12:27:40 -0500
Jianzhou Zhao <jianzhou@seas.upenn.edu> wrote:

> Do we still have other methods to debug such problems? Is it possible
> to know when and where GC runs, say, the number of times GC works
> after a particular user-defined function? If this is possible, I was
> wondering if we can see which function in my code behaves wrongly.

Below is straightforward "GC diffing" code which helps me to pinpoint
excessive GC (like the ExtLib.String.nsplit in the example).

$ cat a.ml

open Printf
open Gc

(* Format a byte count with a human-readable unit. *)
let bytes_string_f f = (* oh ugly *)
  let a = abs_float f in
  if a < 1024. then sprintf "%dB" (int_of_float f) else
  if a < 1024. *. 1024. then sprintf "%dKB" (int_of_float (f /. 1024.)) else
  if a < 1024. *. 1024. *. 1024. then sprintf "%.1fMB" (f /. 1024. /. 1024.) else
  sprintf "%.1fGB" (f /. 1024. /. 1024. /. 1024.)

let bytes_string x = bytes_string_f (float_of_int x)

(* Same, but the argument is a number of machine words. *)
let caml_words_f f = bytes_string_f (f *. (float_of_int (Sys.word_size / 8)))
let caml_words x = caml_words_f (float_of_int x)

(* Describe what changed between two Gc.stat snapshots. *)
let gc_diff st1 st2 =
  let allocated st = st.minor_words +. st.major_words -. st.promoted_words in
  let a = allocated st2 -. allocated st1 in
  let minor = st2.minor_collections - st1.minor_collections in
  let major = st2.major_collections - st1.major_collections in
  let compact = st2.compactions - st1.compactions in
  let heap = st2.heap_words - st1.heap_words in
  sprintf "allocated %10s, heap %10s, collection %d %d %d"
    (caml_words_f a) (caml_words heap) compact major minor

(* Run [f x] and print the GC diff to stderr, even if [f] raises. *)
let gc_show name f x =
  let st = Gc.quick_stat () in
  Std.finally (fun () ->
    let st2 = Gc.quick_stat () in
    eprintf "GC DIFF %s : %s\n" name (gc_diff st st2)) f x

let () =
  let _ = gc_show "split" (ExtLib.String.nsplit (String.make 10000 'a')) "a" in
  gc_show "compact" Gc.compact ()

$ ocamlfind ocamlopt -linkpkg -package extlib a.ml -o a
$ ./a
GC DIFF split : allocated     48.1MB, heap     48.0MB, collection 0 21 373
GC DIFF compact : allocated       240B, heap    -48.0MB, collection 1 2 0

--
 ygrek
 http://ygrek.org.ua

* Re: [Caml-list] Average cost of the OCaml GC
From: Goswin von Brederlow @ 2010-11-16 10:02 UTC
To: Jianzhou Zhao; +Cc: caml-list

Jianzhou Zhao <jianzhou@seas.upenn.edu> writes:

> On Thu, Nov 11, 2010 at 3:11 PM, Goswin von Brederlow <goswin-v-b@web.de> wrote:
>> Jianzhou Zhao <jianzhou@seas.upenn.edu> writes:
>>
>>> On Thu, Nov 11, 2010 at 4:08 AM, Goswin von Brederlow <goswin-v-b@web.de> wrote:
>>>> Jianzhou Zhao <jianzhou@seas.upenn.edu> writes:
>>>>
>>>>> Hi,
>>>>>
>>>>> What is the average cost of the OCaml GC? I have a program that calls
>>>>> 'mark_slice' in 57% of the total execution time, and calls
>>>>> 'sweep_slice' in 21% of the total time, reported by Callgrind, which
>>>>> is a profiling tool in Valgrind. 57% and 21% are the 'self cost' ---
>>>>> the cost of the function itself ('Self Cost'), rather than the cost
>>>>> including all called functions ('Inclusive Cost'). I guess
>>>>> 'mark_slice' and 'sweep_slice' are functions from OCaml GC. Are
>>>>> these numbers normal?
>>>>
>>>> Those numbers sound rather high to me.
>>>>
>>>>> My program calls both OCaml and C, which passes around C data types in
>>>>> between. I also suspect that I defined the interface in an 'inefficient'
>>>>> way that slows down the GC. Are there any rules in mind to make GC
>>>>> work more efficiently?
>>>>
>>>> You can tune some of the GC parameters to suit your use case.
>>>>
>>>> Do you allocate custom types from C? In caml_alloc_custom(ops, size,
>>>> used, max), used and max influence how often the GC runs.
>>>
>>> Yes. The code uses caml_alloc_custom to create a lot of small objects
>>> (less than 8 bytes) frequently. The used and max are set to the
>>> defaults, 0 and 1. The manual says
>>> http://caml.inria.fr/pub/docs/manual-ocaml/manual032.html#toc140
>>>
>>> /////////////////////
>>> If your finalized blocks contain no pointers to out-of-heap resources,
>>> or if the previous discussion made little sense to you, just take used
>>> = 0 and max = 1. But if you later find that the finalization functions
>>> are not called “often enough”, consider increasing the used / max
>>> ratio.
>>> //////////////////////
>>>
>>> Does this mean the default used and max let GC do finalization 'as
>>> slow as possible'? This does not seem to be the case if the costs 57%
>>> and 21% are too high.
>>
>> I think 0/1 gives you the fewest GC runs.
>>
>>>> If you set them wrong you might trigger the GC too often.
>>>
>>> In which case could they be set 'wrong'? For example, if 'used' is not
>>> equal to the real amount of allocated data; or is there a range of
>>> 'max' given a used?
>>
>> A used = 1000000 would be wrong here. Your 0/1 setting looks fine to me.
>
> Do we still have other methods to debug such problems? Is it possible
> to know when and where GC runs, say, the number of times GC works
> after a particular user-defined function? If this is possible, I was
> wondering if we can see which function in my code behaves wrongly.

Only through the interface the Gc module exposes. You can make the GC
quite verbose.

MfG
        Goswin

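[Editor's sketch] One way to use the Gc module as suggested in the message above is to switch on the `verbose` flags around a region of interest and count completed major cycles with `Gc.create_alarm`. This is a minimal sketch: the names `major_cycles`, `with_gc_trace`, and `churn` are hypothetical, the flag meanings (0x001 for major cycle messages, 0x002 for minor collections and major slices) are taken from the Gc module documentation and the exact messages vary between runtime versions, and the alarm callback may run slightly later than the cycle it reports.

```ocaml
(* Number of major cycles that have ended since the alarm was installed. *)
let major_cycles = ref 0

(* Run [f x] with GC tracing enabled on stderr, restore the previous
   settings afterwards, and report how many major cycles ended meanwhile. *)
let with_gc_trace name f x =
  let before = !major_cycles in
  let saved = Gc.get () in
  Gc.set { saved with Gc.verbose = 0x001 lor 0x002 };
  let result =
    try f x with e -> Gc.set saved; raise e
  in
  Gc.set saved;
  Printf.eprintf "%s: %d major cycle(s) ended\n" name (!major_cycles - before);
  result

(* Illustrative workload: allocate some strings, then force a major GC. *)
let churn n =
  let a = Array.init n string_of_int in
  Gc.full_major ();
  Array.length a

let () =
  (* The callback is invoked at the end of each major GC cycle. *)
  let _alarm = Gc.create_alarm (fun () -> incr major_cycles) in
  ignore (with_gc_trace "churn" churn 100_000)
```

The same tracing can usually be enabled for a whole run without code changes by setting `OCAMLRUNPARAM=v=0x3` in the environment.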