* Re: [Caml-list] Slow GC problem
From: Christophe Raffalli @ 2003-04-03 21:07 UTC
To: Shivkumar Chandrasekaran; +Cc: caml-list
Shivkumar Chandrasekaran wrote:
> I have a gc efficiency problem for which I require some advice. I have
> read both the O'Reilly book and the manual on gc.
>
> I am implementing a fast direct matrix solver for 2D PDEs. So it uses
> the Bigarray module a lot. I have two versions of my algorithm. One is an
> in-core algorithm and the other is the same solver, except that it is
> out-of-core (most of the matrices are stored in disk files).
> Unfortunately the out-of-core solver is *faster* than the in-core
> solver for the identical problem! I was expecting the out-of-core solver
> to be 10 times slower.
If your in-core version is swapping, your out-of-core version is faster
because you optimize the disk access yourself, whereas the swapper knows
nothing about your data and algorithm. Moreover, at GC time all the
matrices/vectors need to be accessed, and this produces more swapping ...
> I am concluding that gc is to blame. Below I give
> the gc stats just before and after the solver routine is called in the
> in-core solver:
>
>                      "Just before"   "Just after"
> minor_words:              46243376      139259767
> promoted_words:             928267        2595523
> major_words:               2883087       39489766
> minor_collections:            1412           4591
> major_collections:              18             52
> heap_words:                2150400        1044480
> heap_chunks:                    35             17
> top_heap_words:            2150400        5038080
> live_words:                1842373         840037
> live_blocks:                253926         116816
> free_words:                 307180         204440
> free_blocks:                 47368             17
> largest_free:                10928          61440
> fragments:                     847              3
> compactions:                     0              2
>
> I tried changing some parameters using Gc.set but it did not make a
> significant difference. Does anybody see any obvious gc problems from
> the above data? Thanks,
>
> --shiv--
>
>
> PS: I wrote the out-of-core solver in just 3 days once the in-core
> solver was done, all in O'Caml. This would have taken much longer
> in Fortran/C. Thanks to the O'Caml team.
>
> -------------------
> To unsubscribe, mail caml-list-request@inria.fr Archives:
> http://caml.inria.fr
> Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ:
> http://caml.inria.fr/FAQ/
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
--
Christophe Raffalli
Université de Savoie
Batiment Le Chablais, bureau 21
73376 Le Bourget-du-Lac Cedex
tél: (33) 4 79 75 81 03
fax: (33) 4 79 75 87 42
mail: Christophe.Raffalli@univ-savoie.fr
www: http://www.lama.univ-savoie.fr/~RAFFALLI
---------------------------------------------
IMPORTANT: this mail is signed using PGP/MIME
At least Enigmail/Mozilla, mutt or evolution
can check this signature
---------------------------------------------
* [Caml-list] Slow GC problem
From: Shivkumar Chandrasekaran @ 2003-04-04 19:40 UTC
To: caml-list
I have a gc efficiency problem for which I require some advice. I have
read both the O'Reilly book and the manual on gc.
I am implementing a fast direct matrix solver for 2D PDEs. So it uses
the Bigarray module a lot. I have two versions of my algorithm. One is
an in-core algorithm and the other is the same solver, except that it
is out-of-core (most of the matrices are stored in disk files).
Unfortunately the out-of-core solver is *faster* than the in-core
solver for the identical problem! I was expecting the out-of-core
solver to be 10 times slower. I am concluding that gc is to blame.
Below I give the gc stats just before and after the solver routine is
called in the in-core solver:
                     "Just before"   "Just after"
minor_words:              46243376      139259767
promoted_words:             928267        2595523
major_words:               2883087       39489766
minor_collections:            1412           4591
major_collections:              18             52
heap_words:                2150400        1044480
heap_chunks:                    35             17
top_heap_words:            2150400        5038080
live_words:                1842373         840037
live_blocks:                253926         116816
free_words:                 307180         204440
free_blocks:                 47368             17
largest_free:                10928          61440
fragments:                     847              3
compactions:                     0              2
I tried changing some parameters using Gc.set but it did not make a
significant difference. Does anybody see any obvious gc problems from
the above data? Thanks,
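
A minimal sketch of how such before/after figures can be gathered around a single call, using only the standard Gc module. The helper name with_gc_delta and its label argument are hypothetical, not taken from the solver; it prints the change in a few counters rather than the full table.

```ocaml
(* Sketch: report how selected GC counters change across one call.
   [with_gc_delta] and [label] are illustrative names only. *)
let with_gc_delta label f =
  let before = Gc.quick_stat () in
  let result = f () in
  let after = Gc.quick_stat () in
  Printf.printf
    "%s: minor_words +%.0f, promoted_words +%.0f, major_words +%.0f\n"
    label
    (after.Gc.minor_words -. before.Gc.minor_words)
    (after.Gc.promoted_words -. before.Gc.promoted_words)
    (after.Gc.major_words -. before.Gc.major_words);
  Printf.printf
    "%s: minor_collections +%d, major_collections +%d, compactions +%d\n"
    label
    (after.Gc.minor_collections - before.Gc.minor_collections)
    (after.Gc.major_collections - before.Gc.major_collections)
    (after.Gc.compactions - before.Gc.compactions);
  result
```

Wrapping the solver call as with_gc_delta "solver" (fun () -> ...) would give the deltas directly, which is easier to read than two absolute snapshots.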
--shiv--
PS: I wrote the out-of-core solver in just 3 days once the in-core
solver was done, all in O'Caml. This would have taken much longer
in Fortran/C. Thanks to the O'Caml team.
* Re: [Caml-list] Slow GC problem
From: Shivkumar Chandrasekaran @ 2003-04-07 17:53 UTC
To: caml-list
Pursuing my earlier enquiry: I noticed that module Bigarray allocated
its arrays using "malloc" rather than on the ocaml heap. My problem
allocates a lot (say 100,000) bigarrays of rather small size (say 30 x
30). Can this potentially increase the cost of gc? (I have this mental
picture of the ocaml heap fragmented by these immovable malloc'd
bigarrays.) Any help will be appreciated. Thanks,
--shiv--
* Re: [Caml-list] Slow GC problem
From: Chris Hecker @ 2003-04-07 19:08 UTC
To: Shivkumar Chandrasekaran, caml-list
>I am concluding that gc is to blame.
Did you profile to check this assumption? What's the profile look like?
>Pursuing my earlier enquiry: I noticed that module Bigarray allocated its
>arrays using "malloc" rather than on the ocaml heap. My problem allocates
>a lot (say 100,000) bigarrays of rather small size (say 30 x 30). Can this
>potentially increase the cost of gc? (I have this mental picture of the
>ocaml heap fragmented by these immovable malloc'd bigarrays.)
The malloc heap should be mostly separate from the caml heap (assuming the
malloc heap is allocated in big chunks). If you're really worried about
this, write your own bigarray allocator (it's easy) and allocate them
yourself out of an already allocated big block of memory.
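
A minimal sketch of that idea, written on the OCaml side rather than in the C stubs: one large Bigarray is allocated once and n-by-n views are carved out of it. The names slab, create_slab and alloc_matrix are hypothetical, and the sketch assumes float64 elements and Fortran layout. It never frees individual matrices, and each view still gets a small descriptor on the OCaml heap, so it only removes the per-matrix malloc of the data.

```ocaml
open Bigarray

(* Hypothetical slab: one big float64 buffer carved into n-by-n views. *)
type slab = {
  data : (float, float64_elt, fortran_layout) Array1.t;
  n : int;             (* side length of each square matrix *)
  mutable next : int;  (* 1-based index of the next unused element *)
}

let create_slab ~n ~count =
  { data = Array1.create float64 fortran_layout (n * n * count); n; next = 1 }

(* Return an n-by-n matrix that shares storage with the slab. *)
let alloc_matrix s =
  let len = s.n * s.n in
  if s.next + len - 1 > Array1.dim s.data then failwith "slab exhausted";
  let view = Array1.sub s.data s.next len in
  s.next <- s.next + len;
  reshape_2 (genarray_of_array1 view) s.n s.n
```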
Are you sure you're not paging? 100k 30x30 arrays is 360MB (or 720MB if
you're using doubles). Also, if you're constantly allocating and freeing
bigarrays you might be more dependent on malloc speed than caml GC speed.
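
The memory estimate can be checked with a few lines of arithmetic. This is only a worked example; the function name megabytes is illustrative, and it counts decimal megabytes (10^6 bytes) for element data alone, ignoring any per-array overhead.

```ocaml
(* Worked example: total element storage for many small matrices. *)
let megabytes ~count ~rows ~cols ~bytes_per_elt =
  count * rows * cols * bytes_per_elt / 1_000_000

let () =
  (* 100,000 matrices of 30 x 30 single-precision floats (4 bytes each) *)
  Printf.printf "float32: %d MB\n"
    (megabytes ~count:100_000 ~rows:30 ~cols:30 ~bytes_per_elt:4);
  (* the same matrices in double precision (8 bytes each) *)
  Printf.printf "float64: %d MB\n"
    (megabytes ~count:100_000 ~rows:30 ~cols:30 ~bytes_per_elt:8)
```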
Best to post the profile. Use the -p option of ocamlopt, or the -ccopt and
-cclib options to use another profiler. You can see examples of gc-bound
profiles in the list archive.
Chris
* Re: [Caml-list] Slow GC problem
From: David Monniaux @ 2003-04-08 7:15 UTC
To: Chris Hecker; +Cc: Shivkumar Chandrasekaran, caml-list
On Mon, 7 Apr 2003, Chris Hecker wrote:
> >I am concluding that gc is to blame.
> Did you profile to check this assumption? What's the profile look like?
A little bit of experience here: I have done such profiles with the
following tools:
* gprof (compile with ocamlopt -p and gcc -pg);
* oprofile (hardware-based, supports only certain platforms including
Linux/x86; can profile not only clock ticks, but also events such as
cache faults).
My (somewhat unsurprising) experience was that:
* GC generates 15-20% of cache faults
* GC takes about 15% of time.
A conjecture on my part is that using a lot of unboxed floats is partly
responsible.
My conclusion: you should definitely profile your code before starting
to incriminate the GC or any other module.
David Monniaux http://www.di.ens.fr/~monniaux
Laboratoire d'informatique de l'École Normale Supérieure,
Paris, France
* Re: [Caml-list] Slow GC problem
From: Damien Doligez @ 2003-04-08 10:23 UTC
To: Shivkumar Chandrasekaran; +Cc: caml-list
> I have a gc efficiency problem for which I require some advice. I have
> read both the O'Reilly book and the manual on gc.
[...]
> Below I give the gc stats just before and after the solver routine is
> called in the in-core solver:
>
>                      "Just before"   "Just after"
> minor_words:              46243376      139259767
> promoted_words:             928267        2595523
> major_words:               2883087       39489766
> minor_collections:            1412           4591
> major_collections:              18             52
> heap_words:                2150400        1044480
> heap_chunks:                    35             17
> top_heap_words:            2150400        5038080
> live_words:                1842373         840037
> live_blocks:                253926         116816
> free_words:                 307180         204440
> free_blocks:                 47368             17
> largest_free:                10928          61440
> fragments:                     847              3
> compactions:                     0              2
As others have said, this is not really enough information to tell
what is going on. What we can say from the above is:
1. You are allocating lots and lots of data structures in the major
heap (maybe finalized bigarray descriptors)
2. The compactor was called twice, which may indicate that you have
a fragmentation problem.
3. The compactor was called near the end of the solver routine,
which must have erased most of the evidence...
-- Damien
* Re: [Caml-list] Slow GC problem
From: Damien Doligez @ 2003-04-08 10:28 UTC
To: Shivkumar Chandrasekaran; +Cc: caml-list
On Monday, April 7, 2003, at 07:53 PM, Shivkumar Chandrasekaran wrote:
> Pursuing my earlier enquiry: I noticed that module Bigarray allocated
> its arrays using "malloc" rather than on the ocaml heap. My problem
> allocates a lot (say 100,000) bigarrays of rather small size (say 30 x
> 30). Can this potentially increase the cost of gc? (I have this mental
> picture of the ocaml heap fragmented by these immovable malloc'd
> bigarrays.) Any help will be appreciated. Thanks,
It doesn't matter. The ocaml heap is allocated in big chunks from malloc,
and other malloc blocks cannot interfere with its fragmentation.
If your program is allocating lots of temporary bigarrays and leaves them
to be collected by the GC, then I would say it's using bigarrays in an
inefficient way. Bigarrays are mostly designed to be long-lived objects.
If you can find a way to re-use your bigarrays instead of allocating new
ones, you should get better performance.
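
One way to re-use bigarrays without restructuring a whole algorithm is a small pool of scratch matrices. The sketch below is only an illustration under assumptions: all temporaries share one fixed shape, the names pool, acquire, release and with_scratch are hypothetical, and Fun.protect requires a reasonably recent OCaml (4.08 or later).

```ocaml
open Bigarray

type mat = (float, float64_elt, fortran_layout) Array2.t

(* Hypothetical pool of fixed-size scratch matrices. *)
type pool = {
  rows : int;
  cols : int;
  mutable free : mat list;   (* matrices currently available for re-use *)
  mutable created : int;     (* how many matrices were really allocated *)
}

let create_pool rows cols = { rows; cols; free = []; created = 0 }

(* Hand out a recycled matrix if one is available, else allocate. *)
let acquire p =
  match p.free with
  | m :: rest -> p.free <- rest; m
  | [] ->
    p.created <- p.created + 1;
    Array2.create float64 fortran_layout p.rows p.cols

let release p m = p.free <- m :: p.free

(* Run [f] on a scratch matrix and return it to the pool afterwards.
   The contents are NOT cleared between uses. *)
let with_scratch p f =
  let m = acquire p in
  Fun.protect ~finally:(fun () -> release p m) (fun () -> f m)
```

With this pattern, a loop that previously created a fresh 30 x 30 temporary on every iteration allocates it once and recycles it, provided the temporary does not escape the callback.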
-- Damien
* Re: [Caml-list] Slow GC problem
From: Shivkumar Chandrasekaran @ 2003-04-08 23:03 UTC
To: caml-list
This seems to be confirmed by some modifications I did to the code. I
was able to reduce a whole lot of temporary bigarray allocations and
got a 30% speed improvement. However any further reduction of bigarray
creation will require a major "add-on" modification to my entire
library design, or, memory re-use hacks all over the place. So I would
like to be reasonably sure that this is worth-while before I proceed.
--shiv--
* Re: [Caml-list] Slow GC problem
From: Shivkumar Chandrasekaran @ 2003-04-10 21:21 UTC
To: caml-list
I took Damien's advice (thanks) and spent some time trying to re-use
all the bigarrays I was allocating. However, my bigarray use is spread
out over a fairly complicated algorithm, and the only way to make a
good dent in bigarray use would be to completely rewrite the algorithm
in a more traditional Fortran 77 style, which would of course mean that
I would have to first figure out the entire memory usage pattern.
...... So I am pondering another solution:
What if I modified bigarray_stubs.c to use the malloc and free calls of
the Boehm gc (6.1-4) garbage collector? My reasoning is that malloc is
performing poorly due to fragmentation, and switching to a gc'd version
might help out.
Before I try this I would like some feedback from the list on the
soundness of this idea. Thanks,
--shiv--
* Re: [Caml-list] Slow GC problem
From: Brian Hurt @ 2003-04-10 21:51 UTC
To: Shivkumar Chandrasekaran; +Cc: caml-list
In playing around with the C interface code myself, I noticed that
alloc_custom (the function that IMHO bigarray should be using to allocate
its blocks) takes two arguments, used and max, which control how often the
blocks get garbage collected. The GC does a full collection roughly every
max/used blocks allocated.
Perhaps the bigarray library should be extended to allow the user to set
these variables. I'm wondering if different values here might not solve
Shivkumar's problem in a more elegant (read: nice) way?
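
The bigarray library discussed here has no such setting. As an assumption about later compilers only: OCaml 4.08 and newer expose a global custom_major_ratio field in Gc.control, which governs how much out-of-heap memory held by custom blocks (bigarrays included) may accumulate before the major GC is sped up. It is a program-wide knob, not the per-block used/max pair, and the helper name below is hypothetical.

```ocaml
(* Sketch, assuming OCaml >= 4.08: adjust the global ratio that controls
   how eagerly the major GC reacts to memory held by custom blocks.
   A larger percentage means fewer GC cycles but more floating garbage. *)
let set_custom_major_ratio pct =
  Gc.set { (Gc.get ()) with Gc.custom_major_ratio = pct }
```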
Brian
* Re: [Caml-list] Slow GC problem
From: Chris Hecker @ 2003-04-11 7:10 UTC
To: Shivkumar Chandrasekaran, caml-list
>What if I modified bigarray_stubs.c to use the malloc and free calls of
>the Boehm gc (6.1-4) garbage collector? My reasoning is that malloc is
>performing poorly due to fragmentation, and switching to a gc'd version
>might help out.
>Before I try this I would like some feedback from the list on the
>soundness of this idea.
I don't mean to be a nag, but did you profile your application yet? A very
wise programmer once said, "Assume Nothing".
Chris
* Re: [Caml-list] Slow GC problem
From: Christophe Raffalli @ 2003-04-11 7:58 UTC
To: Chris Hecker; +Cc: Shivkumar Chandrasekaran, caml-list
Chris Hecker wrote:
>
>> What if I modified bigarray_stubs.c to use the malloc and free calls
>> of the Boehm gc (6.1-4) garbage collector? My reasoning is that malloc
>> is performing poorly due to fragmentation, and switching to a gc'd
>> version might help out.
>> Before I try this I would like some feedback from the list on the
>> soundness of this idea.
>
>
> I don't mean to be a nag, but did you profile your application yet? A
> very wise programmer once said, "Assume Nothing".
>
> Chris
>
Always remember that if you do anything that prevents the compiler from
using float optimizations, you will sometimes pay a factor of 10 in speed,
and there may be other reasons (maybe your second algorithm really is more
efficient because you were cleverer than you expected?).
GC problems are in fact not so common. So "Assume Nothing", because I
suspect you are wrong ...
--
Christophe Raffalli
Université de Savoie
Batiment Le Chablais, bureau 21
73376 Le Bourget-du-Lac Cedex
tél: (33) 4 79 75 81 03
fax: (33) 4 79 75 87 42
mail: Christophe.Raffalli@univ-savoie.fr
www: http://www.lama.univ-savoie.fr/~RAFFALLI
---------------------------------------------
IMPORTANT: this mail is signed using PGP/MIME
At least Enigmail/Mozilla, mutt or evolution
can check this signature
---------------------------------------------
* Re: [Caml-list] Slow GC problem
From: Shivkumar Chandrasekaran @ 2003-04-11 16:35 UTC
To: caml-list
Hmmm. I thought I had posted the profile and the reason why it was not
useful. But apparently that email went to /dev/null. I will post it
again as soon as I have a chance.
The in-core and out-of-core solver are "identical" and all
floating-point operations are done by BLAS/LAPACK. So I don't think
ocaml compiler optimizations are an issue.
--shiv--
* Re: [Caml-list] Slow GC problem
From: Chris Hecker @ 2003-10-23 8:17 UTC
To: Shivkumar Chandrasekaran, caml-list
>What if I modified bigarray_stubs.c to use the malloc and free calls of
>the Boehm gc (6.1-4) garbage collector? My reasoning is that malloc is
>performing poorly due to fragmentation, and switching to a gc'd version
>might help out.
>Before I try this I would like some feedback from the list on the
>soundness of this idea.
I don't mean to be a nag, but did you profile your application yet? A very
wise programmer once said, "Assume Nothing".
Chris
* Re: [Caml-list] Slow GC problem
From: Shivkumar Chandrasekaran @ 2003-04-14 16:37 UTC
To: caml-list
I have attached the entire profile (such as is available on Mac OS X)
to the bottom of this message. One warning though: I am only interested
in profiling my solver (function
SfTwoDsolver.DsfTwoDsolver.superfastLUofBTDSSS'). However, the time
required to set up my problem is significant, so the profiler information
may not be that helpful. Direct timing shows that on this run my solver
required about 43 seconds to run.
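
A minimal sketch of timing one call in isolation from the set-up phase, using Sys.time from the standard library. The helper name cpu_time_of is hypothetical and this is not necessarily how the 43-second figure was obtained.

```ocaml
(* Sketch: processor time consumed by a single call.
   [cpu_time_of] is an illustrative helper, not code from the solver. *)
let cpu_time_of f =
  let t0 = Sys.time () in
  let result = f () in
  let elapsed = Sys.time () -. t0 in
  (result, elapsed)
```

Sys.time reports processor time, so time spent waiting on paging or disk I/O does not show up in it. Comparing it against a wall-clock measurement of the same call is one way to tell computation from waiting.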
--shiv--
[-- Attachment #2: a_profile --]
[-- Type: application/octet-stream, Size: 50515 bytes --]
call graph profile:
The sum of self and descendents is the major sort
for this listing.
function entries:
index the index of the function in the call graph
listing, as an aid to locating it (see below).
%time the percentage of the total time of the program
accounted for by this function and its
descendents.
self the number of seconds spent in this function
itself.
descendents
the number of seconds spent in the descendents of
this function on behalf of this function.
called the number of times this function is called (other
than recursive calls).
self the number of times this function calls itself
recursively.
name the name of the function, with an indication of
its membership in a cycle, if any.
index the index of the function in the call graph
listing, as an aid to locating it.
parent listings:
self* the number of seconds of this function's self time
which is due to calls from this parent.
descendents*
the number of seconds of this function's
descendent time which is due to calls from this
parent.
called** the number of times this function is called by
this parent. This is the numerator of the
fraction which divides up the function's time to
its parents.
total* the number of times this function was called by
all of its parents. This is the denominator of
the propagation fraction.
parents the name of this parent, with an indication of the
parent's membership in a cycle, if any.
index the index of this parent in the call graph
listing, as an aid in locating it.
children listings:
self* the number of seconds of this child's self time
which is due to being called by this function.
descendent*
the number of seconds of this child's descendent's
time which is due to being called by this
function.
called** the number of times this child is called by this
function. This is the numerator of the
propagation fraction for this child.
total* the number of times this child is called by all
functions. This is the denominator of the
propagation fraction.
children the name of this child, and an indication of its
membership in a cycle, if any.
index the index of this child in the call graph listing,
as an aid to locating it.
* these fields are omitted for parents (or
children) in the same cycle as the function. If
the function (or child) is a member of a cycle,
the propagated times and propagation denominator
represent the self time and descendent time of the
cycle as a whole.
** static-only parents and children are indicated
by a call count of 0.
cycle listings:
the cycle as a whole is listed with the same
fields as a function entry. Below it are listed
the members of the cycle, and their contributions
to the time and call counts of the cycle.
granularity: each sample hit covers 4 byte(s) for 0.09% of 11.47 seconds
called/total parents
index %time self descendents called+self name index
called/total children
<spontaneous>
[1] 13.9 1.60 0.00 _local_dger_ [1]
-----------------------------------------------
<spontaneous>
[2] 7.8 0.90 0.00 _region_for_ptr_no_lock [2]
-----------------------------------------------
<spontaneous>
[3] 4.6 0.53 0.00 _bigarray_set_aux [3]
-----------------------------------------------
<spontaneous>
[4] 3.7 0.42 0.00 _bigarray_offset [4]
-----------------------------------------------
<spontaneous>
[5] 3.7 0.42 0.00 _gemvT4x16 [5]
-----------------------------------------------
<spontaneous>
[6] 3.6 0.41 0.00 _mark_slice [6]
-----------------------------------------------
<spontaneous>
[7] 3.1 0.36 0.00 _DLACPY [7]
-----------------------------------------------
<spontaneous>
[8] 3.1 0.28 0.07 _caml_c_call [8]
0.02 0.00 328470/328470 _camlidl_lapack_cblas_dgemm [83]
0.00 0.02 68298/68298 _camlidl_lapack_d_transp [84]
0.01 0.00 273772/273772 _camlidl_lapack_dlacpy_ [101]
0.01 0.00 48704/48704 _camlidl_lapack_dgeqlf_ [102]
0.01 0.00 19963/19963 _camlidl_lapack_dtrtrs_ [103]
0.00 0.00 204939/204939 _camlidl_lapack_cblas_dscal [2632]
0.00 0.00 87830/87830 _camlidl_lapack_dormql_ [2633]
0.00 0.00 56730/56730 _camlidl_lapack_dormlq_ [2634]
0.00 0.00 19152/19152 _camlidl_lapack_dgesvd_ [2635]
0.00 0.00 9600/9600 _camlidl_lapack_dgelqf_ [2636]
-----------------------------------------------
<spontaneous>
[9] 2.6 0.30 0.00 _sweep_slice [9]
-----------------------------------------------
<spontaneous>
[10] 2.3 0.26 0.00 _adjust_gc_speed [10]
-----------------------------------------------
<spontaneous>
[11] 2.2 0.25 0.00 _szone_malloc [11]
-----------------------------------------------
<spontaneous>
[12] 1.8 0.21 0.00 _DLASR [12]
-----------------------------------------------
<spontaneous>
[13] 1.8 0.21 0.00 _Nla__fromInt_1391 [13]
-----------------------------------------------
<spontaneous>
[14] 1.7 0.20 0.00 _fl_allocate [14]
-----------------------------------------------
<spontaneous>
[15] 1.7 0.19 0.00 _bigarray_set_2 [15]
-----------------------------------------------
<spontaneous>
[16] 1.6 0.18 0.00 _ATL_dJIK0x0x0NN0x0x0_aX_bX [16]
-----------------------------------------------
<spontaneous>
[17] 1.6 0.18 0.00 _bigarray_get_N [17]
-----------------------------------------------
<spontaneous>
[18] 1.4 0.16 0.00 _copy_double [18]
-----------------------------------------------
<spontaneous>
[19] 1.1 0.13 0.00 _ATL_ddot_xp1yp1aXbX [19]
-----------------------------------------------
<spontaneous>
[20] 1.1 0.13 0.00 _DBDSQR [20]
-----------------------------------------------
<spontaneous>
[21] 1.0 0.12 0.00 _ATL_dJIK0x0x0NN5x1x16_aX_bX [21]
-----------------------------------------------
<spontaneous>
[22] 1.0 0.12 0.00 _ATL_dgemv [22]
-----------------------------------------------
<spontaneous>
[23] 1.0 0.12 0.00 _bigarray_finalize [23]
-----------------------------------------------
<spontaneous>
[24] 1.0 0.11 0.00 _ATL_dgezero [24]
-----------------------------------------------
<spontaneous>
[25] 0.9 0.10 0.00 _ATL_dger1_a1_x1_yX [25]
-----------------------------------------------
<spontaneous>
[26] 0.9 0.10 0.00 _ATL_dtrsmKLLNN [26]
-----------------------------------------------
<spontaneous>
[27] 0.9 0.10 0.00 _Nla__iter2_486 [27]
-----------------------------------------------
<spontaneous>
[28] 0.9 0.10 0.00 _sqrt [28]
-----------------------------------------------
<spontaneous>
[29] 0.8 0.09 0.00 _ATL_dJIK0x0x0NT0x0x0_aX_bX [29]
-----------------------------------------------
<spontaneous>
[30] 0.8 0.09 0.00 _bigarray_sub [30]
-----------------------------------------------
<spontaneous>
[31] 0.8 0.09 0.00 _szone_size [31]
-----------------------------------------------
<spontaneous>
[32] 0.6 0.07 0.00 _alloc_bigarray [32]
-----------------------------------------------
<spontaneous>
[33] 0.6 0.07 0.00 _bigarray_dim [33]
-----------------------------------------------
<spontaneous>
[34] 0.6 0.07 0.00 _bigarray_update_proxy [34]
-----------------------------------------------
<spontaneous>
[35] 0.6 0.07 0.00 _cblas_dgemv [35]
-----------------------------------------------
<spontaneous>
[36] 0.6 0.07 0.00 _dlamch_ [36]
-----------------------------------------------
<spontaneous>
[37] 0.6 0.07 0.00 _dlartg_ [37]
-----------------------------------------------
<spontaneous>
[38] 0.6 0.07 0.00 _free_list_remove_ptr [38]
-----------------------------------------------
<spontaneous>
[39] 0.5 0.06 0.00 _ATL_dNCmmJIK [39]
-----------------------------------------------
<spontaneous>
[40] 0.5 0.06 0.00 _ILAENV [40]
-----------------------------------------------
<spontaneous>
[41] 0.5 0.06 0.00 _Std_exit__code_end [41]
-----------------------------------------------
<spontaneous>
[42] 0.5 0.06 0.00 _bigarray_reshape [42]
-----------------------------------------------
<spontaneous>
[43] 0.5 0.06 0.00 _dlarfg_ [43]
-----------------------------------------------
<spontaneous>
[44] 0.4 0.05 0.00 _ATL_dJIK0x0x0NN1x4x16_aX_bX [44]
-----------------------------------------------
<spontaneous>
[45] 0.4 0.05 0.00 _ATL_dJIK0x0x0TN0x0x0_aX_bX [45]
-----------------------------------------------
<spontaneous>
[46] 0.4 0.05 0.00 _ATL_dscal_xp1yp0aXbX [46]
-----------------------------------------------
<spontaneous>
[47] 0.4 0.05 0.00 _Nla__iDU_1800 [47]
-----------------------------------------------
<spontaneous>
[48] 0.4 0.05 0.00 _Nla__normMax_1913 [48]
-----------------------------------------------
<spontaneous>
[49] 0.4 0.05 0.00 _SSQr [49]
-----------------------------------------------
<spontaneous>
[50] 0.4 0.05 0.00 _bigarray_get_2 [50]
-----------------------------------------------
<spontaneous>
[51] 0.4 0.05 0.00 _fl_merge_block [51]
-----------------------------------------------
<spontaneous>
[52] 0.4 0.05 0.00 _frexp [52]
-----------------------------------------------
<spontaneous>
[53] 0.4 0.05 0.00 _lsame_ [53]
-----------------------------------------------
<spontaneous>
[54] 0.3 0.04 0.00 _DLARF [54]
-----------------------------------------------
<spontaneous>
[55] 0.3 0.04 0.00 _alloc_custom [55]
-----------------------------------------------
<spontaneous>
[56] 0.3 0.04 0.00 _alloc_shr [56]
-----------------------------------------------
<spontaneous>
[57] 0.3 0.04 0.00 _bigarray_fill [57]
-----------------------------------------------
<spontaneous>
[58] 0.3 0.04 0.00 _caml_apply2 [58]
-----------------------------------------------
<spontaneous>
[59] 0.3 0.04 0.00 _f2c_dgemv [59]
-----------------------------------------------
<spontaneous>
[60] 0.3 0.04 0.00 _malloc_zone_free [60]
-----------------------------------------------
<spontaneous>
[61] 0.3 0.03 0.00 _ATL_dJIK0x0x0NT5x1x12_aX_bX [61]
-----------------------------------------------
<spontaneous>
[62] 0.3 0.03 0.00 _ATL_dcopy_xp0yp0aXbX [62]
-----------------------------------------------
<spontaneous>
[63] 0.3 0.03 0.00 _ATL_dcpsc_xp0yp0aXbX [63]
-----------------------------------------------
<spontaneous>
[64] 0.3 0.03 0.00 _ATL_ddot_xp0yp0aXbX [64]
-----------------------------------------------
<spontaneous>
[65] 0.3 0.03 0.00 _ATL_dger [65]
-----------------------------------------------
<spontaneous>
[66] 0.3 0.03 0.00 _ATL_dptgemm [66]
-----------------------------------------------
<spontaneous>
[67] 0.3 0.03 0.00 _Ltv__sfsolve_1040 [67]
-----------------------------------------------
<spontaneous>
[68] 0.3 0.03 0.00 _Ltv__superfastMul_456 [68]
-----------------------------------------------
<spontaneous>
[69] 0.3 0.03 0.00 _Nla__extractRange_1487 [69]
-----------------------------------------------
<spontaneous>
[70] 0.3 0.03 0.00 _allocate_block [70]
-----------------------------------------------
<spontaneous>
[71] 0.3 0.03 0.00 _check_urgent_gc [71]
-----------------------------------------------
<spontaneous>
[72] 0.3 0.03 0.00 _compare_val [72]
-----------------------------------------------
<spontaneous>
[73] 0.3 0.03 0.00 _dorm2l_ [73]
-----------------------------------------------
<spontaneous>
[74] 0.3 0.03 0.00 _free_list_add_ptr [74]
-----------------------------------------------
<spontaneous>
[75] 0.3 0.03 0.00 _gemv8x4 [75]
-----------------------------------------------
<spontaneous>
[76] 0.3 0.03 0.00 _gemvT_Nsmall [76]
-----------------------------------------------
<spontaneous>
[77] 0.3 0.03 0.00 _ger_Nle4 [77]
-----------------------------------------------
<spontaneous>
[78] 0.3 0.03 0.00 _malloc [78]
-----------------------------------------------
<spontaneous>
[79] 0.3 0.03 0.00 _malloc_zone_malloc [79]
-----------------------------------------------
<spontaneous>
[80] 0.3 0.03 0.00 _oldify_one [80]
-----------------------------------------------
<spontaneous>
[81] 0.2 0.02 0.00 restFP [81]
-----------------------------------------------
<spontaneous>
[82] 0.2 0.02 0.00 saveFP [82]
-----------------------------------------------
0.02 0.00 328470/328470 _caml_c_call [8]
[83] 0.2 0.02 0.00 328470 _camlidl_lapack_cblas_dgemm [83]
-----------------------------------------------
0.00 0.02 68298/68298 _caml_c_call [8]
[84] 0.2 0.00 0.02 68298 _camlidl_lapack_d_transp [84]
0.02 0.00 68298/68298 _d_transp [85]
-----------------------------------------------
0.02 0.00 68298/68298 _camlidl_lapack_d_transp [84]
[85] 0.2 0.02 0.00 68298 _d_transp [85]
-----------------------------------------------
<spontaneous>
[86] 0.2 0.02 0.00 _ATL_dGEMM2TN [86]
-----------------------------------------------
<spontaneous>
[87] 0.2 0.02 0.00 _ATL_ddot [87]
-----------------------------------------------
<spontaneous>
[88] 0.2 0.02 0.00 _Bigarray__dim1_152 [88]
-----------------------------------------------
<spontaneous>
[89] 0.2 0.02 0.00 _DTRTRS [89]
-----------------------------------------------
<spontaneous>
[90] 0.2 0.02 0.00 _Nla__matrix2x2_1693 [90]
-----------------------------------------------
<spontaneous>
[91] 0.2 0.02 0.00 _Nla__ql_2076 [91]
-----------------------------------------------
<spontaneous>
[92] 0.2 0.02 0.00 _Nla__svd_2301 [92]
-----------------------------------------------
<spontaneous>
[93] 0.2 0.02 0.00 _Nla__zeros_1459 [93]
-----------------------------------------------
<spontaneous>
[94] 0.2 0.02 0.00 _Pervasives__min_48 [94]
-----------------------------------------------
<spontaneous>
[95] 0.2 0.02 0.00 _cblas_dnrm2 [95]
-----------------------------------------------
<spontaneous>
[96] 0.2 0.02 0.00 _dgesvd_ [96]
-----------------------------------------------
<spontaneous>
[97] 0.2 0.02 0.00 _dlange_ [97]
-----------------------------------------------
<spontaneous>
[98] 0.2 0.02 0.00 _f2c_dger [98]
-----------------------------------------------
<spontaneous>
[99] 0.2 0.02 0.00 _lessequal [99]
-----------------------------------------------
<spontaneous>
[100] 0.2 0.02 0.00 _szone_free [100]
-----------------------------------------------
0.01 0.00 273772/273772 _caml_c_call [8]
[101] 0.1 0.01 0.00 273772 _camlidl_lapack_dlacpy_ [101]
-----------------------------------------------
0.01 0.00 48704/48704 _caml_c_call [8]
[102] 0.1 0.01 0.00 48704 _camlidl_lapack_dgeqlf_ [102]
-----------------------------------------------
0.01 0.00 19963/19963 _caml_c_call [8]
[103] 0.1 0.01 0.00 19963 _camlidl_lapack_dtrtrs_ [103]
-----------------------------------------------
<spontaneous>
[104] 0.1 0.01 0.00 _ATL_apply_tree [104]
-----------------------------------------------
<spontaneous>
[105] 0.1 0.01 0.00 _ATL_dGEMM2NN [105]
-----------------------------------------------
<spontaneous>
[106] 0.1 0.01 0.00 _ATL_dJIK0x0x0NN1x1x16_aX_bX [106]
-----------------------------------------------
<spontaneous>
[107] 0.1 0.01 0.00 _ATL_dJIK0x0x0NT1x4x12_aX_bX [107]
-----------------------------------------------
<spontaneous>
[108] 0.1 0.01 0.00 _ATL_dJIK0x0x0TN5x1x12_aX_bX [108]
-----------------------------------------------
<spontaneous>
[109] 0.1 0.01 0.00 _ATL_dcpsc [109]
-----------------------------------------------
<spontaneous>
[110] 0.1 0.01 0.00 _ATL_dptgemm_nt [110]
-----------------------------------------------
<spontaneous>
[111] 0.1 0.01 0.00 _ATL_dpttrsm_nt [111]
-----------------------------------------------
<spontaneous>
[112] 0.1 0.01 0.00 _ATL_dscal [112]
-----------------------------------------------
<spontaneous>
[113] 0.1 0.01 0.00 _ATL_join_tree [113]
-----------------------------------------------
<spontaneous>
[114] 0.1 0.01 0.00 _Bigarray__reshape_1_255 [114]
-----------------------------------------------
<spontaneous>
[115] 0.1 0.01 0.00 _DGEBD2 [115]
-----------------------------------------------
<spontaneous>
[116] 0.1 0.01 0.00 _DLAPY2 [116]
-----------------------------------------------
<spontaneous>
[117] 0.1 0.01 0.00 _List__rev_append_74 [117]
-----------------------------------------------
<spontaneous>
[118] 0.1 0.01 0.00 _Ltv__fastSub_895 [118]
-----------------------------------------------
<spontaneous>
[119] 0.1 0.01 0.00 _Nla__fun2mat_1529 [119]
-----------------------------------------------
<spontaneous>
[120] 0.1 0.01 0.00 _Nla__getArrayFromPool_1427 [120]
-----------------------------------------------
<spontaneous>
[121] 0.1 0.01 0.00 _Nla__lq_2052 [121]
-----------------------------------------------
<spontaneous>
[122] 0.1 0.01 0.00 _Nla__noOfCols_1415 [122]
-----------------------------------------------
<spontaneous>
[123] 0.1 0.01 0.00 _Nla__partition2x1_1574 [123]
-----------------------------------------------
<spontaneous>
[124] 0.1 0.01 0.00 _Nla__partitionInfx1_1620 [124]
-----------------------------------------------
<spontaneous>
[125] 0.1 0.01 0.00 _Nla__rowScale_2366 [125]
-----------------------------------------------
<spontaneous>
[126] 0.1 0.01 0.00 _Nla__setToL_1771 [126]
-----------------------------------------------
<spontaneous>
[127] 0.1 0.01 0.00 _Nla__transp_1890 [127]
-----------------------------------------------
<spontaneous>
[128] 0.1 0.01 0.00 _SSQ [128]
-----------------------------------------------
<spontaneous>
[129] 0.1 0.01 0.00 _bigarray_num_elts [129]
-----------------------------------------------
<spontaneous>
[130] 0.1 0.01 0.00 _caml_apply12 [130]
-----------------------------------------------
<spontaneous>
[131] 0.1 0.01 0.00 _caml_apply14 [131]
-----------------------------------------------
<spontaneous>
[132] 0.1 0.01 0.00 _caml_apply9 [132]
-----------------------------------------------
<spontaneous>
[133] 0.1 0.01 0.00 _caml_curry3_1 [133]
-----------------------------------------------
<spontaneous>
[134] 0.1 0.01 0.00 _cblas_dgemm [134]
-----------------------------------------------
<spontaneous>
[135] 0.1 0.01 0.00 _d_sign [135]
-----------------------------------------------
<spontaneous>
[136] 0.1 0.01 0.00 _dgeql2_ [136]
-----------------------------------------------
<spontaneous>
[137] 0.1 0.01 0.00 _dorgl2_ [137]
-----------------------------------------------
<spontaneous>
[138] 0.1 0.01 0.00 _dorml2_ [138]
-----------------------------------------------
<spontaneous>
[139] 0.1 0.01 0.00 _dormlq_ [139]
-----------------------------------------------
<spontaneous>
[140] 0.1 0.01 0.00 _free [140]
-----------------------------------------------
<spontaneous>
[141] 0.1 0.01 0.00 _gemvMlt8 [141]
-----------------------------------------------
<spontaneous>
[142] 0.1 0.01 0.00 _ger_Mle8 [142]
-----------------------------------------------
<spontaneous>
[143] 0.1 0.01 0.00 _minor_collection [143]
-----------------------------------------------
<spontaneous>
[144] 0.1 0.01 0.00 _pthread_attr_setdetachstate [144]
-----------------------------------------------
<spontaneous>
[145] 0.1 0.01 0.00 _scalbn [145]
-----------------------------------------------
<spontaneous>
[146] 0.1 0.01 0.00 _stat_alloc [146]
-----------------------------------------------
0.00 0.00 204939/204939 _caml_c_call [8]
[2632] 0.0 0.00 0.00 204939 _camlidl_lapack_cblas_dscal [2632]
-----------------------------------------------
0.00 0.00 87830/87830 _caml_c_call [8]
[2633] 0.0 0.00 0.00 87830 _camlidl_lapack_dormql_ [2633]
-----------------------------------------------
0.00 0.00 56730/56730 _caml_c_call [8]
[2634] 0.0 0.00 0.00 56730 _camlidl_lapack_dormlq_ [2634]
-----------------------------------------------
0.00 0.00 19152/19152 _caml_c_call [8]
[2635] 0.0 0.00 0.00 19152 _camlidl_lapack_dgesvd_ [2635]
-----------------------------------------------
0.00 0.00 9600/9600 _caml_c_call [8]
[2636] 0.0 0.00 0.00 9600 _camlidl_lapack_dgelqf_ [2636]
-----------------------------------------------
flat profile:
% the percentage of the total running time of the
time program used by this function.
cumulative a running sum of the number of seconds accounted
seconds for by this function and those listed above it.
self the number of seconds accounted for by this
seconds function alone. This is the major sort for this
listing.
calls the number of times this function was invoked, if
this function is profiled, else blank.
self the average number of milliseconds spent in this
ms/call function per call, if this function is profiled,
else blank.
total the average number of milliseconds spent in this
ms/call function and its descendents per call, if this
function is profiled, else blank.
name the name of the function. This is the minor sort
for this listing. The index shows the location of
the function in the gprof listing. If the index is
in parenthesis it shows where it would appear in
the gprof listing if it were to be printed.
granularity: each sample hit covers 4 byte(s) for 0.09% of 11.47 seconds
% cumulative self self total
time seconds seconds calls ms/call ms/call name
13.9 1.60 1.60 _local_dger_ [1]
7.8 2.50 0.90 _region_for_ptr_no_lock [2]
4.6 3.03 0.53 _bigarray_set_aux [3]
3.7 3.45 0.42 _bigarray_offset [4]
3.7 3.87 0.42 _gemvT4x16 [5]
3.6 4.28 0.41 _mark_slice [6]
3.1 4.64 0.36 _DLACPY [7]
2.6 4.94 0.30 _sweep_slice [9]
2.4 5.22 0.28 _caml_c_call [8]
2.3 5.48 0.26 _adjust_gc_speed [10]
2.2 5.73 0.25 _szone_malloc [11]
1.8 5.94 0.21 _DLASR [12]
1.8 6.15 0.21 _Nla__fromInt_1391 [13]
1.7 6.35 0.20 _fl_allocate [14]
1.7 6.54 0.19 _bigarray_set_2 [15]
1.6 6.72 0.18 _ATL_dJIK0x0x0NN0x0x0_aX_bX [16]
1.6 6.90 0.18 _bigarray_get_N [17]
1.4 7.06 0.16 _copy_double [18]
1.1 7.19 0.13 _ATL_ddot_xp1yp1aXbX [19]
1.1 7.32 0.13 _DBDSQR [20]
1.0 7.44 0.12 _ATL_dJIK0x0x0NN5x1x16_aX_bX [21]
1.0 7.56 0.12 _ATL_dgemv [22]
1.0 7.68 0.12 _bigarray_finalize [23]
1.0 7.79 0.11 _ATL_dgezero [24]
0.9 7.89 0.10 _ATL_dger1_a1_x1_yX [25]
0.9 7.99 0.10 _ATL_dtrsmKLLNN [26]
0.9 8.09 0.10 _Nla__iter2_486 [27]
0.9 8.19 0.10 _sqrt [28]
0.8 8.28 0.09 _ATL_dJIK0x0x0NT0x0x0_aX_bX [29]
0.8 8.37 0.09 _bigarray_sub [30]
0.8 8.46 0.09 _szone_size [31]
0.6 8.53 0.07 _alloc_bigarray [32]
0.6 8.60 0.07 _bigarray_dim [33]
0.6 8.67 0.07 _bigarray_update_proxy [34]
0.6 8.74 0.07 _cblas_dgemv [35]
0.6 8.81 0.07 _dlamch_ [36]
0.6 8.88 0.07 _dlartg_ [37]
0.6 8.95 0.07 _free_list_remove_ptr [38]
0.5 9.01 0.06 _ATL_dNCmmJIK [39]
0.5 9.07 0.06 _ILAENV [40]
0.5 9.13 0.06 _Std_exit__code_end [41]
0.5 9.19 0.06 _bigarray_reshape [42]
0.5 9.25 0.06 _dlarfg_ [43]
0.4 9.30 0.05 _ATL_dJIK0x0x0NN1x4x16_aX_bX [44]
0.4 9.35 0.05 _ATL_dJIK0x0x0TN0x0x0_aX_bX [45]
0.4 9.40 0.05 _ATL_dscal_xp1yp0aXbX [46]
0.4 9.45 0.05 _Nla__iDU_1800 [47]
0.4 9.50 0.05 _Nla__normMax_1913 [48]
0.4 9.55 0.05 _SSQr [49]
0.4 9.60 0.05 _bigarray_get_2 [50]
0.4 9.65 0.05 _fl_merge_block [51]
0.4 9.70 0.05 _frexp [52]
0.4 9.75 0.05 _lsame_ [53]
0.3 9.79 0.04 _DLARF [54]
0.3 9.83 0.04 _alloc_custom [55]
0.3 9.87 0.04 _alloc_shr [56]
0.3 9.91 0.04 _bigarray_fill [57]
0.3 9.95 0.04 _caml_apply2 [58]
0.3 9.99 0.04 _f2c_dgemv [59]
0.3 10.03 0.04 _malloc_zone_free [60]
0.3 10.06 0.03 _ATL_dJIK0x0x0NT5x1x12_aX_bX [61]
0.3 10.09 0.03 _ATL_dcopy_xp0yp0aXbX [62]
0.3 10.12 0.03 _ATL_dcpsc_xp0yp0aXbX [63]
0.3 10.15 0.03 _ATL_ddot_xp0yp0aXbX [64]
0.3 10.18 0.03 _ATL_dger [65]
0.3 10.21 0.03 _ATL_dptgemm [66]
0.3 10.24 0.03 _Ltv__sfsolve_1040 [67]
0.3 10.27 0.03 _Ltv__superfastMul_456 [68]
0.3 10.30 0.03 _Nla__extractRange_1487 [69]
0.3 10.33 0.03 _allocate_block [70]
0.3 10.36 0.03 _check_urgent_gc [71]
0.3 10.39 0.03 _compare_val [72]
0.3 10.42 0.03 _dorm2l_ [73]
0.3 10.45 0.03 _free_list_add_ptr [74]
0.3 10.48 0.03 _gemv8x4 [75]
0.3 10.51 0.03 _gemvT_Nsmall [76]
0.3 10.54 0.03 _ger_Nle4 [77]
0.3 10.57 0.03 _malloc [78]
0.3 10.60 0.03 _malloc_zone_malloc [79]
0.3 10.63 0.03 _oldify_one [80]
0.2 10.65 0.02 328470 0.00 0.00 _camlidl_lapack_cblas_dgemm [83]
0.2 10.67 0.02 68298 0.00 0.00 _d_transp [85]
0.2 10.69 0.02 _ATL_dGEMM2TN [86]
0.2 10.71 0.02 _ATL_ddot [87]
0.2 10.73 0.02 _Bigarray__dim1_152 [88]
0.2 10.75 0.02 _DTRTRS [89]
0.2 10.77 0.02 _Nla__matrix2x2_1693 [90]
0.2 10.79 0.02 _Nla__ql_2076 [91]
0.2 10.81 0.02 _Nla__svd_2301 [92]
0.2 10.83 0.02 _Nla__zeros_1459 [93]
0.2 10.85 0.02 _Pervasives__min_48 [94]
0.2 10.87 0.02 _cblas_dnrm2 [95]
0.2 10.89 0.02 _dgesvd_ [96]
0.2 10.91 0.02 _dlange_ [97]
0.2 10.93 0.02 _f2c_dger [98]
0.2 10.95 0.02 _lessequal [99]
0.2 10.97 0.02 _szone_free [100]
0.2 10.99 0.02 restFP [81]
0.2 11.01 0.02 saveFP [82]
0.1 11.02 0.01 273772 0.00 0.00 _camlidl_lapack_dlacpy_ [101]
0.1 11.03 0.01 48704 0.00 0.00 _camlidl_lapack_dgeqlf_ [102]
0.1 11.04 0.01 19963 0.00 0.00 _camlidl_lapack_dtrtrs_ [103]
0.1 11.05 0.01 _ATL_apply_tree [104]
0.1 11.06 0.01 _ATL_dGEMM2NN [105]
0.1 11.07 0.01 _ATL_dJIK0x0x0NN1x1x16_aX_bX [106]
0.1 11.08 0.01 _ATL_dJIK0x0x0NT1x4x12_aX_bX [107]
0.1 11.09 0.01 _ATL_dJIK0x0x0TN5x1x12_aX_bX [108]
0.1 11.10 0.01 _ATL_dcpsc [109]
0.1 11.11 0.01 _ATL_dptgemm_nt [110]
0.1 11.12 0.01 _ATL_dpttrsm_nt [111]
0.1 11.13 0.01 _ATL_dscal [112]
0.1 11.14 0.01 _ATL_join_tree [113]
0.1 11.15 0.01 _Bigarray__reshape_1_255 [114]
0.1 11.16 0.01 _DGEBD2 [115]
0.1 11.17 0.01 _DLAPY2 [116]
0.1 11.18 0.01 _List__rev_append_74 [117]
0.1 11.19 0.01 _Ltv__fastSub_895 [118]
0.1 11.20 0.01 _Nla__fun2mat_1529 [119]
0.1 11.21 0.01 _Nla__getArrayFromPool_1427 [120]
0.1 11.22 0.01 _Nla__lq_2052 [121]
0.1 11.23 0.01 _Nla__noOfCols_1415 [122]
0.1 11.24 0.01 _Nla__partition2x1_1574 [123]
0.1 11.25 0.01 _Nla__partitionInfx1_1620 [124]
0.1 11.26 0.01 _Nla__rowScale_2366 [125]
0.1 11.27 0.01 _Nla__setToL_1771 [126]
0.1 11.28 0.01 _Nla__transp_1890 [127]
0.1 11.29 0.01 _SSQ [128]
0.1 11.30 0.01 _bigarray_num_elts [129]
0.1 11.31 0.01 _caml_apply12 [130]
0.1 11.32 0.01 _caml_apply14 [131]
0.1 11.33 0.01 _caml_apply9 [132]
0.1 11.34 0.01 _caml_curry3_1 [133]
0.1 11.35 0.01 _cblas_dgemm [134]
0.1 11.36 0.01 _d_sign [135]
0.1 11.37 0.01 _dgeql2_ [136]
0.1 11.38 0.01 _dorgl2_ [137]
0.1 11.39 0.01 _dorml2_ [138]
0.1 11.40 0.01 _dormlq_ [139]
0.1 11.41 0.01 _free [140]
0.1 11.42 0.01 _gemvMlt8 [141]
0.1 11.43 0.01 _ger_Mle8 [142]
0.1 11.44 0.01 _minor_collection [143]
0.1 11.45 0.01 _pthread_attr_setdetachstate [144]
0.1 11.46 0.01 _scalbn [145]
0.1 11.47 0.01 _stat_alloc [146]
0.0 11.47 0.00 204939 0.00 0.00 _camlidl_lapack_cblas_dscal [2632]
0.0 11.47 0.00 87830 0.00 0.00 _camlidl_lapack_dormql_ [2633]
0.0 11.47 0.00 68298 0.00 0.00 _camlidl_lapack_d_transp [84]
0.0 11.47 0.00 56730 0.00 0.00 _camlidl_lapack_dormlq_ [2634]
0.0 11.47 0.00 19152 0.00 0.00 _camlidl_lapack_dgesvd_ [2635]
0.0 11.47 0.00 9600 0.00 0.00 _camlidl_lapack_dgelqf_ [2636]
Index by function name
[104] _ATL_apply_tree [90] _Nla__matrix2x2_169 [71] _check_urgent_gc
[105] _ATL_dGEMM2NN [122] _Nla__noOfCols_1415 [72] _compare_val
[86] _ATL_dGEMM2TN [48] _Nla__normMax_1913 [18] _copy_double
[16] _ATL_dJIK0x0x0NN0x0 [123] _Nla__partition2x1_ [135] _d_sign
[106] _ATL_dJIK0x0x0NN1x1 [124] _Nla__partitionInfx [85] _d_transp
[44] _ATL_dJIK0x0x0NN1x4 [91] _Nla__ql_2076 [136] _dgeql2_
[21] _ATL_dJIK0x0x0NN5x1 [125] _Nla__rowScale_2366 [96] _dgesvd_
[29] _ATL_dJIK0x0x0NT0x0 [126] _Nla__setToL_1771 [36] _dlamch_
[107] _ATL_dJIK0x0x0NT1x4 [92] _Nla__svd_2301 [97] _dlange_
[61] _ATL_dJIK0x0x0NT5x1 [127] _Nla__transp_1890 [43] _dlarfg_
[45] _ATL_dJIK0x0x0TN0x0 [93] _Nla__zeros_1459 [37] _dlartg_
[108] _ATL_dJIK0x0x0TN5x1 [94] _Pervasives__min_48 [137] _dorgl2_
[39] _ATL_dNCmmJIK [128] _SSQ [73] _dorm2l_
[62] _ATL_dcopy_xp0yp0aX [49] _SSQr [138] _dorml2_
[109] _ATL_dcpsc [41] _Std_exit__code_end [139] _dormlq_
[63] _ATL_dcpsc_xp0yp0aX [10] _adjust_gc_speed [59] _f2c_dgemv
[87] _ATL_ddot [32] _alloc_bigarray [98] _f2c_dger
[64] _ATL_ddot_xp0yp0aXb [55] _alloc_custom [14] _fl_allocate
[19] _ATL_ddot_xp1yp1aXb [56] _alloc_shr [51] _fl_merge_block
[22] _ATL_dgemv [70] _allocate_block [140] _free
[65] _ATL_dger [33] _bigarray_dim [74] _free_list_add_ptr
[25] _ATL_dger1_a1_x1_yX [57] _bigarray_fill [38] _free_list_remove_p
[24] _ATL_dgezero [23] _bigarray_finalize [52] _frexp
[66] _ATL_dptgemm [50] _bigarray_get_2 [75] _gemv8x4
[110] _ATL_dptgemm_nt [17] _bigarray_get_N [141] _gemvMlt8
[111] _ATL_dpttrsm_nt [129] _bigarray_num_elts [5] _gemvT4x16
[112] _ATL_dscal [4] _bigarray_offset [76] _gemvT_Nsmall
[46] _ATL_dscal_xp1yp0aX [42] _bigarray_reshape [142] _ger_Mle8
[26] _ATL_dtrsmKLLNN [15] _bigarray_set_2 [77] _ger_Nle4
[113] _ATL_join_tree [3] _bigarray_set_aux [99] _lessequal
[88] _Bigarray__dim1_152 [30] _bigarray_sub [1] _local_dger_
[114] _Bigarray__reshape_ [34] _bigarray_update_pr [53] _lsame_
[20] _DBDSQR [130] _caml_apply12 [78] _malloc
[115] _DGEBD2 [131] _caml_apply14 [60] _malloc_zone_free
[7] _DLACPY [58] _caml_apply2 [79] _malloc_zone_malloc
[116] _DLAPY2 [132] _caml_apply9 [6] _mark_slice
[54] _DLARF [8] _caml_c_call [143] _minor_collection
[12] _DLASR [133] _caml_curry3_1 [80] _oldify_one
[89] _DTRTRS [83] _camlidl_lapack_cbl [144] _pthread_attr_setde
[40] _ILAENV [2632] _camlidl_lapack_cbl [2] _region_for_ptr_no_
[117] _List__rev_append_7 [84] _camlidl_lapack_d_t [145] _scalbn
[118] _Ltv__fastSub_895 [2636] _camlidl_lapack_dge [28] _sqrt
[67] _Ltv__sfsolve_1040 [102] _camlidl_lapack_dge [146] _stat_alloc
[68] _Ltv__superfastMul_ [2635] _camlidl_lapack_dge [9] _sweep_slice
[69] _Nla__extractRange_ [101] _camlidl_lapack_dla [100] _szone_free
[13] _Nla__fromInt_1391 [2634] _camlidl_lapack_dor [11] _szone_malloc
[119] _Nla__fun2mat_1529 [2633] _camlidl_lapack_dor [31] _szone_size
[120] _Nla__getArrayFromP [103] _camlidl_lapack_dtr [81] restFP
[47] _Nla__iDU_1800 [134] _cblas_dgemm [82] saveFP
[27] _Nla__iter2_486 [35] _cblas_dgemv
[121] _Nla__lq_2052 [95] _cblas_dnrm2
[-- Attachment #3: Type: text/plain, Size: 1794 bytes --]
> Subject: Re: [Caml-list] Slow GC problem
> From: Damien Doligez <Damien.Doligez@inria.fr>
> Date: Tue, 8 Apr 2003 12:23:46 +0200
> Cc: caml-list@inria.fr
> In-Reply-To: <3C821F52-66D5-11D7-A265-000393942C76@ece.ucsb.edu>
> > I have a gc efficiency problem for which I require some advice.
> > I have read both the O'Reilly book and the manual on gc.
> [...]
> > Below I give the gc stats just before and after the solver routine
> > is called in the in-core solver:
> >
> >                     "Just before"  "Just after"
> > minor_words:            46243376     139259767
> > promoted_words:           928267       2595523
> > major_words:             2883087      39489766
> > minor_collections:          1412          4591
> > major_collections:            18            52
> > heap_words:              2150400       1044480
> > heap_chunks:                  35            17
> > top_heap_words:          2150400       5038080
> > live_words:              1842373        840037
> > live_blocks:              253926        116816
> > free_words:               307180        204440
> > free_blocks:               47368            17
> > largest_free:              10928         61440
> > fragments:                   847             3
> > compactions:                   0             2
>
> As others have said, this is not really enough information to tell
> what is going on. What we can say from the above is:
>
> 1. You are allocating lots and lots of data structures in the major
> heap (maybe finalized bigarray descriptors).
> 2. The compactor was called twice, which may indicate that you have
> a fragmentation problem.
> 3. The compactor was called near the end of the solver routine,
> which must have erased most of the evidence...
>
> -- Damien
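
One rough way to weigh the first point against the attached profile is
to add up the self times of the runtime symbols by group. The sketch
below copies the self-time figures from the flat profile; the grouping
into "GC", "malloc" and "bigarray" is only my reading of the symbol
names (an assumption, gprof reports no such categories), and the names
total, sum and share are made up for illustration.

```ocaml
(* Self times in seconds, copied from the flat profile above.
   ASSUMPTION: the grouping below is inferred from the symbol names;
   gprof itself does not classify them. *)
let total = 11.47  (* seconds sampled, from the "granularity" line *)

let gc_runtime = [
  "_mark_slice", 0.41; "_sweep_slice", 0.30; "_adjust_gc_speed", 0.26;
  "_fl_allocate", 0.20; "_fl_merge_block", 0.05; "_alloc_shr", 0.04;
  "_allocate_block", 0.03; "_check_urgent_gc", 0.03;
  "_oldify_one", 0.03; "_minor_collection", 0.01;
]

let system_malloc = [
  "_region_for_ptr_no_lock", 0.90; "_szone_malloc", 0.25;
  "_szone_size", 0.09; "_malloc_zone_free", 0.04; "_malloc", 0.03;
  "_malloc_zone_malloc", 0.03; "_szone_free", 0.02; "_free", 0.01;
]

let bigarray_stubs = [
  "_bigarray_set_aux", 0.53; "_bigarray_offset", 0.42;
  "_bigarray_set_2", 0.19; "_bigarray_get_N", 0.18;
  "_bigarray_finalize", 0.12; "_bigarray_sub", 0.09;
  "_alloc_bigarray", 0.07; "_bigarray_dim", 0.07;
  "_bigarray_update_proxy", 0.07; "_bigarray_reshape", 0.06;
  "_bigarray_get_2", 0.05; "_bigarray_fill", 0.04;
  "_bigarray_num_elts", 0.01;
]

(* Total self time of one group, in seconds. *)
let sum group = List.fold_left (fun acc (_, s) -> acc +. s) 0.0 group

(* Share of the sampled time taken by one group, in percent. *)
let share group = 100.0 *. sum group /. total

let () =
  List.iter
    (fun (label, group) ->
      Printf.printf "%-10s %5.2f s  %5.1f%%\n"
        label (sum group) (share group))
    [ "GC", gc_runtime; "malloc", system_malloc;
      "bigarray", bigarray_stubs ]
```

Since every entry in the listing is marked <spontaneous>, the profile
gives no caller information, so this tally says where the sampled time
went but not which part of the program caused it.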
--shiv--
end of thread, other threads:[~2003-10-23 8:18 UTC | newest]
Thread overview: 15+ messages
2003-04-04 19:40 [Caml-list] Slow GC problem Shivkumar Chandrasekaran
2003-04-03 21:07 ` Christophe Raffalli
2003-04-07 17:53 ` Shivkumar Chandrasekaran
2003-04-07 19:08 ` Chris Hecker
2003-04-08 7:15 ` David Monniaux
2003-04-08 10:28 ` Damien Doligez
2003-04-08 23:03 ` Shivkumar Chandrasekaran
2003-04-08 10:23 ` Damien Doligez
2003-04-10 21:21 ` Shivkumar Chandrasekaran
2003-04-10 21:51 ` Brian Hurt
2003-04-11 7:10 ` Chris Hecker
2003-04-11 7:58 ` Christophe Raffalli
2003-04-11 16:35 ` Shivkumar Chandrasekaran
2003-04-14 16:37 Shivkumar Chandrasekaran
2003-10-23 8:17 Chris Hecker