* ocamlprof versus gprof on Mac os X
From: Will M. Farr @ 2005-01-09 18:46 UTC (permalink / raw)
To: caml-list
Hello all,
I've been trying to optimize a quick n-body gravitational calculation in
OCaml (it's for the shootout benchmark at
http://shootout.alioth.debian.org/ ), and I've been getting some weird
results out of gprof. I have a function which is called twice to compute
the total energy of the system: once at the beginning of the
integration, and once at the end. The output of ocamlprof on a
bytecode executable correctly reflects the number of times this function
is called:
let energy bl =
  (* 2 *) let t_accumulator acc b =
    (* 10 *) acc +. (t b) in
  let v_accumulator acc a b =
    (* 20 *) acc +. (v a b) in
  let acc1 = List.fold_left t_accumulator 0.0 bl in
  fold_left_n2 v_accumulator acc1 bl;;
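The counts in that listing are consistent with a five-body system: two calls to energy give 2 x 5 = 10 kinetic terms and 2 x 10 = 20 pair terms. The sketch below reproduces that arithmetic. The definition of fold_left_n2 is not shown in this message, so the version here is an assumption (a fold over every unordered pair of distinct list elements), and calls_per_energy is a hypothetical helper introduced only for illustration.

```ocaml
(* Hypothetical stand-in for fold_left_n2 (the real definition is not
   shown in the message): fold f over every unordered pair (a, b) of
   distinct list elements. *)
let rec fold_left_n2 f acc = function
  | [] -> acc
  | a :: rest ->
    let acc' = List.fold_left (fun acc b -> f acc a b) acc rest in
    fold_left_n2 f acc' rest

(* Hypothetical helper: count the calls each accumulator would receive
   during one energy evaluation over n bodies. *)
let calls_per_energy n =
  let bodies = List.init n (fun i -> i) in
  let t_calls = List.fold_left (fun c _ -> c + 1) 0 bodies in
  let v_calls = fold_left_n2 (fun c _ _ -> c + 1) 0 bodies in
  (t_calls, v_calls)

let () =
  let t_calls, v_calls = calls_per_energy 5 in
  (* Two energy evaluations, matching the (* 2 *) counter above. *)
  Printf.printf "t: %d, v: %d\n" (2 * t_calls) (2 * v_calls)
```

Under that assumption the pair accumulator runs only 20 times in the whole program, however many integration steps are taken.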
However, no matter how many integration steps I take (1000, 1000000,
10000000), this function appears to take up the same fraction of the
runtime according to gprof (after compiling with ocamlopt -p):
  %  cumulative      self              self    total
 time   seconds   seconds    calls  ms/call  ms/call  name
 28.4     28.52     28.52                             _camlNbody__v_accumulator_198 [1]
 22.8     51.47     22.95                             ___sqrt [2]
 16.8     68.38     16.91                             _camlNbody__force_on_135 [3]
  6.5     74.95      6.57                             _camlNbody__norm_93 [4]
  5.0     80.00      5.05                             _camlList__iter_98 [5]
  4.4     84.41      4.41                             _camlNbody__update_pos_170 [6]
  4.2     88.60      4.19                             _camlNbody__norm2_91 [7]
  3.2     91.77      3.17                             _camlNbody__update_forces_145 [8]
  2.4     94.14      2.37                             _camlNbody__up_177 [9]
  1.7     95.82      1.68                             _camlNbody__loop_152 [10]
  1.2     97.04      1.22                             _caml_curry2 [11]
  0.8     97.88      0.84                             _camlNbody__zero_body_force_133 [12]
  0.6     98.52      0.64                             _caml_curry2_1 [13]
  0.4     98.95      0.43                             _camlNbody__iter_n2_149 [14]
  0.4     99.35      0.40                             _camlNbody__step_174 [15]
  0.3     99.68      0.33                             _camlList__tl_66 [16]
  0.2     99.90      0.22                             _camlList__hd_63 [17]
  0.2    100.09      0.19                             _camlNbody__main_223 [18]
  0.1    100.18      0.09                             _caml_oldify_local_roots [19]
  0.1    100.24      0.06                             _caml_major_collection_slice [20]
  0.0    100.29      0.05                             _caml_darken [21]
  0.0    100.33      0.04                             _caml_call_gc [22]
  0.0    100.36      0.03                             _caml_empty_minor_heap [23]
  0.0    100.39      0.03                             _caml_fl_allocate [24]
  0.0    100.42      0.03                             _caml_minor_collection [25]
  0.0    100.44      0.02                             _caml_gc_message [26]
  0.0    100.46      0.02                             _caml_oldify_one [27]
  0.0    100.48      0.02                             _szone_malloc [28]
  0.0    100.49      0.01                             _alloc_to_do [29]
  0.0    100.50      0.01                             _caml_compact_heap_maybe [30]
  0.0    100.51      0.01                             _caml_do_roots [31]
  0.0    100.52      0.01                             _caml_final_do_strong_roots [32]
  0.0    100.53      0.01                             _caml_final_empty_young [33]
  0.0    100.54      0.01                             _caml_final_update [34]
  0.0    100.55      0.01                             _caml_fl_merge_block [35]
  0.0    100.56      0.01                             _caml_oldify_mopup [36]
  0.0    100.57      0.01                             _mark_slice [37]
  0.0    100.58      0.01                             _sweep_slice [38]
This is on a run with 1,000,000 integration steps. Is ocamlopt -p
broken on Mac OS X? There are several candidate functions which could
plausibly be taking that much time but don't appear in the list, so it
looks like gprof has just confused the names of various functions. Is
there a way I can fix this?
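One way to cross-check the profiler independently is to time the suspect call by hand. This is a minimal sketch using Sys.time from the standard library; time_it and busy are hypothetical names introduced here, and busy is only a placeholder workload standing in for a call to energy.

```ocaml
(* Hypothetical helper: run f x and return its result together with the
   processor time (in seconds) it consumed, as reported by Sys.time. *)
let time_it f x =
  let start = Sys.time () in
  let result = f x in
  (result, Sys.time () -. start)

(* Placeholder workload standing in for one call to energy. *)
let busy n =
  let s = ref 0.0 in
  for i = 1 to n do
    s := !s +. sqrt (float_of_int i)
  done;
  !s

let () =
  let _, dt = time_it busy 1_000_000 in
  Printf.printf "busy took %.3f s of CPU time\n" dt
```

If the two energy calls account for a fixed share of the runtime regardless of the step count when measured this way too, the profile is right; if their share shrinks as the step count grows, the symbol attribution in the gprof output is the more likely culprit.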
Thanks for the help!
Will Farr
BTW, on my system, this calculation takes about twice as long as a Java
program doing the same thing; if anyone wants to defend the honor of
OCaml, I would be happy to pass along the source code, and we can
see whether there's something faster to be done.