* time profiling and nested function inlining
From: Quôc Peyrot @ 2006-12-06 3:43 UTC (permalink / raw)
To: caml-list
Hello, I have two questions. Sorry if they have already been asked,
but I searched through the archives and couldn't find the answers:
- I tried to do some time profiling (Mac OS X, PPC G4), but for some
reason it doesn't seem to work. I compiled with OCamlMakefile using
the command line "make profiling-native-code".
When I execute the program, it does generate gmon.out, but when I run
gprof the only thing I get is:
                                   called/total      parents
index  %time    self descendents  called+self    name            index
                                   called/total      children

                0.00        0.00       1/1        ___fixunsdfdi [476]
[21]     0.0    0.00        0.00       1       ___stub_getrealaddr [21]
-----------------------------------------------
and
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
  0.0       0.00     0.00        1     0.00     0.00  ___stub_getrealaddr [21]
Thus, I'm wondering whether time profiling is supported on PPC G4.
If it is, can someone give me some clues for debugging this issue?
If it isn't, I would appreciate it if someone could suggest alternative
solutions (apart from using an Intel computer ;) )
- I was looking at the asm output to get familiar with efficient
coding style, and I tried the following example:
    let rec log2_acc value acc =
      if value = 0
      then acc
      else log2_acc (value lsr 1) (acc + 1)

    let log2 value =
      log2_acc value 0
which compiles to (using "ocamlopt -inline 100 -unsafe -S"):
_camlTest_regular__log2_acc_57:
L101:
        cmpwi   r3, 1
        bne     L100
        mr      r3, r4
        blr
L100:
        srwi    r5, r3, 1
        ori     r3, r5, 1
        addi    r4, r4, 2
        b       L101
        .globl  _camlTest_regular__log2_60
        .text
        .align  2
_camlTest_regular__log2_60:
L102:
        li      r4, 1
        b       _camlTest_regular__log2_acc_57
Although log2_acc could have been inlined (which might not be
beneficial in this case anyway), it looks quite ok.
But when I tried with a nested function:
    let log2 value =
      let rec log2_acc value acc =
        if value = 0
        then acc
        else log2_acc (value lsr 1) (acc + 1)
      in
      log2_acc value 0
I got the following output:
_camlTest_nested__2:
        .long   _caml_curry2
        .long   5
        .long   _camlTest_nested__log2_acc_59
        .globl  _camlTest_nested__log2_acc_59
        .text
        .align  2
_camlTest_nested__log2_acc_59:
L101:
        cmpwi   r3, 1
        bne     L100
        mr      r3, r4
        blr
L100:
        srwi    r5, r3, 1
        ori     r3, r5, 1
        addi    r4, r4, 2
        b       L101
        .globl  _camlTest_nested__log2_57
        .text
        .align  2
_camlTest_nested__log2_57:
L102:
        addis   r4, 0, ha16(_camlTest_nested__2)
        addi    r4, r4, lo16(_camlTest_nested__2)
        li      r4, 1
        b       _camlTest_nested__log2_acc_59
I'm wondering what these computations before the call are (frame?)
and why the compiler couldn't get rid of them.
Not that I am utterly concerned by these small extra computations...
I'm just curious.
Thanks for the help/explanations,
--
Best Regards,
Quôc
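
A note on reading the listings above: the odd-looking constants (compare
with 1, add 2, "ori ..., 1") come from OCaml's tagged integer
representation, in which an int n is stored as the machine word 2n + 1.
The sketch below replays the loop body on tagged words and checks that it
agrees with the source-level log2. The helper names tag, untag, step, run
and log2_machine are hypothetical and only serve the illustration; the
simulation is meant for non-negative inputs.

```ocaml
(* Source-level definition, as in the message above. *)
let rec log2_acc value acc =
  if value = 0 then acc else log2_acc (value lsr 1) (acc + 1)

let log2 value = log2_acc value 0

(* Hypothetical helpers: an OCaml int n is held in the word 2n + 1. *)
let tag n = (n lsl 1) lor 1
let untag w = w asr 1

(* One loop iteration on tagged words:
   srwi r5, r3, 1 ; ori r3, r5, 1   is   value lsr 1
   addi r4, r4, 2                   is   acc + 1       *)
let step (r3, r4) = ((r3 lsr 1) lor 1, r4 + 2)

let rec run (r3, r4) =
  if r3 = tag 0 then r4            (* cmpwi r3, 1 : value = 0 ? *)
  else run (step (r3, r4))

(* li r4, 1 loads the tagged accumulator 0 before the jump. *)
let log2_machine value = untag (run (tag value, tag 0))
```

For example, log2 8 and log2_machine 8 both evaluate to 4, since the
function counts how many right shifts are needed to reach zero.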
* Re: [Caml-list] time profiling and nested function inlining
From: Daniel Bünzli @ 2006-12-06 8:55 UTC (permalink / raw)
To: Quôc Peyrot; +Cc: caml-list
If you have the development tools installed, you can profile on
macosx using shark (you don't even need to compile with -p). Invoke
it from the command line on your executable as follows:
> shark -i -1 -q ./yourexec.opt args
This will write a .mshark file in the directory that you can open
with Shark.app.
> open *.mshark
Best,
Daniel
P.S. I remember having problems when the executable did not run for
long enough: Shark couldn't take any samples.
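
As a rough cross-check that needs no external profiler, CPU time can
also be measured from inside the program. This is a minimal sketch,
assuming the standard library's Sys.time (processor time in seconds);
time_it and the iteration count are hypothetical, and log2 is the
function from the first message. Repeating the call many times also
helps with very short runs.

```ocaml
(* log2 from the first message, used here as the workload. *)
let rec log2_acc value acc =
  if value = 0 then acc else log2_acc (value lsr 1) (acc + 1)

let log2 value = log2_acc value 0

(* Hypothetical helper: call f n times and return the CPU seconds
   spent, as reported by Sys.time. *)
let time_it n f =
  let start = Sys.time () in
  for _i = 1 to n do
    ignore (f ())
  done;
  Sys.time () -. start

let () =
  let secs = time_it 1_000_000 (fun () -> log2 123456) in
  Printf.printf "1000000 calls to log2: %.3f s\n" secs
```

This only gives a total per measured function, not a call graph, so it
complements rather than replaces a sampling profiler.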
* Re: [Caml-list] time profiling and nested function inlining
From: Daniel Bünzli @ 2006-12-06 9:01 UTC (permalink / raw)
To: Quôc Peyrot; +Cc: caml-list
On 6 Dec 2006, at 09:55, Daniel Bünzli wrote:
> If you have the development tools installed, you can profile on
> macosx using shark (you don't even need to compile with -p).
However, if you want to see the names of system function calls in the
profile, you need to compile with -g.
Daniel
* Re: [Caml-list] time profiling and nested function inlining
From: Quôc Peyrot @ 2006-12-07 10:43 UTC (permalink / raw)
To: caml-list
On Dec 6, 2006, at 12:55 AM, Daniel Bünzli wrote:
> If you have the development tools installed, you can profile on
> macosx using shark (you don't even need to compile with -p). Invoke
> it from the command line on your executable as follows:
>
> > shark -i -1 -q ./yourexec.opt args
>
> This will write a .mshark file in the directory that you can open
> with Shark.app.
>
> > open *.mshark
Thanks, it worked like a charm.
Anyone for my second question from my original email (about the
nested function)?
The more I look at the assembly output, the more I am puzzled.
Another simple example:
    for i = 0 to len - 1 do
      for j = 0 to len - 1 do
        array.(i).(j) <- 0;
      done;
    done;

doesn't seem to compile to the same code as

    for i = 0 to len - 1 do
      let array_i = array.(i) in
      for j = 0 to len - 1 do
        array_i.(j) <- 0;
      done;
    done;

In the former, the compiler doesn't detect the loop-invariant
"array.(i)" and keeps it inside the inner loop.
I tried to pass -ccopt -O3 to ocamlopt, but it didn't seem to change
anything.
--
Best Regards,
Quôc
* Re: [Caml-list] time profiling and nested function inlining
From: Jon Harrop @ 2006-12-07 12:14 UTC (permalink / raw)
To: caml-list
On Thursday 07 December 2006 10:43, Quôc Peyrot wrote:
> in the former, the compiler doesn't detect the invariant "array.(i)"
> and keeps it inside the inner loop.
> I tried to pass -ccopt -O3 to ocamlopt but it didn't seem to change
> anything.
ocamlopt doesn't do that optimisation, and -ccopt won't have any effect
unless you're compiling C code.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists
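
Since the compiler does not hoist the invariant load itself, it has to
be done by hand. The sketch below wraps the two loops from the previous
message in functions so they can be compared, and adds a third variant
using the standard library's Array.fill. The function names are
hypothetical, and the hoisting is only valid because nothing in the
inner loop replaces array.(i).

```ocaml
(* Loop as first written: array.(i) is reloaded on every inner step. *)
let clear_naive array len =
  for i = 0 to len - 1 do
    for j = 0 to len - 1 do
      array.(i).(j) <- 0
    done
  done

(* Manual hoisting: the row is fetched once per outer iteration. *)
let clear_hoisted array len =
  for i = 0 to len - 1 do
    let array_i = array.(i) in
    for j = 0 to len - 1 do
      array_i.(j) <- 0
    done
  done

(* Same effect, leaving the inner loop to Array.fill. *)
let clear_fill array len =
  for i = 0 to len - 1 do
    Array.fill array.(i) 0 len 0
  done
```

All three leave a len x len matrix filled with zeros; which one is
fastest on a given machine is best settled by measuring.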