* [Caml-list] ARM code generator problem
@ 2012-08-10 21:41 Jeffrey Scofield
2012-08-11 8:00 ` Benedikt Meurer
0 siblings, 1 reply; 9+ messages in thread
From: Jeffrey Scofield @ 2012-08-10 21:41 UTC
To: Caml List; +Cc: Jeffrey Scofield
Greetings,
While working on porting OCaml 4.00.0 to iOS, I ran across
what looks like a problem in the ARM code generation.
If you look at asmcomp/arm/emit.mlp you see lots of places where
s14 is used as a scratch register. The one that showed up in my
code is the code sequence for float_of_int:
    | Lop(Ifloatofint) ->
        ` fmsr s14, {emit_reg i.arg.(0)}\n`;
        ` fsitod {emit_reg i.res.(0)}, s14\n`; 2
Note that the emitted code always uses s14 (unconditionally). This
suggests that s14 should be set aside as a scratch register.
However, s14 is also an alias for the low order part of d7. If you look
at asmcomp/arm/proc.ml you'll see that d7 is used as a general purpose
register.
The result is that a value in d7 is sometimes destroyed by a use
of s14 as a scratch register. In my code it was a call to float_of_int
that destroyed a float value being kept in d7.
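To make the aliasing concrete, here is a minimal OCaml sketch of the overlap. It is a toy model, not compiler code: the names sregs, write_d, read_d, fmsr and d7_after_scratch_use are all made up for illustration. It assumes only the VFP layout in which dN overlays s(2N) (low word) and s(2N+1) (high word).

```ocaml
(* Toy model of the VFP register file: 32 single-precision registers
   s0..s31 held as raw 32-bit words. Double register dN overlays
   s(2N) (low word) and s(2N+1) (high word). All names are hypothetical. *)
let sregs : int32 array = Array.make 32 0l

(* Write a double to dN by splitting its IEEE-754 bits over two s registers. *)
let write_d n (x : float) =
  let bits = Int64.bits_of_float x in
  sregs.(2 * n) <- Int64.to_int32 bits;
  sregs.((2 * n) + 1) <- Int64.to_int32 (Int64.shift_right_logical bits 32)

(* Read dN back by recombining the two halves. The low half is masked
   because Int64.of_int32 sign-extends. *)
let read_d n =
  let lo = Int64.logand (Int64.of_int32 sregs.(2 * n)) 0xFFFF_FFFFL in
  let hi = Int64.of_int32 sregs.((2 * n) + 1) in
  Int64.float_of_bits (Int64.logor (Int64.shift_left hi 32) lo)

(* Model of "fmsr sN, rM": copy a 32-bit integer into sN. *)
let fmsr n (v : int32) = sregs.(n) <- v

(* Keep -999.0 in d7, then perform the scratch move that the
   float_of_int sequence does through s14. *)
let d7_after_scratch_use () =
  write_d 7 (-999.0);
  fmsr 14 24l;
  read_d 7

let () =
  let d7 = d7_after_scratch_use () in
  Printf.printf "d7 = %.17g, still -999.0? %b\n" d7 (d7 = -999.0)
```

In this model the write to s14 replaces the low 32 bits of d7, so the value read back is no longer exactly -999.0 and a later comparison against it can go the wrong way.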
I'm wondering if there's any wisdom on the list about this problem.
I don't see anything about it on Mantis.
For my own project, I think I can solve this simply by leaving d7 out of
the list of general registers in proc.ml. However, this might be a bit
drastic. Maybe there is a more subtle and wise solution.
You can read about OCaml4-on-iOS progress in my sporadic blog:
http://psellos.com/2012/07/2012.07.ocamlxarm-ocaml4-1.html
I can provide my OCaml code and the generated ARM code if it will help
show the problem. I haven't (yet) tried to whittle it down to a small
case.
Regards,
Jeffrey
* Re: [Caml-list] ARM code generator problem
2012-08-10 21:41 [Caml-list] ARM code generator problem Jeffrey Scofield
@ 2012-08-11 8:00 ` Benedikt Meurer
2012-08-11 8:13 ` Benedikt Meurer
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Benedikt Meurer @ 2012-08-11 8:00 UTC
To: Jeffrey Scofield; +Cc: Caml List
On Aug 10, 2012, at 23:41 , Jeffrey Scofield wrote:
> Greetings,
Hey Jeffrey,
> While working on porting OCaml 4.00.0 to iOS, I ran across
> what looks like a problem in the ARM code generation.
>
> If you look at asmcomp/arm/emit.mlp you see lots of places where
> s14 is used as a scratch register. The one that showed up in my
> code is the code sequence for float_of_int:
>
> | Lop(Ifloatofint) ->
> ` fmsr s14, {emit_reg i.arg.(0)}\n`;
> ` fsitod {emit_reg i.res.(0)}, s14\n`; 2
>
> Note that the emitted code always uses s14 (unconditionally). This
> suggests that s14 should be set aside as a scratch register.
>
> However, s14 is also an alias for the low order part of d7. If you look
> at asmcomp/arm/proc.ml you'll see that d7 is used as a general purpose
> register.
>
> The result is that a value in d7 is sometimes destroyed by a use
> of s14 as a scratch register. In my code it was a call to float_of_int
> that destroyed a float value being kept in d7.
If you look at destroyed_at_oper in asmcomp/arm/proc.ml, you'll see that d7 (s14+s15) is marked as destroyed for those operations where it is used as scratch register.
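As a rough illustration of the idea (not the actual source of proc.ml), a "destroyed at operation" table can be sketched like this. The types operation and reg and the helper may_hold_across are simplified stand-ins invented for this example; only the names destroyed_at_oper, Ifloatofint and Iintoffloat come from the compiler.

```ocaml
(* Simplified stand-ins for the compiler's types; the real definitions in
   asmcomp/ are richer and are not reproduced here. *)
type operation = Ifloatofint | Iintoffloat | Iaddf | Imove

(* Hypothetical register encoding: D n is dN, R n is rN. *)
type reg = D of int | R of int

(* Registers an operation overwrites as a side effect, beyond its result.
   The int/float conversions are assumed to go through s14, i.e. d7. *)
let destroyed_at_oper = function
  | Ifloatofint | Iintoffloat -> [ D 7 ]
  | Iaddf | Imove -> []

(* The allocator is expected to consult such a table so that no live value
   stays in a register across an operation that destroys it. *)
let may_hold_across op r = not (List.mem r (destroyed_at_oper op))

let () =
  Printf.printf "d7 may live across Ifloatofint: %b\n"
    (may_hold_across Ifloatofint (D 7))
```

If the table is honored, a value that must survive a float_of_int should never be assigned to d7 at that point.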
If possible, it would probably also make sense to merge some of the iOS related code into the upstream ARM backend, in case you are interested.
> Regards,
> Jeffrey
greets,
Benedikt
* Re: [Caml-list] ARM code generator problem
2012-08-11 8:00 ` Benedikt Meurer
@ 2012-08-11 8:13 ` Benedikt Meurer
2012-08-11 8:57 ` Jeffrey Scofield
2012-08-11 8:52 ` [Caml-list] " Jeffrey Scofield
2012-08-13 19:21 ` [Caml-list] " Jeffrey Scofield
2 siblings, 1 reply; 9+ messages in thread
From: Benedikt Meurer @ 2012-08-11 8:13 UTC
To: Jeffrey Scofield; +Cc: Caml List
On Aug 11, 2012, at 10:00 , Benedikt Meurer wrote:
> If possible, it would probably also make sense to merge some of the iOS related code into the upstream ARM backend, in case you are interested.
Looking through the arm-as-to-ios script you published, I could merge most of the label, symbol addressing and jump table related code. BTW your script isn't going to work for large compilation units, because the range of the LDR instruction is limited and you always allocate the pool at the end of the file.
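The range limit can be sketched as a small check. This assumes ARM (not Thumb) state, where a PC-relative LDR carries a 12-bit byte offset (at most 4095 in either direction) and the PC reads as the instruction address plus 8. The function names and the addresses below are illustrative only.

```ocaml
(* Assumption: ARM-state "LDR Rd, [pc, #imm]" with a 12-bit immediate. *)
let ldr_max_offset = 4095

(* In ARM state the PC reads as the instruction's own address plus 8. *)
let pc_of_insn insn_addr = insn_addr + 8

(* True when a literal at pool_addr is reachable from the LDR at insn_addr. *)
let ldr_reaches ~insn_addr ~pool_addr =
  abs (pool_addr - pc_of_insn insn_addr) <= ldr_max_offset

let () =
  (* One pool at the end of a 64 KiB unit: late loads reach it,
     loads near the start of the file do not. *)
  let pool_addr = 0x10000 in
  List.iter
    (fun insn_addr ->
      Printf.printf "ldr at 0x%05x reaches pool: %b\n" insn_addr
        (ldr_reaches ~insn_addr ~pool_addr))
    [ 0x0; 0xF000; 0xFF00 ]
```

With a single pool at the end of the file, any load more than about 4 KB before it is out of range, which is why larger units need pools emitted at intervals.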
greets,
Benedikt
* [Caml-list] ARM code generator problem
2012-08-11 8:00 ` Benedikt Meurer
2012-08-11 8:13 ` Benedikt Meurer
@ 2012-08-11 8:52 ` Jeffrey Scofield
2012-08-13 19:21 ` [Caml-list] " Jeffrey Scofield
2 siblings, 0 replies; 9+ messages in thread
From: Jeffrey Scofield @ 2012-08-11 8:52 UTC
To: Benedikt Meurer; +Cc: Jeffrey Scofield, Caml List
Benedikt,
> If you look at destroyed_at_oper in asmcomp/arm/proc.ml, you'll see that
> d7 (s14+s15) is marked as destroyed for those operations where it is
> used as scratch register.
I definitely see d7 being overwritten in the way I described, and I
don't think I've changed these parts of the code. Most of the work
was in reformatting the output for the iOS assembler. There are
some smallish changes to the linkage for calling functions like
sin() and cos().
I'll look to see how destroyed_at_oper is working, maybe it will
explain things.
I made a pretty small file (35 lines or so) that shows the problem.
Unfortunately, I don't have access to a Linux/ARM machine, so I can't
easily try it on an unmodified version of OCaml 4.00.0. If I still
think there's a problem after figuring out destroyed_at_oper, I'll send
you a description in private mail.
> If possible, it would probably also make sense to merge some of the iOS
> related code into the upstream ARM backend, in case you are interested.
I'd definitely be interested, once I get things working reasonably well.
Thanks for the help.
Jeffrey
* [Caml-list] ARM code generator problem
2012-08-11 8:13 ` Benedikt Meurer
@ 2012-08-11 8:57 ` Jeffrey Scofield
2012-08-11 9:48 ` [Caml-list] " Benedikt Meurer
0 siblings, 1 reply; 9+ messages in thread
From: Jeffrey Scofield @ 2012-08-11 8:57 UTC
To: Benedikt Meurer; +Cc: Jeffrey Scofield, Caml List
Benedikt,
> Looking through the arm-as-to-ios script you published, I could merge
> most of the label, symbol addressing and jump table related code. BTW
> your script isn't going to work for large compilation units, because
> the range of the LDR instruction is limited and you always allocate the
> pool at the end of the file.
Since I only use the script to process arm.S, I didn't work *too* hard
at making it work for everything. But I thought it might be useful
to other people as a starting point, or as a catalog of the changes
I had to make.
If you're not too put off by the ugliness of the compatibility changes,
I'd be very happy to merge the code.
Thanks for looking at my work.
Best regards,
Jeffrey
* [Caml-list] Re: ARM code generator problem
2012-08-11 8:57 ` Jeffrey Scofield
@ 2012-08-11 9:48 ` Benedikt Meurer
0 siblings, 0 replies; 9+ messages in thread
From: Benedikt Meurer @ 2012-08-11 9:48 UTC
To: Jeffrey Scofield; +Cc: Caml List
On Aug 11, 2012, at 10:57 , Jeffrey Scofield wrote:
> Benedikt,
Hey Jeffrey,
>> Looking through the arm-as-to-ios script you published, I could merge
>> most of the label, symbol addressing and jump table related code. BTW
>> your script isn't going to work for large compilation units, because
>> the range of the LDR instruction is limited and you always allocate the
>> pool at the end of the file.
>
> Since I only use the script to process arm.S, I didn't work *too* hard
> at making it work for everything. But I thought it might be useful
> to other people as a starting point, or as a catalog of the changes
> I had to make.
>
> If you're not too put off by the ugliness of the compatibility changes,
> I'd be very happy to merge the code.
I started work on merging your code, see the diff here:
https://github.com/bmeurer/ocaml-arm/compare/bm/ios
That handles most of the basic stuff. Now there are some open issues, i.e. what about .arch / .machine? Is that armv6 vs. armv7 thing an ABI difference?
You can install Debian armel within qemu to easily test the Linux ARM stuff. Preinstalled Debian/squeeze images are available from http://people.debian.org/~aurel32/qemu/armel/
> Best regards,
> Jeffrey
Benedikt
* [Caml-list] Re: ARM code generator problem
2012-08-11 8:00 ` Benedikt Meurer
2012-08-11 8:13 ` Benedikt Meurer
2012-08-11 8:52 ` [Caml-list] " Jeffrey Scofield
@ 2012-08-13 19:21 ` Jeffrey Scofield
2012-08-14 7:11 ` Benedikt Meurer
2 siblings, 1 reply; 9+ messages in thread
From: Jeffrey Scofield @ 2012-08-13 19:21 UTC
To: Benedikt Meurer; +Cc: Jeffrey Scofield, Caml List
OCamlers, Benedikt:
>> The result is that a value in d7 is sometimes destroyed by a use of s14
>> as a scratch register. In my code it was a call to float_of_int that
>> destroyed a float value being kept in d7.
>
> If you look at destroyed_at_oper in asmcomp/arm/proc.ml, you'll see that
> d7 (s14+s15) is marked as destroyed for those operations where it is
> used as scratch register.
I was able to reproduce this behavior with the stock OCaml 4.00.0 compiler,
so I really do think there's a problem.
I whittled my code down to just a few lines. Here it is:
    let rate_pos scounts : float =
        let m_MIN = -999.0
        in let max1s = Array.make 14 m_MIN
        in let max2s = Array.make_matrix 14 14 m_MIN
        in let try_build (k1: int) (m: float) : unit =
            let denom = 12
            in let try1b (sawk1, xct) k =
                let () =
                    if max2s.(k1).(k) > m then
                        let adjm = if m <= m_MIN then 0.0 else m
                        in let numer =
                            if k = k1 then 48
                            else if sawk1 then 36
                            else 24
                        in let f = float_of_int numer /. float_of_int denom
                        in let () =
                            if max1s.(k1) <= m_MIN then max1s.(k1) <- 0.0
                        in
                            max1s.(k1) <-
                                max1s.(k1) +. (max2s.(k1).(k) -. adjm) *. f
                in
                    if k = k1 then
                        (true, xct)
                    else
                        (sawk1, xct + scounts.(k))
            in
                ignore (List.fold_left try1b (false, 0) [])
        in let () = Array.iteri try_build max1s
        in
            0.0
(This is a heavily hacked up piece of an evaluation function for a card
game app.)
Here is my OCaml command line (running on Linux/ARM inside Qemu, as you
suggested--it works!):
$ ocamlopt -ffpu vfpv3 -c -S rate.ml
I'm using vfpv3 because that's what I use for my iOS port. The system
type is linux_eabihf, which is what you need to get vfpv3 support.
The section that seems to misbehave is these three lines:
    in let f = float_of_int numer /. float_of_int denom
    in let () =
        if max1s.(k1) <= m_MIN then max1s.(k1) <- 0.0
Here is the assembly code with added annotations:
    ldr     r12, [r2, #16]      @ r12 <- m_MIN block
    mov     r0, r7, asr #1
    ldr     r7, [r2, #20]
    movs    r6, #0xc            @ r6 <- denom
    fmsr    s14, r6
    fsitod  d10, s14            @ d10 <- float_of_int denom
    ldr     r6, [r7, #-4]
    fldd    d7, [r12, #0]       @ d7 <- m_MIN
    ldr     r12, [r2, #28]
    fmsr    s14, r0             @ *** d7 is destroyed here ***
    fsitod  d9, s14             @ d9 <- float_of_int numer
    cmp     r12, r6, lsr #10
    bcs     .L111
    add     r6, r7, r12, lsl #2
    fldd    d6, [r6, #-4]       @ d6 <- max1s.(k1)
    fdivd   d8, d9, d10
    fcmpd   d6, d7              @ *** This comparison fails ***
    fmstat
    bhi     .L104
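The same observation can be made mechanically. Below is a small, self-contained OCaml sketch of a backward liveness walk over a hand-written summary of the floating-point instructions above. It is not part of the compiler: the insn record, the clobbered_live function and the register sets given for each instruction are my own simplified reading of the listing, with the two-instruction float_of_int sequence folded into one entry that uses d7 as scratch.

```ocaml
(* Each entry summarises one step: registers read, registers written as the
   result, and registers overwritten as scratch. Names are plain strings. *)
type insn = {
  text : string;
  reads : string list;
  writes : string list;
  scratch : string list;
}

module S = Set.Make (String)

(* Walk the code backwards, tracking which registers are live after each
   instruction, and report every scratch register that is live at that
   point as an (instruction, register) pair. *)
let clobbered_live (code : insn list) : (string * string) list =
  let step i (live_after, found) =
    let hits = List.filter (fun r -> S.mem r live_after) i.scratch in
    let found = List.map (fun r -> (i.text, r)) hits @ found in
    let live_before =
      S.union (S.of_list i.reads) (S.diff live_after (S.of_list i.writes))
    in
    (live_before, found)
  in
  snd (List.fold_right step code (S.empty, []))

(* Hand-written summary of the floating-point part of the listing. *)
let listing =
  [ { text = "fmsr s14, r6; fsitod d10, s14";
      reads = [ "r6" ]; writes = [ "d10" ]; scratch = [ "d7" ] };
    { text = "fldd d7, [r12, #0]";
      reads = [ "r12" ]; writes = [ "d7" ]; scratch = [] };
    { text = "fmsr s14, r0; fsitod d9, s14";
      reads = [ "r0" ]; writes = [ "d9" ]; scratch = [ "d7" ] };
    { text = "fldd d6, [r6, #-4]";
      reads = [ "r6" ]; writes = [ "d6" ]; scratch = [] };
    { text = "fdivd d8, d9, d10";
      reads = [ "d9"; "d10" ]; writes = [ "d8" ]; scratch = [] };
    { text = "fcmpd d6, d7";
      reads = [ "d6"; "d7" ]; writes = []; scratch = [] } ]

let () =
  List.iter
    (fun (text, r) -> Printf.printf "%s clobbers live %s\n" text r)
    (clobbered_live listing)
```

Under these assumptions the first conversion is harmless, because d7 is reloaded afterwards, while the second one overwrites d7 while it is still needed by the fcmpd.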
I built the OCaml 4.00.0 compiler from sources inside Qemu. The
line for configure was just this:
$ ./configure --host armv5tejl-unknown-linux-gnueabihf
After that, I just built as usual.
If you agree that this is a problem, I can create a Mantis
bug report for it (if you like).
Best regards,
Jeffrey
* [Caml-list] Re: ARM code generator problem
2012-08-13 19:21 ` [Caml-list] " Jeffrey Scofield
@ 2012-08-14 7:11 ` Benedikt Meurer
2012-08-17 4:26 ` Jeffrey Scofield
0 siblings, 1 reply; 9+ messages in thread
From: Benedikt Meurer @ 2012-08-14 7:11 UTC
To: Jeffrey Scofield; +Cc: Benedikt Meurer, Caml List
On Aug 13, 2012, at 21:21 , Jeffrey Scofield wrote:
> OCamlers, Benedikt:
Hey Jeffrey,
>>> The result is that a value in d7 is sometimes destroyed by a use of s14
>>> as a scratch register. In my code it was a call to float_of_int that
>>> destroyed a float value being kept in d7.
>>
>> If you look at destroyed_at_oper in asmcomp/arm/proc.ml, you'll see that
>> d7 (s14+s15) is marked as destroyed for those operations where it is
>> used as scratch register.
>
> I was able to reproduce this behavior with the stock OCaml 4.00.0 compiler,
> so I really do think there's a problem.
>
> I whittled my code down to just a few lines. Here it is:
>
>     let rate_pos scounts : float =
>         let m_MIN = -999.0
>         in let max1s = Array.make 14 m_MIN
>         in let max2s = Array.make_matrix 14 14 m_MIN
>         in let try_build (k1: int) (m: float) : unit =
>             let denom = 12
>             in let try1b (sawk1, xct) k =
>                 let () =
>                     if max2s.(k1).(k) > m then
>                         let adjm = if m <= m_MIN then 0.0 else m
>                         in let numer =
>                             if k = k1 then 48
>                             else if sawk1 then 36
>                             else 24
>                         in let f = float_of_int numer /. float_of_int denom
>                         in let () =
>                             if max1s.(k1) <= m_MIN then max1s.(k1) <- 0.0
>                         in
>                             max1s.(k1) <-
>                                 max1s.(k1) +. (max2s.(k1).(k) -. adjm) *. f
>                 in
>                     if k = k1 then
>                         (true, xct)
>                     else
>                         (sawk1, xct + scounts.(k))
>             in
>                 ignore (List.fold_left try1b (false, 0) [])
>         in let () = Array.iteri try_build max1s
>         in
>             0.0
>
> (This is a heavily hacked up piece of an evaluation function for a card
> game app.)
>
> Here is my OCaml command line (running on Linux/ARM inside Qemu, as you
> suggested--it works!):
>
> $ ocamlopt -ffpu vfpv3 -c -S rate.ml
>
> I'm using vfpv3 because that's what I use for my iOS port. The system
> type is linux_eabihf, which is what you need to get vfpv3 support.
>
> The section that seems to misbehave is these three lines:
>
> in let f = float_of_int numer /. float_of_int denom
> in let () =
> if max1s.(k1) <= m_MIN then max1s.(k1) <- 0.0
>
> Here is the assembly code with added annotations:
>
>     ldr     r12, [r2, #16]      @ r12 <- m_MIN block
>     mov     r0, r7, asr #1
>     ldr     r7, [r2, #20]
>     movs    r6, #0xc            @ r6 <- denom
>     fmsr    s14, r6
>     fsitod  d10, s14            @ d10 <- float_of_int denom
>     ldr     r6, [r7, #-4]
>     fldd    d7, [r12, #0]       @ d7 <- m_MIN
>     ldr     r12, [r2, #28]
>     fmsr    s14, r0             @ *** d7 is destroyed here ***
>     fsitod  d9, s14             @ d9 <- float_of_int numer
>     cmp     r12, r6, lsr #10
>     bcs     .L111
>     add     r6, r7, r12, lsl #2
>     fldd    d6, [r6, #-4]       @ d6 <- max1s.(k1)
>     fdivd   d8, d9, d10
>     fcmpd   d6, d7              @ *** This comparison fails ***
>     fmstat
>     bhi     .L104
>
> I built the OCaml 4.00.0 compiler from sources inside Qemu. The
> line for configure was just this:
>
> $ ./configure --host armv5tejl-unknown-linux-gnueabihf
>
> After that, I just built as usual.
>
> If you agree that this is a problem, I can create a Mantis
> bug report for it (if you like).
Jep, that's a bug indeed. Somehow ocamlopt seems to believe that the Ifloatofint instruction preserves d7 although it is marked as destroyed for this operation.
> Best regards,
> Jeffrey
greets,
Benedikt
* [Caml-list] Re: ARM code generator problem
2012-08-14 7:11 ` Benedikt Meurer
@ 2012-08-17 4:26 ` Jeffrey Scofield
0 siblings, 0 replies; 9+ messages in thread
From: Jeffrey Scofield @ 2012-08-17 4:26 UTC
To: Benedikt Meurer; +Cc: Jeffrey Scofield, Caml List
Benedikt and OCamlers,
On Aug 14, 2012, at 12:11 AM, Benedikt Meurer wrote:
> Jep, that's a bug indeed. Somehow ocamlopt seems to believe that the
> Ifloatofint instruction preserves d7 although it is marked as destroyed
> for this operation.
I created a Mantis issue for this problem, 5731:
http://caml.inria.fr/mantis/view.php?id=5731
Regards,
Jeffrey