[Caml-list] Compiler bug?

Mailing list for all users of the OCaml language and system.
 help / color / mirror / Atom feed

* [Caml-list] Compiler bug?
@ 2012-08-06 10:04 Dmitry Bely
  2012-08-06 10:11 ` Alain Frisch
  0 siblings, 1 reply; 30+ messages in thread
From: Dmitry Bely @ 2012-08-06 10:04 UTC (permalink / raw)
  To: caml-list

Trying to catch some very rare floating-point related bug I've
realized that I probably don't understand some basic things about
OCaml execution model. Let's consider the simplest function:

let add1 x = x +. 1.

For x86 it is compiled as

	.CODE
	ALIGN	4
	PUBLIC	_camlTest__add1_1008
_camlTest__add1_1008:
	sub	esp, 8
L100:
	mov	ebx, eax
L101:	mov	eax, _caml_young_ptr
	sub	eax, 12
	mov	_caml_young_ptr, eax
	cmp	eax, _caml_young_limit
	jb	L102
	lea	eax, [eax+4]
	mov	DWORD PTR [eax-4],2301
	fld1
	fadd	REAL8 PTR [ebx]
	fstp	REAL8 PTR [eax]
	add    esp, 8
	ret
L102:	call	_caml_call_gc
L103:	jmp	L101

X parameter is passed as a pointer to float value ([eax] which is
immediately saved to [ebx]). But If GC happens (L102), float value can
be moved around the OCaml heap and [ebx] become invalid, right? So the
generated code is just wrong. Am I missing something?

- Dmitry Bely

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] Compiler bug?
  2012-08-06 10:04 [Caml-list] Compiler bug? Dmitry Bely
@ 2012-08-06 10:11 ` Alain Frisch
  2012-08-06 10:20   ` Dmitry Bely
  0 siblings, 1 reply; 30+ messages in thread
From: Alain Frisch @ 2012-08-06 10:11 UTC (permalink / raw)
  To: Dmitry Bely; +Cc: caml-list

On 08/06/2012 12:04 PM, Dmitry Bely wrote:
> X parameter is passed as a pointer to float value ([eax] which is
> immediately saved to [ebx]). But If GC happens (L102), float value can
> be moved around the OCaml heap and [ebx] become invalid, right? So the
> generated code is just wrong. Am I missing something?

The GC knows about the location of OCaml values, including those stored 
in registers when it is called (the compiler generate tables to retrieve 
this information from the GC call site).  These values are used as roots 
for the GC, and they are updated when the value is moved.


Alain

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] Compiler bug?
  2012-08-06 10:11 ` Alain Frisch
@ 2012-08-06 10:20   ` Dmitry Bely
  2012-08-06 10:34     ` Alain Frisch
  0 siblings, 1 reply; 30+ messages in thread
From: Dmitry Bely @ 2012-08-06 10:20 UTC (permalink / raw)
  To: caml-list

On Mon, Aug 6, 2012 at 2:11 PM, Alain Frisch <alain@frisch.fr> wrote:
> On 08/06/2012 12:04 PM, Dmitry Bely wrote:
>>
>> X parameter is passed as a pointer to float value ([eax] which is
>> immediately saved to [ebx]). But If GC happens (L102), float value can
>> be moved around the OCaml heap and [ebx] become invalid, right? So the
>> generated code is just wrong. Am I missing something?
>
>
> The GC knows about the location of OCaml values, including those stored in
> registers when it is called (the compiler generate tables to retrieve this
> information from the GC call site).  These values are used as roots for the
> GC, and they are updated when the value is moved.

I always thought that local roots are memory-based. How GC knows that
in this case [ebx] should be updated? (remember, the parameter was
passed in [eax]) Could you point me to the relevant part of Ocaml
compiler sources?

- Dmitry Bely

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] Compiler bug?
  2012-08-06 10:20   ` Dmitry Bely
@ 2012-08-06 10:34     ` Alain Frisch
  2012-08-06 11:03       ` Dmitry Bely
  0 siblings, 1 reply; 30+ messages in thread
From: Alain Frisch @ 2012-08-06 10:34 UTC (permalink / raw)
  To: Dmitry Bely; +Cc: caml-list

On 08/06/2012 12:20 PM, Dmitry Bely wrote:
> I always thought that local roots are memory-based. How GC knows that
> in this case [ebx] should be updated? (remember, the parameter was
> passed in [eax]) Could you point me to the relevant part of Ocaml
> compiler sources?

The compiler generates 'frame descriptors', which associate to each 
possible GC call site the set of registers and stack slots which hold 
values.

Look for caml_frame_descriptors.  The logic to scan frame descriptors is 
in asmrun/roots.c.  The implementation of caml_call_gc (e.g. in amd64.S) 
saves/restored machine registers to/from caml_gc_regs.

Alain

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] Compiler bug?
  2012-08-06 10:34     ` Alain Frisch
@ 2012-08-06 11:03       ` Dmitry Bely
  2012-08-06 11:32         ` Alain Frisch
  0 siblings, 1 reply; 30+ messages in thread
From: Dmitry Bely @ 2012-08-06 11:03 UTC (permalink / raw)
  To: caml-list

On Mon, Aug 6, 2012 at 2:34 PM, Alain Frisch <alain@frisch.fr> wrote:
> On 08/06/2012 12:20 PM, Dmitry Bely wrote:
>>
>> I always thought that local roots are memory-based. How GC knows that
>> in this case [ebx] should be updated? (remember, the parameter was
>> passed in [eax]) Could you point me to the relevant part of Ocaml
>> compiler sources?
>
>
> The compiler generates 'frame descriptors', which associate to each possible
> GC call site the set of registers and stack slots which hold values.
>
> Look for caml_frame_descriptors.  The logic to scan frame descriptors is in
> asmrun/roots.c.  The implementation of caml_call_gc (e.g. in amd64.S)
> saves/restored machine registers to/from caml_gc_regs.

Aha. _caml_call_gc is

_caml_call_gc:
    ; Record lowest stack address and return address
        mov	eax, [esp]
        mov     _caml_last_return_address, eax
        lea     eax, [esp+4]
        mov     _caml_bottom_of_stack, eax
    ; Save all regs used by the code generator
L105:   push    ebp
        push    edi
        push    esi
        push    edx
        push    ecx
        push    ebx
        push    eax
        mov     _caml_gc_regs, esp
    ; Call the garbage collector
        call	_caml_garbage_collection
    ; Restore all regs used by the code generator
	pop     eax
        pop     ebx
        pop     ecx
        pop     edx
        pop     esi
        pop     edi
        pop     ebp
    ; Return to caller
        ret

it preserves registers and so it cannot update them, like it does with
memory-based local roots. The initial question if [ebx] contents is
invalidated by GC remains open, at least for me.

- Dmitry Bely

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] Compiler bug?
  2012-08-06 11:03       ` Dmitry Bely
@ 2012-08-06 11:32         ` Alain Frisch
  2012-08-06 12:16           ` Dmitry Bely
                             ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Alain Frisch @ 2012-08-06 11:32 UTC (permalink / raw)
  To: Dmitry Bely; +Cc: caml-list

On 08/06/2012 01:03 PM, Dmitry Bely wrote:
> _caml_call_gc:
>      ; Record lowest stack address and return address
>          mov	eax, [esp]
>          mov     _caml_last_return_address, eax
>          lea     eax, [esp+4]
>          mov     _caml_bottom_of_stack, eax
>      ; Save all regs used by the code generator
> L105:   push    ebp
>          push    edi
>          push    esi
>          push    edx
>          push    ecx
>          push    ebx
>          push    eax
>          mov     _caml_gc_regs, esp
>      ; Call the garbage collector
>          call	_caml_garbage_collection
>      ; Restore all regs used by the code generator
> 	pop     eax
>          pop     ebx
>          pop     ecx
>          pop     edx
>          pop     esi
>          pop     edi
>          pop     ebp
>      ; Return to caller
>          ret
>
> it preserves registers and so it cannot update them

No, this code does not preserves registers.  It stores them on the 
stack, pass a pointer to them to the C code (in caml_gc_regs), and then 
restore their values from the stack.  In between, the C code has updated 
the values on the stack, so the restored values are not the saved ones.

-- Alain

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] Compiler bug?
  2012-08-06 11:32         ` Alain Frisch
@ 2012-08-06 12:16           ` Dmitry Bely
  2012-08-07  1:35           ` Cedric Cellier
  2012-08-08 16:03           ` Dmitry Bely
  2 siblings, 0 replies; 30+ messages in thread
From: Dmitry Bely @ 2012-08-06 12:16 UTC (permalink / raw)
  To: caml-list

On Mon, Aug 6, 2012 at 3:32 PM, Alain Frisch <alain@frisch.fr> wrote:
> On 08/06/2012 01:03 PM, Dmitry Bely wrote:
>>
>> _caml_call_gc:
>>      ; Record lowest stack address and return address
>>          mov    eax, [esp]
>>          mov     _caml_last_return_address, eax
>>          lea     eax, [esp+4]
>>          mov     _caml_bottom_of_stack, eax
>>      ; Save all regs used by the code generator
>> L105:   push    ebp
>>          push    edi
>>          push    esi
>>          push    edx
>>          push    ecx
>>          push    ebx
>>          push    eax
>>          mov     _caml_gc_regs, esp
>>      ; Call the garbage collector
>>          call   _caml_garbage_collection
>>      ; Restore all regs used by the code generator
>>         pop     eax
>>          pop     ebx
>>          pop     ecx
>>          pop     edx
>>          pop     esi
>>          pop     edi
>>          pop     ebp
>>      ; Return to caller
>>          ret
>>
>> it preserves registers and so it cannot update them
>
>
> No, this code does not preserves registers.  It stores them on the stack,
> pass a pointer to them to the C code (in caml_gc_regs), and then restore
> their values from the stack.  In between, the C code has updated the values
> on the stack, so the restored values are not the saved ones.

OK, you are right. Now I think I understand better how it works. So
the problem is elsewhere (as expected). Thanks for the help!

- Dmitry Bely

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] Compiler bug?
  2012-08-06 11:32         ` Alain Frisch
  2012-08-06 12:16           ` Dmitry Bely
@ 2012-08-07  1:35           ` Cedric Cellier
  2012-08-08 16:03           ` Dmitry Bely
  2 siblings, 0 replies; 30+ messages in thread
From: Cedric Cellier @ 2012-08-07  1:35 UTC (permalink / raw)
  To: caml-list


----- Message d'origine -----
> > ; Save all regs used by the code generator
> No, this code does not preserves registers.

So you suggest this comment is a little misleading? :p


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] Compiler bug?
  2012-08-06 11:32         ` Alain Frisch
  2012-08-06 12:16           ` Dmitry Bely
  2012-08-07  1:35           ` Cedric Cellier
@ 2012-08-08 16:03           ` Dmitry Bely
  2012-08-08 18:03             ` Alain Frisch
  2 siblings, 1 reply; 30+ messages in thread
From: Dmitry Bely @ 2012-08-08 16:03 UTC (permalink / raw)
  To: caml-list

On Mon, Aug 6, 2012 at 3:32 PM, Alain Frisch <alain@frisch.fr> wrote:
> On 08/06/2012 01:03 PM, Dmitry Bely wrote:
>>
>> _caml_call_gc:
>>      ; Record lowest stack address and return address
>>          mov    eax, [esp]
>>          mov     _caml_last_return_address, eax
>>          lea     eax, [esp+4]
>>          mov     _caml_bottom_of_stack, eax
>>      ; Save all regs used by the code generator
>> L105:   push    ebp
>>          push    edi
>>          push    esi
>>          push    edx
>>          push    ecx
>>          push    ebx
>>          push    eax
>>          mov     _caml_gc_regs, esp
>>      ; Call the garbage collector
>>          call   _caml_garbage_collection
>>      ; Restore all regs used by the code generator
>>         pop     eax
>>          pop     ebx
>>          pop     ecx
>>          pop     edx
>>          pop     esi
>>          pop     edi
>>          pop     ebp
>>      ; Return to caller
>>          ret
>>
>> it preserves registers and so it cannot update them
>
>
> No, this code does not preserves registers.  It stores them on the stack,
> pass a pointer to them to the C code (in caml_gc_regs), and then restore
> their values from the stack.  In between, the C code has updated the values
> on the stack, so the restored values are not the saved ones.

One more question if you don't mind. Why FPU registers are not saved
before caml_garbage_collection? What guarantees that they are not
destroyed inside?

- Dmitry Bely

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] Compiler bug?
  2012-08-08 16:03           ` Dmitry Bely
@ 2012-08-08 18:03             ` Alain Frisch
  2012-08-08 18:22               ` Jesper Louis Andersen
  0 siblings, 1 reply; 30+ messages in thread
From: Alain Frisch @ 2012-08-08 18:03 UTC (permalink / raw)
  To: Dmitry Bely; +Cc: caml-list

On 08/08/2012 06:03 PM, Dmitry Bely wrote:
> One more question if you don't mind. Why FPU registers are not saved
> before caml_garbage_collection? What guarantees that they are not
> destroyed inside?

I'm not sure, someone else would have to reply!  I guess that these 
registers are supposed to be preserved by the callee in the x64 ABI (and 
obviously, they don't hold pointers to OCaml values, so they don't have 
to be tracked by the GC).

-- Alain

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] Compiler bug?
  2012-08-08 18:03             ` Alain Frisch
@ 2012-08-08 18:22               ` Jesper Louis Andersen
  2012-08-08 18:40                 ` Dmitry Bely
                                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Jesper Louis Andersen @ 2012-08-08 18:22 UTC (permalink / raw)
  To: Alain Frisch; +Cc: Dmitry Bely, caml-list

On Wed, Aug 8, 2012 at 8:03 PM, Alain Frisch <alain@frisch.fr> wrote:

> I'm not sure, someone else would have to reply!  I guess that these
> registers are supposed to be preserved by the callee in the x64 ABI (and
> obviously, they don't hold pointers to OCaml values, so they don't have to
> be tracked by the GC).

Also, the GC itself will not be using Floating Point code at all, so
we can save a lot by not saving/restoring these values on the stack.
It is akin to what the Linux kernel does: trap on the first FP
instruction and only then do work to save/restore the FP context - but
here we know a priori we never will access FP. By doing so we can cut
the time to the GC call down - and reap the benefits by having a
faster GC cycle.

-- 
J.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] Compiler bug?
  2012-08-08 18:22               ` Jesper Louis Andersen
@ 2012-08-08 18:40                 ` Dmitry Bely
  2012-08-08 19:29                   ` Fabrice Le Fessant
  2012-08-08 23:34                 ` Anil Madhavapeddy
  2012-08-09  0:53                 ` Francois Berenger
  2 siblings, 1 reply; 30+ messages in thread
From: Dmitry Bely @ 2012-08-08 18:40 UTC (permalink / raw)
  To: caml-list

On Wed, Aug 8, 2012 at 10:22 PM, Jesper Louis Andersen
<jesper.louis.andersen@gmail.com> wrote:
> On Wed, Aug 8, 2012 at 8:03 PM, Alain Frisch <alain@frisch.fr> wrote:
>
>> I'm not sure, someone else would have to reply!  I guess that these
>> registers are supposed to be preserved by the callee in the x64 ABI (and
>> obviously, they don't hold pointers to OCaml values, so they don't have to
>> be tracked by the GC).

Yes, they should not be tracked by GC, but they may hold some FP
values (can they across GC call?). Actually for x64 FPU registers are
saved; see amd64.S/amd64nt.asm. But not for x86. I'm curious why.

> Also, the GC itself will not be using Floating Point code at all, so
> we can save a lot by not saving/restoring these values on the stack.

This is not entirely true; GC also triggers pending signals processing
that can modify FPU state and calls C library fuctions (e.g. malloc)
that can do anything inside.

> It is akin to what the Linux kernel does: trap on the first FP
> instruction and only then do work to save/restore the FP context - but
> here we know a priori we never will access FP. By doing so we can cut
> the time to the GC call down - and reap the benefits by having a
> faster GC cycle.

- Dmitry Bely

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] Compiler bug?
  2012-08-08 18:40                 ` Dmitry Bely
@ 2012-08-08 19:29                   ` Fabrice Le Fessant
  0 siblings, 0 replies; 30+ messages in thread
From: Fabrice Le Fessant @ 2012-08-08 19:29 UTC (permalink / raw)
  To: Dmitry Bely; +Cc: caml-list

On Wed, Aug 8, 2012 at 8:40 PM, Dmitry Bely <dmitry.bely@gmail.com> wrote:
>>> I'm not sure, someone else would have to reply!  I guess that these
>>> registers are supposed to be preserved by the callee in the x64 ABI (and
>>> obviously, they don't hold pointers to OCaml values, so they don't have to
>>> be tracked by the GC).
>
> Yes, they should not be tracked by GC, but they may hold some FP
> values (can they across GC call?). Actually for x64 FPU registers are
> saved; see amd64.S/amd64nt.asm. But not for x86. I'm curious why.

MMX registers are not used by OCaml on x86 (except in the SSE2
branch), the backend still uses the x87 stack. I have no x86 computer
at hand to check, but I suppose that all temporary floating point
values are stored on the C stack by the generated code between
floating point computations, for example when calling C functions
(including the garbage collector).

>> Also, the GC itself will not be using Floating Point code at all, so
>> we can save a lot by not saving/restoring these values on the stack.
>
> This is not entirely true; GC also triggers pending signals processing
> that can modify FPU state and calls C library fuctions (e.g. malloc)
> that can do anything inside.

The kernel is supposed to save all the context before calling signal
handlers, otherwise, you would have to disable signals handling during
all floating point computations, no ?

--Fabrice

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] Compiler bug?
  2012-08-08 18:22               ` Jesper Louis Andersen
  2012-08-08 18:40                 ` Dmitry Bely
@ 2012-08-08 23:34                 ` Anil Madhavapeddy
  2012-08-09  0:53                 ` Francois Berenger
  2 siblings, 0 replies; 30+ messages in thread
From: Anil Madhavapeddy @ 2012-08-08 23:34 UTC (permalink / raw)
  To: Jesper Louis Andersen
  Cc: Alain Frisch, Dmitry Bely, caml-list@inria.fr List, PALI Gabor Janos

On 8 Aug 2012, at 11:22, Jesper Louis Andersen <jesper.louis.andersen@gmail.com> wrote:

> On Wed, Aug 8, 2012 at 8:03 PM, Alain Frisch <alain@frisch.fr> wrote:
> 
>> I'm not sure, someone else would have to reply!  I guess that these
>> registers are supposed to be preserved by the callee in the x64 ABI (and
>> obviously, they don't hold pointers to OCaml values, so they don't have to
>> be tracked by the GC).
> 
> Also, the GC itself will not be using Floating Point code at all, so
> we can save a lot by not saving/restoring these values on the stack.
> It is akin to what the Linux kernel does: trap on the first FP
> instruction and only then do work to save/restore the FP context - but
> here we know a priori we never will access FP. By doing so we can cut
> the time to the GC call down - and reap the benefits by having a
> faster GC cycle.

The GC does use floating point code in places. See byterun/major_gc.c
and the caml_major_collection_slice() function.  Gabor Pali is hacking on
a FreeBSD kernel module port of the OCaml runtime [1] and disabled all the
FP use to run without tripping over the trap checks in kFreeBSD [2].

[1] https://github.com/pgj/mirage-kfreebsd
[2] https://github.com/pgj/mirage-kfreebsd/commit/4c1859b88d6da540e7246e493809ff6f38ea344e

-anil

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] Compiler bug?
  2012-08-08 18:22               ` Jesper Louis Andersen
  2012-08-08 18:40                 ` Dmitry Bely
  2012-08-08 23:34                 ` Anil Madhavapeddy
@ 2012-08-09  0:53                 ` Francois Berenger
  2 siblings, 0 replies; 30+ messages in thread
From: Francois Berenger @ 2012-08-09  0:53 UTC (permalink / raw)
  To: caml-list

On 08/09/2012 03:22 AM, Jesper Louis Andersen wrote:
> On Wed, Aug 8, 2012 at 8:03 PM, Alain Frisch <alain@frisch.fr> wrote:
>
>> I'm not sure, someone else would have to reply!  I guess that these
>> registers are supposed to be preserved by the callee in the x64 ABI (and
>> obviously, they don't hold pointers to OCaml values, so they don't have to
>> be tracked by the GC).
>
> Also, the GC itself will not be using Floating Point code at all,

Is it possible to enforce that no Floating Point code
is used in some part of a project (let's say a module or a library)?

I have no use for it, I'm just curious.

 > so
> we can save a lot by not saving/restoring these values on the stack.
> It is akin to what the Linux kernel does: trap on the first FP
> instruction and only then do work to save/restore the FP context - but
> here we know a priori we never will access FP. By doing so we can cut
> the time to the GC call down - and reap the benefits by having a
> faster GC cycle.
>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* compiler bug?
@ 2006-05-17 23:14 Dan Koppel
  2006-05-17 23:33 ` [Caml-list] " John Carr
  2006-05-18 17:15 ` Xavier Leroy
  0 siblings, 2 replies; 30+ messages in thread
From: Dan Koppel @ 2006-05-17 23:14 UTC (permalink / raw)
  To: caml-list

[-- Attachment #1: Type: text/plain, Size: 1470 bytes --]

Hello Caml Group,
  I would like to report what I think might be a bug in the Ocaml compiler.  But first I wanted to run this by this group in case there's something I'm missing.  I have some very simple code that consists of 2 nested loops.  Inside the inner loop, is a simple statement.  Furthermore, the inner loop is not "tight".  Ie. the number of iterations within the inner loop is very large and the number of iterations of the outer loop is very small.  I then manually time this.  I then change the code by inserting a simple function call between the inner and outer loops.  This should have virtually no effect whatsoever.  However, when I time this, I get exactly twice the time.  This is somewhat inexplicable.  I tried tinkering with the "-inline" option for ocamlopt but this had no effect.  Below is the actual code (main.ml):

  let main () =
    let dummy1 = ref 0 in
  let dummy2 = ref 0.0 in
    for i = 1 to 4 do
    for j = 1 to 1000000000 do
      dummy1 := !dummy1 + 1;
      dummy1 := !dummy1 - 1
    done;
    dummy2 := Unix.gettimeofday ()
  done

  let _ = main ()

  I compile as follows: ocamlopt unix.cmxa main.ml
and run: ./a.out

  Is this in fact a bug of the ocamlopt compiler?  Or is there some way currently to make this effect disappear?

  Thanks for any help!
Sincerely,
  Dan Koppel

---------------------------------
Sneak preview the  all-new Yahoo.com. It's not radically different. Just radically better. 

[-- Attachment #2: Type: text/html, Size: 1907 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] compiler bug?
  2006-05-17 23:14 compiler bug? Dan Koppel
@ 2006-05-17 23:33 ` John Carr
  2006-05-18 17:15 ` Xavier Leroy
  1 sibling, 0 replies; 30+ messages in thread
From: John Carr @ 2006-05-17 23:33 UTC (permalink / raw)
  To: Dan Koppel; +Cc: caml-list

On SPARC the presence of the function call in the outer loop causes
the code generated for the inner loop to change so the dummy1 variable
is stored on the stack instead of in a register.  Each loop iteration
loads dummy1, modifies it, and stores it back onto the stack.
The store-load hazard, loading a value that is in the store buffer,
adds a large delay.  The loop runs in half the time if I comment out
either the store or the load in the assembly.

If the inner loop did more computation the effect would be much less.

This is surprising but not strictly a bug.  Xavier Leroy has posted
about similar minor changes causing the compiler to box or unbox a
floating point value with major changes in performance.

>   I would like to report what I think might be a bug in the Ocaml compiler.  But first I wanted to run this by this group in case there's something I'm missing.  I have some very simple code that consists of 2 nested loops.  Inside the inner loop, is a simple statement.  Furthermore, the inner loop is not "tight".  Ie. the number of iterations within the inner loop is very large and the number of iterations of the outer loop is very small.  I then manually time this.  I then change the code by inserting a simple function call between the inner and outer loops.  This should have virtually no effect whatsoever.  However, when I time this, I get exactly twice the time.  This is somewhat inexplicable.  I tried tinkering with the "-inline" option for ocamlopt but this had no effect.  Below is the actual code (main.ml):
>    
>   let main () =
>     let dummy1 = ref 0 in
>   let dummy2 = ref 0.0 in
>     for i = 1 to 4 do
>     for j = 1 to 1000000000 do
>       dummy1 := !dummy1 + 1;
>       dummy1 := !dummy1 - 1
>     done;
>     dummy2 := Unix.gettimeofday ()
>   done
>    
>   let _ = main ()
>    
>   I compile as follows: ocamlopt unix.cmxa main.ml
> and run: ./a.out
>    
>   Is this in fact a bug of the ocamlopt compiler?  Or is there some way currently to make this effect disappear?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] compiler bug?
  2006-05-17 23:14 compiler bug? Dan Koppel
  2006-05-17 23:33 ` [Caml-list] " John Carr
@ 2006-05-18 17:15 ` Xavier Leroy
  2006-05-18 17:34   ` Jacques Carette
  1 sibling, 1 reply; 30+ messages in thread
From: Xavier Leroy @ 2006-05-18 17:15 UTC (permalink / raw)
  To: Dan Koppel; +Cc: caml-list

>   I would like to report what I think might be a bug in the Ocaml
> compiler.  But first I wanted to run this by this group in case there's
> something I'm missing.  I have some very simple code that consists of 2
> nested loops.  Inside the inner loop, is a simple statement.
> Furthermore, the inner loop is not "tight".  Ie. the number of
> iterations within the inner loop is very large and the number of
> iterations of the outer loop is very small.  I then manually time this.
> I then change the code by inserting a simple function call between the
> inner and outer loops.  This should have virtually no effect
> whatsoever.  However, when I time this, I get exactly twice the time.
> This is somewhat inexplicable.

As John Carr mentioned, the function call forces the variables that
are live across the call (i.e. dummy1) to be spilled, causing one additional
memory read and one additional write at each iteration of the inner
loop.  Since your inner loop is approximately 4 integer instructions,
the additional memory traffic can easily halve performance.

I grant you that spill code generation could be more clever here and
spill/reload around the whole inner loop.  But, no, it's not a bug:
Ocaml has a bug when the generated code crashes or produces incorrect
results.

> I learned in basic compiler theory that when a function is called, you
> save all the registers before entering the function.

That's an over-simplification.  First, many compilers designate a
number of registers as "callee-save", meaning that all functions must
preserve their values.  These would be preserved across a call.
It happens that ocamlopt does not implement callee-save registers, as
they make garbage collection (more exactly, stack traversal) and
exception handling more expensive.  But even when a function call
destroys all registers, as with ocamlopt, there are many possible
placements for spills (saving a register in the stack) and reloads (of
a register from the stack), and the placement you suggest (only around
calls) is rarely optimal.  Well, in your example it is :-)

Generation of spill code is a dark corner of compiler construction.
As with many other compilation problems, there are a bunch of
NP-completeness results.  Unlike many other compilation problems, I'm
not aware of any published heuristics that works well in most cases.
Well, there is the paper by George and Appel where they use an ILP
solver (integer linear programming) to produce optimal spills, but
that's kind of expensive...

- Xavier Leroy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] compiler bug?
  2006-05-18 17:15 ` Xavier Leroy
@ 2006-05-18 17:34   ` Jacques Carette
  2006-05-18 17:46     ` Xavier Leroy
  2006-05-18 18:19     ` skaller
  0 siblings, 2 replies; 30+ messages in thread
From: Jacques Carette @ 2006-05-18 17:34 UTC (permalink / raw)
  To: caml-list

Xavier Leroy wrote:

>Generation of spill code is a dark corner of compiler construction.
>As with many other compilation problems, there are a bunch of
>NP-completeness results.  Unlike many other compilation problems, I'm
>not aware of any published heuristics that works well in most cases.
>Well, there is the paper by George and Appel where they use an ILP
>solver (integer linear programming) to produce optimal spills, but
>that's kind of expensive...
>  
>

It is my impression that users of compilers are "ready" for the 
following situation:
1) an optimizing compiler (like ocamlopt!) that produces good code 
efficiently
2) a super-optimizing compiler that produces fantastic code, at whatever 
cost.

Such a compiler would probably rapidly find a niche of fervent users. 

Jacques


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] compiler bug?
  2006-05-18 17:34   ` Jacques Carette
@ 2006-05-18 17:46     ` Xavier Leroy
  2006-05-18 19:31       ` Jacques Carette
  2006-05-18 18:19     ` skaller
  1 sibling, 1 reply; 30+ messages in thread
From: Xavier Leroy @ 2006-05-18 17:46 UTC (permalink / raw)
  To: Jacques Carette; +Cc: caml-list

>> Well, there is the paper by George and Appel where they use an ILP
>> solver (integer linear programming) to produce optimal spills, but
>> that's kind of expensive...
>
> It is my impression that users of compilers are "ready" for the
> following situation:
> 1) an optimizing compiler (like ocamlopt!) that produces good code
> efficiently
> 2) a super-optimizing compiler that produces fantastic code, at whatever
> cost.

Clearly, you're not the guy who would have to support both compilers :-)
More seriously, I agree that optional "super-optimization" passes
could be useful in some cases.

Actually, George and Appel found that compilation times with their
approach were almost reasonable (e.g. a few minutes instead of a few
seconds for a standard compiler), but they had to use a commercial ILP
solver.  If only there were *really good* ILP and SAT solvers under free
licenses...

- Xavier Leroy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] compiler bug?
  2006-05-18 17:46     ` Xavier Leroy
@ 2006-05-18 19:31       ` Jacques Carette
  2006-05-18 20:07         ` David Brown
  0 siblings, 1 reply; 30+ messages in thread
From: Jacques Carette @ 2006-05-18 19:31 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: caml-list

Xavier Leroy wrote:

>Clearly, you're not the guy who would have to support both compilers :-)
>  
>
Clearly :-).  [Although I've done my time maintaining ancient software 
used by ~2 million users, so you only get so much sympathy from me ;-) ]

>Actually, George and Appel found that compilation times with their
>approach were almost reasonable (e.g. a few minutes instead of a few
>seconds for a standard compiler), but they had to use a commercial ILP
>solver.  If only there were *really good* ILP and SAT solvers under free
>licenses...
>  
>
In Computer Algebra, people use Groebner bases all the time.  They have 
doubly-exponential worst-case complexity -- but seem to work rather well 
in practice.  So I have stopped paying attention to worst-case; average 
case, when available, does matter a lot more.

For ILP, I have found
http://www-unix.mcs.anl.gov/otc/Guide/faq/linear-programming-faq.html#Q2
to be quite informative about current sources of "free" ILP solvers.  Of 
particular interest are:
GLPK: http://www.gnu.org/software/glpk/glpk.html
lp_solve: http://groups.yahoo.com/group/lp_solve/ 

For SAT, things are weirder.  Of course there is
http://www.satlib.org/
as well as SVC
http://chicory.stanford.edu/SVC/
and CVC Lite
http://www.cs.nyu.edu/acsys/cvcl/
(these last 2 for SMT rather than pure SAT) which are under compatible 
licenses.  But at
http://www.qbflib.org/
and
http://www.satlive.org/
there are a number of additional candidates.

Of course, getting an agreement with SRI to use Yices 
(http://fm.csl.sri.com/yices/) would go a long way towards satisfying 
the "really good" requirement...

Jacques

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] compiler bug?
  2006-05-18 19:31       ` Jacques Carette
@ 2006-05-18 20:07         ` David Brown
  2006-05-18 20:15           ` Jacques Carette
  2006-05-18 20:20           ` Alain Frisch
  0 siblings, 2 replies; 30+ messages in thread
From: David Brown @ 2006-05-18 20:07 UTC (permalink / raw)
  To: Jacques Carette; +Cc: Xavier Leroy, caml-list

On Thu, May 18, 2006 at 03:31:26PM -0400, Jacques Carette wrote:

> In Computer Algebra, people use Groebner bases all the time.  They have 
> doubly-exponential worst-case complexity -- but seem to work rather well 
> in practice.  So I have stopped paying attention to worst-case; average 
> case, when available, does matter a lot more.

Except when someone made a decision like this, in say a revision control
system, and suddenly you discover that you've provoked a worst case
scenario, and it suddenly takes hundreds of cpu hours to check in a file
rather than a few seconds.

For something like a compiler, worse case behavior is very important.  It
is not generally acceptable for a build to just hang in the compiler.

Dave

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] compiler bug?
  2006-05-18 20:07         ` David Brown
@ 2006-05-18 20:15           ` Jacques Carette
  2006-05-18 20:20           ` Alain Frisch
  1 sibling, 0 replies; 30+ messages in thread
From: Jacques Carette @ 2006-05-18 20:15 UTC (permalink / raw)
  To: David Brown; +Cc: caml-list



David Brown wrote:

>>In Computer Algebra, people use Groebner bases all the time.  They have 
>>doubly-exponential worst-case complexity -- but seem to work rather well 
>>in practice.  So I have stopped paying attention to worst-case; average 
>>case, when available, does matter a lot more.
>>    
>>
>
>Except when someone made a decision like this, in say a revision control
>system, and suddenly you discover that you've provoked a worst case
>scenario, and it suddenly takes hundreds of cpu hours to check in a file
>rather than a few seconds.
>  
>
I see you've used ClearCase!   Rather unpleasant experience.

>For something like a compiler, worse case behavior is very important.  It
>is not generally acceptable for a build to just hang in the compiler.
>
Not the compiler, the super-optimizing pass in the compiler.  I 
completely agree that the base compiler should not hang a build, and 
that complexity (including worst-case) there matters a lot.

Speaking of which, have you ever tried to compile an Ocaml program with 
a LOT of functors in it?  It requires quite a bit of patience...

Jacques


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] compiler bug?
  2006-05-18 20:07         ` David Brown
  2006-05-18 20:15           ` Jacques Carette
@ 2006-05-18 20:20           ` Alain Frisch
  1 sibling, 0 replies; 30+ messages in thread
From: Alain Frisch @ 2006-05-18 20:20 UTC (permalink / raw)
  To: David Brown; +Cc: caml-list

David Brown wrote:
> For something like a compiler, worse case behavior is very important.

All the decent programming languages I'm using nowadays have a type
system whose worse case complexity is exponential in the size of the
source. No compiler can be better than exponential. Do you think I
should stop using these languages?

-- Alain

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] compiler bug?
  2006-05-18 17:34   ` Jacques Carette
  2006-05-18 17:46     ` Xavier Leroy
@ 2006-05-18 18:19     ` skaller
  2006-05-18 18:53       ` Jacques Carette
  1 sibling, 1 reply; 30+ messages in thread
From: skaller @ 2006-05-18 18:19 UTC (permalink / raw)
  To: Jacques Carette; +Cc: caml-list

On Thu, 2006-05-18 at 13:34 -0400, Jacques Carette wrote:

> It is my impression that users of compilers are "ready" for the 
> following situation:
> 1) an optimizing compiler (like ocamlopt!) that produces good code 
> efficiently
> 2) a super-optimizing compiler that produces fantastic code, at whatever 
> cost.
> 
> Such a compiler would probably rapidly find a niche of fervent users. 

What about high level optimisations?

Felix supports this:

	reduce revrev[t] (x:list[t]): rev (rev x) => x;

which, combined with inlining, removes adjacent list reversals.
This is a fairly trivial example of integrating logic with
programming as a way of achieving both correctness and
performance: the reduction above provides both semantic
knowledge to the reader as well as allowing the compiler
to generate better code.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] compiler bug?
  2006-05-18 18:19     ` skaller
@ 2006-05-18 18:53       ` Jacques Carette
  2006-05-19  1:47         ` skaller
  0 siblings, 1 reply; 30+ messages in thread
From: Jacques Carette @ 2006-05-18 18:53 UTC (permalink / raw)
  To: skaller; +Cc: caml-list

skaller wrote:

>What about high level optimisations?
>
>Felix supports this:
>
>	reduce revrev[t] (x:list[t]): rev (rev x) => x;
>
>which, combined with inlining, removes adjacent list reversals.
>This is a fairly trivial example of integrating logic with
>programming as a way of achieving both correctness and
>performance: the reduction above provides both semantic
>knowledge to the reader as well as allowing the compiler
>to generate better code.
>  
>

Haskell (GHC to be precise) allows that too. But is syntactic 
term-rewriting, in other words it is *untyped*.

Ocaml is great because it is statically typed.  MetaOCaml is wonderful 
because it is statically typed.  Adding an untyped layer to a typed 
language seems like a definite step backwards (i.e. a hack).. 

Jacques


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] compiler bug?
  2006-05-18 18:53       ` Jacques Carette
@ 2006-05-19  1:47         ` skaller
  2006-05-19  2:17           ` Brian Hurt
  2006-05-19 16:48           ` Jacques Carette
  0 siblings, 2 replies; 30+ messages in thread
From: skaller @ 2006-05-19  1:47 UTC (permalink / raw)
  To: Jacques Carette; +Cc: caml-list

On Thu, 2006-05-18 at 14:53 -0400, Jacques Carette wrote:
> skaller wrote:
> 
> >What about high level optimisations?
> >
> >Felix supports this:
> >
> >	reduce revrev[t] (x:list[t]): rev (rev x) => x;

> Haskell (GHC to be precise) allows that too. But is syntactic 
> term-rewriting, in other words it is *untyped*.

It's well typed. x:list[t] means x is of type list[t].

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] compiler bug?
  2006-05-19  1:47         ` skaller
@ 2006-05-19  2:17           ` Brian Hurt
  2006-05-19  3:11             ` skaller
  2006-05-19 16:48           ` Jacques Carette
  1 sibling, 1 reply; 30+ messages in thread
From: Brian Hurt @ 2006-05-19  2:17 UTC (permalink / raw)
  To: skaller; +Cc: Jacques Carette, caml-list

On Fri, 19 May 2006, skaller wrote:

> On Thu, 2006-05-18 at 14:53 -0400, Jacques Carette wrote:
>> skaller wrote:
>>
>>> What about high level optimisations?
>>>
>>> Felix supports this:
>>>
>>> 	reduce revrev[t] (x:list[t]): rev (rev x) => x;
>
>> Haskell (GHC to be precise) allows that too. But is syntactic
>> term-rewriting, in other words it is *untyped*.
>
> It's well typed. x:list[t] means x is of type list[t].

I'm just wondering how often such optimizations are worthwhile.  From what 
I've read, basic register allocation and peephole optimizations gain you 
2x the performance- after that, it's a fight for percentage points.

Especially considering that humans tend to be a lot better at recognizing 
opportunities for high-level optimizations than computers are.  For 
example, a human would be much more likely to notice a double list 
reversal when the two reversals are in seperate modules, or otherwise 
obscured from the compiler.  Even more so, the programmer may decide that 
maybe a list isn't the right datastructure for this data- a decision I 
wouldn't trust to a compiler.

Brian

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] compiler bug?
  2006-05-19  2:17           ` Brian Hurt
@ 2006-05-19  3:11             ` skaller
  0 siblings, 0 replies; 30+ messages in thread
From: skaller @ 2006-05-19  3:11 UTC (permalink / raw)
  To: Brian Hurt; +Cc: caml-list, Jacques Carette

On Thu, 2006-05-18 at 22:17 -0400, Brian Hurt wrote:

> >>> 	reduce revrev[t] (x:list[t]): rev (rev x) => x;

> I'm just wondering how often such optimizations are worthwhile. 

Me too. I don't know. However the above type of optimisation
is only a hint at what can be done. It's what I could implement
easily in a few hours, after noting in generated code there
really were cases of double list reversals.

>  From what 
> I've read, basic register allocation and peephole optimizations gain you 
> 2x the performance- after that, it's a fight for percentage points.

Most people seem to think eliminating array bounds checks is
worthwhile. 

Haskell certainly thinks deforestation and closure
elimination to be useful. These are high level optimisations
which I expect to have much more impact in most cases
than register fiddling. (They'd better .. GHC is still
a snail solving some very simple problems)

Indeed, the current Felix optimiser was obtained by noting
the woeful code generated for things like the Shootout
multiple loop test, and adding enough heuristics to
eliminate closures so the code reduced to the same
kind of basic C code a C programmer would write.

Of course I hope gcc -O3 etc will do the low level optimisations
for me, but it has to start with decent C code for that
to work.

> Especially considering that humans tend to be a lot better at recognizing 
> opportunities for high-level optimizations than computers are.  For 
> example, a human would be much more likely to notice a double list 
> reversal when the two reversals are in seperate modules, or otherwise 
> obscured from the compiler.  Even more so, the programmer may decide that 
> maybe a list isn't the right datastructure for this data- a decision I 
> wouldn't trust to a compiler.

I agree. Also note applying even reductions of the form above is
extremely expensive in compile time (matching every subexpression
looking for double list reversals cannot be fast :)

However, the point is that there is scope for much improvement
with high level optimisations. 

In particular, one hopes they can be hinted at in a way that
not only makes faster code .. but does so without compromising
correctness or other safety features.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] compiler bug?
  2006-05-19  1:47         ` skaller
  2006-05-19  2:17           ` Brian Hurt
@ 2006-05-19 16:48           ` Jacques Carette
  2006-05-19 19:10             ` skaller
  1 sibling, 1 reply; 30+ messages in thread
From: Jacques Carette @ 2006-05-19 16:48 UTC (permalink / raw)
  To: skaller; +Cc: caml-list

skaller wrote:

>>>What about high level optimisations?
>>>
>>>Felix supports this:
>>>
>>>	reduce revrev[t] (x:list[t]): rev (rev x) => x;
>>>      
>>>
>>Haskell (GHC to be precise) allows that too. But is syntactic 
>>term-rewriting, in other words it is *untyped*.
>>    
>>
>
>It's well typed. x:list[t] means x is of type list[t].
>  
>
The *result* is well-typed.  What is the 'type' of the rule (ie the 
'function' reduce) ?  reduce acts on the programming language syntax, 
not on semantic values.

Jacques


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Caml-list] compiler bug?
  2006-05-19 16:48           ` Jacques Carette
@ 2006-05-19 19:10             ` skaller
  0 siblings, 0 replies; 30+ messages in thread
From: skaller @ 2006-05-19 19:10 UTC (permalink / raw)
  To: Jacques Carette; +Cc: caml-list

On Fri, 2006-05-19 at 12:48 -0400, Jacques Carette wrote:
> skaller wrote:
> 
> >>>What about high level optimisations?
> >>>
> >>>Felix supports this:
> >>>
> >>>	reduce revrev[t] (x:list[t]): rev (rev x) => x;
> >>>      
> >>>
> >>Haskell (GHC to be precise) allows that too. But is syntactic 
> >>term-rewriting, in other words it is *untyped*.

> >It's well typed. x:list[t] means x is of type list[t].
> >  
> >
> The *result* is well-typed.  What is the 'type' of the rule (ie the 
> 'function' reduce) ?  

> reduce acts on the programming language syntax, 
> not on semantic values.

Yes, but so what?

Originally you said it was *untyped*.
As if it were like a macro which rewrites all terms

	rev (rev x)

to

	x

but this is not the case.

The thing is Felix ALSO has macros which are indeed untyped:

	macro fun f(x) = x + x;

so f x is replaced by x + x everywhere the f is in scope.

So when you later said

"Ocaml is great because it is statically typed.  MetaOCaml is wonderful 
because it is statically typed.  Adding an untyped layer to a typed 
language seems like a definite step backwards (i.e. a hack).. "

I think you're missing the point: I'd agree with that
comment and also agree it applied to Felix macros,
because they are indeed manipulations of untyped terms.
Macro processing is done first, before typing.

But 'reduce' is done after typing, it manipulates
typed terms: it isn't adding an *untyped* layer
to the system, on the contrary the semantics implied
are actually *beyond* ordinary static typing. We have
learned:

	rev ^ 2 = id (revrev)

and this property of 'rev' is a constraint on its semantics
(but of course not a complete specification).

What one might say is that the mechanism is somewhat
ad hoc rather than being systematic.

The problem here, if any, you yourself pointed out
in private email some time ago -- there's no assurance
the rules are actually reductions: one or more such
rules may well diverge.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2012-08-09  0:53 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-06 10:04 [Caml-list] Compiler bug? Dmitry Bely
2012-08-06 10:11 ` Alain Frisch
2012-08-06 10:20   ` Dmitry Bely
2012-08-06 10:34     ` Alain Frisch
2012-08-06 11:03       ` Dmitry Bely
2012-08-06 11:32         ` Alain Frisch
2012-08-06 12:16           ` Dmitry Bely
2012-08-07  1:35           ` Cedric Cellier
2012-08-08 16:03           ` Dmitry Bely
2012-08-08 18:03             ` Alain Frisch
2012-08-08 18:22               ` Jesper Louis Andersen
2012-08-08 18:40                 ` Dmitry Bely
2012-08-08 19:29                   ` Fabrice Le Fessant
2012-08-08 23:34                 ` Anil Madhavapeddy
2012-08-09  0:53                 ` Francois Berenger
  -- strict thread matches above, loose matches on Subject: below --
2006-05-17 23:14 compiler bug? Dan Koppel
2006-05-17 23:33 ` [Caml-list] " John Carr
2006-05-18 17:15 ` Xavier Leroy
2006-05-18 17:34   ` Jacques Carette
2006-05-18 17:46     ` Xavier Leroy
2006-05-18 19:31       ` Jacques Carette
2006-05-18 20:07         ` David Brown
2006-05-18 20:15           ` Jacques Carette
2006-05-18 20:20           ` Alain Frisch
2006-05-18 18:19     ` skaller
2006-05-18 18:53       ` Jacques Carette
2006-05-19  1:47         ` skaller
2006-05-19  2:17           ` Brian Hurt
2006-05-19  3:11             ` skaller
2006-05-19 16:48           ` Jacques Carette
2006-05-19 19:10             ` skaller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox