* [Caml-list] an implicit GC rule?
From: Frederic Perriot @ 2018-05-02 16:09 UTC
To: caml-list

Hello caml-list,

I have a GC-related question. To give you some context, I'm writing a
tool to parse .cmi files and generate .h and .c files, to facilitate
constructing OCaml variants from C bindings.

For instance, given the following source:

    type 'a tree = Leaf of 'a | Tree of 'a tree * 'a tree [@@h_file]

the tool produces C functions:

    CAMLprim value Leaf(value arg1)
    {
      CAMLparam1(arg1);
      CAMLlocal1(obj);

      obj = caml_alloc_small(1, 0);

      Field(obj, 0) = arg1;

      CAMLreturn(obj);
    }

    CAMLprim value Tree(value arg1, value arg2)
    {
      // similar code here
    }

From there, it's tempting to nest calls to variant constructors from C
and write code such as:

    CAMLprim value left_comb(value a, value b, value c)
    {
      CAMLparam3(a, b, c);
      CAMLreturn(Tree(Tree(Leaf(a), Leaf(b)), Leaf(c)));
    }

The problem with the above is the GC root loss due to the nesting of
calls to allocating functions.

Say Leaf(c) is constructed first, and the resulting value cached in a
register, then Leaf(b) triggers a collection, thus invalidating the
register contents, and leaving a dangling pointer in the top Tree.

Here is an actual ocamlopt output, with Leaf(c) getting cached in rbx:

    0x000000000040dbf4 <+149>: callq  0x40d8fd <Leaf>
    0x000000000040dbf9 <+154>: mov    %rax,%rbx
    0x000000000040dbfc <+157>: mov    -0x90(%rbp),%rax
    0x000000000040dc03 <+164>: mov    %rax,%rdi
    0x000000000040dc06 <+167>: callq  0x40d8fd <Leaf>
    0x000000000040dc0b <+172>: mov    %rax,%r12
    0x000000000040dc0e <+175>: mov    -0x88(%rbp),%rax
    0x000000000040dc15 <+182>: mov    %rax,%rdi
    0x000000000040dc18 <+185>: callq  0x40d8fd <Leaf>
    0x000000000040dc1d <+190>: mov    %r12,%rsi
    0x000000000040dc20 <+193>: mov    %rax,%rdi
    0x000000000040dc23 <+196>: callq  0x40da19 <Tree>
    0x000000000040dc28 <+201>: mov    %rbx,%rsi
    0x000000000040dc2b <+204>: mov    %rax,%rdi
    0x000000000040dc2e <+207>: callq  0x40da19 <Tree>

While the C code clearly violates the spirit of the GC rules, I can't
help but feel this is still a pitfall.

Rule 2 of the manual states: "Local variables of type value must be
declared with one of the CAMLlocal macros. [...]"

But here, I'm not declaring local variables, unless you count compiler
temporaries as local variables?

I can see some other people making the same mistake I did. Should
there be an explicit warning in the rules? maybe underlining that
compiler temps count as variables, or discouraging the kind of nested
calls returning values displayed above?

thanks,
Frédéric Perriot

PS: this is also my first time posting to the list, so I take this
opportunity to thank you for the great Q's and A's I've read here over
the years
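The root loss described in this message can be reproduced without the OCaml runtime. What follows is a minimal sketch in plain C, not the real collector: `toy_collect`, `toy_leaf`, `toy_register`, `toy_reset`, `POISON` and the two static arrays are hypothetical names invented for illustration. The toy collector copies every cell referenced from a registered variable, fixes up that variable, and poisons the old space, so a pointer held only in an unregistered temporary reads garbage afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of this is the OCaml runtime API. */
#define HEAP_CELLS 8
#define MAX_ROOTS  8
#define POISON     (-1)

static int space_a[HEAP_CELLS], space_b[HEAP_CELLS];
static int *active = space_a, *spare = space_b;
static int used = 0;

static int **roots[MAX_ROOTS];   /* addresses of registered local variables */
static int nroots = 0;

static void toy_reset(void) {
    active = space_a; spare = space_b; used = 0; nroots = 0;
}

/* Plays the role of CAMLlocal: tell the collector where a pointer lives. */
static void toy_register(int **slot) {
    assert(nroots < MAX_ROOTS);
    roots[nroots++] = slot;
}

/* Copy each rooted cell to the spare space, update the root, poison the old space. */
static void toy_collect(void) {
    int n = 0;
    for (int i = 0; i < nroots; i++) {
        if (*roots[i] == NULL) continue;
        spare[n] = **roots[i];
        *roots[i] = &spare[n++];
    }
    for (int i = 0; i < HEAP_CELLS; i++) active[i] = POISON;
    int *tmp = active; active = spare; spare = tmp;
    used = n;
}

/* Allocating "constructor"; it collects on every call to model the worst case. */
static int *toy_leaf(int payload) {
    toy_collect();
    assert(used < HEAP_CELLS);
    int *cell = &active[used++];
    *cell = payload;
    return cell;
}

/* Nested style: the first result lives only in an unregistered temporary. */
static int nested_style(void) {
    toy_reset();
    int *leaf_c = toy_leaf(3);   /* like Leaf(c) cached in a register   */
    (void)toy_leaf(2);           /* like Leaf(b): triggers a collection */
    return *leaf_c;              /* stale: reads the poisoned old cell  */
}

/* Linearized style: the result is stored in a registered variable. */
static int linearized_style(void) {
    toy_reset();
    int *leaf_c = NULL;
    toy_register(&leaf_c);
    leaf_c = toy_leaf(3);
    (void)toy_leaf(2);           /* the collection moves the cell and updates leaf_c */
    int result = *leaf_c;
    nroots = 0;                  /* like CAMLreturn: unregister before leaving */
    return result;
}
```

Called from a `main`, `nested_style()` returns the poison marker while `linearized_style()` returns the payload 3. The toy differs from the real runtime in many ways (it collects on every allocation and its cells hold no pointers), but the failure has the same shape as the `rbx` copy in the listing above: the collector can only fix up locations it has been told about.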
* Re: [Caml-list] an implicit GC rule?
From: Chet Murthy @ 2018-05-05 3:24 UTC
To: Frederic Perriot; +Cc: caml-list

Frederic,

It's been a while since I did this sort of thing, but I suspect if you
declare CAMLlocal variables for each intermediate expression, and stick in
the assignments, that should solve your problem (while not making your code
too ugly). E.g.

    CAMLprim value left_comb(value a, value b, value c)
    {
      CAMLparam3(a, b, c);
      CAMLlocal5(l1, l2, l3, l4, l5);
      CAMLreturn(l1 = Tree((l2 = Tree((l3 = Leaf(a)), (l4 = Leaf(b)))),
                           (l5 = Leaf(c))));
    }

Even better, you could linearize the tree of expressions into a sequence,
and that should solve your problem, also.

Uh, I think. Been a while since I wrote a lotta C/C++ code to interface
with OCaml, but this oughta work.

--chet--
* Re: [Caml-list] an implicit GC rule?
From: Xavier Leroy @ 2018-05-05 7:42 UTC
To: Chet Murthy; +Cc: Frederic Perriot, caml-list

On Sat, May 5, 2018 at 5:25 AM Chet Murthy <murthy.chet@gmail.com> wrote:

> It's been a while since I did this sort of thing, but I suspect if you
> declare CAMLlocal variables for each intermediate expression, and stick in
> the assignments, that should solve your problem (while not making your code
> too ugly). E.g.
>
>     CAMLprim value left_comb(value a, value b, value c)
>     {
>       CAMLparam3(a, b, c);
>       CAMLlocal5(l1, l2, l3, l4, l5);
>       CAMLreturn(l1 = Tree((l2 = Tree((l3 = Leaf(a)), (l4 = Leaf(b)))),
>                            (l5 = Leaf(c))));
>     }

That's bold C/C++ programming! It might even work in C++, where
assignment expressions are l-values if I remember correctly.

However, I'm afraid it won't work in C because an assignment expression
"lv = rv" is an r-value equal to the value of rv converted to the type of lv
at the time the assignment is evaluated. So, if lv is a local variable
registered with the GC, the GC will update lv when needed, but the "lv = rv"
expression will keep its initial value.

There are also the rules concerning sequence points. I think the code above
respects the C99 rules but I'm less sure about the C11 rules.

> Even better, you could linearize the tree of expressions into a sequence,
> and that should solve your problem, also.

Yes, that's the robust solution. Spelling it out:

    CAMLprim value left_comb(value a, value b, value c)
    {
      CAMLparam3(a, b, c);
      CAMLlocal5(la, lb, lc, tab, t);
      la = Leaf(a);
      lb = Leaf(b);
      lc = Leaf(c);
      tab = Tree(la, lb);
      t = Tree(tab, lc);
      CAMLreturn(t);
    }

You can also do "CAMLreturn(Tree(tab, lc))" directly.

- Xavier Leroy
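The remark about assignment expressions can be checked in plain C, independently of any garbage collector. The sketch below uses hypothetical names (`later_update`, `assignment_snapshot`); the write through a pointer merely stands in for the GC rewriting a registered variable after the assignment expression has already been evaluated.

```c
#include <assert.h>

/* Stands in for the collector rewriting a registered variable in place. */
static void later_update(long *root, long moved_to) {
    *root = moved_to;
}

/* Returns the value the assignment expression produced; *lv_now receives
   the variable's contents after the later update. */
static long assignment_snapshot(long *lv_now) {
    long lv = 0;
    long snapshot = (lv = 1);   /* value of "lv = 1", taken at this point */
    later_update(&lv, 2);       /* lv changes afterwards...               */
    *lv_now = lv;               /* ...so the variable now holds 2         */
    return snapshot;            /* ...but the snapshot is still 1         */
}
```

In a call such as `Tree((l3 = Leaf(a)), (l4 = Leaf(b)))`, an argument that has already been evaluated plays the role of `snapshot`: the registered variable can be updated by a later collection, but the copy waiting to be passed to `Tree` cannot. The linearized version avoids this because each variable is read only at the call that consumes it, after all earlier allocations have finished.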
* [Caml-list] [ANN] Release 2.8.5 of Caph, a functional/dataflow language for programming FPGAs
From: Jocelyn Sérot @ 2018-05-05 14:11 UTC
To: caml-list

Dear OCaml users,

It is my pleasure to announce the latest release (2.8.5) of CAPH, a
domain-specific language relying on the dataflow model of computation for
describing and implementing stream-processing applications.

CAPH can simulate dataflow programs and generate cycle-accurate SystemC
and synthesizable VHDL code for implementation on reconfigurable hardware
such as FPGAs.

CAPH has a strong functional inspiration: dataflow networks are described
using a purely functional, higher-order formalism, and the definition of
actor behavior relies on pattern matching similar to that used for defining
functions in functional languages. CAPH is also equipped with a rich type
system with sized integers, booleans, floats, fully polymorphic algebraic
data types and dependent types.

And, of course, CAPH is entirely written in OCaml ;)

Source code and pre-compiled binaries for Mac OS and Windows can be
downloaded from the web site (http://caph.univ-bpclermont.fr) or via Github
(github.com/jserot/caph).

JS
* Re: [Caml-list] an implicit GC rule?
From: Chet Murthy @ 2018-05-06 19:23 UTC
To: Xavier Leroy; +Cc: Frederic Perriot, caml-list

Oh shit, and also, uh, oh right! I forgot that, for example, in

    Tree((v1 = e1), (v2 = e2))

it could transpire that after evaluating e1 and assigning to v1, the
evaluation of e2 could end up moving the value pointed to by v1, updating
v1, but NOT updating the result of the expression (v1 = e1) (b/c of course,
it's already evaluated and on-stack, waiting for the call to Tree()).

Oof.
* Re: [Caml-list] an implicit GC rule?
From: Frederic Perriot @ 2018-05-07 17:01 UTC
To: Chet Murthy; +Cc: Xavier Leroy, caml-list

Chet Murthy and Xavier Leroy answered:

>>> Even better, you could linearize the tree of expressions into a sequence,
>> Yes, that's the robust solution. [...]

thanks folks, I'll do that :)

FP