* How to write a CUDA kernel in ocaml? @ 2009-12-15 15:37 Eray Ozkural 2009-12-15 16:07 ` [Caml-list] " Basile STARYNKEVITCH 0 siblings, 1 reply; 14+ messages in thread From: Eray Ozkural @ 2009-12-15 15:37 UTC (permalink / raw) To: caml-list Hello there, I've looked at the CUDA bindings for ocaml, but it seems the kernels were in C, am I right? How can I write the kernel in ocaml? I have an ocaml program that is badly in need of parallelization and it fits the NVIDIA architecture. If ocaml changes are required please explain to me a little, I have sufficient knowledge of compilers, I've worked on a commercial C-to-FPGA compiler project for 2 years. Of course it would be best if I can just handle it with a makefile :) Best Regards, -- Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara http://groups.yahoo.com/group/ai-philosophy http://myspace.com/arizanesil http://myspace.com/malfunct ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Caml-list] How to write a CUDA kernel in ocaml? 2009-12-15 15:37 How to write a CUDA kernel in ocaml? Eray Ozkural @ 2009-12-15 16:07 ` Basile STARYNKEVITCH 2009-12-15 16:20 ` Eray Ozkural 0 siblings, 1 reply; 14+ messages in thread From: Basile STARYNKEVITCH @ 2009-12-15 16:07 UTC (permalink / raw) To: Eray Ozkural; +Cc: caml-list, Emmanuel Chailloux Eray Ozkural wrote: > Hello there, > > I've looked at the CUDA bindings for ocaml, but it seems the kernels > were in C, am I right? How can I write the kernel in ocaml? I have an > ocaml program that is badly in need of parallelization and it fits the > NVIDIA architecture. If ocaml changes are required please explain to > me a little, I have sufficient knowledge of compilers, I've worked on > a commercial C-to-FPGA compiler project for 2 years. Of course it > would be best if I can just handle it with a makefile :) You cannot do that today easily. The French OpenGPU project -funded by French public money http://www.competitivite.gouv.fr/spip.php?article581 (in which I will be a partner)- will start in january 2010 to deal with such issues (probably with OpenCL, not CUDA). There won't be usable results soon (and I have no idea if at end of the project, there will be a simple solution to your problem). You could try to help the OpenGPU partners involved. Ask Emmanuel Chailloux (in CC). If you need today to call a CUDA kernel from Ocaml, you have to use C! Regards. -- Basile STARYNKEVITCH http://starynkevitch.net/Basile/ email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359 8, rue de la Faiencerie, 92340 Bourg La Reine, France *** opinions {are only mines, sont seulement les miennes} *** ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Caml-list] How to write a CUDA kernel in ocaml? 2009-12-15 16:07 ` [Caml-list] " Basile STARYNKEVITCH @ 2009-12-15 16:20 ` Eray Ozkural 2009-12-15 16:29 ` Basile STARYNKEVITCH 0 siblings, 1 reply; 14+ messages in thread From: Eray Ozkural @ 2009-12-15 16:20 UTC (permalink / raw) To: Basile STARYNKEVITCH; +Cc: caml-list, Emmanuel Chailloux On Tue, Dec 15, 2009 at 6:07 PM, Basile STARYNKEVITCH <basile@starynkevitch.net> wrote: > Eray Ozkural wrote: >> >> Hello there, >> >> I've looked at the CUDA bindings for ocaml, but it seems the kernels >> were in C, am I right? How can I write the kernel in ocaml? I have an >> ocaml program that is badly in need of parallelization and it fits the >> NVIDIA architecture. If ocaml changes are required please explain to >> me a little, I have sufficient knowledge of compilers, I've worked on >> a commercial C-to-FPGA compiler project for 2 years. Of course it >> would be best if I can just handle it with a makefile :) > > You cannot do that today easily. > > The French OpenGPU project -funded by French public money > http://www.competitivite.gouv.fr/spip.php?article581 (in which I will be a > partner)- will start in january 2010 to deal with such issues (probably with > OpenCL, not CUDA). There won't be usable results soon (and I have no idea if > at end of the project, there will be a simple solution to your problem). > > You could try to help the OpenGPU partners involved. Ask Emmanuel Chailloux > (in CC). > > If you need today to call a CUDA kernel from Ocaml, you have to use C! It's great to hear such an effort, I will be following the developments. Of course OpenCL will be just as good. Pretty similar, anyway. I've seen some restrictions in OpenCL's C99 extension for compiling kernels, which is *not good* and might ultimately impair the implementation of a functional language. I don't see how one could think of putting any restrictions. We didn't have to invent any restrictions when compiling to hardware! 
At any rate, the obvious question from a compiler standpoint is, cannot we compile ocaml to C, is there a way to translate to C first and then to whatever works for kernel? I know little about the ocaml compiler so please forgive my naive questions. Best, -- Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara http://groups.yahoo.com/group/ai-philosophy http://myspace.com/arizanesil http://myspace.com/malfunct ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Caml-list] How to write a CUDA kernel in ocaml?
  2009-12-15 16:20 ` Eray Ozkural
@ 2009-12-15 16:29 ` Basile STARYNKEVITCH
  2009-12-15 17:46 ` Eray Ozkural
  2009-12-15 23:18 ` David Allsopp
  0 siblings, 2 replies; 14+ messages in thread
From: Basile STARYNKEVITCH @ 2009-12-15 16:29 UTC
  To: Eray Ozkural; +Cc: caml-list, Emmanuel Chailloux

Eray Ozkural wrote:
>>> I've looked at the CUDA bindings for ocaml, but it seems the kernels
>>> were in C, am I right? How can I write the kernel in ocaml?

> At any rate, the obvious question from a compiler standpoint is,
> cannot we compile ocaml to C, is there a way to translate to C first
> and then to whatever works for kernel? I know little about the ocaml
> compiler so please forgive my naive questions.

Compiling OCaml to efficient C is not easy, and is probably impossible (or at least extremely difficult) in the general case. In particular, tail-recursive calls are essential in OCaml, and most C compilers do not guarantee them. Some C compilers can emit a machine-level tail call for a limited class of C functions, but those functions are not compatible with OCaml's runtime system (and garbage collector).

You could perhaps translate, in a naive and inefficient way, the full OCaml bytecode of an entire OCaml application into a C program (for example, as one huge monolithic C function, which gcc would be very unhappy to compile). Of course, you could do much better, but it is not a trivial task.

Another issue is that OCaml might not box (or unbox) your floating-point values the way CUDA (or OpenCL) expects them.

But I am not an expert on these things.

Good luck.

Regards.
-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***
* Re: [Caml-list] How to write a CUDA kernel in ocaml? 2009-12-15 16:29 ` Basile STARYNKEVITCH @ 2009-12-15 17:46 ` Eray Ozkural 2009-12-15 23:18 ` David Allsopp 1 sibling, 0 replies; 14+ messages in thread From: Eray Ozkural @ 2009-12-15 17:46 UTC (permalink / raw) To: Basile STARYNKEVITCH; +Cc: caml-list, Emmanuel Chailloux On Tue, Dec 15, 2009 at 6:29 PM, Basile STARYNKEVITCH <basile@starynkevitch.net> wrote: > Eray Ozkural wrote: >>>> >>>> I've looked at the CUDA bindings for ocaml, but it seems the kernels >>>> were in C, am I right? How can I write the kernel in ocaml? > >> At any rate, the obvious question from a compiler standpoint is, >> cannot we compile ocaml to C, is there a way to translate to C first >> and then to whatever works for kernel? I know little about the ocaml >> compiler so please forgive my naive questions. > > Compiling Ocaml to efficient C is not easy and probably impossible (or > extremely difficult) in the general case. In particular, tail recursive > calls are essential in Ocaml, and are not available in C in most compilers. > Some C compilers are able to generate (in the machine) a teil call for a > limited kind of C functions, which are not compatible with Ocaml's runtime > system (& garbage collector). > > You could perhaps translate (in a dummy & inefficient way) the full Ocaml > bytecode of an entire Ocaml application into a C program (for example, as a > huge monolithic single C function, which would make gcc very unhappy to > compile it). Of course, you could do much better, but it is not a trivial > task. > > Another issue is that Ocaml might not box (or unbox) your floating point > values as CUDA (or OpenCL) expects them. > > But I am not an expert on these things. Can any expert please chime in? The proposed approach of converting from bytecode is sound. In a hardware compiler project we started from assembly, so I know it can be done. 
One problem I see is, I suppose there would be function pointers, and they won't work with OpenCL? I suppose I should investigate a bit more! Best, -- Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara http://groups.yahoo.com/group/ai-philosophy http://myspace.com/arizanesil http://myspace.com/malfunct ^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [Caml-list] How to write a CUDA kernel in ocaml? 2009-12-15 16:29 ` Basile STARYNKEVITCH 2009-12-15 17:46 ` Eray Ozkural @ 2009-12-15 23:18 ` David Allsopp 2009-12-16 0:39 ` Jon Harrop 2009-12-16 6:26 ` Basile STARYNKEVITCH 1 sibling, 2 replies; 14+ messages in thread From: David Allsopp @ 2009-12-15 23:18 UTC (permalink / raw) To: 'Basile STARYNKEVITCH', 'Eray Ozkural'; +Cc: caml-list Basile Starynkevitch wrote: > Eray Ozkural wrote: > > Compiling Ocaml to efficient C is not easy and probably impossible (or > extremely difficult) in the general case. In > particular, tail recursive calls are essential in Ocaml, and are not > available in C in most compilers. What's this based on (out of interest)? Most C compilers don't necessarily identify tail recursion in *C* code but if you're emitting C as an OCaml backend then there's no reason not to convert tail recursive *OCaml* functions to C code based on goto or similar looping constructs (yes, you'd almost-always-virtually-never use goto in a hand-crafted C program without apologising profusely at Dijkstra's grave but if you're using C as an intermediate language then that's a different matter). If I recall correctly from an old post on this list, this is how Felix handles tail recursion when translating Felix to C++ David ^ permalink raw reply [flat|nested] 14+ messages in thread
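A minimal sketch of the goto-based translation suggested in the message above. It assumes a hypothetical self-tail-recursive OCaml function, `let rec sum_to n acc = if n = 0 then acc else sum_to (n - 1) (acc + n)`; the name `sum_to`, the C types, and the shape of the output are illustrative only and do not come from the thread or from any real OCaml-to-C backend (boxing and tagged integers are ignored).

```c
#include <assert.h>

/* Hypothetical backend output for
 *
 *   let rec sum_to n acc =
 *     if n = 0 then acc else sum_to (n - 1) (acc + n)
 *
 * The self tail call becomes argument reassignment plus a goto,
 * so the C stack does not grow with n. */
long long sum_to(long long n, long long acc)
{
    assert(n >= 0);   /* the sketch only handles non-negative n */
entry:
    if (n == 0)
        return acc;
    {
        /* Compute every new argument before overwriting any old one. */
        long long next_n = n - 1;
        long long next_acc = acc + n;
        n = next_n;
        acc = next_acc;
    }
    goto entry;
}
```

A call such as `sum_to(1000000, 0)` then runs in constant stack space, whether or not the C compiler optimises tail calls. This only covers calls from a function to itself; calls to other functions need a different mechanism.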
* Re: [Caml-list] How to write a CUDA kernel in ocaml? 2009-12-15 23:18 ` David Allsopp @ 2009-12-16 0:39 ` Jon Harrop 2009-12-16 13:41 ` Mattias Engdegård 2009-12-16 6:26 ` Basile STARYNKEVITCH 1 sibling, 1 reply; 14+ messages in thread From: Jon Harrop @ 2009-12-16 0:39 UTC (permalink / raw) To: caml-list On Tuesday 15 December 2009 23:18:57 David Allsopp wrote: > Basile Starynkevitch wrote: > > Eray Ozkural wrote: > > > > Compiling Ocaml to efficient C is not easy and probably impossible (or > > extremely difficult) in the general case. In > > particular, tail recursive calls are essential in Ocaml, and are not > > available in C in most compilers. > > What's this based on (out of interest)? Most C compilers don't necessarily > identify tail recursion in *C* code but if you're emitting C as an OCaml > backend then there's no reason not to convert tail recursive *OCaml* > functions to C code based on goto or similar looping constructs (yes, you'd > almost-always-virtually-never use goto in a hand-crafted C program without > apologising profusely at Dijkstra's grave but if you're using C as an > intermediate language then that's a different matter). If I recall > correctly from an old post on this list, this is how Felix handles tail > recursion when translating Felix to C++ And trampolines to eliminate tail calls that cannot be eliminated using goto. However, trampolines are ~10x slower than TCO in the code gen. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e ^ permalink raw reply [flat|nested] 14+ messages in thread
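A small sketch of the trampoline technique mentioned in the message above, using mutual recursion between two hypothetical functions (`even_step` and `odd_step`). All names and the `struct bounce` layout are assumptions made for illustration. Instead of calling each other, the functions return a description of the next call, and a driver loop performs it.

```c
#include <assert.h>
#include <stddef.h>

struct bounce;
typedef struct bounce (*step_fn)(long);

/* One bounce: either a finished result, or the next call to make. */
struct bounce {
    step_fn next;   /* NULL when `value` is the final result */
    long    value;  /* argument for `next`, or the final result */
};

static struct bounce done(long result)
{
    struct bounce b;
    b.next = NULL;
    b.value = result;
    return b;
}

static struct bounce call(step_fn f, long arg)
{
    struct bounce b;
    b.next = f;
    b.value = arg;
    return b;
}

static struct bounce even_step(long n);
static struct bounce odd_step(long n);

static struct bounce even_step(long n)
{
    return n == 0 ? done(1) : call(odd_step, n - 1);   /* "tail call" odd */
}

static struct bounce odd_step(long n)
{
    return n == 0 ? done(0) : call(even_step, n - 1);  /* "tail call" even */
}

/* The trampoline: keeps bouncing until a function reports a result. */
long is_even(long n)
{
    struct bounce b;
    assert(n >= 0);
    b = call(even_step, n);
    while (b.next != NULL)
        b = b.next(b.value);
    return b.value;
}
```

Every logical tail call here costs a return to the loop plus an indirect call, which is the kind of overhead the message refers to; the sketch makes no claim about the actual slowdown factor.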
* Re: [Caml-list] How to write a CUDA kernel in ocaml?
  2009-12-16 0:39 ` Jon Harrop
@ 2009-12-16 13:41 ` Mattias Engdegård
  2009-12-16 13:47 ` Eray Ozkural
  2010-01-12 6:15 ` Eray Ozkural
  0 siblings, 2 replies; 14+ messages in thread
From: Mattias Engdegård @ 2009-12-16 13:41 UTC
  To: jon; +Cc: caml-list

>And trampolines to eliminate tail calls that cannot be eliminated using goto.
>However, trampolines are ~10x slower than TCO in the code gen.

With some care, gcc's sibcall mechanism can be exploited. For example, by having one standard signature for all generated C functions, and taking care not to pass pointers to variables in the caller's stack frame. This should give fairly good performance (better than trampolines anyway), at the cost of portability (but gcc is good at that). It would give full TCO, even across compilation units. It should work well with a Cheney-on-the-MTA-style GC, too.

How suitable it is depends on the reason why compilation to C is done in the first place. It might be one of:

1) portability to odd platforms with semi-decent performance (ie, better than interpreted bytecode)
2) a simple target for maintaining bootstrapping capability for the compiler (but bytecode works well for this too)
3) simpler (?) interfacing to libraries in C etc
4) flat-out maximum performance by exploiting the optimisations that modern C compilers are capable of

Of course, these days we have llvm which has a lot going for it.
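A sketch of the "one standard signature" idea from the message above, assuming a hypothetical group of generated functions that count Collatz steps. The function names are invented for illustration. Every function has the same `(long, long) -> long` signature, every cross-function call is in tail position, and no pointer to a local variable is passed, which are the conditions the message names. With GCC, `-foptimize-sibling-calls` (enabled at `-O2`) can then turn such calls into jumps; the C standard itself does not guarantee this.

```c
#include <assert.h>

/* All "generated" functions share one signature: (value, steps) -> steps. */
long collatz_dispatch(long n, long steps);
long collatz_even(long n, long steps);
long collatz_odd(long n, long steps);

long collatz_dispatch(long n, long steps)
{
    if (n == 1)
        return steps;
    if (n % 2 == 0)
        return collatz_even(n, steps);   /* sibling call candidate */
    return collatz_odd(n, steps);        /* sibling call candidate */
}

long collatz_even(long n, long steps)
{
    return collatz_dispatch(n / 2, steps + 1);
}

long collatz_odd(long n, long steps)
{
    return collatz_dispatch(3 * n + 1, steps + 1);
}

/* Entry point: number of Collatz steps needed to reach 1 from n. */
long collatz_steps(long n)
{
    assert(n >= 1);
    return collatz_dispatch(n, 0);
}
```

Without the optimisation the code is still correct, it just uses one stack frame per call, so a real backend relying on this would have to check that the compiler actually emitted jumps.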
* Re: [Caml-list] How to write a CUDA kernel in ocaml? 2009-12-16 13:41 ` Mattias Engdegård @ 2009-12-16 13:47 ` Eray Ozkural 2009-12-17 0:34 ` Philippe Wang 2010-01-12 6:15 ` Eray Ozkural 1 sibling, 1 reply; 14+ messages in thread From: Eray Ozkural @ 2009-12-16 13:47 UTC (permalink / raw) To: Mattias Engdegård; +Cc: jon, caml-list On Wed, Dec 16, 2009 at 3:41 PM, Mattias Engdegård <mattias@virtutech.se> wrote: >>And trampolines to eliminate tail calls that cannot be eliminated using goto. >>However, trampolines are ~10x slower than TCO in the code gen. > > With some care, gcc's sibcall mechanism can be exploited. For example, > by having one standard signature for all generated C functions, and > taking care not to pass pointers to variables in the caller's stack > frame. This should give fairly good performance (better than > trampolines anyway), at the cost of portability (but gcc is good at > that). It would give full TCO, even across compilation units. It > should work well with a Cheney-on-the-MTA-style GC, too. > > How suitable it is depends on the reason why compilation to C is done in > the first place. It might be one of: > > 1) portability to odd platforms with semi-decent performance (ie, > better than interpreted bytecode) > 2) a simple target for maintaining bootstrapping capability for the > compiler (but bytecode works well for this too) > 3) simpler (?) interfacing to libraries in C etc > 4) flat-out maximum performance by exploiting the optimisations that > modern C compilers are capable of > > Of course, these days we have llvm which has a lot going for it. Well, the original question was to be able to use the CUDA or OpenCL compiler on that generated C code. Possible or impossible? :) One trivial and low-performance solution that comes to mind is: make an ocaml bytecode interpreter into a CUDA kernel and then pass the bytecode to it, and then voila, at least we have some 512-way parallelism on the GT300. How does that sound? 
We'd be losing some performance but massive parallelism will cover up for some of that. Best, -- Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara http://groups.yahoo.com/group/ai-philosophy http://myspace.com/arizanesil http://myspace.com/malfunct ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Caml-list] How to write a CUDA kernel in ocaml?
  2009-12-16 13:47 ` Eray Ozkural
@ 2009-12-17 0:34 ` Philippe Wang
  2009-12-17 6:45 ` Eray Ozkural
  0 siblings, 1 reply; 14+ messages in thread
From: Philippe Wang @ 2009-12-17 0:34 UTC
  To: Eray Ozkural; +Cc: caml-list

On Wed, Dec 16, 2009 at 2:47 PM, Eray Ozkural <examachine@gmail.com> wrote:

> One trivial and low-performance solution that comes to mind is: make
> an ocaml bytecode interpreter into a CUDA kernel and then pass the
> bytecode to it, and then voila, at least we have some 512-way
> parallelism on the GT300. How does that sound? We'd be losing some
> performance but massive parallelism will cover up for some of that.

With parallel processors, the performance bottleneck moves very quickly from the processor(s) to memory bandwidth, so that:
- it's hell to program, because you have to manage concurrency, and that has a real cost;
- it's useful only for very specific programs that make very few memory accesses relative to their computations (such as some compression algorithms; a more specific and very easy-to-write example is matrix multiplication).

Imagine you have 3000 MHz of memory bandwidth, which is extremely good today (I think). And imagine you have 100 processors that share this memory bandwidth. If they all want to access memory at the same time, even if you forget the concurrency management cost, you have 3000 MHz / 100 processors = 30 MHz per processor, which is very, very low. So think about 10 processors instead of 100 to be more realistic: it's still 300 MHz per processor, which looks like what we had about a decade ago...

(IMHO) A not-too-bad-but-still-realistic way to benefit from GPUs today, with OCaml (or any high-level language), is to write the computation functions in C (possibly with some assembly) and the composition functions in OCaml. Or (less realistic in a short amount of time) maybe to write a compiler that does the job for you, but that's not easy...

Good luck,
-- 
Philippe Wang
mail@philippewang.info
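The back-of-the-envelope figure in the message above can be written out as a one-line function. This is only the even-split model the message uses: the function name is made up here, and contention, caching, and latency are deliberately ignored.

```c
#include <assert.h>

/* Even share of a total memory-bandwidth budget among processors that
 * all access memory at once. Units are whatever `total` is given in. */
double per_processor_share(double total, int processors)
{
    assert(processors > 0);
    return total / processors;
}
```

With the numbers from the message, `per_processor_share(3000.0, 100)` gives 30 (MHz per processor) and `per_processor_share(3000.0, 10)` gives 300.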
* Re: [Caml-list] How to write a CUDA kernel in ocaml? 2009-12-17 0:34 ` Philippe Wang @ 2009-12-17 6:45 ` Eray Ozkural 2009-12-17 10:59 ` Philippe Wang 0 siblings, 1 reply; 14+ messages in thread From: Eray Ozkural @ 2009-12-17 6:45 UTC (permalink / raw) To: caml-list On Thu, Dec 17, 2009 at 2:34 AM, Philippe Wang <philippe.wang.lists@gmail.com> wrote: > On Wed, Dec 16, 2009 at 2:47 PM, Eray Ozkural <examachine@gmail.com> wrote: > >> One trivial and low-performance solution that comes to mind is: make >> an ocaml bytecode interpreter into a CUDA kernel and then pass the >> bytecode to it, and then voila, at least we have some 512-way >> parallelism on the GT300. How does that sound? We'd be losing some >> performance but massive parallelism will cover up for some of that. > > > With parallel processors, you move very quickly the performance > bottleneck from processor(s) to memory bandwidth, such that > - it's hell to program because you have to manage concurrency and it > has a real cost > - it's useful for very specific programs that have very few memory > access compared to processor computations (such as some compression > algorithms, a more specific and very easy to write example is matrix > multiplications). > > Imagine you have 3000MHz for memory bandwidth, which is extremely good > today (I think). And imagine you have 100 processors that share this > memory bandwidth. If they all want to access memory at the same time, > even if you forget the concurrency management cost, you have > 3000/100MHz/processor=30MHz/processor, which is very very very low. So > think about 10 processors instead of 100 to be more realistic, it's > still 300MHz/processor, which looks like what we had about a decade > ago... > > (IMHO) A not-too-too-bad-but-still-realistic way to take benefit of > GPUs today, with OCaml (or any high-level language), is to write > computation functions in C (possibly with some assembly), and to write > composition functions in OCaml. 
Or (less realistic in a short amount
> of time) maybe to write a compiler that may do the job for you, but
> it's not quite easy...
>
> Good luck,

First, the GT300 will have great memory bandwidth, probably 256 GB/s. With 512 cores that is half a GB/s per core, which is not bad, I think. With a smart ocaml bytecode interpreter, we could derive some performance from this (still hypothetical!) baby. The GT300 is great; it makes the reconfigurable computing project I worked on mostly obsolete =)

Of course, you are right that the "memory wall" is a serious obstacle to *any* parallel architecture, not just this architecture or that one. I haven't read about it very thoroughly, but in NVIDIA's Fermi architecture the caches and local memory naturally have severe limitations. You have 512 cores. You can't give each of them a huge cache.

In the context of the following comments, assume that a PRAM algorithm is given. Obviously, we may expect the performance of a memory-bound algorithm to suffer on *both* multicore architectures and GPUs (that's where reconfigurable computing might take over...). However, if the algorithm is compute-bound and can do with a moderate memory bandwidth per processor, then I think this becomes an ideal architecture. The algorithms need not be "embarrassingly parallel", but as seen in the CUDA pages of NVIDIA, those will work great!

My application is a perfect match for NVIDIA. It needs just 1-2 MB of storage per processor, and it spends more time computing than accessing memory, so I think it will do well.

The ocaml bytecode interpreter is written in C. For a baseline implementation we could try to port this to OpenCL.

http://caml.inria.fr/cgi-bin/viewcvs.cgi/ocaml/trunk/byterun/

It would be a cool experiment at least =)

What I want to do is to run the ocaml bytecode interpreter on each core, and then feed the relevant bytecode to each of them. It can be done, I suppose? Or am I missing something crucial? :) The runtime library would have to be ported to OpenCL/CUDA as well; isn't that possible?
Best, PS: Sorry for having mailed this to you personally, I intended to post it to the mailing list. -- Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara http://groups.yahoo.com/group/ai-philosophy http://myspace.com/arizanesil http://myspace.com/malfunct ^ permalink raw reply [flat|nested] 14+ messages in thread
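To make the "interpreter on each core" idea above concrete, here is a toy stack-machine interpreter in plain C. It is emphatically not the OCaml bytecode interpreter: the opcodes, the `run` function, and the sample program are all invented for this sketch. It only shows the shape such a kernel body would need on a GPU, namely a fetch/decode/execute loop with a fixed-size stack, no heap allocation, and no recursion, where each core would call `run` on the shared bytecode with its own argument.

```c
#include <assert.h>

/* Hypothetical instruction set, not OCaml's. */
enum { OP_PUSH, OP_LOAD_ARG, OP_ADD, OP_MUL, OP_HALT };

#define STACK_MAX 16

/* Sample program: computes arg * arg + 1. */
static const long square_plus_one[] = {
    OP_LOAD_ARG, OP_LOAD_ARG, OP_MUL, OP_PUSH, 1, OP_ADD, OP_HALT
};

/* Interpret `code` with a private stack; `arg` is this core's input. */
long run(const long *code, long arg)
{
    long stack[STACK_MAX];
    int sp = 0;   /* number of values on the stack */
    int pc = 0;   /* index of the next instruction */

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:       /* push the immediate that follows the opcode */
            assert(sp < STACK_MAX);
            stack[sp++] = code[pc++];
            break;
        case OP_LOAD_ARG:   /* push this core's argument */
            assert(sp < STACK_MAX);
            stack[sp++] = arg;
            break;
        case OP_ADD:        /* pop two values, push their sum */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:        /* pop two values, push their product */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:       /* result is the top of the stack */
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(0 && "unknown opcode");
            return 0;
        }
    }
}
```

For example, `run(square_plus_one, 5)` evaluates 5 * 5 + 1. A real port would also have to deal with the garbage collector, closures, and the runtime library, none of which this toy touches.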
* Re: [Caml-list] How to write a CUDA kernel in ocaml? 2009-12-17 6:45 ` Eray Ozkural @ 2009-12-17 10:59 ` Philippe Wang 0 siblings, 0 replies; 14+ messages in thread From: Philippe Wang @ 2009-12-17 10:59 UTC (permalink / raw) To: Eray Ozkural; +Cc: caml-list On Thu, Dec 17, 2009 at 7:45 AM, Eray Ozkural <examachine@gmail.com> wrote: > What I want to do is to run the ocaml bytecode interpreter on each core, and > then feed the relevant bytecode to those. It can be done, I suppose? Or am I > missing something crucial? :) The runtime library would have to be ported to > OpenCL/CUDA, as well, isn't that possible? I don't see why it wouldn't be possible. After all, there are Java, JavaScript and OCaml implementations of that VM, so it could probably be implemented with any "normal" programming language (exclude those that are not Turing complete and exclude those such as brainfuck or sed) ! But I don't quite see how it could help gaining performance, at least not yet. Anyway, I'm looking forward to seeing a new esoteric implementation of that nice VM ! :-) > PS: Sorry for having mailed this to you personally, I intended to post > it to the > mailing list. no problem ;-) -- Philippe Wang mail@philippewang.info ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Caml-list] How to write a CUDA kernel in ocaml? 2009-12-16 13:41 ` Mattias Engdegård 2009-12-16 13:47 ` Eray Ozkural @ 2010-01-12 6:15 ` Eray Ozkural 1 sibling, 0 replies; 14+ messages in thread From: Eray Ozkural @ 2010-01-12 6:15 UTC (permalink / raw) To: Mattias Engdegård; +Cc: jon, caml-list On Wed, Dec 16, 2009 at 3:41 PM, Mattias Engdegård <mattias@virtutech.se> wrote: > Of course, these days we have llvm which has a lot going for it. Suppose we compile to llvm, can we run llvm code on GPU's? -- Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara http://groups.yahoo.com/group/ai-philosophy http://myspace.com/arizanesil http://myspace.com/malfunct ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Caml-list] How to write a CUDA kernel in ocaml?
  2009-12-15 23:18 ` David Allsopp
  2009-12-16 0:39 ` Jon Harrop
@ 2009-12-16 6:26 ` Basile STARYNKEVITCH
  1 sibling, 0 replies; 14+ messages in thread
From: Basile STARYNKEVITCH @ 2009-12-16 6:26 UTC
  To: David Allsopp; +Cc: 'Eray Ozkural', caml-list

David Allsopp wrote:
> Basile Starynkevitch wrote:
>> Eray Ozkural wrote:
>>
>> Compiling Ocaml to efficient C is not easy and probably impossible (or
>> extremely difficult) in the general case. In
>> particular, tail recursive calls are essential in Ocaml, and are not
>> available in C in most compilers.
>
> What's this based on (out of interest)? Most C compilers don't necessarily
> identify tail recursion in *C* code but if you're emitting C as an OCaml
> backend then there's no reason not to convert tail recursive *OCaml*
> functions to C code based on goto or similar looping constructs (yes, you'd
> almost-always-virtually-never use goto in a hand-crafted C program without
> apologising profusely at Dijkstra's grave but if you're using C as an
> intermediate language then that's a different matter). If I recall correctly
> from an old post on this list, this is how Felix handles tail recursion when
> translating Felix to C++

I am not sure this can work when tail-calling an unknown function. How would you translate to C

    let rec f g x =
      if x < 0 then g x  (* tail recursive call to an unknown function *)
      else f g (x - 1) ;;

ocamlopt -S gives (I am keeping only the crucial code)

    camlEsstr__f_58:
    .L101:
            movq    %rax, %rsi
            cmpq    $1, %rbx
            jge     .L100
            movq    (%rsi), %rdi
            movq    %rbx, %rax
            movq    %rsi, %rbx
            jmp     *%rdi           ;;;; tail rec call to g
            .align  4
    .L100:
            addq    $-2, %rbx
            movq    %rsi, %rax
            jmp     .L101

As you can see, the tail call to g is translated to an indirect jmp, while the recursive call to f becomes a direct jump back to .L101.

Please explain how you would translate the above example to portable & efficient C.

Regards.
-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***
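One naive C rendering of the `f g x` example from the message above, offered as a sketch rather than an answer to the challenge. It assumes `g` is a plain function pointer on unboxed `long` values (real OCaml closures carry an environment and integers are tagged), and `negate` is a made-up stand-in for the unknown function. The self call becomes a goto, but the call to `g` remains an ordinary C call.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*unary_fn)(long);

/* let rec f g x = if x < 0 then g x else f g (x - 1) */
long f(unary_fn g, long x)
{
    assert(g != NULL);
entry:
    if (x < 0)
        return g(x);   /* tail call to an unknown function: becomes a jump
                          only if the C compiler optimises sibling calls */
    x = x - 1;         /* self tail call: reassign the argument and loop */
    goto entry;
}

/* Hypothetical unknown function, used only to exercise f. */
long negate(long x)
{
    return -x;
}
```

For instance, `f(negate, 5)` counts `x` down to -1 and then returns `negate(-1)`. Whether `return g(x)` compiles to the indirect `jmp` shown in the ocamlopt output depends entirely on the C compiler, which is the portability problem the message points at.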