How to write a CUDA kernel in ocaml?

Mailing list for all users of the OCaml language and system.

* How to write a CUDA kernel in ocaml?
@ 2009-12-15 15:37 Eray Ozkural
  2009-12-15 16:07 ` [Caml-list] " Basile STARYNKEVITCH
  0 siblings, 1 reply; 14+ messages in thread
From: Eray Ozkural @ 2009-12-15 15:37 UTC
  To: caml-list

Hello there,

I've looked at the CUDA bindings for ocaml, but it seems the kernels
were in C, am I right? How can I write the kernel in ocaml? I have an
ocaml program that is badly in need of parallelization and it fits the
NVIDIA architecture. If ocaml changes are required please explain to
me a little, I have sufficient knowledge of compilers, I've worked on
a commercial C-to-FPGA compiler project for 2 years. Of course it
would be best if I can just handle it with a makefile :)

Best Regards,

-- 
Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy
http://myspace.com/arizanesil http://myspace.com/malfunct

* Re: [Caml-list] How to write a CUDA kernel in ocaml?
  2009-12-15 15:37 How to write a CUDA kernel in ocaml? Eray Ozkural
@ 2009-12-15 16:07 ` Basile STARYNKEVITCH
  2009-12-15 16:20   ` Eray Ozkural
  0 siblings, 1 reply; 14+ messages in thread
From: Basile STARYNKEVITCH @ 2009-12-15 16:07 UTC
  To: Eray Ozkural; +Cc: caml-list, Emmanuel Chailloux

Eray Ozkural wrote:
> Hello there,
> 
> I've looked at the CUDA bindings for ocaml, but it seems the kernels
> were in C, am I right? How can I write the kernel in ocaml? I have an
> ocaml program that is badly in need of parallelization and it fits the
> NVIDIA architecture. If ocaml changes are required please explain to
> me a little, I have sufficient knowledge of compilers, I've worked on
> a commercial C-to-FPGA compiler project for 2 years. Of course it
> would be best if I can just handle it with a makefile :)

You cannot easily do that today.

The French OpenGPU project, funded by French public money
(http://www.competitivite.gouv.fr/spip.php?article581) and in which I will be
a partner, will start in January 2010 to deal with such issues (probably with
OpenCL, not CUDA). There won't be usable results soon, and I have no idea
whether there will be a simple solution to your problem by the end of the project.

You could try to help the OpenGPU partners involved. Ask Emmanuel Chailloux (in CC).

If you need to call a CUDA kernel from OCaml today, you have to use C!

Regards.

-- 
Basile STARYNKEVITCH         http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***

* Re: [Caml-list] How to write a CUDA kernel in ocaml?
  2009-12-15 16:07 ` [Caml-list] " Basile STARYNKEVITCH
@ 2009-12-15 16:20   ` Eray Ozkural
  2009-12-15 16:29     ` Basile STARYNKEVITCH
  0 siblings, 1 reply; 14+ messages in thread
From: Eray Ozkural @ 2009-12-15 16:20 UTC
  To: Basile STARYNKEVITCH; +Cc: caml-list, Emmanuel Chailloux

On Tue, Dec 15, 2009 at 6:07 PM, Basile STARYNKEVITCH
<basile@starynkevitch.net> wrote:
> Eray Ozkural wrote:
>>
>> Hello there,
>>
>> I've looked at the CUDA bindings for ocaml, but it seems the kernels
>> were in C, am I right? How can I write the kernel in ocaml? I have an
>> ocaml program that is badly in need of parallelization and it fits the
>> NVIDIA architecture. If ocaml changes are required please explain to
>> me a little, I have sufficient knowledge of compilers, I've worked on
>> a commercial C-to-FPGA compiler project for 2 years. Of course it
>> would be best if I can just handle it with a makefile :)
>
> You cannot do that today easily.
>
> The French OpenGPU project -funded by French public money
> http://www.competitivite.gouv.fr/spip.php?article581 (in which I will be a
> partner)- will start in January 2010 to deal with such issues (probably with
> OpenCL, not CUDA). There won't be usable results soon (and I have no idea if
> at end of the project, there will be a simple solution to your problem).
>
> You could try to help the OpenGPU partners involved. Ask Emmanuel Chailloux
> (in CC).
>
> If you need today to call a CUDA kernel from Ocaml, you have to use C!

It's great to hear of such an effort; I will be following the
developments. Of course OpenCL will be just as good. Pretty similar,
anyway. I've seen some restrictions in OpenCL's C99 extension for
compiling kernels, which is *not good* and might ultimately impair the
implementation of a functional language.  I don't see why one would
impose any restrictions. We didn't have to invent any
restrictions when compiling to hardware!

At any rate, the obvious question from a compiler standpoint is:
can't we compile OCaml to C? Is there a way to translate to C first
and then to whatever works for the kernel? I know little about the OCaml
compiler, so please forgive my naive questions.

Best,

-- 
Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy
http://myspace.com/arizanesil http://myspace.com/malfunct


* Re: [Caml-list] How to write a CUDA kernel in ocaml?
  2009-12-15 16:20   ` Eray Ozkural
@ 2009-12-15 16:29     ` Basile STARYNKEVITCH
  2009-12-15 17:46       ` Eray Ozkural
  2009-12-15 23:18       ` David Allsopp
  0 siblings, 2 replies; 14+ messages in thread
From: Basile STARYNKEVITCH @ 2009-12-15 16:29 UTC
  To: Eray Ozkural; +Cc: caml-list, Emmanuel Chailloux

Eray Ozkural wrote:
>>> I've looked at the CUDA bindings for ocaml, but it seems the kernels
>>> were in C, am I right? How can I write the kernel in ocaml? 

> At any rate, the obvious question from a compiler standpoint is,
> cannot we compile ocaml to C, is there a way to translate to C first
> and then to whatever works for kernel? I know little about the ocaml
> compiler so please forgive my naive questions.

Compiling OCaml to efficient C is not easy and probably impossible (or extremely difficult) in the general case. In
particular, tail-recursive calls are essential in OCaml, and most C compilers do not provide them. Some C compilers
are able to generate (in the machine code) a tail call for a limited kind of C function, which is not compatible with
OCaml's runtime system (and garbage collector).

You could perhaps translate (in a dumb and inefficient way) the full OCaml bytecode of an entire OCaml application into a
C program (for example, as a single huge monolithic C function, which gcc would be very unhappy to compile). Of
course, you could do much better, but it is not a trivial task.

Another issue is that OCaml might not box (or unbox) your floating-point values as CUDA (or OpenCL) expects them.

But I am not an expert on these things.

Good luck.

Regards.
-- 
Basile STARYNKEVITCH         http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***

* Re: [Caml-list] How to write a CUDA kernel in ocaml?
  2009-12-15 16:29     ` Basile STARYNKEVITCH
@ 2009-12-15 17:46       ` Eray Ozkural
  2009-12-15 23:18       ` David Allsopp
  1 sibling, 0 replies; 14+ messages in thread
From: Eray Ozkural @ 2009-12-15 17:46 UTC
  To: Basile STARYNKEVITCH; +Cc: caml-list, Emmanuel Chailloux

On Tue, Dec 15, 2009 at 6:29 PM, Basile STARYNKEVITCH
<basile@starynkevitch.net> wrote:
> Eray Ozkural wrote:
>>>>
>>>> I've looked at the CUDA bindings for ocaml, but it seems the kernels
>>>> were in C, am I right? How can I write the kernel in ocaml?
>
>> At any rate, the obvious question from a compiler standpoint is,
>> cannot we compile ocaml to C, is there a way to translate to C first
>> and then to whatever works for kernel? I know little about the ocaml
>> compiler so please forgive my naive questions.
>
> Compiling Ocaml to efficient C is not easy and probably impossible (or
> extremely difficult) in the general case. In particular, tail recursive
> calls are essential in Ocaml, and are not available in C in most compilers.
> Some C compilers are able to generate (in the machine) a tail call for a
> limited kind of C functions, which are not compatible with Ocaml's runtime
> system (& garbage collector).
>
> You could perhaps translate (in a dummy & inefficient way) the full Ocaml
> bytecode of an entire Ocaml application into a C program (for example, as a
> huge monolithic single C function, which would make gcc very unhappy to
> compile it). Of course, you could do much better, but it is not a trivial
> task.
>
> Another issue is that Ocaml might not box (or unbox) your floating point
> values as CUDA (or OpenCL) expects them.
>
> But I am not an expert on these things.


Can any expert please chime in?

The proposed approach of converting from bytecode is sound. In a
hardware compiler project we started from assembly, so I know it can
be done.

One problem I see is, I suppose there would be function pointers, and
they won't work with OpenCL?

I suppose I should investigate a bit more!

Best,

-- 
Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy
http://myspace.com/arizanesil http://myspace.com/malfunct


* RE: [Caml-list] How to write a CUDA kernel in ocaml?
  2009-12-15 16:29     ` Basile STARYNKEVITCH
  2009-12-15 17:46       ` Eray Ozkural
@ 2009-12-15 23:18       ` David Allsopp
  2009-12-16  0:39         ` Jon Harrop
  2009-12-16  6:26         ` Basile STARYNKEVITCH
  1 sibling, 2 replies; 14+ messages in thread
From: David Allsopp @ 2009-12-15 23:18 UTC
  To: 'Basile STARYNKEVITCH', 'Eray Ozkural'; +Cc: caml-list

Basile Starynkevitch wrote: 
> Eray Ozkural wrote:
> 
> Compiling Ocaml to efficient C is not easy and probably impossible (or
> extremely difficult) in the general case. In
> particular, tail recursive calls are essential in Ocaml, and are not
> available in C in most compilers.

What's this based on (out of interest)? Most C compilers don't necessarily
identify tail recursion in *C* code but if you're emitting C as an OCaml
backend then there's no reason not to convert tail recursive *OCaml*
functions to C code based on goto or similar looping constructs (yes, you'd
almost-always-virtually-never use goto in a hand-crafted C program without
apologising profusely at Dijkstra's grave but if you're using C as an
intermediate language then that's a different matter). If I recall correctly
from an old post on this list, this is how Felix handles tail recursion when
translating Felix to C++
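
A minimal sketch of the goto-based translation described above. The OCaml function in the comment and the name `sum_to` are illustrative assumptions; this is not output from Felix or from any existing OCaml backend.

```c
#include <stdint.h>

/* Hypothetical backend output for the OCaml function
 *   let rec sum_to n acc =
 *     if n = 0 then acc else sum_to (n - 1) (acc + n)
 * The self tail call becomes parameter reassignment plus a goto,
 * so the C stack does not grow with n. */
int64_t sum_to(int64_t n, int64_t acc)
{
entry:
    if (n == 0)
        return acc;
    /* Compute both new arguments before overwriting either parameter. */
    int64_t next_n = n - 1;
    int64_t next_acc = acc + n;
    n = next_n;
    acc = next_acc;
    goto entry;
}
```

Calling `sum_to(10, 0)` returns 55. This only covers a function calling itself in tail position; a tail call to a different or unknown function needs another technique.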

David 

* Re: [Caml-list] How to write a CUDA kernel in ocaml?
  2009-12-15 23:18       ` David Allsopp
@ 2009-12-16  0:39         ` Jon Harrop
  2009-12-16 13:41           ` Mattias Engdegård
  2009-12-16  6:26         ` Basile STARYNKEVITCH
  1 sibling, 1 reply; 14+ messages in thread
From: Jon Harrop @ 2009-12-16  0:39 UTC
  To: caml-list

On Tuesday 15 December 2009 23:18:57 David Allsopp wrote:
> Basile Starynkevitch wrote:
> > Eray Ozkural wrote:
> >
> > Compiling Ocaml to efficient C is not easy and probably impossible (or
> > extremely difficult) in the general case. In
> > particular, tail recursive calls are essential in Ocaml, and are not
> > available in C in most compilers.
>
> What's this based on (out of interest)? Most C compilers don't necessarily
> identify tail recursion in *C* code but if you're emitting C as an OCaml
> backend then there's no reason not to convert tail recursive *OCaml*
> functions to C code based on goto or similar looping constructs (yes, you'd
> almost-always-virtually-never use goto in a hand-crafted C program without
> apologising profusely at Dijkstra's grave but if you're using C as an
> intermediate language then that's a different matter). If I recall
> correctly from an old post on this list, this is how Felix handles tail
> recursion when translating Felix to C++

And trampolines to eliminate tail calls that cannot be eliminated using goto. 
However, trampolines are ~10x slower than TCO in the code gen.
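
For readers unfamiliar with the term, here is a minimal sketch of a trampoline in C, assuming a pair of mutually tail-recursive functions. The names `bounce`, `is_even_step` and `is_odd_step` are made up for illustration. Each step returns a description of the next call instead of making it, and a driver loop performs the calls, so stack depth stays constant.

```c
#include <stdbool.h>
#include <stddef.h>

struct bounce;
typedef struct bounce (*step_fn)(unsigned n);

/* One bounce: either the final answer or the next function to run. */
struct bounce {
    step_fn next;   /* NULL when the computation is finished */
    unsigned arg;   /* argument to pass to next */
    bool result;    /* meaningful only when next == NULL */
};

static struct bounce is_odd_step(unsigned n);

static struct bounce is_even_step(unsigned n)
{
    if (n == 0)
        return (struct bounce){ NULL, 0, true };
    /* Instead of tail-calling is_odd_step, ask the driver to do it. */
    return (struct bounce){ is_odd_step, n - 1, false };
}

static struct bounce is_odd_step(unsigned n)
{
    if (n == 0)
        return (struct bounce){ NULL, 0, false };
    return (struct bounce){ is_even_step, n - 1, false };
}

/* The trampoline: keeps bouncing until a step reports a final result. */
bool is_even(unsigned n)
{
    struct bounce b = { is_even_step, n, false };
    while (b.next != NULL)
        b = b.next(b.arg);
    return b.result;
}
```

Every logical tail call here costs a return, a struct copy and an indirect call, which is the kind of overhead the slowdown estimate above refers to.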

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


* Re: [Caml-list] How to write a CUDA kernel in ocaml?
  2009-12-16  0:39         ` Jon Harrop
@ 2009-12-16 13:41           ` Mattias Engdegård
  2009-12-16 13:47             ` Eray Ozkural
  2010-01-12  6:15             ` Eray Ozkural
  0 siblings, 2 replies; 14+ messages in thread
From: Mattias Engdegård @ 2009-12-16 13:41 UTC
  To: jon; +Cc: caml-list

>And trampolines to eliminate tail calls that cannot be eliminated using goto. 
>However, trampolines are ~10x slower than TCO in the code gen.

With some care, gcc's sibcall mechanism can be exploited. For example,
by having one standard signature for all generated C functions, and
taking care not to pass pointers to variables in the caller's stack
frame. This should give fairly good performance (better than
trampolines anyway), at the cost of portability (but gcc is good at
that). It would give full TCO, even across compilation units. It
should work well with a Cheney-on-the-MTA-style GC, too.
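
A small sketch of the shape of code this relies on, with hypothetical function names `ping` and `pong`. Both functions share one signature, every cross-function call is in tail position, and no address of a local variable escapes. With gcc at `-O2` (which enables `-foptimize-sibling-calls`) such calls are typically compiled to jumps, but the C standard does not guarantee it, and at `-O0` they remain ordinary calls that consume stack.

```c
#include <stdint.h>

/* The single signature shared by all "generated" functions (assumption). */
typedef int64_t (*gen_fn)(int64_t n, int64_t acc);

int64_t pong(int64_t n, int64_t acc);

int64_t ping(int64_t n, int64_t acc)
{
    if (n == 0)
        return acc;
    /* Tail position, same signature, no pointers into this frame. */
    return pong(n - 1, acc + 1);
}

int64_t pong(int64_t n, int64_t acc)
{
    if (n == 0)
        return acc;
    return ping(n - 1, acc + 2);
}
```

For example, `ping(4, 0)` returns 6. The result is the same with or without the optimisation; only the stack usage differs.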

How suitable it is depends on the reason why compilation to C is done in
the first place. It might be one of:

1) portability to odd platforms with semi-decent performance (ie,
   better than interpreted bytecode)
2) a simple target for maintaining bootstrapping capability for the
   compiler (but bytecode works well for this too)
3) simpler (?) interfacing to libraries in C etc
4) flat-out maximum performance by exploiting the optimisations that
   modern C compilers are capable of

Of course, these days we have llvm which has a lot going for it.

* Re: [Caml-list] How to write a CUDA kernel in ocaml?
  2009-12-16 13:41           ` Mattias Engdegård
@ 2009-12-16 13:47             ` Eray Ozkural
  2009-12-17  0:34               ` Philippe Wang
  2010-01-12  6:15             ` Eray Ozkural
  1 sibling, 1 reply; 14+ messages in thread
From: Eray Ozkural @ 2009-12-16 13:47 UTC
  To: Mattias Engdegård; +Cc: jon, caml-list

On Wed, Dec 16, 2009 at 3:41 PM, Mattias Engdegård <mattias@virtutech.se> wrote:
>>And trampolines to eliminate tail calls that cannot be eliminated using goto.
>>However, trampolines are ~10x slower than TCO in the code gen.
>
> With some care, gcc's sibcall mechanism can be exploited. For example,
> by having one standard signature for all generated C functions, and
> taking care not to pass pointers to variables in the caller's stack
> frame. This should give fairly good performance (better than
> trampolines anyway), at the cost of portability (but gcc is good at
> that). It would give full TCO, even across compilation units. It
> should work well with a Cheney-on-the-MTA-style GC, too.
>
> How suitable it is depends on the reason why compilation to C is done in
> the first place. It might be one of:
>
> 1) portability to odd platforms with semi-decent performance (ie,
>   better than interpreted bytecode)
> 2) a simple target for maintaining bootstrapping capability for the
>   compiler (but bytecode works well for this too)
> 3) simpler (?) interfacing to libraries in C etc
> 4) flat-out maximum performance by exploiting the optimisations that
>   modern C compilers are capable of
>
> Of course, these days we have llvm which has a lot going for it.

Well, the original question was to be able to use the CUDA or OpenCL
compiler on that generated C code.

Possible or impossible? :)

One trivial and low-performance solution that comes to mind is: make
an ocaml bytecode interpreter into a CUDA kernel and then pass the
bytecode to it, and then voila, at least we have some 512-way
parallelism on the GT300. How does that sound? We'd be losing some
performance but massive parallelism will cover up for some of that.

Best,

-- 
Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy
http://myspace.com/arizanesil http://myspace.com/malfunct


* Re: [Caml-list] How to write a CUDA kernel in ocaml?
  2009-12-16 13:47             ` Eray Ozkural
@ 2009-12-17  0:34               ` Philippe Wang
  2009-12-17  6:45                 ` Eray Ozkural
  0 siblings, 1 reply; 14+ messages in thread
From: Philippe Wang @ 2009-12-17  0:34 UTC
  To: Eray Ozkural; +Cc: caml-list

On Wed, Dec 16, 2009 at 2:47 PM, Eray Ozkural <examachine@gmail.com> wrote:

> One trivial and low-performance solution that comes to mind is: make
> an ocaml bytecode interpreter into a CUDA kernel and then pass the
> bytecode to it, and then voila, at least we have some 512-way
> parallelism on the GT300. How does that sound? We'd be losing some
> performance but massive parallelism will cover up for some of that.

With parallel processors, you very quickly move the performance
bottleneck from the processor(s) to memory bandwidth, such that
- it's hell to program because you have to manage concurrency, which
has a real cost
- it's useful only for very specific programs that make very few memory
accesses relative to processor computations (such as some compression
algorithms; a more specific and very easy to write example is matrix
multiplication).

Imagine you have 3000 MHz of memory bandwidth, which is extremely good
today (I think). And imagine you have 100 processors that share this
memory bandwidth. If they all want to access memory at the same time,
even if you forget the concurrency management cost, you have
3000 MHz / 100 processors = 30 MHz per processor, which is very very very low. So
think about 10 processors instead of 100 to be more realistic: it's
still 300 MHz per processor, which looks like what we had about a decade
ago...

(IMHO) A not-too-too-bad-but-still-realistic way to take advantage of
GPUs today with OCaml (or any high-level language) is to write the
computation functions in C (possibly with some assembly) and the
composition functions in OCaml. Or (less realistic in a short amount
of time) maybe to write a compiler that does the job for you, but
that's not easy...

Good luck,

-- 
Philippe Wang
   mail@philippewang.info

* Re: [Caml-list] How to write a CUDA kernel in ocaml?
  2009-12-17  0:34               ` Philippe Wang
@ 2009-12-17  6:45                 ` Eray Ozkural
  2009-12-17 10:59                   ` Philippe Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Eray Ozkural @ 2009-12-17  6:45 UTC
  To: caml-list

On Thu, Dec 17, 2009 at 2:34 AM, Philippe Wang
<philippe.wang.lists@gmail.com> wrote:
> On Wed, Dec 16, 2009 at 2:47 PM, Eray Ozkural <examachine@gmail.com> wrote:
>
>> One trivial and low-performance solution that comes to mind is: make
>> an ocaml bytecode interpreter into a CUDA kernel and then pass the
>> bytecode to it, and then voila, at least we have some 512-way
>> parallelism on the GT300. How does that sound? We'd be losing some
>> performance but massive parallelism will cover up for some of that.
>
>
> With parallel processors, you move very quickly the performance
> bottleneck from processor(s) to memory bandwidth, such that
> - it's hell to program because you have to manage concurrency and it
> has a real cost
> - it's useful for very specific programs that have very few memory
> access compared to processor computations (such as some compression
> algorithms, a more specific and very easy to write example is matrix
> multiplications).
>
> Imagine you have 3000MHz for memory bandwidth, which is extremely good
> today (I think). And imagine you have 100 processors that share this
> memory bandwidth. If they all want to access memory at the same time,
> even if you forget the concurrency management cost, you have
> 3000/100MHz/processor=30MHz/processor, which is very very very low. So
> think about 10 processors instead of 100 to be more realistic, it's
> still 300MHz/processor, which looks like what we had about a decade
> ago...
>
> (IMHO) A not-too-too-bad-but-still-realistic way to take benefit of
> GPUs today, with OCaml (or any high-level language), is to write
> computation functions in C (possibly with some assembly), and to write
> composition functions in OCaml. Or (less realistic in a short amount
> of time) maybe to write a compiler that may do the job for you, but
> it's not quite easy...
>
> Good luck,

First, the GT300 will have great memory bandwidth, probably 256 GB/s.
Half a gig/sec per core, not bad I think. With a smart ocaml bytecode
interpreter, we could derive some performance from this (hypothetical
yet!) baby. GT300 is great, it makes the reconfigurable computing
project I worked on mostly obsolete =)
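
The per-core figure is simply an even split of the total. As a sketch of the arithmetic (the helper name `share_per_unit` is hypothetical, and an even split ignores contention and caching entirely):

```c
/* Even share of a total budget across `units` consumers.
 * Assumes units > 0 and a perfectly uniform split. */
double share_per_unit(double total, unsigned units)
{
    return total / (double)units;
}
```

With the numbers used in this thread, `share_per_unit(256.0, 512)` gives 0.5 (GB/s per core), and `share_per_unit(3000.0, 100)` gives 30.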

Of course, you are right that the "memory wall" is a serious
obstacle to *any* parallel architecture, not just this architecture or
that one. I didn't read very thoroughly, but the caches and local
memory in NVIDIA's Fermi architecture naturally have
severe limitations. You have 512 cores. You can't give each one a huge
cache.

In the context of the following comments assume that a PRAM algorithm is given.

Obviously, we may expect the performance of a memory bound algorithm
to suffer in *both* multicore architectures and GPU's (that's where
reconfigurable computing might take over...).

However, if the algorithm is compute-bound, and can do with a moderate
memory bandwidth per processor, then I think this becomes an ideal
architecture. Not necessarily "embarrassingly parallel" algorithms,
but as seen in the CUDA pages of NVIDIA, those will work great!

My application is a perfect match for NVIDIA. It needs just 1-2 MB of storage per
processor. And it spends more time computing than accessing memory, so
I think it will do well.

The ocaml bytecode interpreter is written in C. For a baseline
implementation we could try to port this to OpenCL.
http://caml.inria.fr/cgi-bin/viewcvs.cgi/ocaml/trunk/byterun/

Would be a cool experiment at least =)

What I want to do is to run the ocaml bytecode interpreter on each core, and
then feed the relevant bytecode to those. It can be done, I suppose? Or am I
missing something crucial? :) The runtime library would have to be ported to
OpenCL/CUDA, as well, isn't that possible?
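
To make the idea concrete, here is a toy stack-machine interpreter in plain C, written in a style that avoids what kernel languages tend to restrict: no recursion, no function pointers, no heap allocation, and a fixed-size stack. The opcodes are invented for illustration and are NOT OCaml's bytecode; a real port would also need the runtime and garbage collector, which this sketch ignores.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for illustration only. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

#define STACK_MAX 64

/* Interprets `code`; on success stores the top of stack in *out and
 * returns 0. Returns -1 on a malformed program or stack overflow. */
int run_bytecode(const int32_t *code, size_t len, int32_t *out)
{
    int32_t stack[STACK_MAX];
    size_t sp = 0, pc = 0;

    while (pc < len) {
        switch (code[pc++]) {
        case OP_PUSH:               /* operand follows the opcode */
            if (pc >= len || sp >= STACK_MAX)
                return -1;
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:
            if (sp < 2)
                return -1;
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            if (sp < 2)
                return -1;
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            if (sp < 1)
                return -1;
            *out = stack[sp - 1];
            return 0;
        default:
            return -1;              /* unknown opcode */
        }
    }
    return -1;                      /* ran off the end without OP_HALT */
}
```

The program `{ OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT }` evaluates (2 + 3) * 4 and yields 20. In the scheme described above, each GPU thread would run such a loop over its own program and data.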

Best,

PS: Sorry for having mailed this to you personally, I intended to post
it to the mailing list.

-- 
Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy
http://myspace.com/arizanesil http://myspace.com/malfunct

* Re: [Caml-list] How to write a CUDA kernel in ocaml?
  2009-12-17  6:45                 ` Eray Ozkural
@ 2009-12-17 10:59                   ` Philippe Wang
  0 siblings, 0 replies; 14+ messages in thread
From: Philippe Wang @ 2009-12-17 10:59 UTC
  To: Eray Ozkural; +Cc: caml-list

On Thu, Dec 17, 2009 at 7:45 AM, Eray Ozkural <examachine@gmail.com> wrote:
> What I want to do is to run the ocaml bytecode interpreter on each core, and
> then feed the relevant bytecode to those. It can be done, I suppose? Or am I
> missing something crucial? :) The runtime library would have to be ported to
> OpenCL/CUDA, as well, isn't that possible?

I don't see why it wouldn't be possible. After all, there are Java,
JavaScript and OCaml implementations of that VM, so it could probably
be implemented in any "normal" programming language (excluding those
that are not Turing complete and those such as brainfuck or
sed)! But I don't quite see how it would help gain performance, at
least not yet.

Anyway, I'm looking forward to seeing a new esoteric implementation of
that nice VM ! :-)

> PS: Sorry for having mailed this to you personally, I intended to post
> it to the
> mailing list.

no problem ;-)

-- 
Philippe Wang
   mail@philippewang.info

* Re: [Caml-list] How to write a CUDA kernel in ocaml?
  2009-12-16 13:41           ` Mattias Engdegård
  2009-12-16 13:47             ` Eray Ozkural
@ 2010-01-12  6:15             ` Eray Ozkural
  1 sibling, 0 replies; 14+ messages in thread
From: Eray Ozkural @ 2010-01-12  6:15 UTC
  To: Mattias Engdegård; +Cc: jon, caml-list

On Wed, Dec 16, 2009 at 3:41 PM, Mattias Engdegård <mattias@virtutech.se> wrote:
> Of course, these days we have llvm which has a lot going for it.

Suppose we compile to LLVM; can we run LLVM code on GPUs?


-- 
Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy
http://myspace.com/arizanesil http://myspace.com/malfunct


* Re: [Caml-list] How to write a CUDA kernel in ocaml?
  2009-12-15 23:18       ` David Allsopp
  2009-12-16  0:39         ` Jon Harrop
@ 2009-12-16  6:26         ` Basile STARYNKEVITCH
  1 sibling, 0 replies; 14+ messages in thread
From: Basile STARYNKEVITCH @ 2009-12-16  6:26 UTC
  To: David Allsopp; +Cc: 'Eray Ozkural', caml-list

David Allsopp wrote:
> Basile Starynkevitch wrote: 
>> Eray Ozkural wrote:
>>
>> Compiling Ocaml to efficient C is not easy and probably impossible (or
>> extremely difficult) in the general case. In
>> particular, tail recursive calls are essential in Ocaml, and are not
>> available in C in most compilers.
> 
> What's this based on (out of interest)? Most C compilers don't necessarily
> identify tail recursion in *C* code but if you're emitting C as an OCaml
> backend then there's no reason not to convert tail recursive *OCaml*
> functions to C code based on goto or similar looping constructs (yes, you'd
> almost-always-virtually-never use goto in a hand-crafted C program without
> apologising profusely at Dijkstra's grave but if you're using C as an
> intermediate language then that's a different matter). If I recall correctly
> from an old post on this list, this is how Felix handles tail recursion when
> translating Felix to C++

I am not sure this can work when tail-calling an unknown function. How would you translate this to C?

let rec f g x =
   if x < 0 then
      g x (*tail recursive call to an unknown function*)
   else
      f g (x - 1)
;;

ocamlopt -S gives (I am keeping only the crucial code):

camlEsstr__f_58:
.L101:
	movq	%rax, %rsi
	cmpq	$1, %rbx
	jge	.L100
	movq	(%rsi), %rdi
	movq	%rbx, %rax
	movq	%rsi, %rbx
	jmp	*%rdi                      ;;;; tail rec call to g
	.align	4
.L100:
	addq	$-2, %rbx
	movq	%rsi, %rax
	jmp	.L101

As you can see, the tail recursive call is translated to an indirect jmp.

Please explain how you would translate the above example to portable & efficient C.
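
For reference, one possible shape of a C translation, offered only as a sketch under stated assumptions: `value`, `struct closure` and `negate_code` are hypothetical stand-ins, integers are left untagged, and the garbage collector is ignored. The self tail call becomes a loop, but the call to the unknown `g` is an ordinary indirect call; whether it turns into a jump depends on the compiler performing sibling-call optimisation, which standard C does not guarantee.

```c
#include <stdint.h>

typedef intptr_t value;   /* stand-in for a uniform value type */

/* A closure reduced to its code pointer; the closure itself is passed
 * to the code so that it could reach captured variables. */
struct closure {
    value (*code)(value arg, struct closure *self);
};

/* let rec f g x = if x < 0 then g x else f g (x - 1) */
value f(struct closure *g, value x)
{
    for (;;) {
        if (x < 0)
            return g->code(x, g);  /* unknown callee, call in tail position */
        x = x - 1;                 /* self tail call turned into a loop */
    }
}

/* Example callee used only to exercise f. */
value negate_code(value arg, struct closure *self)
{
    (void)self;
    return -arg;
}
```

With `struct closure neg = { negate_code };`, the call `f(&neg, 5)` counts x down to -1 and returns 1.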

Regards.

-- 
Basile STARYNKEVITCH         http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***


Thread overview: 14+ messages
2009-12-15 15:37 How to write a CUDA kernel in ocaml? Eray Ozkural
2009-12-15 16:07 ` [Caml-list] " Basile STARYNKEVITCH
2009-12-15 16:20   ` Eray Ozkural
2009-12-15 16:29     ` Basile STARYNKEVITCH
2009-12-15 17:46       ` Eray Ozkural
2009-12-15 23:18       ` David Allsopp
2009-12-16  0:39         ` Jon Harrop
2009-12-16 13:41           ` Mattias Engdegård
2009-12-16 13:47             ` Eray Ozkural
2009-12-17  0:34               ` Philippe Wang
2009-12-17  6:45                 ` Eray Ozkural
2009-12-17 10:59                   ` Philippe Wang
2010-01-12  6:15             ` Eray Ozkural
2009-12-16  6:26         ` Basile STARYNKEVITCH
