* More registers in modern day CPUs @ 2007-09-06 6:20 Tom 2007-09-06 7:17 ` [Caml-list] " skaller ` (2 more replies) 0 siblings, 3 replies; 28+ messages in thread From: Tom @ 2007-09-06 6:20 UTC (permalink / raw) To: Caml-list List [-- Attachment #1: Type: text/plain, Size: 622 bytes --] (This question may not be OCaml specific, but I guess it is not specific at all, and there are quite a few people here that have implemented compilers, so I post it here...) I was thinking about compiler implementation recently, and figured that it is difficult to design the compiler for a variable number of hardware registers - compared to designing a compiler with a fixed number of registers. However, would it be possible to "emulate" CPU registers using software? By keeping registers in the main memory, but accessing them often enough to keep them in primary cache? That would be quite fast I believe... - Tom [-- Attachment #2: Type: text/html, Size: 662 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
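Tom's idea of "emulated" registers can be sketched in a few lines of OCaml. This is only an illustration under stated assumptions: the instruction type, the `run` function and the choice of 32 registers are hypothetical and do not come from the thread. The virtual registers live in an ordinary array; on a 64-bit build that is 32 one-word slots, so the whole file spans only a handful of cache lines and would plausibly stay in L1 while it is used in a tight loop.

```ocaml
(* Hypothetical "software register file": virtual registers are slots of a
   small array in ordinary memory. All names here are illustrative. *)
type instr =
  | Load of int * int        (* Load (r, n): register r <- constant n *)
  | Add of int * int * int   (* Add (d, a, b): register d <- reg a + reg b *)

let num_regs = 32

(* Execute a straight-line program and return the final register file. *)
let run (program : instr list) : int array =
  let regs = Array.make num_regs 0 in
  List.iter
    (function
      | Load (r, n) -> regs.(r) <- n
      | Add (d, a, b) -> regs.(d) <- regs.(a) + regs.(b))
    program;
  regs

let example = run [ Load (0, 2); Load (1, 40); Add (2, 0, 1) ]

let () = Printf.printf "r2 = %d\n" example.(2)
```

Every access still goes through a load or store, so this trades register allocation for cache traffic, which is the cost the replies below discuss.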
* Re: [Caml-list] More registers in modern day CPUs 2007-09-06 6:20 More registers in modern day CPUs Tom @ 2007-09-06 7:17 ` skaller 2007-09-06 9:07 ` Richard Jones 2007-09-06 14:55 ` Chris King 2 siblings, 0 replies; 28+ messages in thread From: skaller @ 2007-09-06 7:17 UTC (permalink / raw) To: Tom; +Cc: Caml-list List

On Thu, 2007-09-06 at 08:20 +0200, Tom wrote:
> (This question may not be OCaml specific,

you'd be surprised ..

> However, would it be possible to "emulate" cpu registers using
> software? By keeping registers in the main memory, but accessing them
> often enough to keep them in primary cache? That would be quite fast I
> believe...

The technique is called 'boxing'. This is one reason why Ocaml is so fast, when you'd expect the extra dereferences required all the time to be a big penalty. Instead, if the address is used but not the data (eg generic operation) cache is saved compared to an expanded representation.

The cache is loaded if the pointer is dereferenced, and subsequent derefs are effectively free provided only a small number of boxes is opened: there is an extra cost of one word for the address, which is the price of the lazy loading, and is amortised away by generic operations.

This is even faster than one might think because cache can do speculative preload of the pointed at data. [Does Ocaml bother to generate those instructions?]

IMHO, the main purpose of registers is to organise the interleaving of parallel operations (memory reads mainly) based on dependencies. They differ from main memory (and cache) in that they're usually thread local (whereas all the other stuff is shared) so they're expressing coupling between data and flow of control. for example in:

    R1 = a
    R2 = b
    R3 = R1 + R2
    R4 = c
    R5 = d
    R6 = R4 + R5

you'd be mainly wrong to think of these instructions as operating on data. No. Not today. These instructions are chopping up the control flow into parallel threads:

    a   b   c   d
    |   |   |   |
    V   V   V   V
      +       +
      |       |

I think that's the main reason for registers, not memory operands. Registers only need a few bits to name, so the dispatching to functional units is easier to calculate with less hardware.

-- John Skaller <skaller at users dot sf dot net> Felix, successor to C++: http://felix.sf.net ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Caml-list] More registers in modern day CPUs 2007-09-06 6:20 More registers in modern day CPUs Tom 2007-09-06 7:17 ` [Caml-list] " skaller @ 2007-09-06 9:07 ` Richard Jones 2007-09-06 14:55 ` Chris King 2 siblings, 0 replies; 28+ messages in thread From: Richard Jones @ 2007-09-06 9:07 UTC (permalink / raw) To: Tom; +Cc: Caml-list List On Thu, Sep 06, 2007 at 08:20:06AM +0200, Tom wrote: > I was thinking about compiler implementation recently, and figured that it > is difficult to design the compiler for a variable number of hardware > registers - compared for designing a compiler witha fixed number of > registers. > > However, would it be possible to "emulate" cpu registers using software? By > keeping registers in the main memory, but accessing them often enough to > keep them in primary cache? That would be quite fast I believe... You might want to grab a good book on compilers and read about register allocation. Or take a look at this Wikipedia page: http://en.wikipedia.org/wiki/Register_allocation Rich. -- Richard Jones Red Hat ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Caml-list] More registers in modern day CPUs 2007-09-06 6:20 More registers in modern day CPUs Tom 2007-09-06 7:17 ` [Caml-list] " skaller 2007-09-06 9:07 ` Richard Jones @ 2007-09-06 14:55 ` Chris King 2007-09-06 15:17 ` Brian Hurt ` (2 more replies) 2 siblings, 3 replies; 28+ messages in thread From: Chris King @ 2007-09-06 14:55 UTC (permalink / raw) To: Tom; +Cc: Caml-list List On 9/6/07, Tom <tom.primozic@gmail.com> wrote: > However, would it be possible to "emulate" cpu registers using software? By > keeping registers in the main memory, but accessing them often enough to > keep them in primary cache? That would be quite fast I believe... This makes me wonder... why have registers to begin with? I wonder how feasible a chip with a, say, 256-byte "register-level" cache would be. - Chris ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Caml-list] More registers in modern day CPUs 2007-09-06 14:55 ` Chris King @ 2007-09-06 15:17 ` Brian Hurt 2007-09-06 15:54 ` Harrison, John R 2007-09-06 20:48 ` [Caml-list] More registers in modern day CPUs Richard Jones [not found] ` <20070906204524.GB10798@furbychan.cocan.org> 2 siblings, 1 reply; 28+ messages in thread From: Brian Hurt @ 2007-09-06 15:17 UTC (permalink / raw) To: Chris King; +Cc: Tom, Caml-list List [-- Attachment #1: Type: text/plain, Size: 1824 bytes --] Chris King wrote: >On 9/6/07, Tom <tom.primozic@gmail.com> wrote: > > >>However, would it be possible to "emulate" cpu registers using software? By >>keeping registers in the main memory, but accessing them often enough to >>keep them in primary cache? That would be quite fast I believe... >> >> > >This makes me wonder... why have registers to begin with? I wonder >how feasible a chip with a, say, 256-byte "register-level" cache would >be. > > Such chips exist. The Itanium is one example. The problem is gate delays. The purpose of registers is to be faster than L1 cache (which typically has a 2-3 clock delay associated with it). But the more registers you have, the more gate delays you need to read or write registers- the naive implementation takes O(log N) gate delays to access O(N) registers- reality is more complicated than this. But the rule more registers = more gate delays holds true. And these gate delays translate into a slower chip (one way or another- either you have to lower your clock rate or add more pipeline stages or both to deal with the larger register cache). Of course, more registers make compilers happy, and lowers pressure on the cache bandwidth (as the compiler doesn't need to spill/refill registers quite so often). This is why the 64-bit x86 is generally faster than the 32-bit x86- going from 8 (6 in practice) to 16 (14 in practice) registers was a big step up. 
The Itanium has a large enough register set that its performance is probably getting hurt by it, but it's hard to tell with everything else going on. The sweet spot for register sets seems to be in the 16-64 range: fewer than that, and you're being hurt by the increased memory pressure; more than that, and you're probably being hurt by the slower register addressing. Brian [-- Attachment #2: Type: text/html, Size: 2396 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
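Brian's "O(log N) gate delays to access O(N) registers" estimate can be made concrete with a small calculation. The sketch below assumes the naive model he mentions, a binary tree of 2-to-1 multiplexers selecting one register, and counts the levels of that tree. The function name `mux_depth` is made up for this illustration, and real register files are built differently, as he notes.

```ocaml
(* Levels of 2-to-1 multiplexers needed to select one of [n] registers in a
   naive binary selection tree, i.e. ceil (log2 n). Illustrative model only. *)
let mux_depth n =
  if n < 1 then invalid_arg "mux_depth";
  let rec go depth capacity =
    if capacity >= n then depth else go (depth + 1) (capacity * 2)
  in
  go 0 1

let () =
  List.iter
    (fun n -> Printf.printf "%3d registers -> %d mux levels\n" n (mux_depth n))
    [ 8; 16; 64; 128 ]
```

Under this model, going from 8 to 16 registers (32-bit to 64-bit x86) adds one level, while a 128-entry file such as Itanium's needs seven.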
* RE: [Caml-list] More registers in modern day CPUs 2007-09-06 15:17 ` Brian Hurt @ 2007-09-06 15:54 ` Harrison, John R 2007-09-06 17:10 ` David MENTRE 0 siblings, 1 reply; 28+ messages in thread From: Harrison, John R @ 2007-09-06 15:54 UTC (permalink / raw) To: Caml-list List [-- Attachment #1: Type: text/plain, Size: 982 bytes --] Chris King wrote: | This makes me wonder... why have registers to begin with? I wonder how | feasible a chip with a, say, 256-byte "register-level" cache would be. and Brian Hurt said: | Such chips exist. The Itanium is one example. The Itanium is indeed an example of an architecture with a relatively large number of registers, and where the register file has certain memory-like features such as automatic indexing offsets. But as I understood it, Chris was proposing the opposite: have few or no registers, and rely on main memory instead, with some extra fast inner level cache to speed it up. Both the old Inmos Transputer and the more recent IBM/Sony/Toshiba Cell processor have/had a dedicated area of fast memory, rather like a giant memory-based register file. In each case this is explicitly visible to user-level software rather than being a cache in the usual sense. John. [-- Attachment #2: Type: text/html, Size: 5883 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Caml-list] More registers in modern day CPUs 2007-09-06 15:54 ` Harrison, John R @ 2007-09-06 17:10 ` David MENTRE 2007-09-06 18:27 ` Harrison, John R 2007-09-06 18:28 ` Christophe Raffalli 0 siblings, 2 replies; 28+ messages in thread From: David MENTRE @ 2007-09-06 17:10 UTC (permalink / raw) To: Harrison, John R; +Cc: Caml-list List Hello, "Harrison, John R" <john.r.harrison@intel.com> writes: > Both the old Inmos Transputer and the the more recent IBM/Sony/Toshiba > Cell processor have/had a dedicated area of fast memory, rather like a > giant memory-based register file. The Cell SPE has 128 registers of 128 bits. http://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/FC857AE550F7EB83872571A80061F788/$file/CBE_Tutorial_v2.1_1March2007.pdf "Synergistic Processor Elements (SPEs) The eight SPEs are SIMD processors optimized for data-rich operations allocated to them by the PPE. Each of these identical elements contains a RISC core, 256-KB, software-controlled local store for instructions and data, and a large (128-bit, 128-entry) unified register file." Yours, d. -- GPG/PGP key: A3AD7A2A David MENTRE <dmentre@linux-france.org> 5996 CC46 4612 9CA4 3562 D7AC 6C67 9E96 A3AD 7A2A ^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: [Caml-list] More registers in modern day CPUs 2007-09-06 17:10 ` David MENTRE @ 2007-09-06 18:27 ` Harrison, John R 2007-09-06 18:28 ` Christophe Raffalli 1 sibling, 0 replies; 28+ messages in thread From: Harrison, John R @ 2007-09-06 18:27 UTC (permalink / raw) To: David MENTRE; +Cc: Caml-list List | > Both the old Inmos Transputer and the the more recent IBM/Sony/Toshiba | > Cell processor have/had a dedicated area of fast memory, rather like a | > giant memory-based register file. | | The Cell SPE has 128 registers of 128 bits. Yes, but I was referring to the "256-KB software controlled local store" rather than the actual register file. I didn't mean to imply that the Cell has few actual registers. (Though the transputer does, in fact.) John. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Caml-list] More registers in modern day CPUs 2007-09-06 17:10 ` David MENTRE 2007-09-06 18:27 ` Harrison, John R @ 2007-09-06 18:28 ` Christophe Raffalli 2007-09-06 18:48 ` Brian Hurt 2007-09-06 18:48 ` Pal-Kristian Engstad 1 sibling, 2 replies; 28+ messages in thread From: Christophe Raffalli @ 2007-09-06 18:28 UTC (permalink / raw) To: David MENTRE; +Cc: Harrison, John R, Caml-list List [-- Attachment #1: Type: text/plain, Size: 985 bytes --] David MENTRE wrote: > Hello, > > "Harrison, John R" <john.r.harrison@intel.com> writes: > > >> Both the old Inmos Transputer and the the more recent IBM/Sony/Toshiba >> Cell processor have/had a dedicated area of fast memory, rather like a >> giant memory-based register file. >> > > The Cell SPE has 128 registers of 128 bits. > > http://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/FC857AE550F7EB83872571A80061F788/$file/CBE_Tutorial_v2.1_1March2007.pdf > > "Synergistic Processor Elements (SPEs) The eight SPEs are SIMD > processors optimized for data-rich operations allocated to them by the > PPE. Each of these identical elements contains a RISC core, 256-KB, > software-controlled local store for instructions and data, and a large > (128-bit, 128-entry) unified register file." > > > Yours, > d. > And apart from the playstation III (under linux for sure ;-), what kind of not too expensive computer can we buy with Cell Processors inside ? Regards, C. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Caml-list] More registers in modern day CPUs 2007-09-06 18:28 ` Christophe Raffalli @ 2007-09-06 18:48 ` Brian Hurt 2007-09-06 18:48 ` Pal-Kristian Engstad 1 sibling, 0 replies; 28+ messages in thread From: Brian Hurt @ 2007-09-06 18:48 UTC (permalink / raw) To: Christophe Raffalli; +Cc: Caml-list List [-- Attachment #1: Type: text/plain, Size: 251 bytes --] Christophe Raffalli wrote: > >And apart from the playstation III (under linux for sure ;-), what kind >of not too expensive computer >can we buy with Cell Processors inside ? > > > At that, they're cheaper than the Itanics. Er, Itaniums. Brian [-- Attachment #2: Type: text/html, Size: 617 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Caml-list] More registers in modern day CPUs 2007-09-06 18:28 ` Christophe Raffalli 2007-09-06 18:48 ` Brian Hurt @ 2007-09-06 18:48 ` Pal-Kristian Engstad 2007-11-20 15:32 ` [Caml-list] OCalm on Sony PS3 (was Re: More registers in modern day CPUs) Mike Hogan 1 sibling, 1 reply; 28+ messages in thread From: Pal-Kristian Engstad @ 2007-09-06 18:48 UTC (permalink / raw) To: Christophe Raffalli; +Cc: David MENTRE, Harrison, John R, Caml-list List Hi, IBM sells their IBM BladeCenter QS20 blade for around $20,000, which may be a bit much for most people. Instead, why not install Linux on the PS3? Or buy 3 or 4, for the price of one "gaming PC"? For instance, http://www.youtube.com/watch?v=oLte5f34ya8 Thanks, PKE. Christophe Raffalli wrote: > David MENTRE a écrit : > >> Hello, >> >> "Harrison, John R" <john.r.harrison@intel.com> writes: >> >> >> >>> Both the old Inmos Transputer and the the more recent IBM/Sony/Toshiba >>> Cell processor have/had a dedicated area of fast memory, rather like a >>> giant memory-based register file. >>> >>> >> The Cell SPE has 128 registers of 128 bits. >> >> http://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/FC857AE550F7EB83872571A80061F788/$file/CBE_Tutorial_v2.1_1March2007.pdf >> >> "Synergistic Processor Elements (SPEs) The eight SPEs are SIMD >> processors optimized for data-rich operations allocated to them by the >> PPE. Each of these identical elements contains a RISC core, 256-KB, >> software-controlled local store for instructions and data, and a large >> (128-bit, 128-entry) unified register file." >> >> >> Yours, >> d. >> >> > And apart from the playstation III (under linux for sure ;-), what kind > of not too expensive computer > can we buy with Cell Processors inside ? > > Regards, > C. > > _______________________________________________ > Caml-list mailing list. 
Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > -- Pål-Kristian Engstad (engstad@naughtydog.com), Lead Graphics & Engine Programmer, Naughty Dog, Inc., 1601 Cloverfield Blvd, 6000 North, Santa Monica, CA 90404, USA. Ph.: (310) 633-9112. "Most of us would do well to remember that there is a reason Carmack is Carmack, and we are not Carmack.", Jonathan Blow, 2/1/2006, GD Algo Mailing List ^ permalink raw reply [flat|nested] 28+ messages in thread
* [Caml-list] OCalm on Sony PS3 (was Re: More registers in modern day CPUs) 2007-09-06 18:48 ` Pal-Kristian Engstad @ 2007-11-20 15:32 ` Mike Hogan 2007-11-21 17:20 ` Richard Jones 2007-12-02 10:14 ` [Caml-list] OCalm " Xavier Leroy 0 siblings, 2 replies; 28+ messages in thread From: Mike Hogan @ 2007-11-20 15:32 UTC (permalink / raw) To: caml-list I have recently compiled OCaml 3.10 for the PS3 running Yellow Dog Linux. Seems to work fine, but I haven't tested it rigorously (and at this point, I wouldn't even know how to test it ... um ...what's the opposite of "rigorously"? ... non-rigorously?) At any rate, I would be interested in learning a little more about how to build an open source item like this for a particular platform and then contribute back to the community (i.e. how to test to standards for this community, how to create RPMs and where to post them etc.). I'd also be interested in any ideas for starting to explore whether/how the Cell BE's power can be exploited using OCaml (hopefully simple ideas at the outset, I'm a newb on several fronts here). Thanks, mike hogan Pal-Kristian Engstad wrote: > > Hi, > > IBM sells their IBM BladeCenter QS20 blade for around $20,000, which may > be a bit much for most people. Instead, why not install Linux on the > PS3? Or buy 3 or 4, for the price of one "gaming PC"? For instance, > http://www.youtube.com/watch?v=oLte5f34ya8 > > Thanks, > > PKE. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Caml-list] OCalm on Sony PS3 (was Re: More registers in modern day CPUs) 2007-11-20 15:32 ` [Caml-list] OCalm on Sony PS3 (was Re: More registers in modern day CPUs) Mike Hogan @ 2007-11-21 17:20 ` Richard Jones 2007-11-21 19:05 ` [Caml-list] OCaml " Mike Hogan 2007-11-23 6:44 ` Mike Hogan 2007-12-02 10:14 ` [Caml-list] OCalm " Xavier Leroy 1 sibling, 2 replies; 28+ messages in thread From: Richard Jones @ 2007-11-21 17:20 UTC (permalink / raw) To: Mike Hogan; +Cc: caml-list On Tue, Nov 20, 2007 at 07:32:34AM -0800, Mike Hogan wrote: > I have recently compiled OCaml 3.10 for the PS3 running Yellow Dog Linux. > Seems to work fine, but I haven't tested it rigorously (and at this point, I > wouldn't even know how to test it ... um ...what's the opposite of > "rigorously"? ... non-rigorously?) Native compiler? 64 bits?? Which version of OCaml??? Rich. -- Richard Jones Red Hat ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Caml-list] OCaml on Sony PS3 (was Re: More registers in modern day CPUs) 2007-11-21 17:20 ` Richard Jones @ 2007-11-21 19:05 ` Mike Hogan 2007-11-23 6:44 ` Mike Hogan 1 sibling, 0 replies; 28+ messages in thread From: Mike Hogan @ 2007-11-21 19:05 UTC (permalink / raw) To: caml-list I'll try to check out the details tonight, but I have to confess that I am a newb's newb -- never compiled a line of open source in my life until about a week-and-a-half ago. Offhand, I'm not sure about the version beyond the fact that it's "3.10" (something?). It is some labeled version (not the development trunk) and I'm pretty sure that I built it as plain PPC and in byte-code interpreted mode (i.e. there was no ocamlopt after the build). I did end up with an ocamlc.opt and actually copied ocamlc.opt to "ocamlopt" in order to build Coq 8.1pl2 (there seems to be a problem w/ the builds in Coq under the "opt=byte" option where it insists on using ocamlopt in some cases, despite "opt=byte" option being asserted). In light of your question, I'm hoping that I can manage to improve the builds for the Cell BE w/o too much trouble (maybe by following the pattern of some other architectures in the build?). PPC native would be great, ppc64 would be fantastic. As an aside, Coqide seems to run proofs noticeably faster on my PS3 than on my XP laptop (1.86GHz Centrino), even though the PS3 is built in byte-interpreted mode (I'm presuming that Coq on Windows uses a native build). That was a nice surprise. My larger goal is to try to use camlp4 as a way to generate highly parallel Cell SPU code -- kind of modeled after CorePy's "synthetic programming" idea. Hopefully any lack of a native build for PS3 won't be a roadblock for this. Richard Jones-4 wrote: > > On Tue, Nov 20, 2007 at 07:32:34AM -0800, Mike Hogan wrote: >> I have recently compiled OCaml 3.10 for the PS3 running Yellow Dog Linux. >> Seems to work fine, but I haven't tested it rigorously (and at this >> point, I >> wouldn't even know how to test it ... um ...what's the opposite of >> "rigorously"? ... non-rigorously?) > > Native compiler? 64 bits?? Which version of OCaml??? > > Rich. > > -- > Richard Jones > Red Hat ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Caml-list] OCaml on Sony PS3 (was Re: More registers in modern day CPUs) 2007-11-21 17:20 ` Richard Jones 2007-11-21 19:05 ` [Caml-list] OCaml " Mike Hogan @ 2007-11-23 6:44 ` Mike Hogan 1 sibling, 0 replies; 28+ messages in thread From: Mike Hogan @ 2007-11-23 6:44 UTC (permalink / raw) To: caml-list Ok, it's 3.10.0 When I did ./configure -host powerpc-ydl-linux the resulting report seemed to suggest that it was going to build native 32-bit, but native tools like ocamlopt were not present post-build. At this point I'm too inexperienced to unravel the mysteries behind the less-than-ideal result. Richard Jones-4 wrote: > > On Tue, Nov 20, 2007 at 07:32:34AM -0800, Mike Hogan wrote: >> I have recently compiled OCaml 3.10 for the PS3 running Yellow Dog Linux. >> Seems to work fine, but I haven't tested it rigorously (and at this >> point, I >> wouldn't even know how to test it ... um ...what's the opposite of >> "rigorously"? ... non-rigorously?) > > Native compiler? 64 bits?? Which version of OCaml??? > > Rich. > > -- > Richard Jones > Red Hat ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Caml-list] OCalm on Sony PS3 (was Re: More registers in modern day CPUs) 2007-11-20 15:32 ` [Caml-list] OCalm on Sony PS3 (was Re: More registers in modern day CPUs) Mike Hogan 2007-11-21 17:20 ` Richard Jones @ 2007-12-02 10:14 ` Xavier Leroy 2007-12-02 16:22 ` Mike Hogan 2007-12-04 2:29 ` [Caml-list] OCalm on Sony PS3 (was Re: More registers in modern day CPUs) Gordon Henriksen 1 sibling, 2 replies; 28+ messages in thread From: Xavier Leroy @ 2007-12-02 10:14 UTC (permalink / raw) To: Mike Hogan; +Cc: caml-list > I have recently compiled OCaml 3.10 for the PS3 running Yellow Dog Linux. > Seems to work fine, but I haven't tested it rigorously (and at this point, I > wouldn't even know how to test it ... um ...what's the opposite of > "rigorously"? ... non-rigorously?) I confirm that OCaml compiles correctly on the PS/3 with YDL. The native-code compiler works fine (in 32-bit mode) provided it's configured with -host powerpc-unknown-linux. (Autodetection reports powerpc64-unknown-linux, even though the default compilation mode on this distro is 32-bit; I'll hack the configure script to work around this issue.) Of course, the generated code runs on the PPC core of the Cell processor, not on the SPU cores. Performance is unimpressive: about 1/5th of that of a recent Intel Core2 processor. > I'd also be interested in any ideas for starting to explore whether/how the > Cell BE's power can be exploited using OCaml (hopefully simple ideas at the > outset, I'm a newb on several fronts here). The SPU cores only have 256 KB of local memory, so there is no hope to run a Caml run-time system on them. For some applications (linear algebra, bignums), it might be possible to link with C libraries that offload work to the SPU cores. A more general but extremely difficult approach is two-level programming, where the Caml program, running on the PPC core, generates programs in a simple data-parallel language which is then compiled on the fly to SPU code. 
Such an approach could also target graphics coprocessors (the "GPGPU" approach). But I have no idea what such an intermediate language would look like. - Xavier Leroy ^ permalink raw reply [flat|nested] 28+ messages in thread
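Xavier leaves open what the intermediate language for two-level programming would look like. Purely as a toy illustration of the shape of the idea, and not as a proposal from the thread, the sketch below defines a hypothetical element-wise expression type, a reference interpreter that runs on the host, and an `emit` function that prints a C-like loop body as a stand-in for a real SPU or GPU code generator. All names (`expr`, `run`, `emit`, `saxpy`) are invented for this example.

```ocaml
(* Hypothetical element-wise expression language; not an existing library. *)
type expr =
  | Input of int          (* i-th input vector *)
  | Const of float
  | Add of expr * expr
  | Mul of expr * expr

(* Reference interpreter on the host: value of the expression at index [i]. *)
let rec eval_at (inputs : float array array) (i : int) = function
  | Input k -> inputs.(k).(i)
  | Const c -> c
  | Add (a, b) -> eval_at inputs i a +. eval_at inputs i b
  | Mul (a, b) -> eval_at inputs i a *. eval_at inputs i b

(* Apply the expression independently to each of the [n] elements. *)
let run inputs n e = Array.init n (fun i -> eval_at inputs i e)

(* Stand-in for a code generator: prints a C-like loop body. A real back end
   would have to emit and load SPU or GPU code instead. *)
let rec emit = function
  | Input k -> Printf.sprintf "in%d[i]" k
  | Const c -> Printf.sprintf "%g" c
  | Add (a, b) -> Printf.sprintf "(%s + %s)" (emit a) (emit b)
  | Mul (a, b) -> Printf.sprintf "(%s * %s)" (emit a) (emit b)

(* 2 * x + y, element by element *)
let saxpy = Add (Mul (Const 2.0, Input 0), Input 1)

let result = run [| [| 1.0; 2.0 |]; [| 10.0; 20.0 |] |] 2 saxpy

let () = print_endline (emit saxpy)
```

Because each element is computed independently, the same expression tree could in principle be split across several SPUs; the hard parts Xavier alludes to (code generation, data transfer to the local stores, scheduling) are exactly what this sketch omits.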
* Re: [Caml-list] OCalm on Sony PS3 (was Re: More registers in modern day CPUs) 2007-12-02 10:14 ` [Caml-list] OCalm " Xavier Leroy @ 2007-12-02 16:22 ` Mike Hogan 2007-12-02 22:19 ` Konrad Meyer 2007-12-04 2:29 ` [Caml-list] OCalm on Sony PS3 (was Re: More registers in modern day CPUs) Gordon Henriksen 1 sibling, 1 reply; 28+ messages in thread From: Mike Hogan @ 2007-12-02 16:22 UTC (permalink / raw) To: caml-list Xavier Leroy wrote: > > I confirm that OCaml compiles correctly on the PS/3 with YDL. The > native-code compiler works fine (in 32-bit mode) provided it's > configured with -host powerpc-unknown-linux. (Autodetection reports > powerpc64-unknown-linux, even though the default compilation mode on > this distro is 32-bit; I'll hack the configure script to work around > this issue.) > Nice -- I'll try the "host powerpc-unknown-linux" option. Xavier Leroy wrote: > > A more general but extremely difficult approach is two-level > programming, where the Caml program, running on the PPC core, > generates programs in a simple data-parallel language which is then > compiled on the fly to SPU code. > This is exactly what I would like to do. There is a Python Extension for the PS3 SPU's called "CorePy" that can be used to more-or-less directly generate assembly instructions for the PPC, its associated AltiVec and the SPUs. In essence, CorePy makes a class for each particular processor on your system and this class has processor-specific instructions as methods. The extensions take care of the details for loading the code, binding between the Python interpreter and the assembler that was generated on the fly etc. Xavier Leroy wrote: > > Such an approach could also target > graphics coprocessors (the "GPGPU" approach). But I have no idea what > such an intermediate language would look like. > This would actually push the system's abilities up by an order of magnitude in some cases, but unfortunately the "Other OS" hypervisor on the PS3 bars access to the GPU. 
It's a shame, since the PS3 GPU is supposed to be one of NVIDIA's hottest chips. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Caml-list] OCalm on Sony PS3 (was Re: More registers in modern day CPUs) 2007-12-02 16:22 ` Mike Hogan @ 2007-12-02 22:19 ` Konrad Meyer 2007-12-03 0:09 ` [Caml-list] OCaml " Mike Hogan 0 siblings, 1 reply; 28+ messages in thread From: Konrad Meyer @ 2007-12-02 22:19 UTC (permalink / raw) To: caml-list [-- Attachment #1: Type: text/plain, Size: 569 bytes --] Quoth Mike Hogan: > This would actually push the system's abilities up by an order of magnitude > in some cases, but unfortunately the "Other OS" hypervisor on the PS3 bars > access to the GPU. It's a shame, since the PS3 GPU is supposed to be one of > NVIDIA's hottest chips. Actually, (and I don't know much about it, sorry) there's a group of folks over at ps2dev.org trying to get at the GPU. Just thought I'd share. < http://forums.ps2dev.org/viewtopic.php?t=8364 > Regards, -- Konrad Meyer <konrad@tylerc.org> http://konrad.sobertillnoon.com/ [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Caml-list] OCaml on Sony PS3 (was Re: More registers in modern day CPUs) 2007-12-02 22:19 ` Konrad Meyer @ 2007-12-03 0:09 ` Mike Hogan 2007-12-03 20:16 ` minithread (was OCaml on Sony PS3) Christophe Raffalli 0 siblings, 1 reply; 28+ messages in thread From: Mike Hogan @ 2007-12-03 0:09 UTC (permalink / raw) To: caml-list Interesting ... but a little rough for my tastes. I'm always amazed at how determined some folks are to hack their way into stuff. I think that a PC with a decent GeForce GPU and the CUDA library might be the easier route for OCaml -> GPGPU oriented experiments. BTW, the idea of an OCaml based DSL for the Cell processor or various GPUs is a proposed summer intern project at Jane St. Capital's site (http://osp2007.janestcapital.com/suggested-projects/), so there seems to be an audience for this kind of stuff. In fact, GPGPU in general seems like an incredibly hot topic right now and NVIDIA's support by way of the CUDA architecture is kind of an interesting development. Konrad Meyer-2 wrote: > > Quoth Mike Hogan: >> This would actually push the system's abilities up by an order of >> magnitude >> in some cases, but unfortunately the "Other OS" hypervisor on the PS3 >> bars >> access to the GPU. It's a shame, since the PS3 GPU is supposed to be one >> of >> NVIDIA's hottest chips. > > Actually, (and I don't know much about it, sorry) there's a group of folks > over at ps2dev.org trying to get at the GPU. Just thought I'd share. > > < http://forums.ps2dev.org/viewtopic.php?t=8364 > > > Regards, > -- > Konrad Meyer <konrad@tylerc.org> http://konrad.sobertillnoon.com/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* minithread (was OCaml on Sony PS3) 2007-12-03 0:09 ` [Caml-list] OCaml " Mike Hogan @ 2007-12-03 20:16 ` Christophe Raffalli 2007-12-04 14:25 ` [Caml-list] " David MENTRE ` (3 more replies) 0 siblings, 4 replies; 28+ messages in thread From: Christophe Raffalli @ 2007-12-03 20:16 UTC (permalink / raw) Cc: caml-list

I propose the following idea for OCaml on the Cell PowerPC or on multicore machines. (This is just an idea; there may be a lot of things I did not see. In other words, there is probably a lot of work to do, but maybe not too much.)

- Create two functions and one data type to start a "mini-thread":

    type 'a result_channel
    launch : int -> ('a -> 'b) -> 'a -> 'b result_channel
    get_result : 'b result_channel list -> 'b option

  (or many similar functions to wait, with or without blocking, for the result of one or more mini-threads).

Now the point is this: each mini-thread has its own minor heap, whose size is given as the first argument, with the following restrictions:

1) The minor heap is used as a cache: an access to the major heap copies the data into the minor heap. One needs to mix the copying minor GC with a standard caching algorithm.

2) To ease task 1), mutation by a mini-thread of data in the heaps of the main thread is illegal (raise an exception in the main thread? static check?). This includes the arguments of the mini-thread.

3) A mini-thread cannot start another mini-thread (raise an exception in the main thread? static check?).

4) 2) and 3) imply that a mini-thread cannot access the data of other mini-threads, and that the only way for the main thread to get values from a mini-thread is via its 'b result_channel. Thus, if you have a main thread M and many mini-threads T1 ... TN running, Ti can only access its own data and the data of M (read only), and M cannot access the data of T1 ... TN.

If you launch one mini-thread per SPU or core with a minor heap of the correct size, and you fine-tune your application so that it does not produce too many cache misses, then I think this simple model could be useful.

Cheers,
Christophe

--
Christophe Raffalli
Universite de Savoie
Batiment Le Chablais, bureau 21
73376 Le Bourget-du-Lac Cedex
tel: (33) 4 79 75 81 03
fax: (33) 4 79 75 87 42
mail: Christophe.Raffalli@univ-savoie.fr
www: http://www.lama.univ-savoie.fr/~RAFFALLI

^ permalink raw reply [flat|nested] 28+ messages in thread
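The interface proposed above can be approximated on a present-day runtime. What follows is only a sketch, assuming OCaml 5 (where `Domain` and `Atomic` are in the standard library). The record fields `slot` and `worker` and the blocking helper `wait` are names made up for the illustration. The heap-size argument is accepted but ignored, and none of restrictions 1) to 4) is enforced, so this shows the shape of the API rather than the proposed memory model.

```ocaml
(* Requires OCaml 5: Domain and Atomic come from the standard library. *)

type 'a result_channel = {
  slot : 'a option Atomic.t;  (* filled exactly once by the worker *)
  worker : unit Domain.t;     (* kept so that the caller can block on it *)
}

(* The first argument mirrors the proposal's minor-heap size.
   Assumption: it is ignored here, a domain uses the runtime's defaults. *)
let launch (_minor_heap_words : int) (f : 'a -> 'b) (x : 'a) : 'b result_channel =
  let slot = Atomic.make None in
  let worker = Domain.spawn (fun () -> Atomic.set slot (Some (f x))) in
  { slot; worker }

(* Non-blocking poll: the first result already available, if any.
   The result is not consumed, so polling again returns it again. *)
let get_result (chans : 'b result_channel list) : 'b option =
  List.find_map (fun c -> Atomic.get c.slot) chans

(* Blocking variant (hypothetical helper, not part of the proposal). *)
let wait (c : 'b result_channel) : 'b =
  Domain.join c.worker;
  match Atomic.get c.slot with
  | Some v -> v
  | None -> failwith "worker finished without storing a result"
```

A caller would write something like `let c = launch 4096 (fun n -> n * n) 7 in wait c`. Passing results only through the channel is the one part of the proposal this sketch keeps; the read-only view of the main heap would need support from the runtime or the type system.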
* Re: [Caml-list] minithread (was OCaml on Sony PS3) 2007-12-03 20:16 ` minithread (was OCaml on Sony PS3) Christophe Raffalli @ 2007-12-04 14:25 ` David MENTRE 2007-12-04 14:37 ` Basile STARYNKEVITCH ` (2 subsequent siblings) 3 siblings, 0 replies; 28+ messages in thread From: David MENTRE @ 2007-12-04 14:25 UTC (permalink / raw) To: Christophe Raffalli; +Cc: caml-list

Hello,

2007/12/3, Christophe Raffalli <Christophe.Raffalli@univ-savoie.fr>:
> If you launch one mini-thread per SPU or core with a minor heap of the
> correct size, and you fine-tune your application so that it does not
> produce too many cache misses, then I think this simple model could be useful.

I may not have completely understood your proposal, but it seems to me that those mini-threads do not solve the issue. In the Cell architecture, the SPUs are *independent* processors. They access main memory through DMA-like operations and do not have a cache. In other words, for your mini-threads to work on an SPU, you need to fit the mini-thread's data, code and environment (e.g. the GC) in 256 KB of memory. As Xavier said, that seems quite difficult if not impossible.

Yours,
david

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Caml-list] minithread (was OCaml on Sony PS3) 2007-12-03 20:16 ` minithread (was OCaml on Sony PS3) Christophe Raffalli 2007-12-04 14:25 ` [Caml-list] " David MENTRE @ 2007-12-04 14:37 ` Basile STARYNKEVITCH 2007-12-04 16:25 ` Mattias Engdegård 2007-12-04 17:33 ` Gerd Stolpmann 2007-12-04 18:00 ` Mike Hogan 3 siblings, 1 reply; 28+ messages in thread From: Basile STARYNKEVITCH @ 2007-12-04 14:37 UTC (permalink / raw) To: Christophe Raffalli; +Cc: caml-list

Christophe Raffalli wrote:
> I propose the following idea for OCaml on the Cell PowerPC or on multicore
> machines. (This is just an idea; there may be a lot of things I did not see.
> In other words, there is probably a lot of work to do, but maybe not too much.)
>
> - Create two functions and one data type to start a "mini-thread":

As David MENTRE explained, this is not very realistic.

However, one of the Cell coprocessors (i.e. an SPU) might be used to implement the OCaml garbage collector.

A copying GC has to move quite a lot of data, and it is possible that the Cell's coprocessors could be useful for that (assuming that they access memory as quickly as the main processor).

I don't know if Gallium has resources for that (I suppose not, except perhaps for an internship?), and I have no idea whether it is easily doable or nearly impossible (maybe the current SPU limitations, in particular code size, are too strong).

Anyway, it might not help performance that much on Cell systems (e.g. the PS3), because the GC probably consumes less than half of the resources. (Damien & Xavier told me recently that the GC usually uses less than 20% of the CPU; the KnuthBendix test case under ocamlopt is unusual in spending about a third of its CPU time there.) The OCaml GC is quite good (a big bravo to Damien Doligez & Xavier Leroy).

I still think that the SPUs on the PS3 are only useful for games or specialized (e.g. graphical) applications.

--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Caml-list] minithread (was OCaml on Sony PS3) 2007-12-04 14:37 ` Basile STARYNKEVITCH @ 2007-12-04 16:25 ` Mattias Engdegård 0 siblings, 0 replies; 28+ messages in thread From: Mattias Engdegård @ 2007-12-04 16:25 UTC (permalink / raw) To: basile; +Cc: Christophe.Raffalli, caml-list >However, (one of the) the CELL coprocessor -eg SPU) might be used to >implemented Ocaml garbage collector. > >A copying GC has to move quite a lot of data, and it could be possible >that CELL's coprocessors could be useful for that (assuming that they >access memory as quickly as the processor). They don't so it isn't, and doing GC by a coprocessor that cannot directly access the memory it manages does not sound very practical. The PPE has the memories of all SPUs mapped into its physical address space, so it could possibly do the GC for them. But again, given the limited amount of SPU-private memory, it would probably not be a useful approach. Better use of the SPUs would be to run computations that can use manual memory management (perhaps not using a heap at all), operating on small chunks of data at a time. Such computations could be described in a simpler language that is more amenable to parallelisation. >I still think that SPU on PS3 are only useful for games, or specialized >(e.g. graphical) applications. Maybe, but there are cell blades with more reasonable amounts of memory, and for experimentation regarding how to use the processor, a PS3 goes quite far and is very economical. Ground-breaking science has been made in less than 256 MB. ^ permalink raw reply [flat|nested] 28+ messages in thread
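The style of computation described above (no heap growth, small chunks of data at a time) can be illustrated with a short sketch. It is only a model under stated assumptions: the single preallocated buffer stands in for SPU local memory, the two `Array.blit` calls stand in for DMA transfers, and the names `process_in_chunks`, `chunk` and `kernel` are invented for the example. Nothing here talks to real Cell hardware.

```ocaml
(* Process [data] in place, [chunk] elements at a time, through one
   fixed-size working buffer that is allocated once and then reused. *)
let process_in_chunks ~chunk (kernel : float array -> int -> unit)
    (data : float array) : unit =
  if chunk <= 0 then invalid_arg "process_in_chunks: chunk must be positive";
  let buf = Array.make chunk 0.0 in        (* stands in for the local store *)
  let n = Array.length data in
  let pos = ref 0 in
  while !pos < n do
    let len = min chunk (n - !pos) in
    Array.blit data !pos buf 0 len;        (* "DMA in" from main memory *)
    kernel buf len;                        (* compute on the first [len] cells *)
    Array.blit buf 0 data !pos len;        (* "DMA out" back to main memory *)
    pos := !pos + len
  done

(* Example kernel: double every element of the current chunk. *)
let double_chunk (buf : float array) (len : int) : unit =
  for i = 0 to len - 1 do
    buf.(i) <- buf.(i) *. 2.0
  done
```

Because the loop body allocates nothing, the memory footprint is fixed by `chunk` alone, which is the property that matters when the working set has to fit in a small private memory.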
* Re: [Caml-list] minithread (was OCaml on Sony PS3) 2007-12-03 20:16 ` minithread (was OCaml on Sony PS3) Christophe Raffalli 2007-12-04 14:25 ` [Caml-list] " David MENTRE 2007-12-04 14:37 ` Basile STARYNKEVITCH @ 2007-12-04 17:33 ` Gerd Stolpmann 2007-12-04 18:00 ` Mike Hogan 3 siblings, 0 replies; 28+ messages in thread From: Gerd Stolpmann @ 2007-12-04 17:33 UTC (permalink / raw) To: Christophe Raffalli; +Cc: caml-list Am Montag, den 03.12.2007, 21:16 +0100 schrieb Christophe Raffalli: > Now the point is this: each mini-thread has its own minor-heap whose > size is given as the first argument with the following restrictions: What could work is that you really switch to a copying collector. That means that there are two minor heaps of fixed size, and a minor GC copies one heap to the other. While one heap is used, the other is unused. Every coprocessor would have such a pair of heaps. Of course, this means that: - You have very limited memory, and you have to set its max size in advance. This heap cannot be extended as needed. But this is ok for a coprocessor. - You waste half of the memory. E.g. if you want to have 64 K of heap, you have to buy 128 K. On the other hand, this saves a lot of code in the OCaml runtime, surely more than 64 K, so this is a net win. - Maybe even this works: The minor GC is done by the main processor, and the other heap is also there. This could work if the GC is not invoked too often. Such a copy collector is very small (the minor_gc.c file in the runtime has less than 300 lines, so you could have a miniature memory manager in only a few K). If you remove most features of the OCaml runtime, there are some chances that it really fits into the remaining memory: no I/O, no generic comparison, no backtraces, no MD5, no lexing, ... You couldn't use these features in the SPU anyway. From the stdlib I would only keep arrays and strings (no lists), and add a communication channel with the main processor. 
Of course, programming in this context is then not much fun. I mean you'll get stack overflows really quickly. Maybe you can run very simplistic algorithms. On the one hand I really have doubts whether it makes sense to run OCaml in such an environment, but on the other hand it's fun to have such a thing at all...

Gerd

> 1) The minor heap is used as a cache: an access to the major heap copies
> the data into the minor heap. One needs to mix the copying
> minor GC with a standard caching algorithm.
>
> 2) To ease task 1), mutation by a mini-thread of data in the heaps of the
> main thread is illegal (raise an exception in the main thread?
> static check?). This includes the arguments of the mini-thread.
>
> 3) A mini-thread cannot start another mini-thread (raise an exception
> in the main thread? static check?).
>
> 4) 2) and 3) imply that a mini-thread cannot access the data of other
> mini-threads, and that the only way for the main thread to
> get values from a mini-thread is via its 'b result_channel. Thus, if
> you have a main thread M and many mini-threads T1 ... TN
> running, Ti can only access its own data and the data of M (read only),
> and M cannot access the data of T1 ... TN.
>
> If you launch one mini-thread per SPU or core with a minor heap of the
> correct size, and you fine-tune your application so that it does not
> produce too many cache misses, then I think this simple model could be useful.
>
> Cheers,
> Christophe

--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------

^ permalink raw reply [flat|nested] 28+ messages in thread
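The two-heap scheme described above (two fixed-size spaces, with a collection copying live data from one into the other) can be shown on a toy heap. This is a sketch of the general Cheney-style copying technique, not the code in the OCaml runtime's minor_gc.c. The types `value` and `cell`, the field names `active`, `spare` and `free`, and the exception `Heap_full` are all invented for the illustration.

```ocaml
type value = Int of int | Ptr of int  (* Ptr a: block whose header is at index a *)

type cell =
  | Header of int   (* number of fields that follow *)
  | Field of value
  | Forward of int  (* block already copied; new address *)
  | Free

type heap = {
  mutable active : cell array;  (* space used for allocation *)
  mutable spare : cell array;   (* unused until the next collection *)
  mutable free : int;           (* next free index in [active] *)
}

exception Heap_full

let create size =
  { active = Array.make size Free; spare = Array.make size Free; free = 0 }

(* Bump-pointer allocation of a block with the given fields. *)
let alloc h fields =
  let n = Array.length fields in
  if h.free + n + 1 > Array.length h.active then raise Heap_full;
  let addr = h.free in
  h.active.(addr) <- Header n;
  Array.iteri (fun i v -> h.active.(addr + 1 + i) <- Field v) fields;
  h.free <- addr + n + 1;
  addr

(* Copy everything reachable from [roots] into the spare space, then swap. *)
let collect h (roots : value array) =
  let next = ref 0 in  (* allocation pointer in the spare space *)
  let forward v =
    match v with
    | Int _ -> v
    | Ptr a ->
      (match h.active.(a) with
       | Forward a' -> Ptr a'
       | Header n ->
         let a' = !next in
         Array.blit h.active a h.spare a' (n + 1);
         next := a' + n + 1;
         h.active.(a) <- Forward a';  (* leave a forwarding address behind *)
         Ptr a'
       | Field _ | Free -> invalid_arg "collect: pointer is not a block header")
  in
  Array.iteri (fun i v -> roots.(i) <- forward v) roots;
  let scan = ref 0 in
  while !scan < !next do  (* copied blocks serve as the work queue *)
    (match h.spare.(!scan) with
     | Field v -> h.spare.(!scan) <- Field (forward v)
     | Header _ | Forward _ | Free -> ());
    incr scan
  done;
  let old = h.active in
  h.active <- h.spare;
  h.spare <- old;
  Array.fill h.spare 0 (Array.length h.spare) Free;
  h.free <- !next
```

The cost noted in the message is visible in `create`: two arrays of `size` cells are reserved, and only one is usable at any time. In exchange, the collector needs no free list and no mark bits.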
* Re: [Caml-list] minithread (was OCaml on Sony PS3) 2007-12-03 20:16 ` minithread (was OCaml on Sony PS3) Christophe Raffalli ` (2 preceding siblings ...) 2007-12-04 17:33 ` Gerd Stolpmann @ 2007-12-04 18:00 ` Mike Hogan 3 siblings, 0 replies; 28+ messages in thread From: Mike Hogan @ 2007-12-04 18:00 UTC (permalink / raw) To: caml-list

Neat stuff. For anyone genuinely interested in this problem, a look at CorePy may be in order: it is about the simplest model imaginable for processor-specific access, and its Cell interface offers insight into the architectural specifics that would need to be addressed for the Cell.

Not knowing Caml at all, really, I wonder if a similar approach can be applied to Caml: basically "escape" to specialized SPU code wrapped in an encapsulation (a la "mini-thread"?). If the SPU code can be autogenerated and transparently integrated using a Caml-based DSL of some sort, then that would be even better.

I was also wondering recently if there is any practical possibility of code extraction from Coq to Python in order to make a verified CorePy application (basically CorePy as an IL), or is this just swapping one very difficult problem for another? Coq (which also seems to run fine on the PS3) seems to open up some interesting possibilities, for example optimization proofs where some algorithm "X", converted to a high-performance Cell-specific equivalent using some magic transformation "Y", results in the algorithm "Cell-X" whose results are equivalent to the original "X". Furthermore, the characteristics of the "Y"s developed along the way would seem to provide formal insight into what a "CoreCaml" language might entail.

As an aside, would the Y's be functors, "Cell" be a domain, and the inverse of any A from "Cell" be the domain "algorithms that can be transformed to "Cell" equivalents using functor "A""? (Apologies in advance if this is a stupid question beneath an answer.)
Christophe Raffalli-2 wrote:
>
> I propose the following idea for OCaml on the Cell PowerPC or on multicore
> machines. (This is just an idea; there may be a lot of things I did not see.
> In other words, there is probably a lot of work to do, but maybe not too much.)
>
> - Create two functions and one data type to start a "mini-thread":
>
>     type 'a result_channel
>     launch : int -> ('a -> 'b) -> 'a -> 'b result_channel
>     get_result : 'b result_channel list -> 'b option
>
>   (or many similar functions to wait, with or without blocking, for the
>   result of one or more mini-threads).
>
> Now the point is this: each mini-thread has its own minor heap, whose
> size is given as the first argument, with the following restrictions:
>
> 1) The minor heap is used as a cache: an access to the major heap copies
> the data into the minor heap. One needs to mix the copying
> minor GC with a standard caching algorithm.
>
> 2) To ease task 1), mutation by a mini-thread of data in the heaps of the
> main thread is illegal (raise an exception in the main thread?
> static check?). This includes the arguments of the mini-thread.
>
> 3) A mini-thread cannot start another mini-thread (raise an exception
> in the main thread? static check?).
>
> 4) 2) and 3) imply that a mini-thread cannot access the data of other
> mini-threads, and that the only way for the main thread to
> get values from a mini-thread is via its 'b result_channel. Thus, if
> you have a main thread M and many mini-threads T1 ... TN
> running, Ti can only access its own data and the data of M (read only),
> and M cannot access the data of T1 ... TN.
>
> If you launch one mini-thread per SPU or core with a minor heap of the
> correct size, and you fine-tune your application so that it does not
> produce too many cache misses, then I think this simple model could be useful.
>
> Cheers,
> Christophe

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Caml-list] OCalm on Sony PS3 (was Re: More registers in modern day CPUs) 2007-12-02 10:14 ` [Caml-list] OCalm " Xavier Leroy 2007-12-02 16:22 ` Mike Hogan @ 2007-12-04 2:29 ` Gordon Henriksen 1 sibling, 0 replies; 28+ messages in thread From: Gordon Henriksen @ 2007-12-04 2:29 UTC (permalink / raw) To: caml-list

On Dec 2, 2007, at 05:14, Xavier Leroy wrote:

>> I'd also be interested in any ideas for starting to explore whether/
>> how the Cell BE's power can be exploited using OCaml (hopefully
>> simple ideas at the outset, I'm a newb on several fronts here).
>
> The SPU cores only have 256 KB of local memory, so there is no hope
> to run a Caml run-time system on them. For some applications
> (linear algebra, bignums), it might be possible to link with C
> libraries that offload work to the SPU cores.
>
> A more general but extremely difficult approach is two-level
> programming, where the Caml program, running on the PPC core,
> generates programs in a simple data-parallel language which is then
> compiled on the fly to SPU code. Such an approach could also target
> graphics coprocessors (the "GPGPU" approach). But I have no idea
> what such an intermediate language would look like.

Though difficult, this is probably the more practical approach. Statically extracting useful parallel programs is a very difficult task. (Witness the emergence of OpenMP.) It is probably easier for functional programs, but still.

In related news, Aerospace legal recently cleared the CellSPU backend for upstream contribution to LLVM (http://llvm.org/), and its author is finally committing it today. As has been mentioned, it's possible to efficiently build LLVM IR in memory with OCaml. Anyone interested in leveraging SPUs from OCaml in this manner could spool up quickly with LLVM. Since LLVM has solid support for mainstream CPUs, it would be quite possible to write programs that are also portable to standard SMP hardware.
— Gordon ^ permalink raw reply [flat|nested] 28+ messages in thread
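To make the two-level idea quoted above concrete, here is a toy sketch in which an OCaml host program builds a description of a data-parallel computation as a plain data structure. Everything in it is hypothetical: the thread itself says nobody knew what the intermediate language should look like, so the types `scalar`, `kernel` and `reduction` are invented for the example, and a small interpreter stands in for the step that would compile the description to SPU or GPU code (for instance through LLVM).

```ocaml
(* Per-element expressions; X and Y name the current elements of the
   first and second operand array. *)
type scalar =
  | X
  | Y
  | Const of float
  | Add of scalar * scalar
  | Mul of scalar * scalar

(* Whole-array computations. *)
type kernel =
  | Input of int                     (* the n-th input array *)
  | Map of scalar * kernel           (* uses X; Y evaluates to 0.0 *)
  | Zip of scalar * kernel * kernel  (* uses X and Y; equal lengths required *)

type reduction = Sum of kernel

let rec eval_scalar x y = function
  | X -> x
  | Y -> y
  | Const c -> c
  | Add (a, b) -> eval_scalar x y a +. eval_scalar x y b
  | Mul (a, b) -> eval_scalar x y a *. eval_scalar x y b

(* Stand-in for the backend: interpret the kernel on the host. *)
let rec run_kernel (inputs : float array array) = function
  | Input i -> inputs.(i)
  | Map (f, k) -> Array.map (fun x -> eval_scalar x 0.0 f) (run_kernel inputs k)
  | Zip (f, k1, k2) ->
    Array.map2 (fun x y -> eval_scalar x y f)
      (run_kernel inputs k1) (run_kernel inputs k2)

let run inputs (Sum k) = Array.fold_left ( +. ) 0.0 (run_kernel inputs k)

(* Host-side generators: ordinary OCaml functions that build kernels. *)
let dot_product = Sum (Zip (Mul (X, Y), Input 0, Input 1))
let scaled_dot s = Sum (Map (Mul (Const s, X), Zip (Mul (X, Y), Input 0, Input 1)))
```

The point of keeping the kernel language this small is that it has no allocation, no recursion and no closures, which is what would let a real backend target a processor with a few hundred kilobytes of private memory.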
* Re: [Caml-list] More registers in modern day CPUs 2007-09-06 14:55 ` Chris King 2007-09-06 15:17 ` Brian Hurt @ 2007-09-06 20:48 ` Richard Jones [not found] ` <20070906204524.GB10798@furbychan.cocan.org> 2 siblings, 0 replies; 28+ messages in thread From: Richard Jones @ 2007-09-06 20:48 UTC (permalink / raw) To: caml-list On Thu, Sep 06, 2007 at 10:55:20AM -0400, Chris King wrote: > On 9/6/07, Tom <tom.primozic@gmail.com> wrote: > > However, would it be possible to "emulate" cpu registers using software? By > > keeping registers in the main memory, but accessing them often enough to > > keep them in primary cache? That would be quite fast I believe... > > This makes me wonder... why have registers to begin with? I wonder > how feasible a chip with a, say, 256-byte "register-level" cache would The 6502 was a successful 8-bit processor where the "on chip" registers were very few, but the first part of RAM acted as memory mapped registers. http://en.wikipedia.org/wiki/6502 This is not feasible in current chips for a whole variety of reasons, starting with the fact that current RAM is hundreds of times slower than registers (and even L1 cache is 4-8 times slower). You should read "Computer Architecture: A Quantitative Approach" by Hennessy & Patterson. Rich. -- Richard Jones Red Hat ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <20070906204524.GB10798@furbychan.cocan.org>]
* Re: [Caml-list] More registers in modern day CPUs [not found] ` <20070906204524.GB10798@furbychan.cocan.org> @ 2007-09-06 20:59 ` Chris King 0 siblings, 0 replies; 28+ messages in thread From: Chris King @ 2007-09-06 20:59 UTC (permalink / raw) To: Richard Jones; +Cc: Caml List On 9/6/07, Richard Jones <rich@annexia.org> wrote: > The 6502 was a successful 8-bit processor where the "on chip" > registers were very few, but the first part of RAM acted as memory > mapped registers. I grew up on the 6502... beautiful architecture :) > This is not feasible in current chips for a whole variety of reasons, > starting with the fact that current RAM is hundreds of times slower > than registers (and even L1 cache is 4-8 times slower). Right, hence my notion of "register-level cache"... something smaller and faster than L1 that replaces registers entirely. (John Harrison got what I was on about.) I will check out that book though... I know very little about cache structures. - Chris ^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2007-12-04 18:00 UTC | newest]

Thread overview: 28+ messages
2007-09-06 6:20 More registers in modern day CPUs Tom
2007-09-06 7:17 ` [Caml-list] " skaller
2007-09-06 9:07 ` Richard Jones
2007-09-06 14:55 ` Chris King
2007-09-06 15:17 ` Brian Hurt
2007-09-06 15:54 ` Harrison, John R
2007-09-06 17:10 ` David MENTRE
2007-09-06 18:27 ` Harrison, John R
2007-09-06 18:28 ` Christophe Raffalli
2007-09-06 18:48 ` Brian Hurt
2007-09-06 18:48 ` Pal-Kristian Engstad
2007-11-20 15:32 ` [Caml-list] OCalm on Sony PS3 (was Re: More registers in modern day CPUs) Mike Hogan
2007-11-21 17:20 ` Richard Jones
2007-11-21 19:05 ` [Caml-list] OCaml " Mike Hogan
2007-11-23 6:44 ` Mike Hogan
2007-12-02 10:14 ` [Caml-list] OCalm " Xavier Leroy
2007-12-02 16:22 ` Mike Hogan
2007-12-02 22:19 ` Konrad Meyer
2007-12-03 0:09 ` [Caml-list] OCaml " Mike Hogan
2007-12-03 20:16 ` minithread (was OCaml on Sony PS3) Christophe Raffalli
2007-12-04 14:25 ` [Caml-list] " David MENTRE
2007-12-04 14:37 ` Basile STARYNKEVITCH
2007-12-04 16:25 ` Mattias Engdegård
2007-12-04 17:33 ` Gerd Stolpmann
2007-12-04 18:00 ` Mike Hogan
2007-12-04 2:29 ` [Caml-list] OCalm on Sony PS3 (was Re: More registers in modern day CPUs) Gordon Henriksen
2007-09-06 20:48 ` [Caml-list] More registers in modern day CPUs Richard Jones
[not found] ` <20070906204524.GB10798@furbychan.cocan.org>
2007-09-06 20:59 ` Chris King