* I'm sure it's been discussed a few times, but here we go.... single-precision floats @ 2006-03-06 12:23 Asfand Yar Qazi 2006-03-06 13:13 ` [Caml-list] " skaller 0 siblings, 1 reply; 5+ messages in thread From: Asfand Yar Qazi @ 2006-03-06 12:23 UTC (permalink / raw) To: Caml-list Hi, I recently performed some tests on GNU C++, and found that (for a small fast fourier transform operation anyway) double precision out-performs single precision floating point. However, on the Ogre 3D engine forum, I had a lengthy discussion and a conclusion was reached that the reason single-precision floats are preferred for games (and I assume all high-performance applications that require huge amounts of data) is that more of them can fit into the cache of the processor. All the OCaml discussions about floating point precision I have seen so far evolve around how fast operations are performed on them - but the critical thing for things like collision detection, etc. in games is the amount of data that can fit into the CPU cache and be operated on before the cache must be reloaded. Obviously, twice as many single precision floats can fit into any CPU's cache than double precision floats. We're talking huge dynamic data structures with millions of floating point coordinates that all have to be iterated over many times a second - preferably by using multithreaded algorithms, so that multiple CPUs can be used efficiently. Since doing this sort of work (i.e. parallel computing) in C++ is a pain in the **** ('scuse my French :-), I want to learn a language that will make it easy and less error-prone - hence my study of OCaml. So, is there any way (I'm thinking similar to 'nativeint') to use floats in OCaml to maximize the data that can be stored and operated on in the CPUs cache such that system memory is accessed as little as possible? Thanks ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Caml-list] I'm sure it's been discussed a few times, but here we go.... single-precision floats 2006-03-06 12:23 I'm sure it's been discussed a few times, but here we go.... single-precision floats Asfand Yar Qazi @ 2006-03-06 13:13 ` skaller 2006-03-06 17:01 ` Asfand Yar Qazi 0 siblings, 1 reply; 5+ messages in thread From: skaller @ 2006-03-06 13:13 UTC (permalink / raw) To: Asfand Yar Qazi; +Cc: Caml-list On Mon, 2006-03-06 at 12:23 +0000, Asfand Yar Qazi wrote: > All the OCaml discussions about floating point precision I have seen so far > evolve around how fast operations are performed on them - but the critical > thing for things like collision detection, etc. in games is the amount of data > that can fit into the CPU cache and be operated on before the cache must be > reloaded. Obviously, twice as many single precision floats can fit into any > CPU's cache than double precision floats. No, it isn't obvious. I have some routines using the Taka algorithm which is heavily recursive and therefore depends heavily on cache use. Code for gcc and Felix uses single precision floats. It is competing with Ocaml .. which is not only using double precision .. it might be boxing as well!??? [Perhaps gcc is passing the single precision floats as double anyhow ..?] http://felix.sourceforge.net/current/speed/en_flx_perf_0012.html This is an older set of results. On my newer AMD64x2 3800, gcc opt wins, but Ocaml is second. The effect of cache usage ignoring FP calculation speed is seen here: http://felix.sourceforge.net/current/speed/en_flx_perf_0005.html where Felix trashes everything by a clear margin simply because it happens to use one less word on the stack than it's nearest competitor. Perhaps Xavier can explain how Ocaml manages to be so dang fast on the FP stuff .. if you try C or Felix with doubles they drop right off the chart. > We're talking huge dynamic data structures with millions of floating point > coordinates that all have to be iterated over many times a second - preferably > by using multithreaded algorithms, so that multiple CPUs can be used > efficiently. Ocaml doesn't currently permit multi-processing. Felix does, it might be a better alternative for games and possibly High Performance Computing apps. Other options include Haskell and MLton. -- John Skaller <skaller at users dot sourceforge dot net> Async PL, Realtime software consultants Checkout Felix: http://felix.sourceforge.net ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Caml-list] I'm sure it's been discussed a few times, but here we go.... single-precision floats 2006-03-06 13:13 ` [Caml-list] " skaller @ 2006-03-06 17:01 ` Asfand Yar Qazi 0 siblings, 0 replies; 5+ messages in thread From: Asfand Yar Qazi @ 2006-03-06 17:01 UTC (permalink / raw) To: Caml-list skaller wrote: > On Mon, 2006-03-06 at 12:23 +0000, Asfand Yar Qazi wrote: > > >>All the OCaml discussions about floating point precision I have seen so far >>evolve around how fast operations are performed on them - but the critical >>thing for things like collision detection, etc. in games is the amount of data >>that can fit into the CPU cache and be operated on before the cache must be >>reloaded. Obviously, twice as many single precision floats can fit into any >>CPU's cache than double precision floats. > > > No, it isn't obvious. I have some routines using the Taka algorithm > which is heavily recursive and therefore depends heavily on > cache use. > > Code for gcc and Felix uses single precision floats. > It is competing with Ocaml .. which is not only using > double precision .. it might be boxing as well!??? > [Perhaps gcc is passing the single precision floats > as double anyhow ..?] > > http://felix.sourceforge.net/current/speed/en_flx_perf_0012.html > > This is an older set of results. On my newer AMD64x2 3800, > gcc opt wins, but Ocaml is second. Point taken - but I'm not experienced enough to comment - perhaps John Carmack would be a good person to contacta about this :-) > > The effect of cache usage ignoring FP calculation speed is > seen here: > > http://felix.sourceforge.net/current/speed/en_flx_perf_0005.html > > where Felix trashes everything by a clear margin simply because > it happens to use one less word on the stack than it's nearest > competitor. > > Perhaps Xavier can explain how Ocaml manages to be so dang > fast on the FP stuff .. if you try C or Felix with > doubles they drop right off the chart. > > >>We're talking huge dynamic data structures with millions of floating point >>coordinates that all have to be iterated over many times a second - preferably >>by using multithreaded algorithms, so that multiple CPUs can be used >>efficiently. > > > Ocaml doesn't currently permit multi-processing. > Felix does, it might be a better alternative for games > and possibly High Performance Computing apps. > Other options include Haskell and MLton. > Ah, didn't know that. On some comments made by Tim Sweeny (Epic Games of Unreal Engine fame), I thought functional programming make multi-threaded coding easy. I'm referring to some slides found here: http://www.st.cs.uni-saarland.de/edu/seminare/2005/advanced-fp/docs/sweeny.pdf Here's what he says about 'Performance' for example: § When updating 10,000 objects at 60 FPS, everything is performance-sensitive § But: – Productivity is just as important – Will gladly sacrifice 10% of our performance for 10% higher productivity – We never use assembly language § There is not a simple set of “hotspots” to optimize! That’s all! Somebody commented about it here (http://www.gamedev.net/community/forums/topic.asp?topic_id=373751) by saying: The bottom line is that C/C++ is becoming way too costly to be used in the future. Both in terms of money (bugs cost money) but also in terms of performance. It's very, very hard to leverage multiple threads in C/C++, and it's only getting worse as we get more cores in our systems -- in contrast, concurrency in languages such as Haskell is almost as simple as single-threaded programming (due to it being purely functional, and having STM). That's why I thought about learning OCaml - I thought it had the same functional programming ability as Haskell, but with more (non-functional) features. But I still want to learn it, it seems a thousand light years different (in a good way) than C++. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Caml-list] I'm sure it's been discussed a few times, but here we go.... single-precision floats @ 2006-03-06 15:22 Jonathan Harrop 2006-03-06 16:34 ` Brian Hurt 0 siblings, 1 reply; 5+ messages in thread From: Jonathan Harrop @ 2006-03-06 15:22 UTC (permalink / raw) To: Caml-list, Asfand Yar Qazi On Mon Mar 6 12:23 , Asfand Yar Qazi <email@asfandyar.cjb.net> sent: >All the OCaml discussions about floating point precision I have seen so far >evolve around how fast operations are performed on them - but the critical >thing for things like collision detection, etc. in games is the amount of data >that can fit into the CPU cache and be operated on before the cache must be >reloaded. Obviously, twice as many single precision floats can fit into any >CPU's cache than double precision floats. Yes. >We're talking huge dynamic data structures with millions of floating point >coordinates that all have to be iterated over many times a second - preferably >by using multithreaded algorithms, so that multiple CPUs can be used >efficiently. Since doing this sort of work (i.e. parallel computing) in C++ >is a pain in the **** ('scuse my French :-), I want to learn a language that >will make it easy and less error-prone - hence my study of OCaml. Due to OCaml's lack of a concurrent GC, there is no good way to low-level parallelise OCaml programs. You can, of course, use message passing between separate OCaml processes to parallelise at a higher level. OCaml's advantages center around the ability to design and use sophisticated data structures easily - the precise opposite of iterating over long arrays. >So, is there any way (I'm thinking similar to 'nativeint') to use floats in >OCaml to maximize the data that can be stored and operated on in the CPUs >cache such that system memory is accessed as little as possible? Currently, your only choice is to use big arrays of 32-bit floats. There is no other way to store a single 32-bit float in an OCaml data structure. Such functionality would be useful in the case of my ray tracer, for example: http://www.ffconsultancy.com/free/ray_tracer where efficient use of a big array would require fundamental alterations. However, my AMD64 wastes a lot of memory on pointers as well... Cheers, Jon. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Caml-list] I'm sure it's been discussed a few times, but here we go.... single-precision floats 2006-03-06 15:22 Jonathan Harrop @ 2006-03-06 16:34 ` Brian Hurt 0 siblings, 0 replies; 5+ messages in thread From: Brian Hurt @ 2006-03-06 16:34 UTC (permalink / raw) To: Jonathan Harrop; +Cc: Caml-list, Asfand Yar Qazi On Mon, 6 Mar 2006, Jonathan Harrop wrote: > OCaml's advantages center around the ability to design and use > sophisticated data structures easily - the precise opposite of iterating > over long arrays. Which is why, were I writting a video game in Linux, I'd be writting the game logic in Ocaml- and the rendering logic in C/assembly. Or at least I'd be looking to write some sort of C/assembly library to handle large arrays of vectors of single precision floats, and do operations on them using the various SIMD extensions. The problems that Jak and Daxter ran into doing pretty much precisely this mostly came from one of two problems: 1) they were inventing, and maintaining, their own language- and thus had the usual assortment of compiler bugs to work out. Using a mature, debugged language like Ocaml would solve this. And 2) Unfamiliarity of the development staff with Functional programming and it's patterns. See: http://www.gamasutra.com/features/20020710/white_02.htm A couple of other, generic, comments on this topic: 1) IMHO, most game developers are focusing too much on technology, and not enough on game play. Games with great game play, but even very low CPU/Graphics requirements, like Tetris, SpaceWar, Asteroids, PacMan, Nethack, etc., are still great fun to play, and are still played by large numbers of people. This is despite the exceptionally crude aspects of lots of them (Nethack and Tetris can even be played on text consoles). Games which are technical acheivements but weaker on the game play tend to be flash in the pans, at best- who still plays Myst in any serious way? Spending your time adding even more realistic blood splatter to a first person shooter strikes me as being a suboptimal use of time. 2) If Ocaml isn't the best possible language to use for game design, so what? Outside of game design, the vast majority of numerical programs are much more about data structures and algorithms (two things that Ocaml kicks ass on) than they are raw FLOPS. A classic example- two programs (A and B), both solving the same program. Program A has 10x the FLOPS rate as program B- and yet program B is 10x faster? Why? Because the problem is multiplying two sparse matricies, each matrix having only 10% non-zero elements. Program A is doing it as dense multiplication, taking full advantage of all SIMD extensions, etc., while program B has implemented a sparse matrix data structure. And while program B is boxing it's floats, and doing a lot of data structure overhead, which means it's only issuing 1 floating point instruction for every 10 FP instructions program A is issuing, it's only needing to issue 0.1 * 0.1 = 1/100th the number of floating point instructions, and thus is still 10x faster. That's a simple case, but it illustrates the real advantage Ocaml has. But there is a wider point here. A number of languages (and C++ is particularly bad for this) trying to be golden hammers. They're a floor wax and a dessert topping! I disbeleive in golden hammers- in my experience, languages that try to be all things to all people end up being the wrong tool for all jobs. Numeric programming and game programming put together are a small corner of programming. By any measure you want to apply, business logic programming is at least 10 times as big as both of those put together- wether it be the number of programmers working on that code, the amount of money spent on it, the number of users, the total time of all users spent using that code, etc. In this market, absolute performance is less of an issue (witness Java) than is things like correctness, ease of maintainability, speed of development, etc. I wouldn't try to write the rendering image for a video game in Java either- but that doesn't mean that Java isn't a successfull and usefull language. Brian ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2006-03-06 16:59 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2006-03-06 12:23 I'm sure it's been discussed a few times, but here we go.... single-precision floats Asfand Yar Qazi 2006-03-06 13:13 ` [Caml-list] " skaller 2006-03-06 17:01 ` Asfand Yar Qazi 2006-03-06 15:22 Jonathan Harrop 2006-03-06 16:34 ` Brian Hurt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox