* [Caml-list] Wish List for Large Mutable Objects
@ 2004-07-31 18:29 David McClain
  2004-08-01  3:36 ` Brandon J. Van Every
  2004-08-01  4:06 ` Brandon J. Van Every
  0 siblings, 2 replies; 11+ messages in thread
From: David McClain @ 2004-07-31 18:29 UTC
To: caml

Something I would like to see appear in the OCaml libraries, and I don't have it yet myself, is the use of Copy-on-Write and Scatter-Gather applied to large mutable objects such as Bigarrays.

When a request to copy the object arrives, it is satisfied immediately, in mere nanoseconds, and the actual copying is delayed until (if ever) some code attempts to mutate one of the cells. Actual copying would frequently be a huge undertaking and cost a great deal in runtime performance.

The Scatter-Gather would be useful in managing arrays where only a small part of the array has actually been mutated. Perhaps some kind of frame paging applied to the array proper. That way the COW only has to replicate small portions of the array for the user who requested a copy.

David McClain
Senior Corporate Scientist
Avisere, Inc.

+1.520.390.7738 (USA)
david.mcclain@avisere.com

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
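To make the copy-on-write request concrete, here is a minimal sketch of what such a wrapper might look like on top of the existing Bigarray module. The `cow` type and the `of_array1`, `copy`, `get` and `set` functions are hypothetical names invented for illustration; nothing like them exists in the standard library. The sketch copies the whole array on the first write rather than individual pages, it is not thread-safe, and a handle that is simply dropped never decrements the counter, so it may copy more often than strictly needed.

```ocaml
open Bigarray

(* Hypothetical copy-on-write handle on a float64 Bigarray.
   [shared] counts the handles that currently alias [data]. *)
type cow = {
  mutable data : (float, float64_elt, c_layout) Array1.t;
  mutable shared : int ref;
}

let of_array1 a = { data = a; shared = ref 1 }

(* "Copying" only bumps the counter: O(1), no element is touched. *)
let copy t =
  incr t.shared;
  { data = t.data; shared = t.shared }

let get t i = Array1.get t.data i

(* The real copy is deferred to the first write through a shared handle. *)
let set t i v =
  if !(t.shared) > 1 then begin
    let fresh = Array1.create float64 c_layout (Array1.dim t.data) in
    Array1.blit t.data fresh;
    decr t.shared;          (* the old storage loses one alias *)
    t.data <- fresh;
    t.shared <- ref 1       (* the new storage is private *)
  end;
  Array1.set t.data i v
```

The paged variant described in the message would presumably keep a table of fixed-size chunks and apply the same test per chunk, so that a write replicates only the chunk it touches.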
* RE: [Caml-list] Wish List for Large Mutable Objects
  2004-07-31 18:29 [Caml-list] Wish List for Large Mutable Objects David McClain
@ 2004-08-01  3:36 ` Brandon J. Van Every
  2004-08-02  5:28   ` Brandon J. Van Every
  2004-08-01  4:06 ` Brandon J. Van Every
  1 sibling, 1 reply; 11+ messages in thread
From: Brandon J. Van Every @ 2004-08-01  3:36 UTC
To: caml

David McClain wrote:
>
> Something I would like to see appear in the OCaml libraries,
> and I don't have it yet myself, is the use of Copy-on-Write
> and Scatter-Gather applied to large mutable objects such as
> BigArrays.
>
> [and some other things in other posts]

OCaml is not a lazy language. Is it reasonable to expect Bigarray to perform lazy copies, under any permutation of complexity you have in mind?

It seems like what you really want is to design the memory management of an Operating System. If your files are so huge that they don't fit in main memory, why aren't you willing to use virtual memory? Scatter-Gather DMA is a device driver level capability that's hardware dependent. I don't see why a high level language like OCaml should be exposing that kind of functionality, and I'm not entirely sure if it should be doing it under the hood either. If your OS doesn't have the kind of memory management you want, maybe you should modify an open source OS, like Linux or BSD Unix, to do what you want?

You ask why Array1, Array2, Array3 should be special cases. Well, clearly because they're the most common, and you can perform access optimizations for each of these common cases. It seems that you are only thinking of ***BIG*** arrays, i.e. your problems and nobody else's. Lotsa people don't have your notion of 'big'. Indeed, I don't personally care about arrays being particularly big. 100MB would be pretty darn big for what I do in game development right now. I do care about their contents being unboxed. If someone wanted to rename Bigarray to UnboxedArray, that would suit my own priorities just fine.

I don't understand the "starting from zero" complaint, with respect to arbitrary file formats. If the file format is arbitrarily structured, it is not an array. You will have to read it some other way. Arrays are, generally speaking, composed of uniform elements. At least, that's how all of us pedal-to-the-metal guys view them. I suppose high level language guys often define the word 'array' to mean anything they want, like a list or a hash table or a map or whatever, but I don't think they should.

I don't see why your Scientific notion of an 'infinite array' should be a basic language interface. What would be so difficult about building your favorite array windowing scheme on top of the basic fixed length components, and calling that a library? Like 'InfiniteArray' or something. Then you'd write some access functions in some syntax you like, it would behave the way you like, and for your problems you'd be good. I've done similar things to perform addressing on icosahedrons, to try to regularize the mathematics of a tiling of it. I don't bother the user about it, my functions just do some computing to make it all work under the hood.

I do wish Bigarray handled heterogeneous C structures. Homogeneous arrays impose some design and interop constraints.

Finally, I'm told that the "%" in the names of called functions in the sources means that ocamlopt generates different, better code. The C routines are ignored, they're only used for ocamlc.

Cheers,                     www.indiegamedesign.com
Brandon Van Every           Seattle, WA

20% of the world is real.
80% is gobbledygook we make up inside our own heads.
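The "windowing scheme on top of the basic fixed length components" suggested above could be sketched roughly as follows. The `window` type and its `get`/`set` functions are hypothetical, made up for this illustration; the only library call relied on is `Bigarray.Array1.sub`, which returns a view sharing storage with the original array. The `origin` field lets the user-visible index range start anywhere, which is one possible answer to the "starting from zero" complaint.

```ocaml
open Bigarray

(* Hypothetical window: [len] elements of a larger array starting at
   [base], addressed with user indices that begin at [origin]. *)
type window = {
  view : (float, float64_elt, c_layout) Array1.t;
  origin : int;  (* user-visible index of the first element *)
}

let window a ~base ~len ~origin =
  (* Array1.sub shares storage with [a]; no elements are copied. *)
  { view = Array1.sub a base len; origin }

(* Translate the user index into the zero-based index of the view. *)
let get w i = Array1.get w.view (i - w.origin)
let set w i v = Array1.set w.view (i - w.origin) v
```

Bounds checking is inherited from `Array1.get` and `Array1.set`, so an index outside the window still raises `Invalid_argument`.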
* RE: [Caml-list] Wish List for Large Mutable Objects
  2004-08-01  3:36 ` Brandon J. Van Every
@ 2004-08-02  5:28   ` Brandon J. Van Every
  0 siblings, 0 replies; 11+ messages in thread
From: Brandon J. Van Every @ 2004-08-02  5:28 UTC
To: caml

Brandon J. Van Every wrote:
>
> I do wish Bigarray handled heterogeneous C structures. Homogeneous
> arrays impose some design and interop constraints.

I meant, I wish any one Bigarray could handle any one type of C structure. I will be looking at how to fake this behavior.

Cheers,                     www.indiegamedesign.com
Brand*n Van Every           S*attle, WA

Praise Be to the caml-list Bayesian filter! It blesseth
my postings, it is evil crap! evil crap! Bigarray!
Unboxed overhead group! Wondering! chant chant chant...
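One way the faking might go, at least for a structure whose fields all share one element type, is to treat each row of a two-dimensional Bigarray as one record. The sketch below assumes a C-side `struct vec3 { float x, y, z; }` with no padding; the `vec3` record and the `create`, `get` and `set` helpers are hypothetical names, not part of Bigarray. A structure mixing field types would need a byte-level encoding instead, which this sketch does not attempt.

```ocaml
open Bigarray

(* OCaml-side image of the assumed C struct vec3 { float x, y, z; }. *)
type vec3 = { x : float; y : float; z : float }

(* An n x 3 float32 matrix in C layout has the same memory layout as a
   C array of n such structs. *)
let create n : (float, float32_elt, c_layout) Array2.t =
  Array2.create float32 c_layout n 3

(* Read row [i] as a record. *)
let get a i =
  { x = Array2.get a i 0; y = Array2.get a i 1; z = Array2.get a i 2 }

(* Write a record into row [i]. *)
let set a i v =
  Array2.set a i 0 v.x;
  Array2.set a i 1 v.y;
  Array2.set a i 2 v.z
```

The record returned by `get` is an ordinary boxed OCaml value; only the storage inside the Bigarray stays unboxed.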
* Re: [Caml-list] Wish List for Large Mutable Objects
  2004-08-01  4:06 ` Brandon J. Van Every
@ 2004-08-02  2:38   ` David McClain
  2004-08-02  3:20     ` Brandon J. Van Every
  2004-08-04  7:24     ` Alex Baretta
  0 siblings, 2 replies; 11+ messages in thread
From: David McClain @ 2004-08-02  2:38 UTC
To: Brandon J. Van Every, caml

Okay now... not trying to start any flame wars, but you "guys in the trenches" so to speak seem a bit short on real life experience outside of your own field.

I have a perfectly good running VM as user process library running right now in C++ that allows for mixed array files, arbitrary offsets into the file for various array pointers, and this is all transparent to the user just as I indicated in my wish list for OCaml.

In more than 20 years of scientific data access and analysis I have only seen uniform arrays, one per file, generated by neophytes. In just about every case I can remember (NetCDF, HDF, FITS, RIFF Wave files, MPEG, etc.), these are all compound object files. The trouble with the simple-minded approach of one array per file is that most data acquisitions will then end up with dozens of component data files and it becomes a tracking nightmare to keep them all coordinated. Not so if you permit compound document files.

With a language as rich and wonderful as OCaml, I really can't understand your hostility to useful additions to the language. If you don't want to play, you don't have to join my sandbox -- find another.

David McClain
Senior Corporate Scientist
Avisere, Inc.

+1.520.390.7738 (USA)
david.mcclain@avisere.com
* RE: [Caml-list] Wish List for Large Mutable Objects
  2004-08-02  2:38   ` David McClain
@ 2004-08-02  3:20     ` Brandon J. Van Every
  2004-08-02  3:32       ` David McClain
  2004-08-04  7:24     ` Alex Baretta
  1 sibling, 1 reply; 11+ messages in thread
From: Brandon J. Van Every @ 2004-08-02  3:20 UTC
To: caml

David McClain wrote:
>
> I have a perfectly good running VM as user process library running right now
> in C++ that allows for mixed array files, arbitrary offsets into the file
> for various array pointers, and this is all transparent to the user just as
> I indicated in my wish list for OCaml.

But it doesn't do scatter-gather DMA. A user process only grants so much control, and you seem to want an awful lot of control. Hence my suggestion that you tweak an OS.

> In more than 20 years of scientific data access and analysis I have only
> seen uniform arrays, one per file, generated by neophytes. In just about
> every case I can remember; NetCDF, HDF, FITS, RIFF Wave Files, MPEG, etc.,
> these are all compound object files.

Us neophytes call them 'file formats'. They aren't arrays. I think we'll be at loggerheads until we agree what an 'array' is.

> The trouble with the simple minded approach of one array per file is that
> most data acquisitions will then end up with dozens of component data files
> and it becomes a tracking nightmare to keep them all coordinated. Not so if
> you permit compound document files.

What does this have to do with Bigarray? Bigarray provides uniform basic types in unboxed consecutive memory locations, a la C or Fortran. That's the entire point, to communicate with arrays as C and Fortran do them. Why are you expecting it to be something exceedingly different?

> With a language as rich and wonderful as OCaml, I really
> can't understand your hostility

I haven't spoken with hostility. I gather you're somewhat attached to your problems, to view my comments as hostility.

> to useful additions to the language.

Clearly, you think your ideas are useful to you. Whether others think they're useful to them, remains to be seen.

> If you don't want to
> play, you don't have to join my sandbox -- find another.

You've lost me here. Are you saying that if you hear feedback you don't like, that those giving the feedback should leave caml-list or just be quiet?

Cheers,                     www.indiegamedesign.com
Brand*n Van Every           S*attle, WA

Praise Be to the caml-list Bayesian filter! It blesseth
my postings, it is evil crap! evil crap! Bigarray!
Unboxed overhead group! Wondering! chant chant chant...
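For readers who have not used the module, the point about uniform, unboxed elements laid out as C or Fortran would have them can be illustrated with a small sketch. It uses only standard Bigarray calls; the names `c` and `f` are arbitrary. Note that the Fortran layout is column-major and indexes from 1, while the C layout is row-major and indexes from 0.

```ocaml
open Bigarray

(* 2 x 3 matrices of unboxed 64-bit floats in the two supported layouts. *)
let c = Array2.create float64 c_layout 2 3        (* indices 0..1, 0..2 *)
let f = Array2.create float64 fortran_layout 2 3  (* indices 1..2, 1..3 *)

let () =
  Array2.fill c 0.0;
  Array2.fill f 0.0;
  Array2.set c 0 0 1.0;  (* first element, C convention *)
  Array2.set f 1 1 1.0   (* first element, Fortran convention *)
```

With the Fortran layout, `Array2.get f 0 0` is out of bounds and raises `Invalid_argument`.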
* Re: [Caml-list] Wish List for Large Mutable Objects
  2004-08-02  3:20     ` Brandon J. Van Every
@ 2004-08-02  3:32       ` David McClain
  2004-08-02  5:14         ` Brandon J. Van Every
  0 siblings, 1 reply; 11+ messages in thread
From: David McClain @ 2004-08-02  3:32 UTC
To: Brandon J. Van Every, caml

... I'm open to reasoned feedback, of course. Yours seemed overtly hostile to me, as though you were somehow protecting the virtue of your young sister, OCaml, against inferred accostations.

How is it you claim to speak for my C++ manager about scatter gather? It appears that you have some real boundary issues here, and this probably needs to be taken offline...

I was actually addressing most of my comments to the language designers themselves, without referring to them by name. I am perfectly capable of adding such primitives to the core language myself. But I was offering some useful insight into the way that scientists view the universe, as contrasted with conventional programming language design. If the language would choose to implement some of these additions it could become more immediately attractive to the audience in my corner of the universe. That's all....

I see, after thinking some time about the Array1, Array2, etc., versus Generic Arrays, that Xavier et al needed to protect the typability of their language, and so they made a concession to the masses in restricting the convenient x.{ix1, ix2}, etc. syntax to the more common uses. In any event, handling arbitrary arrays, I'm unlikely to use this syntax anyway, preferring the more general Get/Set primitives on computed index lists. So, in this case, I have answered my own question, and I'm not really losing anything by their choice.

David McClain
Senior Corporate Scientist
Avisere, Inc.

+1.520.390.7738 (USA)
david.mcclain@avisere.com
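The difference between the fixed-rank `x.{i, j, k}` syntax and Get/Set on computed index lists can be seen in a short sketch. `Genarray.get`, `Genarray.set` and `array3_of_genarray` are standard Bigarray functions; the `get_at` helper is a hypothetical convenience invented here for passing an index list.

```ocaml
open Bigarray

(* A 2 x 3 x 4 generic array; its rank is a run-time property. *)
let g = Genarray.create float64 c_layout [| 2; 3; 4 |]

(* Hypothetical helper: read an element from an index list built on the fly. *)
let get_at arr idx = Genarray.get arr (Array.of_list idx)

let () =
  Genarray.fill g 0.0;
  Genarray.set g [| 1; 2; 3 |] 5.0;
  (* The same storage seen through the fixed-rank Array3 interface. *)
  let a3 = array3_of_genarray g in
  assert (a3.{1, 2, 3} = get_at g [1; 2; 3])
```

The generic form checks the number of indices at run time, whereas Array3 fixes the rank in the type, which is presumably part of what lets the compiler specialise those accesses.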
* RE: [Caml-list] Wish List for Large Mutable Objects
  2004-08-02  3:32       ` David McClain
@ 2004-08-02  5:14         ` Brandon J. Van Every
  2004-08-02  8:00           ` Ville-Pertti Keinonen
  0 siblings, 1 reply; 11+ messages in thread
From: Brandon J. Van Every @ 2004-08-02  5:14 UTC
To: caml

David McClain wrote:
>
> How is it you claim to speak for my C++ manager about scatter
> gather?

I used to write 3d device drivers for a living. I was never much for the 'nasty innards' of OS internals, preferring to concentrate on ASM loop optimizations for 3d graphics. That said, Scatter-Gather DMA is generally a property of a memory controller, i.e. a chipset on a motherboard. I seriously doubt you have user mode access to such memory controllers. If you do, point me at the API for it. I'm happy to stand corrected, but as far as I know, scatter-gather DMA is kernel mode stuff on all common architectures.

There is the outside possibility that you mean something different by 'scatter-gather' than a device driver writer means by 'scatter-gather'. A similar loggerhead to what an 'array' is.

A third possibility is you have written a library that assumes scatter-gather DMA is happening under the hood somehow, but doesn't explicitly control it in any way. To which I say, memory controllers are different. In the absence of a query interface to determine their capabilities, I don't see how you'd rigorously control algorithmic performance. Maybe you do not regard rigor as so important - memory cache hierarchies sorta work without anyone doing anything explicit, after all. But I would say, without rigor, you probably won't end up with anything. Just an idea that something should be fast under some circumstances, rather than any proven, repeatable reality.

For clarity, these aren't personal comments. This is just my understanding of scatter-gather DMA vs. whatever your understanding is.

Cheers,                     www.indiegamedesign.com
Brand*n Van Every           S*attle, WA

Praise Be to the caml-list Bayesian filter! It blesseth
my postings, it is evil crap! evil crap! Bigarray!
Unboxed overhead group! Wondering! chant chant chant...
* Re: [Caml-list] Wish List for Large Mutable Objects
  2004-08-02  5:14         ` Brandon J. Van Every
@ 2004-08-02  8:00           ` Ville-Pertti Keinonen
  2004-08-02  9:12             ` David McClain
  0 siblings, 1 reply; 11+ messages in thread
From: Ville-Pertti Keinonen @ 2004-08-02  8:00 UTC
To: Brandon J. Van Every; +Cc: caml

Brandon J. Van Every wrote:

> loop optimizations for 3d graphics. That said, Scatter-Gather DMA is
> generally a property of a memory controller, i.e. a chipset on a
> motherboard. I seriously doubt you have user mode access to such memory
> controllers. If you do, point me at the API for it. I'm happy to stand
> corrected, but as far as I know, scatter-gather DMA is kernel mode stuff
> on all common architectures.

Often things like the readv(2)/writev(2) interface are referred to as "scatter-gather". It just means performing I/O, in a single operation, on regions of memory that aren't contiguous.

I'm not sure what David McClain is referring to - but I think it's the ability for an "array" to provide another level of virtualization so that the underlying data needn't be contiguous in the address space of the process.

That seems a bit excessive - a more limited part of what he's suggesting could be more reasonable - being able to map a part of a file, from an arbitrary offset, as a Bigarray could be useful.

Even this would require some separation of storage management from the actual "Bigarray header", since operating systems require the underlying mappings to be page aligned. I suspect this could be as simple as passing a page-truncated offset to mmap(2), adding the remaining offset to the returned address and page-truncating the address passed to munmap(2).

This doesn't address the problem with most CPUs requiring the actual objects to be aligned, for which adding an "offset" between the beginning of the mapping and the beginning of the visible array isn't sufficient if the mapped file doesn't align things appropriately for the CPU (for arbitrary file formats, there are also endianness issues to consider).

Using subarrays instead of offset memory mappings protects against this, and makes offset mappings "unnecessary" altogether, but obviously if the file is big enough, the waste of address space due to the extra mappings can be significant on 32-bit systems...which as I understand was part of the original problem. Typing this on an Athlon 64 and sitting next to an Alpha, such things seem like legacy issues to me, especially since OCaml supports both architectures natively.

The main issue I have with the suggestions regarding turning Bigarrays into higher-level abstractions altogether is that it would make them considerably less efficient. The higher-level abstractions can always be implemented as layers on top of the current abstractions, which in my opinion is the right approach.

Of course it isn't my decision in any way.
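The page-truncation arithmetic described above is small enough to write out. This is only a sketch of the offset bookkeeping, not of the mapping itself: `page_size` is an assumed constant of 4096 bytes (a real implementation would query the operating system), and `split_offset` and `map_length` are hypothetical helper names.

```ocaml
(* Assumed page size, a power of two; a real program would ask the OS. *)
let page_size = 4096

(* Split a byte offset into the page-aligned offset to hand to mmap(2)
   and the remainder to add to the address that mmap(2) returns. *)
let split_offset offset =
  let aligned = offset land (lnot (page_size - 1)) in
  (aligned, offset - aligned)

(* Number of bytes to map so that [len] bytes starting at [offset]
   are covered once the mapping begins at the aligned offset. *)
let map_length offset len =
  let _, delta = split_offset offset in
  delta + len
```

For example, an array starting at byte 10000 is mapped from offset 8192 and then accessed 1808 bytes into the mapping. Current OCaml releases accept a `?pos` argument on `Unix.map_file` for this purpose.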
* Re: [Caml-list] Wish List for Large Mutable Objects
  2004-08-02  8:00           ` Ville-Pertti Keinonen
@ 2004-08-02  9:12             ` David McClain
  0 siblings, 0 replies; 11+ messages in thread
From: David McClain @ 2004-08-02  9:12 UTC
To: caml

Some good points here by Ville-Pertti.

Indeed, the Scientific mode requires a balanced modulus operation on each array index, not the one presently offered by OCaml's Pervasives. But this is used in lieu of bounds checking anyway, and the world has come to accept the slight cost of array bounds checking.

There are really two issues that sort of got mixed together here, only because Bigarray mixed them up... One is the use of Scientific mode for some arrays. The other is memory mapped arrays. These are really two separate issues, and the extra cost on accessing mmapped arrays is worth the price over the cost of slower buffered file I/O. It wouldn't be an acceptable cost for normal memory bound arrays.

Some processors do have alignment requirements, but every file system I was referring to always guarantees a minimal alignment based on the underlying array element type. These alignments generally coincide with the most stringent alignment requirements in use today. Some processors like the G4 appear to be more lax on alignment requirements, but my bet is that misaligned data cause some slowdown. I think the X86 architectures operate this way too.

However, you do raise an interesting point about endianness. The more portable file formats have generally accepted network byte ordering, generally by incorporating old Sun-XDR data representations. And indeed for memory mapped arrays, this would be an extra cost. But still, in this case, it would be far faster than buffered file I/O. My own tests show that a more or less random access pattern in the mmapped array is 200 times faster than the fread/fwrite style of data accessing. So any additional machine cycles can easily be hidden in that performance difference.

But again, let's separate these two issues. I generally know when I'm accessing a mmapped array and when I'm not. I had to offer up a filename in order to do mmapping... The only reason these two conversation threads merged is because when I read the Bigarray documentation, I found out that these offer a primitive form of mmapped access in addition to normal memory bound array accessing.

Not sure what multiple mappings you were referring to... I meant to allow a kind of scatter-gather COW on normal memory bound arrays. Memory-mapped arrays are a problem apart from this.

Despite what might appear as a cost overhead, the savings can be quite significant when combined with smart array slicing and sectioning. For example, in my NML, whenever I do an array slice (more complex operations than supported by Bigarray), what I actually do is pay the price of all the if-then-else branching on only the first descent, generating a tree of lambda closures on the way back out, so that all the actual copying operation occurs without any more testing along the way. Sort of like reaching down your throat and pulling yourself inside-out... heh! The speed of these compound slicings is enormously faster than conventional imperative logic. So while some operations are more costly, others benefit greatly from higher order logic.

In fact, a simple-minded analysis shows that if you ever intend to read or write a mutating representation array then it pays to simply create a native double array once, and pay the cost of representation mutation just once, and then allow repeat non-mutating, faster, accesses to the underlying data. Keeping the array around in a foreign format just adds incremental costs that will exceed this copying cost, if you hit every element several times. But as often as not, we do slice interesting sections from the data. Not sure if this ever happens without first hitting every element on a vectorized math op... My guess is no... and so the cost of copying must occur no matter what.

David McClain
Senior Corporate Scientist
Avisere, Inc.

+1.520.390.7738 (USA)
david.mcclain@avisere.com
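The message does not define "balanced modulus". One plausible reading, and it is only an assumption made for this sketch, is a modulus whose result always falls inside the valid index range, so that negative indices wrap around instead of failing. OCaml's built-in `mod` truncates toward zero and returns a negative result for a negative left operand, which is why it cannot be used directly. The `wrap` and `get_wrapped` names are hypothetical.

```ocaml
(* Assumption: "balanced modulus" is read here as a modulus whose result
   always lies in 0 .. n-1, unlike the built-in [mod]. *)
let wrap i n =
  let r = i mod n in
  if r < 0 then r + n else r

(* Indexing that wraps around instead of raising on out-of-range indices. *)
let get_wrapped a i = a.(wrap i (Array.length a))
```

With a three-element array, `get_wrapped a (-1)` reads the last element and `get_wrapped a 3` reads the first, so the wrap takes the place of the bounds check as the message describes.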
* Re: [Caml-list] Wish List for Large Mutable Objects
  2004-08-02  2:38   ` David McClain
  2004-08-02  3:20     ` Brandon J. Van Every
@ 2004-08-04  7:24     ` Alex Baretta
  1 sibling, 0 replies; 11+ messages in thread
From: Alex Baretta @ 2004-08-04  7:24 UTC
To: David McClain, Ocaml

David McClain wrote:

> With a language as rich and wonderful as OCaml, I really can't understand
> your hostility to useful additions to the language. If you don't want to
> play, you don't have to join my sandbox -- find another.

The language is truly great, but that doesn't mean that all libraries we've ever dreamt of are actually available. I think your idea of windowed mmapped access to files would make an excellent library. Only, don't ask Xavier to do it.

Alex
end of thread, other threads:[~2004-08-04  7:23 UTC | newest]

Thread overview: 11+ messages
2004-07-31 18:29 [Caml-list] Wish List for Large Mutable Objects David McClain
2004-08-01  3:36 ` Brandon J. Van Every
2004-08-02  5:28   ` Brandon J. Van Every
2004-08-01  4:06 ` Brandon J. Van Every
2004-08-02  2:38   ` David McClain
2004-08-02  3:20     ` Brandon J. Van Every
2004-08-02  3:32       ` David McClain
2004-08-02  5:14         ` Brandon J. Van Every
2004-08-02  8:00           ` Ville-Pertti Keinonen
2004-08-02  9:12             ` David McClain
2004-08-04  7:24     ` Alex Baretta