* [Caml-list] Memory mapped values
From: Don Syme @ 2001-10-08 19:24 UTC
To: caml-list

This is just a random idea...

Would it be possible in theory for "input_value" to work by memory-mapping the file being read, rather than by immediately reading the file? The idea would be that the structured value would then only actually be realised in physical memory as it is touched by execution and the corresponding pages of the memory-mapped file dragged in by the virtual memory mechanism. (To be honest, I haven't actually checked if this is how input_value currently works, though I'm certain it can't be.)

This technique would certainly require some modification to the GC, and I'm not even sure if the relocation of internal pointers in the data structure could be made to work (do any memory mapping primitives provide that functionality?). But if it could work, then that could make for one of the very best and easiest ways of persisting data structures - easier than moving to a relational database, and directly related to the programming model.

In addition, the layout of data structures on disk could then be optimized to take into account the access pattern at runtime. With a page fault costing something on the order of a million cycles these days, that could be very valuable...

Cheers,
Don

-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr

* [Caml-list] Memory mapped values
From: Basile STARYNKEVITCH @ 2001-10-08 19:32 UTC
To: Don Syme; +Cc: caml-list

>>>>> "Don" == Don Syme <dsyme@microsoft.com> writes:

    Don> Would it be possible in theory for "input_value" to work by
    Don> memory-mapping the file being read, rather than by
    Don> immediately reading the file?

[... interesting discussion skipped ....]

Perhaps, but I think that input_value is mostly useful for sequential byte *streams* (not randomly accessed files) such as TCP/IP sockets. I hope that input_value will still be able to work on sockets in future versions of OCaml.

Regards to all

--
Basile STARYNKEVITCH http://lesours.starynkevitch.net/
email: basile<at>starynkevitch<dot>net
alias: basile<at>tunes<dot>org
8, rue de la Faïencerie, 92340 Bourg La Reine, France
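
To make the "sequential stream" point concrete, here is a minimal sketch of how output_value and input_value consume a channel strictly in order. It uses a temporary file rather than a socket so that it needs only the standard library; the function name `roundtrip`, the file prefix, and the sample values are illustrative assumptions, not anything from the thread.

```ocaml
(* Write two values back to back, then read them in the same order
   from a single input channel. Names and values are illustrative. *)
let roundtrip () =
  let path = Filename.temp_file "marshal_demo" ".bin" in
  let oc = open_out_bin path in
  output_value oc [1; 2; 3];
  output_value oc ("hello", 42.0);
  close_out oc;
  let ic = open_in_bin path in
  (* input_value is not type-safe: the annotations must match
     what was actually written, in the order it was written. *)
  let (xs : int list) = input_value ic in
  let (p : string * float) = input_value ic in
  close_in ic;
  Sys.remove path;
  (xs, p)

let () =
  let xs, (s, f) = roundtrip () in
  Printf.printf "%d elements, %s, %.1f\n" (List.length xs) s f
```

Each input_value call reads exactly one marshalled value and leaves the channel positioned at the next one, with no seeking, which is why the same calls also work on channels that cannot be mapped or rewound.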

* Re: [Caml-list] Memory mapped values
From: Fabrice Le Fessant @ 2001-10-09 7:23 UTC
To: Don Syme; +Cc: caml-list

In the CDK, you will find a very small library "mmap" close to what you are talking about. The idea is to output the values in the file as if they were in memory, so that the file can be directly mapped in memory, and the values directly used by OCaml. The library has not yet been tested a lot.

Of course, these values cannot be collected by the garbage collector, nor should they be mutable. However, there is a big (as yet unsolved) problem with compaction. Another problem is the size of the page bitmap used by the garbage collector, since the file might be mapped very far from the main heap.

Regards,

- Fabrice
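
For readers who want to see plain memory mapping from OCaml, the sketch below maps a file as raw bytes using Unix.map_file from the standard unix library. This is not the CDK "mmap" library, whose API is not shown in the thread: it maps characters into a Bigarray, not OCaml heap values, so none of the GC issues above arise. The helper names `with_mapped_file` and `demo` are hypothetical, and the code assumes a Unix-like system and linking against the unix library.

```ocaml
(* Requires the unix library. Maps a file read-only (privately) as a
   one-dimensional Bigarray of chars; pages are loaded on first access. *)
let with_mapped_file path f =
  let fd = Unix.openfile path [Unix.O_RDONLY] 0 in
  Fun.protect
    ~finally:(fun () -> Unix.close fd)
    (fun () ->
      (* shared = false gives a private mapping;
         [| -1 |] lets the dimension follow the file size. *)
      let ga =
        Unix.map_file fd Bigarray.char Bigarray.c_layout false [| -1 |] in
      f (Bigarray.array1_of_genarray ga))

(* Illustrative use: write a small file, map it, read one byte. *)
let demo () =
  let path = Filename.temp_file "mmap_demo" ".bin" in
  let oc = open_out_bin path in
  output_string oc "memory mapped";
  close_out oc;
  let result =
    with_mapped_file path (fun a ->
      (Bigarray.Array1.get a 0, Bigarray.Array1.dim a)) in
  Sys.remove path;
  result

let () =
  let first, len = demo () in
  Printf.printf "first byte %c, %d bytes mapped\n" first len
```

Bigarray data lives outside the OCaml heap, which sidesteps the compaction and page-bitmap problems at the price of giving up direct use of the mapped bytes as ordinary OCaml values.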

* Re: [Caml-list] Memory mapped values
From: Xavier Leroy @ 2001-10-12 14:42 UTC
To: Don Syme; +Cc: caml-list

> Would it be possible in theory for "input_value" to work by
> memory-mapping the file being read, rather than by immediately reading
> the file? The idea would be that the structured value would then only
> actually be realised in physical memory as it is touched by execution
> and the corresponding pages of the memory-mapped file dragged in by the
> virtual memory mechanism. (To be honest, I haven't actually checked if
> this is how input_value currently works, though I'm certain it can't
> be.)

No, that's not how input_value currently works :-)

What you describe sounds feasible, with two caveats:

- You need a serialization format that is "isomorphic" to the memory representation of the data, i.e. that occupies the same space. The original Caml Light implementation of serialization used such a format: the on-disk representation was essentially produced by a copying GC applied to the value being externed, and input_value would just read it into the heap and replace offsets by pointers. There were two problems with this approach. One is 32/64 bit interoperability, where you need to expand or shrink the data accordingly during input_value; this is expensive and would prevent direct access to a page as you describe. The other is that this serialization format wastes space, resulting in huge files that are slow to read. The "compact" format that OCaml uses (basically, a prefix notation for the DAG of memory blocks composing the externed value) is much more compact (by a factor of 10, roughly), and while it takes more CPU time to do input_value, this is well offset by the reduced file reading time.

- You need to relocate offsets into pointers when a page is first accessed. Under Unix, this could possibly be done by mapping the file without read and write access, then catching the segmentation violation that occurs when one of the pages is accessed, patching the pointers, and changing the page protections to read-write. All this is highly non-portable and quite slow, though. (I think it's Appel and Li that tried VM tricks to implement concurrent copying GC in the late 80s; they found out later that the cost of changing page permissions is so high under all Unix implementations they tested that the scheme was impractical.)

Because of this cost issue, your scheme would be interesting only if the program accesses a small fragment of the memory-mapped data. If you're going to use all of the data, reading it in one step is more efficient (it saves the cost of trapping SEGV and changing page protections).

> But if it could work, then that could
> make for one of the very best and easiest ways of persisting data
> structures - easier than moving to a relational database, and directly
> related to the programming model.

I'm pretty ignorant about databases, but still what you describe is vaguely reminiscent of some OO databases (ObjectStore, maybe?). Two issues remain to be addressed, though: how to modify the data structure incrementally (modifying it in core and re-dumping it whole to disk doesn't suffice), and how to deal with atomicity of updates...

Best wishes,

- Xavier Leroy
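
As a rough way to observe the space argument in the first caveat, the sketch below compares the size of a value in the compact marshalled format with the size of the same value in the heap. The function name `sizes` and the sample list are illustrative assumptions; the ratio depends heavily on the data and on the word size, so no particular factor should be read into it.

```ocaml
(* Compare the marshalled size of a value with its in-memory size.
   Obj.reachable_words counts heap words, block headers included. *)
let sizes v =
  let on_disk = String.length (Marshal.to_string v []) in
  let in_heap = Obj.reachable_words (Obj.repr v) * (Sys.word_size / 8) in
  (on_disk, in_heap)

let () =
  (* A list of small integers: each cons cell costs three words in the
     heap but only a few bytes in the marshalled representation. *)
  let v = List.init 10_000 (fun i -> i mod 100) in
  let on_disk, in_heap = sizes v in
  Printf.printf "marshalled: %d bytes, heap: %d bytes\n" on_disk in_heap
```

A format that is isomorphic to memory would need roughly the heap figure on disk, which is the cost being weighed against the extra CPU time that input_value spends decoding the compact format.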

end of thread (newest: 2001-10-12 14:42 UTC)

Thread overview: 4 messages
2001-10-08 19:24 [Caml-list] Memory mapped values Don Syme
2001-10-08 19:32 ` Basile STARYNKEVITCH
2001-10-09  7:23 ` Fabrice Le Fessant
2001-10-12 14:42 ` Xavier Leroy