* [Caml-list] Bigarray question: Detecting subarrays, overlaps, ... @ 2012-02-28 20:17 Goswin von Brederlow 2012-02-28 21:09 ` Gerd Stolpmann 0 siblings, 1 reply; 5+ messages in thread From: Goswin von Brederlow @ 2012-02-28 20:17 UTC (permalink / raw) To: caml-list Hi, I'm implementing a RAID in userspace using ocaml and NBD (Network Block Device) as protocol to export the device. For this I'm using Bigarray.Array1 as buffer for data and wrote myself the right read(), write(), pread() and pwrite() stubs. The advantage of this (over strings) is that Bigarrays are not copied around by the GC so they don't need to copy data around between the Bigarray and a temporary buffer in the C stub. For efficiency each request stores all its data in a single Bigarray.Array1. For reasons of the RAID implementation large requests are split into 4k chunks using Bigarray.Array1.sub and grouped into stripes. The stripes are then acted upon independently until all stripes involved in a request are finished. Then the request is completed. Now I get a problem when 2 requests come in that overlap. Say I get a write request for blocks 0 - 6 and a read request for blocks 4-9 in that order. Since I already have the data for block 4-6 in memory from the write request I do not want to reread them from disk. On the stripe level the data looks like this: |W0 W1 W2 W3 W4 W5 W6 . . . | write 0-6 | W4 W5 W6 R7 R8 R9 | read 4-9 For the read request I need to copy W4-6 to R4-6 or send out W4-6 + R7-9 in two writes. I think I would like to avoid sending each stripe chunk out seperately. On the other hand I could implement (p)readv() and (p)writev() stubs. Anyway, my question now is how to detect which subarrays in the stripes are part of a common larger array? Do I need to write my own C stub that looks into the internas of the arrays to see if they share a common ancestor? I think that would be preferable to tracking the origin of each subarray myself. On a similar note how does Bigarray.Array1.blit handle arrays that are part of the same larger array and overlap? ---------------------------------------------------------------------- I'm missing functions like: val super : ('a, 'b, 'c) t -> ('a, 'b, 'c) t -> ('a, 'b, 'c) t Create a merged sub-array of 2 adjacent sub-arrays of the same big array. val same_parent : ('a, 'b, 'c) t -> ('a, 'b, 'c) t -> bool True if the 2 (sub-)arrays are part of the same big array. val offset : ('a, 'b, 'c) t -> int Offset of the sub-array in the underlying big array. MfG Goswin ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Caml-list] Bigarray question: Detecting subarrays, overlaps, ... 2012-02-28 20:17 [Caml-list] Bigarray question: Detecting subarrays, overlaps, Goswin von Brederlow @ 2012-02-28 21:09 ` Gerd Stolpmann 2012-02-29 9:23 ` Goswin von Brederlow 0 siblings, 1 reply; 5+ messages in thread From: Gerd Stolpmann @ 2012-02-28 21:09 UTC (permalink / raw) To: Goswin von Brederlow; +Cc: caml-list Am Dienstag, den 28.02.2012, 21:17 +0100 schrieb Goswin von Brederlow: > Hi, > > I'm implementing a RAID in userspace using ocaml and NBD (Network Block > Device) as protocol to export the device. For this I'm using > Bigarray.Array1 as buffer for data and wrote myself the right read(), > write(), pread() and pwrite() stubs. The advantage of this (over > strings) is that Bigarrays are not copied around by the GC so they don't > need to copy data around between the Bigarray and a temporary buffer in > the C stub. I used the same approach for PlasmaFS. The bigarray-based reads and writes are really missing in the stdlib. (If anybody wants to experiment, look into the Netsys_mem module of Ocamlnet, and use the functions mem_read and mem_write.) Btw, you should try to allocate the bigarrays in a page-aligned way. This makes the syscalls even more efficient. In my use case, I did not write to devices directly, and could assume that the blocks are backed by the page cache. So I did not run into this problem you describe in the following. > For efficiency each request stores all its data in a single > Bigarray.Array1. For reasons of the RAID implementation large requests > are split into 4k chunks using Bigarray.Array1.sub and grouped into > stripes. The stripes are then acted upon independently until all stripes > involved in a request are finished. Then the request is completed. > > Now I get a problem when 2 requests come in that overlap. Say I get a > write request for blocks 0 - 6 and a read request for blocks 4-9 in > that order. Since I already have the data for block 4-6 in memory from > the write request I do not want to reread them from disk. On the stripe > level the data looks like this: > > |W0 W1 W2 W3 W4 W5 W6 . . . | write 0-6 > | W4 W5 W6 R7 R8 R9 | read 4-9 > > For the read request I need to copy W4-6 to R4-6 or send out W4-6 + R7-9 > in two writes. I think I would like to avoid sending each stripe chunk > out seperately. Why not? This seems to be an elegant solution, and I don't see why this should make the accesses slower. The time for the extra context switches in negligible compared to the disk accesses. > On the other hand I could implement (p)readv() and > (p)writev() stubs. > > Anyway, my question now is how to detect which subarrays in the stripes > are part of a common larger array? Do I need to write my own C stub that > looks into the internas of the arrays to see if they share a common > ancestor? I think that would be preferable to tracking the origin of > each subarray myself. Yes, subarrays are tracked, but this feature exists only to catch the right moment for unmmapping bigarrays (if needed). As far as I remember, this is not tracked as a sub/super relationship, but the bigarrays accessing the same buffer share then the same buffer descriptor, and when the use count drops to 0, the buffer is finally destroyed. So, you cannot say which bigarray is the original one, and it can even be that the original bigarray is already deallocated but the backing buffer is not yet. > On a similar note how does Bigarray.Array1.blit handle arrays that are > part of the same larger array and overlap? > > ---------------------------------------------------------------------- > I'm missing functions like: > > val super : ('a, 'b, 'c) t -> ('a, 'b, 'c) t -> ('a, 'b, 'c) t > Create a merged sub-array of 2 adjacent sub-arrays of the same > big array. This function would be possible to implement. The requirement would be that both bigarrays use the same buffer descriptor. > val same_parent : ('a, 'b, 'c) t -> ('a, 'b, 'c) t -> bool > True if the 2 (sub-)arrays are part of the same big array. I would not call it "same_parent", but "same_backing_buffer". > val offset : ('a, 'b, 'c) t -> int > Offset of the sub-array in the underlying big array. I think this information is in the current implementation not available. As buffer sharing is also possible after reshaping, this is also not meaningful in the general case. Gerd > MfG > Goswin > -- ------------------------------------------------------------ Gerd Stolpmann, Darmstadt, Germany gerd@gerd-stolpmann.de Creator of GODI and camlcity.org. Contact details: http://www.camlcity.org/contact.html Company homepage: http://www.gerd-stolpmann.de *** Searching for new projects! Need consulting for system *** programming in Ocaml? Gerd Stolpmann can help you. ------------------------------------------------------------ ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Caml-list] Bigarray question: Detecting subarrays, overlaps, ... 2012-02-28 21:09 ` Gerd Stolpmann @ 2012-02-29 9:23 ` Goswin von Brederlow 2012-02-29 13:21 ` Gerd Stolpmann 0 siblings, 1 reply; 5+ messages in thread From: Goswin von Brederlow @ 2012-02-29 9:23 UTC (permalink / raw) To: Gerd Stolpmann; +Cc: Goswin von Brederlow, caml-list Gerd Stolpmann <info@gerd-stolpmann.de> writes: > Am Dienstag, den 28.02.2012, 21:17 +0100 schrieb Goswin von Brederlow: >> Hi, >> >> I'm implementing a RAID in userspace using ocaml and NBD (Network Block >> Device) as protocol to export the device. For this I'm using >> Bigarray.Array1 as buffer for data and wrote myself the right read(), >> write(), pread() and pwrite() stubs. The advantage of this (over >> strings) is that Bigarrays are not copied around by the GC so they don't >> need to copy data around between the Bigarray and a temporary buffer in >> the C stub. > > I used the same approach for PlasmaFS. The bigarray-based reads and > writes are really missing in the stdlib. (If anybody wants to > experiment, look into the Netsys_mem module of Ocamlnet, and use the > functions mem_read and mem_write.) > > Btw, you should try to allocate the bigarrays in a page-aligned way. > This makes the syscalls even more efficient. Which is actualy a requirement for libaio. At least alignment to blocksize. My libaio bindings include a stub to create an aligned Bigarray. > In my use case, I did not write to devices directly, and could assume > that the blocks are backed by the page cache. So I did not run into this > problem you describe in the following. > >> For efficiency each request stores all its data in a single >> Bigarray.Array1. For reasons of the RAID implementation large requests >> are split into 4k chunks using Bigarray.Array1.sub and grouped into >> stripes. The stripes are then acted upon independently until all stripes >> involved in a request are finished. Then the request is completed. >> >> Now I get a problem when 2 requests come in that overlap. Say I get a >> write request for blocks 0 - 6 and a read request for blocks 4-9 in >> that order. Since I already have the data for block 4-6 in memory from >> the write request I do not want to reread them from disk. On the stripe >> level the data looks like this: >> >> |W0 W1 W2 W3 W4 W5 W6 . . . | write 0-6 >> | W4 W5 W6 R7 R8 R9 | read 4-9 >> >> For the read request I need to copy W4-6 to R4-6 or send out W4-6 + R7-9 >> in two writes. I think I would like to avoid sending each stripe chunk >> out seperately. > > Why not? This seems to be an elegant solution, and I don't see why this > should make the accesses slower. The time for the extra context switches > in negligible compared to the disk accesses. A lot of the time data will just move between caches so disk speed is secondary. And sending each chunk seperately would mean up to 32 context switches instead of 1. And writing to a socket in chunks can lead to the data being send out in frames less than the MTU which would be wastefull. So I think there is some gain in limiting this. >> On the other hand I could implement (p)readv() and >> (p)writev() stubs. >> >> Anyway, my question now is how to detect which subarrays in the stripes >> are part of a common larger array? Do I need to write my own C stub that >> looks into the internas of the arrays to see if they share a common >> ancestor? I think that would be preferable to tracking the origin of >> each subarray myself. > > Yes, subarrays are tracked, but this feature exists only to catch the > right moment for unmmapping bigarrays (if needed). As far as I remember, > this is not tracked as a sub/super relationship, but the bigarrays > accessing the same buffer share then the same buffer descriptor, and > when the use count drops to 0, the buffer is finally destroyed. So, you > cannot say which bigarray is the original one, and it can even be that > the original bigarray is already deallocated but the backing buffer is > not yet. All subarrays share a struct caml_ba_proxy, as you say to catch the right moment for unmmapping bigarrays. So two subarrays are part of the same bigarray if they have the same proxy object. That seems like an easy enough test. Which one is the original bigarray doesn't matter, at least to me. >> On a similar note how does Bigarray.Array1.blit handle arrays that are >> part of the same larger array and overlap? >> >> ---------------------------------------------------------------------- >> I'm missing functions like: >> >> val super : ('a, 'b, 'c) t -> ('a, 'b, 'c) t -> ('a, 'b, 'c) t >> Create a merged sub-array of 2 adjacent sub-arrays of the same >> big array. > > This function would be possible to implement. The requirement would be > that both bigarrays use the same buffer descriptor. > >> val same_parent : ('a, 'b, 'c) t -> ('a, 'b, 'c) t -> bool >> True if the 2 (sub-)arrays are part of the same big array. > > I would not call it "same_parent", but "same_backing_buffer". Maybe "related" would be better? >> val offset : ('a, 'b, 'c) t -> int >> Offset of the sub-array in the underlying big array. > > I think this information is in the current implementation not available. > As buffer sharing is also possible after reshaping, this is also not > meaningful in the general case. offset = array->data - array->proxy->data; After reshaping the offset would be the offset into the original array after reshaping that too. For me the offset would be relevant to see which 2 sub arrays could be merged with "super" above. That it wouldn't always be meaningfull I could accept. > Gerd MfG Goswin ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Caml-list] Bigarray question: Detecting subarrays, overlaps, ... 2012-02-29 9:23 ` Goswin von Brederlow @ 2012-02-29 13:21 ` Gerd Stolpmann 2012-03-01 8:52 ` Goswin von Brederlow 0 siblings, 1 reply; 5+ messages in thread From: Gerd Stolpmann @ 2012-02-29 13:21 UTC (permalink / raw) To: Goswin von Brederlow; +Cc: caml-list Am Mittwoch, den 29.02.2012, 10:23 +0100 schrieb Goswin von Brederlow: > >> For efficiency each request stores all its data in a single > >> Bigarray.Array1. For reasons of the RAID implementation large requests > >> are split into 4k chunks using Bigarray.Array1.sub and grouped into > >> stripes. The stripes are then acted upon independently until all stripes > >> involved in a request are finished. Then the request is completed. > >> > >> Now I get a problem when 2 requests come in that overlap. Say I get a > >> write request for blocks 0 - 6 and a read request for blocks 4-9 in > >> that order. Since I already have the data for block 4-6 in memory from > >> the write request I do not want to reread them from disk. On the stripe > >> level the data looks like this: > >> > >> |W0 W1 W2 W3 W4 W5 W6 . . . | write 0-6 > >> | W4 W5 W6 R7 R8 R9 | read 4-9 > >> > >> For the read request I need to copy W4-6 to R4-6 or send out W4-6 + R7-9 > >> in two writes. I think I would like to avoid sending each stripe chunk > >> out seperately. > > > > Why not? This seems to be an elegant solution, and I don't see why this > > should make the accesses slower. The time for the extra context switches > > in negligible compared to the disk accesses. > > A lot of the time data will just move between caches so disk speed is > secondary. And sending each chunk seperately would mean up to 32 context > switches instead of 1. And writing to a socket in chunks can lead to the > data being send out in frames less than the MTU which would be > wastefull. So I think there is some gain in limiting this. A context switch takes less than a microsecond on typical hardware, including the time for rebuilding the TLB from the page table. Sending a 4K block over gigabit network takes more than 30 times longer. I doubt you can gain from combining several chunks. Sending the data in frames less than the MTU can normally only happen when you disable the Nagle algorithm (did you?). And even then, you only need to fill the TCP buffer faster than the network can transport the data to avoid it. > >> On the other hand I could implement (p)readv() and > >> (p)writev() stubs. > >> > >> Anyway, my question now is how to detect which subarrays in the stripes > >> are part of a common larger array? Do I need to write my own C stub that > >> looks into the internas of the arrays to see if they share a common > >> ancestor? I think that would be preferable to tracking the origin of > >> each subarray myself. > > > > Yes, subarrays are tracked, but this feature exists only to catch the > > right moment for unmmapping bigarrays (if needed). As far as I remember, > > this is not tracked as a sub/super relationship, but the bigarrays > > accessing the same buffer share then the same buffer descriptor, and > > when the use count drops to 0, the buffer is finally destroyed. So, you > > cannot say which bigarray is the original one, and it can even be that > > the original bigarray is already deallocated but the backing buffer is > > not yet. > > All subarrays share a struct caml_ba_proxy, as you say to catch the > right moment for unmmapping bigarrays. So two subarrays are part of the > same bigarray if they have the same proxy object. That seems like an > easy enough test. Which one is the original bigarray doesn't matter, at > least to me. > > >> On a similar note how does Bigarray.Array1.blit handle arrays that are > >> part of the same larger array and overlap? > >> > >> ---------------------------------------------------------------------- > >> I'm missing functions like: > >> > >> val super : ('a, 'b, 'c) t -> ('a, 'b, 'c) t -> ('a, 'b, 'c) t > >> Create a merged sub-array of 2 adjacent sub-arrays of the same > >> big array. > > > > This function would be possible to implement. The requirement would be > > that both bigarrays use the same buffer descriptor. > > > >> val same_parent : ('a, 'b, 'c) t -> ('a, 'b, 'c) t -> bool > >> True if the 2 (sub-)arrays are part of the same big array. > > > > I would not call it "same_parent", but "same_backing_buffer". > > Maybe "related" would be better? > > >> val offset : ('a, 'b, 'c) t -> int > >> Offset of the sub-array in the underlying big array. > > > > I think this information is in the current implementation not available. > > As buffer sharing is also possible after reshaping, this is also not > > meaningful in the general case. > > offset = array->data - array->proxy->data; > > After reshaping the offset would be the offset into the original array > after reshaping that too. > > For me the offset would be relevant to see which 2 sub arrays could be > merged with "super" above. That it wouldn't always be meaningfull I > could accept. What I did not understand up to now: It is really easy to keep all this information in a helper data structure. Why extend Bigarray? Gerd > > > Gerd > > MfG > Goswin > -- ------------------------------------------------------------ Gerd Stolpmann, Darmstadt, Germany gerd@gerd-stolpmann.de Creator of GODI and camlcity.org. Contact details: http://www.camlcity.org/contact.html Company homepage: http://www.gerd-stolpmann.de *** Searching for new projects! Need consulting for system *** programming in Ocaml? Gerd Stolpmann can help you. ------------------------------------------------------------ ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Caml-list] Bigarray question: Detecting subarrays, overlaps, ... 2012-02-29 13:21 ` Gerd Stolpmann @ 2012-03-01 8:52 ` Goswin von Brederlow 0 siblings, 0 replies; 5+ messages in thread From: Goswin von Brederlow @ 2012-03-01 8:52 UTC (permalink / raw) To: Gerd Stolpmann; +Cc: caml-list Gerd Stolpmann <info@gerd-stolpmann.de> writes: > What I did not understand up to now: It is really easy to keep all this > information in a helper data structure. Why extend Bigarray? > > Gerd A helper data structure would mean keeping track of that information on my own complicating the source and wasting memory at runtime. On the other hand what I suggested only makes use of what is already there in the Bigarray data structure. The functions would only make that information available to the user. I'm not a fan of duplicating information that is already there. Worst case the helper data structure could be wrong due to some bug in the code. MfG Goswin ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2012-03-01 8:52 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-02-28 20:17 [Caml-list] Bigarray question: Detecting subarrays, overlaps, Goswin von Brederlow 2012-02-28 21:09 ` Gerd Stolpmann 2012-02-29 9:23 ` Goswin von Brederlow 2012-02-29 13:21 ` Gerd Stolpmann 2012-03-01 8:52 ` Goswin von Brederlow
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox