On Friday, 18.10.2013, at 13:20 +0100, Anil Madhavapeddy wrote:
> On Fri, Oct 18, 2013 at 02:16:12PM +0200, rixed@happyleptic.org wrote:
> > -[ Fri, Oct 18, 2013 at 12:59:55PM +0100, Anil Madhavapeddy ]----
> > > One feature I'd really like to see in Bitstring is support for Bigarray,
> > > since that avoids a copy into the OCaml heap and lets us do quite high
> > > performance parsing. If I remember right, there was a patch on the
> > > Bitstring issue tracker, but it wasn't parameterised (so it's either
> > > Bitstring+string or Bitstring+bigarray, which isn't ideal).
> >
> > Pardon my lack of familiarity with bigarrays, but I can't see what's the
> > difference between copying packets from pcap ring buffer into a bigarray
> > or into a string. Or do you mean using Bigarray.map_file on the whole
> > raw ring buffer and handle it without pcap help?

Without knowing the details: maybe no copy is required at all? The pcap
ring buffer could be wrapped directly as a Bigarray.

> We have a number of use-cases that run OCaml in kernel mode, directly
> operating on packets read from a network driver that's also written in
> OCaml. Bigarrays are used as the mechanism for passing around externally
> allocated memory (i.e. network card buffers) directly, whereas inspecting
> them with a string-based Bitstring requires an expensive data copy.
>
> See: http://anil.recoil.org/papers/2013-asplos-mirage.pdf
> or http://www.openmirage.org

For similar reasons, I also added some Bigarray functions to Ocamlnet:

http://projects.camlcity.org/projects/dl/ocamlnet-3.7.3/doc/html-main/Netsys_mem.html

If you look at the stub behind e.g. Unix.read, you'll see that the data
is first read into an internal unaligned buffer, and then copied to the
string buffer. This usually means two copies of the data: one from the
kernel buffer to the internal buffer, and one from there to the string.
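
To make the copy-versus-view distinction from the discussion above
concrete, here is a minimal sketch using only the standard Bigarray
module (assuming OCaml 4.07 or later, where it is part of the standard
library). A small char Bigarray stands in for an externally allocated
packet buffer. The names make_packet, get_uint16_be, payload_view and
payload_copy are hypothetical helpers made up for this illustration;
they are not part of Bitstring or Ocamlnet.

```ocaml
open Bigarray

(* Hypothetical stand-in for an externally allocated packet buffer:
   a char Bigarray, whose data lives outside the OCaml heap. *)
type buffer = (char, int8_unsigned_elt, c_layout) Array1.t

let make_packet () : buffer =
  let b = Array1.create char c_layout 8 in
  Array1.fill b '\000';
  (* Pretend bytes 2..3 hold a big-endian 16-bit field, 0x0102. *)
  b.{2} <- '\001';
  b.{3} <- '\002';
  b

(* Read a big-endian 16-bit field in place, without copying the buffer. *)
let get_uint16_be (b : buffer) off =
  ((Char.code b.{off}) lsl 8) lor (Char.code b.{off + 1})

(* Array1.sub returns a view that shares storage with the original. *)
let payload_view (b : buffer) : buffer = Array1.sub b 2 2

(* Building a string copies the bytes into the OCaml heap. *)
let payload_copy (b : buffer) : string =
  String.init 2 (fun i -> b.{2 + i})

let () =
  let pkt = make_packet () in
  let view = payload_view pkt in
  let copy = payload_copy pkt in
  (* Simulate the "device" overwriting the buffer after we looked at it. *)
  pkt.{3} <- '\003';
  Printf.printf "field = 0x%04x\n" (get_uint16_be pkt 2);
  Printf.printf "view sees update: %b\n" (view.{1} = '\003');
  Printf.printf "copy sees update: %b\n" (copy.[1] = '\003')
```

The view tracks later writes to the underlying buffer because no data
was moved, while the string is an independent copy. A string-based
parser has to pay for that copy on every packet.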

If you use a Bigarray instead, the internal buffer becomes superfluous:
Bigarrays are malloc'ed memory and cannot be moved by the GC. Hence, you
can invoke the read() syscall directly with the Bigarray as the buffer.
If you additionally ensure that the Bigarray is page-aligned, the kernel
can sometimes even avoid copying at all (though only some operating
systems seem to implement such a strategy, as changing the page mapping
or doing direct I/O can be more costly than copying).

Another advantage here is that you can freely choose the size of the
buffer (Unix.read et al. use a fixed-size 64K internal buffer). You can
also allocate the buffer in a shared area.

Ocamlnet now prefers Bigarrays as primary buffers where reasonable, and
where a speedup (or lower CPU consumption) can be expected. For example,
the HTTP client first reads data into a Bigarray, splits the header
there into lines (which are then normal strings again), and gathers the
data chunks from the HTTP body (which can be strings or Bigarrays, at
the user's choice).

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
My OCaml site:          http://www.camlcity.org
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
------------------------------------------------------------