* Re: [Caml-list] Re: How to read different ints from a Bigarray?
@ 2009-11-03 17:16 Charles Forsyth
0 siblings, 0 replies; 23+ messages in thread
From: Charles Forsyth @ 2009-11-03 17:16 UTC (permalink / raw)
To: caml-list, goswin-v-b
>And mips.
and powerpc (if you'd like to run caml on Blue Gene, and why not)
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Caml-list] Re: How to read different ints from a Bigarray?
2009-11-02 20:27 ` Richard Jones
@ 2009-11-03 13:18 ` Goswin von Brederlow
0 siblings, 0 replies; 23+ messages in thread
From: Goswin von Brederlow @ 2009-11-03 13:18 UTC (permalink / raw)
To: Richard Jones; +Cc: caml-list
Richard Jones <rich@annexia.org> writes:
> On Mon, Nov 02, 2009 at 05:33:24PM +0100, Mauricio Fernandez wrote:
>> It might be possible to hack support for C-- expressions in external
>> declarations. That'd be a sort of portable assembler.
>
> To be honest I'm far more interested in x86-64-specific instructions
> (SSE3/4 in particular). There are only two processor architectures
> that matter in the world in any practical sense, x86-64 and ARM.
>
> Rich.
And mips.
MfG
Goswin
* Re: [Caml-list] Re: How to read different ints from a Bigarray?
2009-11-02 16:33 ` Mauricio Fernandez
2009-11-02 20:27 ` Richard Jones
@ 2009-11-02 20:48 ` Goswin von Brederlow
1 sibling, 0 replies; 23+ messages in thread
From: Goswin von Brederlow @ 2009-11-02 20:48 UTC (permalink / raw)
To: caml-list
Mauricio Fernandez <mfp@acm.org> writes:
> On Mon, Nov 02, 2009 at 05:11:27PM +0100, Goswin von Brederlow wrote:
>> Richard Jones <rich@annexia.org> writes:
>>
>> > On Sun, Nov 01, 2009 at 04:11:52PM +0100, Goswin von Brederlow wrote:
>> >> But C calls are still 33% slower than direct access in ocaml (if one
>> >> doesn't use the polymorphic functions).
>> >
>> > Are you using noalloc calls?
>> >
>> > http://camltastic.blogspot.com/2008/08/tip-calling-c-functions-directly-with.html
>>
>> Yes. And I looked at the bigarray module and couldn't figure out how
>> they differ from my own external function. Only difference I see is
>> the leading "%" on the external name. What does that do?
>
> That means that it is using a hardcoded OCaml primitive, whose code can be
> generated by the compiler via C--. See asmcomp/cmmgen.ml.
>
>> > I would love to see inline assembler supported by the compiler.
>
> It might be possible to hack support for C-- expressions in external
> declarations. That'd be a sort of portable assembler.
This brings me a lot closer to a fast buffer structure. I now have
this code:
(* buffer.ml: Buffer module for libaio-ocaml
* Copyright (C) 2009 Goswin von Brederlow
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU Lesser General Public License as
* published by the Free Software Foundation, either version 3 of the
* License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
* Under Debian a copy can be found in /usr/share/common-licenses/LGPL-3.
*)
open Bigarray
type buffer = (int, int8_unsigned_elt, c_layout) Array1.t
exception Unaligned
let create size = (Array1.create int8_unsigned c_layout size : buffer)
let unsafe_get_uint8 (buf : buffer) off = Array1.unsafe_get buf off
let unsafe_get_uint16 (buf : buffer) off =
  let off = off asr 1 in
  let buf = ((Obj.magic buf) : (int, int16_unsigned_elt, c_layout) Array1.t)
  in
  Array1.unsafe_get buf off
let unsafe_get_int31 (buf : buffer) off =
  let off = off asr 2 in
  let buf = ((Obj.magic buf) : (int32, int32, c_layout) Array1.t) in
  let x = Array1.unsafe_get buf off
  in
  Int32.to_int x
let unsafe_get_int63 (buf : buffer) off =
  let off = off asr 3 in
  let buf = ((Obj.magic buf) : (int, int, c_layout) Array1.t)
  in
  Array1.unsafe_get buf off
Looking at the generated code I see that this works nicely for 8 and
16bit:
0000000000404a50 <camlBuffer__unsafe_get_uint8_131>:
404a50: 48 d1 fb sar %rbx
404a53: 48 8b 40 08 mov 0x8(%rax),%rax
404a57: 48 0f b6 04 18 movzbq (%rax,%rbx,1),%rax
404a5c: 48 8d 44 00 01 lea 0x1(%rax,%rax,1),%rax
404a61: c3 retq
0000000000404a90 <camlBuffer__unsafe_get_uint16_137>:
404a90: 48 d1 fb sar %rbx
404a93: 48 83 cb 01 or $0x1,%rbx
404a97: 48 d1 fb sar %rbx
404a9a: 48 8b 40 08 mov 0x8(%rax),%rax
404a9e: 48 0f b7 04 58 movzwq (%rax,%rbx,2),%rax
404aa3: 48 8d 44 00 01 lea 0x1(%rax,%rax,1),%rax
404aa8: c3 retq
But for 31/63 bits I get:
0000000000404b90 <camlBuffer__unsafe_get_int31_145>:
404b90: 48 83 ec 08 sub $0x8,%rsp
404b94: 48 c1 fb 02 sar $0x2,%rbx
404b98: 48 83 cb 01 or $0x1,%rbx
404b9c: 48 89 c7 mov %rax,%rdi
404b9f: 48 89 de mov %rbx,%rsi
404ba2: 48 8b 05 5f bc 21 00 mov 0x21bc5f(%rip),%rax # 620808 <_DYNAMIC+0x7e0>
404ba9: e8 92 2a 01 00 callq 417640 <caml_c_call>
404bae: 48 63 40 08 movslq 0x8(%rax),%rax
404bb2: 48 d1 e0 shl %rax
404bb5: 48 83 c8 01 or $0x1,%rax
404bb9: 48 83 c4 08 add $0x8,%rsp
404bbd: c3 retq
0000000000404ca0 <camlBuffer__unsafe_get_int63_154>:
404ca0: 48 83 ec 08 sub $0x8,%rsp
404ca4: 48 c1 fb 03 sar $0x3,%rbx
404ca8: 48 83 cb 01 or $0x1,%rbx
404cac: 48 89 c7 mov %rax,%rdi
404caf: 48 89 de mov %rbx,%rsi
404cb2: 48 8b 05 4f bb 21 00 mov 0x21bb4f(%rip),%rax # 620808 <_DYNAMIC+0x7e0>
404cb9: e8 82 29 01 00 callq 417640 <caml_c_call>
404cbe: 48 83 c4 08 add $0x8,%rsp
404cc2: c3 retq
At least in the int63 case I would have thought the compiler would
emit asm code to read the int instead of a function call. In the 31bit
case I would have hoped it would optimize the intermediate int32 away.
Is there something I can do better to get_int31? I was hoping for code
like this:
0000000000404a90 <camlBuffer__unsafe_get_uint31_137>:
404a90: 48 c1 fb 02 sar $0x2,%rbx
404a94: 48 83 cb 01 or $0x1,%rbx
404a98: 48 d1 fb sar %rbx
404a9b: 48 8b 40 08 mov 0x8(%rax),%rax
404a9f: xx xx xx xx xx movslq (%rax,%rbx,4),%rax
404aa4: 48 8d 44 00 01 lea 0x1(%rax,%rax,1),%rax
404aa9: c3 retq
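A hedged note on the 31/63-bit cases: the casts above name int32 and int in the element-kind position, where Bigarray's own kind types are int32_elt and int_elt. That may be why the compiler falls back to the generic C call, but this is an assumption, not something confirmed in this thread. Independent of that, wider values can be assembled from single bytes without Obj.magic and without alignment requirements. The sketch below assumes little-endian data and a 64-bit OCaml int; get_uint16_le and get_uint32_le are names made up for illustration.

```ocaml
open Bigarray

type buffer = (int, int8_unsigned_elt, c_layout) Array1.t

let create size : buffer = Array1.create int8_unsigned c_layout size

(* Bounds-checked single byte read; the element kind is fixed by the
   type annotation, so the access is monomorphic. *)
let get_uint8 (buf : buffer) off = Array1.get buf off

(* Little-endian 16-bit value built from two bytes. *)
let get_uint16_le (buf : buffer) off =
  (get_uint8 buf off) lor ((get_uint8 buf (off + 1)) lsl 8)

(* Little-endian 32-bit value; assumes an OCaml int wider than 32 bits. *)
let get_uint32_le (buf : buffer) off =
  (get_uint16_le buf off) lor ((get_uint16_le buf (off + 2)) lsl 16)

let () =
  let buf = create 5 in
  List.iteri (fun i b -> Array1.set buf i b) [0x00; 0x78; 0x56; 0x34; 0x12];
  (* Reads start at the odd offset 1, so no alignment is required. *)
  Printf.printf "0x%x 0x%x\n" (get_uint16_le buf 1) (get_uint32_le buf 1)
```

This trades one wide load for several byte loads, so it is likely slower than the inlined 16-bit access shown above; it is meant as a portable fallback, not as the fast path.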
MfG
Goswin
* Re: [Caml-list] Re: How to read different ints from a Bigarray?
2009-11-02 16:33 ` Mauricio Fernandez
@ 2009-11-02 20:27 ` Richard Jones
2009-11-03 13:18 ` Goswin von Brederlow
2009-11-02 20:48 ` Goswin von Brederlow
1 sibling, 1 reply; 23+ messages in thread
From: Richard Jones @ 2009-11-02 20:27 UTC (permalink / raw)
To: caml-list
On Mon, Nov 02, 2009 at 05:33:24PM +0100, Mauricio Fernandez wrote:
> It might be possible to hack support for C-- expressions in external
> declarations. That'd be a sort of portable assembler.
To be honest I'm far more interested in x86-64-specific instructions
(SSE3/4 in particular). There are only two processor architectures
that matter in the world in any practical sense, x86-64 and ARM.
Rich.
--
Richard Jones
Red Hat
* Re: [Caml-list] Re: How to read different ints from a Bigarray?
2009-11-02 16:11 ` Goswin von Brederlow
@ 2009-11-02 16:33 ` Mauricio Fernandez
2009-11-02 20:27 ` Richard Jones
2009-11-02 20:48 ` Goswin von Brederlow
0 siblings, 2 replies; 23+ messages in thread
From: Mauricio Fernandez @ 2009-11-02 16:33 UTC (permalink / raw)
To: caml-list
On Mon, Nov 02, 2009 at 05:11:27PM +0100, Goswin von Brederlow wrote:
> Richard Jones <rich@annexia.org> writes:
>
> > On Sun, Nov 01, 2009 at 04:11:52PM +0100, Goswin von Brederlow wrote:
> >> But C calls are still 33% slower than direct access in ocaml (if one
> >> doesn't use the polymorphic functions).
> >
> > Are you using noalloc calls?
> >
> > http://camltastic.blogspot.com/2008/08/tip-calling-c-functions-directly-with.html
>
> Yes. And I looked at the bigarray module and couldn't figure out how
> they differ from my own external function. Only difference I see is
> the leading "%" on the external name. What does that do?
That means that it is using a hardcoded OCaml primitive, whose code can be
generated by the compiler via C--. See asmcomp/cmmgen.ml.
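To make the distinction concrete: an external whose name starts with "%" names a primitive known to the compiler, while a plain name refers to a C symbol linked into the program. A minimal sketch follows; "%addint" and "%identity" are primitives the standard library itself uses, the local names add, id and parse_int are made up here, and relying on "%" primitives in user code is not a documented, stable interface.

```ocaml
(* Compiler primitives: the native compiler expands these inline
   instead of emitting a call. *)
external add : int -> int -> int = "%addint"
external id : 'a -> 'a = "%identity"

(* For comparison, a plain name refers to a C function. This one,
   caml_int_of_string, is the runtime function behind int_of_string. *)
external parse_int : string -> int = "caml_int_of_string"

let () =
  Printf.printf "%d %d %d\n" (add 2 3) (id 7) (parse_int "42")
```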
> > I would love to see inline assembler supported by the compiler.
It might be possible to hack support for C-- expressions in external
declarations. That'd be a sort of portable assembler.
--
Mauricio Fernandez - http://eigenclass.org
* Re: [Caml-list] Re: How to read different ints from a Bigarray?
2009-11-01 19:57 ` Richard Jones
@ 2009-11-02 16:11 ` Goswin von Brederlow
2009-11-02 16:33 ` Mauricio Fernandez
0 siblings, 1 reply; 23+ messages in thread
From: Goswin von Brederlow @ 2009-11-02 16:11 UTC (permalink / raw)
To: Richard Jones; +Cc: Goswin von Brederlow, caml-list
Richard Jones <rich@annexia.org> writes:
> On Sun, Nov 01, 2009 at 04:11:52PM +0100, Goswin von Brederlow wrote:
>> But C calls are still 33% slower than direct access in ocaml (if one
>> doesn't use the polymorphic functions).
>
> Are you using noalloc calls?
>
> http://camltastic.blogspot.com/2008/08/tip-calling-c-functions-directly-with.html
Yes. And I looked at the bigarray module and couldn't figure out how
they differ from my own external function. Only difference I see is
the leading "%" on the external name. What does that do?
> I would love to see inline assembler supported by the compiler.
>
> Rich.
And some primitive operations on integers like sign extending and byte
swapping in the Pervasives module where the compiler emits cpu
specific code instead of a caml/C call.
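Until such primitives exist, both operations can be written in plain OCaml with shifts and masks, which compile to ordinary integer instructions, though not necessarily to a dedicated sign-extend or byte-swap instruction. A minimal sketch: sign_extend and bswap16 are illustrative names, not Pervasives functions, and Sys.int_size assumes a reasonably recent OCaml.

```ocaml
(* Interpret the low [bits] bits of [x] as a two's-complement number. *)
let sign_extend bits x =
  let shift = Sys.int_size - bits in
  (x lsl shift) asr shift

(* Swap the two low bytes of [x]; higher bits are discarded. *)
let bswap16 x =
  ((x land 0xFF) lsl 8) lor ((x lsr 8) land 0xFF)

let () =
  Printf.printf "%d %d 0x%x\n"
    (sign_extend 8 0xFF) (sign_extend 16 0x7FFF) (bswap16 0x1234)
```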
MfG
Goswin
* Re: [Caml-list] Re: How to read different ints from a Bigarray?
2009-11-01 15:11 ` Goswin von Brederlow
@ 2009-11-01 19:57 ` Richard Jones
2009-11-02 16:11 ` Goswin von Brederlow
0 siblings, 1 reply; 23+ messages in thread
From: Richard Jones @ 2009-11-01 19:57 UTC (permalink / raw)
To: Goswin von Brederlow; +Cc: caml-list
On Sun, Nov 01, 2009 at 04:11:52PM +0100, Goswin von Brederlow wrote:
> But C calls are still 33% slower than direct access in ocaml (if one
> doesn't use the polymorphic functions).
Are you using noalloc calls?
http://camltastic.blogspot.com/2008/08/tip-calling-c-functions-directly-with.html
I would love to see inline assembler supported by the compiler.
Rich.
--
Richard Jones
Red Hat
* Re: [Caml-list] Re: How to read different ints from a Bigarray?
2009-10-30 20:30 ` Richard Jones
@ 2009-11-01 15:11 ` Goswin von Brederlow
2009-11-01 19:57 ` Richard Jones
0 siblings, 1 reply; 23+ messages in thread
From: Goswin von Brederlow @ 2009-11-01 15:11 UTC (permalink / raw)
To: Richard Jones; +Cc: Goswin von Brederlow, caml-list
Richard Jones <rich@annexia.org> writes:
> On Thu, Oct 29, 2009 at 06:07:59PM +0100, Goswin von Brederlow wrote:
>> I still can reuse a lot of this. Especially the syntax extension
>> seems like a good idea. Maybe reduced to bytes instead of bits
>> though. I don't intend to use such fine grained structures to need bit
>> access.
>
> Take a close look at bitstring. In all the cases where it can
> *statically* determine that accesses are on byte or larger boundaries,
> it does *not* do any bitfiddling but uses the most efficient, direct C
> calls possible.
>
> We really did spend a lot of time optimizing the bitmatch case.
>
> Rich.
But C calls are still 33% slower than direct access in ocaml (if one
doesn't use the polymorphic functions).
What would be great would be to use whatever Bigarray uses to get the
compiler to emit direct access to the data instead of C calls. Time to
hit the source.
MfG
Goswin
* Re: [Caml-list] Re: How to read different ints from a Bigarray?
2009-10-29 17:07 ` Goswin von Brederlow
@ 2009-10-30 20:30 ` Richard Jones
2009-11-01 15:11 ` Goswin von Brederlow
0 siblings, 1 reply; 23+ messages in thread
From: Richard Jones @ 2009-10-30 20:30 UTC (permalink / raw)
To: Goswin von Brederlow; +Cc: caml-list
On Thu, Oct 29, 2009 at 06:07:59PM +0100, Goswin von Brederlow wrote:
> I still can reuse a lot of this. Especially the syntax extension
> seems like a good idea. Maybe reduced to bytes instead of bits
> though. I don't intend to use such fine grained structures to need bit
> access.
Take a close look at bitstring. In all the cases where it can
*statically* determine that accesses are on byte or larger boundaries,
it does *not* do any bitfiddling but uses the most efficient, direct C
calls possible.
We really did spend a lot of time optimizing the bitmatch case.
Rich.
--
Richard Jones
Red Hat
* Re: [Caml-list] Re: How to read different ints from a Bigarray?
2009-10-29 23:43 ` Goswin von Brederlow
@ 2009-10-30 0:48 ` Gerd Stolpmann
0 siblings, 0 replies; 23+ messages in thread
From: Gerd Stolpmann @ 2009-10-30 0:48 UTC (permalink / raw)
To: Goswin von Brederlow; +Cc: Florian Weimer, caml-list
Am Freitag, den 30.10.2009, 00:43 +0100 schrieb Goswin von Brederlow:
> Gerd Stolpmann <gerd@gerd-stolpmann.de> writes:
>
> > Am Donnerstag, den 29.10.2009, 21:40 +0100 schrieb Florian Weimer:
> >> * Goswin von Brederlow:
> >>
> >> > - The data is passed to libaio and needs to be kept alive and unmoved
> >> > as long as libaio knows it.
> >>
> >> It also has to be aligned to a 512-byte boundary, so you can use
> >> O_DIRECT. Linux does not support truly asynchronous I/O without
> >> O_DIRECT AFAIK, which rarely makes it worth the trouble.
> >
> > Right. There is also the question whether aio for regular files (i.e.
> > files backed by page cache) will continue to be supported at all - it is
> > well known that Linus Torvalds doesn't like it. It may happen that
> > some day aio will be restricted to block devices only.
> >
> > So I wouldn't use it for production code, but it is of course still an
> > interesting interface.
> >
> > Gerd
>
> Damn. That seems so stupid. Then writing asynchronously will only be
> possible by creating a pot full of worker threads, each one writing
> one chunk. So you get all those chunks in random order submitted to
> the kernel, the kernel has to reorder them, fit them back together,
> write them and then wake up the right thread for each piece
> completed. So much extra work while libaio has all the data already in
> perfect structures for the kernel.
Well, this is exactly the implementation of the POSIX aio functions in
glibc. They are mapped to a bunch of threads.
> And how will you do barriers when writing with threads? Wait for all
> threads to complete every time you hit a barrier and thereby stall
> the pipeline?
You can't implement barriers. When you have page-cache backed I/O (i.e.
non-direct I/O, whether aio or sync I/O) there is no control over when
data is written. OK, there is fsync, but this is very coarse-grained
control.
Gerd
--
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany
gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------
* Re: [Caml-list] Re: How to read different ints from a Bigarray?
2009-10-29 21:04 ` Gerd Stolpmann
@ 2009-10-29 23:43 ` Goswin von Brederlow
2009-10-30 0:48 ` Gerd Stolpmann
0 siblings, 1 reply; 23+ messages in thread
From: Goswin von Brederlow @ 2009-10-29 23:43 UTC (permalink / raw)
To: Gerd Stolpmann; +Cc: Florian Weimer, Goswin von Brederlow, caml-list
Gerd Stolpmann <gerd@gerd-stolpmann.de> writes:
> Am Donnerstag, den 29.10.2009, 21:40 +0100 schrieb Florian Weimer:
>> * Goswin von Brederlow:
>>
>> > - The data is passed to libaio and needs to be kept alive and unmoved
>> > as long as libaio knows it.
>>
>> It also has to be aligned to a 512-byte boundary, so you can use
>> O_DIRECT. Linux does not support truly asynchronous I/O without
>> O_DIRECT AFAIK, which rarely makes it worth the trouble.
>
> Right. There is also the question whether aio for regular files (i.e.
> files backed by page cache) will continue to be supported at all - it is
> well known that Linus Torvalds doesn't like it. It may happen that
> some day aio will be restricted to block devices only.
>
> So I wouldn't use it for production code, but it is of course still an
> interesting interface.
>
> Gerd
Damn. That seems so stupid. Then writing asynchronously will only be
possible by creating a pot full of worker threads, each one writing
one chunk. So you get all those chunks in random order submitted to
the kernel, the kernel has to reorder them, fit them back together,
write them and then wake up the right thread for each piece
completed. So much extra work while libaio has all the data already in
perfect structures for the kernel.
And how will you do barriers when writing with threads? Wait for all
threads to complete every time you hit a barrier and thereby stall
the pipeline?
MfG
Goswin
* Re: [Caml-list] Re: How to read different ints from a Bigarray?
2009-10-29 20:40 ` Florian Weimer
2009-10-29 21:04 ` Gerd Stolpmann
@ 2009-10-29 23:38 ` Goswin von Brederlow
1 sibling, 0 replies; 23+ messages in thread
From: Goswin von Brederlow @ 2009-10-29 23:38 UTC (permalink / raw)
To: Florian Weimer; +Cc: caml-list
Florian Weimer <fw@deneb.enyo.de> writes:
> * Goswin von Brederlow:
>
>> - The data is passed to libaio and needs to be kept alive and unmoved
>> as long as libaio knows it.
>
> It also has to be aligned to a 512-byte boundary, so you can use
> O_DIRECT. Linux does not support truly asynchronous I/O without
> O_DIRECT AFAIK, which rarely makes it worth the trouble.
True. But libaio can provide an Aio.Buffer.make that returns an
aligned Bigarray (or string or whatever, currently a custom type).
If you write to files on a filesystem without O_DIRECT it will block
when submitting the requests till they have completed. Not sure what
happens on block devices without O_DIRECT.
My use case is for a Fuse Filesystem and writing to disks. O_DIRECT is
quite alright there. If you can't use O_DIRECT then you are left with
going multithreaded.
MfG
Goswin
* Re: [Caml-list] Re: How to read different ints from a Bigarray?
2009-10-29 18:48 ` Sylvain Le Gall
@ 2009-10-29 23:25 ` Goswin von Brederlow
0 siblings, 0 replies; 23+ messages in thread
From: Goswin von Brederlow @ 2009-10-29 23:25 UTC (permalink / raw)
To: Sylvain Le Gall; +Cc: caml-list
Sylvain Le Gall <sylvain@le-gall.net> writes:
> On 29-10-2009, Goswin von Brederlow <goswin-v-b@web.de> wrote:
>> Xavier Leroy <Xavier.Leroy@inria.fr> writes:
>>> Goswin von Brederlow wrote:
>>
>> Here are some benchmark results:
>>
>> get an int out of a string:
>>              C         Ocaml
>> uint8  le    19.496    17.433
>> int8   le    19.298    17.850
>> uint16 le    19.427    25.046
>> int16  le    19.383    27.664
>> uint16 be    20.502    23.200
>> int16  be    20.350    27.535
>>
>> get an int out of a Bigarray.Array1.t:
>>              safe      unsafe
>> uint8  le    55.194s   54.508s
>> uint64 le    80.51s    81.46s
>>
>
> Can you provide us with the corresponding code and benchmark?
>
> Maybe you can just commit this in libaio/test/bench.ml.
>
> Regards,
> Sylvain Le Gall
As Christophe guessed, the problem was polymorphic functions. If I
specify a fixed Array1 type then the compiler uses the optimized
access functions. That makes unsafe Bigarray access slightly faster
than unsafe string access (the compiler apparently does not optimize
int_of_char/Char.unsafe_chr away), independent of the element size on
set; on get, allocating the int32/int64 costs time, so those are slower.
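The difference comes down to the type annotation alone. In this sketch (function names are illustrative), get_poly leaves the element kind open, so the compiler has to go through the generic access path, while get_uint8 fixes kind and layout so the access can be specialised. Both return the same value; only the generated code differs.

```ocaml
open Bigarray

(* Kind left polymorphic: compiled without knowing the element size. *)
let get_poly (a : ('a, 'b, c_layout) Array1.t) i = Array1.unsafe_get a i

(* Kind and layout fixed: the compiler can emit a direct byte load. *)
let get_uint8 (a : (int, int8_unsigned_elt, c_layout) Array1.t) i =
  Array1.unsafe_get a i

let () =
  let a = Array1.create int8_unsigned c_layout 4 in
  Array1.fill a 7;
  (* unsafe_get skips the bounds check, so the index must be valid. *)
  Printf.printf "%d %d\n" (get_poly a 2) (get_uint8 a 2)
```

Comparing the output of ocamlopt -S for the two functions is one way to check which path a given compiler version actually takes.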
So Bigarray is the fastest but getting different types out of a
Bigarray will be tricky. Unaligned even more so if not impossible.
I have to sleep on this. Maybe in my use case I can have all
structures int64 aligned and then split the int64 up in ocaml where
structures have smaller members. I suppose it would have been too much
to ask for a Bigarray with access functions for any type. Maybe some
little wrapper with Obj.magic will do *hide*.
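Splitting an aligned int64 into smaller members needs only shifts and masks. A rough sketch, assuming the members are laid out little-endian inside the word, that each member is at most 4 bytes wide, and a 64-bit OCaml int; the helper field is hypothetical.

```ocaml
(* Extract an unsigned member of [bytes] bytes that starts [byte_off]
   bytes into a little-endian 64-bit word. Assumes 1 <= bytes <= 4. *)
let field (word : int64) ~byte_off ~bytes =
  let shifted = Int64.shift_right_logical word (8 * byte_off) in
  let mask = Int64.pred (Int64.shift_left 1L (8 * bytes)) in
  Int64.to_int (Int64.logand shifted mask)

let () =
  let w = 0x1122334455667788L in
  Printf.printf "0x%x 0x%x 0x%x\n"
    (field w ~byte_off:0 ~bytes:2)
    (field w ~byte_off:2 ~bytes:2)
    (field w ~byte_off:4 ~bytes:4)
```

The int64 itself is still boxed when read from the Bigarray, so this saves on the number of accesses rather than on allocation.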
As for libaio it should be easy to make it create and use any Bigarray
type.
MfG
Goswin
* Re: [Caml-list] Re: How to read different ints from a Bigarray?
2009-10-29 20:40 ` Florian Weimer
@ 2009-10-29 21:04 ` Gerd Stolpmann
2009-10-29 23:43 ` Goswin von Brederlow
2009-10-29 23:38 ` Goswin von Brederlow
1 sibling, 1 reply; 23+ messages in thread
From: Gerd Stolpmann @ 2009-10-29 21:04 UTC (permalink / raw)
To: Florian Weimer; +Cc: Goswin von Brederlow, caml-list
Am Donnerstag, den 29.10.2009, 21:40 +0100 schrieb Florian Weimer:
> * Goswin von Brederlow:
>
> > - The data is passed to libaio and needs to be kept alive and unmoved
> > as long as libaio knows it.
>
> It also has to be aligned to a 512-byte boundary, so you can use
> O_DIRECT. Linux does not support truly asynchronous I/O without
> O_DIRECT AFAIK, which rarely makes it worth the trouble.
Right. There is also the question whether aio for regular files (i.e.
files backed by page cache) will continue to be supported at all - it is
well known that Linus Torvalds doesn't like it. It may happen that
some day aio will be restricted to block devices only.
So I wouldn't use it for production code, but it is of course still an
interesting interface.
Gerd
--
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany
gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------
* Re: [Caml-list] Re: How to read different ints from a Bigarray?
2009-10-28 15:00 ` [Caml-list] " Goswin von Brederlow
2009-10-28 15:17 ` Sylvain Le Gall
@ 2009-10-29 20:40 ` Florian Weimer
2009-10-29 21:04 ` Gerd Stolpmann
2009-10-29 23:38 ` Goswin von Brederlow
1 sibling, 2 replies; 23+ messages in thread
From: Florian Weimer @ 2009-10-29 20:40 UTC (permalink / raw)
To: Goswin von Brederlow; +Cc: caml-list
* Goswin von Brederlow:
> - The data is passed to libaio and needs to be kept alive and unmoved
> as long as libaio knows it.
It also has to be aligned to a 512-byte boundary, so you can use
O_DIRECT. Linux does not support truly asynchronous I/O without
O_DIRECT AFAIK, which rarely makes it worth the trouble.
* Re: [Caml-list] Re: How to read different ints from a Bigarray?
2009-10-29 12:20 ` Richard Jones
@ 2009-10-29 17:07 ` Goswin von Brederlow
2009-10-30 20:30 ` Richard Jones
0 siblings, 1 reply; 23+ messages in thread
From: Goswin von Brederlow @ 2009-10-29 17:07 UTC (permalink / raw)
To: caml-list
Richard Jones <rich@annexia.org> writes:
> On Thu, Oct 29, 2009 at 10:50:31AM +0100, Goswin von Brederlow wrote:
>> but no
>>
>> let unparse_foo (x, y) =
>> bitmake { x : 16 : littleendian; y : 16 : littleendian } x y
>
> See:
>
> http://et.redhat.com/~rjones/bitstring/html/Bitstring.html#2_Constructingbitstrings
>
> I don't necessarily think bitstring is suitable here though because
> you still need to read your data into a string (or fake a string on
> the C heap as Olivier Andrieu mentioned). I think in this case you'd
> be better off just writing this part of the code in C.
>
> Rich.
I still can reuse a lot of this. Especially the syntax extension
seems like a good idea. Maybe reduced to bytes instead of bits
though. I don't intend to use such fine grained structures to need bit
access.
MfG
Goswin
* Re: [Caml-list] Re: How to read different ints from a Bigarray?
2009-10-29 9:50 ` Goswin von Brederlow
2009-10-29 10:34 ` Goswin von Brederlow
@ 2009-10-29 12:20 ` Richard Jones
2009-10-29 17:07 ` Goswin von Brederlow
1 sibling, 1 reply; 23+ messages in thread
From: Richard Jones @ 2009-10-29 12:20 UTC (permalink / raw)
To: Goswin von Brederlow; +Cc: blue storm, Sylvain Le Gall, caml-list
On Thu, Oct 29, 2009 at 10:50:31AM +0100, Goswin von Brederlow wrote:
> but no
>
> let unparse_foo (x, y) =
> bitmake { x : 16 : littleendian; y : 16 : littleendian } x y
See:
http://et.redhat.com/~rjones/bitstring/html/Bitstring.html#2_Constructingbitstrings
I don't necessarily think bitstring is suitable here though because
you still need to read your data into a string (or fake a string on
the C heap as Olivier Andrieu mentioned). I think in this case you'd
be better off just writing this part of the code in C.
Rich.
--
Richard Jones
Red Hat
* Re: [Caml-list] Re: How to read different ints from a Bigarray?
2009-10-29 9:50 ` Goswin von Brederlow
@ 2009-10-29 10:34 ` Goswin von Brederlow
2009-10-29 12:20 ` Richard Jones
1 sibling, 0 replies; 23+ messages in thread
From: Goswin von Brederlow @ 2009-10-29 10:34 UTC (permalink / raw)
To: Goswin von Brederlow; +Cc: blue storm, Sylvain Le Gall, caml-list
Goswin von Brederlow <goswin-v-b@web.de> writes:
> blue storm <bluestorm.dylc@gmail.com> writes:
>
>> On Wed, Oct 28, 2009 at 6:57 PM, Goswin von Brederlow <goswin-v-b@web.de> wrote:
>>> Maybe ideal would be a format string based interface that calls C with
>>> a format string and a record of values. Because what I really need is
>>> to read/write records in an architecture independent way. Something
>>> like
>>>
>>> type t = { x:int; y:char; z:int64 }
>>> let t_format = "%2u%c%8d"
>>>
>>> put_formated buf t_format t
>>>
>>> But how to get that type safe? Maybe a camlp4 module that generates
>>> the format string and type from a single declaration so they always
>>> match.
>>
>> It's possibly off-topic, but you might be interested in Richard
>> Jones's Bitstring project [1] which deals with similar issues quite
>> nicely in my opinion.
>>
>> [1] http://code.google.com/p/bitstring/
>
> No, quite on-topic.
>
> I glanced at the examples and code and it looks to me though as if
> this can only parse bitstrings but not create them from a pattern.
> You have
>
> let parse_foo bits =
> bitmatch bits with
> | { x : 16 : littleendian; y : 16 : littleendian } -> fun x y -> (x, y)
>
> but no
>
> let unparse_foo (x, y) =
> bitmake { x : 16 : littleendian; y : 16 : littleendian } x y
>
>
> Ideally there would be something along the lines of
>
> let pattern = make_pattern { x : 16 : littleendian; y : 16 : littleendian }
> let parse_foo bits = parse pattern (fun x y -> (x, y))
> let unparse_foo (x, y) = unparse pattern x y
>
> But I know how to do that with CPS already. I just need the primitives
> to get/set the basic types.
>
> MfG
> Goswin
And I was wrong. There is
http://code.google.com/p/bitstring/source/browse/trunk/examples/make_ipv4_header.ml
as an example. Not ideal, since parsing and unparsing will duplicate
the pattern definition, but that will be local to each type.
MfG
Goswin
* Re: [Caml-list] Re: How to read different ints from a Bigarray?
2009-10-28 22:48 ` blue storm
@ 2009-10-29 9:50 ` Goswin von Brederlow
2009-10-29 10:34 ` Goswin von Brederlow
2009-10-29 12:20 ` Richard Jones
0 siblings, 2 replies; 23+ messages in thread
From: Goswin von Brederlow @ 2009-10-29 9:50 UTC (permalink / raw)
To: blue storm; +Cc: Goswin von Brederlow, Sylvain Le Gall, caml-list
blue storm <bluestorm.dylc@gmail.com> writes:
> On Wed, Oct 28, 2009 at 6:57 PM, Goswin von Brederlow <goswin-v-b@web.de> wrote:
>> Maybe ideal would be a format string based interface that calls C with
>> a format string and a record of values. Because what I really need is
>> to read/write records in an architecture independent way. Something
>> like
>>
>> type t = { x:int; y:char; z:int64 }
>> let t_format = "%2u%c%8d"
>>
>> put_formated buf t_format t
>>
>> But how to get that type safe? Maybe a camlp4 module that generates
>> the format string and type from a single declaration so they always
>> match.
>
> It's possibly off-topic, but you might be interested in Richard
> Jones's Bitstring project [1] which deals with similar issues quite
> nicely in my opinion.
>
> [1] http://code.google.com/p/bitstring/
No, quite on-topic.
I glanced at the examples and code and it looks to me though as if
this can only parse bitstrings but not create them from a pattern.
You have
let parse_foo bits =
  bitmatch bits with
  | { x : 16 : littleendian; y : 16 : littleendian } -> fun x y -> (x, y)
but no
let unparse_foo (x, y) =
  bitmake { x : 16 : littleendian; y : 16 : littleendian } x y
Ideally there would be something along the lines of
let pattern = make_pattern { x : 16 : littleendian; y : 16 : littleendian }
let parse_foo bits = parse pattern (fun x y -> (x, y))
let unparse_foo (x, y) = unparse pattern x y
But I know how to do that with CPS already. I just need the primitives
to get/set the basic types.
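One way to keep a single pattern definition for both directions is to pair each reader with its writer and compose the pairs. The sketch below uses plain records and a pair combinator instead of CPS or a syntax extension, and works on Bytes rather than on a bitstring; codec, uint16_le and pair are invented for illustration and are not part of the bitstring library.

```ocaml
(* A codec bundles the size of a field with its reader and writer. *)
type 'a codec = {
  size : int;
  get : Bytes.t -> int -> 'a;
  set : Bytes.t -> int -> 'a -> unit;
}

(* Unsigned 16-bit little-endian field. *)
let uint16_le = {
  size = 2;
  get = (fun b off ->
    Char.code (Bytes.get b off)
    lor (Char.code (Bytes.get b (off + 1)) lsl 8));
  set = (fun b off v ->
    Bytes.set b off (Char.chr (v land 0xFF));
    Bytes.set b (off + 1) (Char.chr ((v lsr 8) land 0xFF)));
}

(* Two codecs laid out back to back form a codec for a pair. *)
let pair a b = {
  size = a.size + b.size;
  get = (fun buf off -> (a.get buf off, b.get buf (off + a.size)));
  set = (fun buf off (x, y) ->
    a.set buf off x;
    b.set buf (off + a.size) y);
}

(* The layout is written once and used for both directions. *)
let pattern = pair uint16_le uint16_le

let parse_foo bits = pattern.get bits 0

let unparse_foo (x, y) =
  let b = Bytes.create pattern.size in
  pattern.set b 0 (x, y);
  b
```

Nested pairs get unwieldy for records with many fields, which is where a CPS encoding or a syntax extension would earn its keep.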
MfG
Goswin
* Re: [Caml-list] Re: How to read different ints from a Bigarray?
2009-10-28 17:57 ` [Caml-list] " Goswin von Brederlow
2009-10-28 18:19 ` Sylvain Le Gall
@ 2009-10-28 22:48 ` blue storm
2009-10-29 9:50 ` Goswin von Brederlow
1 sibling, 1 reply; 23+ messages in thread
From: blue storm @ 2009-10-28 22:48 UTC (permalink / raw)
To: Goswin von Brederlow; +Cc: Sylvain Le Gall, caml-list
On Wed, Oct 28, 2009 at 6:57 PM, Goswin von Brederlow <goswin-v-b@web.de> wrote:
> Maybe ideal would be a format string based interface that calls C with
> a format string and a record of values. Because what I really need is
> to read/write records in an architecture independent way. Something
> like
>
> type t = { x:int; y:char; z:int64 }
> let t_format = "%2u%c%8d"
>
> put_formated buf t_format t
>
> But how to get that type safe? Maybe a camlp4 module that generates
> the format string and type from a single declaration so they always
> match.
It's possibly off-topic, but you might be interested in Richard
Jones's Bitstring project [1] which deals with similar issues quite
nicely in my opinion.
[1] http://code.google.com/p/bitstring/
* Re: [Caml-list] Re: How to read different ints from a Bigarray?
2009-10-28 18:19 ` Sylvain Le Gall
@ 2009-10-28 21:05 ` Goswin von Brederlow
0 siblings, 0 replies; 23+ messages in thread
From: Goswin von Brederlow @ 2009-10-28 21:05 UTC (permalink / raw)
To: Sylvain Le Gall; +Cc: caml-list
Sylvain Le Gall <sylvain@le-gall.net> writes:
> On 28-10-2009, Goswin von Brederlow <goswin-v-b@web.de> wrote:
>> Sylvain Le Gall <sylvain@le-gall.net> writes:
>>> On 28-10-2009, Goswin von Brederlow <goswin-v-b@web.de> wrote:
>>>> Sylvain Le Gall <sylvain@le-gall.net> writes:
>>>>> On 28-10-2009, Goswin von Brederlow <goswin-v-b@web.de> wrote:
>>
>>>> PS: Is a.{i} <- x a C call?
>>>
>>> Yes.
>>
>> That obviously sucks. I was hoping since the compiler has a special
>> syntax for it it would be built-in. Bigarray being a separate module
>> should have clued me in.
>>
>> That obviously speaks against splitting int64 into 8 bytes and calling
>> a.{i} <- x for each.
>>
>> I think I will implement your method and C stubs for every set/get and
>> compare.
>
> This is only the case with int64 arrays, in fact (I really have tested this,
> and you don't need a C call in most cases).
Can I assume you tested on a 32bit cpu?
MfG
Goswin
* Re: [Caml-list] Re: How to read different ints from a Bigarray?
2009-10-28 15:17 ` Sylvain Le Gall
@ 2009-10-28 17:57 ` Goswin von Brederlow
2009-10-28 18:19 ` Sylvain Le Gall
2009-10-28 22:48 ` blue storm
0 siblings, 2 replies; 23+ messages in thread
From: Goswin von Brederlow @ 2009-10-28 17:57 UTC (permalink / raw)
To: Sylvain Le Gall; +Cc: caml-list
Sylvain Le Gall <sylvain@le-gall.net> writes:
> On 28-10-2009, Goswin von Brederlow <goswin-v-b@web.de> wrote:
>> Sylvain Le Gall <sylvain@le-gall.net> writes:
>>
>>> Hello,
>>>
>>> On 28-10-2009, Goswin von Brederlow <goswin-v-b@web.de> wrote:
>>>> Hi,
>>>>
>>>
>>> Well, we talked about this a little bit, but here is my opinion:
>>> - calling a C function to add a single int will generate a big overhead
>>> - modifying values in an OCaml string is quite fast
>>>
>>> So to my mind the best option is to have a buffer string (say 16/32
>>> chars) that you fill with data and then flush to the Bigarray in a
>>> single C call.
>>>
>>> E.g.:
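>>> let append_char t c =
>>>   (* Flush once the 64-byte buffer string is full. *)
>>>   if t.idx >= 64 then
>>>   (
>>>     flush t.bigarray t.buffer;
>>>     t.idx <- 0
>>>   );
>>>   (* c is an int in 0..255; strings are indexed with .[ ] *)
>>>   t.buffer.[t.idx] <- Char.chr c;
>>>   t.idx <- t.idx + 1
>>>
>>> let append_little_uint16 t i =
>>>   (* Little-endian: low byte first. *)
>>>   append_char t ((i lsr 0) land 0xFF);
>>>   append_char t ((i lsr 8) land 0xFF)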
>>>
>>>
>>> I have used this kind of technique and it seems as fast as C, with a lot
>>> less C coding.
>>>
>>> Regards,
>>> Sylvain Le Gall
>>
>> This won't work so nicely:
>>
>> - Writes are not always in sequence. I want to do stream access
>> too, where this could be very effective. But the plain buffer is
>> more for random / known-offset access. At a minimum you would have
>> holes for alignment.
>>
>> - It makes read/write buffers complicated, as you need to flush or peek
>> the string in case of uncommitted changes. I can't do write-only
>> buffers, as I want to be able to write a buffer and then add a
>> checksum to it in my application. The lib should not block that.
>>
>
> I was thinking of a pure stream. It still works with random access, but
> you don't save many C function calls. You just have to write less C
> code.
set_uint8 buf 5 1 -> read in 64 bytes from stream, skip to 5, set byte
set_uint8 buf 100 1 -> write 64 bytes, read other 64 bytes, set byte
That can become really expensive.
>> I also still wonder how bad a C function call really is. Consider the
>> case of writing an int64.
>>
>> Directly: You get one C call that does range check, endian convert and
>> write in one go.
>>
>> Buffered: With your code you have 7 Int64 shifts, 8 Int64 lands, 8
>> conversions to int, at least one index check (more likely 8 to avoid
>> handling unaligned access) and 1/8 C call to blit the 64 byte buffer
>> string into the Bigarray.
>
> Not at all: you begin by breaking your int64 into 3 ints (24 bits * 2 +
> 16 bits) and then do 7 int shifts and 8 int lands.
>
> You can even manage to break it into only 1 or 2 ints.
>
> And of course, you bypass the index check.
fun with unaligned writes.
>> PS: Is a.{i} <- x a C call?
>
> Yes.
That obviously sucks. I was hoping that, since the compiler has a
special syntax for it, it would be built in. Bigarray being a separate
module should have clued me in.
That obviously speaks against splitting int64 into 8 bytes and calling
a.{i} <- x for each.
I think I will implement your method and C stubs for every set/get and
compare.
Maybe the ideal would be a format-string-based interface that calls C
with a format string and a record of values. Because what I really need
is to read/write records in an architecture-independent way. Something
like
type t = { x:int; y:char; z:int64 }
let t_format = "%2u%c%8d"
put_formated buf t_format t
But how to get that type safe? Maybe a camlp4 module that generates
the format string and type from a single declaration so they always
match.
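One way to sidestep the format string entirely, sketched here under stated assumptions: keep a hand-written (or generated) put/get pair next to the type, so that the layout and the record are checked by the compiler and by a round trip. The layout below (big-endian, 2 bytes for x, 1 for y, 8 for z) is only one possible reading of "%2u%c%8d"; put_t, get_t and t_size are hypothetical names; and for brevity the sketch writes into Bytes with the Bytes.set_uint16_be family (OCaml 4.08 or later) rather than into a Bigarray.

```ocaml
type t = { x : int; y : char; z : int64 }

(* Assumed layout, big-endian: 2 bytes for x, 1 byte for y, 8 bytes for z. *)
let t_size = 2 + 1 + 8

(* Write [v] at offset [off]; the Bytes setters raise Invalid_argument
   if the access would fall outside the buffer. *)
let put_t (b : Bytes.t) off (v : t) =
  if v.x < 0 || v.x > 0xFFFF then invalid_arg "put_t: x out of range";
  Bytes.set_uint16_be b off v.x;
  Bytes.set b (off + 2) v.y;
  Bytes.set_int64_be b (off + 3) v.z

(* Read a record back using the same layout. *)
let get_t (b : Bytes.t) off : t =
  { x = Bytes.get_uint16_be b off;
    y = Bytes.get b (off + 2);
    z = Bytes.get_int64_be b (off + 3) }
```

A camlp4 (or other) generator would then only have to emit this pair from the type declaration; the byte order is fixed by the accessor names, so the result does not depend on the host architecture.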
> Regards,
> Sylvain Le Gall
MfG
Goswin
* Re: [Caml-list] Re: How to read different ints from a Bigarray?
2009-10-28 14:16 ` Sylvain Le Gall
@ 2009-10-28 15:00 ` Goswin von Brederlow
2009-10-28 15:17 ` Sylvain Le Gall
2009-10-29 20:40 ` Florian Weimer
0 siblings, 2 replies; 23+ messages in thread
From: Goswin von Brederlow @ 2009-10-28 15:00 UTC (permalink / raw)
To: caml-list
Sylvain Le Gall <sylvain@le-gall.net> writes:
> Hello,
>
> On 28-10-2009, Goswin von Brederlow <goswin-v-b@web.de> wrote:
>> Hi,
>>
>> I'm working on bindings for the Linux libaio library (asynchronous I/O)
>> with a sharp eye on efficiency. That means no copying must be done on the
>> data, which in turn means I cannot use string as the buffer type.
>>
>> The best type for this seems to be a (int, int8_unsigned_elt,
>> c_layout) Bigarray.Array1.t. So far so good.
>>
>> Now I define helper functions:
>>
>> let get_uint8 buf off = buf.{off}
>> let set_uint8 buf off x = buf.{off} <- x
>>
>> But I want more:
>>
>> get/set_int8 - do I use Obj.magic to "convert" to int8_signed_elt?
>>
>> And endian correcting access for larger ints:
>>
>> get/set_big_uint16
>> get/set_big_int16
>> get/set_little_uint16
>> get/set_little_int16
>> get/set_big_uint24
>> ...
>> get/set_little_int56
>> get/set_big_int64
>> get/set_little_int64
>>
>> What is the best way there? For uintXX I can get_uint8 each byte and
>> shift and add them together. But that feels inefficient as each access
>> will range check and the shifting generates a lot of code while CPUs
>> can usually endian correct an int more elegantly.
>>
>> Is it worth the overhead of calling a C function to write optimized
>> stubs for this?
>>
>> And last:
>>
>> get/set_string, blit_from/to_string
>>
>> Do I create a string where needed and then loop over every char
>> calling s.[i] <- char_of_int buf.{off+i}? Or better a C function using
>> memcpy?
>>
>> What do you think?
>>
>
> Well, we talked about this a little bit, but here is my opinion:
> - calling a C function to add a single int will generate a big overhead
> - modifying values in an OCaml string is quite fast
>
> So to my mind the best option is to have a buffer string (say 16/32
> chars) that you fill with data and then flush to the Bigarray in a
> single C call.
>
> E.g.:
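> let append_char t c =
>   (* Flush once the 64-byte buffer string is full. *)
>   if t.idx >= 64 then
>   (
>     flush t.bigarray t.buffer;
>     t.idx <- 0
>   );
>   (* c is an int in 0..255; strings are indexed with .[ ] *)
>   t.buffer.[t.idx] <- Char.chr c;
>   t.idx <- t.idx + 1
>
> let append_little_uint16 t i =
>   (* Little-endian: low byte first. *)
>   append_char t ((i lsr 0) land 0xFF);
>   append_char t ((i lsr 8) land 0xFF)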
>
>
> I have used this kind of technique and it seems as fast as C, with a lot
> less C coding.
>
> Regards,
> Sylvain Le Gall
This won't work so nicely:
- Writes are not always in sequence. I want to do stream access
too, where this could be very effective. But the plain buffer is
more for random / known-offset access. At a minimum you would have
holes for alignment.
- It makes read/write buffers complicated, as you need to flush or peek
the string in case of uncommitted changes. I can't do write-only
buffers, as I want to be able to write a buffer and then add a
checksum to it in my application. The lib should not block that.
- The data is passed to libaio and needs to be kept alive and unmoved
as long as libaio knows it. I was hoping I could use the pointer to
the data to register/unregister GC roots without having to add
another custom header and indirections.
I also still wonder how bad a C function call really is. Consider the
case of writing an int64.
Directly: You get one C call that does range check, endian convert and
write in one go.
Buffered: With your code you have 7 Int64 shifts, 8 Int64 lands, 8
conversions to int, at least one index check (more likely 8 to avoid
handling unaligned access) and 1/8 C call to blit the 64 byte buffer
string into the Bigarray.
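To make the byte-by-byte variant concrete, here is a minimal sketch of it in pure OCaml, without the intermediate string buffer. The names byte_buf, set_little_int64 and get_little_int64 are only illustrative, and the sketch assumes a Bigarray library that provides Array1.unsafe_get and Array1.unsafe_set. It does one explicit range check followed by 8 unchecked byte accesses; it is not a benchmark and makes no claim about which variant is faster.

```ocaml
open Bigarray

type byte_buf = (int, int8_unsigned_elt, c_layout) Array1.t

(* Store [x] as 8 little-endian bytes starting at [off].
   One range check up front, then unchecked byte writes. *)
let set_little_int64 (buf : byte_buf) off (x : int64) =
  if off < 0 || off > Array1.dim buf - 8 then
    invalid_arg "set_little_int64";
  for i = 0 to 7 do
    let shifted = Int64.shift_right_logical x (8 * i) in
    let byte = Int64.to_int (Int64.logand shifted 0xFFL) in
    Array1.unsafe_set buf (off + i) byte
  done

(* Read 8 little-endian bytes starting at [off] back into an int64,
   most significant byte first so each step shifts the result left. *)
let get_little_int64 (buf : byte_buf) off =
  if off < 0 || off > Array1.dim buf - 8 then
    invalid_arg "get_little_int64";
  let r = ref 0L in
  for i = 7 downto 0 do
    let byte = Int64.of_int (Array1.unsafe_get buf (off + i)) in
    r := Int64.logor (Int64.shift_left !r 8) byte
  done;
  !r
```

A C stub would replace the loop body with a single store plus an optional byte swap, which is the comparison being asked about here.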
MfG
Goswin
PS: Is a.{i} <- x a C call?
Thread overview: 23+ messages
2009-11-03 17:16 [Caml-list] Re: How to read different ints from a Bigarray? Charles Forsyth
-- strict thread matches above, loose matches on Subject: below --
2009-10-28 13:54 Goswin von Brederlow
2009-10-28 14:16 ` Sylvain Le Gall
2009-10-28 15:00 ` [Caml-list] " Goswin von Brederlow
2009-10-28 15:17 ` Sylvain Le Gall
2009-10-28 17:57 ` [Caml-list] " Goswin von Brederlow
2009-10-28 18:19 ` Sylvain Le Gall
2009-10-28 21:05 ` [Caml-list] " Goswin von Brederlow
2009-10-28 22:48 ` blue storm
2009-10-29 9:50 ` Goswin von Brederlow
2009-10-29 10:34 ` Goswin von Brederlow
2009-10-29 12:20 ` Richard Jones
2009-10-29 17:07 ` Goswin von Brederlow
2009-10-30 20:30 ` Richard Jones
2009-11-01 15:11 ` Goswin von Brederlow
2009-11-01 19:57 ` Richard Jones
2009-11-02 16:11 ` Goswin von Brederlow
2009-11-02 16:33 ` Mauricio Fernandez
2009-11-02 20:27 ` Richard Jones
2009-11-03 13:18 ` Goswin von Brederlow
2009-11-02 20:48 ` Goswin von Brederlow
2009-10-29 20:40 ` Florian Weimer
2009-10-29 21:04 ` Gerd Stolpmann
2009-10-29 23:43 ` Goswin von Brederlow
2009-10-30 0:48 ` Gerd Stolpmann
2009-10-29 23:38 ` Goswin von Brederlow
2009-10-28 17:09 ` [Caml-list] " Xavier Leroy
2009-10-29 17:05 ` Goswin von Brederlow
2009-10-29 18:48 ` Sylvain Le Gall
2009-10-29 23:25 ` [Caml-list] " Goswin von Brederlow