* [Caml-list] Primitive sizes
From: Jonathan Bryant @ 2004-09-29 6:43 UTC (permalink / raw)
To: caml-list
I would like to know the sizes of the "primitive" types in OCaml (I
assume that they vary per platform, but one can hope that they are
standard...). If they do vary, is there any way to define new types
(similar to C's typedef)? I would like to create 8-, 16-, 32-, and
64-bit integers, 32- and 64-bit floats, and 16-bit characters. I know I
could just create Int32s and Int64s and manipulate the bits, ignoring the
ones I don't need, but is there a way to allocate just the necessary
memory without interfacing to C? If not, can anyone point me in a good
direction to learn how to interface with C (by "good" I mean that a
tutorial is preferable to a language specification...)?
--Jonathan Bryant
AIM: JonBoy3182
"The three principal virtues of a programmer are Laziness,
Impatience, and Hubris."
-- Perl man page
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
* Re: [Caml-list] Primitive sizes
From: Brian Hurt @ 2004-09-29 15:02 UTC (permalink / raw)
To: Jonathan Bryant; +Cc: caml-list
On Wed, 29 Sep 2004, Jonathan Bryant wrote:
> I would like to know the sizes of the "primitive" types in OCaml (I
> assume that they vary per platform, but one can hope that they are
> standard...)
ints are either 31 or 63 bits, depending on whether you're on a 32- or
64-bit machine (one bit is stolen for the tag bit). Int32 and Int64 have
the obvious bit sizes, but they are boxed integers (as opposed to ints,
which are unboxed). Chars are 8 bits and unboxed- but can't be used as
short integers directly; you have to convert with Char.code and Char.chr.
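To make the stolen bit concrete, here is a minimal sketch in C of how a
tagged integer can be packed into a machine word: the value is shifted left
by one and the lowest bit is set to 1, which is how immediate integers are
told apart from pointers. The names word_t, tag_int, untag_int and is_int
are made up for this illustration; the real runtime has its own macros in
its C headers.

```c
#include <assert.h>
#include <stdint.h>

/* A machine word, the size of a pointer (hypothetical name). */
typedef intptr_t word_t;

static_assert(sizeof(word_t) == sizeof(void *),
              "word_t should be pointer-sized");

/* Tag an integer: shift left one bit and set the low bit to 1.
   The shift is done on the unsigned type so that it cannot overflow. */
static word_t tag_int(word_t n) {
    return (word_t)(((uintptr_t)n << 1) | (uintptr_t)1);
}

/* Recover the integer. This relies on >> being an arithmetic shift for
   negative values, which holds on mainstream compilers. */
static word_t untag_int(word_t v) {
    return v >> 1;
}

/* A word holds an immediate integer exactly when its low bit is set. */
static int is_int(word_t v) {
    return (int)(v & 1);
}
```

With a 32-bit word_t the shift leaves 31 bits for the value, and with a
64-bit word_t it leaves 63, which is where the two sizes above come from.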
This should be a FAQ, if it isn't already. We just recently had a
discussion on this very mailing list on why ints are one bit short- I'd
search the archives and read the discussion before bringing that
discussion up again.
> If they do vary, is there any way to define new types
> (similar to C's typedef)? I would like to create 8-, 16-, 32-, and
> 64-bit integers, 32- and 64-bit floats, and 16-bit characters. I know I
> could just create Int32s and Int64s and manipulate the bits, ignoring the
> ones I don't need, but is there a way to allocate just the necessary
> memory without interfacing to C? If not, can anyone point me in a good
> direction to learn how to interface with C (by "good" I mean that a
> tutorial is preferable to a language specification...)?
The OCaml manual has a good section on interfacing to C. But I have to
ask the question: why bother? Especially with the integers?
First off, OCaml holds all variables in single words- which are defined as
the size of a pointer on the current machine. If you have a char list,
every single char in that list takes up three words- one word for the
list cell's header (which holds the tag), one word for the next pointer,
and one word for the char itself. Likewise, if you have a char array,
every element in the array takes up one whole word (this is why strings
are not char arrays). This allows OCaml to share code- a function that
handles an 'a array can handle an array of chars, ints, floats, bools, or
foos. If the type is boxed (that is, anything other than int, char, bool
and the like), the array or list holds a reference to the value- which is
still just a word.
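The three-words-per-char figure can be modelled in C. This is a sketch of
the layout described above, not the runtime's actual definitions: the names
word_t, list_cell, char_list_bytes and char_array_bytes are made up, and the
single header word counted for the array is my addition (the text above only
counts the elements).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A machine word, the size of a pointer (hypothetical name). */
typedef uintptr_t word_t;

/* One list cell as described above. */
struct list_cell {
    word_t header;  /* block header, holds the tag */
    word_t head;    /* the char, stored in a full word */
    word_t tail;    /* the next pointer */
};

static_assert(sizeof(struct list_cell) == 3 * sizeof(word_t),
              "a list cell is exactly three words");

/* Bytes used by a char list of n elements. */
static size_t char_list_bytes(size_t n) {
    return n * sizeof(struct list_cell);
}

/* Bytes used by a char array of n elements: one word per element,
   plus one header word for the array block (an assumption of this sketch). */
static size_t char_array_bytes(size_t n) {
    return (n + 1) * sizeof(word_t);
}
```

On a 32-bit machine that comes to 12000 bytes for a 1000-element char list
and 4004 bytes for a 1000-element char array, against a little over 1000
bytes for a string of the same length.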
The humorous thing is that C doesn't save as much as most people think it
does in using smaller types- this is because pretty much all C compilers
these days pad the data. Accessing data that is aligned is significantly
faster than accessing data that isn't aligned (and on many CPUs, you can't
access misaligned data), so the C compiler inserts padding- unused bytes-
to keep the data aligned. For example, how large is the following
structure on a 32-bit platform (ints are 4 bytes)?
    struct foo {
        char c;
        int i;
    };
You might say five bytes- four for the int and one for the char. You'd be
wrong- the compiler will almost certainly add three bytes of padding
between c and i to keep i aligned- meaning the size of the structure is
actually 8 bytes. The char takes up a full four bytes all by its
lonesome.
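This is easy to check with sizeof and offsetof. A minimal sketch, assuming
a platform where int is 4 bytes with 4-byte alignment; the helper names
foo_size and foo_gap_after_c are made up here.

```c
#include <assert.h>
#include <stddef.h>

struct foo {
    char c;
    int i;
};

/* The first member always starts at offset 0. */
static_assert(offsetof(struct foo, c) == 0, "c is the first member");

/* Total size of the structure, padding included. */
static size_t foo_size(void) {
    return sizeof(struct foo);
}

/* Bytes of padding between the end of c and the start of i. */
static size_t foo_gap_after_c(void) {
    return offsetof(struct foo, i) - (offsetof(struct foo, c) + sizeof(char));
}
```

Under the stated assumption foo_size() returns 8 and foo_gap_after_c()
returns 3.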
Changing the order doesn't help. Consider the following structure:
    struct foo2 {
        int i;
        char c;
    };
Now, the int doesn't follow the char. The char can't be misaligned, so
you don't need padding, do you? Well, yes, you still do need padding. The
C standard requires every element of an array to be correctly aligned, so
the compiler pads the end of the structure- effectively, given a pointer p
to an array of these structures, the access:
    ((struct foo2 *) p)[1].i
to i in the second element must still be aligned. So again, the size of
the structure is still 8 bytes, and the char is still taking up a full
four bytes.
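The trailing padding and its effect on arrays can be observed directly. A
sketch under the same assumption as before (4-byte int, 4-byte aligned);
foo2_tail_padding and foo2_elem_i_aligned are names invented for this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct foo2 {
    int i;
    char c;
};

/* Bytes of padding after the last member, c. */
static size_t foo2_tail_padding(void) {
    return sizeof(struct foo2) - (offsetof(struct foo2, c) + sizeof(char));
}

/* Returns 1 if member i of element k (k < 4) of an array of struct foo2
   sits at an address that is a multiple of int's alignment. */
static int foo2_elem_i_aligned(size_t k) {
    static struct foo2 arr[4];
    assert(k < 4);
    return ((uintptr_t)&arr[k].i % _Alignof(int)) == 0;
}
```

Without the three bytes of tail padding, element 1 would start at offset 5
and its i would be misaligned; with them, every element starts on a 4-byte
boundary.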
Padding also shows up on local variables and function arguments in C.
Consider the function:
    void bar (char c) {
        char t;
        ...
How much memory do the argument c and the local variable t take up?
Again- the compiler needs to keep the stack aligned, so variables and
arguments get padded- when they live on the stack at all, both typically
take up a full word.
If you have multiple variables of the same short type next to each other,
the shorter types do save some memory. For example, this structure also
takes up only two words of memory:
    struct foo3 {
        int i;
        char c;
        char d;
    };
But this requires you to sort your variables, and happens less often than
people think. This is why OCaml isn't the memory hog a naive analysis
might make you think it is.
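The effect of member order can be seen by comparing foo3 with a variant
whose members are not sorted. A sketch, again assuming a 4-byte int with
4-byte alignment; struct foo3_unsorted and the two helper functions are
hypothetical and only serve the comparison.

```c
#include <assert.h>
#include <stddef.h>

/* The structure from the text: the two chars share one padded word. */
struct foo3 {
    int i;
    char c;
    char d;
};

/* Hypothetical variant with the same members in a worse order:
   each char now needs its own three bytes of padding. */
struct foo3_unsorted {
    char c;
    int i;
    char d;
};

static_assert(sizeof(struct foo3) <= sizeof(struct foo3_unsorted),
              "sorting the members should never make the struct larger");

static size_t foo3_size(void) {
    return sizeof(struct foo3);
}

static size_t foo3_unsorted_size(void) {
    return sizeof(struct foo3_unsorted);
}
```

Under that assumption the sorted structure is 8 bytes and the unsorted one
is 12, so the saving from the short types depends entirely on the order of
declaration.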
In nine years of professional C programming and 15 years of hobbyist
programming, I have come to the conclusion that the main use of the
various C int types- which, by the way, include not only char, short,
int, and long in both signed and unsigned varieties, but also size_t,
ssize_t, off_t, ptrdiff_t, pid_t, etc.- is to introduce bugs by allowing
you to pick the wrong int type.
So the question becomes- why do you need the other integer types?
--
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
- Gene Spafford
Brian
* Re: [Caml-list] Primitive sizes
From: Jonathan Bryant @ 2004-09-30 6:16 UTC (permalink / raw)
To: caml-list
Ok. Maybe I need to make clear what I'm doing. I'm trying to write
native Java methods and have them call an OCaml library. The problem is
that I need Java-compatible primitives to pass back from the methods.
In Java (ON ALL PLATFORMS), the primitives have these sizes (in case you
don't know them off the top of your head):
Type Bits
----- ----
byte - 8
short - 16
int - 32
long - 64
float - 32
double - 64
char - 16
boolean - 8
Is there a way to create these, or should I just stick to C? I
understand how to do them in C/C++, but I was hoping it was possible to
do this in OCaml as well.
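For reference, the table above can be written down in C with the
fixed-width types from <stdint.h>. This is only a sketch: the java_* names
and the BITS_OF macro are made up so that it compiles without a JDK (JNI's
jni.h provides its own jbyte, jshort, jint, ... typedefs), and it assumes
that float and double are IEEE 754 single and double precision.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical names mirroring the Java primitive types. */
typedef int8_t   java_byte;     /* signed, 8 bits */
typedef int16_t  java_short;    /* signed, 16 bits */
typedef int32_t  java_int;      /* signed, 32 bits */
typedef int64_t  java_long;     /* signed, 64 bits */
typedef float    java_float;    /* assumed IEEE 754 single precision */
typedef double   java_double;   /* assumed IEEE 754 double precision */
typedef uint16_t java_char;     /* unsigned, 16 bits */
typedef uint8_t  java_boolean;  /* 8 bits, as in the table above */

/* Width of a type in bits. */
#define BITS_OF(T) (sizeof(T) * CHAR_BIT)

/* Compile-time checks against the table. */
static_assert(BITS_OF(java_byte) == 8, "byte is 8 bits");
static_assert(BITS_OF(java_short) == 16, "short is 16 bits");
static_assert(BITS_OF(java_int) == 32, "int is 32 bits");
static_assert(BITS_OF(java_long) == 64, "long is 64 bits");
static_assert(BITS_OF(java_float) == 32, "float is 32 bits");
static_assert(BITS_OF(java_double) == 64, "double is 64 bits");
static_assert(BITS_OF(java_char) == 16, "char is 16 bits");
static_assert(BITS_OF(java_boolean) == 8, "boolean is 8 bits");
```

If any of the assumptions fails on a given platform, the corresponding
static_assert stops the compilation instead of silently producing the wrong
width.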
--Jonathan Bryant
* Re: [Caml-list] Primitive sizes
From: Brian Hurt @ 2004-09-30 20:54 UTC (permalink / raw)
To: Jonathan Bryant; +Cc: caml-list
On Thu, 30 Sep 2004, Jonathan Bryant wrote:
> Ok. Maybe I need to make clear what I'm doing. I'm trying to write
> native Java methods and have them call an OCaml library. The problem is
> that I need Java compatible primitives to pass back from the methods.
> In Java (ON ALL PLATFORMS), the primitives fall this way (in case you
> don't know them off the top of your head):
I don't know what Java's calling convention is. My advice:
> Type Bits
> ----- ----
> byte - 8
> short - 16
> char - 16
> boolean - 8
The above I'd just promote to OCaml ints, and truncate as necessary.
> int - 32
I'd use Int32s.
> long - 64
I'd use Int64s.
> float - 32
> double - 64
I'd be inclined to promote both of these to OCaml floats, which are 64-bit
IEEE 754 floats (the same as Java's double). The extra precision won't
hurt anything except stupidly strict binary compatibility.
I assume you're using one of the OCaml->JVM compilers kicking around.
Tying OCaml native mode (or VM) code to Java would be way too much work.
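"Truncate as necessary" means keeping only the low 8 or 16 bits when a
value goes back to Java. A minimal sketch of that arithmetic, written in C
like the other examples in this thread; the function names are made up, and
in OCaml the same masking would be done with land and lsl on an ordinary
int.

```c
#include <assert.h>
#include <stdint.h>

/* Keep the low `bits` bits of v and read them as a signed
   two's-complement number, the way a Java narrowing cast does. */
static int32_t wrap_signed(int64_t v, unsigned bits) {
    assert(bits >= 1 && bits <= 32);
    uint64_t modulus = UINT64_C(1) << bits;       /* 2^bits */
    uint64_t low = (uint64_t)v & (modulus - 1);   /* low bits, unsigned */
    if (low >= modulus / 2)                       /* sign bit is set */
        return (int32_t)((int64_t)low - (int64_t)modulus);
    return (int32_t)low;
}

/* Keep the low `bits` bits of v as an unsigned number. */
static uint32_t wrap_unsigned(int64_t v, unsigned bits) {
    assert(bits >= 1 && bits <= 32);
    return (uint32_t)((uint64_t)v & ((UINT64_C(1) << bits) - 1));
}

/* Java byte and short are signed; Java char is unsigned. */
static int32_t to_java_byte(int64_t v) {
    return wrap_signed(v, 8);
}

static int32_t to_java_short(int64_t v) {
    return wrap_signed(v, 16);
}

static uint32_t to_java_char(int64_t v) {
    return wrap_unsigned(v, 16);
}
```

For example, to_java_byte(200) gives -56 and to_java_short(40000) gives
-25536, matching what the casts (byte)200 and (short)40000 produce in Java.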
Brian
* Re: [Caml-list] Primitive sizes
From: Richard Jones @ 2004-10-01 9:36 UTC (permalink / raw)
Cc: caml-list
On Thu, Sep 30, 2004 at 03:54:33PM -0500, Brian Hurt wrote:
> I don't know what Java's calling convention is. My advice:
>
> > Type Bits
> > ----- ----
> > byte - 8
> > short - 16
> > char - 16
> > boolean - 8
>
> The above I'd just promote to ints for Ocaml, and truncate as necessary.
We have a similar problem with the ocamldbi Dbi interface. It maps
SQL int4 and serial types (i.e. full 32-bit signed ints) to OCaml int.
The reason is convenience. It's MUCH more convenient to deal with
OCaml int than boxed Int32.
However, if anyone has a database with more than about 1 billion rows
in it (the largest 31-bit OCaml int is 2^30 - 1 = 1,073,741,823), then
they may find they have a problem ...
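A binding could guard against this with a range check before squeezing an
int4 into an OCaml int on a 32-bit machine. A small sketch in C; the macro
and function names are made up for illustration and are not part of
ocamldbi.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Range of a 31-bit signed integer (hypothetical macro names). */
#define OCAML_INT31_MAX INT32_C(1073741823)          /* 2^30 - 1 */
#define OCAML_INT31_MIN (-INT32_C(1073741823) - 1)   /* -2^30 */

static_assert(OCAML_INT31_MAX == (INT32_C(1) << 30) - 1,
              "the maximum should be 2^30 - 1");

/* True if a full 32-bit value survives the trip into a 31-bit int. */
static bool fits_in_ocaml_int31(int32_t v) {
    return v >= OCAML_INT31_MIN && v <= OCAML_INT31_MAX;
}
```

Row number 1,073,741,824 is the first serial value for which the check
fails; everything up to INT32_MAX beyond it would be mangled by a silent
conversion.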
Rich.
--
Richard Jones. http://www.annexia.org/ http://www.j-london.com/
Merjis Ltd. http://www.merjis.com/ - improving website return on investment
"One serious obstacle to the adoption of good programming languages is
the notion that everything has to be sacrificed for speed. In computer
languages as in life, speed kills." -- Mike Vanier