From: Brian Hurt <bhurt@spnz.org>
To: Jonathan Bryant <jtbryant@valdosta.edu>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Primitive sizes
Date: Wed, 29 Sep 2004 10:02:54 -0500 (CDT)
Message-ID: <Pine.LNX.4.44.0409290938040.5809-100000@localhost.localdomain>
In-Reply-To: <1096440210.6626.9.camel@localhost>
On Wed, 29 Sep 2004, Jonathan Bryant wrote:
> I would like to know the sizes of the "primitive" types in OCaml (I
> assume that they vary per platform, but one can hope that they are
> standard...)
ints are either 31 or 63 bits, depending on whether you're on a 32- or
64-bit machine (one bit is stolen for the tag bit). Int32 and Int64 have
the obvious bit sizes, but they are boxed integers (as opposed to ints,
which are unboxed). Chars are 8 bits and unboxed, but can't be used as
short integers directly (you have to convert with Char.code and Char.chr).
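
To make the tag bit concrete, here is a minimal C sketch of how a one-bit
tag can share a word with an integer. The names (word, tag_int, untag_int,
is_unboxed_int, PAYLOAD_BITS) are made up for illustration and are not the
OCaml runtime's own macros. The sketch also assumes an arithmetic right
shift for negative values, which C leaves implementation-defined.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical names for illustration only. */
typedef intptr_t word;            /* one machine word, pointer sized */

/* Bits left over for the integer once the tag bit is taken. */
enum { PAYLOAD_BITS = sizeof(word) * CHAR_BIT - 1 };

static_assert(PAYLOAD_BITS == 31 || PAYLOAD_BITS == 63,
              "sketch expects a 32- or 64-bit machine");

/* Encode: shift left by one and set the low bit as the tag. */
word tag_int(intptr_t n) {
    return (word)(((uintptr_t)n << 1) | 1u);
}

/* Decode: shifting right drops the tag bit again
   (assumes arithmetic shift for negative values). */
intptr_t untag_int(word w) {
    return w >> 1;
}

/* A tagged integer has its low bit set; an aligned pointer never does. */
int is_unboxed_int(word w) {
    return (int)(w & 1);
}
```

The low bit is what lets a garbage collector tell an integer from a
pointer without any extra type information, and it is why only 31 or 63
bits remain for the integer itself.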
This should be a FAQ, if it isn't already. We just recently had a
discussion on this very mailing list about why ints are one bit short.
I'd search the archives and read that thread before bringing the subject
up again.
> If they do vary , is there any way to define new types
> (similar to C typedef macro)? I would like to create 8-, 16-, 32-, and
> 64-bit integers, 32- and 64-bit floats, and 16-bit characters. I know i
> could just create Int32s and Int64s and manipulate the bits ignoring the
> ones I don't need, but is there a way to allocate just the necessary
> memory without interfacing to C? If not, can anyone point me in a good
> direction to learn how to interface with C (by "good" I mean that a
> tutorial is better/more preferable than a language specification...)?
The OCaml manual has a good section on interfacing to C. But I have to
ask the question: why bother? Especially with the integers?
First off, OCaml holds every value in a single word, where a word is
defined as the size of a pointer on the current machine. If you have a
char list, every single char in that list takes up three words: one word
for the list element's header (the tag), one word for the next pointer,
and one word for the char itself. Likewise, if you have a char array,
every element in the array takes up one whole word (this is why strings
are not char arrays). This allows OCaml to share code: a function that
handles an 'a array can handle an array of chars, ints, floats, booleans,
or foos. If the element type is not one of the unboxed types (int, char,
bool), the array or list holds a reference to the value, which is still
just a word.
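
The three-words-per-char figure can be pictured as a C struct. This is
only a sketch of the layout described above: the struct and function names
are hypothetical, and the word counts leave out the one-off header of the
array or byte buffer itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t word;          /* assumed: one word is pointer sized */

/* Illustrative layout of one cell of a char list. */
struct char_list_cell {
    word header;   /* block header holding the tag and size */
    word head;     /* the char, stored unboxed in a full word */
    word tail;     /* pointer to the next cell, or the empty list */
};

static_assert(sizeof(struct char_list_cell) == 3 * sizeof(word),
              "one list element costs three words");

/* Words used by n chars held in a list: three per element. */
size_t char_list_words(size_t n) { return 3 * n; }

/* Words used by the elements of an n-char array: one per element. */
size_t char_array_words(size_t n) { return n; }

/* Words used when the same n chars are packed one per byte. */
size_t packed_bytes_words(size_t n) {
    return (n + sizeof(word) - 1) / sizeof(word);
}
```

For 1000 chars on a 64-bit machine that comes to 3000 words as a list,
1000 words as an array, and 125 words when packed one char per byte.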
The humorous thing is that C doesn't save as much as most people think it
does by using smaller types, because pretty much all C compilers these
days pad the data. Accessing data that is aligned is significantly faster
than accessing data that isn't aligned (and on many CPUs, you can't
access misaligned data at all), so the C compiler inserts padding (unused
bytes) to keep the data aligned. For example, how large is the following
structure on a 32-bit platform (ints are 4 bytes)?
    struct foo {
        char c;
        int i;
    };
You might say five bytes: four for the int and one for the char. You'd be
wrong. The compiler will almost certainly add three bytes of padding
between c and i to keep i aligned, meaning the size of the structure is
actually 8 bytes. The char takes up a full four bytes all by its
lonesome.
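
You don't have to take my word for it; offsetof and sizeof will tell you
what your compiler actually did. The sketch below is one way to measure
the gap. The helper name foo_padding is mine, and the exact numbers
depend on your platform's int size and alignment.

```c
#include <assert.h>
#include <stddef.h>

struct foo {
    char c;
    int  i;
};

/* The language guarantees that i starts at a multiple of int's alignment. */
static_assert(offsetof(struct foo, i) % _Alignof(int) == 0,
              "i must be aligned");

/* Bytes of padding the compiler inserted between c and i. */
size_t foo_padding(void) {
    return offsetof(struct foo, i)
         - (offsetof(struct foo, c) + sizeof(char));
}
```

With a 4-byte int aligned to 4 bytes, foo_padding() comes out as 3 and
sizeof(struct foo) as 8, matching the arithmetic above.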
Changing the order doesn't help. Consider the following structure:
    struct foo2 {
        int i;
        char c;
    };
Now the int doesn't follow the char, and the char can't be misaligned, so
you don't need padding, do you? Well, yes, you still do. The C standard
says the size of a structure will be padded out at the end so that every
element of an array of the structure is still aligned. Effectively, given
a pointer p to any element of such an array, the access:
    ((struct foo2 *) p)->i
to i is still aligned. So again, the size of the structure is 8 bytes,
and the char is still taking up a full four bytes.
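
The trailing padding can be checked the same way. This is a sketch with
helper names of my own (foo2_tail_padding, foo2_array_is_aligned); it
measures the bytes after c and confirms that i stays aligned in every
element of a small array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct foo2 {
    int  i;
    char c;
};

/* sizeof is always a multiple of the struct's own alignment. */
static_assert(sizeof(struct foo2) % _Alignof(struct foo2) == 0,
              "array elements must stay aligned");

/* Bytes of padding after c, at the end of the structure. */
size_t foo2_tail_padding(void) {
    return sizeof(struct foo2)
         - (offsetof(struct foo2, c) + sizeof(char));
}

/* Returns 1 if the i member of every array element is aligned. */
int foo2_array_is_aligned(void) {
    struct foo2 arr[4];
    for (size_t k = 0; k < 4; k++) {
        if ((uintptr_t)&arr[k].i % _Alignof(int) != 0) {
            return 0;
        }
    }
    return 1;
}
```

With a 4-byte int, foo2_tail_padding() is 3, so moving the char to the
end only moves the padding to the end as well.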
Padding also shows up on local variables and function arguments in C.
Consider the function:
    void bar (char c) {
        char t;
        ...
    }
How much memory do the argument c and the local variable t take up?
Again, the compiler needs to keep the stack aligned, so variables and
arguments get padded, and both typically take up a full word.
If you have multiple variables of the same short type next to each other,
the shorter types do save some memory. For example, this structure also
takes up only two words of memory:
    struct foo3 {
        int i;
        char c;
        char d;
    };
But this requires you to sort your variables, and happens less often than
people think. This is why OCaml isn't the memory hog a naive analysis
might make you think it is.
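
Here is what "sorting your variables" buys you, as a small sketch. The
struct foo3_unsorted and the helper bytes_saved_by_sorting are
hypothetical, added only for comparison; they hold the same three members
as foo3 in a worse order.

```c
#include <assert.h>
#include <stddef.h>

/* The two chars sit together, so they share one padded slot. */
struct foo3 {
    int  i;
    char c;
    char d;
};

/* Hypothetical counter-example: same members, with the int between
   the two chars, so each char drags its own padding along. */
struct foo3_unsorted {
    char c;
    int  i;
    char d;
};

static_assert(sizeof(struct foo3) <= sizeof(struct foo3_unsorted),
              "grouping the chars never costs extra space here");

/* Bytes saved by keeping the short members together. */
size_t bytes_saved_by_sorting(void) {
    return sizeof(struct foo3_unsorted) - sizeof(struct foo3);
}
```

With a 4-byte int, foo3 is 8 bytes and the unsorted version is 12, so the
saving only appears when the short members really are adjacent.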
In nine years of professional C programming and 15 years of hobbyist
programming, I have come to the conclusion that the main use of the
various C int types (which, by the way, include not only char, short,
int, and long in both signed and unsigned varieties, but also size_t,
ssize_t, off_t, ptrdiff_t, pid_t, etc.) is to introduce bugs by allowing
you to pick the wrong int type.
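
One example of the kind of bug I mean, as a sketch: comparing a signed
int against an unsigned int. The function names are made up for
illustration. C converts the signed operand to unsigned before comparing,
so a negative value turns into a very large one.

```c
#include <assert.h>

/* A negative int converted to unsigned becomes a huge value. */
static_assert((unsigned int)-1 > 1u,
              "-1 wraps around when converted to unsigned");

/* Looks like a plain "a < b", but a is converted to unsigned first,
   so less_than_mixed(-1, 1) is false. */
int less_than_mixed(int a, unsigned int b) {
    return a < b;
}

/* Handles the sign explicitly before comparing magnitudes. */
int less_than_fixed(int a, unsigned int b) {
    return a < 0 || (unsigned int)a < b;
}
```

Most compilers will warn about the first version if you ask them to, but
nothing in the language stops you from writing it.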
So the question becomes- why do you need the other integer types?
--
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
- Gene Spafford
Brian