From: Goswin von Brederlow <goswin-v-b@web.de>
To: Yotam Barnoy <yotambarnoy@gmail.com>
Cc: Ocaml Mailing List <caml-list@inria.fr>
Subject: Re: [Caml-list] Proposal: re-design of ocaml headers
Date: Tue, 8 Oct 2013 12:52:46 +0200
Message-ID: <20131008105246.GA15550@frosties>
In-Reply-To: <CAN6ygOnmk_EGViZR_tmHuz+cjmevQiyeS9XeHUpCWDcGhkwFMg@mail.gmail.com>
On Mon, Sep 30, 2013 at 11:31:23AM -0400, Yotam Barnoy wrote:
> On Mon, Sep 30, 2013 at 10:48 AM, Goswin von Brederlow <goswin-v-b@web.de> wrote:
>
> > >
> > > + For 16-bit and 32-bit architectures:
> > >      +---------------+-----+------+-------+-------+----------------+
> > >      |    wosize     | ext | cust | noptr | color |      tag       |
> > >      +---------------+-----+------+-------+-------+----------------+
> > > bits  31           21  20     19     18    17   16 15             0
> > >
> > > - noptr: no pointers present
> > > - ext: uses extension word
> > > - cust(om): uses custom word. Custom word is normally used to indicate
> > > floats and pointers.
> > >
> > > 32 bit extension word (present only if ext is 1)
> > >      +---------------------------------------------+
> > >      |                   wosize                    |
> > >      +---------------------------------------------+
> > > bits  31                                          0
> >
> > Why use a full bit for ext? I would define wosize == 0 to mean that an
> > extension word with the actual size is present. That way sizes
> > <16KB can be encoded without an extension word.
> >
> >
> Great point! Of course, that makes perfect sense. I was feeling like I was
> wasting the wosize bits with the extension word but couldn't quite put
> 2 and 2 together.
> BTW, down the thread is a newer version of the design that reduces the tag
> space to 8000 tags, which I do think is sufficient.
>
>
>
> > > 32 bit custom word (default usage - present only if cust is 1):
> > >      +----+----------------------------------------+
> > >      |nofp|                 pfbits                 |
> > >      +----+----------------------------------------+
> > > bits   31  30                                     0
> > >
> > > - nofp: a structure with no floats. All pfbits are used for pointers,
> > > with a 1 signifying a pointer and a 0 signifying a value.
> > > - pfbits: indicates which double words are floats and pointers. Starting
> > > at the highest bit:
> > > - a 0 indicates neither a pointer nor a float
> > > - a 10 indicates a float (double)
> > > - a 11 indicates a pointer
> > > - If noptr is set, each bit indicates a float. If nofp is set, each
> > > bit indicates a pointer.
> >
> > There are 3 kinds of values:
> >
> > 1) pointers with bit 0 == 0
> > 2) non-pointers with bit 0 == 1
> > 3) floats with all bits used for the type (spanning 2 fields in 32bit)
> >
> > So if pfbits indicates a float then a field (or 2) is a float and all
> > bits are used for the value. Otherwise the bit 0 of the field will
> > tell you whether it is a pointer or not. So why would you want to
> > duplicate that information in the pfbits?
> >
>
> I was thinking of doing it for efficiency. If we're already indicating
> what's what, we might as well represent shortcuts to the pointers, which
> would cut down on the amount of reading, no? In the average case, the GC
> would need to access a lot less memory.
>
>
> > It might be nice to support C values like untagged ints or unaligned
> > pointers. If Custom tag is set then the pfbits become ocaml value
> > bits. The GC will only inspect fields with pfbit set. All other fields
> > are ignored. The custom_operations handle compare, hash, serialize and
> > deserialize so nothing else will access the data.
> >
> > Another thing are int32 and int64. I guess if you want to unbox those
> > then having 2 bits per field in pfbits makes sense again. But then I
> > would allocate them as:
> >
> > - a 00 indicates a tagged value (int or pointer)
> > - a 01 indicates a non-pointer: int, int32, native int, C pointer
> > - a 10 indicates a float (double)
> > - a 11 indicates an int64
> >
> > The higher bit would indicate a 64bit value, meaning spanning 2 fields
> > on 32bit. Note that those 4 values allow mixing ocaml values, C values,
> > int32, int64 and float in a block.
> >
> > I would combine the noptr and nofp bits into a single 2bit field:
> >
> > - a 00 indicates no pointers and no double size, no pfbits
> > - a 01 indicates no double size, pfbits indicate tagged / non-pointer
> > - a 10 indicates no pointers but double size, pfbits indicate size
> > - a 11 indicates both pointers and double size, 2 pfbits per field
> >
> > Note: tagged integers can be stored as 00 or 01. I think this would be
> > required for polymorphic types. An 'a could be int or pointer. In both
> > cases 00 will work.
> >
> >
> I really like this idea -- unboxing more types could be really useful. I'm
> not sure double 'size' would work, however. It should be fine for the
> marshal module, but polymorphic comparison would get messed up because
> floats have to be compared differently. So I think 10 in the bit field
> should indicate no pointers but floats, while 11 could allow both pointers
> and double size, with the 2-bits specifying if it's a float or an int64 (as
> you've outlined). Of course, one cannot have both shortcuts to pointers and
> enhanced unboxing, so let me know what you think about the performance
> increase from shortcutting the tag bit.
>
> Yotam
Let's look at an example:
type 'a r = { a:int; b:float; c:int32; d:int64; e:'a; }
For 16-bit and 32-bit architectures:
     +--------------------+----------+-------+------+
     |       wosize       |pfbit type| color | tag  |
     +--------------------+----------+-------+------+
bits  31                20 19      18 17   16 15   0
wosize = 7 (in 32-bit words: a = 1, b = 2, c = 1, d = 2, e = 1)
pfbit type = 11 (pointers and double size)
     +------------------------------+--+--+--+--+--+
     |            pfbits            |00|11|01|10|01|
     +------------------------------+--+--+--+--+--+
                                     e  d  c  b  a
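To make the encoding concrete, here is a rough C sketch that packs the header and the pfbits word for the record above. The names make_header, make_pfbits and enum pf_kind are made up for illustration and are not existing runtime code. It assumes one 2-bit code per field, with field a in the lowest two bits as drawn in the diagram, and a full 32-bit pfbits word (so at most 16 fields).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the 2-bit field codes proposed in this thread. */
enum pf_kind {
    PF_TAGGED = 0, /* 00: tagged value (int or pointer)            */
    PF_NONPTR = 1, /* 01: non-pointer (int, int32, C pointer, ...) */
    PF_FLOAT  = 2, /* 10: float (double), 2 words on 32-bit        */
    PF_INT64  = 3  /* 11: int64, 2 words on 32-bit                 */
};

/* Header layout from the diagram: wosize in bits 31..20,
   pfbit type in 19..18, color in 17..16, tag in 15..0. */
static uint32_t make_header(uint32_t wosize, uint32_t pftype,
                            uint32_t color, uint32_t tag)
{
    assert(wosize < (1u << 12) && pftype < 4 && color < 4 && tag < (1u << 16));
    return (wosize << 20) | (pftype << 18) | (color << 16) | tag;
}

/* Pack one 2-bit code per field, field 0 in the lowest two bits
   (an assumption taken from the diagram, where a is rightmost). */
static uint32_t make_pfbits(const enum pf_kind *kinds, unsigned nfields)
{
    uint32_t bits = 0;
    assert(nfields <= 16);
    for (unsigned i = 0; i < nfields; i++)
        bits |= (uint32_t)kinds[i] << (2 * i);
    return bits;
}
```

For r (a=01, b=10, c=01, d=11, e=00) this yields a pfbits word of 0xD9, i.e. binary 00 11 01 10 01, and with color and tag both 0 a header of 0x007C0000.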
The GC only needs to check e, since 'a might be a pointer. All other fields
are marked as non-pointers.
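The scan this implies could look roughly like the following C sketch. find_pointers is a hypothetical helper, not part of the runtime; it assumes a 32-bit target, the 2-bit codes listed earlier (00 tagged, 01 non-pointer, 10 float, 11 int64), field a in the lowest bits of pfbits, and the usual convention that a tagged word with bit 0 clear is a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walk the fields of one block and store in out[] the word offsets of
   fields that currently hold a pointer. Returns how many were found.
   Only fields with code 00 (tagged) are inspected at all. */
static size_t find_pointers(const uint32_t *fields, uint32_t wosize,
                            uint32_t pfbits, uint32_t *out)
{
    size_t n = 0;
    uint32_t word = 0;
    for (unsigned i = 0; word < wosize; i++) {
        assert(i < 16);                    /* 16 codes fit in 32 bits */
        unsigned code = (pfbits >> (2 * i)) & 3u;
        if (code == 0 && (fields[word] & 1u) == 0)
            out[n++] = word;               /* tagged and bit 0 clear  */
        word += (code & 2u) ? 2 : 1;       /* 10 and 11 span 2 words  */
    }
    return n;
}
```

With the pfbits word for r, only word 6 (field e) is ever examined; the raw bits of b, c and d are skipped even when their low bit happens to be 0.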
Comparison does a plain bit comparison on a, c and d, a float
comparison on b, and a tagged comparison on e. Marshaling works similarly.
There is no confusion between int64 and floats.
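As a rough illustration of why the distinction matters, here is a C sketch of an equality test that dispatches on the 2-bit codes. blocks_equal is a hypothetical name; the sketch assumes a 32-bit target, a double occupying two words, both blocks sharing the same pfbits word, and it only compares the word for tagged fields instead of following pointers as a real structural comparison would.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Equality of two blocks that share the same pfbits word. */
static int blocks_equal(const uint32_t *x, const uint32_t *y,
                        uint32_t wosize, uint32_t pfbits)
{
    uint32_t word = 0;
    for (unsigned i = 0; word < wosize; i++) {
        assert(i < 16);
        unsigned code = (pfbits >> (2 * i)) & 3u;
        if (code == 2) {          /* 10: float, compare as doubles */
            double dx, dy;
            memcpy(&dx, &x[word], sizeof dx);
            memcpy(&dy, &y[word], sizeof dy);
            if (!(dx == dy)) return 0;
        } else if (code == 3) {   /* 11: int64, compare both words */
            if (x[word] != y[word] || x[word + 1] != y[word + 1])
                return 0;
        } else {
            /* 01: plain bits. 00: tagged value; this sketch compares
               the word only and does not follow pointers. */
            if (x[word] != y[word]) return 0;
        }
        word += (code & 2u) ? 2 : 1;
    }
    return 1;
}
```

The same two words holding 0.0 and -0.0 compare equal when the field is marked 10 (float) and unequal when it is marked 11 (int64), which is the kind of confusion the separate codes avoid.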
MfG
Goswin