Re: [Caml-list] Proposal: re-design of ocaml headers

From: Goswin von Brederlow <goswin-v-b@web.de>
To: Yotam Barnoy <yotambarnoy@gmail.com>
Cc: Ocaml Mailing List <caml-list@inria.fr>
Subject: Re: [Caml-list] Proposal: re-design of ocaml headers
Date: Tue, 8 Oct 2013 12:52:46 +0200
Message-ID: <20131008105246.GA15550@frosties>
In-Reply-To: <CAN6ygOnmk_EGViZR_tmHuz+cjmevQiyeS9XeHUpCWDcGhkwFMg@mail.gmail.com>

On Mon, Sep 30, 2013 at 11:31:23AM -0400, Yotam Barnoy wrote:
> On Mon, Sep 30, 2013 at 10:48 AM, Goswin von Brederlow <goswin-v-b@web.de> wrote:
> 
> > >
> > > + For 16-bit and 32-bit architectures:
> > >      +---------------+----+----+-----+-------+------+
> > >      |     wosize    | ext|cust|noptr| color | tag  |
> > >      +---------------+----+----+-----+-------+------+
> > > bits  31           21  20   19   18   17   16 15   0
> > >
> > > - noptr: no pointers present
> > > - ext:  uses extension word
> > > - cust(om): uses custom word. Custom word is normally used to indicate
> > > floats and pointers.
> > >
> > > 32 bit extension word (present only if ext is 1)
> > >      +---------------------------------------------+
> > >      |                   wosize                    |
> > >      +---------------------------------------------+
> > > bits  31                                          0
> >
> > Why use a full bit for ext? I would define wosize == 0 to mean that an
> > extension word with the actual size is present. That way sizes up to
> > <16KB can be encoded without an extension word.
> >
> >
> Great point! Of course, that makes perfect sense. I was feeling like I was
> wasting the wosize bits with the extension word but couldn't quite put
> 2 and 2 together.
> BTW, down the thread is a newer version of the design that reduces the tag
> space to 8000 tags, which I do think is sufficient.
> 
> 
> 
> > > 32 bit custom word (default usage - present only if cust is 1):
> > >      +----+----------------------------------------+
> > >      |nofp|              pfbits                    |
> > >      +----+----------------------------------------+
> > > bits   31  30                                     0
> > >
> > > - nofp: a structure with no floats. All pfbits are used for pointers,
> > > with a 1 signifying a pointer and a 0 signifying a value.
> > > - pfbits: indicates which double words are floats and pointers. Starting
> > > at the highest bit:
> > >     - a 0 indicates neither a pointer nor a float
> > >     - a 10 indicates a float (double)
> > >     - a 11 indicates a pointer
> > >     - If noptr is set, each bit indicates a float. If nofp is set, each
> > > bit indicates a pointer.
> >
> > There are 3 kinds of values:
> >
> > 1) pointers with bit 0 == 0
> > 2) non-pointers with bit 0 == 1
> > 3) floats with all bits used for the type (spanning 2 fields in 32bit)
> >
> > So if pfbits indicates a float then a field (or 2) is a float and all
> > bits are used for the value. Otherwise the bit 0 of the field will
> > tell you whether it is a pointer or not. So why would you want to
> > duplicate that information in the pfbits?
> >
> 
> I was thinking of doing it for efficiency. If we're already indicating
> what's what, we might as well represent shortcuts to the pointers, which
> would cut down on the amount of reading, no? In the average case, the GC
> would need to access a lot less memory.
> 
> 
> > It might be nice to support C values like untagged ints or unaligned
> > pointers. If Custom tag is set then the pfbits become ocaml value
> > bits. The GC will only inspect fields with pfbit set. All other fields
> > are ignored. The custom_operations handle compare, hash, serialize and
> > deserialize so nothing else will access the data.
> >
> > Another thing are int32 and int64. I guess if you want to unbox those
> > then having 2 bits per field in pfbits makes sense again. But then I
> > would allocate them as:
> >
> >     - a 00 indicates a tagged value (int or pointer)
> >     - a 01 indicates a non-pointer: int, int32, native int, C pointer
> >     - a 10 indicates a float (double)
> >     - a 11 indicates an int64
> >
> > The higher bit would indicate a 64bit value, meaning spanning 2 fields
> > on 32bit. Note that those 4 values allow mixing ocaml values, C values,
> > int32, int64 and float in a block.
> >
> > I would combine the noptr and nofp bits into a single 2bit field:
> >
> >     - a 00 indicates no pointers and no double size, no pfbits
> >     - a 01 indicates no double size, pfbits indicate tagged / non-pointer
> >     - a 10 indicates no pointers but double size, pfbits indicate size
> >     - a 11 indicates both pointers and double size, 2 pfbits per field
> >
> > Note: tagged integers can be stored as 00 or 01. I think this would be
> > required for polymorphic types. An 'a could be int or pointer. In both
> > cases 00 will work.
> >
> >
> I really like this idea -- unboxing more types could be really useful. I'm
> not sure double 'size' would work, however. It should be fine for the
> marshal module, but polymorphic comparison would get messed up because
> floats have to be compared differently. So I think 10 in the bit field
> should indicate no pointers but floats, while 11 could allow both pointers
> and double size, with the 2-bits specifying if it's a float or an int64 (as
> you've outlined). Of course, one cannot have both shortcuts to pointers and
> enhanced unboxing, so let me know what you think about the performance
> increase from shortcutting the tag bit.
> 
> Yotam

Let's look at an example:

type 'a r = { a:int; b:float; c:int32; d:int64; e:'a; }

For 16-bit and 32-bit architectures:
     +--------------------+----------+-------+------+
     |     wosize         |pfbit type| color | tag  |
     +--------------------+----------+-------+------+
bits  31               20   19   18   17   16 15   0

wosize = 7
pfbit type = 11 (pointers and double size)
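
As a rough illustration of the header word above, here is a minimal OCaml
sketch that packs and unpacks it. This is only a simulation using ordinary
OCaml ints on a 64-bit host, not runtime code. The names (make_header,
wosize_of, pftype_of) are made up for this example, and the color and tag
values are placeholders since the example does not specify them. It also
includes the earlier suggestion that wosize == 0 means the real size lives
in an extension word.

```ocaml
(* Field positions follow the 32-bit diagram above.
   All names here are hypothetical, chosen for illustration. *)
let tag_bits = 16
let color_shift = 16
let pftype_shift = 18
let wosize_shift = 20
let wosize_max = (1 lsl 12) - 1 (* 12-bit inline size field *)

(* Pack a header. Returns the header word plus an optional extension word.
   An inline wosize of 0 means "the actual size is in the extension word". *)
let make_header ~wosize ~pftype ~color ~tag =
  assert (pftype land 3 = pftype && color land 3 = color);
  assert (tag >= 0 && tag < 1 lsl tag_bits);
  let inline, ext =
    if wosize >= 1 && wosize <= wosize_max then (wosize, None)
    else (0, Some wosize)
  in
  let hd =
    (inline lsl wosize_shift)
    lor (pftype lsl pftype_shift)
    lor (color lsl color_shift)
    lor tag
  in
  (hd, ext)

(* Recover the size, consulting the extension word when the inline field is 0. *)
let wosize_of (hd, ext) =
  match (hd lsr wosize_shift) land wosize_max, ext with
  | 0, Some n -> n
  | 0, None -> invalid_arg "wosize_of: extension word missing"
  | n, _ -> n

let pftype_of hd = (hd lsr pftype_shift) land 3

(* The record from the example: 7 words, pfbit type 11.
   color = 0 and tag = 0 are placeholder values. *)
let example = make_header ~wosize:7 ~pftype:0b11 ~color:0 ~tag:0
```

With these assumptions the example header needs no extension word, and any
block larger than 4095 words falls back to one.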

     +------------------------------+--+--+--+--+--+
     |                   pfbits     |00|11|01|10|01|
     +------------------------------+--+--+--+--+--+
                                      e  d  c  b  a

The GC only needs to check e since 'a might be a pointer. All other fields
are marked as non-pointers.
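
The pfbits word and the GC's view of it can be sketched in OCaml as follows.
This is a model only: the kind type and the helper names (pfbits, wosize,
gc_offsets) are invented for this illustration, and the 2-bit codes are the
ones proposed earlier in the thread. The first field goes in the lowest two
bits, as in the diagram.

```ocaml
(* Field kinds with the proposed 2-bit codes. Names are hypothetical. *)
type kind =
  | Tagged (* 00: tagged value, int or pointer *)
  | Raw    (* 01: non-pointer: int, int32, native int, C pointer *)
  | Double (* 10: float, spans 2 fields on 32-bit *)
  | I64    (* 11: int64, spans 2 fields on 32-bit *)

let code = function Tagged -> 0b00 | Raw -> 0b01 | Double -> 0b10 | I64 -> 0b11

(* Words occupied by one field on a 32-bit architecture. *)
let words = function Tagged | Raw -> 1 | Double | I64 -> 2

(* Build the pfbits word, two bits per field, first field lowest. *)
let pfbits kinds =
  List.fold_left
    (fun (acc, shift) k -> (acc lor (code k lsl shift), shift + 2))
    (0, 0) kinds
  |> fst

(* Total block size in words. *)
let wosize kinds = List.fold_left (fun n k -> n + words k) 0 kinds

(* Word offsets the GC would have to inspect: only the Tagged fields. *)
let gc_offsets kinds =
  let rec go off = function
    | [] -> []
    | Tagged :: tl -> off :: go (off + 1) tl
    | k :: tl -> go (off + words k) tl
  in
  go 0 kinds

(* type 'a r = { a:int; b:float; c:int32; d:int64; e:'a; } *)
let r_kinds = [ Raw; Double; Raw; I64; Tagged ]
```

For r_kinds this gives pfbits = 0b0011011001, wosize = 7, and a single
offset (word 6, the field e) for the GC to look at.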

Comparison does a plain bit comparison on a, c and d, a float
comparison on b and a tagged comparison on e. Similar for marshaling.
There is no confusion between int64 and floats.
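
A small OCaml model of that comparison, to show why the float/int64
distinction matters. The field and value types and the compare_* functions
are hypothetical and only simulate blocks as lists; field a is modelled as
an int32 for simplicity. Int32.compare and Int64.compare give a signed
ordering, but they report equality exactly when the bits are equal.

```ocaml
(* A field as described by its pfbits code. All names are hypothetical. *)
type field =
  | Raw of int32    (* 01: compared as plain bits *)
  | Double of float (* 10: compared as a float *)
  | I64 of int64    (* 11: compared as plain bits *)
  | Tagged of value (* 00: inspect the value to see what it is *)

(* A tagged value is either an immediate int or a pointer to a block. *)
and value =
  | Imm of int
  | Ptr of field list

let rec compare_field x y =
  match x, y with
  | Raw a, Raw b -> Int32.compare a b
  | I64 a, I64 b -> Int64.compare a b
  | Double a, Double b -> compare (a : float) b
  | Tagged a, Tagged b -> compare_value a b
  | _ -> invalid_arg "compare_field: pfbits differ"

and compare_value a b =
  match a, b with
  | Imm m, Imm n -> compare (m : int) n
  | Imm _, Ptr _ -> -1
  | Ptr _, Imm _ -> 1
  | Ptr xs, Ptr ys -> compare_fields xs ys

and compare_fields xs ys =
  match xs, ys with
  | [], [] -> 0
  | [], _ -> -1
  | _, [] -> 1
  | x :: xt, y :: yt ->
    let c = compare_field x y in
    if c <> 0 then c else compare_fields xt yt

(* Two records that differ only in the sign of the float zero in b. *)
let r1 = [ Raw 1l; Double 0.0; Raw 2l; I64 3L; Tagged (Imm 4) ]
let r2 = [ Raw 1l; Double (-0.0); Raw 2l; I64 3L; Tagged (Imm 4) ]
```

Here r1 and r2 compare equal because b is compared as a float, although
0.0 and -0.0 have different bit patterns. Comparing the same 64 bits as an
int64 would report a difference, which is why the two kinds need separate
codes.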

MfG
	Goswin
