Re: [Caml-list] Proposal: re-design of ocaml headers

Mailing list for all users of the OCaml language and system.
 help / color / mirror / Atom feed

From: Goswin von Brederlow <goswin-v-b@web.de>
To: caml-list@inria.fr
Subject: Re: [Caml-list] Proposal: re-design of ocaml headers
Date: Mon, 30 Sep 2013 16:48:42 +0200	[thread overview]
Message-ID: <20130930144842.GE8693@frosties> (raw)
In-Reply-To: <CAN6ygO=cnhc039DEOVf7uZqpTCedVO0SMnG+rvFw4hm4qPc7cg@mail.gmail.com>

On Fri, Sep 27, 2013 at 10:05:56AM -0400, Yotam Barnoy wrote:
> Following up on the previous thread I started in this general topic, I
> present some more thinking I've done on the redesign of ocaml's headers.
> The purpose of this redesign is to lift the tag number restriction as well
> as size limits on 32-bit platforms. At the same time, header bits can be
> used to indicate floats, allowing cheaper usage of floats in data
> structures, and even to indicate the presence of pointers, making the
> traversal of some data structures by the GC unnecessary.
> 
> The basic idea of this redesign is that most allocations need only a small
> amount of space, but a large number of tags is a necessity. If you're
> allocating a large block of memory (>8KB) then you can spare another word
> for the size.
> 
> The pfbits (as shown below) are an efficient way of representing both
> floats and pointers in a data structure, at the cost of disallowing random
> access. From what I can gather, random access to this data is never needed,
> since the GC, Marshal module, and polymorphic comparison all process the
> whole data structure rather than referring to specific parts of it.
> 
> Open issue: making float representation efficient on the stack.
> 
> 
> + For 16-bit and 32-bit architectures:
>      +---------------+----+----+-----+-------+------+
>      |     wosize    | ext|cust|noptr| color | tag  |
>      +---------------+----+----+-----+-------+------+
> bits  31           21  20   19   18   17   16 15   0
> 
> - noptr: no pointers present
> - ext:  uses extension word
> - cust(om): uses custom word. Custom word is normally used to indicate
> floats and pointers.
> 
> 32 bit extension word (present only if ext is 1)
>      +---------------------------------------------+
>      |                   wosize                    |
>      +---------------------------------------------+
> bits  31                                          0

Why use a full bit for ext? I would define wosize == 0 to mean an
extension word with the actual size is present. That way sizes up to
<16KB can be encoded without extension word.

> 32 bit custom word (default usage - present only if cust is 1):
>      +----+----------------------------------------+
>      |nofp|              pfbits                    |
>      +----+----------------------------------------+
> bits   31  30                                     0
> 
> - nofp: a structure with no floats. All pfbits are used for pointers, with
> a 1 signifying a pointer and a 0 signifying a value.
> - pfbits: indicates which double words are floats and pointers. Starting at
> the highest bit:
>     - a 0 indicates neither a pointer nor a float
>     - a 10 indicates a float (double)
>     - a 11 indicates a pointer
>     - If noptr is set, each bit indicates a float. If nofp is set, each bit
> indicates a pointer.

There are 3 kinds of values:

1) pointers with bit 0 == 0
2) non-pointers with bit 0 == 1
3) floats with all bits used for the type (spanning 2 fields in 32bit)

So if pfbits indicates a float then a field (or 2) is a float and all
bits are used for the value. Otherwise the bit 0 of the field will
tell you wether it is a pointer or not. So why would you want to
duplicate that information in the pfbits?

Or do you want to store untagged/unboxed values in blocks?

That also means that the nofp bit is pointless. If there are no floats
then no custom word is required. And if no custom word is given then the
block must not have any floats.


I do love the noptr bit though. Blocks without pointers are quite
common and int/bool/char arrays will hugely benefit from that.

 
> + For 64-bit architectures:
> 
>      +----------------+--------+----+----+-----+-------+------+
>      |     pfbits     | wosize |cust|nofp|noptr| color | tag  |
>      +----------------+--------+----+----+-----+-------+------+
> bits  63            40 39    21  20   19   18   17   16 15   0

Isn't wosize a bit small there? That limits blocks to 4MiB total, right?
 
> - noptr: a structure with no pointers. All pfbits are used for floats, with
> a 1 signifying a float and a 0 signifying a non-float.
> - nofp: a structure with no floats. All pfbits are used for pointers, with
> a 1 signifying a pointer and a 0 signifying a value.
> - If both noptr and nofp are set, wosize is extended to include the pfbits.
> - cust(om): uses custom double word. Custom double word is normally used to
> indicate more floats and pointers, but functionality can change with
> certain tags.
>     - If the custom bit is set, wosize is expanded to include the pfbits in
> the main header.
> - pfbits: indicates which double words are floats and pointers. Starting at
> the highest bit:
>     - a 0 indicates neither a pointer nor a float
>     - a 10 indicates a float (double)
>     - a 11 indicates a pointer
>     - If noptr is set, each bit indicates a float. If nofp is set, each bit
> indicates a pointer.

As above pointer should not be in the pfbits. So if nofp is set then
wosize would be extended to include pfbits. Same if Double tag is set.
That would allow for larger arrays without floats or only floats. I
guess blocks with a mix of float and non-floats will not be large.
Even for generated code I don't expect constructors or recrods with
more than 500k labels.

> 64 bit custom header (default usage indicated - present only if cust is 1):
>      +--------------------------------------------------------+
>      |                         pfbits                         |
>      +--------------------------------------------------------+
> bits  63                                                     0
> 
> - pfbits: indicates which double words are floats and pointers. Starting at
> the highest bit:
>     - a 0 indicates neither a pointer nor a float
>     - a 10 indicates a float (double)
>     - a 11 indicates a pointer
>     - If noptr is set, each bit indicates a float. If nofp is set, each bit
> indicates a pointer.
> 
> + Tags:
> - I think it's a good idea to move custom tags to the low end of the
> spectrum, and add more there if any are needed. This way, if the tag field
> is ever expanded, it's not necessary to move the custom tags again.
> 
> - 0: Closure tag
> - 1: Infix tag (must be 1 mod 4)
> - 2: Lazy tag
> - 3: Object tag
> - 4: Forward tag
> - 5: Abstract tag
> - 6: String tag
> - 7: Double tag
> - 8: Custom tag
> - 9: Proposed tag: custom array. Half of custom header is used to indicate
> array member size, so one could have an array of tuples, saving both memory
> and indirections.
> - 100: Start of user tags
> 
> Yotam Barnoy

It might be nice to support C values like untagged ints or unaligned
pointers. If Custom tag is set then the pfbits become ocaml value
bits. The GC will only inspect fields with pfbit set. All other fields
are ignored. The custom_operations handle compare, hash, serialize and
deserialize so nothing else will access the data.

Another thing are int32 and int64. I guess if you want to unbox those
then having 2 bits per field in pfbits makes sense again. But then I
would allocate them as:

    - a 00 indicates a tagged value (int or pointer)
    - a 01 indicates a non-pointer: int, int32, native int, C pointer
    - a 10 indicates a float (double)
    - a 11 indicates an int64

The higher bit would indicate a 64bit value, meaning spanning 2 fields
on 32bit. Not that those 4 values allow mixing ocaml values, C values,
int32, int64 and float in a block.

I would combine the noptr and nofp bits into a single 2bit field:

    - a 00 indicates no pointers and no double size, no pfbits
    - a 01 indicates no double size, pfbits indicate tagged / non-pointer
    - a 10 indicates no pointers but double size, pfbits indicate size
    - a 11 indicates both pointers and double size, 2 pfbits per field

Note: tagged integers can be stored as 00 or 01. I think this would be
required for polymorphic types. An 'a could be int or pointer. In both
cases 00 will work.

MfG
	Goswin

next prev parent reply	other threads:[~2013-09-30 14:48 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-27 14:05 Yotam Barnoy
2013-09-27 15:08 ` Dmitry Grebeniuk
     [not found]   ` <CAN6ygOmuCX6HLfSns0tXQCF3LWMANqhpnSN0vGWcNg0one2QzQ@mail.gmail.com>
2013-09-27 15:25     ` [Caml-list] Fwd: " Yotam Barnoy
2013-09-27 16:20       ` Dmitry Grebeniuk
2013-09-27 18:08         ` Yotam Barnoy
2013-09-27 18:12           ` Yotam Barnoy
2013-09-27 18:15           ` Paolo Donadeo
2013-09-27 18:41             ` Yotam Barnoy
2013-09-27 15:31   ` [Caml-list] " Anthony Tavener
2013-09-27 15:37     ` Yotam Barnoy
2013-09-27 16:50     ` Dmitry Grebeniuk
2013-09-30 14:48 ` Goswin von Brederlow [this message]
2013-09-30 15:31   ` Yotam Barnoy
2013-10-08 10:52     ` Goswin von Brederlow
2013-10-11 15:48       ` Yotam Barnoy
2014-01-30 20:53         ` Yotam Barnoy
2014-02-01 15:27         ` Goswin von Brederlow
2013-10-06 10:39 ` Florian Weimer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130930144842.GE8693@frosties \
    --to=goswin-v-b@web.de \
    --cc=caml-list@inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox