Following up on the previous thread I started on this general topic, here is some more thinking I've done on the redesign of OCaml's block headers. The purpose of this redesign is to lift the restriction on the number of tags, as well as the block size limit on 32-bit platforms. At the same time, header bits can be used to mark floats, allowing cheaper use of floats inside data structures. They can even mark the presence of pointers, so that the GC does not need to traverse some data structures at all.

The basic idea of this redesign is that most allocations need only a small amount of space, whereas a large number of tags is a necessity. If you're allocating a large block of memory (more than about 8KB on a 32-bit platform), you can spare an extra word for the size.
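For reference, the 8KB figure follows from the width of the wosize field in the 32-bit layout below (11 bits, 4-byte words). The same arithmetic for the 19-bit wosize of the 64-bit layout is shown for comparison. Both assume the largest value the field can hold:

```latex
\begin{aligned}
\text{32-bit:}\quad (2^{11}-1)\ \text{words} \times 4\ \text{bytes} &= 2047 \times 4 = 8188\ \text{bytes} \approx 8\ \text{KB} \\
\text{64-bit:}\quad (2^{19}-1)\ \text{words} \times 8\ \text{bytes} &= 524287 \times 8 = 4194296\ \text{bytes} \approx 4\ \text{MB}
\end{aligned}
```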

The pfbits (shown below) are a compact way of marking which fields of a block hold floats and which hold pointers. The cost is random access: since each field's code is one or two bits long, finding the kind of field i requires decoding the codes of all the fields before it. From what I can gather, random access to this information is never needed, since the GC, the Marshal module, and polymorphic comparison all traverse the whole block rather than looking up individual fields.

Open issue: making float representation efficient on the stack.


+ For 16-bit and 32-bit architectures:
     +---------------+----+----+-----+-------+------+
     |     wosize    | ext|cust|noptr| color | tag  |
     +---------------+----+----+-----+-------+------+
bits  31           21  20   19   18   17   16 15   0

- noptr: no pointers present
- ext: uses an extension word, which holds the wosize.
- cust(om): uses a custom word. The custom word is normally used to indicate floats and pointers.
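As an illustration of how cheaply these fields can be read, here is a minimal C sketch of accessors for the proposed 32-bit header. The type name `header32_t` and the `hd32_*` functions are hypothetical names chosen for this example; only the bit positions come from the diagram above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accessors for the proposed 32-bit header layout:
   bits 31-21 wosize, 20 ext, 19 cust, 18 noptr, 17-16 color, 15-0 tag. */
typedef uint32_t header32_t;

static inline uint32_t hd32_wosize(header32_t hd) { return hd >> 21; } /* 11 bits */
static inline int hd32_ext(header32_t hd)         { return (hd >> 20) & 1; }
static inline int hd32_cust(header32_t hd)        { return (hd >> 19) & 1; }
static inline int hd32_noptr(header32_t hd)       { return (hd >> 18) & 1; }
static inline uint32_t hd32_color(header32_t hd)  { return (hd >> 16) & 3; }
static inline uint32_t hd32_tag(header32_t hd)    { return hd & 0xFFFFu; } /* 16 bits */

/* Pack the fields into a header; each field must fit in its bit range. */
static inline header32_t hd32_make(uint32_t wosize, int ext, int cust,
                                   int noptr, uint32_t color, uint32_t tag) {
  assert(wosize < (1u << 11) && color < 4 && tag < (1u << 16));
  return (wosize << 21)
       | ((uint32_t)(ext & 1) << 20)
       | ((uint32_t)(cust & 1) << 19)
       | ((uint32_t)(noptr & 1) << 18)
       | (color << 16)
       | tag;
}
```

With 16 tag bits, a header can carry any of 65536 tags while still describing blocks of up to 2047 words without an extra word.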

32-bit extension word (present only if ext is 1):
     +---------------------------------------------+
     |                   wosize                    |
     +---------------------------------------------+
bits  31                                          0
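A sketch of how a reader of a 32-bit block would find its size and its first field. The proposal does not say where the optional words sit, so this ASSUMES the order [header][extension word][custom word][fields...] and that the header's own wosize bits are ignored when ext is set. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION (not specified in the proposal): optional words follow the
   main header in the order [header][extension][custom][fields...]. */
#define HD32_EXT  (1u << 20)
#define HD32_CUST (1u << 19)

/* Number of words before the first field: 1, 2 or 3. */
static unsigned hd32_header_words(uint32_t hd) {
  return 1u + ((hd & HD32_EXT) != 0) + ((hd & HD32_CUST) != 0);
}

/* Block size in words: taken from the extension word when ext is set,
   otherwise from the 11-bit wosize field of the main header. */
static uint32_t hd32_block_wosize(const uint32_t *hdr) {
  uint32_t hd = hdr[0];
  return (hd & HD32_EXT) ? hdr[1] : (hd >> 21);
}

/* The custom word comes after the extension word when both are present. */
static uint32_t hd32_custom_word(const uint32_t *hdr) {
  uint32_t hd = hdr[0];
  assert(hd & HD32_CUST);
  return (hd & HD32_EXT) ? hdr[2] : hdr[1];
}
```

Small blocks keep a single header word; only blocks that set ext or cust pay for the extra words.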

32-bit custom word (default usage; present only if cust is 1):
     +----+----------------------------------------+
     |nofp|              pfbits                    |
     +----+----------------------------------------+
bits   31  30                                     0

- nofp: a structure with no floats. All pfbits are used for pointers, with a 1 signifying a pointer and a 0 signifying a non-pointer value.
- pfbits: indicates which fields are floats (doubles) and which are pointers, using the following codes read from the highest bit downward:
    - a 0 indicates neither a pointer nor a float
    - a 10 indicates a float (double)
    - a 11 indicates a pointer
    - If noptr is set, each bit indicates a float. If nofp is set, each bit indicates a pointer.
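The variable-length codes above can be decoded with a simple sequential scan. This is a minimal C sketch for the 32-bit custom word, assuming one code per field and that decoding stops when either the fields or the 31 pfbits run out (the proposal does not say what happens to fields beyond the last code). `field_kind` and `decode_pfbits32` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { FIELD_PLAIN, FIELD_FLOAT, FIELD_POINTER } field_kind;

/* Hypothetical decoder for the 32-bit custom word: bit 31 = nofp,
   bits 30..0 = pfbits, read from bit 30 downward. noptr comes from the
   main header. Writes at most max_fields kinds to out and returns the
   number decoded before the bits (or max_fields) ran out. */
static int decode_pfbits32(uint32_t custom, int noptr,
                           field_kind *out, int max_fields) {
  assert(max_fields >= 0);
  int nofp = (custom >> 31) & 1;
  int bit = 30;                    /* next unread bit */
  int n = 0;
  while (n < max_fields && bit >= 0) {
    int b = (custom >> bit--) & 1;
    if (noptr) {                   /* one bit per field: 1 = float */
      out[n++] = b ? FIELD_FLOAT : FIELD_PLAIN;
    } else if (nofp) {             /* one bit per field: 1 = pointer */
      out[n++] = b ? FIELD_POINTER : FIELD_PLAIN;
    } else if (b == 0) {           /* code 0: neither */
      out[n++] = FIELD_PLAIN;
    } else {                       /* code 1x: read the second bit */
      if (bit < 0) break;          /* truncated code: stop */
      int b2 = (custom >> bit--) & 1;
      out[n++] = b2 ? FIELD_POINTER : FIELD_FLOAT;  /* 11 / 10 */
    }
  }
  return n;
}
```

Because a code is one or two bits long, the kind of field i is only known after the codes for the earlier fields have been consumed, which is exactly the loss of random access discussed earlier; a consumer that walks the block in order, like the GC, pays nothing extra for it.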

+ For 64-bit architectures:

     +----------------+--------+----+----+-----+-------+------+
     |     pfbits     | wosize |cust|nofp|noptr| color | tag  |
     +----------------+--------+----+----+-----+-------+------+
bits  63            40 39    21  20   19   18   17   16 15   0

- noptr: a structure with no pointers. All pfbits are used for floats, with a 1 signifying a float and a 0 signifying a non-float.
- nofp: a structure with no floats. All pfbits are used for pointers, with a 1 signifying a pointer and a 0 signifying a non-pointer value.
- If both noptr and nofp are set, wosize is extended to include the pfbits.
- cust(om): uses a custom word. The custom word is normally used to indicate more floats and pointers, but its meaning can change with certain tags.
    - If the custom bit is set, wosize is expanded to include the pfbits in the main header.
- pfbits: indicates which fields are floats (doubles) and which are pointers, using the following codes read from the highest bit downward:
    - a 0 indicates neither a pointer nor a float
    - a 10 indicates a float (double)
    - a 11 indicates a pointer
    - If noptr is set, each bit indicates a float. If nofp is set, each bit indicates a pointer.
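A minimal C sketch of reading the 64-bit header, including the rule that wosize absorbs the pfbits (43 bits in total) when cust is set or when both noptr and nofp are set. The `hd64_*` names are hypothetical; the bit positions are the ones in the diagram above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accessors for the proposed 64-bit header:
   bits 63-40 pfbits, 39-21 wosize, 20 cust, 19 nofp, 18 noptr,
   17-16 color, 15-0 tag. */
typedef uint64_t header64_t;

static inline int hd64_cust(header64_t hd)       { return (int)((hd >> 20) & 1); }
static inline int hd64_nofp(header64_t hd)       { return (int)((hd >> 19) & 1); }
static inline int hd64_noptr(header64_t hd)      { return (int)((hd >> 18) & 1); }
static inline unsigned hd64_color(header64_t hd) { return (unsigned)((hd >> 16) & 3); }
static inline unsigned hd64_tag(header64_t hd)   { return (unsigned)(hd & 0xFFFF); }

/* wosize is widened into the pfbits when cust is set, or when there are
   neither pointers nor floats to describe. */
static inline int hd64_wide_wosize(header64_t hd) {
  return hd64_cust(hd) || (hd64_noptr(hd) && hd64_nofp(hd));
}

static inline uint64_t hd64_wosize(header64_t hd) {
  return hd64_wide_wosize(hd)
           ? (hd >> 21)                                     /* bits 63-21 */
           : ((hd >> 21) & (((uint64_t)1 << 19) - 1));      /* bits 39-21 */
}

/* The 24 in-header pfbits; only meaningful when wosize is not widened. */
static inline uint32_t hd64_pfbits(header64_t hd) {
  assert(!hd64_wide_wosize(hd));
  return (uint32_t)(hd >> 40);
}
```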

64-bit custom word (default usage shown; present only if cust is 1):
     +--------------------------------------------------------+
     |                         pfbits                         |
     +--------------------------------------------------------+
bits  63                                                     0

- pfbits: indicates which fields are floats (doubles) and which are pointers, using the following codes read from the highest bit downward:
    - a 0 indicates neither a pointer nor a float
    - a 10 indicates a float (double)
    - a 11 indicates a pointer
    - If noptr is set, each bit indicates a float. If nofp is set, each bit indicates a pointer.
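To show how an allocator might pick among these options on 64-bit, here is a hypothetical C sketch. It uses the single-bit noptr/nofp modes when the block lacks pointers or floats, falls back to the 0/10/11 codes otherwise, and sets cust when the description needs more than the 24 pfbits in the main header. The policy, the names, and the error return for descriptions longer than 64 bits are assumptions for illustration; the proposal does not specify them.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { FIELD_PLAIN, FIELD_FLOAT, FIELD_POINTER } field_kind;

/* Hypothetical result: which flags to set and the pfbits to store. */
typedef struct {
  int noptr, nofp, cust;  /* header flag bits */
  uint64_t bits;          /* pfbits, left-aligned at bit 63 */
  int nbits;              /* number of pfbits used */
} pf_layout;

/* Append the low `len` bits of `code`, most significant first. */
static void put_bits(pf_layout *l, unsigned code, int len) {
  for (int i = len - 1; i >= 0; i--) {
    if ((code >> i) & 1) l->bits |= (uint64_t)1 << (63 - l->nbits);
    l->nbits++;
  }
}

/* Returns 0 on success, -1 if more than 64 pfbits would be needed. */
static int choose_pf_layout(const field_kind *f, int n, pf_layout *l) {
  assert(n >= 0);
  int has_ptr = 0, has_float = 0, len = 0;
  for (int i = 0; i < n; i++) {
    has_ptr |= f[i] == FIELD_POINTER;
    has_float |= f[i] == FIELD_FLOAT;
  }
  l->noptr = !has_ptr; l->nofp = !has_float; l->cust = 0;
  l->bits = 0; l->nbits = 0;
  if (l->noptr && l->nofp) return 0;      /* nothing to describe */
  if (l->noptr || l->nofp) len = n;       /* one bit per field */
  else for (int i = 0; i < n; i++) len += (f[i] == FIELD_PLAIN) ? 1 : 2;
  if (len > 64) return -1;
  l->cust = len > 24;                     /* header holds only 24 pfbits */
  for (int i = 0; i < n; i++) {
    if (l->noptr)                 put_bits(l, f[i] == FIELD_FLOAT, 1);
    else if (l->nofp)             put_bits(l, f[i] == FIELD_POINTER, 1);
    else if (f[i] == FIELD_PLAIN) put_bits(l, 0u, 1);           /* 0  */
    else put_bits(l, f[i] == FIELD_POINTER ? 3u : 2u, 2);       /* 11 / 10 */
  }
  return 0;
}
```

Keeping the bits left-aligned at bit 63 means that, when cust is not needed, they already sit in bits 63..40 and can be OR-ed straight into the main header; when cust is set, the same word serves as the custom word.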

+ Tags:
- I think it's a good idea to move the special tags reserved by the runtime to the low end of the tag range, and to add any new ones there if needed. This way, if the tag field is ever expanded, the reserved tags don't need to move again.

- 0: Closure tag
- 1: Infix tag (must be 1 mod 4)
- 2: Lazy tag
- 3: Object tag
- 4: Forward tag
- 5: Abstract tag
- 6: String tag
- 7: Double tag
- 8: Custom tag
- 9: Proposed tag: custom array. Half of the custom word is used to indicate the size of each array element, so one could have an array of tuples stored inline, saving both memory and indirections.
- 100: Start of user tags
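The proposed numbering can be written down as C constants, in the style of the runtime's `*_tag` macros. `Custom_array_tag`, `First_user_tag`, and the helpers are hypothetical names. For the custom array, the proposal only says that half of the custom word gives the element size, so this sketch ASSUMES the low 32 bits of a 64-bit custom word, measured in words.

```c
#include <assert.h>
#include <stdint.h>

/* Proposed tag numbering; names modeled on the runtime's *_tag macros. */
enum {
  Closure_tag = 0, Infix_tag = 1, Lazy_tag = 2, Object_tag = 3,
  Forward_tag = 4, Abstract_tag = 5, String_tag = 6, Double_tag = 7,
  Custom_tag = 8,
  Custom_array_tag = 9,   /* proposed */
  First_user_tag = 100    /* 10..99 left free for future runtime tags */
};

/* The proposal requires Infix_tag to be 1 mod 4. */
_Static_assert(Infix_tag % 4 == 1, "Infix_tag must be 1 mod 4");

static inline int is_user_tag(unsigned tag) { return tag >= First_user_tag; }

/* ASSUMPTION: element size (in words) lives in the low half of the
   64-bit custom word of a custom array. */
static inline uint32_t custom_array_elem_words(uint64_t custom) {
  return (uint32_t)(custom & 0xFFFFFFFFu);
}

/* Word offset of element i from the start of the array's fields. */
static inline uint64_t custom_array_offset(uint64_t custom, uint64_t i) {
  uint32_t sz = custom_array_elem_words(custom);
  assert(sz > 0);
  return i * sz;
}
```

For an array of 3-field tuples, element 4 would start 12 words into the block, with no per-element header or pointer indirection.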

Yotam Barnoy