I'm resurrecting this thread.
Given the recent discussion about optimization, I've done some more thinking and simplification work on my proposal for a better OCaml header. Much of this is just adopting Goswin's proposal, which made a lot of sense. I also tried to reduce the number of variations while still allowing a lot of useful unboxing of floats, int64s, etc.
I've also tried to tackle tuples in this version. Basically, in order to be compatible with polymorphic functions, you really want tuples and all other polymorphic types (as in 'a embedded inside records) to have members of uniform size. Expanding that size to 64 bits on 32-bit platforms makes sense in some circumstances, for example if the tuple contains many floats. I don't think handling both what I call narrow and wide polymorphic types (for lack of a better name) will make the code much more complex, and I certainly don't want it to result in heavy register spilling.
Also, the dynamic parts of the header now appear BEFORE the header itself. Their highest bit is 0 to indicate that they're not the main header. This will be seen by GC code, which can afford to do the few extra comparisons. Mutator code will always have the main header available right before the data.
So here it is:
+ For 32-bit architectures:
---------------------------
Wide Types
----------
- For polymorphic types (containing 'a), member sizes need to be uniform for speed. Tuples are fully polymorphic. These polymorphic members can either be narrow (32 bit), as they are now, or wide (64 bit) to accommodate float/int64 unboxing.
+------+--------+--------+-------+-------+--------+-------+
|  1   | wosize | fbits  | ebits | noptr | color  |  tag  |
+------+--------+--------+-------+-------+--------+-------+
bits 31  30-21    20-15     14      13     12-11    10-0
- noptr: no pointers present.
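- fbits: 2 bits per object member, covering the first 3 members. Wide types cannot be represented unless the ext word is used (tuples are the exception, see the Tuples section)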
- 00: tagged (int/pointer)
- 01: int32/native int, C pointer
- 10: float
- 11: int64
- if wosize = 0, the size word is used for wosize
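To make the bit positions concrete, here is a rough C sketch of accessors for the 32-bit main header. The macro and function names are mine, not anything that exists in the runtime, and I'm assuming that member 0 sits in the lowest two fbits, which the layout above does not pin down.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions read off the 32-bit diagram. All names are
   illustrative only. */
#define H32_MAIN(h)   (((h) >> 31) & 0x1u)    /* bit 31: 1 = main header */
#define H32_WOSIZE(h) (((h) >> 21) & 0x3FFu)  /* bits 30-21 */
#define H32_FBITS(h)  (((h) >> 15) & 0x3Fu)   /* bits 20-15 */
#define H32_EBITS(h)  (((h) >> 14) & 0x1u)    /* bit 14 */
#define H32_NOPTR(h)  (((h) >> 13) & 0x1u)    /* bit 13 */
#define H32_COLOR(h)  (((h) >> 11) & 0x3u)    /* bits 12-11 */
#define H32_TAG(h)    ((h) & 0x7FFu)          /* bits 10-0 */

/* The four 2-bit member kinds listed for the 32-bit layout. */
enum member_kind {
    KIND_TAGGED = 0,  /* 00: tagged int/pointer */
    KIND_INT32  = 1,  /* 01: int32/native int, C pointer */
    KIND_FLOAT  = 2,  /* 10: float */
    KIND_INT64  = 3   /* 11: int64 */
};

/* Kind of member i (0..2). Assumption: member 0 uses the lowest fbits. */
static enum member_kind h32_fbits_kind(uint32_t h, unsigned i)
{
    assert(i < 3);
    return (enum member_kind)((H32_FBITS(h) >> (2 * i)) & 0x3u);
}

/* Pack the fields back into a main header word. */
static uint32_t h32_make(uint32_t wosize, uint32_t fbits, uint32_t ebits,
                         uint32_t noptr, uint32_t color, uint32_t tag)
{
    return (UINT32_C(1) << 31)
         | ((wosize & 0x3FFu) << 21)
         | ((fbits & 0x3Fu) << 15)
         | ((ebits & 0x1u) << 14)
         | ((noptr & 0x1u) << 13)
         | ((color & 0x3u) << 11)
         | (tag & 0x7FFu);
}
```

With 10 bits of wosize, objects of up to 1023 words fit in the main header alone; anything larger needs the size word.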
size word
---------
- only present if wosize is 0 in the header. Precedes the main header
+---------------------------------------------+
| 0 | wosize |
+---------------------------------------------+
bits 31 30 0
- bit 31 is used to identify this as a header extension
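As a sketch of how code would recover the real size, the following C function reads wosize from the main header and falls back to the size word when the field is 0. The function name is hypothetical, and I'm assuming the size word sits immediately before the main header; the relative order of the size word and the ext word when both are present is not settled above, so this sketch only covers the case without an ext word.

```c
#include <assert.h>
#include <stdint.h>

/* hp points at the main header word of a block (32-bit layout).
   Assumption: when present, the size word is at hp[-1]. */
static uint32_t h32_effective_wosize(const uint32_t *hp)
{
    uint32_t in_header = (hp[0] >> 21) & 0x3FFu;  /* bits 30-21 */
    if (in_header != 0)
        return in_header;

    uint32_t size_word = hp[-1];
    /* Bit 31 must be 0, marking this as a header extension. */
    assert((size_word >> 31) == 0);
    return size_word & 0x7FFFFFFFu;               /* bits 30-0 */
}
```

The mutator only ever pays for the extra load on large objects, since small ones carry their size in the main header.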
ext word
--------
- Only present if ebits is 1. Describes the first 15 members of the object (+ 3 members from fbits)
+------+------+--------------------------------+
|  0   | wide |            extbits             |
+------+------+--------------------------------+
bits 31   30                 29-0
- bit 31 is used to identify this as a header extension
- wide: determines if 'a types in the first 18 words are wide (64 bit) or narrow (32 bit)
- extbits: same as fbits
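Here is a rough C sketch of looking up the kind of any of the first 18 members on 32-bit. It assumes that the header fbits describe members 0 to 2 and the extbits describe members 3 to 17, lowest bits first; that ordering is my reading of "+ 3 members from fbits" and is not spelled out above. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the 2-bit kind (00 tagged, 01 int32, 10 float, 11 int64) of
   member i, for i in 0..17, given the main header and the ext word.
   Assumption: fbits cover members 0..2, extbits cover members 3..17. */
static unsigned h32_member_kind(uint32_t header, uint32_t ext_word, unsigned i)
{
    assert(i < 18);
    if (i < 3)
        return (header >> (15 + 2 * i)) & 0x3u;   /* fbits, bits 20-15 */

    assert(((header >> 14) & 0x1u) == 1);         /* ebits must be set */
    assert((ext_word >> 31) == 0);                /* extension marker */
    return (ext_word >> (2 * (i - 3))) & 0x3u;    /* extbits, bits 29-0 */
}

/* The wide flag lives in bit 30 of the ext word. */
static int h32_ext_is_wide(uint32_t ext_word)
{
    return (int)((ext_word >> 30) & 0x1u);
}
```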
Tuples
------
- Tuples with any fbits on in the header word are automatically wide tuples (64 bit) for the first 3 words
- Tuples with ebits are automatically wide tuples for the first 18 words. wide is ignored.
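The two tuple rules boil down to a small decision on the main header. A minimal C sketch follows, with a hypothetical function name, using the 32-bit field positions and the tuple tag value 2 from the tag list:

```c
#include <assert.h>
#include <stdint.h>

#define TUPLE_TAG 2u

/* Number of leading tuple members stored as wide (64 bit) values.
   h is a 32-bit main header whose tag is the tuple tag. */
static unsigned tuple32_wide_members(uint32_t h)
{
    assert((h & 0x7FFu) == TUPLE_TAG);  /* tag, bits 10-0 */
    if ((h >> 14) & 0x1u)               /* ebits set: first 18 are wide */
        return 18;
    if ((h >> 15) & 0x3Fu)              /* any fbits on: first 3 are wide */
        return 3;
    return 0;                           /* plain narrow tuple */
}
```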
Strings
-------
- In strings, the last fbit, ebits and noptr function as the string size modifier (currently present at the end of the string). This improves cache locality on large strings. Wosize expands to include the fbits.
Arrays
------
- Arrays of integers, int64, floats, C pointers etc can all be handled with the regular array type.
- Only 2 lower fbits are needed for the type. The other fbits are joined with wosize.
+ 64-bit architectures:
--------------------------
+------+--------+--------+-------+-------+--------+-------+
|  1   | wosize | fbits  | ebits | noptr | color  |  tag  |
+------+--------+--------+-------+-------+--------+-------+
bits 63  62-43    42-15     14      13     12-11    10-0
- noptr: a structure with no pointers.
- fbits: 2 bits per object member
- 00: tagged (int/pointer)
- 01: int32
- 10: float
- 11: int64/native int/C pointer
- ebits: use ext word for signaling bits. fbits in header become bottom bits of wosize.
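A rough C sketch of the 64-bit accessors, including the rule that the fbits become the bottom bits of wosize once ebits is set. Since wosize (bits 62-43) and fbits (bits 42-15) are adjacent, the merged size is simply bits 62-15 read as one 48-bit field. The names are illustrative, and member 0 using the lowest fbits is again my assumption.

```c
#include <assert.h>
#include <stdint.h>

#define H64_EBITS(h) (((h) >> 14) & UINT64_C(1))  /* bit 14 */

/* Size in words. With ebits clear, wosize is the 20-bit field in bits
   62-43. With ebits set, the 28 fbits are joined on as low bits, giving
   a 48-bit size in bits 62-15. */
static uint64_t h64_wosize(uint64_t h)
{
    if (H64_EBITS(h))
        return (h >> 15) & ((UINT64_C(1) << 48) - 1);
    return (h >> 43) & UINT64_C(0xFFFFF);
}

/* Kind of member i (0..13) from the header fbits. Only meaningful when
   ebits is clear. Assumption: member 0 uses the lowest fbits. */
static unsigned h64_member_kind(uint64_t h, unsigned i)
{
    assert(i < 14);
    assert(!H64_EBITS(h));
    return (unsigned)((h >> (15 + 2 * i)) & UINT64_C(3));
}
```

So 28 fbits describe 14 members directly in the header, and the ext word is only needed for objects with unboxed members beyond that.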
ext word
--------
- Only present if ebits is 1. Describes the first 31 members of the object.
+------+------+--------------------------------+
|  0   |  -   |            extbits             |
+------+------+--------------------------------+
bits 63   62                 61-0
- bit 63 is used to identify this as a header extension
- extbits: same as fbits
Strings
-------
- In strings, the last fbit, ebits and noptr function as the string size modifier (currently present at the end of the string). This improves cache locality on large strings. Wosize expands to include the fbits.
Arrays
------
- Only the 2 lowest fbits are needed to discriminate the type. All other fbits are joined to wosize.
+ Tags:
-------
- 0: Array tag
- 1: Record tag
- 2: Tuple tag
- 3: Infix tag
- 4: Closure tag
- 5: Lazy tag
- 6: Object tag
- 7: Forward tag
- 8: Abstract tag
- 9: String tag (pfbit type used for size completion)
- 10: Primitive value tag (double, int64, int32)
- 11: Custom tag
- 100: First user tag
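For reference, the tag numbering can be written down as a C enum. The identifiers are my own naming; the values are the ones listed. The upper bound of 2047 follows from the 11-bit tag field (bits 10-0) in the header diagrams.

```c
#include <assert.h>

/* Proposed tag numbering. Identifier names are illustrative only. */
enum block_tag {
    TAG_ARRAY      = 0,
    TAG_RECORD     = 1,
    TAG_TUPLE      = 2,
    TAG_INFIX      = 3,
    TAG_CLOSURE    = 4,
    TAG_LAZY       = 5,
    TAG_OBJECT     = 6,
    TAG_FORWARD    = 7,
    TAG_ABSTRACT   = 8,
    TAG_STRING     = 9,
    TAG_PRIMITIVE  = 10,  /* double, int64, int32 */
    TAG_CUSTOM     = 11,
    TAG_FIRST_USER = 100
};

#define TAG_MAX 0x7FFu    /* largest value of an 11-bit tag field */

/* True for tags available to user-defined constructors. */
static int tag_is_user(unsigned tag)
{
    assert(tag <= TAG_MAX);
    return tag >= TAG_FIRST_USER;
}
```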
Comments?
Yotam