That's a good point.
Another relatively easy optimization would be to use a bit from the header on 64-bit platforms (32-bit platforms have no available bits) to indicate another form of extension: an extra word in the block serves as a bitmap indicating which words of the block hold floats. Haskell uses a similar trick to indicate which words on the stack are pointers. This would remove the indirection for floats in the majority of cases, except of course on the stack itself. It shouldn't have an impact on marshaling.
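To make the idea concrete, here is a rough sketch in C of how such a bitmap could be consulted. Everything in it is an assumption for illustration: the flag position (top bit of the header), the placement of the bitmap as the first word after the header, and the names FLOATMAP_FLAG, has_float_bitmap, mark_float_field and field_is_float are all made up and are not part of the actual runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout, NOT the real OCaml runtime: the top bit of the
   64-bit header says that the word right after the header is a bitmap. */
#define FLOATMAP_FLAG (UINT64_C(1) << 63)

typedef uint64_t word;

/* True if the header announces that a float bitmap word follows. */
static bool has_float_bitmap(word header) {
    return (header & FLOATMAP_FLAG) != 0;
}

/* Return the bitmap with bit i set, marking field i as a raw float. */
static word mark_float_field(word bitmap, unsigned i) {
    assert(i < 64); /* a single bitmap word covers at most 64 fields */
    return bitmap | (UINT64_C(1) << i);
}

/* block[0] is the header; block[1] is the bitmap when the flag is set.
   Blocks without the flag are treated as having no raw float fields. */
static bool field_is_float(const word *block, unsigned i) {
    if (!has_float_bitmap(block[0])) {
        return false;
    }
    assert(i < 64);
    return ((block[1] >> i) & 1) != 0;
}
```

With this layout the GC would check the flag once per block and then skip any field whose bit is set instead of following it as a pointer. A single bitmap word only covers the first 64 fields, which is why this would help the majority of cases rather than all of them.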
BTW, the spare bits in the 64-bit header should probably have been marked as reserved rather than making the wo_size field impossibly large.
Yotam