And byterun/caml/instruct.cpp. This is exactly what I would like to avoid. :-)

Let's wait to hear from people who actually worked on it, then. :-)

This does not appear to reduce performance significantly compared to OP's version, as long as the number of computations is small (3-4 operations).

This comment does raise the question: if you are interested in performance, wouldn't you be better off using the native-code compiler? I am guessing there must be some other factors in play; I am just curious.

--
Best,
Женя