Subj.

In functional languages, several constructors are frequently applied in sequence. Each constructor with arguments causes a memory allocation. For example,

    let f x y z = [x+y;y+z;z+x]

is compiled by O'Caml into the following code (the real code for x86 is placed at the end of the message):

    t  := alloc list item ; t.cdr  := NIL; t.car  := z+x
    t' := alloc list item ; t'.cdr := t;   t'.car := y+z
    t" := alloc list item ; t".cdr := t';  t".car := x+y
    return t"

On x86 each allocation takes 6 instructions:

    .L101:  movl    young_ptr, %eax
            subl    $12, %eax
            movl    %eax, young_ptr
            cmpl    young_limit, %eax
            jb      .L102
            leal    4(%eax), %edx

(and also `caml_call_gc; jmp .L101' at the function end, and a frame for the GC of approx. 8-16B), for a total of about 40B.

If we allocate memory for all 3 list items with one request, then we can replace each of the last two allocations by the following:

            mov     young_ptr, %eax
            lea     offset(%eax), %reg

which is 8B and nothing more. This optimization is valid only within a basic block, and only if the code between the allocations cannot trigger a garbage collection.

I have implemented it. It takes about 90 lines of added/changed code in the compiler (together with the two changes described below). This optimization reduces the code size of ocamlopt.opt+ocamlc.opt by 8.7%. I think this is an excellent result for a 90-line change. Bootstrapping of ocamlopt.opt was successful, which I hope means that my changes are correct.

This optimization can be applied to all architectures. For architectures that keep `young_ptr' in memory (x86, m68k) there is yet another improvement: in many cases, instead of loading `young_ptr' from memory, we can use the address of the object created by the previous constructor. That address is `young_ptr + offset', and it is frequently already in a register because the object is an argument of the constructor that follows it. In this case we eliminate the first of the two remaining instructions. This optimization reduces the ocamlopt.opt+ocamlc.opt code for x86 by 1.6%.
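The allocation-combining idea above can be sketched on a toy instruction list. This is only an illustration, not the actual compiler patch: the types `instr' and `out' and the function `combine' are hypothetical names invented here, offsets are counted from the start of the combined block for simplicity, and a `Call' stands for any instruction that might run the garbage collector.

```ocaml
(* Toy model of one basic block; these types are hypothetical and are not
   the compiler's own intermediate representation. *)
type instr =
  | Alloc of int      (* allocate n bytes; includes the heap-limit check *)
  | Op of string      (* pure operation that cannot call the GC *)
  | Call of string    (* may call the GC, so it ends a combinable run *)

type out =
  | Alloc_block of int  (* one checked allocation covering a whole run *)
  | Sub_alloc of int    (* address = block start + offset, no check *)
  | Keep of instr       (* instruction left unchanged *)

let combine (block : instr list) : out list =
  (* Total size of the allocations up to the next Call or the block end. *)
  let rec run_size = function
    | Alloc n :: rest -> n + run_size rest
    | Op _ :: rest -> run_size rest
    | Call _ :: _ | [] -> 0
  in
  (* [offset = None] means no allocation has been seen in the current run. *)
  let rec go offset = function
    | [] -> []
    | (Call _ as i) :: rest -> Keep i :: go None rest
    | (Op _ as i) :: rest -> Keep i :: go offset rest
    | Alloc n :: rest as l ->
      (match offset with
       | None -> Alloc_block (run_size l) :: go (Some n) rest
       | Some off -> Sub_alloc off :: go (Some (off + n)) rest)
  in
  go None block

(* The three 12-byte list cells of [f x y z = [x+y; y+z; z+x]]. *)
let example =
  combine [Op "z+x"; Alloc 12; Op "y+z"; Alloc 12; Op "x+y"; Alloc 12]
```

With the size estimates given above (about 40B for a full allocation, 8B for a follow-up one), the three allocations of the example would shrink from roughly 120B to roughly 56B of code.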
And the last: on the x86 and m68k architectures, `selection.ml' contains the following method:

    method select_store addr exp =
      match exp with
        Cconst_int n ->
          (Ispecific(Istore_int(n, addr)), Ctuple [])
      | Cconst_pointer n ->
          (Ispecific(Istore_int(n, addr)), Ctuple [])
      | Cconst_symbol s ->
          (Ispecific(Istore_symbol(s, addr)), Ctuple [])
      | _ ->
          super#select_store addr exp

The alternative

      Cconst_int n ->
          (Ispecific(Istore_int(n, addr)), Ctuple [])

handles stores of Cconst_int immediate constants, but ignores Cconst_natint constants. As a result, the following bad code is generated immediately after each memory allocation:

            mov     $tag, %r1
            mov     %r1, -4(%r2)

instead of the better:

            mov     $tag, -4(%r2)

I fixed this by adding the following match pattern:

      | Cconst_natint n when Nativeint.cmp n min_int >= 0
                          && Nativeint.cmp n max_int <= 0 ->
          (Ispecific(Istore_int(Nativeint.to_int n, addr)), Ctuple [])

This change reduces the code size of ocamlc.opt+ocamlopt.opt by a further 0.7%. The same change is needed for m68k. A better solution would probably be to add an Istore_natint operator.

I estimated the number of memory allocations in ocamlopt.opt+ocamlc.opt. I found about 12,000 memory allocations, approximately 7,000 of which are subject to the optimizations described.

Table of code sizes: old size: new size-1: new size-2: new size-3: total:
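The range check that guards the Cconst_natint pattern can be tried in isolation. The sketch below is an assumption-laden stand-in, not the compiler code: it uses `Nativeint.compare' and `Nativeint.of_int' from today's standard library instead of the `Nativeint.cmp' used in the pattern, and the helper names `fits_in_int' and `to_int_opt' are hypothetical.

```ocaml
(* Hypothetical helper, not part of the compiler: does the native integer [n]
   fit in an OCaml [int], which is one bit narrower than a machine word? *)
let fits_in_int (n : nativeint) : bool =
  Nativeint.compare n (Nativeint.of_int min_int) >= 0
  && Nativeint.compare n (Nativeint.of_int max_int) <= 0

(* Convert only when the value is representable, mirroring the guard on the
   Cconst_natint pattern; otherwise report that no conversion is possible. *)
let to_int_opt (n : nativeint) : int option =
  if fits_in_int n then Some (Nativeint.to_int n) else None
```

A constant outside this range cannot use the conversion to `int', which is presumably why a dedicated Istore_natint operator would be the cleaner fix.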