From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=AWL autolearn=disabled version=3.1.3 Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by yquem.inria.fr (Postfix) with ESMTP id 89066BC6B for ; Mon, 26 Nov 2007 22:01:02 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AgAAAGLCSkfUnw6Fg2dsb2JhbACCOY0IAQEBCAQGEREH X-IronPort-AV: E=Sophos;i="4.23,215,1194217200"; d="scan'208";a="6202884" Received: from pih-relay06.plus.net ([212.159.14.133]) by mail3-smtp-sop.national.inria.fr with ESMTP; 26 Nov 2007 22:01:01 +0100 Received: from [80.229.56.224] (helo=beast.local) by pih-relay06.plus.net with esmtp (Exim) id 1Iwl4e-0001JS-HQ for caml-list@yquem.inria.fr; Mon, 26 Nov 2007 21:01:00 +0000 From: Jon Harrop Organization: Flying Frog Consultancy Ltd. To: caml-list@yquem.inria.fr Subject: LLVM: A native-code compiler for MiniML in ~100LOC Date: Mon, 26 Nov 2007 20:52:18 +0000 User-Agent: KMail/1.9.5 MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200711262052.18186.jon@ffconsultancy.com> X-Spam: no; 0.00; native-code:01 compiler:01 low-level:01 ocaml:01 bindings:01 compiler:01 compilation:01 native-code:01 subset:01 ocaml:01 camlp:01 lexing:01 expr:01 binop:01 expr:01 I recently rediscovered the Low-Level Virtual Machine (LLVM) project that has since been adopted by Apple: http://llvm.org This is a library (with OCaml bindings!) that allows you to write a compiler that generates their RISC-like intermediate language (IL) that can then be compiled to native code. LLVM even supports JIT compilation. I went through the usual steps in trying this and was extremely impressed with the results. After only two days I was able to create an optimizing native-code compiler for a subset of CAML large enough to represent the following Fibonacci program: let rec fib n = if n <= 2 then 1 else fib(n-1) + fib(n-2) do fib 40 The compiler is written entirely in OCaml, using camlp4 for lexing and parsing, and the whole thing is only ~100 lines of code! I'll detail exactly how you can use LLVM from OCaml in a future OCaml Journal article: http://www.ffconsultancy.com/products/ocaml_journal/?ol Meanwhile here's my latest source: type expr = | Int of int | Var of string | BinOp of [ `Add | `Sub | `Leq ] * expr * expr | If of expr * expr * expr | Apply of expr * expr type defn = | LetRec of string * string * expr open Camlp4.PreCast let expr = Gram.Entry.mk "expr" let defn = Gram.Entry.mk "defn" let prog = Gram.Entry.mk "prog" EXTEND Gram expr: [ [ "if"; p = expr; "then"; t = expr; "else"; f = expr -> If(p, t, f) ] | [ e1 = expr; "<="; e2 = expr -> BinOp(`Leq, e1, e2) ] | [ e1 = expr; "+"; e2 = expr -> BinOp(`Add, e1, e2) | e1 = expr; "-"; e2 = expr -> BinOp(`Sub, e1, e2) ] | [ f = expr; x = expr -> Apply(f, x) ] | [ v = LIDENT -> Var v | n = INT -> Int(int_of_string n) | "("; e = expr; ")" -> e ] ]; defn: [ [ "let"; "rec"; f = LIDENT; x = LIDENT; "="; body = expr -> LetRec(f, x, body) ] ]; prog: [ [ defns = LIST0 defn; "do"; run = expr -> defns, run ] ]; END open Printf let program, run = try Gram.parse prog Loc.ghost (Stream.of_channel stdin) with | Loc.Exc_located(loc, e) -> printf "%s at line %d\n" (Printexc.to_string e) (Loc.start_line loc); exit 1 open Llvm let ty = i64_type let ( |> ) x f = f x type state = { fn: llvalue; blk: llbasicblock; vars: (string * llvalue) list } let bb state = builder_at_end state.blk let new_block state name = append_block name state.fn let find state v = try List.assoc v state.vars with Not_found -> eprintf "Unknown variable %s\n" v; raise Not_found let cont (v, state) dest_blk = build_br dest_blk (bb state) |> ignore; v, state let rec expr state = function | Int n -> const_int ty n, state | Var x -> find state x, state | BinOp(op, f, g) -> let f, state = expr state f in let g, state = expr state g in let build, name = match op with | `Add -> build_add, "add" | `Sub -> build_sub, "sub" | `Leq -> build_icmp Icmp_sle, "leq" in build f g name (bb state), state | If(p, t, f) -> let t_blk = new_block state "pass" in let f_blk = new_block state "fail" in let k_blk = new_block state "cont" in let cond, state = expr state p in build_cond_br cond t_blk f_blk (bb state) |> ignore; let t, state = cont (expr { state with blk = t_blk } t) k_blk in let f, state = cont (expr { state with blk = f_blk } f) k_blk in build_phi [t, t_blk; f, f_blk] "join" (bb state), state | Apply(f, arg) -> let f, state = expr state f in let arg, state = expr state arg in build_call f [|arg|] "apply" (bb state), state let defn m vars = function | LetRec(f, arg, body) -> let ty = function_type ty [| ty |] in let fn = define_function f ty m in let vars' = (arg, param fn 0) :: (f, fn) :: vars in let body, state = expr { fn = fn; blk = entry_block fn; vars = vars' } body in build_ret body (bb state) |> ignore; (f, fn) :: vars let int n = const_int ty n let main filename = let m = create_module filename in let string = pointer_type i8_type in let print = declare_function "printf" (var_arg_function_type ty [|string|]) m in let main = define_function "main" (function_type ty [| |]) m in let blk = entry_block main in let bb = builder_at_end blk in let str s = define_global "buf" (const_stringz s) m in let int_spec = build_gep (str "%d\n") [| int 0; int 0 |] "int_spec" bb in let vars = List.fold_left (defn m) [] program in let n, _ = expr { fn = main; blk = blk; vars = vars } run in build_call print [| int_spec; n |] "" bb |> ignore; build_ret (int 0) bb |> ignore; if not (Llvm_bitwriter.write_bitcode_file m filename) then exit 1; dispose_module m let () = match Sys.argv with | [|_; filename|] -> main filename | _ as a -> Printf.eprintf "Usage: %s \n" a.(0) To use it, simply download and install the latest SVN version of LLVM (which even builds and installs the OCaml bindings for you!) and then do: $ ocamlc -g -dtypes -pp camlp4oof -I +camlp4 dynlink.cma camlp4lib.cma -cc g++ llvm.cma llvm_bitwriter.cma minml.ml -o minml $ ./minml run.bc