From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105])
	by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id p9Q9Rxcr011747
	for <caml-list@sympa-roc.inria.fr>; Wed, 26 Oct 2011 11:27:59 +0200
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ArEBAIXRp05KfVI0kGdsb2JhbABCqEBxCCIBAQEBCQkNBxQEIYFuAQEBBBICExkBGx0BAwwGBQQHOyIBEQEFARw7nwwKi1CCYIVGPYhwAgUKiGAElAqNMz2DcQ
X-IronPort-AV: E=Sophos;i="4.69,408,1315173600"; 
   d="scan'208";a="114692439"
Received: from mail-ww0-f52.google.com ([74.125.82.52])
  by mail4-smtp-sop.national.inria.fr with ESMTP/TLS/RC4-SHA; 26 Oct 2011 11:27:53 +0200
Received: by wwf25 with SMTP id 25so2502370wwf.9
        for <multiple recipients>; Wed, 26 Oct 2011 02:27:53 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=gamma;
        h=mime-version:in-reply-to:references:date:message-id:subject:from:to
         :cc:content-type;
        bh=PK6chzrqfUKxU+yjFY/pi5w2XvoHRNhCGCriHt2zoOo=;
        b=lXZwlh8E045HEgYPgskZA1UPh9aW4zEKcNNP6QIjYSSliESzDqab0vVhJWQm6RsJ2C
         FmgaORj9rkhd1u+ijbqvfWCpdtoxCTfNmMqx4oCjks2blm75StV1Ao4CEndzHiieDx0F
         6+4xK5vu4BKFjLq7V3+rLB41TQSZwfdZvdQOo=
MIME-Version: 1.0
Received: by 10.216.208.169 with SMTP id q41mr670755weo.64.1319621273715; Wed,
 26 Oct 2011 02:27:53 -0700 (PDT)
Received: by 10.216.73.131 with HTTP; Wed, 26 Oct 2011 02:27:53 -0700 (PDT)
In-Reply-To: <20111024210108.GE1838@siouxsie>
References: <CAHqiZ-J15s9PiVnvT+rw8KF--OooFLyP8YRk6x+e31dTEGX_SQ@mail.gmail.com>
	<4EA54BD0.8090009@inria.fr>
	<CAHqiZ-LXF8uMX2eib_=YAXHCQrs5Me94PVOkh5_8NwNx+uVZnQ@mail.gmail.com>
	<1319460013.18639.90.camel@thinkpad>
	<20111024124617.GA2287@siouxsie>
	<1319461110.18639.96.camel@thinkpad>
	<20111024210108.GE1838@siouxsie>
Date: Wed, 26 Oct 2011 11:27:53 +0200
Message-ID: <CAHqiZ-+6oS6KtuNKoZ=Pt-7qrKFUVBSfk-v=RKeG+nPB8HZ+aw@mail.gmail.com>
From: Diego Olivier Fernandez Pons <dofp.ocaml@gmail.com>
To: caml-list@inria.fr
Cc: Gerd Stolpmann <info@gerd-stolpmann.de>,
        Xavier Leroy <Xavier.Leroy@inria.fr>,
        oliver <oliver@first.in-berlin.de>
Content-Type: multipart/alternative; boundary=0016e6dee8dc4205d704b030458f
Subject: Re: [Caml-list] How to write an efficient interpreter


--0016e6dee8dc4205d704b030458f
Content-Type: text/plain; charset=ISO-8859-1

    List,

Thanks for your suggestions, here is a small summary of them with some
comments.

Options for interpreting a DSL
-------------------------------------------

1 - Classical optimised interpreter
2 - Combinator / partial application based interpreter
3 - Emitting bytecode for any VM
4 - Emitting Caml source and compile / run on the fly
5 - Retargeting Caml compiler (CamlP4 syntax extension + custom toplevel)

Comments
----------------

1. Is the most natural and first thing to try (working on optimising my
naive interpreter right now)

5. Is the second most obvious idea : in the same way C is a portable
assembler, core-ML is a portable functional language. If we could easily
write a Lex/Yacc parser and generate Caml AST or some kind of quote and
connect it to Caml, we could interpret many DST with little work. Moreover,
generating Caml source is useful for debugging or profiling and can be
compiled offline.

One comment is that Lex/Yacc is a tool that any engineer can manage while
CamlP4 is much more complex which increases maintainability risks.

4. Is the poor man's variant of 5.

2. Globally the idea is that evaluating Plus (Plus (Var "x", 3), 4) is
slower than calling sum [| get env "x"; 3 ; 4 |] with a pre-compiled 'sum'
function.

There is already an interpreter for that same language that is targets C++
combinators : operator overloading and class hierarchy build an "AST" (2 + 3
* x -> expr<int>) which instantiates a set of predefined operators (sum,
forall, etc). The experience was however not good IMHO : despite a (naive)
gc there are still permanent memory leaks and speed issues, part of the
semantics is lost which requires some guessing, there are weird limitations
like polymorphism that only works with 2 levels of nesting, finally the code
basis grows uncontrollably. Maybe C++ is to blame there. In any case that's
a second level of optimisation after 1.

3. Unless there is a 3rd part tool to emit the code, I won't get into this.
It would allow native interaction with the customer chosen platform (Java,
.NET, etc) but that's way too much trouble and risk as long as it is not a
standard technique that any engineer can handle.

        Diego Olivier

--0016e6dee8dc4205d704b030458f
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

<div><div>=A0 =A0 List,</div><div><br></div><div>Thanks for your suggestion=
s, here is a small summary of them with some comments.</div><div><br></div>=
<div>Options for interpreting a DSL</div><div>-----------------------------=
--------------</div>

<div><br></div><div>1 - Classical optimised interpreter</div><div>2 - Combi=
nator / partial application based interpreter</div><div>3 - Emitting byteco=
de for any VM</div><div>4 - Emitting Caml source and compile / run on the f=
ly</div>

<div>5 - Retargeting Caml compiler (CamlP4 syntax extension + custom toplev=
el)</div><div><br></div><div>Comments</div><div>----------------</div><div>=
<br></div><div>1. Is the most natural and first thing to try (working on op=
timising my naive interpreter right now)</div>

<div><br></div><div>5. Is the second most obvious idea : in the same way C =
is a portable assembler, core-ML is a portable functional language. If we c=
ould easily write a Lex/Yacc parser and generate Caml AST or some kind of q=
uote and connect it to Caml, we could interpret=A0many DST with little work=
. Moreover, generating Caml source is useful for debugging or profiling and=
 can be compiled offline.</div>

<div><br></div><div>One comment is that Lex/Yacc is a tool that any enginee=
r can manage while CamlP4 is much more complex which increases maintainabil=
ity risks.</div><div><br></div><div>4. Is the poor man&#39;s variant of 5.<=
/div>

<div><br></div><div>2.=A0Globally the idea is that evaluating Plus (Plus (V=
ar &quot;x&quot;, 3), 4) is slower than calling sum [| get env &quot;x&quot=
;; 3 ; 4 |] with a pre-compiled &#39;sum&#39; function.</div><div><br></div>
<div>There is already an interpreter for that same language that is targets=
 C++ combinators : operator overloading and class hierarchy build an &quot;=
AST&quot; (2 + 3 * x -&gt; expr&lt;int&gt;) which instantiates a set of pre=
defined operators (sum, forall, etc).=A0The experience was however not good=
 IMHO : despite a (naive) gc there are still permanent memory leaks and spe=
ed issues, part of the semantics is lost which requires some guessing, ther=
e are weird limitations like polymorphism that only works with 2 levels of =
nesting, finally the code basis grows uncontrollably. Maybe C++ is to blame=
 there. In any case that&#39;s a second level of optimisation after 1.</div>

<div><br></div><div>3. Unless there is a 3rd part tool to emit the code, I =
won&#39;t get into this. It would allow native interaction with the custome=
r chosen platform (Java, .NET, etc) but that&#39;s way too much trouble and=
 risk as long as it is not a standard technique that any engineer can handl=
e.</div>
<div><br></div></div><div>=A0 =A0 =A0 =A0 Diego Olivier</div>

--0016e6dee8dc4205d704b030458f--