[Caml-list] How to write an efficient interpreter

Mailing list for all users of the OCaml language and system.
 help / color / mirror / Atom feed

* [Caml-list] How to write an efficient interpreter
@ 2011-10-24  9:10 Diego Olivier Fernandez Pons
  2011-10-24  9:58 ` Gabriel Scherer
  2011-10-24 11:28 ` Xavier Leroy
  0 siblings, 2 replies; 12+ messages in thread
From: Diego Olivier Fernandez Pons @ 2011-10-24  9:10 UTC (permalink / raw)
  To: caml-list

[-- Attachment #1: Type: text/plain, Size: 1810 bytes --]

    Caml-list,

I have to write an interpreter for a datatype rich purely applicative
language. I have already written a naive interpreter (like in programming
languages class) and was wondering what where the options for writing
something that would perform better while keeping it maintainable by a
single person < 5% dedicated and preferably only in core-ML (no C code or
fancy ML extensions).

The language could be described as a rich datastructure typed SQL with a
programming language syntax
- first class sets, arrays, dictionaries, lists and their corresponding
comprehensions
- tuples and records merged into a single concept (accessible per position
like in (x, y) = ... or per label like in for t in tupleSet if t.label == 3
then)
- only applicative functions (no lambda operator, no partial application)
- simple types are int, double and string
- only user declared types are tuples-records

It is mainly used for data transformation : take a list of countries,
extract from an database the international airports of those countries,
geolocalize them using city/location table, generate a distance table using
a great-circle distance, assign to each size of plane the legs they can do
based on their maximum fight range, etc.

The language has a JavaScript inline capability

    execute JavaScript {
        //write your javascript code here
    }

that's typically used to define functions, unroll comprehensions to make
them more efficient and to call external libraries (JavaScript has full
visibility on all the language objects and can read/write directly inside,
probably the existing interpreter was written in JavaScript), so I am
considering allowing those features in the core language and only supporting
a very slow JavaScript deprecated compatibility mode.

         Diego Olivier

[-- Attachment #2: Type: text/html, Size: 2082 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Caml-list] How to write an efficient interpreter
  2011-10-24  9:10 [Caml-list] How to write an efficient interpreter Diego Olivier Fernandez Pons
@ 2011-10-24  9:58 ` Gabriel Scherer
  2011-10-24 10:57   ` Gerd Stolpmann
  2011-10-24 11:28 ` Xavier Leroy
  1 sibling, 1 reply; 12+ messages in thread
From: Gabriel Scherer @ 2011-10-24  9:58 UTC (permalink / raw)
  To: Diego Olivier Fernandez Pons; +Cc: caml-list

Even staying at the "interpreter" stage, you can probably improve your
performance seriously by being careful about your implementation : use
efficient data structure (what is your implementation of variable
lookup?), avoid unnecessary allocation, use tail-recursive functions
where possible, etc. Try to find a time-consuming script example that
exercise the type of computations you'll usually perform, and profile
your interpreter with gprof (compiled natively), try to optimize the
parts that show high in the profile, rinse and repeat.

The natural next step is to use a bytecode instead of interpreting an
AST directly. You can probably find something relatively simple and
yet have a net efficiency gain over direct interpretation, because the
bytecode structure is more linear and can be interpreted more
efficiently; in a way, you compiled away the tree traversal. At this
level, you become quite sensible to micro-optimisation : bytecode
interpreters written in C have access to efficiency tricks (direct
threaded code and the like, or even pinning registers directly) that
are difficult or impossible to emulate in OCaml, even if you can get
something reasonable with mutually tail-recursive functions. As a
single point of measure, I spent a few hours hacking an OCaml bytecode
interpreter for a very small subset of Caml recently, and got within
5x-10x slower than the OCaml bytecode interpreter itself for favorable
cases. It's not very good, or not very bad, depending on what you
expect (but definitely slower than the ocaml native compiler, or even
the various JIT projects).

Of course, if possible, you could also target an existing bytecode or
intermediate representation, or even compile to an existing language
with a rich enough runtime. If you're familiar, for example, with
either JVM or LLVM, those are the popular choices these days. I don't
know how that would mix with inline Javascript, though (I suppose both
environments have a javascript interpreter). You could also compile to
OCaml code or Javascript directly, and reuse their implementation, GC,
etc. With the current Javascript engines, you can get very good
performances if you compile in a way that resemble the ugly Javascript
code they're trying to optimize. For example, js_of_ocaml compiles
OCaml bytecode into Javascript and the result performs better than
using OCaml bytecode interpreter (written in C and well designed for
efficiency) directly. That's already quite a lot more work, however,
than a good interpreter or a simple home-made bytecode.

On Mon, Oct 24, 2011 at 11:10 AM, Diego Olivier Fernandez Pons
<dofp.ocaml@gmail.com> wrote:
>     Caml-list,
> I have to write an interpreter for a datatype rich purely applicative
> language. I have already written a naive interpreter (like in programming
> languages class) and was wondering what where the options for writing
> something that would perform better while keeping it maintainable by a
> single person < 5% dedicated and preferably only in core-ML (no C code or
> fancy ML extensions).
> The language could be described as a rich datastructure typed SQL with a
> programming language syntax
> - first class sets, arrays, dictionaries, lists and their corresponding
> comprehensions
> - tuples and records merged into a single concept (accessible per position
> like in (x, y) = ... or per label like in for t in tupleSet if t.label == 3
> then)
> - only applicative functions (no lambda operator, no partial application)
> - simple types are int, double and string
> - only user declared types are tuples-records
> It is mainly used for data transformation : take a list of countries,
> extract from an database the international airports of those countries,
> geolocalize them using city/location table, generate a distance table using
> a great-circle distance, assign to each size of plane the legs they can do
> based on their maximum fight range, etc.
> The language has a JavaScript inline capability
>     execute JavaScript {
>         //write your javascript code here
>     }
> that's typically used to define functions, unroll comprehensions to make
> them more efficient and to call external libraries (JavaScript has full
> visibility on all the language objects and can read/write directly inside,
> probably the existing interpreter was written in JavaScript), so I am
> considering allowing those features in the core language and only supporting
> a very slow JavaScript deprecated compatibility mode.
>
>          Diego Olivier

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Caml-list] How to write an efficient interpreter
  2011-10-24  9:58 ` Gabriel Scherer
@ 2011-10-24 10:57   ` Gerd Stolpmann
  0 siblings, 0 replies; 12+ messages in thread
From: Gerd Stolpmann @ 2011-10-24 10:57 UTC (permalink / raw)
  To: Gabriel Scherer; +Cc: Diego Olivier Fernandez Pons, caml-list

Am Montag, den 24.10.2011, 11:58 +0200 schrieb Gabriel Scherer:
> Even staying at the "interpreter" stage, you can probably improve your
> performance seriously by being careful about your implementation : use
> efficient data structure (what is your implementation of variable
> lookup?), avoid unnecessary allocation, use tail-recursive functions
> where possible, etc. Try to find a time-consuming script example that
> exercise the type of computations you'll usually perform, and profile
> your interpreter with gprof (compiled natively), try to optimize the
> parts that show high in the profile, rinse and repeat.
> 
> The natural next step is to use a bytecode instead of interpreting an
> AST directly. You can probably find something relatively simple and
> yet have a net efficiency gain over direct interpretation, because the
> bytecode structure is more linear and can be interpreted more
> efficiently; in a way, you compiled away the tree traversal. 

Well, I wouldn't be too optimistic that this also works for an
interpreter written in Ocaml. Bytecode profits a lot from cache
locality, i.e. all bytecode instructions in a single cache line are
loaded at once from RAM. Cache lines are short, though, usually only 64
bytes. If the bytecode is an int array, this translates to 8 or 16 array
elements (64/32 bit platforms). If you stick to the int array, you
probably get the performance benefit, but you have to live with the
integer representation, which will make other parts of the interpreter
difficult (e.g. representation of values). If you use an instruction
array, where "instruction" is a variant type with and without arguments,
the performance benefit is probably completely lost, because you deviate
too much from the linear memory representation.

Also, anything in the direction of good bytecode is really complicated.
What about this alternative: It is much simpler to "JIT"-compile to
ocaml (i.e. dump your AST in a different syntax), compile the ocaml code
by invoking ocamlc or ocamlopt, and to load the compiled code
dynamically (I'm just guessing that you have a dynamic environment).
There are only a few drawbacks of this approach, namely that you need
ocamlc or ocamlopt at runtime (plus "helper files", i.e. the cmi's of
the standard library), and that you cannot delete code from RAM once it
is loaded. The latter is usually not too problematic if there is a
possibility to restart the system now and then.

Gerd

> At this
> level, you become quite sensible to micro-optimisation : bytecode
> interpreters written in C have access to efficiency tricks (direct
> threaded code and the like, or even pinning registers directly) that
> are difficult or impossible to emulate in OCaml, even if you can get
> something reasonable with mutually tail-recursive functions. As a
> single point of measure, I spent a few hours hacking an OCaml bytecode
> interpreter for a very small subset of Caml recently, and got within
> 5x-10x slower than the OCaml bytecode interpreter itself for favorable
> cases. It's not very good, or not very bad, depending on what you
> expect (but definitely slower than the ocaml native compiler, or even
> the various JIT projects).
> 
> Of course, if possible, you could also target an existing bytecode or
> intermediate representation, or even compile to an existing language
> with a rich enough runtime. If you're familiar, for example, with
> either JVM or LLVM, those are the popular choices these days. I don't
> know how that would mix with inline Javascript, though (I suppose both
> environments have a javascript interpreter). You could also compile to
> OCaml code or Javascript directly, and reuse their implementation, GC,
> etc. With the current Javascript engines, you can get very good
> performances if you compile in a way that resemble the ugly Javascript
> code they're trying to optimize. For example, js_of_ocaml compiles
> OCaml bytecode into Javascript and the result performs better than
> using OCaml bytecode interpreter (written in C and well designed for
> efficiency) directly. That's already quite a lot more work, however,
> than a good interpreter or a simple home-made bytecode.
> 
> On Mon, Oct 24, 2011 at 11:10 AM, Diego Olivier Fernandez Pons
> <dofp.ocaml@gmail.com> wrote:
> >     Caml-list,
> > I have to write an interpreter for a datatype rich purely applicative
> > language. I have already written a naive interpreter (like in programming
> > languages class) and was wondering what where the options for writing
> > something that would perform better while keeping it maintainable by a
> > single person < 5% dedicated and preferably only in core-ML (no C code or
> > fancy ML extensions).
> > The language could be described as a rich datastructure typed SQL with a
> > programming language syntax
> > - first class sets, arrays, dictionaries, lists and their corresponding
> > comprehensions
> > - tuples and records merged into a single concept (accessible per position
> > like in (x, y) = ... or per label like in for t in tupleSet if t.label == 3
> > then)
> > - only applicative functions (no lambda operator, no partial application)
> > - simple types are int, double and string
> > - only user declared types are tuples-records
> > It is mainly used for data transformation : take a list of countries,
> > extract from an database the international airports of those countries,
> > geolocalize them using city/location table, generate a distance table using
> > a great-circle distance, assign to each size of plane the legs they can do
> > based on their maximum fight range, etc.
> > The language has a JavaScript inline capability
> >     execute JavaScript {
> >         //write your javascript code here
> >     }
> > that's typically used to define functions, unroll comprehensions to make
> > them more efficient and to call external libraries (JavaScript has full
> > visibility on all the language objects and can read/write directly inside,
> > probably the existing interpreter was written in JavaScript), so I am
> > considering allowing those features in the core language and only supporting
> > a very slow JavaScript deprecated compatibility mode.
> >
> >          Diego Olivier
> 
> 

-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
*** Searching for new projects! Need consulting for system
*** programming in Ocaml? Gerd Stolpmann can help you.
------------------------------------------------------------


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Caml-list] How to write an efficient interpreter
  2011-10-24  9:10 [Caml-list] How to write an efficient interpreter Diego Olivier Fernandez Pons
  2011-10-24  9:58 ` Gabriel Scherer
@ 2011-10-24 11:28 ` Xavier Leroy
  2011-10-24 11:50   ` Diego Olivier Fernandez Pons
  1 sibling, 1 reply; 12+ messages in thread
From: Xavier Leroy @ 2011-10-24 11:28 UTC (permalink / raw)
  To: caml-list

On 10/24/2011 11:10 AM, Diego Olivier Fernandez Pons wrote:

> I have to write an interpreter for a datatype rich purely applicative
> language. I have already written a naive interpreter (like in programming
> languages class) and was wondering what where the options for writing
> something that would perform better while keeping it maintainable by a single
> person < 5% dedicated and preferably only in core-ML (no C code or fancy ML
> extensions).

Gabriel and Gerd gave very good advice already, but here are my two
cents:

- Beware of premature optimizations.  It could very well be that your
  naive interpreter is already fast enough for your applications.
  Test and profile to find hot spots.

- Pay special attention to your data structures, e.g. environments should
  be maps, not A-lists.  A good trick is to hash-cons identifiers
  into unique integers: not only comparisons are much faster,
  but you can use Patricia trees for your environments.

- Compiling to bytecode is probably overkill.

Hope this helps.

- Xavier Leroy

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Caml-list] How to write an efficient interpreter
  2011-10-24 11:28 ` Xavier Leroy
@ 2011-10-24 11:50   ` Diego Olivier Fernandez Pons
  2011-10-24 12:33     ` Jérémie Dimino
  2011-10-24 12:40     ` Gerd Stolpmann
  0 siblings, 2 replies; 12+ messages in thread
From: Diego Olivier Fernandez Pons @ 2011-10-24 11:50 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 752 bytes --]

     Caml-list,

Xavier Leroy wrote
> Compiling to bytecode is probably overkill.

I think that writing my own bytecode interpreter is looking for trouble.
Same for compiling to an existing bytecode.

The language being a kind of SQL, most of the work is to properly execute
the comprehensions (= queries).

For instance

     range numbers = 0 .. 100;
     {int}  sqrtLessThan [k in numbers] = { x | x in numbers : x * x <= k };

There are smarter ways to implement this than a double loop

I was rather thinking of translating on-the-fly into Caml code and letting
Caml do the job. Is that technically possible (rewriting a toplevel ? a
CamlP4 grammar ?). If so guess I would have to license the Caml compiler
from the INRIA.

        Diego Olivier

[-- Attachment #2: Type: text/html, Size: 995 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Caml-list] How to write an efficient interpreter
  2011-10-24 11:50   ` Diego Olivier Fernandez Pons
@ 2011-10-24 12:33     ` Jérémie Dimino
  2011-10-24 12:40     ` Gerd Stolpmann
  1 sibling, 0 replies; 12+ messages in thread
From: Jérémie Dimino @ 2011-10-24 12:33 UTC (permalink / raw)
  To: Diego Olivier Fernandez Pons; +Cc: caml-list

Hi,

On Mon, Oct 24, 2011 at 01:50:25PM +0200, Diego Olivier Fernandez Pons wrote:
> I was rather thinking of translating on-the-fly into Caml code and letting
> Caml do the job. Is that technically possible (rewriting a toplevel ? a
> CamlP4 grammar ?). If so guess I would have to license the Caml compiler
> from the INRIA.

You can link with the toplevellib and use the Toploop module which
provides functions for executing an OCaml AST. To create the AST you can
for example write a parser for your language with Camlp4 and translate
it using the Camlp4_import module. Note that you still need to have
standard library cmi's at runtime. Also you can only compile your
interpreter in bytecode.

Cheers,

-- 
Jérémie

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Caml-list] How to write an efficient interpreter
  2011-10-24 11:50   ` Diego Olivier Fernandez Pons
  2011-10-24 12:33     ` Jérémie Dimino
@ 2011-10-24 12:40     ` Gerd Stolpmann
  2011-10-24 12:46       ` oliver
  1 sibling, 1 reply; 12+ messages in thread
From: Gerd Stolpmann @ 2011-10-24 12:40 UTC (permalink / raw)
  To: Diego Olivier Fernandez Pons; +Cc: Xavier Leroy, caml-list

Am Montag, den 24.10.2011, 13:50 +0200 schrieb Diego Olivier Fernandez
Pons:
>      Caml-list,
> 
> 
> Xavier Leroy wrote
> > Compiling to bytecode is probably overkill.
> 
> 
> 
> I think that writing my own bytecode interpreter is looking for
> trouble. Same for compiling to an existing bytecode.
> 
> 
> The language being a kind of SQL, most of the work is to properly
> execute the comprehensions (= queries).
> 
> 
> For instance
> 
> 
>      range numbers = 0 .. 100;
>      {int}  sqrtLessThan [k in numbers] = { x | x in numbers : x * x
> <= k };
> 
> 
> There are smarter ways to implement this than a double loop
> 
> 
> I was rather thinking of translating on-the-fly into Caml code and
> letting Caml do the job. Is that technically possible (rewriting a
> toplevel ? a CamlP4 grammar ?). If so guess I would have to license
> the Caml compiler from the INRIA.

I don't think you need that, because you can load compiled OCaml code
dynamically (look into the Dynlink library). This works with both
bytecode and native code (in recent OCaml versions). This way you just
invoke the compiler via the normal ocamlc or ocamlopt executable, and
you don't need to change the compiler (for which you would need the
license).

If you prefer to include the OCaml compiler directly into your
"interpreter", you probably get only a very small speedup when "loading"
code (the overhead of spawning a new process is small). This is IMO only
worth it if you load code very frequently.

Gerd

> 
> 
> 
> 
>         Diego Olivier

-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
*** Searching for new projects! Need consulting for system
*** programming in Ocaml? Gerd Stolpmann can help you.
------------------------------------------------------------


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Caml-list] How to write an efficient interpreter
  2011-10-24 12:40     ` Gerd Stolpmann
@ 2011-10-24 12:46       ` oliver
  2011-10-24 12:58         ` Gerd Stolpmann
  0 siblings, 1 reply; 12+ messages in thread
From: oliver @ 2011-10-24 12:46 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: Diego Olivier Fernandez Pons, Xavier Leroy, caml-list

On Mon, Oct 24, 2011 at 02:40:13PM +0200, Gerd Stolpmann wrote:
> Am Montag, den 24.10.2011, 13:50 +0200 schrieb Diego Olivier Fernandez
> Pons:
> >      Caml-list,
> > 
> > 
> > Xavier Leroy wrote
> > > Compiling to bytecode is probably overkill.
> > 
> > 
> > 
> > I think that writing my own bytecode interpreter is looking for
> > trouble. Same for compiling to an existing bytecode.
> > 
> > 
> > The language being a kind of SQL, most of the work is to properly
> > execute the comprehensions (= queries).
> > 
> > 
> > For instance
> > 
> > 
> >      range numbers = 0 .. 100;
> >      {int}  sqrtLessThan [k in numbers] = { x | x in numbers : x * x
> > <= k };
> > 
> > 
> > There are smarter ways to implement this than a double loop
> > 
> > 
> > I was rather thinking of translating on-the-fly into Caml code and
> > letting Caml do the job. Is that technically possible (rewriting a
> > toplevel ? a CamlP4 grammar ?). If so guess I would have to license
> > the Caml compiler from the INRIA.
> 
> I don't think you need that, because you can load compiled OCaml code
> dynamically (look into the Dynlink library).
[...]

Maybe you both talk about different things...

What you seem to talk about is not interpreter but compiler stuff,
and later bind it together?!

I would assume, with "translating on-the-fly into Caml code"
is something meant that could be done via partial application.
Parsing the language that you implement, and create partial
applicated functions of OCaml-code that do the work.

This can be done straight forward and nearly "ad hoc".
It's an easy way to go. Yo can create such partial applicated functions
while prsing or from an AST that you construct in parsing stage.

Ciao,
   Oliver

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Caml-list] How to write an efficient interpreter
  2011-10-24 12:46       ` oliver
@ 2011-10-24 12:58         ` Gerd Stolpmann
  2011-10-24 21:01           ` oliver
  0 siblings, 1 reply; 12+ messages in thread
From: Gerd Stolpmann @ 2011-10-24 12:58 UTC (permalink / raw)
  To: oliver; +Cc: Diego Olivier Fernandez Pons, Xavier Leroy, caml-list

Am Montag, den 24.10.2011, 14:46 +0200 schrieb oliver:
> On Mon, Oct 24, 2011 at 02:40:13PM +0200, Gerd Stolpmann wrote:
> > Am Montag, den 24.10.2011, 13:50 +0200 schrieb Diego Olivier Fernandez
> > Pons:
> > >      Caml-list,
> > > 
> > > 
> > > Xavier Leroy wrote
> > > > Compiling to bytecode is probably overkill.
> > > 
> > > 
> > > 
> > > I think that writing my own bytecode interpreter is looking for
> > > trouble. Same for compiling to an existing bytecode.
> > > 
> > > 
> > > The language being a kind of SQL, most of the work is to properly
> > > execute the comprehensions (= queries).
> > > 
> > > 
> > > For instance
> > > 
> > > 
> > >      range numbers = 0 .. 100;
> > >      {int}  sqrtLessThan [k in numbers] = { x | x in numbers : x * x
> > > <= k };
> > > 
> > > 
> > > There are smarter ways to implement this than a double loop
> > > 
> > > 
> > > I was rather thinking of translating on-the-fly into Caml code and
> > > letting Caml do the job. Is that technically possible (rewriting a
> > > toplevel ? a CamlP4 grammar ?). If so guess I would have to license
> > > the Caml compiler from the INRIA.
> > 
> > I don't think you need that, because you can load compiled OCaml code
> > dynamically (look into the Dynlink library).
> [...]
> 
> Maybe you both talk about different things...
> 
> What you seem to talk about is not interpreter but compiler stuff,
> and later bind it together?!

Exactly. But what's the big difference? You want to run code of a
domain-specific language.

> I would assume, with "translating on-the-fly into Caml code"
> is something meant that could be done via partial application.

The interpretative overhead does not go away with this technique. E.g.
looking up variables in your interpretation environment. I'd consider
this as a micro-optimization that doesn't bring much improvement.

Gerd

> Parsing the language that you implement, and create partial
> applicated functions of OCaml-code that do the work.
> 
> This can be done straight forward and nearly "ad hoc".
> It's an easy way to go. Yo can create such partial applicated functions
> while prsing or from an AST that you construct in parsing stage.
> 
> Ciao,
>    Oliver
> 

-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
*** Searching for new projects! Need consulting for system
*** programming in Ocaml? Gerd Stolpmann can help you.
------------------------------------------------------------


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Caml-list] How to write an efficient interpreter
  2011-10-24 12:58         ` Gerd Stolpmann
@ 2011-10-24 21:01           ` oliver
  2011-10-26  9:27             ` Diego Olivier Fernandez Pons
  0 siblings, 1 reply; 12+ messages in thread
From: oliver @ 2011-10-24 21:01 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: Diego Olivier Fernandez Pons, Xavier Leroy, caml-list

On Mon, Oct 24, 2011 at 02:58:30PM +0200, Gerd Stolpmann wrote:
> Am Montag, den 24.10.2011, 14:46 +0200 schrieb oliver:
> > On Mon, Oct 24, 2011 at 02:40:13PM +0200, Gerd Stolpmann wrote:
> > > Am Montag, den 24.10.2011, 13:50 +0200 schrieb Diego Olivier Fernandez
> > > Pons:
> > > >      Caml-list,
> > > > 
> > > > 
> > > > Xavier Leroy wrote
> > > > > Compiling to bytecode is probably overkill.
> > > > 
> > > > 
> > > > 
> > > > I think that writing my own bytecode interpreter is looking for
> > > > trouble. Same for compiling to an existing bytecode.
> > > > 
> > > > 
> > > > The language being a kind of SQL, most of the work is to properly
> > > > execute the comprehensions (= queries).
> > > > 
> > > > 
> > > > For instance
> > > > 
> > > > 
> > > >      range numbers = 0 .. 100;
> > > >      {int}  sqrtLessThan [k in numbers] = { x | x in numbers : x * x
> > > > <= k };
> > > > 
> > > > 
> > > > There are smarter ways to implement this than a double loop
> > > > 
> > > > 
> > > > I was rather thinking of translating on-the-fly into Caml code and
> > > > letting Caml do the job. Is that technically possible (rewriting a
> > > > toplevel ? a CamlP4 grammar ?). If so guess I would have to license
> > > > the Caml compiler from the INRIA.
> > > 
> > > I don't think you need that, because you can load compiled OCaml code
> > > dynamically (look into the Dynlink library).
> > [...]
> > 
> > Maybe you both talk about different things...
> > 
> > What you seem to talk about is not interpreter but compiler stuff,
> > and later bind it together?!
> 
> Exactly. But what's the big difference? You want to run code of a
> domain-specific language.
[...]

The binding process would not be necessary.
(Or do we talk about different things here?)


> 
> > I would assume, with "translating on-the-fly into Caml code"
> > is something meant that could be done via partial application.
> 
> The interpretative overhead does not go away with this technique. E.g.
> looking up variables in your interpretation environment. I'd consider
> this as a micro-optimization that doesn't bring much improvement.
[...]

But it's an easy-going implementation.
You can just use OCaml directly.
The for-loop of the DSL for example is then directly
translated to a for loop in Ocaml.

For optimisation it maybe is better to work on an AST
and optimize it, before translating it to something that can be called.

I had rather the easy-going in mind, than the performance problem,
referring to the "on-the-fly" conversion.

Ciao,
   Oliver

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Caml-list] How to write an efficient interpreter
  2011-10-24 21:01           ` oliver
@ 2011-10-26  9:27             ` Diego Olivier Fernandez Pons
  2011-11-07  6:45               ` Jon Harrop
  0 siblings, 1 reply; 12+ messages in thread
From: Diego Olivier Fernandez Pons @ 2011-10-26  9:27 UTC (permalink / raw)
  To: caml-list; +Cc: Gerd Stolpmann, Xavier Leroy, oliver

[-- Attachment #1: Type: text/plain, Size: 2206 bytes --]

    List,

Thanks for your suggestions, here is a small summary of them with some
comments.

Options for interpreting a DSL
-------------------------------------------

1 - Classical optimised interpreter
2 - Combinator / partial application based interpreter
3 - Emitting bytecode for any VM
4 - Emitting Caml source and compile / run on the fly
5 - Retargeting Caml compiler (CamlP4 syntax extension + custom toplevel)

Comments
----------------

1. Is the most natural and first thing to try (working on optimising my
naive interpreter right now)

5. Is the second most obvious idea : in the same way C is a portable
assembler, core-ML is a portable functional language. If we could easily
write a Lex/Yacc parser and generate Caml AST or some kind of quote and
connect it to Caml, we could interpret many DST with little work. Moreover,
generating Caml source is useful for debugging or profiling and can be
compiled offline.

One comment is that Lex/Yacc is a tool that any engineer can manage while
CamlP4 is much more complex which increases maintainability risks.

4. Is the poor man's variant of 5.

2. Globally the idea is that evaluating Plus (Plus (Var "x", 3), 4) is
slower than calling sum [| get env "x"; 3 ; 4 |] with a pre-compiled 'sum'
function.

There is already an interpreter for that same language that is targets C++
combinators : operator overloading and class hierarchy build an "AST" (2 + 3
* x -> expr<int>) which instantiates a set of predefined operators (sum,
forall, etc). The experience was however not good IMHO : despite a (naive)
gc there are still permanent memory leaks and speed issues, part of the
semantics is lost which requires some guessing, there are weird limitations
like polymorphism that only works with 2 levels of nesting, finally the code
basis grows uncontrollably. Maybe C++ is to blame there. In any case that's
a second level of optimisation after 1.

3. Unless there is a 3rd part tool to emit the code, I won't get into this.
It would allow native interaction with the customer chosen platform (Java,
.NET, etc) but that's way too much trouble and risk as long as it is not a
standard technique that any engineer can handle.

        Diego Olivier

[-- Attachment #2: Type: text/html, Size: 2668 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [Caml-list] How to write an efficient interpreter
  2011-10-26  9:27             ` Diego Olivier Fernandez Pons
@ 2011-11-07  6:45               ` Jon Harrop
  0 siblings, 0 replies; 12+ messages in thread
From: Jon Harrop @ 2011-11-07  6:45 UTC (permalink / raw)
  To: 'Diego Olivier Fernandez Pons', caml-list

If the language you are interpreting is quite declarative then piggy-backing
on OCaml's run-time either by writing an interpreter or by compiling to
OCaml code will be a big advantage. Writing a VM with a run-time as
efficient as OCaml's in this context is a *lot* of work compared to writing
an interpreter.

Unless there is a compelling reason not to, I'd stick with an interpreter
and just optimize it carefully. In particular, pay careful attention to your
data representations. When writing a term-rewriter, one of the biggest
performance gains I got was putting the value of each variable in the
variable itself so it could be looked up by dereferencing a pointer rather
than going via a hash table. Symbol tables are useful for similar reasons:
map identifiers onto small integer IDs and replace hash tables that have
identifiers as keys with an array indexed by ID. Beware the write barrier
though.

Also, when profiling I'd recommend logging the "kinds" of inputs your
interpreter sees. For example, try to spot patterns in the shapes of
expressions that your interpreter deals with and add special cases to the
expression type for those shapes and custom code to interpret them.

Cheers,
Jon.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2011-11-07  6:46 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-10-24  9:10 [Caml-list] How to write an efficient interpreter Diego Olivier Fernandez Pons
2011-10-24  9:58 ` Gabriel Scherer
2011-10-24 10:57   ` Gerd Stolpmann
2011-10-24 11:28 ` Xavier Leroy
2011-10-24 11:50   ` Diego Olivier Fernandez Pons
2011-10-24 12:33     ` Jérémie Dimino
2011-10-24 12:40     ` Gerd Stolpmann
2011-10-24 12:46       ` oliver
2011-10-24 12:58         ` Gerd Stolpmann
2011-10-24 21:01           ` oliver
2011-10-26  9:27             ` Diego Olivier Fernandez Pons
2011-11-07  6:45               ` Jon Harrop

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox