From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id VAA14622 for caml-red; Tue, 10 Oct 2000 21:20:18 +0200 (MET DST)
Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id UAA14099 for <caml-list@pauillac.inria.fr>; Tue, 10 Oct 2000 20:09:54 +0200 (MET DST)
Received: from opus.cs.cornell.edu (exchange.cs.cornell.edu [128.84.97.8])
	by nez-perce.inria.fr (8.10.0/8.10.0) with ESMTP id e9AI9rH09770
	for <caml-list@inria.fr>; Tue, 10 Oct 2000 20:09:53 +0200 (MET DST)
Received: by opus.cs.cornell.edu with Internet Mail Service (5.5.2650.21)
	id <S5610SP7>; Tue, 10 Oct 2000 14:09:53 -0400
Message-ID: <706871B20764CD449DB0E8E3D81C4D43BFCA25@opus.cs.cornell.edu>
From: Greg Morrisett <jgm@cs.cornell.edu>
To: caml-list@inria.fr
Subject: RE: de Bruijn indices 
Date: Tue, 10 Oct 2000 14:09:44 -0400
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.21)
Content-Type: text/plain;
	charset="iso-8859-1"
Sender: weis@pauillac.inria.fr

Well, I can say from my own experience that de Bruijn 
isn't always best.  Neither NuPRL nor the TAL type-checker
uses indices, prefering named variables instead.  For
both, preserving original names is really useful when
debugging.  

When implementing TAL, I assumed de Bruijn would be faster,
since what we do is a lot of (a) comparisons between
terms, (b) lots of beta-reductions.  To make comparisons
go fast, it's important to make the common case pointer
equality.  Like Shao, we found hash consing to be very
effective for this.  

At first blush, it seems that using de Bruijn would be better 
in this situation, as any two alpha-equivalent terms will
hash-cons together.  However, it was fairly easy to arrange
most of the alpha-equivalent terms to use the same names
(TAL is the output of a compiler after all), so we could
get good hash-consing to start with -- all we had to do 
was preserve it.  Furthermore, there's the potential to get
more hash-consing with a named scheme.  For instance, for
(\y.\z.y z) and (\z.\y.y z), you can at least share the
"y z".  Not so for de Bruijn.

The way we made substitution work well and fast was, like
Simon, to keep track of the free variables of a term.  
(We computed this information lazily and then memoized it
in the term.)  Shao did something similar in the Flint
IL.  This had two effects:  one, we could minimize re-naming
bound variables, and two, we could cut off substitution
earlier.  I'm not sure how these optimizations translate
into a de Bruijn setting (I'm sure there's some way, but
I haven't thought it through.)

I also tried out a form of explicit substitutions with
the named scheme (Shao did this in the Flint IL), but
found that the overhead was not worth it.  I think in
part this was because we were forced to do deep
sub-type comparisons on terms, and hence, had to push
the substitutions down to the leaves.  I think if you
buy into de Bruijn, then you more or less have to buy
into explicit substitutions too.

On the other hand, we never got around to implementing
a de Bruijn version of the type-checker.  I would like
to do this someday and also try the higher-order abstract 
syntax tricks.

The folks at CMU are working on an interface that allows
you to choose which representation you want.  Like the
Flint folks and us, they hope to do some detailed
comparisons.  I really think the community needs a bunch
of "systems" paper on this topic that do a careful examination 
of the tradeoffs for important applications like type-checkers,
compilers, and theorem provers.  Like realistic techniques
for garbage collection, too much is locked in the brains of 
implementors.

-Greg