From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id VAA14622 for caml-red; Tue, 10 Oct 2000 21:20:18 +0200 (MET DST) Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id UAA14099 for ; Tue, 10 Oct 2000 20:09:54 +0200 (MET DST) Received: from opus.cs.cornell.edu (exchange.cs.cornell.edu [128.84.97.8]) by nez-perce.inria.fr (8.10.0/8.10.0) with ESMTP id e9AI9rH09770 for ; Tue, 10 Oct 2000 20:09:53 +0200 (MET DST) Received: by opus.cs.cornell.edu with Internet Mail Service (5.5.2650.21) id ; Tue, 10 Oct 2000 14:09:53 -0400 Message-ID: <706871B20764CD449DB0E8E3D81C4D43BFCA25@opus.cs.cornell.edu> From: Greg Morrisett To: caml-list@inria.fr Subject: RE: de Bruijn indices Date: Tue, 10 Oct 2000 14:09:44 -0400 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2650.21) Content-Type: text/plain; charset="iso-8859-1" Sender: weis@pauillac.inria.fr Well, I can say from my own experience that de Bruijn isn't always best. Neither NuPRL nor the TAL type-checker uses indices, prefering named variables instead. For both, preserving original names is really useful when debugging. When implementing TAL, I assumed de Bruijn would be faster, since what we do is a lot of (a) comparisons between terms, (b) lots of beta-reductions. To make comparisons go fast, it's important to make the common case pointer equality. Like Shao, we found hash consing to be very effective for this. At first blush, it seems that using de Bruijn would be better in this situation, as any two alpha-equivalent terms will hash-cons together. However, it was fairly easy to arrange most of the alpha-equivalent terms to use the same names (TAL is the output of a compiler after all), so we could get good hash-consing to start with -- all we had to do was preserve it. Furthermore, there's the potential to get more hash-consing with a named scheme. For instance, for (\y.\z.y z) and (\z.\y.y z), you can at least share the "y z". Not so for de Bruijn. The way we made substitution work well and fast was, like Simon, to keep track of the free variables of a term. (We computed this information lazily and then memoized it in the term.) Shao did something similar in the Flint IL. This had two effects: one, we could minimize re-naming bound variables, and two, we could cut off substitution earlier. I'm not sure how these optimizations translate into a de Bruijn setting (I'm sure there's some way, but I haven't thought it through.) I also tried out a form of explicit substitutions with the named scheme (Shao did this in the Flint IL), but found that the overhead was not worth it. I think in part this was because we were forced to do deep sub-type comparisons on terms, and hence, had to push the substitutions down to the leaves. I think if you buy into de Bruijn, then you more or less have to buy into explicit substitutions too. On the other hand, we never got around to implementing a de Bruijn version of the type-checker. I would like to do this someday and also try the higher-order abstract syntax tricks. The folks at CMU are working on an interface that allows you to choose which representation you want. Like the Flint folks and us, they hope to do some detailed comparisons. I really think the community needs a bunch of "systems" paper on this topic that do a careful examination of the tradeoffs for important applications like type-checkers, compilers, and theorem provers. Like realistic techniques for garbage collection, too much is locked in the brains of implementors. -Greg