From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id QAA28724 for caml-red; Fri, 7 Jul 2000 16:47:30 +0200 (MET DST) Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id EAA08697 for ; Fri, 7 Jul 2000 04:04:50 +0200 (MET DST) Received: from isil.localdomain (adsl-151-201-232-238.bellatlantic.net [151.201.232.238]) by nez-perce.inria.fr (8.10.0/8.10.0) with ESMTP id e6724m504967 for ; Fri, 7 Jul 2000 04:04:48 +0200 (MET DST) Received: by isil.localdomain (Postfix, from userid 1000) id 8B3F937446; Thu, 6 Jul 2000 22:04:53 -0400 (EDT) Newsgroups: comp.lang.functional Cc: caml-list@inria.fr Subject: Camlp4's (lack of) hygiene (was Re: Macros) References: <8js72h$11h$1@nnrp1.deja.com> <8juuep$420$1@news.planetinternet.be> <8jv92l$qpb$1@bird.wu-wien.ac.at> From: John Prevost Date: 06 Jul 2000 22:03:43 -0400 Message-ID: <87d7kqitmo.fsf_-_@isil.localdomain> User-Agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/20.7 In-Reply-To: Matthias Blume's message of "05 Jul 2000 21:24:22 +0900" MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable Posted-To: comp.lang.functional Sender: weis@pauillac.inria.fr The following message is a courtesy copy of an article that has been posted to comp.lang.functional as well. I'm also forwarding this message to the caml-list for reference. { Summary: No, it's not hygienic, which I wasn't aware of. Have to bug people to fix that. Detailed examples, and explanation of what ways camlp4 is semi-hygienic and very powerful. Summary of non-hygiene: It's hard not to capture variables in subexpressions, if you declare temporaries. This should be fixed, and it's probably pretty doable. Just make any variable reference in a quotation be somehow gensymmed or rename variables in subexpressions. } >>>>> "mb" =3D=3D Matthias Blume writes: >>>>> "mm" =3D=3D Markus Mottl writes: mm> I haven't needed camlp4 so far, but it is a pretty powerful mm> tool: calling it a "preprocessor" is actually an mm> underestimation of its capabilities. mb> It "processes", and it does this before the compiler sees the mb> program, hence "pre". That's a "pre-processor" to me. mb> Still, is its handling of syntax-trees hygienic with respect mb> to the Caml language? mm> You can transform abstract syntax trees of the language with mm> it and pretty print the result again. mb> M4, being Turing-complete (AFAIK), can do the same. (Of mb> course, I am not saying that the result would be beautiful in mb> any sense of the word. :) camlp4's pre-processing for ocaml works by using a parser with support for extensible grammars. Here's an example from the manual: --camlp4-manual--------------------------------------------------------- 3.5 Examples 3.5.1 Arithmetic calculator This is an example of a grammar of arithmetic expressions:=20 let gram =3D Grammar.create (Plexer.make ());; let test =3D Grammar.Entry.create gram "expression";; let expr =3D Grammar.Entry.create gram "expression";; EXTEND test: [ [ e =3D expr; EOI -> e ] ]; expr: [ "plus" LEFTA [ e1 =3D expr; "+"; e2 =3D expr -> e1 + e2 | e1 =3D expr; "-"; e2 =3D expr -> e1 - e2 ] | "mult" LEFTA [ e1 =3D expr; "*"; e2 =3D expr -> e1 * e2 | e1 =3D expr; "/"; e2 =3D expr -> e1 / e2 ] | [ e =3D INT -> int_of_string e | "("; e =3D expr; ")" -> e ] ]; END;; let calc str =3D try Grammar.Entry.parse test (Stream.of_string str) with Stdpp.Exc_located (loc, e) -> Printf.printf "Located at (%d, %d)\n" (fst loc) (snd loc); raise e ;; Now, an extension of the entry ``expr'' to add the modulo, could be: EXTEND expr: AFTER "mult" [ [ e1 =3D expr; "mod"; e2 =3D expr -> e1 mod e2 ] ]; END;; ------------------------------------------------------------------------ the above is somewhat lex/yacc like. It's possible to replace the lexer as well. This works with ocaml by producing results which are ASTs. There are camlp4 extensions used to write ASTs in a natural real-caml-syntax-like way, but the underlying structure is a set of strongly typed datastructures. For ocaml front-end use, a byte representation of the AST (not a pretty-printed source file) is then output for the compiler to digest. Because of the strong typing, it is at the very least not possible to write an extension that outputs grammatically incorrect code. This is at least a level of hygiene that isn't seen in most preprocessors. Again, extensions to the real parser, outputting a type-safe AST. Now, all of this is more than a little heavy. Here's an example adding repeat until , which does what you'd expect. --camlp4-Manual--------------------------------------------------------- 4.3.2 Repeat until =E0 la Pascal The ``repeat...until'' loop of Pascal is closed to the ``while'' loop except that it is executed at least once. We can implement it like this: open Pcaml;; EXTEND expr: LEVEL "let" [[ "repeat"; e1 =3D expr; "until"; e2 =3D expr -> <:expr< do $e1$; return while not $e2$ do $e1$; done >> ]]; END;; ------------------------------------------------------------------------ This says, essentially: "extend the current parser, add an entry at the level named 'let' (same level as let expressions). Match the keyword "repeat", follwed by an expression, the keyword "until", and another expression, and use the resulting AST. The <:expr< ... >> is a "quotation", which is an extension added in camlp4 to allow selectively escaping out into other languages. The $blah$ segments inside refer to variables in the ocaml code, rather than the metalanguage code. So let's compare the "when" hygienic macro example from R5RS to the equivalent camlp4 extension: You might wonder, at this point, about introducing new variables inside the output without capturing--which, I now recall, is the really key thing about hygienic macros. Here's the example which shows (the lack of) that property: --camlp4-manual--------------------------------------------------------- 4.3.1 Infix This is an example to add the infix operator ``o'', composition of two functions. For the meaning of the quotation expr used here, see appendix A. open Pcaml;; EXTEND expr: AFTER "apply" [[ f =3D expr; "o"; g =3D expr -> <:expr< fun x -> $f$ ($g$ x) >= > ]]; END;; ------------------------------------------------------------------------ You see here the $f$ and $g$ antiquoting out to get the values of f and g. We also see "x", which is not antiquoted. Will this capture a reference to x in f or g? Let's see: isil $ ocamlc -pp 'camlp4o ./infix.cmo' testi.ml /home/prevost/src/caml/= test File "testi.ml", line 4, characters 23-28: This expression has type 'a -> 'b but is here used with type 'a isil $ camlp4o ./infix.cmo pr_o.cmo testi.ml /home/prevost/src/caml/= test let a y =3D y + 1;; let x y =3D y + 2;; Printf.printf "%d\n" ((fun x -> a (x x)) 0);; Survey says: yes, it does interfere. Very disappointing--and I'll have to ask about it on the caml list. But, I will show a quick example of the same thing in hygienic macros and in camlp4: --R5RS------------------------------------------------------------------ (let-syntax ((when (syntax-rules () ((when test stmt1 stmt2 ...) (if test (begin stmt1 stmt2 ...)))))) ------------------------------------------------------------------------ Here's the camlp4 version: --camlp4-example-------------------------------------------------------- open Pcaml EXTEND expr: LEVEL "top" [[ "when"; e1 =3D expr; "do"; e2 =3D expr -> <:expr< if $e1$ then $e2$ else () >> ]]; END ------------------------------------------------------------------------ (It turns out the documentation is a little out of date, and there's no level named "let" any more.) this then takes the program: --camlp4-example-program------------------------------------------------ when true do print_string "test\n";; ------------------------------------------------------------------------ quite happily. Of course, in O'Caml you can have ifs with no elses, which makes it a but more pointless. Now--why all this power? Why not something simpler, like Scheme's hygienic macros? The reason is that you can do more powerful manipulations. As an example, I've seen camlp4 quotations for regular expressions: <:re> or things which hoist constant expressions up to the top level so they're only executed once. This kind of power is good. So: hygienic? No. But that can be fixed. Powerful? Yes. In actuality, I think you can actually avoid most collisions by using let and: open Pcaml;; EXTEND expr: AFTER "apply" [[ f =3D expr; "o"; g =3D expr -> <:expr< let f =3D $f$ and g = =3D $g$ in fun x -> f (g x) >> ]]; END;; except, of course, that this isn't the first obvious thing to do, and this is not something that will work in all cases (i.e. the "my-or" hygienic macro). Not only that, but there's no "gensym" function here (at least none I know of.) Oh--actually, this will in fact work in every case, if you make the following transformation: ... <:expr< let f () =3D $f$ and g () =3D $g$ in fun x -> f () (g () x) >> silly example, but it shows how you could use thunks to always avoid capturing. Again, not immediately obvious that you need to. So in any case--I'll bring this up with the developers. I wasn't aware the problem was there. John.