* Camlp4's (lack of) hygiene (was Re: Macros)
From: John Prevost @ 2000-07-07 2:03 UTC
Cc: caml-list

The following message is a courtesy copy of an article that has been
posted to comp.lang.functional as well.

I'm also forwarding this message to the caml-list for reference.

{ Summary: No, it's not hygienic, which I wasn't aware of.  I'll have to
  bug people to fix that.  Detailed examples follow, with an explanation
  of the ways in which camlp4 is semi-hygienic and very powerful.

  Summary of non-hygiene: It's hard not to capture variables in
  subexpressions if you declare temporaries.  This should be fixed, and
  it's probably pretty doable.  Just make any variable reference in a
  quotation be gensymmed somehow, or rename variables in subexpressions. }

>>>>> "mb" == Matthias Blume <see@my.sig> writes:
>>>>> "mm" == Markus Mottl <mottl@miss.wu-wien.ac.at> writes:

    mm> I haven't needed camlp4 so far, but it is a pretty powerful
    mm> tool: calling it a "preprocessor" is actually an
    mm> underestimation of its capabilities.

    mb> It "processes", and it does this before the compiler sees the
    mb> program, hence "pre".  That's a "pre-processor" to me.

    mb> Still, is its handling of syntax-trees hygienic with respect
    mb> to the Caml language?

    mm> You can transform abstract syntax trees of the language with
    mm> it and pretty print the result again.

    mb> M4, being Turing-complete (AFAIK), can do the same.  (Of
    mb> course, I am not saying that the result would be beautiful in
    mb> any sense of the word. :)

camlp4's pre-processing for ocaml works by using a parser with support
for extensible grammars.
Here's an example from the manual:

--camlp4-manual---------------------------------------------------------
3.5 Examples

3.5.1 Arithmetic calculator

This is an example of a grammar of arithmetic expressions:

   let gram = Grammar.create (Plexer.make ());;
   let test = Grammar.Entry.create gram "expression";;
   let expr = Grammar.Entry.create gram "expression";;

   EXTEND
     test: [ [ e = expr; EOI -> e ] ];
     expr:
       [ "plus" LEFTA
         [ e1 = expr; "+"; e2 = expr -> e1 + e2
         | e1 = expr; "-"; e2 = expr -> e1 - e2 ]
       | "mult" LEFTA
         [ e1 = expr; "*"; e2 = expr -> e1 * e2
         | e1 = expr; "/"; e2 = expr -> e1 / e2 ]
       | [ e = INT -> int_of_string e
         | "("; e = expr; ")" -> e ] ];
   END;;

   let calc str =
     try Grammar.Entry.parse test (Stream.of_string str) with
       Stdpp.Exc_located (loc, e) ->
         Printf.printf "Located at (%d, %d)\n" (fst loc) (snd loc);
         raise e
   ;;

Now, an extension of the entry ``expr'' to add the modulo, could be:

   EXTEND
     expr: AFTER "mult"
       [ [ e1 = expr; "mod"; e2 = expr -> e1 mod e2 ] ];
   END;;
------------------------------------------------------------------------

The above is somewhat lex/yacc like.  It's possible to replace the lexer
as well.

This works with ocaml by producing results which are ASTs.  There are
camlp4 extensions used to write ASTs in a natural real-caml-syntax-like
way, but the underlying structure is a set of strongly typed
datastructures.  For ocaml front-end use, a byte representation of the
AST (not a pretty-printed source file) is then output for the compiler
to digest.

Because of the strong typing, it is at the very least not possible to
write an extension that outputs grammatically incorrect code.  This is
at least a level of hygiene that isn't seen in most preprocessors.
Again, extensions to the real parser, outputting a type-safe AST.

Now, all of this is more than a little heavy.  Here's an example adding
repeat <expr> until <expr>, which does what you'd expect.
--camlp4-Manual---------------------------------------------------------
4.3.2 Repeat until à la Pascal

The ``repeat...until'' loop of Pascal is closed to the ``while'' loop
except that it is executed at least once.  We can implement it like
this:

   open Pcaml;;
   EXTEND
     expr: LEVEL "let"
       [[ "repeat"; e1 = expr; "until"; e2 = expr ->
            <:expr< do $e1$; return while not $e2$ do $e1$; done >> ]];
   END;;
------------------------------------------------------------------------

This says, essentially: "extend the current parser, add an entry at the
level named 'let' (same level as let expressions).  Match the keyword
"repeat", followed by an expression, the keyword "until", and another
expression, and use the resulting AST."

The <:expr< ... >> is a "quotation", which is an extension added in
camlp4 to allow selectively escaping out into other languages.  The
$blah$ segments inside refer to variables in the ocaml code, rather than
the metalanguage code.

Before comparing the "when" hygienic macro example from R5RS to the
equivalent camlp4 extension, you might wonder about introducing new
variables inside the output without capturing--which, I now recall, is
the really key thing about hygienic macros.  Here's the example which
shows (the lack of) that property:

--camlp4-manual---------------------------------------------------------
4.3.1 Infix

This is an example to add the infix operator ``o'', composition of two
functions.  For the meaning of the quotation expr used here, see
appendix A.

   open Pcaml;;
   EXTEND
     expr: AFTER "apply"
       [[ f = expr; "o"; g = expr -> <:expr< fun x -> $f$ ($g$ x) >> ]];
   END;;
------------------------------------------------------------------------

You see here the $f$ and $g$ antiquoting out to get the values of f and
g.  We also see "x", which is not antiquoted.  Will this capture a
reference to x in f or g?
Let's see:

isil $ ocamlc -pp 'camlp4o ./infix.cmo' testi.ml    /home/prevost/src/caml/test
File "testi.ml", line 4, characters 23-28:
This expression has type 'a -> 'b but is here used with type 'a
isil $ camlp4o ./infix.cmo pr_o.cmo testi.ml        /home/prevost/src/caml/test
let a y = y + 1;;
let x y = y + 2;;
Printf.printf "%d\n" ((fun x -> a (x x)) 0);;

Survey says: yes, it does interfere.  Very disappointing--and I'll have
to ask about it on the caml list.  But, I will show a quick example of
the same thing in hygienic macros and in camlp4:

--R5RS------------------------------------------------------------------
(let-syntax ((when (syntax-rules ()
                     ((when test stmt1 stmt2 ...)
                      (if test
                          (begin stmt1
                                 stmt2 ...))))))
------------------------------------------------------------------------

Here's the camlp4 version:

--camlp4-example--------------------------------------------------------
   open Pcaml
   EXTEND
     expr: LEVEL "top"
       [[ "when"; e1 = expr; "do"; e2 = expr ->
            <:expr< if $e1$ then $e2$ else () >> ]];
   END
------------------------------------------------------------------------

(It turns out the documentation is a little out of date, and there's no
level named "let" any more.)

This then takes the program:

--camlp4-example-program------------------------------------------------
   when true do print_string "test\n";;
------------------------------------------------------------------------

quite happily.  Of course, in O'Caml you can have ifs with no elses,
which makes it a bit more pointless.

Now--why all this power?  Why not something simpler, like Scheme's
hygienic macros?

The reason is that you can do more powerful manipulations.  As an
example, I've seen camlp4 quotations for regular expressions:
<:re<a*b*>> or things which hoist constant expressions up to the top
level so they're only executed once.  This kind of power is good.

So: hygienic?  No.  But that can be fixed.  Powerful?  Yes.
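The capture seen in the transcript can also be reproduced in plain OCaml, without camlp4, by writing the expansion out by hand. The following is only a sketch under that assumption: the names x, succ1, captured, renamed and fresh_arg are made up for illustration and appear neither in the thread nor in camlp4. Unlike the testi.ml case, this variant type-checks, so the capture silently changes the result instead of causing a type error.

```ocaml
(* A use-site variable that the right-hand operand refers to. *)
let x = 10

let succ1 n = n + 1

(* Hand-written stand-in for the naive expansion of
     succ1 o (fun y -> y + x)
   through <:expr< fun x -> $f$ ($g$ x) >>.  The binder x introduced by
   the extension captures the x that the operand meant to take from the
   enclosing scope. *)
let captured = fun x -> succ1 ((fun y -> y + x) x)

(* The same expansion with the introduced binder renamed to something
   that cannot occur in the operands, which is what a gensym would give. *)
let renamed = fun fresh_arg -> succ1 ((fun y -> y + x) fresh_arg)

let () =
  Printf.printf "captured: %d, renamed: %d\n" (captured 1) (renamed 1)
```

With the captured binder the operand adds the argument to itself; with the renamed binder it adds the outer x, which is what the author of the operand wrote.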
In actuality, I think you can avoid most collisions by using let and:

   open Pcaml;;
   EXTEND
     expr: AFTER "apply"
       [[ f = expr; "o"; g = expr ->
            <:expr< let f = $f$ and g = $g$ in fun x -> f (g x) >> ]];
   END;;

except, of course, that this isn't the first obvious thing to do, and
this is not something that will work in all cases (i.e. the "my-or"
hygienic macro).  Not only that, but there's no "gensym" function here
(at least none I know of.)

Oh--actually, this will in fact work in every case, if you make the
following transformation:

   ... <:expr< let f () = $f$ and g () = $g$ in
               fun x -> f () (g () x) >>

A silly example, but it shows how you could use thunks to always avoid
capturing.  Again, it's not immediately obvious that you need to.

So in any case--I'll bring this up with the developers.  I wasn't aware
the problem was there.

John.
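The thunk transformation can be tried by hand on the "my-or" case mentioned above. This is a sketch, assuming the usual shape of that macro (bind the first operand to a temporary, return it if true, otherwise evaluate the second operand); the names t, naive, thunked, e1, e2 and tmp are illustrative only.

```ocaml
(* Use-site code that happens to use the name t. *)
let t = true

(* Hand-written naive expansion of (my_or false t) as
     let t = $e1$ in if t then t else $e2$
   The temporary t captures the t in the second operand. *)
let naive = let t = false in if t then t else t

(* Thunked expansion.  Both operands are bound by one non-recursive
   let ... and, so neither can see the other binding or the temporary,
   and the second operand is still evaluated only when it is needed. *)
let thunked =
  let e1 () = false
  and e2 () = t in
  let tmp = e1 () in
  if tmp then tmp else e2 ()

let () = Printf.printf "naive: %b, thunked: %b\n" naive thunked
```

The naive form answers false because the second operand now reads the temporary; the thunked form answers true, as the use site intended.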
* Re: Camlp4's (lack of) hygiene (was Re: Macros)
From: John Prevost @ 2000-07-07 23:42 UTC
To: caml-list

Somebody on clf pointed out the other bigger part of hygiene, which is
allowing symbols which *are* bound in the "macro" source to be
statically bound to that value when used.  Unfortunately, I don't
think this is a change that's at all simple for camlp4, since it
requires very tight coupling with the compiler.

Kind of sad, since some of the easiest useful things you could do
(providing little syntaxes for various datastructures via quotations)
depend on referring to values from the environment.  As an example, the
quotations in q_MLast need Pcaml to be opened, or they won't work.

I think making changes to allow good gensymming might still be
desirable--but I'm sad that this bigger issue can't really be dealt
with without a major merge between camlp4 and ocaml itself.  A merge
which does not seem likely to happen.

John.
* Re: Camlp4's (lack of) hygiene (was Re: Macros)
From: Daniel de Rauglaudre @ 2000-07-10 9:37 UTC
To: John Prevost; +Cc: caml-list

Hi,

On Fri, Jul 07, 2000 at 07:42:02PM -0400, John Prevost wrote:

> Somebody on clf pointed out the other bigger part of hygiene, which is
> allowing symbols which *are* bound in the "macro" source to be
> statically bound to that value when used.  Unfortunately, I don't
> think this is a change that's at all simple for camlp4, since it
> requires very tight coupling with the compiler.

Well, if I find how to do that, I think it would not be a problem to
add things in Ocaml compiler to allow that, if it is not too
complicated.

> Kind of sad, since for some of the most easy useful things you could
> do (providing little syntaxes for various datastructures via
> quotations) depend on referring to values from the environment.  As
> an example, the quotations in q_MLast need Pcaml to be opened, or they
> won't work.

??? No. These quotations do not depend on Pcaml... only on MLast.

-- 
Daniel de RAUGLAUDRE
daniel.de_rauglaudre@inria.fr
http://cristal.inria.fr/~ddr/

The trouble with computers is that they do what you tell them,
not what you want (D. Cohen).
* Re: Camlp4's (lack of) hygiene (was Re: Macros)
From: John Prevost @ 2000-07-10 10:17 UTC
To: Daniel de Rauglaudre; +Cc: caml-list

>>>>> "dr" == Daniel de Rauglaudre <daniel.de_rauglaudre@inria.fr> writes:

    dr> Well, if I find how to do that, I think it would not be a
    dr> problem to add things in Ocaml compiler to allow that, if it
    dr> is not too complicated.

Well, it is pretty complicated.  That level of things essentially
requires that the compiler actually knows about the grammar stuff which
is in effect and can get at the module the grammar was defined in, in
order to use the right bindings.  If I were, say, writing my own system
which had a module named MLast, and wanted to write a quotation for my
AST, I would need to give my module a different name.

This hygiene issue is equivalent to the division between static and
dynamic binding.  Essentially, a quotation which refers to symbols by
name uses the values of those symbols at the time the quotation is
called, not the time the quotation is defined.

Basically, I think that having quotations (at least) as a hygienic
macro facility in O'Caml would be very nice.  Having syntax extensions
would be cool--but, it's much harder to do.  Making quotations hygienic
would involve .cmo, .cmi and .cmx files carrying information about
quotation definitions, which would be scoped like other symbols.
Something maybe like:

   <:q_MLast.expr< ... >>

   open q_MLast
   <:expr< ... >>

This would be a Major Change to the system.  So, as I said before, I
doubt it will happen.  Maybe in O'Caml 4.  :)

    dr> ??? No. These quotations do not depend on Pcaml... only on
    dr> MLast.

You're right--sorry.  :)  I should've actually looked at things.

Here's a relevant function, turning the quotation's parse tree into
MLast expressions.  Before the code, I'll quickly explain the places
where hygiene could be violated.
Notice that the quotation is actually written using itself (it's from
the meta directory. :), but it does use MLast in one place textually
right here--in the Node case.  Some and None are also referred to
directly.  (And I'm sure that the expanded version refers to MLast
quite a bit more.)

The hygiene problem is that if MLast (or None or Some) is bound in the
source text of the file using this quotation, then the value used will
be that of the source file--not the one in scope in the definition
below.  An (admittedly stupid) definition like:

   type 'a bigoption =
     | None
     | Some of 'a
     | Many of 'a list

in code that uses q_MLast would show this nicely.  Using module paths
only helps somewhat, since you don't know that a module is actually in
scope with that name.  And the pervasives module isn't actually
available by name--if you override it, it's overridden for good.  (So
there's nothing you can write below that can even avoid the little bit
I wrote above.)

value rec expr_of_ast =
  fun
  [ Node n al ->
      List.fold_left (fun e a -> <:expr< $e$ $expr_of_ast a$ >>)
        <:expr< MLast.$uid:n$ loc >> al
  | List al ->
      List.fold_right (fun a e -> <:expr< [$expr_of_ast a$ :: $e$] >>) al
        <:expr< [] >>
  | Tuple al -> <:expr< ($list:List.map expr_of_ast al$) >>
  | Option None -> <:expr< None >>
  | Option (Some a) -> <:expr< Some $expr_of_ast a$ >>
  | Str s -> <:expr< $str:s$ >>
  | Chr c -> <:expr< $chr:c$ >>
  | Bool True -> <:expr< True >>
  | Bool False -> <:expr< False >>
  | Cons a1 a2 -> <:expr< [$expr_of_ast a1$ :: $expr_of_ast a2$] >>
  | Record lal -> <:expr< {$list:List.map label_expr_of_ast lal$} >>
  | Loc -> <:expr< loc >>
  | Antiquot loc s ->
      let e =
        try Grammar.Entry.parse Pcaml.expr_eoi (Stream.of_string s) with
        [ Stdpp.Exc_located (bp, ep) exc ->
            raise (Stdpp.Exc_located (fst loc + bp, fst loc + ep) exc) ]
      in
      MLast.ExAnt loc e ]
and label_expr_of_ast (l, a) = (<:expr< MLast.$lid:l$ >>, expr_of_ast a)
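The effect of the bigoption definition on a quotation that emits the bare text Some or None can be imitated without camlp4 by simply writing the same text before and after the definition. This is a sketch in ordinary (not revised) OCaml syntax; is_builtin_some, before, after and describe are hypothetical names introduced for the illustration.

```ocaml
(* Defined while None and Some still mean the built-in option type. *)
let is_builtin_some (o : int option) =
  match o with Some _ -> true | None -> false

(* The text "Some 1" here builds a value of the built-in option type. *)
let before = Some 1

(* The (admittedly stupid) use-site definition from the message. *)
type 'a bigoption =
  | None
  | Some of 'a
  | Many of 'a list

(* The same text now builds a bigoption value: the constructor name is
   resolved where the text ends up, not where it was originally written.
   Passing this value to is_builtin_some would be a type error. *)
let after = Some 1

let describe (b : int bigoption) =
  match b with
  | None -> "None"
  | Some _ -> "Some"
  | Many _ -> "Many"

let () =
  Printf.printf "%b %s\n" (is_builtin_some before) (describe after)
```

This is the dynamic-binding behaviour described in the message: identical text, two different meanings depending on what is in scope at the point of use.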
* Re: Camlp4's (lack of) hygiene (was Re: Macros)
From: Judicael Courant @ 2000-07-10 11:42 UTC
To: John Prevost; +Cc: caml-list

On 7 Jul, John Prevost wrote:

> Somebody on clf pointed out the other bigger part of hygiene, which is
> allowing symbols which *are* bound in the "macro" source to be
> statically bound to that value when used.  Unfortunately, I don't
> think this is a change that's at all simple for camlp4, since it
> requires very tight coupling with the compiler.
>

Notice that even O'Caml itself (I mean without camlp4) already has this
problem:

        Objective Caml version 3.00+8 (2000-06-30)

# let x = [| 1 ; 2 |];;
val x : int array = [|1; 2|]
# module Array = struct end;;
module Array : sig end
# x.(1);; (* guess what happens... *)
Unbound value Array.get
# (* x.(1) just expands to Array.get x 1 *)

Judicaël.
-- 
Judicael.Courant@lri.fr, http://www.lri.fr/~jcourant/
(+33) (0)1 69 15 64 85
"Montre moi des morceaux de ton monde, et je te montrerai le mien"
Tim, matricule #929, condamné à mort.
http://rozenn.picard.free.fr/tim.html
* Re: Camlp4's (lack of) hygiene (was Re: Macros)
From: John Prevost @ 2000-07-10 13:16 UTC
To: Judicael Courant; +Cc: caml-list

>>>>> "jc" == Judicael Courant <Judicael.Courant@lri.fr> writes:

    jc> Notice that even O'Caml itself (I mean without camlp4) already
    jc> has this problem:

    jc> # let x = [| 1 ; 2 |];;
    jc> val x : int array = [|1; 2|]
    jc> # module Array = struct end;;
    jc> module Array : sig end
    jc> # x.(1);; (* guess what happens... *)
    jc> Unbound value Array.get
    jc> # (* x.(1) just expands to Array.get x 1 *)

Ugh... That's horrible!  On the one hand, I can see how it could be
useful from the point of view of using different implementations of
arrays, but on the other hand, it's a mess.  Especially considering
the description of array syntax in the manual:

------------------------------------------------------------------------
Arrays

The expression [| expr1 ; ... ; exprn |] evaluates to a n-element
array, whose elements are initialized with the values of expr1 to exprn
respectively.  The order in which these expressions are evaluated is
unspecified.

The expression expr1 .( expr2 ) returns the value of element number
expr2 in the array denoted by expr1.  The first element has number 0;
the last element has number n-1, where n is the size of the array.  The
exception Invalid_argument is raised if the access is out of bounds.

The expression expr1 .( expr2 ) <- expr3 modifies in-place the array
denoted by expr1, replacing element number expr2 by the value of expr3.
The exception Invalid_argument is raised if the access is out of
bounds.  The value of the whole expression is ().
------------------------------------------------------------------------

These are described in terms of operations on the base array type.
The fact that the operations are implemented as sugar shouldn't mean
that the behavior is different from what you would expect.  Normal
function calls have better behavior than this--all the more reason that
constructions which are part of the language definition should work in
a safe manner.

Also of note, of course, is that [| ... |] *does* work, no matter what
bindings are in scope.

If no change is made to make this safer, the language definition should
be changed to note that writing "expr1 .( expr2 )" is completely
identical to "Array.get expr1 expr2" for purposes of scoping.

Finally, I'd like to note that the same properties occur with strings:

# "foo".[1];;
- : char = 'o'
# module String = struct end;;
module String : sig end
# "foo".[1];;
  ---------
Unbound value String.get

While I've actually never been tempted to create an Array module by
that name (I might be tempted a little with the current discussion on
clf about fast persistent arrays), I have in fact created a String
module.  At the time, I was working on some wide character stuff.

I suppose that, on one hand, this points out why things are good:

module Wide = (struct
  type ochar = char
  type char = int
  type ostring = string
  type string = char array
  let o_to_char = Char.code
  let o_to_string = (* I'm too lazy to write this for you *)
  (* other stuff *)
  module String = struct
    let get = Array.get
    let set = Array.set
  end
end : sig
  type ochar = char
  type char
  type ostring = string
  type string
  val o_to_char : ochar -> char
  module String : sig
    val get : string -> int -> char
    val set : string -> int -> char -> unit
  end
end)

By opening this, you're instantly using wide characters instead of 8
bit characters.  (Except for the little "constants" problem.)

But, like with arrays, I think you might be justifiably confused if you
did this, and .[ ] stopped working "normally", and yet quotation marks
still worked the same.  Especially when you mostly want .[ ] for "byte
array" "strings" more than strings.
Allowing .() and .[] to somehow be bound would be a different matter.
Then normal scoping would apply, and you'd expect it.

Gah.  In any case, these rough edges are making me start to hate syntax
(not just Caml's) as a whole class of experience.  (Though I still like
Caml's more than SML's. :)

John.
* Re: Camlp4's (lack of) hygiene (was Re: Macros)
From: Markus Mottl @ 2000-07-17 10:08 UTC
To: John Prevost; +Cc: Judicael Courant, caml-list

On Mon, 10 Jul 2000, John Prevost wrote:

> While I've actually never been tempted to create an Array module by
> that name (I might be tempted a little with the current discussion on
> clf about fast persistent arrays), I have in fact created a String
> module.  At the time, I was working on some wide character stuff.

This "feature" comes in handy when you want to replace the
implementation of the Array module while still enjoying syntactic sugar
(e.g. you can use my resizable array module without having to change
lots of code).

Unfortunately, if you create arrays with "[| ... |]", you always end up
with ones of the builtin type.  So you have to apply a conversion
function to them to get the values you want.

Best regards,
Markus Mottl

-- 
Markus Mottl, mottl@miss.wu-wien.ac.at, http://miss.wu-wien.ac.at/~mottl
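The use described in this last message can be sketched as follows. This assumes that the compiler still resolves the .( ) sugar to whatever module named Array is in scope, as it did in the 3.00 toplevel session quoted earlier; the module below is a made-up sparse "array" and is not the resizable array module mentioned in the message.

```ocaml
(* Hypothetical replacement for Array: a sparse table with a default
   value, standing in for any alternative array implementation. *)
module Array = struct
  type 'a t = { default : 'a; cells : (int, 'a) Hashtbl.t }

  let make default = { default; cells = Hashtbl.create 16 }

  let get a i =
    match Hashtbl.find_opt a.cells i with
    | Some v -> v
    | None -> a.default

  let set a i v = Hashtbl.replace a.cells i v
end

let sparse = Array.make 0

(* a.(i) <- v and a.(i) are sugar for Array.set and Array.get, looked up
   by name, so they now go through the module above. *)
let () = sparse.(1000) <- 42

(* The literal syntax is not affected: this is still a built-in array,
   which is the inconvenience noted in the message. *)
let builtin = [| 1; 2 |]

let () =
  Printf.printf "%d %d %d\n"
    sparse.(1000) sparse.(7) (Stdlib.Array.length builtin)
```

Stdlib.Array is used for the built-in array because the unqualified name Array now refers to the replacement module.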