Mailing list for all users of the OCaml language and system.
* Camlp4's (lack of) hygiene (was Re: Macros)
From: John Prevost @ 2000-07-07  2:03 UTC
  Cc: caml-list

The following message is a courtesy copy of an article
that has been posted to comp.lang.functional as well.

I'm also forwarding this message to the caml-list for reference.

{ Summary: No, it's not hygienic, which I wasn't aware of.  Have to
  bug people to fix that.  Detailed examples follow, along with an
  explanation of the ways in which camlp4 is semi-hygienic and very
  powerful.

  Summary of non-hygiene: If a quotation declares temporaries, it's
  hard for it not to capture variables in the subexpressions spliced
  into it.  This should be fixed, and it's probably pretty doable:
  just gensym every variable the quotation itself binds, or rename
  variables in the spliced subexpressions. }

>>>>> "mb" == Matthias Blume <see@my.sig> writes:
>>>>> "mm" == Markus Mottl <mottl@miss.wu-wien.ac.at> writes:

    mm> I haven't needed camlp4 so far, but it is a pretty powerful
    mm> tool: calling it a "preprocessor" is actually an
    mm> underestimation of its capabilities.

    mb> It "processes", and it does this before the compiler sees the
    mb> program, hence "pre".  That's a "pre-processor" to me.

    mb> Still, is its handling of syntax-trees hygienic with respect
    mb> to the Caml language?

    mm> You can transform abstract syntax trees of the language with
    mm> it and pretty print the result again.

    mb> M4, being Turing-complete (AFAIK), can do the same.  (Of
    mb> course, I am not saying that the result would be beautiful in
    mb> any sense of the word. :)

camlp4's pre-processing for ocaml works by using a parser with support
for extensible grammars.  Here's an example from the manual:

--camlp4-manual---------------------------------------------------------
3.5   Examples

3.5.1   Arithmetic calculator

This is an example of a grammar of arithmetic expressions: 

    let gram = Grammar.create (Plexer.make ());;
    let test = Grammar.Entry.create gram "expression";;
    let expr = Grammar.Entry.create gram "expression";;
    EXTEND
      test: [ [ e = expr; EOI -> e ] ];
      expr: [ "plus" LEFTA
              [ e1 = expr; "+"; e2 = expr -> e1 + e2
              | e1 = expr; "-"; e2 = expr -> e1 - e2 ]
            | "mult" LEFTA
              [ e1 = expr; "*"; e2 = expr -> e1 * e2
              | e1 = expr; "/"; e2 = expr -> e1 / e2 ]
            | [ e = INT -> int_of_string e
              | "("; e = expr; ")" -> e ] ];
    END;;
    let calc str =
      try Grammar.Entry.parse test (Stream.of_string str) with
        Stdpp.Exc_located (loc, e) ->
          Printf.printf "Located at (%d, %d)\n" (fst loc) (snd loc);
          raise e
    ;;

Now, an extension of the entry ``expr'' to add the modulo, could be:

    EXTEND expr:
      AFTER "mult" [ [ e1 = expr; "mod"; e2 = expr -> e1 mod e2 ] ];
    END;;
------------------------------------------------------------------------

The above is somewhat lex/yacc-like.  It's possible to replace the
lexer as well.
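For readers without a camlp4 of this era at hand (the Grammar/EXTEND API above belongs to it), here is a hand-written sketch in plain OCaml of what that grammar describes, including the mod extension. The names token, tokenize and calc are hypothetical, the small lexer stands in for Plexer, and giving mod its own level binding tighter than * and / is my reading of AFTER "mult". Each precedence level becomes one function, and the loop inside it gives the LEFTA associativity.

```ocaml
(* Tokens for the calculator grammar; a hand-written stand-in for Plexer *)
type token = INT of int | OP of string | LPAREN | RPAREN

let tokenize (s : string) : token list =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev acc
    else match s.[i] with
      | ' ' | '\t' | '\n' -> go (i + 1) acc
      | '0' .. '9' ->
          let j = ref i in
          while !j < n && s.[!j] >= '0' && s.[!j] <= '9' do incr j done;
          go !j (INT (int_of_string (String.sub s i (!j - i))) :: acc)
      | '+' | '-' | '*' | '/' as c -> go (i + 1) (OP (String.make 1 c) :: acc)
      | '(' -> go (i + 1) (LPAREN :: acc)
      | ')' -> go (i + 1) (RPAREN :: acc)
      | 'm' when i + 3 <= n && String.sub s i 3 = "mod" ->
          go (i + 3) (OP "mod" :: acc)
      | c -> failwith (Printf.sprintf "unexpected character %C" c)
  in
  go 0 []

(* One function per level, lowest precedence first: "plus", "mult", "mod",
   then atoms.  Looping on the left operand makes each level left-associative. *)
let calc (s : string) : int =
  let toks = ref (tokenize s) in
  let peek () = match !toks with t :: _ -> Some t | [] -> None in
  let advance () = toks := List.tl !toks in
  let rec plus () = loop_plus (mult ())
  and loop_plus acc =
    match peek () with
    | Some (OP "+") -> advance (); loop_plus (acc + mult ())
    | Some (OP "-") -> advance (); loop_plus (acc - mult ())
    | _ -> acc
  and mult () = loop_mult (modulo ())
  and loop_mult acc =
    match peek () with
    | Some (OP "*") -> advance (); loop_mult (acc * modulo ())
    | Some (OP "/") -> advance (); loop_mult (acc / modulo ())
    | _ -> acc
  and modulo () = loop_mod (atom ())
  and loop_mod acc =
    match peek () with
    | Some (OP "mod") -> advance (); loop_mod (acc mod atom ())
    | _ -> acc
  and atom () =
    match peek () with
    | Some (INT n) -> advance (); n
    | Some LPAREN ->
        advance ();
        let v = plus () in
        (match peek () with
         | Some RPAREN -> advance (); v
         | _ -> failwith "expected )")
    | _ -> failwith "expected integer or ("
  in
  let v = plus () in
  (* Leftover tokens play the role of a missing EOI *)
  if !toks <> [] then failwith "trailing input" else v
```

For example, calc "10 - 3 - 2" groups to the left and gives 5.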

This works with ocaml by producing results which are ASTs.  There are
camlp4 extensions for writing ASTs in a natural, real-Caml-syntax-like
way, but the underlying structure is a set of strongly typed data
structures.  For use as an ocaml front end, camlp4 then outputs a
binary (marshalled) representation of the AST, not a pretty-printed
source file, for the compiler to digest.

Because of the strong typing, it is at the very least not possible to
write an extension that outputs grammatically incorrect code.

This is a level of hygiene that isn't seen in most preprocessors:
again, these are extensions to the real parser, and they output a
type-safe AST.
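A toy illustration of that point, assuming a hypothetical five-constructor type in place of camlp4's much larger MLast.expr: an extension can only return values of the AST type, and a printer that covers every constructor always emits a complete phrase. Identifier strings are not checked in this sketch, so the guarantee shown is about structure, not names.

```ocaml
(* Hypothetical miniature of an AST type such as camlp4's MLast.expr *)
type expr =
  | Int of int
  | Var of string
  | Add of expr * expr
  | If of expr * expr * expr
  | App of expr * expr

(* Every constructor prints as a fully parenthesized phrase, so any [expr]
   value prints as well-formed syntax (given valid identifier strings). *)
let rec to_source = function
  | Int n -> string_of_int n
  | Var v -> v
  | Add (a, b) -> "(" ^ to_source a ^ " + " ^ to_source b ^ ")"
  | If (c, t, e) ->
      "(if " ^ to_source c ^ " then " ^ to_source t ^ " else " ^ to_source e ^ ")"
  | App (f, a) -> "(" ^ to_source f ^ " " ^ to_source a ^ ")"

(* A "macro" that builds an AST: it has no way to return half an if-expression *)
let twice_plus_one e = Add (Add (e, e), Int 1)

let () = print_endline (to_source (twice_plus_one (Var "y")))  (* ((y + y) + 1) *)
```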

Now, all of this is more than a little heavy.  Here's an example
adding repeat <expr> until <expr>, which does what you'd expect.

--camlp4-Manual---------------------------------------------------------
4.3.2   Repeat until à la Pascal

The ``repeat...until'' loop of Pascal is close to the ``while'' loop
except that it is executed at least once. We can implement it like
this:

       open Pcaml;;
       EXTEND
         expr: LEVEL "let"
           [[ "repeat"; e1 = expr; "until"; e2 = expr ->
                 <:expr< do $e1$; return while not $e2$ do $e1$; done >> ]];
       END;;
------------------------------------------------------------------------

This says, essentially: "extend the current parser and add a rule at
the level named 'let' (the same level as let expressions).  Match the
keyword "repeat", followed by an expression, the keyword "until", and
another expression, and produce the resulting AST."
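The behavior of that expansion can be sketched in plain OCaml as a higher-order function. repeat_until is a hypothetical name, not part of camlp4; what the syntax extension buys you is that callers don't have to wrap the body and the condition in fun () -> ... themselves.

```ocaml
(* Hypothetical function equivalent of the repeat...until expansion:
   run [body] once, then keep running it while [cond] is false. *)
let repeat_until (body : unit -> unit) (cond : unit -> bool) : unit =
  body ();
  while not (cond ()) do
    body ()
  done

let () =
  let i = ref 0 in
  repeat_until (fun () -> incr i) (fun () -> !i >= 3);
  Printf.printf "i = %d\n" !i  (* prints "i = 3" *)
```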

The <:expr< ... >> is a "quotation", which is an extension added in
camlp4 to allow selectively escaping out into other languages.  The
$blah$ segments inside are antiquotations: they escape back out to the
OCaml code of the extension itself, so $e1$ stands for the AST bound to
e1 by the grammar rule, not for a variable named e1 in the generated
program.  That's enough to compare the "when" hygienic macro example
from R5RS with the equivalent camlp4 extension, which I'll do below.

You might wonder, at this point, about introducing new variables
inside the output without capturing--which, I now recall, is the
really key thing about hygienic macros.

Here's the example which shows (the lack of) that property:

--camlp4-manual---------------------------------------------------------
4.3.1   Infix

This is an example to add the infix operator ``o'', composition of two
functions. For the meaning of the quotation expr used here, see
appendix A.

       open Pcaml;;
       EXTEND
         expr: AFTER "apply"
           [[ f = expr; "o"; g = expr -> <:expr< fun x -> $f$ ($g$ x) >> ]];
       END;;
------------------------------------------------------------------------

You see here the $f$ and $g$ antiquoting out to get the values of f
and g.  We also see "x", which is not antiquoted.  Will this capture a
reference to x in f or g?  Let's see:

isil $ ocamlc -pp 'camlp4o ./infix.cmo' testi.ml
File "testi.ml", line 4, characters 23-28:
This expression has type 'a -> 'b but is here used with type 'a

isil $ camlp4o ./infix.cmo pr_o.cmo testi.ml
let a y = y + 1;;
let x y = y + 2;;

Printf.printf "%d\n" ((fun x -> a (x x)) 0);;

Survey says: yes, it does interfere.  Very disappointing--and I'll
have to ask about it on the caml list.
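The capture can be reproduced without camlp4 on a hypothetical miniature AST (Var, App and Fun stand in for MLast.expr). compose_naive builds the same tree as the quotation above, and a free-variable check shows that the x coming from g is no longer free after the expansion.

```ocaml
(* Hypothetical miniature AST, standing in for camlp4's MLast.expr *)
type expr =
  | Var of string
  | App of expr * expr
  | Fun of string * expr

let rec to_string = function
  | Var v -> v
  | App (f, a) -> "(" ^ to_string f ^ " " ^ to_string a ^ ")"
  | Fun (x, body) -> "(fun " ^ x ^ " -> " ^ to_string body ^ ")"

(* Variables occurring free in an expression (duplicates allowed) *)
let rec free_vars = function
  | Var v -> [v]
  | App (f, a) -> free_vars f @ free_vars a
  | Fun (x, body) -> List.filter (fun v -> v <> x) (free_vars body)

(* Naive expansion of [f o g], like <:expr< fun x -> $f$ ($g$ x) >>:
   the binder "x" is fixed text, so a free "x" in f or g gets captured. *)
let compose_naive f g = Fun ("x", App (f, App (g, Var "x")))

let () =
  let e = compose_naive (Var "a") (Var "x") in
  print_endline (to_string e);                                     (* (fun x -> (a (x x))) *)
  Printf.printf "x still free? %b\n" (List.mem "x" (free_vars e))  (* false *)
```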


Still, here's a quick example of the same thing as a hygienic macro
and in camlp4:

--R5RS------------------------------------------------------------------
(let-syntax ((when (syntax-rules ()
                     ((when test stmt1 stmt2 ...)
                      (if test
                          (begin stmt1
                                 stmt2 ...))))))
  (let ((if #t))
    (when if (set! if 'now))
    if))                           ===>  now
------------------------------------------------------------------------

Here's the camlp4 version:

--camlp4-example--------------------------------------------------------
open Pcaml

EXTEND
  expr: LEVEL "top"
    [[ "when"; e1 = expr; "do"; e2 = expr ->
        <:expr< if $e1$ then $e2$ else () >> ]];
END
------------------------------------------------------------------------

(It turns out the documentation is a little out of date, and there's
no level named "let" any more.)

This then takes the program:

--camlp4-example-program------------------------------------------------
when true do print_string "test\n";;
------------------------------------------------------------------------

quite happily.  Of course, in O'Caml you can have ifs with no elses,
which makes it a bit more pointless.

Now--why all this power?  Why not something simpler, like Scheme's
hygienic macros?  The reason is that you can do more powerful
manipulations.  As an example, I've seen camlp4 quotations for regular
expressions:

<:re<a*b*>>

or things which hoist constant expressions up to the top level so
they're only executed once.  This kind of power is good.
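A hand-written before/after sketch of what such a constant-hoisting extension achieves. squares, lookup_inline and lookup_hoisted are hypothetical names, and the counter only records how often the constant expression runs; the extension's job would be to perform the second rewrite automatically.

```ocaml
(* Counts how many times the "constant" expression is evaluated *)
let evaluations = ref 0

(* Hypothetical constant expression: builds the same table every time *)
let squares () = incr evaluations; Array.init 1000 (fun i -> i * i)

(* As written: the constant is rebuilt on every call *)
let lookup_inline i = (squares ()).(i)

(* After hoisting it to the top level: built once, shared by all calls *)
let hoisted_squares = squares ()
let lookup_hoisted i = hoisted_squares.(i)
```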


So: hygienic?  No.  But that can be fixed.  Powerful?  Yes.  In fact,
I think you can avoid most collisions by binding the spliced
expressions with let ... and ... first:

       open Pcaml;;
       EXTEND
         expr: AFTER "apply"
           [[ f = expr; "o"; g = expr -> <:expr< let f = $f$ and g = $g$ in
                                                    fun x -> f (g x) >> ]];
       END;;

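except, of course, that this isn't the first obvious thing to do, and
it won't work in all cases, because $f$ and $g$ are now evaluated
eagerly, once, before the function is built (e.g. the "my-or" hygienic
macro, which must not evaluate its second argument unless the first is
false).  Not only that, but there's no "gensym" function here (at
least none that I know of).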
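In the absence of a gensym, here is a sketch of the usual substitute on the same kind of hypothetical miniature AST: pick a binder name that does not occur free in the spliced subexpressions. Since f and g are the only user code under the new binder, avoiding their free variables is enough to prevent capture in this case. fresh and compose_hygienic are illustrative names, not camlp4 functions.

```ocaml
(* Hypothetical miniature AST, standing in for camlp4's MLast.expr *)
type expr =
  | Var of string
  | App of expr * expr
  | Fun of string * expr

let rec to_string = function
  | Var v -> v
  | App (f, a) -> "(" ^ to_string f ^ " " ^ to_string a ^ ")"
  | Fun (x, body) -> "(fun " ^ x ^ " -> " ^ to_string body ^ ")"

let rec free_vars = function
  | Var v -> [v]
  | App (f, a) -> free_vars f @ free_vars a
  | Fun (x, body) -> List.filter (fun v -> v <> x) (free_vars body)

(* A gensym substitute: the first of base0, base1, ... not listed in [avoid] *)
let fresh avoid base =
  let rec go n =
    let candidate = base ^ string_of_int n in
    if List.mem candidate avoid then go (n + 1) else candidate
  in
  go 0

(* Hygienic expansion of [f o g]: the binder cannot capture names in f or g *)
let compose_hygienic f g =
  let x = fresh (free_vars f @ free_vars g) "x" in
  Fun (x, App (f, App (g, Var x)))

let () =
  print_endline (to_string (compose_hygienic (Var "a") (Var "x")))
  (* prints (fun x0 -> (a (x x0))) *)
```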

Oh--actually, this will in fact work in every case, if you make the
following transformation:

... <:expr< let f () = $f$ and g () = $g$ in fun x -> f () (g () x) >>

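A silly example, but it shows how you could use thunks to always
avoid capturing: each spliced expression is wrapped in a function, so
it is still evaluated at the same point as in the original expansion.
Again, it's not immediately obvious that you need to do this.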
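The difference between the let ... and ... version and the thunked one shows up with something like my-or. This sketch writes both expansions out by hand; e1, e2, trace and the my_or_* functions are hypothetical, and e1 () stands in for $e1$. Both expansions avoid capture, but only the thunked one keeps the short-circuit.

```ocaml
(* Records which operand expressions were evaluated *)
let trace = ref []
let e1 () = trace := "e1" :: !trace; true   (* stands in for $e1$ *)
let e2 () = trace := "e2" :: !trace; false  (* stands in for $e2$ *)

(* let ... and ... expansion of (my-or e1 e2): no capture, but both
   operands are evaluated before the test, so short-circuiting is lost *)
let my_or_let_and () =
  let a = e1 () and b = e2 () in
  let t = a in
  if t then t else b

(* Thunked expansion: a and b are functions, so e2 runs only if needed,
   and the temporary t still cannot capture names inside e1 or e2 *)
let my_or_thunked () =
  let a () = e1 () and b () = e2 () in
  let t = a () in
  if t then t else b ()
```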

So in any case--I'll bring this up with the developers.  I wasn't
aware the problem was there.

John.




* Re: Camlp4's (lack of) hygiene (was Re: Macros)
From: John Prevost @ 2000-07-07 23:42 UTC
  To: caml-list

Somebody on clf pointed out the other, bigger part of hygiene: symbols
which *are* bound in the "macro" source should stay statically bound
to those values wherever the macro is used.  Unfortunately, I don't
think this is a change that's at all simple for camlp4, since it
requires very tight coupling with the compiler.

Kind of sad, since some of the easiest useful things you could do
(providing little syntaxes for various data structures via quotations)
depend on referring to values from the environment.  As an example,
the quotations in q_MLast need Pcaml to be opened, or they won't
work.

I think making changes to allow good gensymming might still be
desirable--but I'm sad that this bigger issue can't really be dealt
with without a major merge between camlp4 and ocaml itself.  A merge
which does not seem likely to happen.

John.




* Re: Camlp4's (lack of) hygiene (was Re: Macros)
From: Daniel de Rauglaudre @ 2000-07-10  9:37 UTC
  To: John Prevost; +Cc: caml-list

Hi,

On Fri, Jul 07, 2000 at 07:42:02PM -0400, John Prevost wrote:
> Somebody on clf pointed out the other bigger part of hygiene, which is
> allowing symbols which *are* bound in the "macro" source to be
> statically bound to that value when used.  Unfortunately, I don't
> think this is a change that's at all simple for camlp4, since it
> requires very tight coupling with the compiler.

Well, if I find out how to do that, I don't think it would be a
problem to add things to the OCaml compiler to allow it, as long as
it is not too complicated.

> Kind of sad, since for some of the most easy useful things you could
> do (providing little syntaxes for various datastructures via
> quotations) depend on referring to values from the environment.  As
> an example, the quotations in q_MLast need Pcaml to be opened, or they
> won't work.

??? No. These quotations do not depend on Pcaml... only on MLast.

-- 
Daniel de RAUGLAUDRE
daniel.de_rauglaudre@inria.fr
http://cristal.inria.fr/~ddr/
The trouble with computers is that they do what you tell them, not what
you want (D. Cohen).




* Re: Camlp4's (lack of) hygiene (was Re: Macros)
From: John Prevost @ 2000-07-10 10:17 UTC
  To: Daniel de Rauglaudre; +Cc: caml-list

>>>>> "dr" == Daniel de Rauglaudre <daniel.de_rauglaudre@inria.fr> writes:

    dr> Well, if I find how to do that, I think it would not be a
    dr> problem to add things in Ocaml compiler to allow that, if it
    dr> is not too complicated.

Well, it is pretty complicated.  That level of things essentially
requires that the compiler actually know which grammar extensions are
in effect, and be able to get at the module each grammar was defined
in, in order to use the right bindings.

If I were, say, writing my own system which had a module named MLast,
and wanted to write a quotation for my AST, I would need to give my
module a different name.  This hygiene issue is equivalent to the
division between static and dynamic binding.  Essentially, a quotation
which refers to symbols by name gets whatever those names are bound to
where the quotation is used, not where the quotation is defined.

Basically, I think that having quotations (at least) as a hygienic
macro facility in O'Caml would be very nice.  Hygienic syntax
extensions would be cool too, but they're much harder to do.  Making
quotations hygienic would involve .cmo, .cmi and .cmx files carrying
information about quotation definitions, which would be scoped like
other symbols.  Something maybe like:

<:q_MLast.expr< ... >>

open q_MLast
<:expr< ... >>

This would be a Major Change to the system.  So, as I said before, I
doubt it will happen.  Maybe in O'Caml 4.  :)

    dr> ??? No. These quotations do not depend on Pcaml... only on
    dr> MLast.

You're right--sorry.  :) I should've actually looked at things.
Here's a relevant function, turning the quotation's parse tree into
MLast expressions.  Before the code, I'll quickly explain the places
where hygiene could be violated.

Notice that the quotation is actually written using itself (it's from
the meta directory.  :), but it does use MLast in one place textually
right here--in the Node case.  Some and None are also referred to
directly.  (And I'm sure that the expanded version refers to MLast
quite a bit more.)

The hygiene problem is that if MLast (or None or Some) is bound in the
source text of the file using this quotation, then the value used will
be that of the source file--not the one in scope in the definition
below.  An (admittedly stupid) definition like:

type 'a bigoption =
  | None
  | Some of 'a
  | Many of 'a list

in code that uses q_MLast would show this nicely.  Using module paths
only helps somewhat, since you don't know that a module is actually in
scope with that name.  And the pervasives module isn't actually
available by name--if you override it, it's overridden for good.  (So
there's nothing you can write below that can even avoid the little bit
I wrote above.)

value rec expr_of_ast =
  fun
  [ Node n al ->
      List.fold_left (fun e a -> <:expr< $e$ $expr_of_ast a$ >>)
        <:expr< MLast.$uid:n$ loc >> al
  | List al ->
      List.fold_right (fun a e -> <:expr< [$expr_of_ast a$ :: $e$] >>) al
        <:expr< [] >>
  | Tuple al -> <:expr< ($list:List.map expr_of_ast al$) >>
  | Option None -> <:expr< None >>
  | Option (Some a) -> <:expr< Some $expr_of_ast a$ >>
  | Str s -> <:expr< $str:s$ >>
  | Chr c -> <:expr< $chr:c$ >>
  | Bool True -> <:expr< True >>
  | Bool False -> <:expr< False >>
  | Cons a1 a2 -> <:expr< [$expr_of_ast a1$ :: $expr_of_ast a2$] >>
  | Record lal -> <:expr< {$list:List.map label_expr_of_ast lal$} >>
  | Loc -> <:expr< loc >>
  | Antiquot loc s ->
      let e =
        try Grammar.Entry.parse Pcaml.expr_eoi (Stream.of_string s) with
        [ Stdpp.Exc_located (bp, ep) exc ->
            raise (Stdpp.Exc_located (fst loc + bp, fst loc + ep) exc) ]
      in
      MLast.ExAnt loc e ]
and label_expr_of_ast (l, a) =
  (<:expr< MLast.$lid:l$ >>, expr_of_ast a)




* Re: Camlp4's (lack of) hygiene (was Re: Macros)
From: Judicael Courant @ 2000-07-10 11:42 UTC
  To: John Prevost; +Cc: caml-list

On 7 Jul, John Prevost wrote:
> Somebody on clf pointed out the other bigger part of hygiene, which is
> allowing symbols which *are* bound in the "macro" source to be
> statically bound to that value when used.  Unfortunately, I don't
> think this is a change that's at all simple for camlp4, since it
> requires very tight coupling with the compiler.
>

Notice that even O'Caml itself (I mean without camlp4) already has this
problem:


        Objective Caml version 3.00+8 (2000-06-30)

# let x = [| 1 ; 2 |];;  
val x : int array = [|1; 2|]
# module Array = struct end;;
module Array : sig end
# x.(1);; (* guess what happens... *)
Unbound value Array.get
# (* x.(1) just expands to Array.get x 1 *)


Judicaël.
-- 
Judicael.Courant@lri.fr, http://www.lri.fr/~jcourant/
(+33) (0)1 69 15 64 85
"Show me pieces of your world, and I'll show you mine"
Tim, number #929, sentenced to death.
http://rozenn.picard.free.fr/tim.html





* Re: Camlp4's (lack of) hygiene (was Re: Macros)
From: John Prevost @ 2000-07-10 13:16 UTC
  To: Judicael Courant; +Cc: caml-list

>>>>> "jc" == Judicael Courant <Judicael.Courant@lri.fr> writes:

    jc> Notice that even O'Caml itself (I mean without camlp4) already
    jc> has this problem:

    jc> # let x = [| 1 ; 2 |];;  
    jc> val x : int array = [|1; 2|]
    jc> # module Array = struct end;;
    jc> module Array : sig end
    jc> # x.(1);; (* guess what happens... *)
    jc> Unbound value Array.get
    jc> # (* x.(1) just expands to Array.get x 1 *)

Ugh...  That's horrible!  On the one hand, I can see how it could be
useful from the point of view of using different implementations of
arrays, but on the other hand, it's a mess.

Especially considering the description of array syntax in the manual:

------------------------------------------------------------------------
Arrays

The expression [| expr1 ; ... ; exprn |] evaluates to an n-element
array, whose elements are initialized with the values of expr1 to
exprn respectively. The order in which these expressions are evaluated
is unspecified.

The expression expr1 .( expr2 ) returns the value of element number
expr2 in the array denoted by expr1. The first element has number 0;
the last element has number n-1, where n is the size of the array. The
exception Invalid_argument is raised if the access is out of bounds.

The expression expr1 .( expr2 ) <- expr3 modifies in-place the array
denoted by expr1, replacing element number expr2 by the value of
expr3. The exception Invalid_argument is raised if the access is out
of bounds. The value of the whole expression is ().
------------------------------------------------------------------------

These are described in terms of operations on the base array type.
The fact that the operations are implemented as sugar shouldn't mean
that the behavior is different from what you would expect.  Normal
function calls have better behavior than this--all the more reason
that constructions which are part of the language definition should
work in a safe manner.
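For contrast, a minimal sketch of the better behavior of normal function calls: a name used inside a function is resolved where the function is defined, so shadowing the module afterwards changes nothing. count and n are hypothetical names.

```ocaml
(* An ordinary function: List.length is resolved where count is defined *)
let count xs = List.length xs

(* Later code shadows the List module, as Array was shadowed above *)
module List = struct end

(* count is unaffected: its reference was bound statically at definition *)
let n = count [1; 2; 3]
```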

Also of note, of course, is that [| ... |] *does* work, no matter what
bindings are in scope.

If no change is made to make this safer, the language definition
should be changed to note that writing "expr1 .( expr2 )" is
completely identical to "Array.get expr1 expr2" for purposes of
scoping.


Finally, I'd like to note that the same properties occur with strings:

    # "foo".[1];;
    - : char = 'o'
    # module String = struct end;;
    module String : sig end
    # "foo".[1];;
      ---------
    Unbound value String.get

While I've actually never been tempted to create an Array module by
that name (I might be tempted a little with the current discussion on
clf about fast persistent arrays), I have in fact created a String
module.  At the time, I was working on some wide character stuff.

I suppose that, on the one hand, this shows why the current behavior is good:

module Wide = (struct
  type ochar = char
  type char = int
  type ostring = string
  type string = char array
  let o_to_char = Char.code
  (* Convert an 8-bit string by taking the code of each character *)
  let o_to_string s = Array.init (String.length s) (fun i -> Char.code s.[i])
  (* other stuff *)
  module String = struct
    let get = Array.get
    let set = Array.set
  end
end : sig
  type ochar = char
  type char
  type ostring = string
  type string
  val o_to_char : ochar -> char
  val o_to_string : ostring -> string
  module String : sig
    val get : string -> int -> char 
    val set : string -> int -> char -> unit
  end
end)

By opening this, you're instantly using wide characters instead of 8
bit characters.  (Except for the little "constants" problem.)  But,
like with arrays, I think you might be justifiably confused if you did
this, and .[ ] stopped working "normally", and yet quotation marks
still worked the same.  Especially when you mostly want .[ ] for "byte
array" "strings" more than strings.


Allowing .() and .[] to somehow be bound would be a different matter.
Then normal scoping would apply, and you'd expect it.


Gah.  In any case, these rough edges are making me start to hate
syntax (not just Caml's) as a whole class of experience.  (Though I
still like Caml's more than SML's.  :)


John.




* Re: Camlp4's (lack of) hygiene (was Re: Macros)
From: Markus Mottl @ 2000-07-17 10:08 UTC
  To: John Prevost; +Cc: Judicael Courant, caml-list

On Mon, 10 Jul 2000, John Prevost wrote:
> While I've actually never been tempted to create an Array module by
> that name (I might be tempted a little with the current discussion on
> clf about fast persistent arrays), I have in fact created a String
> module.  At the time, I was working on some wide character stuff.

This "feature" comes in handy when you want to replace the
implementation of the Array module while still enjoying the syntactic
sugar (e.g. you can use my resizable array module without having to
change lots of code).

Unfortunately, if you create arrays with "[| ... |]", you always end up with
ones of the builtin type. So you have to apply a conversion function to
them to get the values you want.
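A sketch of that conversion step, using a hypothetical minimal resizable array (not Markus Mottl's actual library): array literals are always built-in arrays, so they go through of_array first. The module is deliberately named Resizable, so this sketch does not rely on the .( ) sugar being rerouted to it.

```ocaml
(* Hypothetical minimal resizable array; not the library mentioned above *)
module Resizable = struct
  type 'a t = { mutable data : 'a array; mutable len : int }

  (* The conversion step: [| ... |] literals are always built-in arrays *)
  let of_array a = { data = Array.copy a; len = Array.length a }

  let length r = r.len

  let get r i =
    if i < 0 || i >= r.len then invalid_arg "Resizable.get" else r.data.(i)

  (* Append, doubling the capacity when the backing array is full *)
  let push r x =
    if r.len = Array.length r.data then begin
      let bigger = Array.make (max 1 (2 * r.len)) x in
      Array.blit r.data 0 bigger 0 r.len;
      r.data <- bigger
    end;
    r.data.(r.len) <- x;
    r.len <- r.len + 1
end

let () =
  let r = Resizable.of_array [| 1; 2; 3 |] in
  Resizable.push r 4;
  Printf.printf "%d elements, last = %d\n" (Resizable.length r) (Resizable.get r 3)
  (* prints "4 elements, last = 4" *)
```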

Best regards,
Markus Mottl

-- 
Markus Mottl, mottl@miss.wu-wien.ac.at, http://miss.wu-wien.ac.at/~mottl


