Re: [Caml-list] Compiler feature - useful or not?

From: "Yaron Minsky" <yminsky@cs.cornell.edu>
To: "Pierre Weis" <pierre.weis@inria.fr>
Cc: "Alain Frisch" <alain@frisch.fr>, caml-list <caml-list@yquem.inria.fr>
Subject: Re: [Caml-list] Compiler feature - useful or not?
Date: Thu, 15 Nov 2007 19:30:18 -0500	[thread overview]
Message-ID: <891bd3390711151630x238b8eddod5b7462d0fa1c735@mail.gmail.com> (raw)
In-Reply-To: <20071115132649.GB15754@yquem.inria.fr>

[-- Attachment #1: Type: text/plain, Size: 17552 bytes --]

Most of what you said about quotient types made sense to me, but I must
admit to being deeply confused about the nature of the change to the
language.  Consider the following trivial module and two possible
interfaces.

module Int = struct
   type t = int
   let of_int x = x
   let to_int x = x
end

module type Abs_int = sig
   type t
  val of_int : int -> t
  val to_int : t -> int
end

module type Priv_int = sig
   type t = private int
   val of_int : int -> t
   val to_int : t -> int
end

So, what is the difference between (Int : Abs_int) and (Int : Priv_int)?
For instance, can I write (Priv_int.of_int 3) + (Priv_int.of_int 5)?  Can
you point to anything concrete I can do with the private version that I
can't do with the abstract version?  Is there any expression that is legal
for one but not for the other?

y

On Nov 15, 2007 8:26 AM, Pierre Weis <pierre.weis@inria.fr> wrote:

> > >- a value of type row is in fact a concrete integer (it is not hidden
> in
> > >any way),
> >
> > But you cannot apply integer operations to it, so it is not a normal
> > concrete integer, right?
>
> Right: a value of type row has type row ... not type int!
>
> > Can you show an example where replacing all "type t = private ..."
> > definitions by "type t" changes a well-typed program into an ill-typed
> > one?   I understand that if private types create real subtypes (w.r.t.
> > :>) then they are different than abstract types, but otherwise, I don't
> > see the difference for the type-checker.
>
> May be, I must recall what are private types in the first place: private
> types has been designed to implement quotient data structure.
>
> (***  Warning.
>
> Wao: after re-reading this message I realize that it is really long.
>
> You can skip it, if you already know something about mathematical quotient
> structures, private types for record and variant, relational types and the
> mocac compiler!
>
> ***)
>
> What is a quotient ?
> --------------------
>
>  Here quotient has to be understood in the mathematical sense of the term:
> given a set S and an equivalence relation R, you consider the set S/R of
> the
> equivalence classes modulo the relation R. S/R is named the quotient
> structure of S by R. Quotient structures are fundamental in mathematics
> and
> there properties have been largely studied, in particular to find the
> relationship between operations defined on S and operations on S/R: which
> operations on S can be extended to operations on S/R ? Which properties of
> operations on S are still valid for there extension on S/R ? and so on.
>
> Simple examples of quotient structures can be found everywhere in maths,
> for
> instance consider the equivalence relation R on pairs of integer values
> defined as
>
>  z_1 equiv z_2 if and only if z_1 and z_2 are both odd or are both even
> (in Caml terms
>  z_1 equiv z_2 if and only if (z_1 mod 2) = (z_2 mod 2))
>
> The set Z/R is named Z/2Z and it inherits properties of operations on Z:
> it
> is an abelian group (and more).
>
> A wonderful example of such inheritance of interesting properties by
> inheritance of operations thanks to a definition by a quotient structure
> is
> the hierarchy of sets of numbers: starting with N (the set of natural
> numbers) we define Z (the set of relative integer numbers) as a quotient
> of
> NxN), then Q (the rational numbers) as a quotient of ZxZ*, R (the set of
> real
> numbers) as a quotient of Q^N (the sets of sequences of rational numbers),
> C
> (the set of complex numbers) as a quotient of R[X] (the set of polynomials
> with one unknown and real coefficients). Note here that at each step of
> the
> hierarchy the quotient structure is richer (has more properties and/or
> more
> elements) than the original structure: first we have a monoide, then a
> group
> and a ring, then a field, then a complete field and so on.
>
> Why quotient structures ?
> -------------------------
>
> So quotient structures are a fundamental tool to express mathematical
> structures and properties. Exactly as mathematical functions, relations
> and
> sets. As you may have noticed, programming languages are extremely related
> to
> maths (due to their purely logical bases) even if, in some cases, the
> zealots
> of the language at hand try to hide the mathematical fundation into a
> strange
> anthropomorphic discurse to describe their favorite language features.
>
> In the end, the computer programing languages try hard to incorporate
> powerful features from mathematics, because these features have been
> polished
> by mathematicians for centuries. As a consequence of this work, we know
> facts, properties and theorems about the mathematical features and this is
> an
> extremely valuable benefit.
>
> Now, what is the next frontier ? What can we steal more to mathematics for
> the
> benefit of our favorite programming language ?
>
> If we review the most powerful tools of mathematicians, we found that
> functions have been borrowed (hello functional programming,
> lambda-calculus
> and friends :)), relations have been borrowed (data bases, hash tables,
> association lists), sets have been more or less borrowed (hello setle,
> pascal, and set definition facilities from various libraries of various
> languages...). More or less, we have all those basic features.
>
> >From the mathematical set construction tools, we have got in Caml:
>
> - the cartesian product of sets (the * binary type constructor, the record
>  type definitions),
> - the disjunct union of sets (the | of polymorphic variants, the sum (or
>  variant) type definitions).
>
> Unfortunately, we have no powerful way to define a quotient data
> structure. That what private type definitions have been designed to do.
>
> What do we need for a quotient data structure ?
> -----------------------------------------------
>
> In the first place, we need the ``true'' thing, the real feature that is
> indeed used in maths. Roughly speaking this means to assimilate the
> quotient
> set S/R to a subset of S.
>
>  In the previous definition of quotient structures, there is a careful
> distinction between the base set S and the quotient set S/R. In fact,
> there
> always exists a canonical injection from S to S/R, and if we choose a
> canonical representant in each equivalence class of S/R, we get another
> canonical injection from S/R to S, so that S/R can be considered as a
> subset
> of S (the story is a bit more complex but that's the idea). This
> injection/projection trickery is intensively used in maths; for instance,
> in
> the hierarchy of number sets, we say and consider that N is a subset of Z
> that itself is a subset of Q, a subset of R, a subset of C. Rigourously,
> we
> should say for instance, there is a subset of Z that is canonically
> isomorphic to N; in fact, we abusively assimilate N to this subset of Z;
> hence, we say that N is ``included'' in Z, when we should say ``the image
> of
> N by the canonical isomorphism from N to Z'' is included in Z; then, we go
> one step further in the assimilation of N to a subset of Z, by denoting
> the
> same, the elements of N and there image in Z; for instance, ``the neutral
> element of the multiplication in Z'' and the successor of 0 in N is
> denoted
> ``1''; and we now can say that 1 belongs to Z. Note here that, in the
> first
> place, ``the neutral element of the multiplication in Z'' is an
> equivalence
> class (as all elements in Z are). So it is a set. More exactly, the
> ``neutral
> element of the multiplication in Z'' is the infinite set of pairs of
> natural
> numbers (n, m) such that n - m = 1 (here ``-'' is an operation in N and
> ``1''
> is the successor of the natural number ``0'', so that n - m = 1 is a
> shorthand for n = succ m). The assimilation between N and its isomorphic
> image allows to use 1 as the denotation of this infinite set of pairs of
> natural numbers.
>
> We understand why the mathematicians always write after having designed a
> quotient structure: ``thanks to this isomorphism, and if no confusion may
> arise, we always assimilate S to its canonical injection in S/R''.
>
> This assimilation is what private type definitions allow.
>
> How do we define a quotient data structure ?
> --------------------------------------------
>
> The mathematical problem:
> - we have a set S and an equivalence relation R on SxS,
> - we construct the quotient S/R.
> - we state afterwards:
>  ``if no confusion may arise, we always assimilate S to its canonical
>    injection in S/R''.
>
> The programming problem:
> - we have a data structure S (defined by a type s) and a relation R on SxS
>  (defined by a function r from s * s to bool).
> - we construct a data structure that represents S/R.
> - we have afterwards:
>  ``No confusion may arise, we always assimilate S to its canonical
> injection
>    in S/R''.
>
> The private data type solution:
> - the data structure S is defined as any Caml data structure and the
>  relation R is implemented by the canonical injection(s) from S to S/R.
> - the canonical projection from S/R to S is automatic.
> - S (defined by s) is assimilated to S/R (which is then also s!).
>
> We defined S/R as a subset of S, the set of canonical representants of
> equivalence classes of S/R.
>
> More exactly, the canonical injection from S to S/R maps each element of S
> to
> its equivalent class in S/R; if we assimilate each equivalence class to
> its
> canonical representant (an element of S), the canonical injection maps
> each
> element of S to the canonical representant of its equivalence class. Hence
> the canonical injection has type S -> S.
>
> An example treated without private types:
> -----------------------------------------
>
> Let's take a simple example:
>
> S is the following data structure that implements a mathematical (free)
> structure
> generated by the constant 0, the unary symbol succ, and the binary symbol
> +.
>
> type peano =
>   | Zero
>   | Succ of peano
>   | Plus of peano * peano
>
> R is the (equivalence) relation that expresses that
> - 0 is the neutral element for + (for all x in S, 0 + x = x and x + 0 =
> x),
> - + is associative (for all x, y, z in S³, (x + (y + z)) = ((x + y) + z)).
>
> The canonical injection is a function from peano -> peano that maps each
> value
> in S (the type peano) to the canonical representant of its class in S/R
> (an
> element of S (the type peano)):
>
> let rec make = function
>  | Zero -> Zero
>  | Plus (Succ n, p) -> Succ (make (Plus (n, p)))
>  | Plus (Zero, p) -> p
>  | Plus (p, Zero) -> p
>  | Plus (Plus (x, y), z) -> make (Plus (x, make (Plus (y, z))))
>  | Succ p -> Succ p
> ;;
>
> (This function may be wrong but you see the idea :))
>
> So, if you want to represent S/R for peano in Caml you must:
> - (1) define the type peano
> - (2) always use the make function to create a value in S/R
> - (3) not to confuse S and S/R in your head (I mean in your programs)
>
> Private data types permits (1), ensures (2), and helps for (3) concerning
> the
> head part and ensures (3) for programs by means of compile-time type
> errors.
>
> The same example with private types:
> ------------------------------------
>
> To define a private data type you must define a module.
> - in the implementation, you define the carrier S and its canonical
> injection.
> - in the interface, you export the carrier and the injection.
>
> The type checker will ensure transparent projection from the quotient to
> the
> carrier and mandatory use of the canonical projection to build a value in
> S/R.
>
> interface peano.mli
> -------------------
> type peano = private
>   | Zero
>   | Succ of peano
>   | Plus of peano * peano
> ;;
>
> val zero : peano;;
> val succ : peano -> peano;;
> val plus : peano * peano -> peano;;
>
> implementation peano.ml
> -----------------------
> type peano =
>   | Zero
>   | Succ of peano
>   | Plus of peano * peano
> ;;
>
> let rec make = function
>  ...
> ;;
>
> let zero = make Zero;;
> let succ p = make (Succ p);;
> let plus (n, m) = make (Plus (n, m));;
>
> (See note (1) for a discussion on the design of this example.)
>
> What is given by private types:
> -------------------------------
>
> - You cannot create a value of S/R if you do not use the canonical
> injection
>  # Zero;;
>  Cannot create values of the private type peano
>
> - As a consequence, values in S (i.e. S/R) are always canonical:
>  # let one = succ zero;;
>  val one : M.peano = Succ Zero
>  # let three = plus (one, succ (plus (one, zero)));;
>  val three : M.peano = Succ (Succ (Succ Zero))
>
> - Projection is automatic
>  # let rec int_of_peano = function
>      | Zero -> 0
>      | Succ n -> 1 + int_of_peano n
>      | Plus (n, p) -> int_of_peano n + int_of_peano p
>    ;;
>  val int_of_peano : M.peano -> int = <fun>
>  # int_of_peano three;;
>  - : int = 3
>
> What about private abbreviations ?
> ----------------------------------
>
> Private abbreviation definitions are a natural extension of private data
> type
> definitions to abbreviation type definitions. The same notions are carried
> out mutatis-mutandis:
>
> - we have a data structure S (defined by a type s) and a relation R on SxS
>  (defined by a function r from s * s to bool).
> - we construct a data structure that represents S/R.
> - we have afterwards:
>  ``No confusion may arise, we always assimilate S to its canonical
> injection
>    in S/R''.
>
> In the case of abbreviations:
>
> - the data structure S/R is defined by a type s (which is an abbreviation)
> and
>  a relation defined by a function,
> - the canonical injection should be defined in the implementation file of
> the
>  private data type module,
> - we always assimilate S to its canonical injection in S/R.
>
> In pratice, as for usual private types (for which the constructive part of
> the data type is not available outside the implementation), the type
> abbreviation
> is not known outside the implementation (it is really private to its
> implementation). This prevents the construction of values of S/R without
> using the canonical injection.
>
> Th noticeable difference is the projection function: it cannot be fully
> implicit (otherwise, as you said Alain, the assimilation will turn to
> complete
> confusion: we would have S identical to S/R, hence row=int and no
> difference
> between a regular abbreviation definition and a private abbreviation
> definition). If not implicit, the injection function should granted to be
> the
> identity function (something that we would get for free, if we allow
> projection via sub-typing coercion).
>
> In short: abstract data types provide values that cannot be inspected nor
> safely manipulated without using the functions provided by the module that
> defines the abstract data type. In contrast, private data types are
> explicitely concrete and you can freely write any algorithm you need. A
> good
> test is printing: you can always write a function to print values of a
> private type, it is up to the implementor of an abstract type to give you
> the
> necessary primitives to access the components of the abstracted values.
>
> Automatic generation of the canonical injection:
> ------------------------------------------------
>
> You may have realized that writing the canonical injection can be a real
> challenge.
>
> The moca compiler (see http://moca.inria.fr/) helps you to write the
> canonical injection by generating one for you, provided you can express
> the
> injection at hand via a set of predefined algebraic relations (and/or
> rewrite
> rules you specify) attached to the constructors of the private type.
> Private
> types with constructors having algebraic relations are named relational
> types. Moca generated canonical injections for relation types.
>
> For instance, you would write the peano example as the following peano.mlm
> file:
>
> type peano = private
>   | Zero
>   | Succ of peano
>   | Plus of peano * peano
>   begin
>     associative
>     neutral (Zero)
>     rule Plus (Succ n, p) -> Succ (Plus (n, p))
>   end;;
>
> The mocac compiler will generate the interface and implementation of the
> peano module for you, including the necessary canonical injection
> function.
>
> Best regards,
>
> --
> Pierre Weis
>
> INRIA Rocquencourt, http://bat8.inria.fr/~weis/<http://bat8.inria.fr/%7Eweis/>
>
> Note (1):
> - we can't just export the canonical injection since you could not build
> any
>  value of the type without previously defined values!
> - we provide specialized versions of the canonical injection function
>  introducing the convention that the lowercase name of a value constructor
> is
>  the name of its associated canonical injection.
> - we do not export the plasin true canonical injection since it would be
> more
>  confusing than useful.
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>

[-- Attachment #2: Type: text/html, Size: 19170 bytes --]