From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <39ADCF833E74D111A2D700805F1951EF18014503@RED-MSG-06>
From: Don Syme <dsyme@microsoft.com>
To: "'Pierre Weis'" <Pierre.Weis@inria.fr>
Cc: caml-list@inria.fr
Subject: RE: Syntax for label, NEW PROPOSAL
Date: Wed, 15 Mar 2000 12:40:22 -0800
X-Mailer: Internet Mail Service (5.5.2651.58)
Resent-From: weis@pauillac.inria.fr
Resent-Date: Fri, 17 Mar 2000 09:54:24 +0100
Resent-To: caml-redistribution@pauillac.inria.fr


I agree very much with Pierre's comments.  Labels _do_ get in the way, and
probably shouldn't be used in the standard library in 3.00.

It seems pretty clear to me that a typical programmer only keeps an active
vocabulary of roughly 100-200 identifiers. (I mean identifiers from
libraries they are using but have not written themselves - I think you could
do experiments to confirm the figures, but whatever the range we all know
from experience that it is limited). If you accept that, then think about
the impact that labels have on this budget.  A quick grep reveals around 40
different labels.  It's clearly _much_ better to spend the budget on top
level identifiers rather than labels, because otherwise programmers, even
good ones, just become less effective.  The very presence of labels in the
standard library means that new users will look at them, learn them, try to
use them, and the size of their working set of library functions will
decrease as a result.

For example, using the "blit" function from the new version of the array
library module requires knowing not _one_ identifier but _six_.  An existing
user could still use the unlabelled version, but most new users will use
labels, thinking that this is leading to better code and better programming
(incorrectly, I believe, or at best a marginal improvement).  It's like C++
operator overloading - it looks oh so appealing, and a lot of fun for the
language designer to put in, but it nearly always ends up a waste of time to
use.
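
For readers on a current OCaml, the difference can be sketched as follows. This is only an illustration and assumes the present-day standard library, where the labelled variant lives in a separate ArrayLabels module and labels are written with `~`; it is not the 2.99 syntax or library layout discussed in this thread. The names src, positional and labelled are made up for the example.

```ocaml
(* Copy 3 elements, starting at index 1 of [src], to the front of a
   fresh array, once with positional and once with labelled arguments. *)
let src = [| 1; 2; 3; 4; 5 |]

(* Positional form: only "blit" to remember, but the argument order
   (source, source offset, destination, destination offset, length)
   has to be known. *)
let positional =
  let dst = Array.make 5 0 in
  Array.blit src 1 dst 0 3;
  dst

(* Labelled form: "blit" plus the five labels src, src_pos, dst,
   dst_pos and len, i.e. six identifiers in total. *)
let labelled =
  let dst = Array.make 5 0 in
  ArrayLabels.blit ~src ~src_pos:1 ~dst ~dst_pos:0 ~len:3;
  dst
```

Both calls do the same copy; the only thing the labelled form changes is how much the caller has to remember and type.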

That's not to say labels are inappropriate everywhere, but it does explain
why they shouldn't be used on functions which users already use accurately,
or where errors in use are already caught by the type checker.  This is the
fundamental test: do labels help programmers use functions with fewer errors
at _runtime_, and is the advantage sufficient to make up for the extra
"weight" of even having the labels around at all?  Remember Caml won out as
the best ML implementation by being Caml-Light, not
Caml-you-must-be-aware-of-the-intricacies-of-my-features-to-learn-to-use-me
;-)

Thus, I'd argue for even stronger rules than Pierre:
  - Labels are only appropriate where a significant number of users 
    routinely make mistakes when using a function, and it is 
    clear that adding labels would solve the problem.
  - This means no labels on functions with 1 or 2 arguments.
  - This means no labels on functions with 3 arguments unless the 
    types are directly ambiguous (and probably not even then).
  - This means no labels on functions where a natural order exists for the
    arguments.
  - This also means no labels on polymorphic functions such as Hashtbl.add
    (I think it would be very rare that the typechecker wouldn't spot a
    misuse of that function).
  - No labels inside the arguments of higher order functions.  This 
    will really confuse new users who try not to use labels!
    e.g. no "acc" in the first argument of
         val fold_right: fun:('b -> acc:'a -> 'a) -> 'b array -> acc:'a -> 'a

And it's not always clear that labels are such a great help - even in the
case of Array.blit, users may not use the labelled function much more
accurately, given the time it takes to look up the label names and to
correct the errors in misspelling the labels, and given that there is a
natural default rule in functional programming that a source operand comes
before a destination.  Even worse, because the programmer has to remember
the damn label names, there may be another 3 or 4 library functions that
they've never learnt to use at all.

Here's a story: In 1990, a new version of the HOL theorem prover (hol90) was
released.  The re-implementation was quite good, but the implementer made a
major mistake - he used labelled versions (actually SML records) of many,
many functions where nothing was gained by doing so.  This was a complete
waste of time, and was a major factor that led to the splitting of the HOL
effort between "hol-light" and "HOL98", a split that took years to correct.
As Pierre describes, the object system was carefully designed not to put
people off, and if the standard libraries had been objectified then most
existing users would not have moved to OCaml.

Again, that's not to say I don't like labels - they are clearly useful when
functions take many arguments that have no natural order, and will be a
godsend for some APIs.  However, using them prolifically in the standard library
in this version is simply a bad idea.  Remember, you can always add them to
the standard library later, but you can't take them away!

Cheers,
Don  


-----Original Message-----
From: Pierre Weis [mailto:Pierre.Weis@inria.fr]
Sent: 15 March 2000 14:10
To: caml-redistribution@pauillac.inria.fr
Cc: caml-list@inria.fr
Subject: Re: Syntax for label, NEW PROPOSAL


[Sorry, no French version for this long message]

Abstract:

A long answer to Jacques's proposal. I do not discuss syntax but
semantic issues of the label extension. My conclusion is that we should
be very careful in adding labels to the standard libraries, and that
keeping the usage of higher-order functions as simple as possible is an
extremely desirable design guideline.

> *** Proposal
> 
> Objective Caml 3.00 is not yet released, and I believe we can still
> have modifications on this point.

Yes, you're perfectly right, we can still modify several points.
However, I think there are many other points that are more important
than the choice of ``%'' instead of ``:'', which is only cosmetic
after all.

Thus, I would prefer to discuss deeper and more semantic problems:

-- Problem1: labels can be reserved keywords. This is questionable
and it has been strongly criticised by some Caml users, especially when
reading in the code the awful sequence fun:begin fun ...

-- Problem2: labels that spread all over the standard libraries, even
where they do no good. I would cite:

   * the labels completely redundant with the types
     (E.g. char:char in the type of String.contains or String.index)

   * undesired labels: in many cases I don't want to have labels just
     because I don't want to remember their names. (E.g., I very often
     misspell the label acc since I've always used accu to name an
     accumulator; furthermore, when I do not misspell this label, I find
     acc:accu extremely verbose). Also because labels are verbose at
     application.

   * labels that prevent you from comfortably using your traditional
     functions. This is particularly evident for the List.map or
     List.fold_right higher-order functionals.

This last point is a real problem. Compare the usual way of using
functionals to define the sum of the elements of a list:

$ ocaml
        Objective Caml version 2.99+10

# let sum l = List.fold_right ( + ) l 0;;
val sum : int list -> int = <fun>

Clearly application is denoted in ML with only one character: a space.

Now, consider using the so-called ``Modern'' versions of these
functionals, obtained with the -modern option of the compiler:

$ ocamlpedantic
        Objective Caml version 2.99+10

# let sum l = List.fold_right ( + ) l 0;;
                              ^^^^^
This expression has type int -> int -> int but is here used with type 'a list

Clearly, there is something wrong now! We may remark that the error
message is not that clear, but this is a minor point, since error
messages are never clear enough anyway!

The real problem is that fixing the code does no good at all for its
readability (at least that's what I would say):

# let sum l = List.fold_right fun:begin fun x acc:y -> x + y end acc:0;;
val sum : 'a -> int list -> int = <fun>

It seems that, in the ``modern'' mode, application of higher order
functions is now denoted by a new kind of parens, opened by
``fun:begin fun'' and closed by ``end''. This is extremely explicit
but also a bit heavy (to my mind).
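
As a point of comparison for readers on a current OCaml, the same sum can be sketched in both styles. This assumes the present-day standard library (the ListLabels module, `~` labels, and the label names f and init), none of which is the 2.99+10 syntax shown in the transcript above. The names sum_plain and sum_labelled are made up for the example.

```ocaml
(* Unlabelled style: application is just juxtaposition. *)
let sum_plain l = List.fold_right ( + ) l 0

(* Labelled style with today's ListLabels: the function and the initial
   accumulator carry labels, but the arguments of the function passed
   as ~f do not, so ( + ) can be passed directly. *)
let sum_labelled l = ListLabels.fold_right ~f:( + ) l ~init:0
```

In this form the function argument itself takes no labelled parameters, so an ordinary operator such as ( + ) fits without a wrapper.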

For all these reasons, I would suggest using labels carefully in
the standard libraries:

-- remove labels from higher-order functionals
-- remove redundant labels: when no ambiguity can occur you need not
   add a label.
-- use labels when typechecking ambiguity is evident (for instance
when there are two or more parameters with the same type).

Labels must improve the readability of code or help document the
libraries; they should not be an extra burden on the programmer or a
way of obfuscating code.

Evidently, like any other extension, labels must not obscure the
overall picture, that is they must not clobber the semantics, nor add
extra exceptional cases to the few general rules we have for the
syntax and semantics of Caml.

In this respect, optional labelled arguments might also be discussed,
particularly for the following facts:

-- syntactically identical patterns and expressions now may have
incompatible types:
   # let f ?style:x _ = x;;
   val f : ?style:'a -> 'b -> 'a option = <fun>

   As a pattern on the left-hand side x has type 'a, while as an
   expression on the right hand side it has type 'a option

-- some expressions can be only written as arguments in an application
   context:
   # let f ?style:x g = ?style:x;;
                        ^
   Syntax error
   # let f ?style:x g = g ?style:x;;
   val f : ?style:'a -> (?style:'a -> 'b) -> 'b = <fun>

-- the simple addition of a default value to an optional argument may
   trigger a typechecking error:

   # let f ?(style:x) g = g ?style:x;;
   val f : ?style:'a -> (?style:'a -> 'b) -> 'b = <fun>

   # let f ?(style:x = 1) g = g ?style:x;;
   This expression has type int but is here used with type 'a option
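
The typing behaviour behind these three points can be sketched in present-day OCaml syntax. This is an assumption-laden illustration: it uses today's `?style` notation rather than the 2.99 forms above, and the function names are invented for the example.

```ocaml
(* Without a default, the parameter is seen as an option in the body:
   plain : ?style:'a -> unit -> 'a option *)
let plain ?style () = style

(* With a default, the parameter has the underlying type in the body:
   with_default : ?style:int -> unit -> int *)
let with_default ?(style = 1) () = style

(* Forwarding an optional argument unchanged uses the ?style form at
   the call site, which expects a value of option type. Here [style]
   is an int option, so this typechecks:
   forward : ?style:int -> unit -> int *)
let forward ?style () = with_default ?style ()

(* Writing [let bad ?(style = 1) () = with_default ?style ()] would be
   rejected, because [style] is then an int, not an int option. *)
```

The last comment corresponds to the third point: adding a default changes the type of the bound variable, so code that forwarded it with the `?` form stops typechecking.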

Do not forget the design decision that has always been used before in
the development of Caml: interesting but not universal extensions to
the language must carefully be kept orthogonal to the core language
and its libraries. This has been successfully achieved for the
important addition of modules (which do not prevent users from
using the old interface-implementation view of modules) as well as for
the addition of the object system, which has also been kept orthogonal
to the rest of the language (in particular the standard library has
never been ``objectified''). I don't know of any reason why labels
cannot follow the same safe guidelines.

> Here is an alternative proposal, to use `%' in place of `:'. Labels
> are kept as a lexical entity. This still breaks some programs, since
> `%' was registered as infix, but this is not so bad.

> Con:
> * I still think that `:' looks better, particularly inside types.
> * On my keyboard I can type in `:' without pressing shift :-)
> * We will need some tool to convert existing code.

I think that % should be the infix integer modulo symbol.

> Do you think it would be better?

No.

> Are there people around who would rather keep `:' ?

Yes. However this is syntax and we have to consider semantics in the
first place.

There are also people around who would like to keep Caml a true
functional language, where usage of higher order functions is easy and
natural.  We have to be careful not to lose what is the actual
strength of the language.

-- 
Pierre Weis

INRIA, Projet Cristal, http://pauillac.inria.fr/~weis