From: Don Syme
To: "'Pierre Weis'"
Cc: caml-list@inria.fr
Subject: RE: Syntax for label, NEW PROPOSAL
Date: Wed, 15 Mar 2000 12:40:22 -0800
Message-ID: <39ADCF833E74D111A2D700805F1951EF18014503@RED-MSG-06>

I agree very much with Pierre's comments. Labels _do_ get in the way,
and probably shouldn't be used in the standard library in 3.00.

It seems pretty clear to me that a typical programmer only keeps an
active vocabulary of roughly 100-200 identifiers. (I mean identifiers
from libraries they are using but have not written themselves - I think
you could do experiments to confirm the figures, but whatever the range,
we all know from experience that it is limited.) If you accept that,
then think about the impact that labels have on this budget. A quick
grep reveals around 40 different labels. It's clearly _much_ better to
spend the budget on top-level identifiers rather than labels, because
otherwise programmers, even good ones, just become less effective.
The very presence of labels in the standard library means that new users
will look at them, learn them, try to use them, and the size of their
working set of library functions will decrease as a result.

For example, using the "blit" function from the new version of the array
library module requires knowing not _one_ identifier but _six_. An
existing user could still use the unlabelled version, but most new users
will use labels, thinking that this leads to better code and better
programming (incorrectly, I believe, or at best a marginal improvement).
It's like C++ operator overloading - it looks oh so appealing, and is a
lot of fun for the language designer to put in, but it nearly always
ends up being a waste of time to use.

That's not to say labels are inappropriate everywhere, but it does
explain why they shouldn't be used on functions which users already use
accurately, or where errors in use are already caught by the type
checker. This is the fundamental test: do labels help programmers use
functions with fewer errors at _runtime_, and is the advantage
sufficient to make up for the extra "weight" of even having the labels
around at all? Remember Caml won out as the best ML implementation by
being Caml-Light, not
Caml-you-must-be-aware-of-the-intricacies-of-my-features-to-learn-to-use-me
;-)

Thus, I'd argue for even stronger rules than Pierre:

- Labels are only appropriate where a significant number of users
  routinely make mistakes when using a function, and it is clear that
  adding labels would solve the problem.
- This means no labels on functions with 1 or 2 arguments.
- This means no labels on functions with 3 arguments unless the types
  are directly ambiguous (and probably not even then).
- This means no labels on functions where a natural order exists for the
  arguments.
- This also means no labels on polymorphic functions such as Hash.add
  (I think it would be very rare that the typechecker wouldn't spot a
  misuse of that function).
- No labels inside the arguments of higher order functions. This will
  really confuse new users who try not to use labels! E.g. no "acc" in
  the first argument of

    val fold_right: fun:('b -> acc:'a -> 'a) -> 'b array -> acc:'a -> 'a

And it's not always clear that labels are such a great help - even in
the case of Array.blit, users may not use the labelled function much
more accurately, given the time it takes to look up the label names and
correct misspelt labels, and given that there is a natural default rule
in functional programming that a source operand comes before a
destination. Even worse, because the programmer has to remember the damn
label names, there may be another 3 or 4 library functions that they've
never learnt to use at all.

Here's a story: In 1990, a new version of the HOL theorem prover (hol90)
was released. The re-implementation was quite good, but the implementer
made a major mistake - he used labelled versions (actually SML records)
of many, many functions where nothing was gained by doing so. This was a
complete waste of time, and was a major factor that led to the splitting
of the HOL effort between "hol-light" and "HOL98", a split that took
years to correct.

As Pierre describes, the object system was carefully designed not to put
people off, and if the standard libraries had been objectified then most
existing users would not have moved to OCaml.

Again, that's not to say I don't like labels - they are clearly useful
when functions take many arguments that have no natural order, and will
be a godsend for some APIs. However, using them prolifically in the
standard library in this version is simply a bad idea. Remember, you can
always add them to the standard library later, but you can't take them
away!
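
As a rough illustration of the Array.blit point above, here is a minimal sketch that assumes a current OCaml standard library, where the labelled variant lives in the separate ArrayLabels module (this is not the layout of the 2.99 release discussed in the message). The names src, dst_positional and dst_labelled are hypothetical and chosen only for the example. The positional call needs one identifier; the labelled call needs that identifier plus five label names.

```ocaml
(* Assumption: a current OCaml stdlib, with ArrayLabels providing the
   labelled variant of Array.blit. *)
let src = [| 1; 2; 3; 4; 5 |]

(* Positional form: only "blit" to remember; source comes before
   destination, following the usual convention. *)
let dst_positional =
  let dst = Array.make 5 0 in
  Array.blit src 1 dst 0 3;
  dst

(* Labelled form: "blit" plus the labels src, src_pos, dst, dst_pos, len.
   Arguments may be given in any order, but every label must be spelt
   correctly. *)
let dst_labelled =
  let dst = Array.make 5 0 in
  ArrayLabels.blit ~len:3 ~dst ~dst_pos:0 ~src ~src_pos:1;
  dst
```

Both calls copy three elements starting at index 1 of the source into the start of a fresh array; the only difference is how much vocabulary the call site requires.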
Cheers,
Don

-----Original Message-----
From: Pierre Weis [mailto:Pierre.Weis@inria.fr]
Sent: 15 March 2000 14:10
To: caml-redistribution@pauillac.inria.fr
Cc: caml-list@inria.fr
Subject: Re: Syntax for label, NEW PROPOSAL

[Sorry, no French version for this long message]

Abstract: A long answer to Jacques's proposal. I do not discuss syntax
but semantic issues of the label extension. My conclusion is that we
should be very careful in adding labels to the standard libraries, and
should adopt as an extremely desirable design guideline that the usage
of higher order functions be kept as simple as possible.

> *** Proposal
>
> Objective Caml 3.00 is not yet released, and I believe we can still
> have modifications on this point.

Yes, you're perfectly right, we can still modify several points.
However, I think there are many other points that are more important
than the choice of ``%'' instead of ``:'', which is only cosmetic after
all. Thus, I would prefer to discuss deeper and more semantic problems:

-- Problem1: labels can be reserved keywords. This is questionable and
   it has been strongly criticised by some Caml users, especially when
   reading in the code the awful sequence fun:begin fun ...

-- Problem2: labels that spread all over the standard libraries, even
   when they do no good. I would cite:

   * labels completely redundant with the types (e.g. char:char in the
     type of String.contains or String.index)

   * undesired labels: in many cases I don't want to have labels just
     because I don't want to remember their names. (E.g., I very often
     misspell the label acc since I've always used accu to name an
     accumulator; furthermore, when I do not misspell this label, I find
     acc:accu extremely verbose.) Also because labels are verbose at
     application.

   * labels that prevent you from comfortably using your traditional
     functions. This is particularly evident for the List.map or
     List.fold_right higher-order functionals.

This last point is a real problem.
Compare the usual way of using functionals to define the sum of the
elements of a list:

$ ocaml
        Objective Caml version 2.99+10

# let sum l = List.fold_right ( + ) l 0;;
val sum : int list -> int = <fun>

Clearly application is denoted in ML with only one character: a space.

Now, consider using the so-called ``Modern'' versions of these
functionals, obtained with the -modern option of the compiler:

$ ocamlpedantic
        Objective Caml version 2.99+10

# let sum l = List.fold_right ( + ) l 0;;
                              ^^^^^
This expression has type int -> int -> int but is here used with type
  'a list

Clearly, there is something wrong now! We may remark that the error
message is not that clear, but this is a minor point, since error
messages are never clear enough anyway!

The real problem is that fixing the code does no good at all to its
readability (at least that's what I would say):

# let sum l = List.fold_right fun:begin fun x acc:y -> x + y end acc:0;;
val sum : 'a -> int list -> int = <fun>

It seems that, in the ``modern'' mode, application of higher order
functions is now denoted by a new kind of parens, opening with
``fun:begin fun'' and ending with ``end''. This is extremely explicit
but also a bit heavy (in my mind).

For all these reasons, I would suggest using labels carefully in the
standard libraries:

-- remove labels from higher-order functionals
-- remove redundant labels: when no ambiguity can occur you need not add
   a label.
-- use labels when typechecking ambiguity is evident (for instance when
   there are two or more parameters with the same type).

Labels must reinforce the readability of code or help document the
libraries; they should not be an extra burden to the programmer or a way
of obfuscating code.

Evidently, as with any other extension, labels must not obscure the
overall picture, that is they must not clobber the semantics, nor add
extra exceptional cases to the few general rules we have for the syntax
and semantics of Caml.
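
For comparison, the fold_right example above can be sketched against a current OCaml standard library. This is an assumption about present-day OCaml, not about the 2.99+10 system shown in the transcripts: labelled functions now live in separate modules such as ListLabels, the labels are written ~f and ~init rather than fun: and acc:, and the callback's own arguments carry no labels. The name sum_labelled is hypothetical.

```ocaml
(* Positional version: application is plain juxtaposition. *)
let sum l = List.fold_right ( + ) l 0

(* Labelled version, assuming the ListLabels module of a current stdlib.
   The function argument is labelled ~f and the accumulator ~init; the
   callback itself takes unlabelled arguments, so ( + ) can still be
   passed directly. *)
let sum_labelled l = ListLabels.fold_right ~f:( + ) l ~init:0
```

In this arrangement the positional List module is left untouched, which is in the spirit of the first guideline listed above (no labels forced on higher-order functionals).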
In this respect, optional labelled arguments might also be discussed,
particularly for the following facts:

-- syntactically identical patterns and expressions may now have
   incompatible types:

# let f ?style:x _ = x;;
val f : ?style:'a -> 'b -> 'a option = <fun>

   As a pattern on the left-hand side x has type 'a, while as an
   expression on the right-hand side it has type 'a option.

-- some expressions can only be written as arguments in an application
   context:

# let f ?style:x g = ?style:x;;
                     ^
Syntax error
# let f ?style:x g = g ?style:x;;
val f : ?style:'a -> (?style:'a -> 'b) -> 'b = <fun>

-- the simple addition of a default value to an optional argument may
   trigger a typechecking error:

# let f ?(style:x) g = g ?style:x;;
val f : ?style:'a -> (?style:'a -> 'b) -> 'b = <fun>
# let f ?(style:x = 1) g = g ?style:x;;
This expression has type int but is here used with type 'a option

Do not forget the design decision that has always been followed before
in the development of Caml: interesting but not universal extensions to
the language must carefully be kept orthogonal to the core language and
its libraries. This has been successfully achieved for the important
addition of modules (which do not prevent users from using the old
interface-implementation view of modules) as well as for the addition of
the object system, which has also been kept orthogonal to the rest of
the language (in particular the standard library has never been
``objectified''). I don't know of any reason why labels cannot follow
the same safe guidelines.

> Here is an alternative proposal, to use `%' in place of `:'. Labels
> are kept as a lexical entity. This still breaks some programs, since
> `%' was registered as infix, but this is not so bad.
> Con:
> * I still think that `:' looks better, particularly inside types.
> * On my keyboard I can type in `:' without pressing shift :-)
> * We will need some tool to convert existing code.

I think that % should be the infix integer modulo symbol.
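
The optional-argument observations earlier in this message can be reproduced in a current OCaml, as a minimal sketch written in today's syntax (an assumption: `?style:(x = 1)` replaces the 2.99 form `?(style:x = 1)`). The names f, g, forward and forward_default are hypothetical, and the type annotation on h is added only to keep the example unambiguous.

```ocaml
(* Without a default, the bound variable is an option inside the body. *)
let f ?style:x () = x
(* f : ?style:'a -> unit -> 'a option *)

(* With a default, the same variable has the underlying type. *)
let g ?style:(x = 1) () = x
(* g : ?style:int -> unit -> int *)

(* Forwarding without a default: x is an int option, so it is passed on
   with ?style:x. *)
let forward ?style:x (h : ?style:int -> unit -> int) = h ?style:x ()

(* Forwarding with a default: x is now an int, so ?style:x would be a
   type error (the situation described above); ~style:x is needed. *)
let forward_default ?style:(x = 1) (h : ?style:int -> unit -> int) =
  h ~style:x ()
```

The last two definitions differ only by the default value, yet they need different syntax at the inner call, which is the non-orthogonality the message points at.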
> Do you think it would be better?

No.

> Are there people around who would rather keep `:' ?

Yes. However this is syntax and we have to consider semantics in the
first place.

There are also people around who would like to keep Caml a true
functional language, where the usage of higher order functions is easy
and natural. We have to be careful not to lose what is the actual
strength of the language.

--
Pierre Weis

INRIA, Projet Cristal, http://pauillac.inria.fr/~weis