From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id XAA30365 for caml-redistribution; Sat, 9 Oct 1999 23:16:48 +0200 (MET DST) Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id UAA17188 for ; Fri, 8 Oct 1999 20:12:06 +0200 (MET DST) Received: from ruby (pm1-25.triode.net.au [203.63.235.41]) by nez-perce.inria.fr (8.8.7/8.8.7) with ESMTP id UAA15728; Fri, 8 Oct 1999 20:11:56 +0200 (MET DST) Received: from maxtal.com.au (IDENT:root@localhost [127.0.0.1]) by ruby (8.9.3/8.9.3) with ESMTP id EAA07287; Sat, 9 Oct 1999 04:05:50 +1000 Sender: weis Message-ID: <37FE327A.ED47D428@maxtal.com.au> Date: Sat, 09 Oct 1999 04:05:46 +1000 From: skaller Organization: Maxtal P/L X-Mailer: Mozilla 4.51 [en] (X11; I; Linux 2.2.12 i586) X-Accept-Language: en MIME-Version: 1.0 To: Pierre Weis CC: caml-list@inria.fr Subject: Re: Data structures in ocaml References: <199910071210.OAA29673@pauillac.inria.fr> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Pierre Weis wrote: > > > Some comments on data structures. > > Please let me know if I miss something, which is possible. > > > > 1. It should be possible to create an uninitialised array. > > [It can be done for string] > > No, it should not. It is _necessary_ for efficiency, to represent procedural code cleanly, and it is possible that it can be done in a _well typed_ manner. See my proposal for an explicit categorical initial. > This is an easy consequence of the main theorem of Caml type-checking: > no well-typed Caml program can ever go wrong. I do not understand this (obviously simplified) assertion. As stated, it is clearly not the case. Ocaml programs can easily go wrong in many ways: infinite loops, core dumps due to stack overflows, exceptions due to invalid function arguments .... Perhaps you mean that the type system is sound, but that is a relative thing: the soundness is useful only within the context of it's expressive power, and like most conventional type systems that is definitely limited in practice. There are a whole host of possible run time errors in ocaml, which represent a lack of expressiveness of the type system. > This is also due to: > > -- a design decision: no uninitialized entities exist in Caml. This is > consistent to the fact that in Caml you DEFINE identifiers instead of > DECLARING them. I understand that this decision was made, and why, but it does not sit so well with the procedural aspect of ocaml. > -- an implementation necessity, since the garbage collector cannot be > faced with some unknown ill-formed data. This only follows from an overly 'strict' interpretation of 'uninitialised'. A pragmatic implementation might well initialise boxed values for the reason you state, but that need not mean the client programmer must. The system can surely do this more efficiently, if it must be done for internal implementation reasons. > A way to mimick uninitialised array inside the Caml language framework > could be to systematically use option encapsulation, initialising the > array with None values, and storing Some x instead of x > afterward. Yes. So the idea is to make the compiler do that, for all values. > This is not acceptable in practice since array accesses > would systematically need destructuring the result. Precisely. > There are other possibilities that cannot be written into the > language, since they locally violates the correct memory management, > but that an implementation may provide. For instance, the run-time > system may define a special ``null'' (allocated) value that is used by > an external primitive to fill uninitialised arrays; then, the array > access primitive would test at each access if the value read is > identical (==) to ``null'' or not; if so, then the array access fails > and an error is raised, otherwise the value fetched is returned. This > means a loss of efficiency due to a spurious test for each array > access, and this is not desirable. I do not care that efficiency is lost while debugging, and the spurious test can be turned off with the -unsafe option just like bounds checking. > > 2. Arrays are mutable, but not variable length. > > This would require an extra indirection for each vector access, which > is undesirable. you are missing the point: I am NOT suggesting changing the existing Array module, but considering the problem of implementing a new module for a variable length one. In this case the 'undesirability' of the extra indirection is completely irrelevant: it is necessary IF you in fact want variable length arrays, and I MUST have them or I cannot implement Python lists efficiently (Python uses variable length arrays for lists, using realloc to reallocate them). >Furthermore, variable length vectors can be defined > by an easy abstraction based on the basic static length vectors. No they can't. I know, I'm trying to do it. It is not at all 'easy', but grossly obscures code clarity. To see this, just consider that your code below is, in fact, wrong: > type 'a evect = ('a array) ref;; > > let make n x = ref (Array.make n x);; > > let give_room v n = > let s = !v in > let l = Array.length s in > if n >= l then > let new_s = Array.make n s.(0) in What happens if Array.length s = 0?? You see, it is NOT possible to implement the 'give_room' function. Instead, you must provide special code in EVERY function of the module to do the reallocation, and you must put dummy values at the end of the array, which will not be collected. This code must be special purpose for each function, to find a suitable dummy value: if there is no such value, the array can be set to zero length: it CAN be done, but it is a mess, it is inefficent, and it can waste a lot of storage, particularly if, as in my implementation, I chose to leave 'old' values lying around in the unused part instead of resetting them all to a single (shared) dummy value. > Anyway, if you want to append two chunks of arrays with no > intermediate allocation, you just have to write a special purpose > function, something like: > > let append_two_chunks v1 start1 end1 v2 start2 end2 = > let result = Array.make ... > Array.blit v1 start1 result 0 ... > Array.blit v2 start2 result start1 ... > result;; Yes, and this requires getting a dummy value to initialise the whole array with, followed by immediately overwriting that dummy value with the actual data, and thus is twice as slow as C. > (SMOP: you can also generalize the previous function to an arbitrary > list of chunks...) I have. And it is even messier to find the dummy value then; I need a recursive function scanning the list of arrays to be concatenated just to find the dummy value, and if all the arguments are zero length, you have to handle it as a special case. ---------------------- I think my point is this: when building a library module, it is useful to have extra code to handle the magic initial value, to find bugs. When the library is thought to be correct, it can be recompiled with the -unsafe option. The result is then fast, although it may crash the program if there is still a bug -- but if there is a bug, the program is bugged _anyhow_, and all that is lost is some extra diagnostic information, which can sometimes be recovered by rerunning the program on the same data, but with the program re-built with the debugging version of the library. The point is that not much safety is lost, since only that module is compiled with -unsafe, and only when it is well enough tested and analysed to have reasonable confidence in its correctness. The alternative is to write code three times longer than would otherwise be required, which is MORE likely to be bugged, even though the bug will not be from an uninitialised value. The tradeoff is: catching a specific kind of error automatically compared with missing a dynamic error which is more likely to occur. In the variable length array case there is no issue: IF I fail to set an array element to the correct value, THEN accessing it is an error. It does not matter whether the value is the wrong value, or an uninitialised one, except perhaps that a core dump from an uninitialised one would be easier to trace! [Especially if the code were simpler] In other words, the decision to force initialisation of values even when it is inappropriate is some times a _disservice_ to the programmer. I recommend giving the programmer a choice. I think this is viable, since the choice can be made on a per compilation unit basis. -- John Skaller, mailto:skaller@maxtal.com.au 1/10 Toxteth Rd Glebe NSW 2037 Australia homepage: http://www.maxtal.com.au/~skaller downloads: http://www.triode.net.au/~skaller