* [Caml-list] Efficiency of 'a list @ 2003-05-02 19:27 Eray Ozkural 2003-05-03 5:43 ` Mattias Waldau 2003-05-03 21:13 ` Lauri Alanko 0 siblings, 2 replies; 42+ messages in thread From: Eray Ozkural @ 2003-05-02 19:27 UTC (permalink / raw) To: Ocaml Mailing List Hi there, In my maniacal pursuit of efficiency I figured that I don't truly understand the performance of ocaml lists. Could somebody please point to an explanation of ocaml linked list implementation or summarize its performance characteristics? This might seem like a trivial question but having used many functional languages I know that it's easy to commit performance genocide using linked lists. For instance, a naive implementation of an optimal comparison sorting algorithm in LISP almost invariably results in an O(n^2logn) routine :) Therefore, it would be a good start to explain whether ocaml lists are in fact LISP lists and if not in what aspects they differ. The motivation for this question comes from trying to understand the use of linked lists in an efficient algorithm, such as graph algorithms (say we are implementing topological sort) Assume I'm using the following structure that is far from handsome: type x = (int list) array Let a's type be x. Consider codes as the following a.(i) <- a.(i) @ [x;y;z] a.(i) <- [x] :: a.(i) What travesty results in execution of such codes with i coming from an arbitrary sequence? Do using such constructs result in unholy incarnations of space leaks or gross inefficiencies? Another question, does ocaml make any effort to place members of a list close to each other? Or, more naturally, the list element is allocated using a global model and then simply linked inside the structure? These questions may sound weird but I'm hoping it will make sense to somebody! Regards, -- Eray Ozkural (exa) <erayo@cs.bilkent.edu.tr> Comp. Sci. Dept., Bilkent University, Ankara KDE Project: http://www.kde.org www: http://www.cs.bilkent.edu.tr/~erayo Malfunction: http://mp3.com/ariza GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* RE: [Caml-list] Efficiency of 'a list 2003-05-02 19:27 [Caml-list] Efficiency of 'a list Eray Ozkural @ 2003-05-03 5:43 ` Mattias Waldau 2003-05-03 8:16 ` Ville-Pertti Keinonen ` (2 more replies) 2003-05-03 21:13 ` Lauri Alanko 1 sibling, 3 replies; 42+ messages in thread From: Mattias Waldau @ 2003-05-03 5:43 UTC (permalink / raw) To: erayo, 'Ocaml Mailing List' Ocaml lists are as efficient or inefficient as Lisp lists. Lists are only good for prototyping. If you need to use repeated appends (or @), then lists are inefficient and there are better solutions. If you do stuff like List.mem, your program will behave very badly if the lists get longer than 100 elements. The only legitimate use of lists is for collecting data using a simple cons and then looping over it. (If you want efficiency.) Normally, my new programs starts out with lists, however after a while they are all replaced by more efficient data structures like arrays or sets. Many modern programming languages (JavaScript, Perl, PHP) have arrays that can take arbitrary keys as an index. This makes many people use them all the time, and it makes the programs resonable efficient, since people do not loop all the time. I think more conventional languages like Java and Ocaml could learn from this and introduce more advanced data structures as primitives, for example replace lists by sets, and let arrays take arbitrary data types as index. This would automatically improve the O-behavior of the programs, ie. make them more scalable. /mattias > -----Original Message----- > From: owner-caml-list@pauillac.inria.fr > [mailto:owner-caml-list@pauillac.inria.fr] On Behalf Of Eray Ozkural > Sent: den 2 maj 2003 21:27 > To: Ocaml Mailing List > Subject: [Caml-list] Efficiency of 'a list > > > Hi there, > > In my maniacal pursuit of efficiency I figured that I don't > truly understand > the performance of ocaml lists. > > Could somebody please point to an explanation of ocaml linked list > implementation or summarize its performance characteristics? > This might seem > like a trivial question but having used many functional > languages I know that > it's easy to commit performance genocide using linked lists. > > For instance, a naive implementation of an optimal comparison sorting > algorithm in LISP almost invariably results in an O(n^2logn) > routine :) > > Therefore, it would be a good start to explain whether ocaml > lists are in fact > LISP lists and if not in what aspects they differ. > > The motivation for this question comes from trying to > understand the use of > linked lists in an efficient algorithm, such as graph > algorithms (say we are > implementing topological sort) > > Assume I'm using the following structure that is far from > handsome: type x = (int list) array > > Let a's type be x. Consider codes as the following > a.(i) <- a.(i) @ [x;y;z] > a.(i) <- [x] :: a.(i) > > What travesty results in execution of such codes with i > coming from an > arbitrary sequence? Do using such constructs result in unholy > incarnations of > space leaks or gross inefficiencies? > > Another question, does ocaml make any effort to place members > of a list close > to each other? Or, more naturally, the list element is > allocated using a > global model and then simply linked inside the structure? > > These questions may sound weird but I'm hoping it will make > sense to somebody! > > Regards, > > -- > Eray Ozkural (exa) <erayo@cs.bilkent.edu.tr> > Comp. Sci. Dept., Bilkent University, Ankara KDE Project: http://www.kde.org www: http://www.cs.bilkent.edu.tr/~erayo Malfunction: http://mp3.com/ariza GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Efficiency of 'a list 2003-05-03 5:43 ` Mattias Waldau @ 2003-05-03 8:16 ` Ville-Pertti Keinonen 2003-05-03 14:12 ` Vitaly Lugovsky 2003-05-03 20:03 ` Eray Ozkural 2 siblings, 0 replies; 42+ messages in thread From: Ville-Pertti Keinonen @ 2003-05-03 8:16 UTC (permalink / raw) To: Mattias Waldau; +Cc: erayo, 'Ocaml Mailing List' > Many modern programming languages (JavaScript, Perl, PHP) have arrays > that can take arbitrary keys as an index. This makes many people use > them all the time, and it makes the programs resonable efficient, since > people do not loop all the time. They aren't arrays, you most likely mean maps aka hashes aka associative arrays, which are an entirely different type. > I think more conventional languages like Java and Ocaml could learn > from > this and introduce more advanced data structures as primitives, for > example replace lists by sets, and let arrays take arbitrary data types One big difference here is that the languages providing high-level primitive datatypes are much higher level languages, and they are dynamically typed. They need higher level primitive types because you can't define your own types efficiently enough. Java and OCaml are statically typed, which has significant advantages in terms of performance and compile-time verification (although Java throws away much of that advantage by designing part of the language as if it were dynamically typed, forcing you to widen and narrow types for storage - the worst of both worlds, in many respects). Giving up lists or arrays is a very bad idea. An array has constant-time lookup, unlike a map. A list can be constructed in O(n) time (by consing up), unlike a set. Neither is an appropriate data type to use for a collection from which you want to frequently search based on a key, but they are useful and efficient when used correctly. Primitive 'a set and ('a, 'b) map types in OCaml would certainly be possible, but as far as I can tell, the only advantages a primitive type would have over a library type would be the ability to construct an instance literally (construction is already easy using List/Array.fold_left over a list/array of items or tuples) and to deconstruct using pattern matching (whatever that would even mean - in any case, it would be inconsistent with what pattern matching currently does). In Java, primitive sets and maps would perhaps have a bit more of an advantage, since currently library types are significantly weaker than primitive types (primitive arrays are the only parametrized type in Java). But I think introducing more primitive types is the wrong solution (it would only make the Java type system even more inconsistent than it already is). The right solution is to fix the language to make library types more powerful. Introducing generics is one way to do this, and apparently what is being done. > as index. This would automatically improve the O-behavior of the > programs, ie. make them more scalable. No, it would only improve the behavior of poorly written programs, and it would make make some programs perform worse. ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* RE: [Caml-list] Efficiency of 'a list 2003-05-03 5:43 ` Mattias Waldau 2003-05-03 8:16 ` Ville-Pertti Keinonen @ 2003-05-03 14:12 ` Vitaly Lugovsky 2003-05-03 18:43 ` Mattias Waldau 2003-05-03 20:03 ` Eray Ozkural 2 siblings, 1 reply; 42+ messages in thread From: Vitaly Lugovsky @ 2003-05-03 14:12 UTC (permalink / raw) To: Mattias Waldau; +Cc: erayo, 'Ocaml Mailing List' On Sat, 3 May 2003, Mattias Waldau wrote: > I think more conventional languages like Java and Ocaml could > learn from > this and introduce more advanced data structures as > primitives, for > example replace lists by sets, and let arrays take arbitrary > data types > as index. This would automatically improve the O-behavior of > the > programs, ie. make them more scalable. OCaml already have Hashtbl implementation. ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* RE: [Caml-list] Efficiency of 'a list 2003-05-03 14:12 ` Vitaly Lugovsky @ 2003-05-03 18:43 ` Mattias Waldau 2003-05-03 20:01 ` Eray Ozkural ` (2 more replies) 0 siblings, 3 replies; 42+ messages in thread From: Mattias Waldau @ 2003-05-03 18:43 UTC (permalink / raw) To: 'Vitaly Lugovsky'; +Cc: erayo, 'Ocaml Mailing List' *** Do not use lists, there is always a better datastructure *** > From: Vitaly Lugovsky [mailto:vsl@ontil.ihep.su] > OCaml already have Hashtbl implementation. I know Ocaml has hash tables. Ocaml has ALL the datastructures that are needed to create efficient programs. However, lists are the data structure that it easiest to use, since there is special syntax for it, and therefor many novices use it. The result of this is that you get all these questions in this forum complaining about performance. Most of the questions would never have been asked if the author would have used the correct datastructure, mostly Hash/Map or Set. The reason novices use them is that they think that since there is special syntax for lists, this must be the preferred way. We all know that it isn't the preferred solution. I have been doing pure programming since MACLISP on the TOPS-20, and the a large percentage of performance problems can be traced back to IMAPPROPRIATE USE OF LISTS. Therefor my previous post where I essentially say: *** Do not use lists, there is always a better datastructure *** ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Efficiency of 'a list 2003-05-03 18:43 ` Mattias Waldau @ 2003-05-03 20:01 ` Eray Ozkural 2003-05-03 23:17 ` Eray Ozkural 2003-05-04 2:08 ` cashin 2 siblings, 0 replies; 42+ messages in thread From: Eray Ozkural @ 2003-05-03 20:01 UTC (permalink / raw) To: Mattias Waldau, 'Vitaly Lugovsky'; +Cc: 'Ocaml Mailing List' On Saturday 03 May 2003 21:43, Mattias Waldau wrote: > The result of this is that you get all these questions in this forum > complaining about performance. Most of the questions would never have > been asked if the author would have used the correct datastructure, > mostly Hash/Map or Set. Mind you, I am not a novice of any sort and nobody really answered my questions in sufficient detail. I can design and implement a large number of advanced data structures and the implementations you refer to will not satisfy my needs. In particular I don't think I would ever use a straightforward hash table unless the problem presented opportunity for an admissable hash function. -- Eray Ozkural (exa) <erayo@cs.bilkent.edu.tr> Comp. Sci. Dept., Bilkent University, Ankara KDE Project: http://www.kde.org www: http://www.cs.bilkent.edu.tr/~erayo Malfunction: http://mp3.com/ariza GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Efficiency of 'a list 2003-05-03 18:43 ` Mattias Waldau 2003-05-03 20:01 ` Eray Ozkural @ 2003-05-03 23:17 ` Eray Ozkural 2003-05-04 2:08 ` cashin 2 siblings, 0 replies; 42+ messages in thread From: Eray Ozkural @ 2003-05-03 23:17 UTC (permalink / raw) To: Mattias Waldau, 'Vitaly Lugovsky'; +Cc: 'Ocaml Mailing List' Hi Mattias, On Saturday 03 May 2003 21:43, Mattias Waldau wrote: > I have been doing pure programming since MACLISP on the TOPS-20, and the > a large percentage of performance problems can be traced back to > IMAPPROPRIATE USE OF LISTS. Therefor my previous post where I > essentially say: > > *** Do not use lists, there is always a better datastructure *** I fully understand your concern. If you recall I said a similar thing: >For instance, a naive implementation of an optimal comparison sorting >algorithm in LISP almost invariably results in an O(n^2logn) routine :) That I had understood when we had been given an assignment to implement quicksort in our LISP course some years ago :) I had a quick look at how the other people did it, and realized that almost all of them had achieved this inappropriate complexity by using elt function.. However, you can of course implement quick sort (or some other comparison sorting like merge sort) using a linked list, the only problem would be that you can't do it in place (you can even implement randomized quicksort but you'd need two scans of the list instead of a single scan) That's why the programmer must understand how "integral" data structures interact in a PL, especially in one that encompasses both imperative and functional semantics. Of course he must in addition to that have a good understanding of architectural/implementation issues, otherwise he can never predict the performance of his program. Cheers, -- Eray Ozkural (exa) <erayo@cs.bilkent.edu.tr> Comp. Sci. Dept., Bilkent University, Ankara KDE Project: http://www.kde.org www: http://www.cs.bilkent.edu.tr/~erayo Malfunction: http://mp3.com/ariza GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Efficiency of 'a list 2003-05-03 18:43 ` Mattias Waldau 2003-05-03 20:01 ` Eray Ozkural 2003-05-03 23:17 ` Eray Ozkural @ 2003-05-04 2:08 ` cashin 2003-05-04 4:08 ` alc 2003-05-04 8:07 ` Ville-Pertti Keinonen 2 siblings, 2 replies; 42+ messages in thread From: cashin @ 2003-05-04 2:08 UTC (permalink / raw) To: Mattias Waldau; +Cc: 'Ocaml Mailing List' "Mattias Waldau" <mattias.waldau@abc.se> writes: > *** Do not use lists, there is always a better datastructure *** That's kind of silly. Sometimes lists are the natural datastructure to use. If you know what you're doing, use lists when appropriate! I suppose the reason nobody else has responded to this over-generalization is that it's really not specific to ocaml and so of questionable relevance here. However, I don't like thinking that people who aren't yet comfortable with data structures may be reading this suggestion and taking it literally. Whenever you've got a situation where access is always via straightforward iteration and modification is at the beginning or end of a sequence, it doesn't make sense to use a hash table or even a tree. When you don't know the size ahead of time, it doesn't make sense to use an array either. A list is just the right thing in that case. Witness the linux kernel, which uses lists when lists are the most natrural, efficient data structure for the task at hand. ... > The result of this is that you get all these questions in this forum > complaining about performance. Most of the questions would never > have been asked if the author would have used the correct > datastructure, mostly Hash/Map or Set. The solution is not to compound the confusion by saying, "don't use lists". It would be more helpful to point to resources that describe what data structures to use under different circumstances. -- --Ed L Cashin PGP public key: http://noserose.net/e/pgp/ ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Efficiency of 'a list 2003-05-04 2:08 ` cashin @ 2003-05-04 4:08 ` alc 2003-05-04 5:32 ` Ed L Cashin ` (3 more replies) 2003-05-04 8:07 ` Ville-Pertti Keinonen 1 sibling, 4 replies; 42+ messages in thread From: alc @ 2003-05-04 4:08 UTC (permalink / raw) Cc: caml Mailing List' cashin@cs.uga.edu wrote: > > Whenever you've got a situation where access is always via > straightforward iteration and modification is at the beginning or end > of a sequence, it doesn't make sense to use a hash table or even a > tree. When you don't know the size ahead of time, it doesn't make > sense to use an array either. A list is just the right thing in that > case. > I've done a little timing of things, and according to my results: If you care about efficiency and use OCaml, you should use lists fairly often, ie if you are always looping and accessing the elements in order. OCaml can iterate through a list (recursively) about twice as fast as it can iterate through an array. It can iterate through a list about as fast as or maybe even a little faster than C or C++ can iterate through an array. Al ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Efficiency of 'a list 2003-05-04 4:08 ` alc @ 2003-05-04 5:32 ` Ed L Cashin 2003-05-04 6:46 ` [Caml-list] Two types of efficiency (Was Efficiency of 'a list) Mattias Waldau ` (2 subsequent siblings) 3 siblings, 0 replies; 42+ messages in thread From: Ed L Cashin @ 2003-05-04 5:32 UTC (permalink / raw) To: alc; +Cc: caml Mailing List' alc@PublicPropertySoftware.com writes: > cashin@cs.uga.edu wrote: >> >> Whenever you've got a situation where access is always via >> straightforward iteration and modification is at the beginning or end >> of a sequence, it doesn't make sense to use a hash table or even a >> tree. When you don't know the size ahead of time, it doesn't make >> sense to use an array either. A list is just the right thing in that >> case. > > I've done a little timing of things, and according to my results: > If you care about efficiency and use OCaml, you should use lists > fairly often, ie if you are always looping and accessing the elements > in order. OCaml can iterate through a list (recursively) about twice as > fast as it can iterate through an array. It can iterate through a > list about as fast as or maybe even a little faster than C or C++ can > iterate through an array. Huh. I wonder why. I had a couple of guesses, but I don't know my way around the ocaml sources, so I couldn't follow up on them (without spending a bit more time than I have ;). -- --Ed L Cashin PGP public key: http://noserose.net/e/pgp/ ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* [Caml-list] Two types of efficiency (Was Efficiency of 'a list) 2003-05-04 4:08 ` alc 2003-05-04 5:32 ` Ed L Cashin @ 2003-05-04 6:46 ` Mattias Waldau 2003-05-04 7:35 ` John Max Skaller ` (3 more replies) 2003-05-04 7:55 ` [Caml-list] Efficiency of 'a list Ville-Pertti Keinonen 2003-05-04 12:38 ` Eray Ozkural 3 siblings, 4 replies; 42+ messages in thread From: Mattias Waldau @ 2003-05-04 6:46 UTC (permalink / raw) To: alc; +Cc: 'caml Mailing List'' > alc@PublicPropertySoftware.com > I've done a little timing of things, and according to my > results: If you care about efficiency and use OCaml, you > should use lists > fairly often, ie if you are always looping and accessing the > elements in order. OCaml can iterate through a list We are talking about two kinds of efficiency: 1. Absolute speed for mostly small benchmarks 2. Scalable programs, i.e. they work fine for input of length 100, but goes on forever for input of length 1000. Scalability is the one that I am talking about, and is the one that mostly gets bitten by inappropriate use of lists. For me scalability is the most important. If you use Map/Hash/Set small benchmarks will be slower, however your programs are more likely to be scalable. Languages like PHP have more advanced built-in datastructures with language syntax to support them. In Ocaml, we can create the same program, however, there is no syntax support, and we have to add declarations like module StringSet = Set.Make(struct type t = string let compare = compare end) ;; It would be nice if the typechecker could deduce the type of the set and add the above declaration automatically to the program. That would make it easier for beginners to use the advanced datastructures of Ocaml. ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Two types of efficiency (Was Efficiency of 'a list) 2003-05-04 6:46 ` [Caml-list] Two types of efficiency (Was Efficiency of 'a list) Mattias Waldau @ 2003-05-04 7:35 ` John Max Skaller 2003-05-04 11:52 ` Olivier Andrieu 2003-05-04 16:48 ` brogoff 2003-05-04 7:43 ` Ville-Pertti Keinonen ` (2 subsequent siblings) 3 siblings, 2 replies; 42+ messages in thread From: John Max Skaller @ 2003-05-04 7:35 UTC (permalink / raw) To: Mattias Waldau; +Cc: alc, 'caml Mailing List'' Mattias Waldau wrote: > > Languages like PHP have more advanced built-in datastructures with > language syntax to support them. In Ocaml, we can create the same > program, however, there is no syntax support, and we have to add > declarations like > > module StringSet = Set.Make(struct > type t = string > let compare = compare > end) ;; > > It would be nice if the typechecker could deduce the type of the set and > add the above declaration automatically to the program. That would make > it easier for beginners to use the advanced datastructures of Ocaml. The problem is worse than that. There are places you need such an animal but CANNOT have it. This occurs when there is type recursion involved and the lack of orthogonality in Ocaml syntax prevents you actually declaring the functor instance at the point you need it. This occurs in a class where for example you have a set of references to that class passed to one of the methods of the class: the functor instantiation requires the class type to be complete, but it cannot be completed until the functor is instantiated. [You need to declare the functor inside the class but before the object] Result: dont use functors, they're not properly supported. I found this really annoying. It applies in other parts of the language too, for example polymorphic variants. Here again, the very feature functional languages are supposed to understand best -- recursion -- is actually handled better in C++. I need variants whose arguments include the variant type, and I can't have them together with subtyping. Yet for someone representing a programming language recursion is natural. type x = [`A | `B of y] and y = [x | `C of x] There is no problem doing this with textual substitution: type x = [`A | `B of y] and y = [`A | `B of y | `C of x] Similarly, subtyping of polymorphic variants is invariant in the arguments, when covariance is sound and more useful. This is a real pain for me, since it destroys the main use I hoped to obtain for polymorphic variants: type x = [`A | `B of x] type y = [`A | `B of y | `C] x *is* a subtype of y, but the type system disallows it, failing to recognise that every `B x of type x is also a `B y of y. Typically I have say a term for an expression and I want to eliminate some cases by say a term reduction, but I can't restrict the resultant type as desired because there's no way to 'narrow' the type recursion. So here are two cases where (a) syntax and (b) loss of semantic expressivity reduce the utility of the language. Luckily, the ocaml team is working on such things: local modules, for example, help considerably. Its somewhat unfortunate that the syntactic constraints often make a clean solution difficult, for example: type x = ... and class .. woops: you cant inter-recurse classes and types. You have to use a trick: make the class parametric, then inter-recurse the type x and the instantiation passing x as the argument: ugg. The restriction appears to be entirely a matter of syntax: to distinguish classes and types you'd need the 'type' and 'class' keyword on each conjunct which breaks: type x = .. and y = .. It would appear that the diverse nature of things to be declared indicates this syntax, along with let x = .. and y = .. is in fact wrong: better to have required: let type x = .. and val y = .. and class z = .. As we see in C++, historically sound syntax soon breaks with language extensions :( -- John Max Skaller, mailto:skaller@ozemail.com.au snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia. voice:61-2-9660-0850 ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Two types of efficiency (Was Efficiency of 'a list) 2003-05-04 7:35 ` John Max Skaller @ 2003-05-04 11:52 ` Olivier Andrieu 2003-05-05 11:04 ` John Max Skaller 2003-05-04 16:48 ` brogoff 1 sibling, 1 reply; 42+ messages in thread From: Olivier Andrieu @ 2003-05-04 11:52 UTC (permalink / raw) To: John Max Skaller; +Cc: caml-list John Max Skaller [Sunday 4 May 2003] : > Similarly, subtyping of polymorphic variants is invariant > in the arguments, when covariance is sound and more useful. > This is a real pain for me, since it destroys the main use > I hoped to obtain for polymorphic variants: > > type x = [`A | `B of x] > type y = [`A | `B of y | `C] > > x *is* a subtype of y, but the type system > disallows it, failing to recognise that every `B x > of type x is also a `B y of y. Typically I have > say a term for an expression and I want to eliminate > some cases by say a term reduction, but I can't restrict > the resultant type as desired because there's no way > to 'narrow' the type recursion. # type x = [`A | `B of x] ;; type x = [ `A | `B of x] # type y = [`A | `B of y | `C] ;; type y = [ `A | `B of y | `C] # fun a -> (a :x :> y) ;; - : x -> y = <fun> Doesn't that mean that x is a subtype of y ? -- Olivier ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Two types of efficiency (Was Efficiency of 'a list) 2003-05-04 11:52 ` Olivier Andrieu @ 2003-05-05 11:04 ` John Max Skaller 0 siblings, 0 replies; 42+ messages in thread From: John Max Skaller @ 2003-05-05 11:04 UTC (permalink / raw) To: Olivier Andrieu; +Cc: caml-list Olivier Andrieu wrote: > John Max Skaller [Sunday 4 May 2003] : > > Similarly, subtyping of polymorphic variants is invariant > > in the arguments, when covariance is sound and more useful. > > This is a real pain for me, since it destroys the main use > > I hoped to obtain for polymorphic variants: > > > > type x = [`A | `B of x] > > type y = [`A | `B of y | `C] > > > > x *is* a subtype of y, but the type system > > disallows it, failing to recognise that every `B x > > of type x is also a `B y of y. Typically I have > > say a term for an expression and I want to eliminate > > some cases by say a term reduction, but I can't restrict > > the resultant type as desired because there's no way > > to 'narrow' the type recursion. > > # type x = [`A | `B of x] ;; > type x = [ `A | `B of x] > # type y = [`A | `B of y | `C] ;; > type y = [ `A | `B of y | `C] > # fun a -> (a :x :> y) ;; > - : x -> y = <fun> > > Doesn't that mean that x is a subtype of y ? Indeed. Hmm .. has this changed since 3.04? [I just installed the CVS version which is 3.06+31 (2003/5/2)] Let me try a more comprehensible example. type expr1 = [`Prim | `Add of expr1 * expr1 | `Plus of expr1 * expr1] type expr2 = [`Prim | `Add of expr2 * expr2] let lift e = (e : expr2 :> expr1) ;; let rec r (e:expr1): expr2 = match e with | `Prim -> `Prim | `Add (a,b) -> `Add (r a,r b) | `Plus (a,b) -> `Add (r a, r b) ;; And it works. It didn't before! So thx for pointing this out, and thx to the ocaml team for doing this work. Now I'll go off and try to use it. It will simplify my code a lot: at present I have a lot of cases I know won't occur, because they've been "reduced out" by a prior term rewriting routine, but the typing didn't allow this to be expressed. A lot of nasty | _ -> assert false terms will be eliminated: getting rid of wildcard matches should improve compile time error diagnosis considerably. Kind of a pity I can't write: type expr1 = [expr2 | `Plus expr1 * expr1] but it would lift the wrong recursion out. -- John Max Skaller, mailto:skaller@ozemail.com.au snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia. voice:61-2-9660-0850 ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Two types of efficiency (Was Efficiency of 'a list) 2003-05-04 7:35 ` John Max Skaller 2003-05-04 11:52 ` Olivier Andrieu @ 2003-05-04 16:48 ` brogoff 1 sibling, 0 replies; 42+ messages in thread From: brogoff @ 2003-05-04 16:48 UTC (permalink / raw) To: John Max Skaller; +Cc: Mattias Waldau, alc, 'caml Mailing List'' On Sun, 4 May 2003, John Max Skaller wrote: > Mattias Waldau wrote: > > Languages like PHP have more advanced built-in datastructures with > > language syntax to support them. In Ocaml, we can create the same > > program, however, there is no syntax support, and we have to add > > declarations like > > > > module StringSet = Set.Make(struct > > type t = string > > let compare = compare > > end) ;; > > > > It would be nice if the typechecker could deduce the type of the set and > > add the above declaration automatically to the program. That would make > > it easier for beginners to use the advanced datastructures of Ocaml. As someone else points out, you have the source, you can make polymorphic versions of sets/maps/... by coding polymorphic versions of the source and instantiating with modules based on polymorphic comparisons. > The problem is worse than that. There are places you need such > an animal but CANNOT have it. This occurs when there is type > recursion involved and the lack of orthogonality in Ocaml syntax > prevents you actually declaring the functor instance > at the point you need it. It isn't a syntax problem. I agree that the lack of recursion in the module system is a big problem that needs to be fixed (pun intended ;-) but I think you (and, more importantly, the FAQ maintainer) should mention that there ARE workarounds to these problems. There was a recent discussion on this topic here (and in comp.lang.ml, a 2 year long thread!) where you can see them. Briefly, the polymorphic sets I just mentioned can be used to build simple recursive structures using the parameterization trick at the level of types (i.e., a type variable is used to "defer" the instantiation and allow you to tie the knot. To build more complex ones, you need to recode Set (or whatever functor) to be a functor on an "open" ordered type; the comparison function must be parameterized so that you can close the recursion later. So, this is the parameterization trick allowed the level of types and functions simultaneously. Jean Claude Filliatre showed that one on his message on this topic. I haven't tried it with classes, so YMMV. The only minor annoyance with that trick is that it requires some textual duplication on account of the scope rules for "with type" constraints. This is annoying, and I think, syntactic. > Result: dont use functors, they're not properly > supported. Nonsense. There are places where there can be improvements, but this is as whacky as the "don't use lists" assertion. Why are you using classes? Are they properly supported? > I found this really annoying. It applies in > other parts of the language too, for example > polymorphic variants. Here again, the very > feature functional languages are supposed to > understand best -- recursion -- is actually > handled better in C++. Get outta here! > I need variants whose arguments include the variant > > type, and I can't have them together with subtyping. > Yet for someone representing > a programming language recursion is natural. > > type x = [`A | `B of y] > and y = [x | `C of x] > > There is no problem doing this with textual substitution: > > type x = [`A | `B of y] > and y = [`A | `B of y | `C of x] Yup, that's annoying. I was nailed by the same problem recently, and I used polymorphism to avoid having to write the tags more than once. Something like this type 'a x' = [`A | `B of 'a] type x = y x' and y = [y x' | `C of x] or something like that, I don't recall. Maybe Jacques will shed some light on this restriction. Someone else addresses your subtyping issue. I have to admit that I have found polymorphic variants puzzling at times, but the additional power is amazing and allows you to get at the extensibility of OO and beyond. Look up "The Expression Problem" on Google and see if you can find a discussion on the defunct Java Genericity mailing list. Or just take a look at Jacques Garrigues little paper on code reuse with variants. I'm using the same approach to model a different domain than lambda calculus evaluators, and while the (algorithmic) complexity of that domain makes the evaluator and types a bit more complex, the basic ideas translate easily. What a cool language! -- Brian ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Two types of efficiency (Was Efficiency of 'a list) 2003-05-04 6:46 ` [Caml-list] Two types of efficiency (Was Efficiency of 'a list) Mattias Waldau 2003-05-04 7:35 ` John Max Skaller @ 2003-05-04 7:43 ` Ville-Pertti Keinonen 2003-05-04 12:50 ` Eray Ozkural 2003-05-04 12:48 ` Eray Ozkural 2003-05-05 7:31 ` Diego Olivier Fernandez Pons 3 siblings, 1 reply; 42+ messages in thread From: Ville-Pertti Keinonen @ 2003-05-04 7:43 UTC (permalink / raw) To: Mattias Waldau; +Cc: alc, 'caml Mailing List'' On Sunday, May 4, 2003, at 09:46 Europe/Helsinki, Mattias Waldau wrote: > We are talking about two kinds of efficiency: > 1. Absolute speed for mostly small benchmarks > 2. Scalable programs, i.e. they work fine for input of length 100, > but goes on forever for input of length 1000. No. What you suggested (replace lists with sets, replace arrays with maps) would in many places be trading O(1) behavior for O(log n) behavior, which certainly doesn't make programs more scalable. > It would be nice if the typechecker could deduce the type of the set > and > add the above declaration automatically to the program. That would make > it easier for beginners to use the advanced datastructures of Ocaml. Implementing 'a set and ('a, 'b) map types in OCaml is trivial; in the OCaml library, for some reason ('a, 'b) Hashtbl.t is available but Set and Map are only provided through functors. ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Two types of efficiency (Was Efficiency of 'a list) 2003-05-04 7:43 ` Ville-Pertti Keinonen @ 2003-05-04 12:50 ` Eray Ozkural 0 siblings, 0 replies; 42+ messages in thread From: Eray Ozkural @ 2003-05-04 12:50 UTC (permalink / raw) To: Ville-Pertti Keinonen, Mattias Waldau Cc: alc, 'caml Mailing List'' On Sunday 04 May 2003 10:43, Ville-Pertti Keinonen wrote: > No. What you suggested (replace lists with sets, replace arrays with > maps) would in many places be trading O(1) behavior for O(log n) > behavior, which certainly doesn't make programs more scalable. No, it doesn't make sense. It would effectively turn ocaml into an inefficient toy language like those "high level languages" that he was talking about. NO DATA STRUCTURE IS SUITABLE FOR ALL TASKS Take it from the structure monster, Cheers, -- Eray Ozkural (exa) <erayo@cs.bilkent.edu.tr> Comp. Sci. Dept., Bilkent University, Ankara KDE Project: http://www.kde.org www: http://www.cs.bilkent.edu.tr/~erayo Malfunction: http://mp3.com/ariza GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Two types of efficiency (Was Efficiency of 'a list) 2003-05-04 6:46 ` [Caml-list] Two types of efficiency (Was Efficiency of 'a list) Mattias Waldau 2003-05-04 7:35 ` John Max Skaller 2003-05-04 7:43 ` Ville-Pertti Keinonen @ 2003-05-04 12:48 ` Eray Ozkural 2003-05-05 7:31 ` Diego Olivier Fernandez Pons 3 siblings, 0 replies; 42+ messages in thread From: Eray Ozkural @ 2003-05-04 12:48 UTC (permalink / raw) To: Mattias Waldau, alc; +Cc: 'caml Mailing List'' On Sunday 04 May 2003 09:46, Mattias Waldau wrote: > It would be nice if the typechecker could deduce the type of the set and > add the above declaration automatically to the program. That would make > it easier for beginners to use the advanced datastructures of Ocaml. Nice in theory but doesn't really alleviate the problem. Why? Because a novice can always shoot himself in the foot. Overuse of any datastructure is overkill. If I use the cumbersome R-B trees everywhere, it will not guarantee my program to be efficient. Somebody who doesn't understand asymptotic complexity will use things like a map from strings to strings. You almost never use strings as keys in a tree. You use tries or trie hashing, etc. In KDE source code, for instance, I've seen a lot of instances of such lousy data structures... Just to point out that there is no such thing as the ultimate index structure that works with any kind of key and any kind of records. Therefore attaching syntactic significance to one particular data structure might be highly confusing for programmers. I don't recommend it. What I would recommend would be a more abstract way of doing the same thing. let A = { 3, 5, 7 } It does look like such a statement has its merits! But yet, one must explicitly state what kind of a data structure he really wants somehow. Maybe let A = { 3, 5, 7 } : MyCuteSet BTW, a really abstract set data type does *not* assume a total ordering among elements. That's because I might want to implement multi-sets as unordered lists! Those kinds of things depend on the application. I really don't know how one would solve this in a static syntaxed language! Cheers, -- Eray Ozkural (exa) <erayo@cs.bilkent.edu.tr> Comp. Sci. Dept., Bilkent University, Ankara KDE Project: http://www.kde.org www: http://www.cs.bilkent.edu.tr/~erayo Malfunction: http://mp3.com/ariza GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Two types of efficiency (Was Efficiency of 'a list) 2003-05-04 6:46 ` [Caml-list] Two types of efficiency (Was Efficiency of 'a list) Mattias Waldau ` (2 preceding siblings ...) 2003-05-04 12:48 ` Eray Ozkural @ 2003-05-05 7:31 ` Diego Olivier Fernandez Pons 2003-05-05 11:11 ` Mattias Waldau ` (2 more replies) 3 siblings, 3 replies; 42+ messages in thread From: Diego Olivier Fernandez Pons @ 2003-05-05 7:31 UTC (permalink / raw) To: Mattias Waldau; +Cc: caml-list Bonjour, > Languages like PHP have more advanced built-in datastructures with > language syntax to support them. In Ocaml, we can create the same > program, however, there is no syntax support, and we have to add > declarations like > > module StringSet = Set.Make(struct > type t = string > let compare = compare > end) ;; > You should not use a set of strings but a trie. There are of course a lot of ways to implement tries : - trees of lists - trees of arrays (or an optimized version with strings) - ternary trees (balanced or not) You may even minimize your acyclic automaton using any of the following algorithms : - 2 phases algorithm (cf Dominique Revuz PhD thesis) - Incremental minimization (Incremental construction of a minimal acyclic finite state automata, Daciuk, Milhov, B. Watson, R. Watson, Computer linguistics volume 26 number 1 mars 2000) - minimization by Brzozowki's algorithm - minimization by Hopcroft's algorithm For which one of these versions should Caml provide build-in support ? Diego Olivier ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* RE: [Caml-list] Two types of efficiency (Was Efficiency of 'a list) 2003-05-05 7:31 ` Diego Olivier Fernandez Pons @ 2003-05-05 11:11 ` Mattias Waldau 2003-05-05 13:17 ` John Max Skaller 2003-05-05 11:49 ` Eray Ozkural 2003-05-05 11:57 ` Yaron M. Minsky 2 siblings, 1 reply; 42+ messages in thread From: Mattias Waldau @ 2003-05-05 11:11 UTC (permalink / raw) To: 'Diego Olivier Fernandez Pons'; +Cc: caml-list > You should not use a set of strings but a trie. There are of > course a lot of ways to implement tries : > > - trees of lists > - trees of arrays (or an optimized version with strings) > - ternary trees (balanced or not) > > You may even minimize your acyclic automaton using any of the > following algorithms : > - 2 phases algorithm (cf Dominique Revuz PhD thesis) > - Incremental minimization (Incremental construction of a > minimal acyclic finite state automata, Daciuk, Milhov, B. > Watson, R. Watson, Computer linguistics volume 26 number 1 mars 2000) > - minimization by Brzozowki's algorithm > - minimization by Hopcroft's algorithm > > For which one of these versions should Caml provide build-in support ? > > Diego Olivier > Hi Diego, yes, there are many datastructures to choose from, and all have their advantages. Of course, a built-in support solution will not be optimal, but it will definitely be better than a unordered list of strings :-) As a programming language designer, you tell the users the preferred way to do things, and for example almost all pure languages that has been born on a university (LISP, Prolog, SML, Ocaml, ....) use lists as a primary datastructure. I have never understood this love of list, since the only efficient use of a list is as a stack which can be iterated over. Prolog have a little advantage where append is an O(1)-operation instead of an O(n), however searching is still O(n). It would be interesting to see how a typed scripting language (maybe built on top of ocaml) with advanced built-in datastructures (with syntax support) would look like. Cheers, Mattias ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Two types of efficiency (Was Efficiency of 'a list) 2003-05-05 11:11 ` Mattias Waldau @ 2003-05-05 13:17 ` John Max Skaller 0 siblings, 0 replies; 42+ messages in thread From: John Max Skaller @ 2003-05-05 13:17 UTC (permalink / raw) To: Mattias Waldau; +Cc: 'Diego Olivier Fernandez Pons', caml-list Mattias Waldau wrote: > It would be interesting to see how a typed scripting language (maybe > built on top of ocaml) with advanced built-in datastructures (with > syntax support) would look like. There are two things you can look at: (1) FISh 2 when it finally comes out, will support general polynomial data structures with functorial polymorphism: at the moment only FISh 1 is available but gives the flavour: http://www-staff.it.uts.edu.au/~cbj/FISh/ (2) Felix: http://felix.sf.net isn't nearly as ambitious, indeed, its a stock Algol like language with a couple of twists which includes heavy support for purely functional programming with eager evaluation. At the moment, the only 'advanced' data structure built-in to the language is the ability to build O(n) regular expression matching used like: regmatch string_expr with | regexp1 => ... | regexp2 => .. [and there is some hackery here ..] Contrary to providing 'built-in' advanced data structures, Felix is exactly the opposite: it has no built-in data types at all. Instead, it provides powerful data constructors such as anonymous sums and products, as well as functions of course. These are strong enough to construct bool in the standard library. In addition, it provides a way to lift data types from C/C++. For example: type int = "int"; fun add: int * int -> int = "$1 + $2"; Each of these "user defined primitives" is defined in C/C++ and provided as an abstract data type, accessed by user specified primitive functions (like "add" above). Primitives can now also be generic: type vector[t] = "std::vector<?1>"; I'm explaining all this because in my opinion, providing some fixed "efficient/advanced" data structure "built-in" to the language is in fact suboptimal, and unnecessary. The trick, instead, is to provide CORE functionality which can be used to conveniently construct and access such data structures. For example you can write: s[1 to 20] to subscript a string. It's just syntactic sugar for: subscript(s,1,20) and subscript is just a function: for a given string like type, the user has to define it. Some of the other features that make Felix friendly include an advanced overloading scheme which fixes many of the problems in the C++ mechanism (and contrary to GCaml, is even more "open" than the C++ mechanism) The most advanced feature "builtin" is, in fact, not a data structure but a control structure: Felix performs blocking reads on a central message queue by yielding control (without using hardware threading). [I call the suspended states continuations, but technically I think they're resumptions] It is likely I will provided synactic sugar for some other concepts such as iterators. What's important in scripting languages, apart from automatic storage management, is that the syntax be convenient. There is nothing in Python dictionaries or Perl hashes that can't be done in C++ or Ocaml: the primary advantage is NOT that these data structures are built in. What's important is the easy use which specific syntax provides. Felix will try to provide the convenience without actually building anything in: instead, it provides a way to bind the syntax to user defined types. Operator overloading and the like are the simplest and easiest way to do this, although it's painful compared with what FISh 2 seeks to do: allow functorially polymorphic algorithms to be written. I have to recommend consideration of the protocols definitions in Python, and the container and iterator definitions in the C++ Standard. Both of these systems handle, in a different way, the notion of Sequence and Associative Container fairly well. Similarly, SQL embodies well the notions of relational algebra (and in some sense is a generalisation of sequences and associative containers). Its clear, however, that no one has much idea what to do with trees and graphs (yacc like parser tools are the closest thing to tree management syntaxes, and they're usual extrinsic rather than embedded in the language). In the end, convenient use of advanced data structures -- and i don't mean simply sequences and associative containers -- depends on theoreticians developing the right general calculi to deal with them in a compact way. Only then can language designers provide generalised syntax that goes beyond overloading and stuff like address ["fred"] = "20 Smith St Paris" -- John Max Skaller, mailto:skaller@ozemail.com.au snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia. voice:61-2-9660-0850 ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Two types of efficiency (Was Efficiency of 'a list) 2003-05-05 7:31 ` Diego Olivier Fernandez Pons 2003-05-05 11:11 ` Mattias Waldau @ 2003-05-05 11:49 ` Eray Ozkural 2003-05-05 11:57 ` Yaron M. Minsky 2 siblings, 0 replies; 42+ messages in thread From: Eray Ozkural @ 2003-05-05 11:49 UTC (permalink / raw) To: Diego Olivier Fernandez Pons, Mattias Waldau; +Cc: caml-list On Monday 05 May 2003 10:31, Diego Olivier Fernandez Pons wrote: > - Incremental minimization (Incremental construction of a minimal > acyclic finite state automata, Daciuk, Milhov, B. Watson, R. Watson, > Computer linguistics volume 26 number 1 mars 2000) That's a really cool algorithm, I was going to implement it one day :) -- Eray Ozkural (exa) <erayo@cs.bilkent.edu.tr> Comp. Sci. Dept., Bilkent University, Ankara KDE Project: http://www.kde.org www: http://www.cs.bilkent.edu.tr/~erayo Malfunction: http://mp3.com/ariza GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Two types of efficiency (Was Efficiency of 'a list) 2003-05-05 7:31 ` Diego Olivier Fernandez Pons 2003-05-05 11:11 ` Mattias Waldau 2003-05-05 11:49 ` Eray Ozkural @ 2003-05-05 11:57 ` Yaron M. Minsky 2003-05-05 13:32 ` John Max Skaller 2003-05-05 16:38 ` Diego Olivier Fernandez Pons 2 siblings, 2 replies; 42+ messages in thread From: Yaron M. Minsky @ 2003-05-05 11:57 UTC (permalink / raw) To: Caml List You're kidding, right? You're making a classic "best is enemy of the good" mistake here. Yes, there are lots of implementations, and it's not clear which of them is absolutely optimal. That doesn't mean ocaml shouldn't provide built-in support for one "good-enough" solution. Such support doesn't preclude using whatever the optimal algorithm is for the situation. But most of the time, it works fine, and having built in support improves the usability of the language greatly. I for one have never quite understood why the Set and Map modules only provide modular implementations, and why the API is relatively weak. My solution, and I'm sure that of many others, has been to write their own polymorphic set and map data structures. Luckly, ocaml makes that easy. But it seems clear to me that such data structures belong in the standard library. And some syntax support would be nice to have as part of this standard as well. y On Mon, 2003-05-05 at 03:31, Diego Olivier Fernandez Pons wrote: > You should not use a set of strings but a trie. There are of course a > lot of ways to implement tries : > > - trees of lists > - trees of arrays (or an optimized version with strings) > - ternary trees (balanced or not) > > You may even minimize your acyclic automaton using any of the > following algorithms : > - 2 phases algorithm (cf Dominique Revuz PhD thesis) > - Incremental minimization (Incremental construction of a minimal > acyclic finite state automata, Daciuk, Milhov, B. Watson, R. Watson, > Computer linguistics volume 26 number 1 mars 2000) > - minimization by Brzozowki's algorithm > - minimization by Hopcroft's algorithm > > For which one of these versions should Caml provide build-in support ? > > Diego Olivier > > ------------------- > To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr > Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners -- |--------/ Yaron M. Minsky \--------| |--------\ http://www.cs.cornell.edu/home/yminsky/ /--------| Open PGP --- KeyID B1FFD916 (new key as of Dec 4th) Fingerprint: 5BF6 83E1 0CE3 1043 95D8 F8D5 9F12 B3A9 B1FF D916 ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Two types of efficiency (Was Efficiency of 'a list) 2003-05-05 11:57 ` Yaron M. Minsky @ 2003-05-05 13:32 ` John Max Skaller 2003-05-06 2:49 ` Nicolas Cannasse 2003-05-05 16:38 ` Diego Olivier Fernandez Pons 1 sibling, 1 reply; 42+ messages in thread From: John Max Skaller @ 2003-05-05 13:32 UTC (permalink / raw) To: Yaron M. Minsky; +Cc: Caml List Yaron M. Minsky wrote: > You're kidding, right? You're making a classic "best is enemy of the > good" mistake here. With due respect, there is always the other side of the coin. What is thought to be "good enough" often enough turns out later to be ill-considered and a major disaster. Indeed one could make a hypothesis that this is *necessarily* the case, since "good enough" really means "we don't really understand the requirements". I'm not trying to flame, so much as suggesting that "when in doubt leave it out" isn't such a bad principle. As an example: functorial set interface vs. one using type variables. Well, most of us seem to think that the type variable interface is more convenient most of the time. But before the Ocaml team rushes ahead and provides it *in addition* to the existing functorial interface, it might be a good idea to enquire about how the two are related on a theoretical level. It might be an idea to devise some principle for deciding which kinds of interfaces to provide in a library, since the issue is likely to arise again. It may even be an idea to figure out if the theoretical relationship between the two representations can somehow be connected with language syntax so the transformation from one kind to the other can be done easily by a dumb user (like me), obviating the need for providing an exponential set of interfaces. The same applies to classes, since some data structures are mutable enough one wonder why classes weren't used, since dynamic binding to abstractions seems to provide some advantages over both type variables and functors in some circumstances. As an example: streams. Well, there WAS built in support for streams, but it was removed -- you can get the code to work now using camlp4, which is now part of the standard distribution. So just maybe strengthening camlp4 is better way forward then providing built-in syntactic support for more data structures: rather, it may be better to *eliminate* existing support, for example, for arrays (by delegating the job to camlp4). -- John Max Skaller, mailto:skaller@ozemail.com.au snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia. voice:61-2-9660-0850 ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Two types of efficiency (Was Efficiency of 'a list) 2003-05-05 13:32 ` John Max Skaller @ 2003-05-06 2:49 ` Nicolas Cannasse 2003-05-06 12:30 ` Diego Olivier Fernandez Pons 0 siblings, 1 reply; 42+ messages in thread From: Nicolas Cannasse @ 2003-05-06 2:49 UTC (permalink / raw) To: John Max Skaller, Yaron M. Minsky; +Cc: Caml List > But before the Ocaml team rushes ahead and provides > it *in addition* to the existing functorial interface, > it might be a good idea to enquire about how the two > are related on a theoretical level. It might be an idea > to devise some principle for deciding which kinds of > interfaces to provide in a library, since the issue is > likely to arise again. > > It may even be an idea to figure out if the theoretical > relationship between the two representations can somehow > be connected with language syntax so the transformation > from one kind to the other can be done easily by > a dumb user (like me), obviating the need for > providing an exponential set of interfaces. Few people here are currently running the "ExtLib" - ocaml extended library - project, and are trying to answer theses questions. For an example of a structure that can be used to convert from and between several different data structures, you could have a look at the Enum module from the ExtLib CVS here : http://sourceforge.net/projects/ocaml-lib This is a purely functionnal way of dealing with conversions between Arrays, Lists, etc. with lazy-functionnal support that enable you to map-and-convert a list to an array without having to build an intermediate data structure for storing mapped elements. Nicolas Cannasse ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Two types of efficiency (Was Efficiency of 'a list) 2003-05-06 2:49 ` Nicolas Cannasse @ 2003-05-06 12:30 ` Diego Olivier Fernandez Pons 2003-05-07 2:05 ` Nicolas Cannasse 0 siblings, 1 reply; 42+ messages in thread From: Diego Olivier Fernandez Pons @ 2003-05-06 12:30 UTC (permalink / raw) To: Nicolas Cannasse; +Cc: Caml List Bonjour, On Tue, 6 May 2003, Nicolas Cannasse wrote: > Few people here are currently running the "ExtLib" - ocaml extended > library - project, and are trying to answer theses questions. For an > example of a structure that can be used to convert from and between > several different data structures, you could have a look at the Enum > module from the ExtLib CVS here : I read the code but I am afraid I didn't catch the point : what is the difference between the 'enum' data structure and a ('a * int) stream ? Why is it more convenient than other data structures as an itermediate data representation ? Diego Olivier ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Two types of efficiency (Was Efficiency of 'a list) 2003-05-06 12:30 ` Diego Olivier Fernandez Pons @ 2003-05-07 2:05 ` Nicolas Cannasse 0 siblings, 0 replies; 42+ messages in thread From: Nicolas Cannasse @ 2003-05-07 2:05 UTC (permalink / raw) To: Diego Olivier Fernandez Pons; +Cc: Caml List > > Few people here are currently running the "ExtLib" - ocaml extended > > library - project, and are trying to answer theses questions. For an > > example of a structure that can be used to convert from and between > > several different data structures, you could have a look at the Enum > > module from the ExtLib CVS here : > > I read the code but I am afraid I didn't catch the point : what is the > difference between the 'enum' data structure and a ('a * int) stream ? I agree that an Enum is near from a Stream, but it differs in several points : - as it is purely functional, you can create quite exotic count and next functions, while Stream.from is only working with indexes, and is having a little cost due to the usage of 'a option instead of an exception. - Enum provide easy support for functional operations such as map, while doing it with stream require a little bit of thinking (and from usage) - with a stream, you have access to a "next" functions : you cannot do something like this with Enum . In most cases, you always want to process all your elements in the same way. Enum does not provide user access to next, you can only do iter. In a short way : Stream is designed more as a "tokenizer" where you're querying items one by one, while Enum is more an uniform collection of items, abstracted in a purely fonctional way. > Why is it more convenient than other data structures as an itermediate > data representation ? It's not a data structure since it does not physically store any elements, it can be seen as a common interface from different data structures wanting to provide support for it. See ExtList.of_enum , ExtList.enum, ExtArray.of_enum and ExtArray.enum , one can do the following : let e = ExtList.enum l in let e' = Enum.map f e in ExtArray.of_enum e' To map-and-convert a list to an array. Quite convenient in fact, but I agree Enum is not the Graal of functionnal programming, just a small convenient interface. Nicolas Cannasse ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Two types of efficiency (Was Efficiency of 'a list) 2003-05-05 11:57 ` Yaron M. Minsky 2003-05-05 13:32 ` John Max Skaller @ 2003-05-05 16:38 ` Diego Olivier Fernandez Pons 2003-05-05 18:05 ` Eray Ozkural 2003-05-13 11:35 ` [Caml-list] Data Structure Libraries (was: Two types of efficiency) Oleg Trott 1 sibling, 2 replies; 42+ messages in thread From: Diego Olivier Fernandez Pons @ 2003-05-05 16:38 UTC (permalink / raw) To: Yaron M. Minsky; +Cc: Caml List Bonjour, > You're kidding, right? You're making a classic "best is enemy of > the good" mistake here. Yes, there are lots of implementations, and > it's not clear which of them is absolutely optimal. That doesn't > mean ocaml shouldn't provide built-in support for one "good-enough" > solution. Such support doesn't preclude using whatever the optimal > algorithm is for the situation. But most of the time, it works > fine, and having built in support improves the usability of the > language greatly. Having various algorithms/data structures improves the usability of the language greatly, that is absolutely true. But what you are describing is more the need of a 'large, standard and uniform data structure library' than 'build-in' support. - large, to handle all kind of situations - standard, to insure compatibility - uniform, to be able to switch easily between different implementations > I for one have never quite understood why the Set and Map modules > only provide modular implementations, and why the API is relatively > weak. The Map and the Set modules of the standard library in Caml are a good starting point : they may not be as complete as you would have liked, of course but they will improve with time. Keep in mind that designing a data structure library is a hard work : Chris Okasaki, Ralf Hinze and a lot of others have failed ; Baire has not even been released after 1 year of work, the geometric algorithms in JDSL (Java data structures library) never arrived and after 2 years the new version 2.1 does not provide any real improvment over 2.0.6, etc. The Caml standard library is in the 'not so bad' category Diego Olivier ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Two types of efficiency (Was Efficiency of 'a list) 2003-05-05 16:38 ` Diego Olivier Fernandez Pons @ 2003-05-05 18:05 ` Eray Ozkural 2003-05-06 13:28 ` Diego Olivier Fernandez Pons 2003-05-13 11:35 ` [Caml-list] Data Structure Libraries (was: Two types of efficiency) Oleg Trott 1 sibling, 1 reply; 42+ messages in thread From: Eray Ozkural @ 2003-05-05 18:05 UTC (permalink / raw) To: Diego Olivier Fernandez Pons, Yaron M. Minsky; +Cc: Caml List On Monday 05 May 2003 19:38, Diego Olivier Fernandez Pons wrote: > Keep in mind that designing a data structure library is a hard work : > Chris Okasaki, Ralf Hinze and a lot of others have failed ; Baire has > not even been released after 1 year of work, the geometric algorithms > in JDSL (Java data structures library) never arrived and after 2 years > the new version 2.1 does not provide any real improvment over 2.0.6, > etc. > > The Caml standard library is in the 'not so bad' category Structure monster will help! (^_-) But it's almost certain that I will not write any functional structure library, ahem :P That's more like optimizing for an ensemble of data structures instead of 1 and I think it requires an expertise of its own. Moreover, I don't want to be labeled "failed" :) But in the imperative area, I will consume algorithms one by one ;) I liked Okasaki's stuff btw, I even used those set thingies for a real quick hypergraph implementation (which was admittedly too slow for real life) What I have in mind is tight imperative code like: AVL Trees B+ Trees Disjoint Sets Some dynamic hash code PQ: Binary Heaps, etc. (I already did binary heap) k-d trees (I'm not sure if I wanna go into comp. geometry though ;) ) So my clients will be speed demons who are not concerned with functional beauty of their basic data expressions. OTOH, I will demonstrate that imperative code can be elegant!!! I don't know, what else is left in the world? It's so easy to program in ocaml that I think I will do all of these and more this summer. (Complete with those pretty running time bounds) Assume I'm writing something like that, people don't want it to be functors but simply polymorphic types, is that so? While coding I had the impression that most of the data structures would fit into a functor-ful style. (same goes for haskell classes) Is these concerns because people are tired of the rather redundant means with which one is obliged to instantiate a data structure using functors? For instance if you're doing a priority queue, you want to know at least three things: 1) What is the type of key? 2) What is the type of record? 3) How do I compare them? Not too different from what Set module wants. I even had the wild idea of higher and higher level of abstractions but I don't know where those abstractions would start to bog down the code yet. As I said defining Set like the exactly mathematical way, and then defining implementation classes of sets, and then implementations will be the beginning that path of "i want more" kind of abstractions ;) And a good, in fact, very good thing about abstractions is that I can make a framework that can incorporate other people's code so that I won't have to write the whole world from scratch! I think that's a very nice idea since even I have a finite appetite for coding! I'm also right now developing a bad feeling that the "framework" design might be harder than it sounds... Cheers, -- Eray Ozkural (exa) <erayo@cs.bilkent.edu.tr> Comp. Sci. Dept., Bilkent University, Ankara KDE Project: http://www.kde.org www: http://www.cs.bilkent.edu.tr/~erayo Malfunction: http://mp3.com/ariza GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Two types of efficiency (Was Efficiency of 'a list) 2003-05-05 18:05 ` Eray Ozkural @ 2003-05-06 13:28 ` Diego Olivier Fernandez Pons 0 siblings, 0 replies; 42+ messages in thread From: Diego Olivier Fernandez Pons @ 2003-05-06 13:28 UTC (permalink / raw) To: erayo; +Cc: Caml List Bonjour, On Mon, 5 May 2003, Eray Ozkural wrote: > But it's almost certain that I will not write any functional > structure library, ahem :P That's more like optimizing for an > ensemble of data structures instead of 1 and I think it requires an > expertise of its own. Moreover, I don't want to be labeled "failed" > :) I did not meant to be rude and Chris Okasaki has made a very good job promoting purely functional data structures and some important theoretical work too. In 1997 Roberto Tamassia and Luca Vismara wrote an article comparing the design of GeomLib (the geometric package of JDSL) with the STL, LEDA and CGAL, considering the following points : - ease of use - efficiency - flexibility - reliability - extensibility - reusability - modularity - functionality - correctness checking The only problem is that 6 years after that seminal paper, Geomlib is still unreleased. Okasaki made the same kind of mistake, writting the paper before the library was ended ('Edison is still mostly a framework. That is, I provide signatures, but not yet very many implementations. I indend to populate this framework over the time, adding a new module every few weeks'). > What I have in mind is tight imperative code like: > AVL Trees > B+ Trees > Disjoint Sets > Some dynamic hash code > PQ: Binary Heaps, etc. (I already did binary heap) > k-d trees (I'm not sure if I wanna go into comp. geometry though ;) ) Structure monster should notice that : (a) there are already many data structures (purely functional or imperative) available in Caml. Structure monster should check the Caml Hump (data structures section) or some pages I wrote on the subject (http://www.edite-de-paris.com.fr/~fernandz/Caml/index_en.html). (b) there are already several projects of 'standard libraries', including the official standard library for Caml. Structure monster should not begin one more of this projects but join (or use the same interface, design, naming conventions, ...) one of these. Diego Olivier ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* [Caml-list] Data Structure Libraries (was: Two types of efficiency) 2003-05-05 16:38 ` Diego Olivier Fernandez Pons 2003-05-05 18:05 ` Eray Ozkural @ 2003-05-13 11:35 ` Oleg Trott 1 sibling, 0 replies; 42+ messages in thread From: Oleg Trott @ 2003-05-13 11:35 UTC (permalink / raw) To: Diego Olivier Fernandez Pons; +Cc: Caml List On Monday 05 May 2003 12:38 pm, Diego Olivier Fernandez Pons wrote: > The Map and the Set modules of the standard library in Caml are a good > starting point : they may not be as complete as you would have liked, > of course but they will improve with time. > > Keep in mind that designing a data structure library is a hard work : > Chris Okasaki, Ralf Hinze and a lot of others have failed ; In what sense? Perhaps you have a very inclusive definition of failure :) > Baire has > not even been released after 1 year of work, I understand that you are not designing new algorithms, but coding up ones that are well-known. If so, what is the limiting factor in your work: trying to anticipate what features users might and might not need? Regards Oleg ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Efficiency of 'a list 2003-05-04 4:08 ` alc 2003-05-04 5:32 ` Ed L Cashin 2003-05-04 6:46 ` [Caml-list] Two types of efficiency (Was Efficiency of 'a list) Mattias Waldau @ 2003-05-04 7:55 ` Ville-Pertti Keinonen 2003-05-04 10:56 ` Neel Krishnaswami 2003-05-04 12:38 ` Eray Ozkural 3 siblings, 1 reply; 42+ messages in thread From: Ville-Pertti Keinonen @ 2003-05-04 7:55 UTC (permalink / raw) To: alc; +Cc: caml Mailing List' > I've done a little timing of things, and according to my results: > If you care about efficiency and use OCaml, you should use lists > fairly often, ie if you are always looping and accessing the elements > in order. OCaml can iterate through a list (recursively) about twice as > fast as it can iterate through an array. It can iterate through a > list about as fast as or maybe even a little faster than C or C++ can > iterate through an array. Don't trust microbenchmarks too far over what your knowledge of how things should work tell you. Iterating over arrays is certainly going to be much more cache- and TLB-friendly. BTW: Did you try turning off bounds checking? The OCaml optimizer isn't smart enough to get rid of them. ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Efficiency of 'a list 2003-05-04 7:55 ` [Caml-list] Efficiency of 'a list Ville-Pertti Keinonen @ 2003-05-04 10:56 ` Neel Krishnaswami 2003-05-04 12:56 ` Eray Ozkural 0 siblings, 1 reply; 42+ messages in thread From: Neel Krishnaswami @ 2003-05-04 10:56 UTC (permalink / raw) To: caml Mailing List' Ville-Pertti Keinonen writes: > > I've done a little timing of things, and according to my results: > > If you care about efficiency and use OCaml, you should use lists > > fairly often, ie if you are always looping and accessing the elements > > in order. OCaml can iterate through a list (recursively) about twice as > > fast as it can iterate through an array. It can iterate through a > > list about as fast as or maybe even a little faster than C or C++ can > > iterate through an array. > > Don't trust microbenchmarks too far over what your knowledge of how > things should work tell you. Iterating over arrays is certainly > going to be much more cache-and TLB-friendly. This is not be as true you think. Ocaml's garbage collector is a compacting, copying GC, so it's very likely that lists will end up in in continuous blocks of memory. This will end up being nearly as cache-friendly as an array is. The big exception is with arrays of floats -- Ocaml unboxes arrays of floats, but doesn't unbox lists of them. -- Neel Krishnaswami neelk@alum.mit.edu ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Efficiency of 'a list 2003-05-04 10:56 ` Neel Krishnaswami @ 2003-05-04 12:56 ` Eray Ozkural 2003-05-04 13:35 ` Falk Hueffner 0 siblings, 1 reply; 42+ messages in thread From: Eray Ozkural @ 2003-05-04 12:56 UTC (permalink / raw) To: Neel Krishnaswami, caml Mailing List' On Sunday 04 May 2003 13:56, Neel Krishnaswami wrote: > > Don't trust microbenchmarks too far over what your knowledge of how > > things should work tell you. Iterating over arrays is certainly > > going to be much more cache-and TLB-friendly. > > This is not be as true you think. Ocaml's garbage collector is a > compacting, copying GC, so it's very likely that lists will end up in > in continuous blocks of memory. This will end up being nearly as > cache-friendly as an array is. > > The big exception is with arrays of floats -- Ocaml unboxes arrays of > floats, but doesn't unbox lists of them. Now, we're getting some performance talk :) To be precise, comparing an int list and int array we'll see that list occupies twice the same memory and therefore will generate more cache misses. But as the size of 'a in 'a list grows the difference will be negligible. However, if one modifies the elements of an array, in FORTRAN style, won't really be storing huge records inside elements of the array. He will likely split things up in parallel arrays where necessary, and if there are records they will be stored in a private heap and pointers will be stored in the array, etc. Cheers, -- Eray Ozkural (exa) <erayo@cs.bilkent.edu.tr> Comp. Sci. Dept., Bilkent University, Ankara KDE Project: http://www.kde.org www: http://www.cs.bilkent.edu.tr/~erayo Malfunction: http://mp3.com/ariza GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Efficiency of 'a list 2003-05-04 12:56 ` Eray Ozkural @ 2003-05-04 13:35 ` Falk Hueffner 0 siblings, 0 replies; 42+ messages in thread From: Falk Hueffner @ 2003-05-04 13:35 UTC (permalink / raw) To: caml Mailing List' Eray Ozkural <exa@kablonet.com.tr> writes: > To be precise, comparing an int list and int array we'll see that > list occupies twice the same memory Thrice, actually. AFAIK, lists behave exactly like type 'a list = Nil | Cons of 'a * 'a list and therefore require an extra header word that will keep the enumeration number, allocation size and temporary GC information. -- Falk ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Efficiency of 'a list 2003-05-04 4:08 ` alc ` (2 preceding siblings ...) 2003-05-04 7:55 ` [Caml-list] Efficiency of 'a list Ville-Pertti Keinonen @ 2003-05-04 12:38 ` Eray Ozkural 3 siblings, 0 replies; 42+ messages in thread From: Eray Ozkural @ 2003-05-04 12:38 UTC (permalink / raw) To: alc; +Cc: caml Mailing List' On Sunday 04 May 2003 07:08, alc@PublicPropertySoftware.com wrote: > I've done a little timing of things, and according to my results: > If you care about efficiency and use OCaml, you should use lists > fairly often, ie if you are always looping and accessing the elements > in order. OCaml can iterate through a list (recursively) about twice as > fast as it can iterate through an array. It can iterate through a > list about as fast as or maybe even a little faster than C or C++ can > iterate through an array. Considering n appends and m iterations. I must point out that this is non-sense. An open table like my Dynarray module will be a lot faster than a list because of two things: 1) cache coherence (more important) 2) reduced page faults (TLB misses, important after context switches) -- Eray Ozkural (exa) <erayo@cs.bilkent.edu.tr> Comp. Sci. Dept., Bilkent University, Ankara KDE Project: http://www.kde.org www: http://www.cs.bilkent.edu.tr/~erayo Malfunction: http://mp3.com/ariza GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Efficiency of 'a list 2003-05-04 2:08 ` cashin 2003-05-04 4:08 ` alc @ 2003-05-04 8:07 ` Ville-Pertti Keinonen 2003-05-04 15:54 ` Ed L Cashin 2003-05-05 23:52 ` Garry Hodgson 1 sibling, 2 replies; 42+ messages in thread From: Ville-Pertti Keinonen @ 2003-05-04 8:07 UTC (permalink / raw) To: cashin; +Cc: Mattias Waldau, 'Ocaml Mailing List' > Witness the linux kernel, which uses lists when lists are the most > natrural, efficient data structure for the task at hand. Or not. Witness the long-lived O(n) scheduler... And I hope you don't include pre-1.0 versions, which were algorithmically...shocking beyond belief. And this despite the fact that based on his rants, Linus apparently hates all cache-trashing pointer chasing (and Lisp, and garbage collection etc.). ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Efficiency of 'a list 2003-05-04 8:07 ` Ville-Pertti Keinonen @ 2003-05-04 15:54 ` Ed L Cashin 2003-05-05 23:52 ` Garry Hodgson 1 sibling, 0 replies; 42+ messages in thread From: Ed L Cashin @ 2003-05-04 15:54 UTC (permalink / raw) To: Ville-Pertti Keinonen; +Cc: 'Ocaml Mailing List' Ville-Pertti Keinonen <will@exomi.com> writes: >> Witness the linux kernel, which uses lists when lists are the most >> natrural, efficient data structure for the task at hand. > > Or not. Witness the long-lived O(n) scheduler... And I hope you > don't include pre-1.0 versions, which were algorithmically...shocking > beyond belief. > > And this despite the fact that based on his rants, Linus apparently > hates all cache-trashing pointer chasing (and Lisp, and garbage > collection etc.). I'm not going to be dragged into a linux advocacy argument. It suffices to say: Lists are useful data structures, but there's no getting around the need to know what you're doing. Enough said. At least this is my last contribution to this thread. -- --Ed L Cashin PGP public key: http://noserose.net/e/pgp/ ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Re: [Caml-list] Efficiency of 'a list 2003-05-04 8:07 ` Ville-Pertti Keinonen 2003-05-04 15:54 ` Ed L Cashin @ 2003-05-05 23:52 ` Garry Hodgson 1 sibling, 0 replies; 42+ messages in thread From: Garry Hodgson @ 2003-05-05 23:52 UTC (permalink / raw) To: 'Ocaml Mailing List' Ville-Pertti Keinonen <will@exomi.com> wrote: > > Witness the linux kernel, which uses lists when lists are the most > > natrural, efficient data structure for the task at hand. > > Or not. Witness the long-lived O(n) scheduler... And I hope you don't > include pre-1.0 versions, which were algorithmically...shocking beyond > belief. FWIW, i believe this is fixed in the forthcoming 2.6 kernel. ---- Garry Hodgson, Senior Hacker, AT&T Labs No act is more patriotic than speaking out when your government is doing the wrong thing in your name. This is not your right but your sacred duty. And none are more treasonous than those who would silence these voices. ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Efficiency of 'a list 2003-05-03 5:43 ` Mattias Waldau 2003-05-03 8:16 ` Ville-Pertti Keinonen 2003-05-03 14:12 ` Vitaly Lugovsky @ 2003-05-03 20:03 ` Eray Ozkural 2 siblings, 0 replies; 42+ messages in thread From: Eray Ozkural @ 2003-05-03 20:03 UTC (permalink / raw) To: Mattias Waldau, 'Ocaml Mailing List' On Saturday 03 May 2003 08:43, Mattias Waldau wrote: > If you need to use repeated appends (or @), then lists are inefficient > and there are better solutions. I don't see how this is an analysis of time and space behavior of the kinds of operations I inquired. I also don't see any indication of how memory allocation occurs. -- Eray Ozkural (exa) <erayo@cs.bilkent.edu.tr> Comp. Sci. Dept., Bilkent University, Ankara KDE Project: http://www.kde.org www: http://www.cs.bilkent.edu.tr/~erayo Malfunction: http://mp3.com/ariza GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Efficiency of 'a list 2003-05-02 19:27 [Caml-list] Efficiency of 'a list Eray Ozkural 2003-05-03 5:43 ` Mattias Waldau @ 2003-05-03 21:13 ` Lauri Alanko 2003-05-03 22:03 ` Eray Ozkural 1 sibling, 1 reply; 42+ messages in thread From: Lauri Alanko @ 2003-05-03 21:13 UTC (permalink / raw) To: caml-list On Fri, May 02, 2003 at 10:27:20PM +0300, Eray Ozkural wrote: > Could somebody please point to an explanation of ocaml linked list > implementation or summarize its performance characteristics? This > might seem like a trivial question but having used many functional > languages I know that it's easy to commit performance genocide using > linked lists. It's easy to commit performance genocide with any data structure if you use it for things for which it is not efficient. Singly linked immutable lists have some nice characteristics: O(1) for cons, head and tail, all purely functionally so you can safely share structure between multiple lists. For example, lists are perfect for representing several alternate paths in a path search. > For instance, a naive implementation of an optimal comparison sorting > algorithm in LISP almost invariably results in an O(n^2logn) routine :) Err? You mean something like quicksort? Who would use quicksort on a linked list? Mergesort is O(n log n) as it should be. > Therefore, it would be a good start to explain whether ocaml lists are > in fact LISP lists and if not in what aspects they differ. The main aspects are that the tail of non-empty list _must_ be another list, and that neither the head or tail (car or cdr) of a list cell can be altered. > The motivation for this question comes from trying to understand the use of > linked lists in an efficient algorithm, such as graph algorithms (say we are > implementing topological sort) > > Assume I'm using the following structure that is far from handsome: > type x = (int list) array > > Let a's type be x. Consider codes as the following > a.(i) <- a.(i) @ [x;y;z] > a.(i) <- [x] :: a.(i) > > What travesty results in execution of such codes with i coming from an > arbitrary sequence? Do using such constructs result in unholy > incarnations of space leaks or gross inefficiencies? The first line does an append. So it creates a new list which ends with the list [x;y;z] (the same one you gave @ as an operand) and has all the elements of a.(i) prepended to it. The operation allocates as many cells as a.(i) had, and thus also has to spend time proportional to a.(i)'s length. (@)'s implementation seems to be non-tail-recursive (it would require "cheating" to do it otherwise) so with long lists you might blow the stack. For the second line I think you mean either a.(i) <- x :: a.(i) or a.(i) <- [x] @ a.(i) Both of these are O(1). The first allocates a single cell, the second theoretically two (one of which is discarded immediately) unless the compiler is smart. If you really need to add stuff to both ends of a data structure, ocaml's primitive lists aren't the thing. You might consider some sort of a deque. > Another question, does ocaml make any effort to place members of a > list close to each other? Or, more naturally, the list element is > allocated using a global model and then simply linked inside the > structure? The list _element_ is just the thing which is placed in the list. Its allocation has nothing to do with the list's allocation. The list cells are allocated from the heap like everything else, so temporally closely allocated cells are likely to be physically close as well. However, the copying garbage collector is quite likely to enhance locality of lists when it runs. Someone more knowledgeable can probably tell more on this. Lauri Alanko la@iki.fi ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Caml-list] Efficiency of 'a list 2003-05-03 21:13 ` Lauri Alanko @ 2003-05-03 22:03 ` Eray Ozkural 0 siblings, 0 replies; 42+ messages in thread From: Eray Ozkural @ 2003-05-03 22:03 UTC (permalink / raw) To: Lauri Alanko, caml-list On Sunday 04 May 2003 00:13, Lauri Alanko wrote: > It's easy to commit performance genocide with any data structure if you > use it for things for which it is not efficient. Singly linked immutable > lists have some nice characteristics: O(1) for cons, head and tail, all > purely functionally so you can safely share structure between multiple > lists. For example, lists are perfect for representing several alternate > paths in a path search. OK. So as Mattias said it is the same thing as LISP lists. > > For the second line I think you mean either > a.(i) <- x :: a.(i) > or > a.(i) <- [x] @ a.(i) > *blush* yes I meant the former. > Both of these are O(1). The first allocates a single cell, the second > theoretically two (one of which is discarded immediately) unless the > compiler is smart. > > If you really need to add stuff to both ends of a data structure, > ocaml's primitive lists aren't the thing. You might consider some sort > of a deque. > Hmm. I'm just trying to understand if ocaml lists would be adequate for representing adjacency lists. I thought it'd be easier for programmers to deal with it than something else that I might write. [And since such a thing is likely to pervade my library I should decide early :) ] I also wanted to learn if the compiler went to any length in optimization when it can determine that a mutable field is being overwritten like in a.(i) <- a.(i) @ [x] --- Of course here it is obvious that there is no reference count associated with objects, and another structure might be sharing the list, etc. so there is probably no room for optimization. BTW, we don't have imperative lists in the standard library...I wonder if something like a really simple doubly linked list or non-shared singly linked list would be worthwhile (wasn't that an exercise in ocaml book?) > However, the copying garbage collector is quite likely to enhance > locality of lists when it runs. Someone more knowledgeable can probably > tell more on this. I understand that garbage collectors are pretty smart nowadays. Maybe it might be attempting to compact memory in a way. Happy hacking, -- Eray Ozkural (exa) <erayo@cs.bilkent.edu.tr> Comp. Sci. Dept., Bilkent University, Ankara KDE Project: http://www.kde.org www: http://www.cs.bilkent.edu.tr/~erayo Malfunction: http://mp3.com/ariza GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 42+ messages in thread
end of thread, other threads:[~2003-05-13 11:35 UTC | newest] Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2003-05-02 19:27 [Caml-list] Efficiency of 'a list Eray Ozkural 2003-05-03 5:43 ` Mattias Waldau 2003-05-03 8:16 ` Ville-Pertti Keinonen 2003-05-03 14:12 ` Vitaly Lugovsky 2003-05-03 18:43 ` Mattias Waldau 2003-05-03 20:01 ` Eray Ozkural 2003-05-03 23:17 ` Eray Ozkural 2003-05-04 2:08 ` cashin 2003-05-04 4:08 ` alc 2003-05-04 5:32 ` Ed L Cashin 2003-05-04 6:46 ` [Caml-list] Two types of efficiency (Was Efficiency of 'a list) Mattias Waldau 2003-05-04 7:35 ` John Max Skaller 2003-05-04 11:52 ` Olivier Andrieu 2003-05-05 11:04 ` John Max Skaller 2003-05-04 16:48 ` brogoff 2003-05-04 7:43 ` Ville-Pertti Keinonen 2003-05-04 12:50 ` Eray Ozkural 2003-05-04 12:48 ` Eray Ozkural 2003-05-05 7:31 ` Diego Olivier Fernandez Pons 2003-05-05 11:11 ` Mattias Waldau 2003-05-05 13:17 ` John Max Skaller 2003-05-05 11:49 ` Eray Ozkural 2003-05-05 11:57 ` Yaron M. Minsky 2003-05-05 13:32 ` John Max Skaller 2003-05-06 2:49 ` Nicolas Cannasse 2003-05-06 12:30 ` Diego Olivier Fernandez Pons 2003-05-07 2:05 ` Nicolas Cannasse 2003-05-05 16:38 ` Diego Olivier Fernandez Pons 2003-05-05 18:05 ` Eray Ozkural 2003-05-06 13:28 ` Diego Olivier Fernandez Pons 2003-05-13 11:35 ` [Caml-list] Data Structure Libraries (was: Two types of efficiency) Oleg Trott 2003-05-04 7:55 ` [Caml-list] Efficiency of 'a list Ville-Pertti Keinonen 2003-05-04 10:56 ` Neel Krishnaswami 2003-05-04 12:56 ` Eray Ozkural 2003-05-04 13:35 ` Falk Hueffner 2003-05-04 12:38 ` Eray Ozkural 2003-05-04 8:07 ` Ville-Pertti Keinonen 2003-05-04 15:54 ` Ed L Cashin 2003-05-05 23:52 ` Garry Hodgson 2003-05-03 20:03 ` Eray Ozkural 2003-05-03 21:13 ` Lauri Alanko 2003-05-03 22:03 ` Eray Ozkural
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox