* [Caml-list] Suggestion about balanced trees in stdlib @ 2002-05-10 16:59 Gerd Stolpmann 2002-05-10 16:56 ` Alexander V.Voinov ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: Gerd Stolpmann @ 2002-05-10 16:59 UTC (permalink / raw) To: caml-list Hello list, I have always wondered why there are two implementations of balanced trees in the standard library: Set and Map. The set of operations is different, but the representation is (almost) identical. Furthermore, neither of the two modules claims to implement balanced trees in the interface; for example Set.iter does not specify the order of iteration although the implementation iterates linearly from the smallest to the largest element. The idea is to introduce a third module Balanced_tree, defining all of the operations for the two mentioned modules, but announcing them as balanced trees. This would have the advantage that the operations could be specified in a more complete way (Balanced_tree.iter is specified as iterating in an ascending way), and it would be reasonable to add further operations that know about the representation. Of course, Set and Map would recur to that more special module. Balanced_tree would have the representation of Map, i.e. the elements are pairs of keys and attached values. To emulate Set, the value () is used. Balanced_tree would have the following operations: - all of Set - all of Map - operations on intervals: "iter_interval", "fold_interval", and "interval" (returning the subset) The interval operations have an important application: One can program indexes that can be accessed by coordinates (e.g.: return all objects that are currently visible in the viewbox). What do you think about this? Gerd -- ---------------------------------------------------------------------------- Gerd Stolpmann Telefon: +49 6151 997705 (privat) Viktoriastr. 45 64293 Darmstadt EMail: gerd@gerd-stolpmann.de Germany ---------------------------------------------------------------------------- ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] Suggestion about balanced trees in stdlib 2002-05-10 16:59 [Caml-list] Suggestion about balanced trees in stdlib Gerd Stolpmann @ 2002-05-10 16:56 ` Alexander V.Voinov 2002-05-10 22:31 ` Stefano Zacchiroli 2002-05-13 7:23 ` Mattias Waldau 2 siblings, 0 replies; 10+ messages in thread From: Alexander V.Voinov @ 2002-05-10 16:56 UTC (permalink / raw) To: info; +Cc: caml-list Hi From: Gerd Stolpmann <info@gerd-stolpmann.de> Subject: [Caml-list] Suggestion about balanced trees in stdlib Date: Fri, 10 May 2002 18:59:54 +0200 > Balanced_tree would have the following operations: > > - all of Set > - all of Map > - operations on intervals: "iter_interval", "fold_interval", and "interval" > (returning the subset) Yes, this would be great, together with the rest of the proposal. Alexander ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] Suggestion about balanced trees in stdlib 2002-05-10 16:59 [Caml-list] Suggestion about balanced trees in stdlib Gerd Stolpmann 2002-05-10 16:56 ` Alexander V.Voinov @ 2002-05-10 22:31 ` Stefano Zacchiroli 2002-05-10 23:05 ` Alexander V.Voinov 2002-05-11 15:47 ` Gerd Stolpmann 2002-05-13 7:23 ` Mattias Waldau 2 siblings, 2 replies; 10+ messages in thread From: Stefano Zacchiroli @ 2002-05-10 22:31 UTC (permalink / raw) To: caml-list On Fri, May 10, 2002 at 06:59:54PM +0200, Gerd Stolpmann wrote: > Of course, Set and Map would recur to that more special module. > > Balanced_tree would have the representation of Map, i.e. the elements are > pairs of keys and attached values. To emulate Set, the value () is used. I don't know how the compiler treat unit values wrt optimization or such, but using an unit in each node to emulate Set seems to me inefficient for a module of the standard library. I think, but I haven't thought a lot about it, that the problem can be solved using an implementation polymorphic in the leaf type hidden to the use that access Set and Map through functors as usual. Anyway the idea seems really good. I also suggest to add different kind of visit (i.e. not only iter) like {pre,post,in}visit via different functions or specifing the desired behaviour when using the functor or both. Cheers. -- Stefano Zacchiroli - undergraduate student of CS @ Univ. Bologna, Italy zack@cs.unibo.it | ICQ# 33538863 | http://www.cs.unibo.it/~zacchiro "I know you believe you understood what you think I said, but I am not sure you realize that what you heard is not what I meant!" -- G.Romney ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] Suggestion about balanced trees in stdlib 2002-05-10 22:31 ` Stefano Zacchiroli @ 2002-05-10 23:05 ` Alexander V.Voinov 2002-05-11 7:20 ` Stefano Zacchiroli 2002-05-11 15:47 ` Gerd Stolpmann 1 sibling, 1 reply; 10+ messages in thread From: Alexander V.Voinov @ 2002-05-10 23:05 UTC (permalink / raw) To: zack; +Cc: caml-list Hi From: Stefano Zacchiroli <zack@cs.unibo.it> Subject: Re: [Caml-list] Suggestion about balanced trees in stdlib Date: Sat, 11 May 2002 00:31:10 +0200 > On Fri, May 10, 2002 at 06:59:54PM +0200, Gerd Stolpmann wrote: > > Of course, Set and Map would recur to that more special module. > > > > Balanced_tree would have the representation of Map, i.e. the elements are > > pairs of keys and attached values. To emulate Set, the value () is used. > > I don't know how the compiler treat unit values wrt optimization or > such, but using an unit in each node to emulate Set seems to me > inefficient for a module of the standard library. Is 1 (integer one) better? :-) Alexander ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] Suggestion about balanced trees in stdlib 2002-05-10 23:05 ` Alexander V.Voinov @ 2002-05-11 7:20 ` Stefano Zacchiroli 0 siblings, 0 replies; 10+ messages in thread From: Stefano Zacchiroli @ 2002-05-11 7:20 UTC (permalink / raw) To: caml-list On Fri, May 10, 2002 at 04:05:58PM -0700, Alexander V.Voinov wrote: > > I don't know how the compiler treat unit values wrt optimization or > > such, but using an unit in each node to emulate Set seems to me > > inefficient for a module of the standard library. > > Is 1 (integer one) better? :-) No, is the same, they are both unboxed integers. Cheers. -- Stefano Zacchiroli - undergraduate student of CS @ Univ. Bologna, Italy zack@cs.unibo.it | ICQ# 33538863 | http://www.cs.unibo.it/~zacchiro "I know you believe you understood what you think I said, but I am not sure you realize that what you heard is not what I meant!" -- G.Romney ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] Suggestion about balanced trees in stdlib 2002-05-10 22:31 ` Stefano Zacchiroli 2002-05-10 23:05 ` Alexander V.Voinov @ 2002-05-11 15:47 ` Gerd Stolpmann 2002-05-11 17:18 ` Stefano Zacchiroli 1 sibling, 1 reply; 10+ messages in thread From: Gerd Stolpmann @ 2002-05-11 15:47 UTC (permalink / raw) To: Stefano Zacchiroli; +Cc: caml-list On 2002.05.11 00:31 Stefano Zacchiroli wrote: > On Fri, May 10, 2002 at 06:59:54PM +0200, Gerd Stolpmann wrote: > > Of course, Set and Map would recur to that more special module. > > > > Balanced_tree would have the representation of Map, i.e. the elements are > > pairs of keys and attached values. To emulate Set, the value () is used. > > I don't know how the compiler treat unit values wrt optimization or > such, but using an unit in each node to emulate Set seems to me > inefficient for a module of the standard library. > > I think, but I haven't thought a lot about it, that the problem can be > solved using an implementation polymorphic in the leaf type hidden to > the use that access Set and Map through functors as usual. type 'a map = Empty | Node of 'a map * key * 'a * 'a map * int type set = Empty | Node of set * elt * set * int The difference is that every map node has 6 words, and every set node consists of 5 words. My suggestion is to prefer the map representation, and define set = unit map, wasting one word per node, and making sets 20% larger. The other possibility would be to prefer the set representation, e.g. type 'a baltree = Empty | Node of 'a baltree * 'a elt * 'a baltree * int and to reduce map and set with some functor tricks to this representation. Now sets are not larger, but maps require 8 words of memory, because you have to use an additional pair 'a elt = ('a, map_elt), i.e. maps are 33% larger. I think this is the worse choice. BTW, if memory consumption matters, the current representation can be still optimized by removing the balance factor, and encoding it in the variant tag: type set = Empty | Node_0 of set * elt * set | Node_l of <same> | Node_r of <same> Of course, the readability of the code would suffer, but one can argue that the overall performance is more important than code beauty for the standard library. However, the point is that it is a matter of discussion what is acceptable in the stdlib, because there are a lot of views to this central library. I would prefer to pay an increased memory consumption of 20% for sets, and to get the possibility to use balanced trees directly for that costs. Maybe other people have different views on this. > Anyway the idea seems really good. > > I also suggest to add different kind of visit (i.e. not only iter) like > {pre,post,in}visit via different functions or specifing the desired > behaviour when using the functor or both. Why not export functions that access the subtrees directly: left_subtree, right_subtree, top_element. Gerd -- ---------------------------------------------------------------------------- Gerd Stolpmann Telefon: +49 6151 997705 (privat) Viktoriastr. 45 64293 Darmstadt EMail: gerd@gerd-stolpmann.de Germany ---------------------------------------------------------------------------- ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] Suggestion about balanced trees in stdlib 2002-05-11 15:47 ` Gerd Stolpmann @ 2002-05-11 17:18 ` Stefano Zacchiroli 2002-05-11 17:36 ` Dave Mason 2002-05-12 0:03 ` polux moon 0 siblings, 2 replies; 10+ messages in thread From: Stefano Zacchiroli @ 2002-05-11 17:18 UTC (permalink / raw) To: caml-list On Sat, May 11, 2002 at 05:47:04PM +0200, Gerd Stolpmann wrote: > type 'a map = Empty | Node of 'a map * key * 'a * 'a map * int > > type set = Empty | Node of set * elt * set * int > > The difference is that every map node has 6 words, and every set node > consists of 5 words. My suggestion is to prefer the map representation, > and define set = unit map, wasting one word per node, and making sets > 20% larger. Ok, I agree that this is the "clean" way to the goal. Anyway I think that memory consumption is crucial for the standard library, this is why I'm proposing a dirty trick for the set/map matter. Usually I dislike tricks and prefer clean programming, but IMHO a standard library is somewhat a particular case. <WARNING_dirty_trick> Why not use a single source file that implement both set and map using some campl4 tricks like pa_ifdef module? You may have two different definitions of tree and a pool of accessor functions working on a tree and returning single components (hoping that the compiler inline this kind of functions). In the set case you doesn't need to use the value component of a tree so you can avoid to compile the relative accessor function (poor gain) and you can also compile a tree tuple without a value component. </WARNING_dirty_trick> [ Question: gurus, do you thinks this is a feasible solution or I'm missing something? ] I know, this is a really dirty trick. Also I don't know if is applicable because I have never looked at the implementations set.ml and map.ml but just as an idea it can work. Pros: - reduce memory consumption for Set - no code duplication Cons: - objects (.cm{a,o,x...}) duplication - really dirty! - last time I looked at it camlp4 pa_ifdef module doesn't permit multiple declaration using only one "if" construct so you incurs in situations like: ifdef FOO then let bar = ... ifdef FOO then let quux = ... ifdef FOO then let baz = ... Just a [dirty] idea. Cheers. -- Stefano Zacchiroli - undergraduate student of CS @ Univ. Bologna, Italy zack@cs.unibo.it | ICQ# 33538863 | http://www.cs.unibo.it/~zacchiro "I know you believe you understood what you think I said, but I am not sure you realize that what you heard is not what I meant!" -- G.Romney ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] Suggestion about balanced trees in stdlib 2002-05-11 17:18 ` Stefano Zacchiroli @ 2002-05-11 17:36 ` Dave Mason 2002-05-12 0:03 ` polux moon 1 sibling, 0 replies; 10+ messages in thread From: Dave Mason @ 2002-05-11 17:36 UTC (permalink / raw) To: caml-list A less-dirty idea: Sorry, I scanned Gerd's comment and deleted it. I agree unifying Set and Map would be good, but don't like the idea of *increasing* the size of an existing module's data structure. And I'm not crazy about Stefano's dirty trick. :-) Gerd suggested that the int weighting factor could be turned into a constructor. Doing that would save the word that making a Set be a unit Map would cost, so Set would be the same size as now, Map would be smaller, and they'd be integrated. (Yes, you could make Set yet smaller, but this way that complexity (of constructors) is only in one place. You could also save that extra word, and make it a little faster with the ugly trick of using recursive records, where a record that points to itself is the sentinel for Empty... but then the = operation becomes non-terminating, so this isn't something that I think is worth it in a stdlib kind of module. Sorry if I missed something, and this is senseless mumbling... ../Dave ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Caml-list] Suggestion about balanced trees in stdlib 2002-05-11 17:18 ` Stefano Zacchiroli 2002-05-11 17:36 ` Dave Mason @ 2002-05-12 0:03 ` polux moon 1 sibling, 0 replies; 10+ messages in thread From: polux moon @ 2002-05-12 0:03 UTC (permalink / raw) To: caml-list In a tree, there are many leaf > type set = Empty | Node of set * elt * set * int Why not add Leaf of elt Instead of using Node(Empty,some value,Empty,0) u can use Leaf(some value). It is clear and just add some code. U win 3 word per leaf. (in a tree the leaf are about the half of the nodes) ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: [Caml-list] Suggestion about balanced trees in stdlib 2002-05-10 16:59 [Caml-list] Suggestion about balanced trees in stdlib Gerd Stolpmann 2002-05-10 16:56 ` Alexander V.Voinov 2002-05-10 22:31 ` Stefano Zacchiroli @ 2002-05-13 7:23 ` Mattias Waldau 2 siblings, 0 replies; 10+ messages in thread From: Mattias Waldau @ 2002-05-13 7:23 UTC (permalink / raw) To: 'Gerd Stolpmann', caml-list > - operations on intervals: "iter_interval", "fold_interval", > and "interval" In order to solve the mentioned operations, I usually use Splay-trees, which is part of the CDK. They have the following operations. val floor: 'a t -> key -> key * 'a (* [floor s c] returns the greatest element with [key <= c] *) val ceil: 'a t -> key -> key * 'a (* [ceil s c] returns the smallest element with [key >= c] *) val prev: 'a t -> key -> key * 'a (* [prev s c] returns the greatest element with [key < c] *) val next: 'a t -> key -> key * 'a (* [next s c] returns the smallest element with [key > c] *) ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2002-05-13 7:23 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2002-05-10 16:59 [Caml-list] Suggestion about balanced trees in stdlib Gerd Stolpmann 2002-05-10 16:56 ` Alexander V.Voinov 2002-05-10 22:31 ` Stefano Zacchiroli 2002-05-10 23:05 ` Alexander V.Voinov 2002-05-11 7:20 ` Stefano Zacchiroli 2002-05-11 15:47 ` Gerd Stolpmann 2002-05-11 17:18 ` Stefano Zacchiroli 2002-05-11 17:36 ` Dave Mason 2002-05-12 0:03 ` polux moon 2002-05-13 7:23 ` Mattias Waldau
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox