* Re: [Caml-list] Why AVL-tree? @ 2014-06-02 13:21 Damien Guichard 2014-06-02 13:34 ` Andrew Herron 0 siblings, 1 reply; 17+ messages in thread From: Damien Guichard @ 2014-06-02 13:21 UTC (permalink / raw) To: Caml List [-- Attachment #1: Type: text/plain, Size: 545 bytes --] Red-black tree would spare a machine word per node, because a red-black tree doesn't need depth information. Hence the reason is either historical or a space/speed trade-off (comparing two depths may be faster than pattern matching). Regards, damien guichard Hi, list, Just from the curiosity, why balanced binary trees used in Set and Map are AVL-trees, not their alternative, say, red-black trees? Is there a deep reason for it, or just a historical one? Best, -- Yoriyuki Yamagata http://yoriyuki.info/ [-- Attachment #2: Type: text/html, Size: 1539 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Caml-list] Why AVL-tree? 2014-06-02 13:21 [Caml-list] Why AVL-tree? Damien Guichard @ 2014-06-02 13:34 ` Andrew Herron 2014-06-02 15:06 ` Gabriel Scherer 2014-06-02 16:57 ` Xavier Leroy 0 siblings, 2 replies; 17+ messages in thread From: Andrew Herron @ 2014-06-02 13:34 UTC (permalink / raw) To: Damien Guichard; +Cc: Caml List [-- Attachment #1: Type: text/plain, Size: 1244 bytes --] Wikipedia has some notes on the difference: http://en.wikipedia.org/wiki/AVL_tree AVL has faster lookup, so maybe they decided to optimise for that. It's different to some other languages I've seen, but then so is their decision to not use a tail recursive List.map. Each to their own, it's not hard to implement the alternative :) On Mon, Jun 2, 2014 at 11:21 PM, Damien Guichard <alphablock@orange.fr> wrote: > > Red-black tree would spare a machine word per node, because a red-black tree > doesn't need depth information. > Hence the reason is either historical or a space/speed trade-off (comparing > two depths may be faster than pattern matching). > > Regards, > > damien guichard > Hi, list, > Just from the curiosity, why balanced binary trees used in Set and Map are > AVL-trees, not their alternative, say, red-black trees? Is there a deep > reason for it, or just a historical one? > Best, > -- > Yoriyuki Yamagata > http://yoriyuki.info/ > -- > Caml-list mailing list. Subscription management and archives: > https://sympa.inria.fr/sympa/arc/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs [-- Attachment #2: Type: text/html, Size: 2110 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Caml-list] Why AVL-tree? 2014-06-02 13:34 ` Andrew Herron @ 2014-06-02 15:06 ` Gabriel Scherer 2014-06-03 12:48 ` Yaron Minsky 2014-06-02 16:57 ` Xavier Leroy 1 sibling, 1 reply; 17+ messages in thread From: Gabriel Scherer @ 2014-06-02 15:06 UTC (permalink / raw) To: Andrew Herron; +Cc: Damien Guichard, Caml List Note that OCaml's balanced trees are not exactly what is usually called AVL, as the imbalance between different branches can be at most 2 (+1 on one side and -1 on the other) instead of just 1 as the traditional definition assumes. On Mon, Jun 2, 2014 at 3:34 PM, Andrew Herron <andrew.herron@gmail.com> wrote: > Wikipedia has some notes on the difference: > > http://en.wikipedia.org/wiki/AVL_tree > > AVL has faster lookup, so maybe they decided to optimise for that. > > It's different to some other languages I've seen, but then so is their > decision to not use a tail recursive List.map. Each to their own, it's not > hard to implement the alternative :) > > > On Mon, Jun 2, 2014 at 11:21 PM, Damien Guichard <alphablock@orange.fr> > wrote: >> >> >> Red-black tree would spare a machine word per node, because a red-black >> tree doesn't need depth information. >> Hence the reason is either historical or a space/speed trade-off >> (comparing two depths may be faster than pattern matching). >> >> Regards, >> >> damien guichard >> >> Hi, list, >> >> Just from the curiosity, why balanced binary trees used in Set and Map are >> AVL-trees, not their alternative, say, red-black trees? Is there a deep >> reason for it, or just a historical one? >> >> Best, >> -- >> Yoriyuki Yamagata >> http://yoriyuki.info/ >> >> > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Caml-list] Why AVL-tree? 2014-06-02 15:06 ` Gabriel Scherer @ 2014-06-03 12:48 ` Yaron Minsky 2014-06-03 13:12 ` Gabriel Scherer 2014-06-03 13:41 ` Yoriyuki Yamagata 0 siblings, 2 replies; 17+ messages in thread From: Yaron Minsky @ 2014-06-03 12:48 UTC (permalink / raw) To: Gabriel Scherer; +Cc: Andrew Herron, Damien Guichard, Caml List, David Powers The following summary of what we do with respect to Maps and Sets in Core was written by David Powers (who isn't yet subscribe to the list, so he asked me to forward it on.) In Core we use a slight modification of the AVL tree found in the standard library. I think the biggest change (other than the interface) is that we add a specialized constructor (Leaf of 'key * 'value) as a specialization of Node (left * key * value * right) to limit allocation. It's a nice speed bump and doesn't do too much damage to the readability of the code. We also spent a bunch of time last summer working through the research papers of the last 10 years to see if we could find an implementation we liked better. I'd have to pull up the full history of the project to give real details, but we tried at least all of the following: - red-black trees - left-leaning red-black trees - treaps (including a variant that stored entropy in the spare bits in the variant tag) - splay trees - weight balanced trees - AVL trees with GADT enforcement of the invariants - 1-2 brother trees I'll lead with the caveat that benchmarking is hard, and these structures shine in different ways depending on the type of workload you throw at them. Each implementation below was also mostly a first-pass to understand the structure and do simple tests, so there may be more speed gold in the hills. Your mileage may vary. That said, our conclusions at the end: - red black trees are hard to code and understand (mostly due to remove), and don't show a real performance win. - treaps are a wonderful structure in terms of code simplicity, but getting enough randomness quickly enough is too costly to make them a win over AVL trees (you need to allocate just as much and you need to generate randomness) - splay trees are in our tree, but are too special purpose to be a general win. - Weight balanced trees are a nice structure, and are used in other languages/libraries. They were neither better or worse than AVL trees. - AVL trees with GADT enforcement work, but were actually slower than straightforward AVL trees at the time we tested them. There is some extra matching due to the variant having more cases, so perhaps this isn't surprising. It's also likely that we didn't carry the 2-imbalance trick into the GADT version, which might have skewed the result. - 1-2 brother trees were the best of the lot, and we actually produced a version of the code that we felt was an overall win (or tie) for all workloads. Unfortunately, the optimizations we needed to get us there made the code much longer and harder to understand than the AVL tree code. We just couldn't convince ourselves that it was worth it. Probably the most important point is that nothing we did above gave a general win of more than 10-20% in the tight loop case. Given that, we kept our tweaked AVL tree implementation. If you want to be very very fast, you probably can't get away with a map, and if you just want to be "fast enough" the AVL tree we have is a nice set of tradeoffs for code complexity. On Mon, Jun 2, 2014 at 11:06 AM, Gabriel Scherer <gabriel.scherer@gmail.com> wrote: > Note that OCaml's balanced trees are not exactly what is usually > called AVL, as the imbalance between different branches can be at most > 2 (+1 on one side and -1 on the other) instead of just 1 as the > traditional definition assumes. > > On Mon, Jun 2, 2014 at 3:34 PM, Andrew Herron <andrew.herron@gmail.com> wrote: >> Wikipedia has some notes on the difference: >> >> http://en.wikipedia.org/wiki/AVL_tree >> >> AVL has faster lookup, so maybe they decided to optimise for that. >> >> It's different to some other languages I've seen, but then so is their >> decision to not use a tail recursive List.map. Each to their own, it's not >> hard to implement the alternative :) >> >> >> On Mon, Jun 2, 2014 at 11:21 PM, Damien Guichard <alphablock@orange.fr> >> wrote: >>> >>> >>> Red-black tree would spare a machine word per node, because a red-black >>> tree doesn't need depth information. >>> Hence the reason is either historical or a space/speed trade-off >>> (comparing two depths may be faster than pattern matching). >>> >>> Regards, >>> >>> damien guichard >>> >>> Hi, list, >>> >>> Just from the curiosity, why balanced binary trees used in Set and Map are >>> AVL-trees, not their alternative, say, red-black trees? Is there a deep >>> reason for it, or just a historical one? >>> >>> Best, >>> -- >>> Yoriyuki Yamagata >>> http://yoriyuki.info/ >>> >>> >> > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa.inria.fr/sympa/arc/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Caml-list] Why AVL-tree? 2014-06-03 12:48 ` Yaron Minsky @ 2014-06-03 13:12 ` Gabriel Scherer 2014-06-03 13:37 ` Yaron Minsky 2014-06-03 13:41 ` Yoriyuki Yamagata 1 sibling, 1 reply; 17+ messages in thread From: Gabriel Scherer @ 2014-06-03 13:12 UTC (permalink / raw) To: Yaron Minsky; +Cc: Andrew Herron, Damien Guichard, Caml List, David Powers Thanks Yaron, that is very interesting feedback. Would you happen to have the same kind of information about your experiments with balanced tree buckets for Hashtable? I'm quite interested in their good worst-case behavior, and considered experimenting with such a structure for Batteries, but didn't have time to look at it so far. On Tue, Jun 3, 2014 at 2:48 PM, Yaron Minsky <yminsky@janestreet.com> wrote: > The following summary of what we do with respect to Maps and Sets in > Core was written by David Powers (who isn't yet subscribe to the list, > so he asked me to forward it on.) > > In Core we use a slight modification of the AVL tree found in the > standard library. I think the biggest change (other than the > interface) is that we add a specialized constructor (Leaf of 'key * > 'value) as a specialization of Node (left * key * value * right) to > limit allocation. It's a nice speed bump and doesn't do too much > damage to the readability of the code. > > We also spent a bunch of time last summer working through the research > papers of the last 10 years to see if we could find an implementation > we liked better. I'd have to pull up the full history of the project > to give real details, but we tried at least all of the following: > > - red-black trees > - left-leaning red-black trees > - treaps (including a variant that stored entropy in the spare bits in > the variant tag) > - splay trees > - weight balanced trees > - AVL trees with GADT enforcement of the invariants > - 1-2 brother trees > > I'll lead with the caveat that benchmarking is hard, and these > structures shine in different ways depending on the type of workload > you throw at them. Each implementation below was also mostly a > first-pass to understand the structure and do simple tests, so there > may be more speed gold in the hills. Your mileage may vary. > > That said, our conclusions at the end: > > - red black trees are hard to code and understand (mostly due to > remove), and don't show a real performance win. > > - treaps are a wonderful structure in terms of code simplicity, but > getting enough randomness quickly enough is too costly to make them a > win over AVL trees (you need to allocate just as much and you need to > generate randomness) > > - splay trees are in our tree, but are too special purpose to be a general win. > > - Weight balanced trees are a nice structure, and are used in other > languages/libraries. They were neither better or worse than AVL > trees. > > - AVL trees with GADT enforcement work, but were actually slower than > straightforward AVL trees at the time we tested them. There is some > extra matching due to the variant having more cases, so perhaps this > isn't surprising. It's also likely that we didn't carry the > 2-imbalance trick into the GADT version, which might have skewed the > result. > > - 1-2 brother trees were the best of the lot, and we actually produced > a version of the code that we felt was an overall win (or tie) for all > workloads. Unfortunately, the optimizations we needed to get us there > made the code much longer and harder to understand than the AVL tree > code. We just couldn't convince ourselves that it was worth it. > > Probably the most important point is that nothing we did above gave a > general win of more than 10-20% in the tight loop case. Given that, > we kept our tweaked AVL tree implementation. If you want to be very > very fast, you probably can't get away with a map, and if you just > want to be "fast enough" the AVL tree we have is a nice set of > tradeoffs for code complexity. > > On Mon, Jun 2, 2014 at 11:06 AM, Gabriel Scherer > <gabriel.scherer@gmail.com> wrote: >> Note that OCaml's balanced trees are not exactly what is usually >> called AVL, as the imbalance between different branches can be at most >> 2 (+1 on one side and -1 on the other) instead of just 1 as the >> traditional definition assumes. >> >> On Mon, Jun 2, 2014 at 3:34 PM, Andrew Herron <andrew.herron@gmail.com> wrote: >>> Wikipedia has some notes on the difference: >>> >>> http://en.wikipedia.org/wiki/AVL_tree >>> >>> AVL has faster lookup, so maybe they decided to optimise for that. >>> >>> It's different to some other languages I've seen, but then so is their >>> decision to not use a tail recursive List.map. Each to their own, it's not >>> hard to implement the alternative :) >>> >>> >>> On Mon, Jun 2, 2014 at 11:21 PM, Damien Guichard <alphablock@orange.fr> >>> wrote: >>>> >>>> >>>> Red-black tree would spare a machine word per node, because a red-black >>>> tree doesn't need depth information. >>>> Hence the reason is either historical or a space/speed trade-off >>>> (comparing two depths may be faster than pattern matching). >>>> >>>> Regards, >>>> >>>> damien guichard >>>> >>>> Hi, list, >>>> >>>> Just from the curiosity, why balanced binary trees used in Set and Map are >>>> AVL-trees, not their alternative, say, red-black trees? Is there a deep >>>> reason for it, or just a historical one? >>>> >>>> Best, >>>> -- >>>> Yoriyuki Yamagata >>>> http://yoriyuki.info/ >>>> >>>> >>> >> >> -- >> Caml-list mailing list. Subscription management and archives: >> https://sympa.inria.fr/sympa/arc/caml-list >> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners >> Bug reports: http://caml.inria.fr/bin/caml-bugs ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Caml-list] Why AVL-tree? 2014-06-03 13:12 ` Gabriel Scherer @ 2014-06-03 13:37 ` Yaron Minsky 0 siblings, 0 replies; 17+ messages in thread From: Yaron Minsky @ 2014-06-03 13:37 UTC (permalink / raw) To: Gabriel Scherer Cc: caml-list, Andrew Herron, David Powers, Damien Guichard, Eric Stokes [-- Attachment #1: Type: text/plain, Size: 5753 bytes --] Looping in Eric Stokes, who did the benchmarking on that. On Jun 3, 2014 9:13 AM, "Gabriel Scherer" <gabriel.scherer@gmail.com> wrote: > Thanks Yaron, that is very interesting feedback. > > Would you happen to have the same kind of information about your > experiments with balanced tree buckets for Hashtable? I'm quite > interested in their good worst-case behavior, and considered > experimenting with such a structure for Batteries, but didn't have > time to look at it so far. > > On Tue, Jun 3, 2014 at 2:48 PM, Yaron Minsky <yminsky@janestreet.com> > wrote: > > The following summary of what we do with respect to Maps and Sets in > > Core was written by David Powers (who isn't yet subscribe to the list, > > so he asked me to forward it on.) > > > > In Core we use a slight modification of the AVL tree found in the > > standard library. I think the biggest change (other than the > > interface) is that we add a specialized constructor (Leaf of 'key * > > 'value) as a specialization of Node (left * key * value * right) to > > limit allocation. It's a nice speed bump and doesn't do too much > > damage to the readability of the code. > > > > We also spent a bunch of time last summer working through the research > > papers of the last 10 years to see if we could find an implementation > > we liked better. I'd have to pull up the full history of the project > > to give real details, but we tried at least all of the following: > > > > - red-black trees > > - left-leaning red-black trees > > - treaps (including a variant that stored entropy in the spare bits in > > the variant tag) > > - splay trees > > - weight balanced trees > > - AVL trees with GADT enforcement of the invariants > > - 1-2 brother trees > > > > I'll lead with the caveat that benchmarking is hard, and these > > structures shine in different ways depending on the type of workload > > you throw at them. Each implementation below was also mostly a > > first-pass to understand the structure and do simple tests, so there > > may be more speed gold in the hills. Your mileage may vary. > > > > That said, our conclusions at the end: > > > > - red black trees are hard to code and understand (mostly due to > > remove), and don't show a real performance win. > > > > - treaps are a wonderful structure in terms of code simplicity, but > > getting enough randomness quickly enough is too costly to make them a > > win over AVL trees (you need to allocate just as much and you need to > > generate randomness) > > > > - splay trees are in our tree, but are too special purpose to be a > general win. > > > > - Weight balanced trees are a nice structure, and are used in other > > languages/libraries. They were neither better or worse than AVL > > trees. > > > > - AVL trees with GADT enforcement work, but were actually slower than > > straightforward AVL trees at the time we tested them. There is some > > extra matching due to the variant having more cases, so perhaps this > > isn't surprising. It's also likely that we didn't carry the > > 2-imbalance trick into the GADT version, which might have skewed the > > result. > > > > - 1-2 brother trees were the best of the lot, and we actually produced > > a version of the code that we felt was an overall win (or tie) for all > > workloads. Unfortunately, the optimizations we needed to get us there > > made the code much longer and harder to understand than the AVL tree > > code. We just couldn't convince ourselves that it was worth it. > > > > Probably the most important point is that nothing we did above gave a > > general win of more than 10-20% in the tight loop case. Given that, > > we kept our tweaked AVL tree implementation. If you want to be very > > very fast, you probably can't get away with a map, and if you just > > want to be "fast enough" the AVL tree we have is a nice set of > > tradeoffs for code complexity. > > > > On Mon, Jun 2, 2014 at 11:06 AM, Gabriel Scherer > > <gabriel.scherer@gmail.com> wrote: > >> Note that OCaml's balanced trees are not exactly what is usually > >> called AVL, as the imbalance between different branches can be at most > >> 2 (+1 on one side and -1 on the other) instead of just 1 as the > >> traditional definition assumes. > >> > >> On Mon, Jun 2, 2014 at 3:34 PM, Andrew Herron <andrew.herron@gmail.com> > wrote: > >>> Wikipedia has some notes on the difference: > >>> > >>> http://en.wikipedia.org/wiki/AVL_tree > >>> > >>> AVL has faster lookup, so maybe they decided to optimise for that. > >>> > >>> It's different to some other languages I've seen, but then so is their > >>> decision to not use a tail recursive List.map. Each to their own, it's > not > >>> hard to implement the alternative :) > >>> > >>> > >>> On Mon, Jun 2, 2014 at 11:21 PM, Damien Guichard <alphablock@orange.fr > > > >>> wrote: > >>>> > >>>> > >>>> Red-black tree would spare a machine word per node, because a > red-black > >>>> tree doesn't need depth information. > >>>> Hence the reason is either historical or a space/speed trade-off > >>>> (comparing two depths may be faster than pattern matching). > >>>> > >>>> Regards, > >>>> > >>>> damien guichard > >>>> > >>>> Hi, list, > >>>> > >>>> Just from the curiosity, why balanced binary trees used in Set and > Map are > >>>> AVL-trees, not their alternative, say, red-black trees? Is there a > deep > >>>> reason for it, or just a historical one? > >>>> > >>>> Best, > >>>> -- > >>>> Yoriyuki Yamagata > >>>> http://yoriyuki.info/ > >>>> > >>>> > >>> > >> > >> -- > >> Caml-list mailing list. Subscription management and archives: > >> https://sympa.inria.fr/sympa/arc/caml-list > >> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > >> Bug reports: http://caml.inria.fr/bin/caml-bugs > [-- Attachment #2: Type: text/html, Size: 7694 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Caml-list] Why AVL-tree? 2014-06-03 12:48 ` Yaron Minsky 2014-06-03 13:12 ` Gabriel Scherer @ 2014-06-03 13:41 ` Yoriyuki Yamagata 1 sibling, 0 replies; 17+ messages in thread From: Yoriyuki Yamagata @ 2014-06-03 13:41 UTC (permalink / raw) To: Caml List; +Cc: David Powers [-- Attachment #1: Type: text/plain, Size: 5697 bytes --] Thank you everyone for answering my questions. By the way, in Batteries Included, AVL-trees used in BatISet and BatIMap allow height difference by 1 (?). I think the code is originated from my Camomile. When I wrote them, I'm afraid of "deviating" from the text book definition of AVL tree. Maybe the change is order? Best, 2014-06-03 21:48 GMT+09:00 Yaron Minsky <yminsky@janestreet.com>: > The following summary of what we do with respect to Maps and Sets in > Core was written by David Powers (who isn't yet subscribe to the list, > so he asked me to forward it on.) > > In Core we use a slight modification of the AVL tree found in the > standard library. I think the biggest change (other than the > interface) is that we add a specialized constructor (Leaf of 'key * > 'value) as a specialization of Node (left * key * value * right) to > limit allocation. It's a nice speed bump and doesn't do too much > damage to the readability of the code. > > We also spent a bunch of time last summer working through the research > papers of the last 10 years to see if we could find an implementation > we liked better. I'd have to pull up the full history of the project > to give real details, but we tried at least all of the following: > > - red-black trees > - left-leaning red-black trees > - treaps (including a variant that stored entropy in the spare bits in > the variant tag) > - splay trees > - weight balanced trees > - AVL trees with GADT enforcement of the invariants > - 1-2 brother trees > > I'll lead with the caveat that benchmarking is hard, and these > structures shine in different ways depending on the type of workload > you throw at them. Each implementation below was also mostly a > first-pass to understand the structure and do simple tests, so there > may be more speed gold in the hills. Your mileage may vary. > > That said, our conclusions at the end: > > - red black trees are hard to code and understand (mostly due to > remove), and don't show a real performance win. > > - treaps are a wonderful structure in terms of code simplicity, but > getting enough randomness quickly enough is too costly to make them a > win over AVL trees (you need to allocate just as much and you need to > generate randomness) > > - splay trees are in our tree, but are too special purpose to be a general > win. > > - Weight balanced trees are a nice structure, and are used in other > languages/libraries. They were neither better or worse than AVL > trees. > > - AVL trees with GADT enforcement work, but were actually slower than > straightforward AVL trees at the time we tested them. There is some > extra matching due to the variant having more cases, so perhaps this > isn't surprising. It's also likely that we didn't carry the > 2-imbalance trick into the GADT version, which might have skewed the > result. > > - 1-2 brother trees were the best of the lot, and we actually produced > a version of the code that we felt was an overall win (or tie) for all > workloads. Unfortunately, the optimizations we needed to get us there > made the code much longer and harder to understand than the AVL tree > code. We just couldn't convince ourselves that it was worth it. > > Probably the most important point is that nothing we did above gave a > general win of more than 10-20% in the tight loop case. Given that, > we kept our tweaked AVL tree implementation. If you want to be very > very fast, you probably can't get away with a map, and if you just > want to be "fast enough" the AVL tree we have is a nice set of > tradeoffs for code complexity. > > On Mon, Jun 2, 2014 at 11:06 AM, Gabriel Scherer > <gabriel.scherer@gmail.com> wrote: > > Note that OCaml's balanced trees are not exactly what is usually > > called AVL, as the imbalance between different branches can be at most > > 2 (+1 on one side and -1 on the other) instead of just 1 as the > > traditional definition assumes. > > > > On Mon, Jun 2, 2014 at 3:34 PM, Andrew Herron <andrew.herron@gmail.com> > wrote: > >> Wikipedia has some notes on the difference: > >> > >> http://en.wikipedia.org/wiki/AVL_tree > >> > >> AVL has faster lookup, so maybe they decided to optimise for that. > >> > >> It's different to some other languages I've seen, but then so is their > >> decision to not use a tail recursive List.map. Each to their own, it's > not > >> hard to implement the alternative :) > >> > >> > >> On Mon, Jun 2, 2014 at 11:21 PM, Damien Guichard <alphablock@orange.fr> > >> wrote: > >>> > >>> > >>> Red-black tree would spare a machine word per node, because a red-black > >>> tree doesn't need depth information. > >>> Hence the reason is either historical or a space/speed trade-off > >>> (comparing two depths may be faster than pattern matching). > >>> > >>> Regards, > >>> > >>> damien guichard > >>> > >>> Hi, list, > >>> > >>> Just from the curiosity, why balanced binary trees used in Set and Map > are > >>> AVL-trees, not their alternative, say, red-black trees? Is there a > deep > >>> reason for it, or just a historical one? > >>> > >>> Best, > >>> -- > >>> Yoriyuki Yamagata > >>> http://yoriyuki.info/ > >>> > >>> > >> > > > > -- > > Caml-list mailing list. Subscription management and archives: > > https://sympa.inria.fr/sympa/arc/caml-list > > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > > Bug reports: http://caml.inria.fr/bin/caml-bugs > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa.inria.fr/sympa/arc/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > -- Yoriyuki Yamagata *http://yoriyuki.info/ <http://yoriyuki.info/>* [-- Attachment #2: Type: text/html, Size: 7759 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Caml-list] Why AVL-tree? 2014-06-02 13:34 ` Andrew Herron 2014-06-02 15:06 ` Gabriel Scherer @ 2014-06-02 16:57 ` Xavier Leroy 2014-06-02 21:16 ` Andrew Herron ` (2 more replies) 1 sibling, 3 replies; 17+ messages in thread From: Xavier Leroy @ 2014-06-02 16:57 UTC (permalink / raw) To: caml-list On 02/06/14 15:34, Andrew Herron wrote: > It's different to some other languages I've seen, but then so is their > decision to not use a tail recursive List.map. Each to their own, it's not > hard to implement the alternative :) Yes, we're stupid, of course. Yoriyuki Yamagata originally asked: > Just from the curiosity, why balanced binary trees used in Set and > Map are AVL-trees, not their alternative, say, red-black trees? Is > there a deep reason for it, or just a historical one? At the time Set was written, no efficient algorithms for whole-set operations (union, intersection, difference, etc) were known for red-black trees. I'm not sure they are known today. As for performance of insert/lookup operations, Jean-Christophe Filliâtre has measurements showing that OCaml's 2-imbalance AVL trees perform better than red-black trees. It all depends on your ratio of insertions to lookups, of course. But the 2-imbalance trick makes a big difference with textbook AVL trees. - Xavier Leroy ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Caml-list] Why AVL-tree? 2014-06-02 16:57 ` Xavier Leroy @ 2014-06-02 21:16 ` Andrew Herron 2014-06-10 18:19 ` jonikelee 2014-08-03 21:25 ` Diego Olivier Fernandez Pons 2 siblings, 0 replies; 17+ messages in thread From: Andrew Herron @ 2014-06-02 21:16 UTC (permalink / raw) To: Xavier Leroy; +Cc: caml-list [-- Attachment #1: Type: text/plain, Size: 566 bytes --] On Tue, Jun 3, 2014 at 02:58 AM, Xavier Leroy<xavier.leroy@inria.fr>, wrote: On 02/06/14 15:34, Andrew Herron wrote: > It's different to some other languages I've seen, but then so is their > decision to not use a tail recursive List.map. Each to their own, it's not > hard to implement the alternative :) Yes, we're stupid, of course. I didn't mean to imply that. My apologies if it came across that way. I'm supportive of doing things differently; if everyone did the same thing the world would never evolve. Cheers, Andy [-- Attachment #2: Type: text/html, Size: 1164 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Caml-list] Why AVL-tree? 2014-06-02 16:57 ` Xavier Leroy 2014-06-02 21:16 ` Andrew Herron @ 2014-06-10 18:19 ` jonikelee 2014-06-10 18:51 ` Florian Hars 2014-06-15 4:51 ` Lukasz Stafiniak 2014-08-03 21:25 ` Diego Olivier Fernandez Pons 2 siblings, 2 replies; 17+ messages in thread From: jonikelee @ 2014-06-10 18:19 UTC (permalink / raw) To: Xavier.Leroy, caml-list I am coming here from the Coq mailing list, because of a link posted there about this e-mail thread. I have been working on Coq-verified and fully proof-erased versions of AVL and red-black trees, as well as a variant I didn't know existed before, until I read about "2-imbalance AVL trees" in this thread. The github address for my development is: https://github.com/jonleivent/mindless-coding and the "gaptree" development there may be the same as your "2-imbalance" AVL trees, though I can't be sure without more of an explanation (or maybe just a pointer to the source code) for the 2-imbalance AVL trees. Any further information or source pointers you can give me would be greatly appreciated. Thanks --Jonathan ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Caml-list] Why AVL-tree? 2014-06-10 18:19 ` jonikelee @ 2014-06-10 18:51 ` Florian Hars 2014-06-10 19:52 ` Jonathan 2014-06-15 4:51 ` Lukasz Stafiniak 1 sibling, 1 reply; 17+ messages in thread From: Florian Hars @ 2014-06-10 18:51 UTC (permalink / raw) To: caml-list Am 10.06.2014 20:19, schrieb jonikelee@gmail.com: > Any further information or source pointers you can give me would be greatly > appreciated. This may prove to be of interest to you: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.6890 (Or the version on one of the authors' page: https://www.lri.fr/~filliatr/ftp/publis/fpp.ps.gz ) - Florian. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Caml-list] Why AVL-tree? 2014-06-10 18:51 ` Florian Hars @ 2014-06-10 19:52 ` Jonathan 0 siblings, 0 replies; 17+ messages in thread From: Jonathan @ 2014-06-10 19:52 UTC (permalink / raw) To: caml-list On 06/10/2014 02:51 PM, Florian Hars wrote: > Am 10.06.2014 20:19, schrieb jonikelee@gmail.com: >> Any further information or source pointers you can give me would be >> greatly >> appreciated. > > This may prove to be of interest to you: > > http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.6890 > > (Or the version on one of the authors' page: > https://www.lri.fr/~filliatr/ftp/publis/fpp.ps.gz ) > > - Florian. > Thanks for the links. I see that those are the origin of the FSets in Coq. The code I generated for AVL trees and "gap" trees doesn't carry around the height for each node at runtime (although the proofs do - but it gets fully erased in the extracted OCaml code, leaving just a small enumeration type), nor perform any height computations at runtime. There is a claim in the above paper that they carry around the height on each node for "greater efficiency". I wonder what that claim refers to. Possibly the fact that they relax the AVL "delta" (the difference between sibling subtree heights) to be greater than 1. Is that what was meant by "2-imbalance AVL trees" in this thread? -- Jonathan ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Caml-list] Why AVL-tree? 2014-06-10 18:19 ` jonikelee 2014-06-10 18:51 ` Florian Hars @ 2014-06-15 4:51 ` Lukasz Stafiniak 2014-06-15 14:01 ` Jonathan 1 sibling, 1 reply; 17+ messages in thread From: Lukasz Stafiniak @ 2014-06-15 4:51 UTC (permalink / raw) To: Caml, jonikelee [-- Attachment #1: Type: text/plain, Size: 1314 bytes --] Here is a 2-imbalance AVL trees example from InvarGenT: https://github.com/lukstafi/invargent/blob/master/examples/avl_tree.gadt and the (inferred) types: https://github.com/lukstafi/invargent/blob/master/examples/avl_tree.gadti.target Regards, Lukasz On Tue, Jun 10, 2014 at 11:19 AM, <jonikelee@gmail.com> wrote: > I am coming here from the Coq mailing list, because of a link posted there > about this e-mail thread. I have been working on Coq-verified and fully > proof-erased versions of AVL and red-black trees, as well as a variant I > didn't know existed before, until I read about "2-imbalance AVL trees" in > this > thread. > > The github address for my development is: > > https://github.com/jonleivent/mindless-coding > > and the "gaptree" development there may be the same as your "2-imbalance" > AVL > trees, though I can't be sure without more of an explanation (or maybe > just a > pointer to the source code) for the 2-imbalance AVL trees. > > Any further information or source pointers you can give me would be greatly > appreciated. > > Thanks > --Jonathan > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa.inria.fr/sympa/arc/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > [-- Attachment #2: Type: text/html, Size: 2418 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Caml-list] Why AVL-tree? 2014-06-15 4:51 ` Lukasz Stafiniak @ 2014-06-15 14:01 ` Jonathan 0 siblings, 0 replies; 17+ messages in thread From: Jonathan @ 2014-06-15 14:01 UTC (permalink / raw) To: Lukasz Stafiniak, Caml On 06/15/2014 12:51 AM, Lukasz Stafiniak wrote: > Here is a 2-imbalance AVL trees example from InvarGenT: > https://github.com/lukstafi/invargent/blob/master/examples/avl_tree.gadt > and the (inferred) types: > https://github.com/lukstafi/invargent/blob/master/examples/avl_tree.gadti.target > > Regards, > Lukasz Thank you for those links, Lukasz. I see from them that there is a difference between 2-imbalance AVL trees and gap trees. In 2-imbalance AVL trees, as with standard AVL trees, the parent height is always 1 + the maximum child height. In gap trees, a parent's height can in some cases be 2 above the height of both children, even though the childrens' heights never differ by more than 1. I am wondering how important this difference is. For example, with gap trees, there is at most one rebalancing needed per insertion or deletion operation. My understanding of standard AVL trees is that they may require O(logN) rebalancings per insertion or deletion. Do you know how the 2-imbalance property addition changes this rebalancing requirement for AVL trees? Thanks again, Jonathan > > On Tue, Jun 10, 2014 at 11:19 AM, <jonikelee@gmail.com> wrote: > >> I am coming here from the Coq mailing list, because of a link posted there >> about this e-mail thread. I have been working on Coq-verified and fully >> proof-erased versions of AVL and red-black trees, as well as a variant I >> didn't know existed before, until I read about "2-imbalance AVL trees" in >> this >> thread. >> >> The github address for my development is: >> >> https://github.com/jonleivent/mindless-coding >> >> and the "gaptree" development there may be the same as your "2-imbalance" >> AVL >> trees, though I can't be sure without more of an explanation (or maybe >> just a >> pointer to the source code) for the 2-imbalance AVL trees. >> >> Any further information or source pointers you can give me would be greatly >> appreciated. >> >> Thanks >> --Jonathan >> >> -- >> Caml-list mailing list. Subscription management and archives: >> https://sympa.inria.fr/sympa/arc/caml-list >> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners >> Bug reports: http://caml.inria.fr/bin/caml-bugs >> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Caml-list] Why AVL-tree? 2014-06-02 16:57 ` Xavier Leroy 2014-06-02 21:16 ` Andrew Herron 2014-06-10 18:19 ` jonikelee @ 2014-08-03 21:25 ` Diego Olivier Fernandez Pons 2 siblings, 0 replies; 17+ messages in thread From: Diego Olivier Fernandez Pons @ 2014-08-03 21:25 UTC (permalink / raw) To: Xavier Leroy; +Cc: caml-list [-- Attachment #1: Type: text/plain, Size: 5202 bytes --] Hi > At the time Set was written, no efficient algorithms for whole-set > operations (union, intersection, difference, etc) were known for > red-black trees. I'm not sure they are known today. > The answer is no and I will try to explain. When doing a set union, at some point you will have to merge two disjoint sets represented each by a tree [1 .. 10] union [15 .. 22] At that point you need to know the "size" of each tree because if both trees are the same size you just need to append them return Node ([1..10], [15..22]) while if one tree is bigger than the other, you will need to cut that tree into smaller pieces and reorganize them (a "rotation"). divide [1..10] into [1..5], [6..10] return Node([1..5], Node ([6..10], [15..22])) The AVL implementation of OCaml gives you the "size" of each tree, its the extra int in the constructor 'a tree = Empty | Node 'a tree * 'a * 'a tree * int <--- here it is equal to the longest path in the tree f = function E -> 0 | Node (l, _, r, h) -> 1 + max (f l, fr) If you implement Weight-balanced trees, that value is the cardinal of the set represented by the tree. If you use red-black or "traditional AVL" representation, that extra bit just tells you if the tree is leaning towards the left or towards the right, but doesn't tell you any information about the global "size" of the tree. Which means you cannot implement set union directly. You have to traverse full tree in O(n) to recompute that information. There is at least one library in Haskell that implements AVL trees that way, it uses -1 | 0 | 1 information for most operations and recomputes the longest path for all trees when doing set operations, then discards that info. There is also a way to implement red black trees to do set operations just like (non-traditional) AVLs. It is described in Olivié H.J A new class of balanced search trees : half balanced search trees, RAIRO Informatique Théorique 16, 51 71 - also in his PhD Thesis. Basically a tree is red-black when shortest-branch * 2 >= longest-branch (cf. Constructing Red-Black trees, Ralf Hinze 1999 - easier to find than Olivié's works) So you can implement a tree structure that memorizes both the shortest and the longest branch in each node 'a tree = E | N of 'a tree * 'a * 'a tree * int * int <---- shortest and longest From there you just follow the same scheme than the AVL implementation in the OCaml lib, you just need to cope with a couple of triple rotation cases. > As for performance of insert/lookup operations, Jean-Christophe > Filliâtre has measurements showing that OCaml's 2-imbalance AVL trees > perform better than red-black trees. It all depends on your ratio of > insertions to lookups, of course. But the 2-imbalance trick makes a > big difference with textbook AVL trees. > This is a bit more difficult but in short "red-black trees do not implement truly red-black trees" and "all implementations of red-black trees implement a different approximation of red-black trees". Any implementation of red-black trees that uses the 1 bit trick and where insertions are in log (n) is not surjective. You can understand that as meaning (1) - some red-black trees will never be built by the balancing algorithm (2) - sometimes you will give the balancing algorithm a red-black tree (= has a provable red-black coloring) and it will still rebalance it instead of noticing it's already balanced and returning it untouched. The reason is insertion in red-black trees using the 1 bit trick is linear. Meaning if I want a balancing algorithm that never falls into (1) or (2) I cannot avoid linear time insertion. The result is all algorithms found in the literature are log(n) approximations of the full algorithm. The only algorithm that is truly logarithmic is again Olivié's thanks to it's implicit representation of the red-black coloring. From there, comparing performance of AVLs with "red-black trees" in general makes no sense as the person that implemented them may have implemented any arbitrary approximation of the full algorithm. This translates in the code as a random number of "cases" to check for balancing : 4 for Okasaki, 27 in the CLR, etc. With double, triple, quadruple rotations, according to how "deep" the author was willing to go. Because what these guys are doing is "unfolding" the linear behavior of the algorithm into a constant and falling back into rebuilding any tree they are given for all other cases. In any case, some benchmarks I did very, very, very longtime ago showed AVL and Weight balanced are good all-purpose implementations. The advantage of WB being it gives you the cardinality of the set in O(1). In some sense, AVL is the log of WB (in the same way the height of a tree is the log of its size). Diego Olivier > > - Xavier Leroy > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa.inria.fr/sympa/arc/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > [-- Attachment #2: Type: text/html, Size: 7020 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Caml-list] Why AVL-tree? @ 2014-06-02 18:23 Damien Guichard 0 siblings, 0 replies; 17+ messages in thread From: Damien Guichard @ 2014-06-02 18:23 UTC (permalink / raw) To: Caml List [-- Attachment #1: Type: text/plain, Size: 184 bytes --] Worth to be mentionned : OCaml Reins has red-black trees featuring whole-set operations. http://caml.inria.fr/cgi-bin/hump.en.cgi?contrib=599 Regards, damien guichard [-- Attachment #2: Type: text/html, Size: 884 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* [Caml-list] Why AVL-tree? @ 2014-06-02 11:48 Yoriyuki Yamagata 0 siblings, 0 replies; 17+ messages in thread From: Yoriyuki Yamagata @ 2014-06-02 11:48 UTC (permalink / raw) To: Caml List [-- Attachment #1: Type: text/plain, Size: 277 bytes --] Hi, list, Just from the curiosity, why balanced binary trees used in Set and Map are AVL-trees, not their alternative, say, red-black trees? Is there a deep reason for it, or just a historical one? Best, -- Yoriyuki Yamagata *http://yoriyuki.info/ <http://yoriyuki.info/>* [-- Attachment #2: Type: text/html, Size: 401 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2014-08-03 21:25 UTC | newest] Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-06-02 13:21 [Caml-list] Why AVL-tree? Damien Guichard 2014-06-02 13:34 ` Andrew Herron 2014-06-02 15:06 ` Gabriel Scherer 2014-06-03 12:48 ` Yaron Minsky 2014-06-03 13:12 ` Gabriel Scherer 2014-06-03 13:37 ` Yaron Minsky 2014-06-03 13:41 ` Yoriyuki Yamagata 2014-06-02 16:57 ` Xavier Leroy 2014-06-02 21:16 ` Andrew Herron 2014-06-10 18:19 ` jonikelee 2014-06-10 18:51 ` Florian Hars 2014-06-10 19:52 ` Jonathan 2014-06-15 4:51 ` Lukasz Stafiniak 2014-06-15 14:01 ` Jonathan 2014-08-03 21:25 ` Diego Olivier Fernandez Pons -- strict thread matches above, loose matches on Subject: below -- 2014-06-02 18:23 Damien Guichard 2014-06-02 11:48 Yoriyuki Yamagata
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox