From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id RAA01938 for caml-redistribution; Wed, 10 Mar 1999 17:59:33 +0100 (MET) Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id OAA08395 for ; Wed, 10 Mar 1999 14:59:21 +0100 (MET) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.8.7/8.8.7) with ESMTP id OAA15268; Wed, 10 Mar 1999 14:58:02 +0100 (MET) Received: (from xleroy@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id OAA03832; Wed, 10 Mar 1999 14:58:02 +0100 (MET) Message-ID: <19990310145802.51200@pauillac.inria.fr> Date: Wed, 10 Mar 1999 14:58:02 +0100 From: Xavier Leroy To: doligez@pa.dec.com, OCAML Subject: Re: Sort.array easily degenerates References: <199903092303.AA14481@six.pa.dec.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.89.1 In-Reply-To: <199903092303.AA14481@six.pa.dec.com>; from doligez@pa.dec.com on Tue, Mar 09, 1999 at 03:03:05PM -0800 Sender: weis I have revised my Quicksort implementation based on that found in glibc 2, which contains some clever optimizations. Interested parties can see the code at http://camlcvs.inria.fr/cgi-bin/cvsweb.out/ocaml/stdlib/sort.ml The behavior on extreme situations (input already sorted) is now much better. > There's no way to implement Sedgewick's quicksort with the interface > given in sort.mli. You'd need two comparison functions, one for ">=" > and one for "<=". That explains the degenerate case when all the > elements are equal. It's true that ">=" vs. ">" can make a big difference, but >= is easily definable from <= : a >= b is b <= a, a > b is not(a <= b), a < b is not(b <= a). > And it degenerates on already-sorted data because you swap the pivot > with the right-most element. As a consequence, one of the subarrays > has its two highest elements in first and last positions, which makes > the median-of-three degenerate. I think this bug is also in > Sedgewick's pseudo-code. Sedgewick doesn't even give pseudo-code for the "median of three" heuristic, but the glibc implementation avoids this swap altogether. > Also, you should recurse on the smallest subarray first, not the > largest. Good point. > I vote for Shellsort. Well, I tried it, and it's noticeably slower than quicksort by almost a factor of two on random input. Perhaps because polymorphic array access is quite expensive in OCaml. - Xavier Leroy