From: Niols <niols@niols.fr>
To: caml-list@inria.fr
Subject: Re: [Caml-list] O(n ln k) sorting for ocaml on github and a challenge
Date: Mon, 11 Oct 2021 13:46:45 +0200 [thread overview]
Message-ID: <875a3212-c4a1-41fb-65f7-2a68b012af9f@niols.fr> (raw)
In-Reply-To: <20211009195910.mqivg4tsafiyb3tr@oulala>
Hi Mário, Christophe,
I'm glad some other people are also doing that kind of things!
On 10/9/21 9:59 PM, Christophe Raffalli wrote:
> On 21-10-09 09:58:12, Mario Pereira wrote:
>> Your implementation reminds me very much of TimSort (for an OCaml
>> implementation, see for instance https://github.com/LesBoloss-es/sorting/blob/
>> master/src/list/timsort.ml). This is also a stable algorithm.
>
> Yes the idea is similar, but
>
> - I use a reference to the list for the sorted/reverse-sorted block I build in
> first phase (like Timsort on arrays, which actually is stable, in place and as
> good complexity, but you have to get the balancing right, see below)
>
> - in the second phase, I use a merge, returning sorted list at even depth and
> rev-sorted at odd depth (like OCaml's List.sort). This avoid the Asc/Desc
> constructor and the List.rev.
>
> - the balancement is ensured by a simple arithmetic computation, not by a stack
> and an invariant on sizes (probably building a balanced binary tree with the
> initial block instead of a list could be a bit better, I am not sure).
>
> - Last but not least: the code you point seems broken somewhere, it does not
> ends on large lists (probably quadratic because the balancing with size is
> probably wrong. cf the paragraph "bug" on wikipedia about Timsort. Anyway, the
> code you point is probably optimised for arrays.
The repository in question is very experimental. Our implementation of
TimSort on list is indeed broken. I can't remember if it contains the
famous TimSort bug, but we are aware that it exists and know of two ways
to fix it. We quickly switched to working on arrays, though, and were
planning on coming back to list eventually.
Our implementation of TimSort on arrays can be found here:
https://github.com/LesBoloss-es/sorting/blob/master/src/array/timsort.ml
It does include a fix of the bug. We have tested it successfully on a
rather wide range of arrays.
We have also implemented PeekSort and PowerSort from “Nearly-Optimal
Mergesorts: Fast, Practical Sorting Methods That Optimally Adapt to
Existing Runs” by J. Ian Munro and Sebastian Wild. cf:
https://www.wild-inter.net/publications/munro-wild-2018
They use a number of comparisons similar to TimSort. Our implementations
are copied directly from the implementation of the authors. There is a
lot of room for improvement to make the runtime much slower but we
haven't gotten to that yet.
PowerSort, in particular, is designed to be efficient in cache. We
wondered whether it would adapt nicely to lists but it is also only
future work.
> My work aims at being a replacement for OCaml sort (for list currently). Here is
> the timing I get:
>
> Correctness test passed
> Stability test passed
> Random lists:
> random: tf = 1.44, tg = 1.39, factor = 1.04x, gain = 3.70%
> random small: tf = 1.10, tg = 1.19, factor = 0.92x, gain = -8.27%
> Worst cases:
> worst1: tf = 0.83, tg = 0.71, factor = 1.17x, gain = 14.21%
> worst2: tf = 0.65, tg = 0.70, factor = 0.93x, gain = -7.17%
> Sorted (partially) lists:
> sorted: tf = 0.65, tg = 0.01, factor = 97.86x, gain = 98.98%
> reversed: tf = 0.65, tg = 0.09, factor = 7.62x, gain = 86.88%
> sorted@rev: tf = 0.65, tg = 0.21, factor = 3.14x, gain = 68.19%
> rev@sorted: tf = 0.65, tg = 0.21, factor = 3.13x, gain = 68.03%
> Shuffled lists (permute k times 2 elements in a sorted list):
> shuffle 10: tf = 0.66, tg = 0.41, factor = 1.61x, gain = 38.01%
> shuffle 100: tf = 0.64, tg = 0.51, factor = 1.25x, gain = 20.28%
> shuffle 1000: tf = 0.63, tg = 0.56, factor = 1.14x, gain = 11.94%
> shuffle 10000: tf = 0.64, tg = 0.61, factor = 1.04x, gain = 4.30%
>
> factor > 1 means Block_sort is faster that List.sort. Remark that in the case of
> sorted list, my algorithm returns the original list using constant memory.
I guess we had similar goals. I do think it can be nice to have a
sorting algorithm in the standard library that behaves well on lists
that have some order in them, as such lists occur quite often in real life.
We also obtained similar results when comparing to the standard library
(for arrays): On purely random lists, our TimSort is ~20% slower and
uses some more comparisons. As soon as the lists contain some order,
TimSort beats the standard library by far both in term of runtime and
comparisons.
We were planning on starting to work again on this project soon-ish
(hopefully within the coming month), and in particular we wanted to
optimise a bit more PeekSort and PowerSort and then package the whole
thing in a library. We'll follow closely what you are doing so as to not
clash with your work!
Cheers,
Niols
next prev parent reply other threads:[~2021-10-11 11:46 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-09 4:05 Christophe Raffalli
2021-10-09 8:58 ` Mario Pereira
2021-10-09 19:59 ` Christophe Raffalli
2021-10-11 11:46 ` Niols [this message]
2021-10-11 21:27 ` Mario Pereira
2021-10-22 4:42 ` Kazuhiko Sakaguchi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=875a3212-c4a1-41fb-65f7-2a68b012af9f@niols.fr \
--to=niols@niols.fr \
--cc=caml-list@inria.fr \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox