From: Alain Frisch <alain@frisch.fr>
To: Berke Durak <berke.durak@exalead.com>
Cc: caml-list <caml-list@inria.fr>
Subject: Re: [Caml-list] Canonical Set/Map datastructure?
Date: Wed, 05 Mar 2008 18:27:38 +0100
Message-ID: <47CED80A.1010504@frisch.fr>
In-Reply-To: <47CECF23.1020508@exalead.com>
Berke Durak wrote:
> The Map and Set modules use AVL trees which are efficient but not
> canonical - a given set of elements can have more than one
> representation. This means that you cannot use ad hoc comparison on
> sets and maps, and this is why they are presented as functors.
>
> Does anyone know if, in the many years that have passed since the
> implementation of those fine modules, someone has invented a
> (functional) datastructure that is as efficient while being canonic?
Well, Patricia trees have been around for many years and they satisfy
this property. They also allow set operations (union, intersection, ...)
in linear time (and I explain below how this can be optimized to
something which is really efficient for some applications).
Jean-Christophe Filliâtre has an implementation on his web page.
Patricia trees work fine when the set elements can easily be represented
as strings of bits. So if you can map your elements to integers, that's
ok. Otherwise, you can hash-cons your elements to get unique integers
for them.
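
To make the canonicity claim concrete, here is a minimal sketch of a
little-endian Patricia tree over integers, in the style of Okasaki and
Gill's "Fast Mergeable Integer Maps". It is not Jean-Christophe's code;
the names (`add`, `mem`, `of_list`, `intern`) are chosen for
illustration, and `intern` is a hypothetical helper that assigns unique
integers to strings.

```ocaml
(* A set of ints as a little-endian Patricia tree.
   Branch (prefix, branching_bit, left, right): keys with a 0 at the
   branching bit go left, keys with a 1 go right. *)
type t =
  | Empty
  | Leaf of int
  | Branch of int * int * t * t

let zero_bit k m = k land m = 0
let lowest_bit x = x land (-x)
let mask k m = k land (m - 1)
let match_prefix k p m = mask k m = p

let rec mem k = function
  | Empty -> false
  | Leaf j -> k = j
  | Branch (_, m, l, r) -> mem k (if zero_bit k m then l else r)

(* Combine two trees whose prefixes p0 and p1 differ. *)
let join p0 t0 p1 t1 =
  let m = lowest_bit (p0 lxor p1) in
  if zero_bit p0 m then Branch (mask p0 m, m, t0, t1)
  else Branch (mask p0 m, m, t1, t0)

let rec add k t =
  match t with
  | Empty -> Leaf k
  | Leaf j -> if j = k then t else join k (Leaf k) j t
  | Branch (p, m, l, r) ->
    if match_prefix k p m then
      if zero_bit k m then Branch (p, m, add k l, r)
      else Branch (p, m, l, add k r)
    else join k (Leaf k) p t

let of_list l = List.fold_left (fun s k -> add k s) Empty l

(* Hypothetical helper: give each distinct string a unique integer, so
   that non-integer elements can be stored in the tree. *)
let ids : (string, int) Hashtbl.t = Hashtbl.create 16
let intern s =
  match Hashtbl.find_opt ids s with
  | Some i -> i
  | None ->
    let i = Hashtbl.length ids in
    Hashtbl.add ids s i;
    i
```

The shape of the tree depends only on the set of keys, not on the
insertion order, so `of_list [1; 5; 3; 8] = of_list [8; 3; 5; 1]` holds
with the polymorphic structural equality.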
Something that Jean-Christophe's implementation doesn't do but which is
quite easy to add is to use hash-consing on the Patricia trees themselves,
that is, to memoize their constructors in order to get a unique physical
representation and maximal sharing. That way, you get:
structural equality = physical equality = set equality
With this property, set operations on Patricia trees can be optimized
with reflexivity properties (e.g. the inner loop of the union function
can start by checking physical equality of its arguments).
Also, you get a nice unique integer for each tree. This allows you to
memoize set operations efficiently (like union and intersection, for which
you can use memoization in the inner loop, not only at toplevel), and to
build sets of sets (and so on).
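
A rough sketch of what this could look like, under several assumptions:
keys are non-negative ints, the tables are ordinary (strong) hash
tables where a real implementation would more likely use weak ones so
that unused trees can be collected, and all names (`hashcons`, `leaf`,
`branch`, `union_memo`) are made up for the example.

```ocaml
(* Hash-consed Patricia trees: every tree carries a unique id, and the
   smart constructors return the existing node when one already exists. *)
type t = { id : int; node : node }
and node = Empty | Leaf of int | Branch of int * int * t * t

type key = KLeaf of int | KBranch of int * int * int * int

let table : (key, t) Hashtbl.t = Hashtbl.create 251
let next_id = ref 1
let empty = { id = 0; node = Empty }

let hashcons key node =
  match Hashtbl.find_opt table key with
  | Some t -> t
  | None ->
    let t = { id = !next_id; node } in
    incr next_id;
    Hashtbl.add table key t;
    t

let leaf k = hashcons (KLeaf k) (Leaf k)
let branch p m l r = hashcons (KBranch (p, m, l.id, r.id)) (Branch (p, m, l, r))

let zero_bit k m = k land m = 0
let lowest_bit x = x land (-x)
let mask k m = k land (m - 1)
let match_prefix k p m = mask k m = p

let join p0 t0 p1 t1 =
  let m = lowest_bit (p0 lxor p1) in
  if zero_bit p0 m then branch (mask p0 m) m t0 t1
  else branch (mask p0 m) m t1 t0

let rec add k t =
  match t.node with
  | Empty -> leaf k
  | Leaf j -> if j = k then t else join k (leaf k) j t
  | Branch (p, m, l, r) ->
    if match_prefix k p m then
      if zero_bit k m then branch p m (add k l) r
      else branch p m l (add k r)
    else join k (leaf k) p t

(* Results of union, keyed by the (unordered) pair of argument ids. *)
let union_memo : (int * int, t) Hashtbl.t = Hashtbl.create 251

let rec union s t =
  if s == t then s                        (* reflexivity shortcut *)
  else match s.node, t.node with
    | Empty, _ -> t
    | _, Empty -> s
    | Leaf k, _ -> add k t
    | _, Leaf k -> add k s
    | Branch (p, m, s0, s1), Branch (q, n, t0, t1) ->
      let key = (min s.id t.id, max s.id t.id) in
      match Hashtbl.find_opt union_memo key with
      | Some u -> u                       (* memoized in the inner loop *)
      | None ->
        let u =
          if m = n && p = q then branch p m (union s0 t0) (union s1 t1)
          else if m < n && match_prefix q p m then
            if zero_bit q m then branch p m (union s0 t) s1
            else branch p m s0 (union s1 t)
          else if m > n && match_prefix p q n then
            if zero_bit p n then branch q n (union s t0) t1
            else branch q n t0 (union s t1)
          else join p s q t
        in
        Hashtbl.add union_memo key u;
        u

let of_list l = List.fold_left (fun s k -> add k s) empty l
```

With this, `of_list [1; 2; 3] == of_list [3; 1; 2]` holds for physical
equality, and a set of sets can be represented as a tree of ids, e.g.
`of_list [a.id; b.id]` for two sets `a` and `b`.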
-- Alain