From: Berke Durak <berke.durak@exalead.com>
To: "Harrison, John R" <john.r.harrison@intel.com>
Cc: Caml-list List <caml-list@inria.fr>
Subject: Re: [Caml-list] Canonical Set/Map datastructure?
Date: Fri, 07 Mar 2008 11:09:06 +0100 [thread overview]
Message-ID: <47D11442.6090409@exalead.com> (raw)
In-Reply-To: <DCC19446A892D84FBB89AE7C94F0C04E01D98FB2BE@azsmsx501.amr.corp.intel.com>
Harrison, John R a écrit :
> Hi Berke,
>
> | However, the idea of combining hash-consing and Patricia trees,
> | however elegant, does not suit my problem. Basically, you are
> | cheating by using an auxiliary data structure, the hashtable (which is
> | also O(n^2) worst-case).
>
> The code I pointed you to uses hashes, but not hash tables. As for the
> worst-case efficiency, what you say is true, but it's unlikely to
> matter in practice. And I suspect it may be unavoidable in principle,
> which is the interesting thing I learned last time this was discussed.
> See for example the following paper and its references:
Hi John,
Thanks for your code. I'm sorry I thought you maintained a separate hash table.
It's very interesting code and I'll give it a try.
But it seems to have two properties which make it unsuitable for
the particular application I had in mind:
- The theoretical worst-case performance of hash-based data structures
can be attained if the hash function has bad dispersal. As the code
relies on Hashtbl.hash, which does not hash deeply, this could
potentially be a proble, in particular if your keys have long "common
prefixes" that are not distinguished by the hash function. It might
work well in practice but I feel a little uncomfortable.
- Also, it does not preserve the natural order for keys, and that
is particularly bad, because I often use, for instance, float-indexed
maps or sets as a substitute for heaps.
The paper with the inverse cubic lower bound is *very* interesting; we don't
see plausible lower bounds often in complexity theory, especially with such
large assumptions (just bounded out-degree for the graph nodes).
So it seems there is little hope to have a drop-in replacement for Set
or Map that is compatible with the natural order of things, a.k.a.,
Pervasives.compare.
--
Berke DURAK
next prev parent reply other threads:[~2008-03-07 10:09 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-03-05 16:49 Berke Durak
2008-03-05 17:16 ` [Caml-list] " Brian Hurt
2008-03-05 17:27 ` Alain Frisch
2008-03-05 19:53 ` Jean-Christophe Filliâtre
2008-03-05 20:03 ` Jon Harrop
2008-03-05 21:56 ` Alain Frisch
2008-03-06 7:45 ` Jean-Christophe Filliâtre
2008-03-05 17:34 ` Harrison, John R
2008-03-06 9:53 ` Berke Durak
2008-03-06 17:36 ` Harrison, John R
2008-03-07 10:09 ` Berke Durak [this message]
2008-03-07 17:13 ` Harrison, John R
2008-03-07 10:19 ` Alain Frisch
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=47D11442.6090409@exalead.com \
--to=berke.durak@exalead.com \
--cc=caml-list@inria.fr \
--cc=john.r.harrison@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox