Mailing list for all users of the OCaml language and system.
From: Jacques Garrigue <garrigue@math.nagoya-u.ac.jp>
To: ober.14@osu.edu
Cc: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Re: Hash clash in polymorphic variants
Date: Tue, 15 Jan 2008 12:36:21 +0900 (JST)
Message-ID: <20080115.123621.184910428.garrigue@math.nagoya-u.ac.jp>
In-Reply-To: <200801140956.25449.ober.14@osu.edu>

From: Kuba Ober <ober.14@osu.edu>
> On Monday 14 January 2008, Stefan Monnier wrote:
> > > What I meant was simply that instead of using some fixed hash function,
> > > one could use a perfect hashing function which is optimal for its known
> > > set of inputs, and won't ever generate a collision.
> >
> > The problem is that the set of inputs is not known at compile time, only
> > at link time.
> 
> As I've said in the cited post, the perfect hash generator would have to be 
> invoked at link time, which shouldn't be a big deal.

Unfortunately, this would make marshalling between different programs
much more complicated...

Another advantage of knowing the hash function at compile time is
that you can generate efficient code for pattern matching. Since you
already know the ordering of tags, it is easy to generate a decision
tree. I have not checked the efficiency for polymorphic variants
recently, but the depth of the decision tree is logarithmic in the
number of tags involved in the pattern matching, and if you can keep
it below 3 or 4 (about 10 tags) you can actually be faster than a
jump table.
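
To make the decision-tree idea concrete, here is a minimal sketch of
dispatching on tag hashes through a balanced tree of comparisons. It is
only an illustration, not the code the compiler emits: the types and the
functions build, dispatch and depth are hypothetical names, and plain
integers stand in for the tag hashes.

```ocaml
(* Illustrative sketch only: a balanced decision tree over integer keys
   that stand in for tag hashes. All names here are hypothetical. *)
type 'a tree =
  | Fail                              (* no case matches *)
  | Leaf of int * 'a                  (* a single candidate hash *)
  | Node of int * 'a tree * 'a tree   (* h < pivot: left, otherwise right *)

(* Sort the cases by hash, then split the sorted range in halves. *)
let build (cases : (int * 'a) list) : 'a tree =
  let arr =
    Array.of_list (List.sort (fun (a, _) (b, _) -> compare a b) cases) in
  let rec go lo hi =                  (* handles the half-open range [lo, hi) *)
    if lo >= hi then Fail
    else if hi - lo = 1 then
      let (h, v) = arr.(lo) in Leaf (h, v)
    else
      let mid = (lo + hi) / 2 in
      Node (fst arr.(mid), go lo mid, go mid hi)
  in
  go 0 (Array.length arr)

(* One comparison per level, plus a final equality test at the leaf. *)
let rec dispatch (t : 'a tree) (h : int) : 'a option =
  match t with
  | Fail -> None
  | Leaf (h', v) -> if h = h' then Some v else None
  | Node (pivot, l, r) -> if h < pivot then dispatch l h else dispatch r h

(* Number of comparison levels above the leaves. *)
let rec depth : 'a tree -> int = function
  | Fail | Leaf _ -> 0
  | Node (_, l, r) -> 1 + max (depth l) (depth r)
```

With 10 cases this construction gives 4 levels of comparisons, which
matches the "3 or 4 (about 10 tags)" figure above.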

Another comparison is with the old implementation of method calls.
Originally OCaml used your idea for methods: method hashes were
generated at initialization time. The dispatch scheme was a two-level
array, compressed by reusing buckets so that it did not use too
much memory. This actually meant 3 array accesses for a method call.
The current scheme reuses variant hashes, and implements a simple
dichotomic (binary) search, together with an index cache for each
call site. This doesn't look very efficient, but on small method
tables, the search is almost as fast as the old approach, and if the
cache hits this is much faster...
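
The search-plus-cache scheme can be sketched as follows. This is a
simplified model under my own assumptions, not the actual runtime code:
the method table is an array sorted by hash, each call site remembers
the last index it found, and the names meth_table, call_site,
find_index and send are hypothetical.

```ocaml
(* Illustrative sketch only: method lookup by binary search over a table
   sorted by method hash, with a one-entry index cache per call site. *)
type meth_table = (int * (unit -> string)) array   (* sorted by hash *)

type call_site = { mutable cached : int }          (* -1 means "empty" *)

(* Dichotomic search for hash [h]; raises Not_found if it is absent. *)
let find_index (table : meth_table) (h : int) : int =
  let rec go lo hi =                  (* searches the range [lo, hi) *)
    if lo >= hi then raise Not_found
    else
      let mid = (lo + hi) / 2 in
      let (k, _) = table.(mid) in
      if k = h then mid
      else if k < h then go (mid + 1) hi
      else go lo mid
  in
  go 0 (Array.length table)

(* Try the cached index first; fall back to the search on a miss. *)
let send (site : call_site) (table : meth_table) (h : int) : string =
  let i = site.cached in
  if i >= 0 && i < Array.length table && fst table.(i) = h then
    let (_, m) = table.(i) in m ()    (* cache hit: one array access *)
  else begin
    let i = find_index table h in     (* cache miss: logarithmic search *)
    site.cached <- i;
    let (_, m) = table.(i) in m ()
  end
```

The cache check compares the hash stored at the cached index, so a call
site that sees objects with different table layouts still gets the right
method; it just pays for a new search.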

Now, concerning the risk of name conflicts: the main point of
polymorphic variants is that there is only a conflict if the two tags
appear in the same type. And logically each type should stay small.
If you want to put all GLenums inside the same type, then you may
well end up with conflicts. But what LablGL shows is that in practice
only a small number of tags are used together. So if you can partition
your set of tags so that each type has at most 64 tags, then you get
a conflict probability of less than 1 per million for each type. This
seems safe enough. But if you have one type with 2000 tags, then the
probability is about 1 per thousand. Not that much, but it can happen.
(p(n) is approximately n*n / 2**32)
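
The two figures can be checked with a few lines of OCaml. The function
p is the formula above; exact is an added comparison that computes the
birthday probability directly, assuming tag hashes are uniformly
distributed over 2**31 values (my assumption for this sketch; it is the
assumption under which n*n / 2**32 is the usual approximation).

```ocaml
(* Approximation from the message: p(n) = n*n / 2**32. *)
let p (n : int) : float =
  float_of_int (n * n) /. 4294967296.0

(* Birthday probability that at least two of n tags share a hash,
   assuming hashes are uniform over 2**31 values (an assumption). *)
let exact (n : int) : float =
  let m = 2147483648.0 in
  let no_clash = ref 1.0 in
  for i = 1 to n - 1 do
    no_clash := !no_clash *. (1.0 -. float_of_int i /. m)
  done;
  1.0 -. !no_clash

let () =
  List.iter
    (fun n -> Printf.printf "n = %4d  p = %.3e  exact = %.3e\n" n (p n) (exact n))
    [64; 2000]
```

For n = 64 the approximation is 2**-20, just under 1 per million, and
for n = 2000 it is a little under 1 per thousand.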

Jacques Garrigue

