From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ober.14@osu.edu>
Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104])
	by yquem.inria.fr (Postfix) with ESMTP id 33659BC69
	for <caml-list@yquem.inria.fr>; Mon, 14 Jan 2008 16:44:08 +0100 (CET)
Received: from ip67-91-18-194.z18-91-67.customer.algx.net (HELO server1.bertec.net) ([67.91.18.194])
  by mail3-smtp-sop.national.inria.fr with ESMTP; 14 Jan 2008 16:44:07 +0100
Received: from kuba.bertec.net (kuba.bertec.net [192.168.2.16])
	by server1.bertec.net (Postfix) with ESMTP id 9F8F3105830
	for <caml-list@yquem.inria.fr>; Mon, 14 Jan 2008 10:44:06 -0500 (EST)
From: Kuba Ober <ober.14@osu.edu>
To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Re: Hash clash in polymorphic variants
Date: Mon, 14 Jan 2008 10:44:05 -0500
User-Agent: KMail/1.9.6 (enterprise 0.20071123.740460)
References: <200801101709.14110.jon@ffconsultancy.com> <200801140956.25449.ober.14@osu.edu> <001401c856c3$6c692260$6c7ba8c0@countertenor>
In-Reply-To: <001401c856c3$6c692260$6c7ba8c0@countertenor>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200801141044.05670.ober.14@osu.edu>

On Monday 14 January 2008, David Allsopp wrote:
> Kuba Ober wrote:
> > On Monday 14 January 2008, Stefan Monnier wrote:
> > > > What I meant was simply that instead of using some fixed hash
> > > > function, one could use a perfect hashing function which is optimal
> > > > for its known set of inputs, and won't ever generate a collision.
> > >
> > > The problem is that the set of inputs is not known at compile time, only
> > > at link time.
> >
> > As I've said in the cited post, the perfect hash generator would have to
> > be invoked at link time, which shouldn't be a big deal.
>
> Assuming you're talking hypothetically and designing a new runtime then,
> yes, it's not a big deal.
>
> However, this scheme could not just be dropped into the present system - it
> would not work with dynamic linking because once you've hashed a
> polymorphic variant tag-name you drop the name so you can't re-hash when
> you update your perfect hashing function...

A trivial solution to that is to keep both the name and the hash, since
every time an equivalent of dlopen() is called, everything has to be
rehashed anyway. gperf is "slightly" memory-hungry, so it would surely need
to be something using a different algorithm. I'm talking hypothetically, but
I also think it's a weird design decision to use hashes that can collide.
String sorting/comparison isn't exactly a CPU killer, so couldn't the
original names have been used instead? I admit to not knowing too many
details of the current implementation, of course ;(
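
To make the "keep both" idea concrete, here is a minimal OCaml sketch. It is
hypothetical throughout: hash_with is a toy multiplicative hash, not the hash
the OCaml compiler actually uses for tags, the brute-force multiplier search
is not gperf's algorithm, and build, load and tag are made-up names. The only
point is that as long as the names are kept around, the table can be
recomputed whenever a dlopen()-like load brings in new tags.

```ocaml
(* Hypothetical sketch: tag names are kept next to their hashes, so the
   hash parameter can be re-chosen whenever new tags are loaded. *)

(* Toy multiplicative string hash, reduced to 30 bits so the mask also
   fits on 32-bit OCaml. This is NOT the compiler's real tag hash. *)
let hash_with mult s =
  let acc = ref 0 in
  String.iter
    (fun c -> acc := (!acc * mult + Char.code c) land 0x3FFFFFFF)
    s;
  !acc

(* True if no two names in [names] get the same hash under [mult]. *)
let collision_free mult names =
  let seen = Hashtbl.create 16 in
  List.for_all
    (fun n ->
      let h = hash_with mult n in
      if Hashtbl.mem seen h then false
      else begin
        Hashtbl.add seen h ();
        true
      end)
    names

(* Try odd multipliers 31, 33, 35, ... until one separates all names.
   The search is bounded so that it always terminates. *)
let find_mult names =
  let rec go m =
    if m > 1_000_001 then failwith "no collision-free multiplier found"
    else if collision_free m names then m
    else go (m + 2)
  in
  go 31

(* The table keeps the names, not just their hashes. *)
type table = { mult : int; names : string list }

let build names =
  let names = List.sort_uniq String.compare names in
  { mult = find_mult names; names }

(* Stand-in for the link-time step after a dynamic load: merge the new
   tag names with the known ones and pick a fresh multiplier. *)
let load tbl new_names = build (tbl.names @ new_names)

(* Hash of a tag name under the table's current multiplier. *)
let tag tbl name = hash_with tbl.mult name
```

In this toy setup "Aa" and "BB" collide under multiplier 31 (both hash to
2112), so build ["Aa"; "BB"] moves on to 33. Calling load on a table that
only knew "Aa" with the new name "BB" triggers the same re-selection, which
is only possible because the names were not thrown away.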

Cheers, Kuba