From: Kuba Ober
To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Re: Hash clash in polymorphic variants
Date: Mon, 14 Jan 2008 10:44:05 -0500
Message-Id: <200801141044.05670.ober.14@osu.edu>
In-Reply-To: <001401c856c3$6c692260$6c7ba8c0@countertenor>
References: <200801101709.14110.jon@ffconsultancy.com> <200801140956.25449.ober.14@osu.edu> <001401c856c3$6c692260$6c7ba8c0@countertenor>

On Monday 14 January 2008, David Allsopp wrote:
> Kuba Ober wrote:
> > On Monday 14 January 2008, Stefan Monnier wrote:
> > > > What I meant was simply that instead of using some fixed hash
> > > > function, one could use a perfect hashing function which is optimal
> > > > for its known set of inputs, and won't ever generate a collision.
> > >
> > > The problem is that the set of inputs is not known at compile time, only
> > > at link time.
> >
> > As I've said in the cited post, the perfect hash generator would have to
> > be invoked at link time, which shouldn't be a big deal.
>
> Assuming you're talking hypothetically and designing a new runtime then,
> yes, it's not a big deal.
>
> However, this scheme could not just be dropped into the present system - it
> would not work with dynamic linking because once you've hashed a
> polymorphic variant tag-name you drop the name so you can't re-hash when
> you update your perfect hashing function...

A trivial solution to that is to keep both the name and the hash: each time
an equivalent of dlopen() is called, everything has to be rehashed anyway, so
the names have to stay available. gperf is "slightly" memory-hungry, so the
generator would surely need to use a different algorithm.

I'm talking hypothetically, but I also think it's a weird design decision to
use those possibly-colliding hashes. String sorting/comparison isn't exactly
a CPU killer, so couldn't the original names have been used instead? I admit
I don't know many details of the current implementation, of course ;(

Cheers, Kuba
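
The "keep both" idea above can be sketched roughly as follows. This is only an illustration in Python, not how the OCaml runtime or linker works: the names tag_hash, TagTable and load are hypothetical, the default multiplier is arbitrary, and the odd-multiplier search is a naive stand-in for a real perfect-hash generator. The one point it tries to show is that, as long as the tag names are retained, the whole table can be rehashed without collisions whenever new names arrive.

```python
def tag_hash(name, multiplier):
    """Multiplicative string hash, reduced to 31 bits (illustrative only)."""
    acc = 0
    for byte in name.encode("utf-8"):
        acc = (multiplier * acc + byte) & 0x7FFFFFFF
    return acc


class TagTable:
    """Hypothetical tag table that keeps every name next to its hash."""

    def __init__(self, multiplier=223):  # 223 is an arbitrary default here
        self.names = set()               # original tag names are never dropped
        self.multiplier = multiplier     # current hash parameter
        self.by_hash = {}                # hash -> name, rebuilt on every load

    def load(self, new_names):
        """Stand-in for a dlopen()-like event: add names, then rehash all."""
        self.names.update(new_names)
        self._rehash()

    def _rehash(self):
        m = self.multiplier
        while True:
            table = {}
            for name in sorted(self.names):
                h = tag_hash(name, m)
                if h in table:   # collision: this multiplier is unusable
                    break
                table[h] = name
            else:                # no break: every name got a distinct hash
                self.multiplier, self.by_hash = m, table
                return
            m += 2               # naive search; a real generator would be smarter


# Multiplier 1 turns the hash into a plain byte sum, which forces a collision.
table = TagTable(multiplier=1)
table.load(["ab"])
table.load(["ba"])       # "ab" and "ba" collide under multiplier 1
print(table.multiplier)  # 3: the table was rehashed with the next odd multiplier
```

The cost is the one raised in the thread: every load rehashes every known name, and the names stay in memory. Comparing the retained names directly, as suggested above, would avoid the search altogether.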