From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.1.3 Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by yquem.inria.fr (Postfix) with ESMTP id B15CDBC6B for ; Thu, 10 Jan 2008 21:35:36 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AgAAAMoPhkeAArkpjGdsb2JhbACBVo5JAQEBCAQGCQYamRE X-IronPort-AV: E=Sophos;i="4.24,267,1196636400"; d="scan'208";a="7656018" Received: from chokecherry.srv.cs.cmu.edu ([128.2.185.41]) by mail3-smtp-sop.national.inria.fr with ESMTP; 10 Jan 2008 21:35:34 +0100 Received: from stratocaster.home (c-71-206-252-35.hsd1.pa.comcast.net [71.206.252.35]) (authenticated bits=0) by chokecherry.srv.cs.cmu.edu (8.13.6/8.13.6) with ESMTP id m0AKZYwC006939 (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NO) for ; Thu, 10 Jan 2008 15:35:34 -0500 (EST) Received: from ecc by stratocaster.home with local (Exim 4.68) (envelope-from ) id 1JD47i-0005Bx-4S for caml-list@yquem.inria.fr; Thu, 10 Jan 2008 15:35:34 -0500 Date: Thu, 10 Jan 2008 15:35:34 -0500 From: Eric Cooper To: caml-list@yquem.inria.fr Subject: Re: [Caml-list] Hash clash in polymorphic variants Message-ID: <20080110203534.GB12491@stratocaster.home> Mail-Followup-To: caml-list@yquem.inria.fr References: <200801101709.14110.jon@ffconsultancy.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200801101709.14110.jon@ffconsultancy.com> User-Agent: Mutt/1.5.17 (2007-11-01) X-Spam: no; 0.00; hash:01 variants:01 constructors:01 avoided:01 clashing:01 hash:01 variants:01 bignum:01 ocaml:01 collides:01 10,:98 polymorphic:01 polymorphic:01 wrote:01 wrote:01 On Thu, Jan 10, 2008 at 05:09:13PM +0000, Jon Harrop wrote: > ISTR advice that constructors sharing the first few characters should be > avoided in order to reduce the likelihood of clashing hash values for > polymorphic variants. Is that right? I don't think it's worth worrying about. I wrote a program a while ago to look into this. I never saw any "human-sensible" collisions (between two identifiers that a person might have chosen). And if you're producing gensyms in a program, you can just check ahead of time. To find a collision with a given identifier, consider each bignum N that differs by a multiple of 2^31 from the identifier's hash value. Compute the radix-223 representation of N. If that forms a legal OCaml identifier, then you've found a collision. For example, Eric_Cooper collides with azdwbie, c7diagq, hlChrkt, NSaServ, and SaupDOF, to pick just a few. -- Eric (call me SaupDOF) Cooper e c c @ c m u . e d u