From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by yquem.inria.fr (Postfix) with ESMTP id 13C4BBC32 for ; Sat, 19 Mar 2005 03:28:40 +0100 (CET) Received: from kurims.kurims.kyoto-u.ac.jp (kurims.kurims.kyoto-u.ac.jp [130.54.16.1]) by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id j2J2Sb7g022116 for ; Sat, 19 Mar 2005 03:28:39 +0100 Received: from localhost (suiren [130.54.16.25]) by kurims.kurims.kyoto-u.ac.jp (8.13.1/8.13.1) with ESMTP id j2J2SWBA020884; Sat, 19 Mar 2005 11:28:32 +0900 (JST) Date: Sat, 19 Mar 2005 11:28:23 +0900 (JST) Message-Id: <20050319.112823.108781186.garrigue@math.nagoya-u.ac.jp> To: jon@ffconsultancy.com Cc: caml-list@yquem.inria.fr Subject: Re: [Caml-list] Maximum non-constant constructors From: Jacques Garrigue In-Reply-To: <200503171755.25927.jon@ffconsultancy.com> References: <16953.31187.263581.694769@gargle.gargle.HOWL> <20050317140034.GA7049@localhost> <200503171755.25927.jon@ffconsultancy.com> X-Mailer: Mew version 4.2 on Emacs 21.3 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Miltered: at nez-perce with ID 423B8E55.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; caml-list:01 constructors:01 variants:01 justifies:01 hashes:01 variants:01 hash:01 char:01 runtime:01 interfacing:01 usr:01 conflicts:01 conflicts:01 constructors:01 undetected:01 X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on yquem.inria.fr X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.2 X-Spam-Level: From: Jon Harrop > On Thursday 17 March 2005 14:00, Eric Cooper wrote: > > For example, for all strings XXX, the variants `XXX and `zyctRecXXX > > collide. > > My guess is that none of them are likely to be chosen by > > humans, but might occur in program-generated code. > > Good point! I think this justifies the existence of a library function to > compare the hashes of string names of polymorphic variants. Does such a > function exist? Here it is: let hash_variant s = let accu = ref 0 in for i = 0 to String.length s - 1 do accu := 223 * !accu + Char.code s.[i] done; (* reduce to 31 bits *) accu := !accu land (1 lsl 31 - 1); (* make it signed for 64 bits architectures *) if !accu > 0x3FFFFFFF then !accu - (1 lsl 31) else !accu It is also available in the C runtime (where it is useful for interfacing) but if you really need it in your program just copy the above code. To give you an idea of the probability of conflict, here are the conclicts I found in a 235882 word dictionary (/usr/share/dict/web2 on FreeBSD) Tag `Cretacic conflicts with `cornigerous Tag `gedackt conflicts with `aquicolous Tag `myeloencephalitis conflicts with `adequative Tag `Nemorensian conflicts with `condor Tag `nonrioter conflicts with `anematosis Tag `omphaloma conflicts with `crimelessness Tag `persecute conflicts with `paraenetic Tag `soredioid conflicts with `genitorial Tag `subpopulation conflicts with `oxyquinone Tag `sympathy conflicts with `herbman Tag `trophaeum conflicts with `prepossession Tag `unbreaded conflicts with `neback Tag `undragooned conflicts with `nuisance Tag `unlistened conflicts with `disturb Tag `veratrine conflicts with `absolutely You will get different conflict if you change the capitalization, but not more. So much for the birthday paradox: variants fully use 31 bits, so that the probability only gets more than 0.5 when you have more than 32000 constructors in the same type... Of course this is assuming a perfect distribution, but as the above dictionary shows, reality is close to that. By the way, the check is not at link time, as was stated in another message, but at compile time. It is the typing of variants itself that guarantees that no such conflict will go undetected. So you will be warned early enough. The same hash function is also used for method names (with the same compile-time check), but fortunately it is hard to imagine an object with more than 10000 methods. Jacques Garrigue