From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id pB68qkTs020221 for ; Tue, 6 Dec 2011 09:52:46 +0100 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApoBAFTX3U7ZSMDzimdsb2JhbABEqlsiAQEBCgkNBxIFIoFyAQEEAScTPwULCyElDwEEKCETiAcCtQqLMgSaHIxr X-IronPort-AV: E=Sophos;i="4.71,304,1320620400"; d="scan'208";a="134109819" Received: from fmmailgate05.web.de ([217.72.192.243]) by mail1-smtp-roc.national.inria.fr with ESMTP; 06 Dec 2011 09:52:35 +0100 Received: from moweb002.kundenserver.de (moweb002.kundenserver.de [172.19.20.108]) by fmmailgate05.web.de (Postfix) with ESMTP id 5407467BEFFE; Tue, 6 Dec 2011 09:52:35 +0100 (CET) Received: from frosties.localnet ([95.208.118.96]) by smtp.web.de (mrweb001) with ESMTPA (Nemesis) id 0Ls9JH-1Qmelw0EiF-013yZT; Tue, 06 Dec 2011 09:52:35 +0100 Received: from mrvn by frosties.localnet with local (Exim 4.77) (envelope-from ) id 1RXqlO-00045g-Dn; Tue, 06 Dec 2011 09:52:34 +0100 From: Goswin von Brederlow To: Damien Pous Cc: caml-list References: Date: Tue, 06 Dec 2011 09:52:34 +0100 In-Reply-To: (Damien Pous's message of "Tue, 6 Dec 2011 08:22:39 +0100") Message-ID: <874nxersj1.fsf@frosties.localnet> User-Agent: Gnus/5.110009 (No Gnus v0.9) XEmacs/21.4.22 (linux, no MULE) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Provags-ID: V02:K0:0n6gRL9rBL6XpbsFqoccQ9TM85bJS+wli3HTTXHq2Lp 7GsOF720LLRyHtefIWSFenA8DF18rDTb0bI8HWw0UiHsKNywcQ 8tF1aLelmIPDavol6GLp4qBthMQIES7j5+AKxc/lXVQpUmF3yh UjQFRrXqEAgljZxxh997VPCjis31aU9ttxlYLAFbgQv+Lwz+wj eLVoJRxEEmfM4Jpnr1PJQ== Subject: Re: [Caml-list] hashing big ints Damien Pous writes: > Dear caml-list, > > Let me re-raise an apparently old and possibly dumb question : how to > hash big ints ? > > Hashtbl.hash doesn't work (ocaml 3.12.1): > > # Hashtbl.hash (Big_int.big_int_of_int 42);; > - : int = 1 > # Hashtbl.hash (Big_int.big_int_of_int 33);; > - : int = 1 > > (Certainly because big ints hide their content in custom blocks.) > > I found several discussions about this kind of problem (comparison of > nums, etc...), which were more focused on how to let > Pervasive.compare/equal and Hashtbl.hash behave correctly in a uniform > way. > > My question is simpler : I just need to write my own hashing function, > so that I can call Hashtbl.Make. I currently use something like: > > let rec hash x = > try int_of_big_int x > with _ -> hash (shift_right_big_int x Sys.word_size) > > Some of you certainly know how to get a better behaved function. (I > don't know anything about hashing theory, except that my function is > certainly not a good one: it doesn't contain any prime number...) > > Thanks for your help, > Damien A usualy good hash function for integers is to take the integer modulo a prime number. This usualy results in an uniform distribution. So: let big_prime = (big_int_of_int prime) let hash x = mod_big_int x big_prime We also want the output range of the hash function to be as large as possible to minimize the number of collision. That means we want the prime to be as big as possible. And there we hit a little snag. For best results you want a different prime number for 32bit and 64bit systems. Lookup the closest prime number smaller than 2^32 and 2^64 and use those. Note: It is too bad there isn't a mod_big_int_with_int or even better a Big_int.hash function directly. Maybe you could write one that takes advantage of knowing the internal structure of Big_int? MfG Goswin