From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by sympa.inria.fr (Postfix) with ESMTPS id 7445E7EE6B for ; Mon, 25 Nov 2013 14:38:32 +0100 (CET) Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of goswin-v-b@web.de) identity=pra; client-ip=212.227.15.4; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="goswin-v-b@web.de"; x-sender="goswin-v-b@web.de"; x-conformance=sidf_compatible Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of goswin-v-b@web.de) identity=mailfrom; client-ip=212.227.15.4; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="goswin-v-b@web.de"; x-sender="goswin-v-b@web.de"; x-conformance=sidf_compatible Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of postmaster@mout.web.de) identity=helo; client-ip=212.227.15.4; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="goswin-v-b@web.de"; x-sender="postmaster@mout.web.de"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AjwCANlRk1LU4w8EnGdsb2JhbABZgz+3LIVOgSsWDgEBAQEBBg0JCRQogiUBAQU6PxALDgoJJQ8FKCEnh1oBFgm1CR+IXI4kYweDIIETA5gTgTGEfRKOfA X-IPAS-Result: AjwCANlRk1LU4w8EnGdsb2JhbABZgz+3LIVOgSsWDgEBAQEBBg0JCRQogiUBAQU6PxALDgoJJQ8FKCEnh1oBFgm1CR+IXI4kYweDIIETA5gTgTGEfRKOfA X-IronPort-AV: E=Sophos;i="4.93,768,1378850400"; d="scan'208";a="45239793" Received: from mout.web.de ([212.227.15.4]) by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/DHE-RSA-AES128-SHA; 25 Nov 2013 14:38:32 +0100 Received: from frosties.localnet ([37.49.32.119]) by smtp.web.de (mrweb002) with ESMTPSA (Nemesis) id 0Lakoa-1VMvEN0JxI-00kNa7 for ; Mon, 25 Nov 2013 14:38:31 +0100 Received: from mrvn by frosties.localnet with local (Exim 4.80) (envelope-from ) id 1VkwMu-0001lR-9e; Mon, 25 Nov 2013 14:38:28 +0100 Date: Mon, 25 Nov 2013 14:38:28 +0100 From: Goswin von Brederlow To: Gerd Stolpmann Cc: "Richard W.M. Jones" , caml-list@inria.fr Message-ID: <20131125133828.GD3610@frosties> References: <20131118204426.GA14731@annexia.org> <1384819720.4083.57.camel@zotac> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1384819720.4083.57.camel@zotac> User-Agent: Mutt/1.5.21 (2010-09-15) X-Provags-ID: V03:K0:zNvQ58Yf+bhy0kcQod3qS93DJ2mRRDbba6aMooNuWRNaKOG6ZAR KeRGplGk3dSH/VSF6eKeDWVauwL+dceMwGgGjjQrK6ijyxiTpS4ZR2G0KxyVsY1ls+mcENj iusQY8sQItCuaD4WHbtgtOcvm6/FEI7FFQz7x3O72CzBSrDZcy2J89NxxTpvUuMh6BNqEpQ NQTXHGcNs3ZpZsmo5P4fA== Subject: Re: [Caml-list] Hardening [Perl's] hash function further On Tue, Nov 19, 2013 at 01:08:40AM +0100, Gerd Stolpmann wrote: > Am Montag, den 18.11.2013, 20:44 +0000 schrieb Richard W.M. Jones: > > The Perl community have found further attacks against their > > (already considered hardened) hash function. This article > > explains the problems quite well: > > > > http://blog.booking.com/hardening-perls-hash-function.html > > Interesting read. I'm wondering which conclusions one should draw from > it. This article is more or less discussing cryptoattacks on an insecure > hash function, and the new versions of the functions are only "better" > but in no way safe (which you only get with crypto hashes or > dictionaries that don't allow collisions). This sounds for me like a > race between attackers and hash implementors. > > Gerd I notice that the problem isn't exploting hash collisions but collisions in the bucket being used in the hashtable. So no matter how good the hash is the table still breaks down. 1) the hashtable isn't randomized by default - the bucket being used is predictable so you can make long chains Why not just always use the randomized hash? Why only use it when you think you have detected an abuse? That is just silly. 2) on long chains the hashtable size is doubled - each bucket is split in two and assuming a good hash each will get half the entries You don't double your size. The old and new size should not have a common divisor, esspecially not 2. Best is to use primes but not fastest. Using 2^n-1 is tempting since (x mod 2^n-1) is fast to compute without needing a long division. If the old and new size have no common divisor then the entries of the overly long chain are distributed over *all* the buckets. So instead of getting 2 buckets with n/2 entries you get buckets with n/size entries. 3) hashtables aren't re-randomized on resize When you resize a hashtable you have to sort every key into a new bucket. So why not change the randomize of the hash function at that time. If the randomizer is applied after the hash is computed this can be done quickly without having to rehash every key. Takes extra memory to store the non randomized hash value of each key though. If you don't want to spend that memory you can rehash every key, which also allows using a randomized hash function instead of randomizing the result of the hash. 4) hashtable sizes are predictable You can also randomize the size. Nobody says you have to (nearly) double the size in a predictable way. You can have a long list of good sizes (e.g. prime numbers) and randomly pick one that is between 1.8 and 2.2 times (or whatever grows factor you want) the size as the old size. The randomness will make the size of the table unpredictable making it that much harder to construct a key collision. Note: if a table is resized due to an overlong chain without being reasonable full I would just resize it a tiny bit instead of about double. That gets rid of the collisions without doubling the memory used. MfG Goswin PS: is anyone working on fixing the stdlib hashtable?