From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <goswin-v-b@web.de>
Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83])
	by sympa.inria.fr (Postfix) with ESMTPS id 7445E7EE6B
	for <caml-list@sympa.inria.fr>; Mon, 25 Nov 2013 14:38:32 +0100 (CET)
Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender
  authenticity information available from domain of
  goswin-v-b@web.de) identity=pra; client-ip=212.227.15.4;
  receiver=mail2-smtp-roc.national.inria.fr;
  envelope-from="goswin-v-b@web.de";
  x-sender="goswin-v-b@web.de"; x-conformance=sidf_compatible
Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender
  authenticity information available from domain of
  goswin-v-b@web.de) identity=mailfrom; client-ip=212.227.15.4;
  receiver=mail2-smtp-roc.national.inria.fr;
  envelope-from="goswin-v-b@web.de";
  x-sender="goswin-v-b@web.de"; x-conformance=sidf_compatible
Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender
  authenticity information available from domain of
  postmaster@mout.web.de) identity=helo;
  client-ip=212.227.15.4;
  receiver=mail2-smtp-roc.national.inria.fr;
  envelope-from="goswin-v-b@web.de";
  x-sender="postmaster@mout.web.de";
  x-conformance=sidf_compatible
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AjwCANlRk1LU4w8EnGdsb2JhbABZgz+3LIVOgSsWDgEBAQEBBg0JCRQogiUBAQU6PxALDgoJJQ8FKCEnh1oBFgm1CR+IXI4kYweDIIETA5gTgTGEfRKOfA
X-IPAS-Result: AjwCANlRk1LU4w8EnGdsb2JhbABZgz+3LIVOgSsWDgEBAQEBBg0JCRQogiUBAQU6PxALDgoJJQ8FKCEnh1oBFgm1CR+IXI4kYweDIIETA5gTgTGEfRKOfA
X-IronPort-AV: E=Sophos;i="4.93,768,1378850400"; 
   d="scan'208";a="45239793"
Received: from mout.web.de ([212.227.15.4])
  by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/DHE-RSA-AES128-SHA; 25 Nov 2013 14:38:32 +0100
Received: from frosties.localnet ([37.49.32.119]) by smtp.web.de (mrweb002)
 with ESMTPSA (Nemesis) id 0Lakoa-1VMvEN0JxI-00kNa7 for <caml-list@inria.fr>;
 Mon, 25 Nov 2013 14:38:31 +0100
Received: from mrvn by frosties.localnet with local (Exim 4.80)
	(envelope-from <goswin-v-b@web.de>)
	id 1VkwMu-0001lR-9e; Mon, 25 Nov 2013 14:38:28 +0100
Date: Mon, 25 Nov 2013 14:38:28 +0100
From: Goswin von Brederlow <goswin-v-b@web.de>
To: Gerd Stolpmann <info@gerd-stolpmann.de>
Cc: "Richard W.M. Jones" <rich@annexia.org>, caml-list@inria.fr
Message-ID: <20131125133828.GD3610@frosties>
References: <20131118204426.GA14731@annexia.org>
 <1384819720.4083.57.camel@zotac>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1384819720.4083.57.camel@zotac>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Provags-ID:  V03:K0:zNvQ58Yf+bhy0kcQod3qS93DJ2mRRDbba6aMooNuWRNaKOG6ZAR
 KeRGplGk3dSH/VSF6eKeDWVauwL+dceMwGgGjjQrK6ijyxiTpS4ZR2G0KxyVsY1ls+mcENj
 iusQY8sQItCuaD4WHbtgtOcvm6/FEI7FFQz7x3O72CzBSrDZcy2J89NxxTpvUuMh6BNqEpQ
 NQTXHGcNs3ZpZsmo5P4fA==
Subject: Re: [Caml-list] Hardening [Perl's] hash function further

On Tue, Nov 19, 2013 at 01:08:40AM +0100, Gerd Stolpmann wrote:
> Am Montag, den 18.11.2013, 20:44 +0000 schrieb Richard W.M. Jones:
> > The Perl community have found further attacks against their
> > (already considered hardened) hash function.  This article
> > explains the problems quite well:
> > 
> > http://blog.booking.com/hardening-perls-hash-function.html
> 
> Interesting read. I'm wondering which conclusions one should draw from
> it. This article is more or less discussing cryptoattacks on an insecure
> hash function, and the new versions of the functions are only "better"
> but in no way safe (which you only get with crypto hashes or
> dictionaries that don't allow collisions). This sounds for me like a
> race between attackers and hash implementors.
> 
> Gerd

I notice that the problem isn't exploting hash collisions but
collisions in the bucket being used in the hashtable. So no matter how
good the hash is the table still breaks down.

1) the hashtable isn't randomized by default
   - the bucket being used is predictable so you can make long chains

Why not just always use the randomized hash? Why only use it when you
think you have detected an abuse? That is just silly.

2) on long chains the hashtable size is doubled
   - each bucket is split in two and assuming a good hash each will
     get half the entries

You don't double your size. The old and new size should not have a
common divisor, esspecially not 2. Best is to use primes but not
fastest. Using 2^n-1 is tempting since (x mod 2^n-1) is fast to
compute without needing a long division.

If the old and new size have no common divisor then the entries of the
overly long chain are distributed over *all* the buckets. So instead
of getting 2 buckets with n/2 entries you get buckets with n/size
entries.

3) hashtables aren't re-randomized on resize

When you resize a hashtable you have to sort every key into a new
bucket. So why not change the randomize of the hash function at that
time. If the randomizer is applied after the hash is computed this can
be done quickly without having to rehash every key. Takes extra memory
to store the non randomized hash value of each key though. If you
don't want to spend that memory you can rehash every key, which also
allows using a randomized hash function instead of randomizing the
result of the hash.

4) hashtable sizes are predictable

You can also randomize the size. Nobody says you have to (nearly)
double the size in a predictable way. You can have a long list of good
sizes (e.g. prime numbers) and randomly pick one that is between 1.8
and 2.2 times (or whatever grows factor you want) the size as the old
size. The randomness will make the size of the table unpredictable
making it that much harder to construct a key collision.

Note: if a table is resized due to an overlong chain without being
reasonable full I would just resize it a tiny bit instead of about
double. That gets rid of the collisions without doubling the memory
used.

MfG
	Goswin

PS: is anyone working on fixing the stdlib hashtable?