From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by sympa.inria.fr (Postfix) with ESMTPS id 1F5907EEAF for ; Fri, 18 Jan 2013 18:46:51 +0100 (CET) Received-SPF: None (mail4-smtp-sop.national.inria.fr: no sender authenticity information available from domain of jeannin@cs.cornell.edu) identity=pra; client-ip=128.84.103.138; receiver=mail4-smtp-sop.national.inria.fr; envelope-from="jeannin@cs.cornell.edu"; x-sender="jeannin@cs.cornell.edu"; x-conformance=sidf_compatible Received-SPF: None (mail4-smtp-sop.national.inria.fr: no sender authenticity information available from domain of jeannin@cs.cornell.edu) identity=mailfrom; client-ip=128.84.103.138; receiver=mail4-smtp-sop.national.inria.fr; envelope-from="jeannin@cs.cornell.edu"; x-sender="jeannin@cs.cornell.edu"; x-conformance=sidf_compatible Received-SPF: None (mail4-smtp-sop.national.inria.fr: no sender authenticity information available from domain of postmaster@exch-hub1.cs.cornell.edu) identity=helo; client-ip=128.84.103.138; receiver=mail4-smtp-sop.national.inria.fr; envelope-from="jeannin@cs.cornell.edu"; x-sender="postmaster@exch-hub1.cs.cornell.edu"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ag4CAPeJ+VCAVGeKiWdsb2JhbABEgkipBpMADgEBARUSFAUiD4IPAQEBBEklCgEQCxgJBBIBBwcJAwIBAgEPJAERBg0BBQIBAQ6HdQMPDK89hEkDCkyHfowJgRWEGwOILDWLVYJyhE+FTIglgUo X-IronPort-AV: E=Sophos;i="4.84,494,1355094000"; d="scan'208,217";a="168955476" Received: from mail-hub-1.cs.cornell.edu (HELO exch-hub1.cs.cornell.edu) ([128.84.103.138]) by mail4-smtp-sop.national.inria.fr with ESMTP/TLS/RC4-MD5; 18 Jan 2013 18:46:49 +0100 Received: from [128.84.98.62] (128.84.98.62) by mail.cs.cornell.edu (128.84.103.140) with Microsoft SMTP Server (TLS) id 8.3.279.5; Fri, 18 Jan 2013 12:46:48 -0500 Message-ID: <50F98A81.6080902@cs.cornell.edu> Date: Fri, 18 Jan 2013 12:46:41 -0500 From: Jean-Baptiste Jeannin User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: Nicholas Lucaroni CC: Edgar Friendly , "caml-list@inria.fr" References: <50F81A48.4080901@cs.cornell.edu> In-Reply-To: Content-Type: multipart/alternative; boundary="------------050308010900000404020105" Subject: Re: [Caml-list] Hash function: complexity and circular structures --------------050308010900000404020105 Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit Thank you for your answers and suggestions. I upgraded to ocaml 4.00, and the update to the hash function is great and appreciated. I haven't studied closely hash.c yet, but I am very curious about it and will look at it, thanks for pointing it. However, the hope was to be able to use a hash table with (potentially) circular lists as keys. And, interestingly, this does not quite work, even with the update of the hashing function: let t1 = Hashtbl.create 10;; let rec ones = 1 :: ones;; let rec twos = 2 :: twos;; let ones2 = 1 :: 1 :: ones;; let rec ones3 = 1 :: 1 :: ones3;; (* ones, ones2 and ones3 are observationally equivalent, and their hash is the same: *) Hashtbl.hash ones;; - : int = 312056965 Hashtbl.hash ones2;; - : int = 312056965 Hashtbl.hash ones3;; - : int = 312056965 Hashtbl.add t1 ones 1;; Hashtbl.add t1 twos 2;; Hashtbl.find t1 ones;; - : int = 1 (* as expected *) Hashtbl.add t1 ones2 11;; Hashtbl.find t1 ones;; - : int = 11 (* as expected *) Hashtbl.find t1 ones2;; - : int = 11 (* as expected *) (* However this does not work and runs forever *) Hashtbl.find t1 ones3;; Looking a little bit further into the implementation of Hashtbl.find (available at https://github.com/ocaml/ocaml/blob/trunk/stdlib/hashtbl.ml), it appears that inside each bucket, the keys are compared using Pervasives.compare. Pervasives.compare halts on ones and ones2 (probably because they have sublists that are physically equal, though I haven't checked the implementation of compare), but it does not halt on ones and ones3, or on ones2 and ones3: compare ones ones2;; - : int = 0 compare ones ones3;; (* runs forever *) compare ones2 ones3;; (* runs forever *) I would be curious to know if this is by design (it is supposed not to work), or if it is a problem with the implementation of compare, or of Hashtbl.find. In particular, if it is by design, why have updated the hash function to support circular lists? I am also now stuck on creating an (efficient) hashtable supporting circular data structures as keys. Any idea on this? Thank you, Jean-Baptiste Jeannin On 01/17/2013 12:05 PM, Nicholas Lucaroni wrote: > You should update to OCaml 4.xx I don't think the hash function works > at all on lists in 3.12.1 (in the sense that it, as you've found, it > always returns 0). > > in ocaml 4.00+rc1 > # Hashtbl.hash (6 :: 7 :: 8 :: 9 :: 10 :: 11 :: 12 :: 13:: 14 :: 15 :: > []);; > - : int = 418135511 > > in ocaml 3.12.1 > # Hashtbl.hash (6 :: 7 :: 8 :: 9 :: 10 :: 11 :: 12 :: 13:: 14 :: 15 :: > []);; > - : int = 0 > > > > On Thu, Jan 17, 2013 at 11:41 AM, Edgar Friendly > wrote: > > The source is quite easy to read, and is available here: > > https://github.com/ocaml/ocaml/blob/trunk/byterun/hash.c > > It turns out that even seeded MurmurHash3 does not prevent > algorithmic attacks as expected, as demonstrated and explained at > 29c3: > http://events.ccc.de/congress/2012/Fahrplan/events/5152.en.html > and www.youtube.com/watch?v=Vdrab3sB7MU > . > > E. > > > On Thu, Jan 17, 2013 at 10:46 AM, Nicholas Lucaroni > > wrote: > > https://sympa.inria.fr/sympa/arc/caml-list/2011-07/msg00117.html > > That thread talks about the previous and a better alternative > (which is in 4.x). > > Xavier had said, > /The SVN trunk for OCaml contains a complete reimplementation of the/ > /generic hash function that addresses both issues: lists and other > complex keys are traversed breadth-first in a more cautious manner > than before, and the mixing functions are based on MurmurHash 3, which > exhibits very good statistical properties. All this should go in the > next major release 3.13. So, everyone with an interest in efficient > hash tables is welcome to try the trunk sources and let me know of any > problems encountered./ > > On Thu, Jan 17, 2013 at 10:35 AM, Jean-Baptiste Jeannin > > wrote: > > Hello, > > The default OCaml function (Hashtbl.hash) says that it > "always terminates, even on cyclic structures.". I would > be curious to know what its complexity is, both on a > finite list and on a cyclic list (created by let rec l = 1 > :: l for example). Is the algorithm that is being used > published anywhere? > > Also, this hashing function seems to be returning 0 on any > cyclic list (at least the ones I tried). Is this normal? > Does anyone know any better hashing function on cyclic lists? > > # Hashtbl.hash [ 1 ; 2 ];; > - : int = 131199 > # let rec ones = 1 :: ones;; > val ones : int list = > [1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; > 1; 1; 1; 1; 1; 1; 1; > 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; > 1; 1; 1; 1; 1; 1; 1; > 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; > 1; 1; 1; 1; 1; 1; 1; > 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; > 1; 1; 1; 1; 1; 1; 1; > 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; > 1; 1; 1; 1; 1; 1; 1; > 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; > 1; 1; 1; 1; 1; 1; 1; > 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; > 1; 1; 1; 1; 1; 1; 1; > 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; > 1; 1; 1; 1; 1; 1; 1; > 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; > 1; 1; 1; 1; 1; 1; 1; > 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; > 1; 1; 1; 1; 1; 1; 1; > 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; > 1; 1; 1; 1; 1; 1; 1; > 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; > 1; 1; 1; 1; 1; 1; > ...] > # Hashtbl.hash ones;; > - : int = 0 > # Hashtbl.hash (5 :: 4 :: ones);; > - : int = 0 > > Thank you, > Jean-Baptiste Jeannin > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa.inria.fr/sympa/arc/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > > > > --------------050308010900000404020105 Content-Type: text/html; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Thank you for your answers and suggestions. I upgraded to ocaml 4.00, and the update to the hash function is great and appreciated. I haven't studied closely hash.c yet, but I am very curious about it and will look at it, thanks for pointing it.

However, the hope was to be able to use a hash table with (potentially) circular lists as keys. And, interestingly, this does not quite work, even with the update of the hashing function:

let t1 = Hashtbl.create 10;;
let rec ones = 1 :: ones;;
let rec twos = 2 :: twos;;
let ones2 = 1 :: 1 :: ones;;
let rec ones3 = 1 :: 1 :: ones3;;
(* ones, ones2 and ones3 are observationally equivalent, and their hash is the same: *)
Hashtbl.hash ones;;
- : int = 312056965
Hashtbl.hash ones2;;
- : int = 312056965
Hashtbl.hash ones3;;
- : int = 312056965

Hashtbl.add t1 ones 1;;
Hashtbl.add t1 twos 2;;
Hashtbl.find t1 ones;;
- : int = 1 (* as expected *)

Hashtbl.add t1 ones2 11;;
Hashtbl.find t1 ones;;
- : int = 11 (* as expected *)
Hashtbl.find t1 ones2;;
- : int = 11 (* as expected *)

(* However this does not work and runs forever *)
Hashtbl.find t1 ones3;;

Looking a little bit further into the implementation of Hashtbl.find (available at https://github.com/ocaml/ocaml/blob/trunk/stdlib/hashtbl.ml), it appears that inside each bucket, the keys are compared using Pervasives.compare. Pervasives.compare halts on ones and ones2 (probably because they have sublists that are physically equal, though I haven't checked the implementation of compare), but it does not halt on ones and ones3, or on ones2 and ones3:

compare ones ones2;;
- : int = 0
compare ones ones3;; (* runs forever *)
compare ones2 ones3;; (* runs forever *)

I would be curious to know if this is by design (it is supposed not to work), or if it is a problem with the implementation of compare, or of Hashtbl.find. In particular, if it is by design, why have updated the hash function to support circular lists?
I am also now stuck on creating an (efficient) hashtable supporting circular data structures as keys. Any idea on this?

Thank you,
Jean-Baptiste Jeannin

On 01/17/2013 12:05 PM, Nicholas Lucaroni wrote:
You should update to OCaml 4.xx I don't think the hash function works at all on lists in 3.12.1 (in the sense that it, as you've found, it always returns 0).

in ocaml 4.00+rc1
# Hashtbl.hash (6 :: 7 :: 8 :: 9 :: 10 :: 11 :: 12 :: 13:: 14 :: 15 :: []);;
- : int = 418135511

in ocaml 3.12.1
# Hashtbl.hash (6 :: 7 :: 8 :: 9 :: 10 :: 11 :: 12 :: 13:: 14 :: 15 :: []);;
- : int = 0



On Thu, Jan 17, 2013 at 11:41 AM, Edgar Friendly <thelema314@gmail.com> wrote:
The source is quite easy to read, and is available here:

https://github.com/ocaml/ocaml/blob/trunk/byterun/hash.c

It turns out that even seeded MurmurHash3 does not prevent algorithmic attacks as expected, as demonstrated and explained at 29c3: http://events.ccc.de/congress/2012/Fahrplan/events/5152.en.html and www.youtube.com/watch?v=Vdrab3sB7MU.

E.


On Thu, Jan 17, 2013 at 10:46 AM, Nicholas Lucaroni <nicholas.r.lucaroni@gmail.com> wrote:
https://sympa.inria.fr/sympa/arc/caml-list/2011-07/msg00117.html

That thread talks about the previous and a better alternative (which is in 4.x). 

Xavier had said,
The SVN trunk for OCaml contains a complete reimplementation of the
generic hash function that addresses both issues: lists and other
complex keys are traversed breadth-first in a more cautious manner
than before, and the mixing functions are based on MurmurHash 3, which
exhibits very good statistical properties.  All this should go in the
next major release 3.13.  So, everyone with an interest in efficient
hash tables is welcome to try the trunk sources and let me know of any
problems encountered.

On Thu, Jan 17, 2013 at 10:35 AM, Jean-Baptiste Jeannin <jeannin@cs.cornell.edu> wrote:
Hello,

The default OCaml function (Hashtbl.hash) says that it "always terminates, even on cyclic structures.". I would be curious to know what its complexity is, both on a finite list and on a cyclic list (created by let rec l = 1 :: l for example). Is the algorithm that is being used published anywhere?

Also, this hashing function seems to be returning 0 on any cyclic list (at least the ones I tried). Is this normal? Does anyone know any better hashing function on cyclic lists?

# Hashtbl.hash [ 1 ; 2 ];;
- : int = 131199
# let rec ones = 1 :: ones;;
val ones : int list =
  [1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1;
   1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1;
   1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1;
   1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1;
   1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1;
   1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1;
   1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1;
   1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1;
   1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1;
   1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1;
   1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1;
   1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1;
   ...]
# Hashtbl.hash ones;;
- : int = 0
# Hashtbl.hash (5 :: 4 :: ones);;
- : int = 0

Thank you,
Jean-Baptiste Jeannin

--
Caml-list mailing list.  Subscription management and archives:
https://sympa.inria.fr/sympa/arc/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs




--------------050308010900000404020105--