From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id JAA13057 for caml-red; Thu, 25 May 2000 09:01:55 +0200 (MET DST) Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id AAA20368 for ; Thu, 25 May 2000 00:04:20 +0200 (MET DST) Received: from mail3.microsoft.com (mail3.microsoft.com [131.107.3.123]) by concorde.inria.fr (8.10.0/8.10.0) with SMTP id e4OM4Jv05714 for ; Thu, 25 May 2000 00:04:19 +0200 (MET DST) Received: from 157.54.9.100 by mail3.microsoft.com (InterScan E-Mail VirusWall NT); Wed, 24 May 2000 15:04:17 -0700 (Pacific Daylight Time) Received: by INET-IMC-03 with Internet Mail Service (5.5.2651.58) id ; Wed, 24 May 2000 15:04:17 -0700 Message-ID: <783D93998201D311B0CF00805FEAA07B7E9233@RED-MSG-42> From: Manuel Fahndrich To: caml-list@pauillac.inria.fr Subject: Alternative generic hash function Date: Wed, 24 May 2000 15:04:11 -0700 X-Mailer: Internet Mail Service (5.5.2651.58) Sender: weis The generic hash function Hashtbl.hash traverses into mutable data structures. So I get different hash values depending on updates of the data structure. # let x = ref None;; val x : '_a option ref = {contents=None} # Hashtbl.hash x;; - : int = 0 # x := Some 5;; - : unit = () # Hashtbl.hash x;; - : int = 5 As was pointed out times before on this list, Hashtbl.hash is thus unsuitable for hashing on data structures that contain mutable data. I'm wondering what the design rational is for this choice. I'd rather have a generic hash function (Hashtbl.hash_pure) that never looks into mutable structures (skips refs and mutable fields of records and classes). The reason being that if I want to hash on data that contains references, I'm more likely to want to find the structure independent of the contents of any mutable subparts. So yes, a hashtable containing int refs would pointless, because it maps them all to the same bucket. But I frequently use hashing with equality being ==. Whenever I do this, I need to actually add some unique identifier to the data structure just for the hashing operation. Having an alternate hash function (Hashtbl.hash_pure) would make such id's (and their managment) unnecessary. I see how the current working of Hashtbl.hash is useful for data structures that contain mutable parts in the case where such structures don't change between the time they are stored and later looked up in the hash table. But I'm wondering how often this actually arises. Furthermore, using Hashtbl.hash_pure in place of Hashtbl.hash would not break any old code (except for performance), since all it does is make the equivalence classes bigger. On the other hand, using Hashtbl.hash with the misconception that it acts as Hashtbl.hash_pure breaks lots of code. I rest my case. Manuel Fahndrich