From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by sympa.inria.fr (Postfix) with ESMTPS id 6A3677ED26 for ; Wed, 13 Jun 2012 10:26:06 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Av0EAP5N2E/V+6tl/2dsb2JhbABFtTqBB4IZAQU6TwtGFCghiCK6FosnFYUcYAOVII99gmKBVQ X-IronPort-AV: E=Sophos;i="4.75,762,1330902000"; d="scan'208";a="162643749" Received: from eneide.happyleptic.org ([213.251.171.101]) by mail1-smtp-roc.national.inria.fr with ESMTP/TLS/AES256-SHA; 13 Jun 2012 10:25:42 +0200 Received: from extranet.securactive.org ([82.234.213.170] helo=ccellier.rd.securactive.lan) by eneide.happyleptic.org with esmtp (Exim 4.72) (envelope-from ) id 1SejOD-0000Dg-Bg for caml-list@inria.fr; Wed, 13 Jun 2012 10:57:21 +0200 Received: from rixed by ccellier.rd.securactive.lan with local (Exim 4.77) (envelope-from ) id 1SeitT-0004LJ-Ri for caml-list@inria.fr; Wed, 13 Jun 2012 10:25:35 +0200 Date: Wed, 13 Jun 2012 10:25:35 +0200 From: rixed@happyleptic.org To: caml users Message-ID: <20120613082535.GA16617@securactive.lan> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Subject: Re: [Caml-list] What is the default hash function of string in 3.12? -[ Tue, Jun 12, 2012 at 04:23:46PM -0400, Jianzhou Zhao ]---- > So, I was wondering what hash > function of String OCaml 3.12 uses, and if I can define my own. Here is the relevant part of the code: --[ byterun/hash.c ]-- switch (tag) { case String_tag: hash_univ_count--; i = caml_string_length(obj); for (p = &Byte_u(obj, 0); i > 0; i--, p++) Combine_small(*p); break; ---- If I can read C (which sometime I can't) then it seams hashing a string requires an amount of CPU proportional to the length of the string, and that a string count as one value only for the depth limit. We could imagine for instance that each char decrease the hash_univ_count, but then you need to choose the chars of the string you consider very wisely. For instance if you choose to hash only the first of last N chars then you risk hashing many strings that ends/starts with the same substrings to the same value. If you choose one char every N you'r still at risk (for instance strings that differs only by a few chars may all end with the same hash). So all in all its not easy to _not_ process all chars. As for changing the hash function, you can use the Hashtbl.Make functor to build your own hash out of the hash function you want. But there is no way to change the default hash function for strings AFAICT.