Mailing list for all users of the OCaml language and system.
 help / color / mirror / Atom feed
From: rixed@happyleptic.org
To: caml users <caml-list@inria.fr>
Subject: Re: [Caml-list] What is the default hash function of string in 3.12?
Date: Wed, 13 Jun 2012 10:25:35 +0200	[thread overview]
Message-ID: <20120613082535.GA16617@securactive.lan> (raw)
In-Reply-To: <CALhoTJNkkGOM-sR0=O3Z7wHqKM8B9JHsjLr+jWZ8Hjejbg2iow@mail.gmail.com>

-[ Tue, Jun 12, 2012 at 04:23:46PM -0400, Jianzhou Zhao ]----
> So, I was wondering what hash
> function of String OCaml 3.12 uses, and if I can define my own.

Here is the relevant part of the code:

--[ byterun/hash.c ]--

    switch (tag) {
    case String_tag:
      hash_univ_count--;
      i = caml_string_length(obj);
      for (p = &Byte_u(obj, 0); i > 0; i--, p++)
        Combine_small(*p);
      break;

----

If I can read C (which sometime I can't) then it seams hashing a string
requires an amount of CPU proportional to the length of the string, and that a
string count as one value only for the depth limit.  We could imagine for
instance that each char decrease the hash_univ_count, but then you need to
choose the chars of the string you consider very wisely.  For instance if you
choose to hash only the first of last N chars then you risk hashing many
strings that ends/starts with the same substrings to the same value. If you
choose one char every N you'r still at risk (for instance strings that differs
only by a few chars may all end with the same hash).  So all in all its not
easy to _not_ process all chars.

As for changing the hash function, you can use the Hashtbl.Make functor to
build your own hash out of the hash function you want.  But there is no way to
change the default hash function for strings AFAICT.


      reply	other threads:[~2012-06-13  8:26 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-12 20:23 Jianzhou Zhao
2012-06-13  8:25 ` rixed [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120613082535.GA16617@securactive.lan \
    --to=rixed@happyleptic.org \
    --cc=caml-list@inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox