From: Jacques Garrigue <garrigue@math.nagoya-u.ac.jp>
To: jon@ffconsultancy.com
Cc: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Maximum non-constant constructors
Date: Sat, 19 Mar 2005 11:28:23 +0900 (JST) [thread overview]
Message-ID: <20050319.112823.108781186.garrigue@math.nagoya-u.ac.jp> (raw)
In-Reply-To: <200503171755.25927.jon@ffconsultancy.com>
From: Jon Harrop <jon@ffconsultancy.com>
> On Thursday 17 March 2005 14:00, Eric Cooper wrote:
> > For example, for all strings XXX, the variants `XXX and `zyctRecXXX
> > collide.
> > My guess is that none of them are likely to be chosen by
> > humans, but might occur in program-generated code.
>
> Good point! I think this justifies the existence of a library function to
> compare the hashes of string names of polymorphic variants. Does such a
> function exist?
Here it is:
let hash_variant s =
let accu = ref 0 in
for i = 0 to String.length s - 1 do
accu := 223 * !accu + Char.code s.[i]
done;
(* reduce to 31 bits *)
accu := !accu land (1 lsl 31 - 1);
(* make it signed for 64 bits architectures *)
if !accu > 0x3FFFFFFF then !accu - (1 lsl 31) else !accu
It is also available in the C runtime (where it is useful for
interfacing) but if you really need it in your program just copy the
above code.
To give you an idea of the probability of conflict, here are the
conclicts I found in a 235882 word dictionary (/usr/share/dict/web2 on
FreeBSD)
Tag `Cretacic conflicts with `cornigerous
Tag `gedackt conflicts with `aquicolous
Tag `myeloencephalitis conflicts with `adequative
Tag `Nemorensian conflicts with `condor
Tag `nonrioter conflicts with `anematosis
Tag `omphaloma conflicts with `crimelessness
Tag `persecute conflicts with `paraenetic
Tag `soredioid conflicts with `genitorial
Tag `subpopulation conflicts with `oxyquinone
Tag `sympathy conflicts with `herbman
Tag `trophaeum conflicts with `prepossession
Tag `unbreaded conflicts with `neback
Tag `undragooned conflicts with `nuisance
Tag `unlistened conflicts with `disturb
Tag `veratrine conflicts with `absolutely
You will get different conflict if you change the capitalization, but
not more.
So much for the birthday paradox: variants fully use 31 bits, so that
the probability only gets more than 0.5 when you have more than 32000
constructors in the same type... Of course this is assuming a perfect
distribution, but as the above dictionary shows, reality is close to
that.
By the way, the check is not at link time, as was stated in another
message, but at compile time. It is the typing of variants itself that
guarantees that no such conflict will go undetected. So you will be
warned early enough.
The same hash function is also used for method names (with the same
compile-time check), but fortunately it is hard to imagine an object
with more than 10000 methods.
Jacques Garrigue
prev parent reply other threads:[~2005-03-19 2:28 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-03-16 19:51 Tom Hawkins
2005-03-16 20:42 ` [Caml-list] " Alain Frisch
2005-03-16 23:22 ` Tom Hawkins
2005-03-17 15:23 ` Stefan Monnier
2005-03-16 23:32 ` [Caml-list] " Richard Jones
2005-03-17 12:36 ` Jean-Christophe Filliatre
2005-03-17 14:00 ` Eric Cooper
2005-03-17 17:32 ` Marcin 'Qrczak' Kowalczyk
2005-03-17 18:27 ` Richard Jones
2005-03-17 17:55 ` Jon Harrop
2005-03-19 2:28 ` Jacques Garrigue [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20050319.112823.108781186.garrigue@math.nagoya-u.ac.jp \
--to=garrigue@math.nagoya-u.ac.jp \
--cc=caml-list@yquem.inria.fr \
--cc=jon@ffconsultancy.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox