* [Caml-list] Hashing failure
[not found] <CAHR=VkzBhR5FMsWO-_BVzYV4yhSARP8rg7G3b=jezU6OOOg0jQ@mail.gmail.com>
@ 2012-12-20 20:18 ` Thomas Braibant
2012-12-20 20:32 ` [Caml-list] " Thomas Braibant
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Thomas Braibant @ 2012-12-20 20:18 UTC (permalink / raw)
To: caml-list
Hi list,
I have a piece of code that looks like this.
let hkey = H.hash d in (* a custom hash function *)
let index = hkey mod (Array.length t.table) in
let bucket = t.table.(index) in (* 1 *)
the line marked with 1 was generating an index out of bounds
exception. It appeared that it was because index was negative, a
problem I knew of when using mod ... So I added an abs around index
let hkey = H.hash d in
let index = abs (hkey mod (Array.length t.table)) in
let bucket = t.table.(index) in (* 1 *)
But it yields the same exception...
This seemed completely unbelievable, till I realized that
- x mod y may return a negative value if x is less than 0
- abs x may return a negative value if x is min_int
So I tried to reproduce this behavior with the following code:
let t = Array.create 17 0;;
let hkey = min_int in
let index = (abs hkey) mod (Array.length t) in
t.(index);;
which yields an out of bound. Bingo. It turns out that some value of
the length of t will result in an error, and some will not (e.g., 16) ... My
question is : is there a safe way to bullet-proof my code against
these pathological cases, without too much hinderance (this is some
code whose efficiency is critical...)
Thomas
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Caml-list] Re: Hashing failure
2012-12-20 20:18 ` [Caml-list] Hashing failure Thomas Braibant
@ 2012-12-20 20:32 ` Thomas Braibant
2012-12-20 21:01 ` Martin Jambon
2012-12-21 7:58 ` [Caml-list] " Jacques Garrigue
2012-12-21 10:02 ` Xavier Leroy
2 siblings, 1 reply; 6+ messages in thread
From: Thomas Braibant @ 2012-12-20 20:32 UTC (permalink / raw)
To: caml-list
The second piece of code should read
> let hkey = H.hash d in
> let index = (abs hkey) mod (Array.length t.table) in
> let bucket = t.table.(index) in (* 1 *)
Sorry for the noise.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] Re: Hashing failure
2012-12-20 20:32 ` [Caml-list] " Thomas Braibant
@ 2012-12-20 21:01 ` Martin Jambon
0 siblings, 0 replies; 6+ messages in thread
From: Martin Jambon @ 2012-12-20 21:01 UTC (permalink / raw)
To: Thomas Braibant; +Cc: caml-list
On Thu 20 Dec 2012 12:32:07 PM PST, Thomas Braibant wrote:
> The second piece of code should read
>> let hkey = H.hash d in
>> let index = (abs hkey) mod (Array.length t.table) in
>> let bucket = t.table.(index) in (* 1 *)
>
> Sorry for the noise.
You can use (if hkey = min_int then 0 else abs hkey). It is guaranteed
to work and won't cost much.
Martin
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] Hashing failure
2012-12-20 20:18 ` [Caml-list] Hashing failure Thomas Braibant
2012-12-20 20:32 ` [Caml-list] " Thomas Braibant
@ 2012-12-21 7:58 ` Jacques Garrigue
2012-12-21 10:02 ` Xavier Leroy
2 siblings, 0 replies; 6+ messages in thread
From: Jacques Garrigue @ 2012-12-21 7:58 UTC (permalink / raw)
To: Thomas Braibant; +Cc: caml-list
On 2012/12/21, at 5:18, Thomas Braibant <thomas.braibant@gmail.com> wrote:
> Hi list,
>
> I have a piece of code that looks like this.
>
> let hkey = H.hash d in (* a custom hash function *)
> let index = hkey mod (Array.length t.table) in
> let bucket = t.table.(index) in (* 1 *)
>
> the line marked with 1 was generating an index out of bounds
> exception. It appeared that it was because index was negative, a
> problem I knew of when using mod ... So I added an abs around index
>
> let hkey = H.hash d in
> let index = abs (hkey mod (Array.length t.table)) in
> let bucket = t.table.(index) in (* 1 *)
>
> But it yields the same exception...
>
> This seemed completely unbelievable, till I realized that
> - x mod y may return a negative value if x is less than 0
> - abs x may return a negative value if x is min_int
What about just ensuring that your hash function returns a positive
value to start with.
An easy way to do that is
let hkey = hkey land max_int
This avoids using abs altogether.
(I now understand why the standard hash function is 30-bit…)
Jacques Garrigue
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] Hashing failure
2012-12-20 20:18 ` [Caml-list] Hashing failure Thomas Braibant
2012-12-20 20:32 ` [Caml-list] " Thomas Braibant
2012-12-21 7:58 ` [Caml-list] " Jacques Garrigue
@ 2012-12-21 10:02 ` Xavier Leroy
2012-12-21 19:03 ` Martin Jambon
2 siblings, 1 reply; 6+ messages in thread
From: Xavier Leroy @ 2012-12-21 10:02 UTC (permalink / raw)
To: caml-list
On 12/20/2012 09:18 PM, Thomas Braibant wrote:
> [Reducing a hash value to a [0, N) interval ]
> My question is : is there a safe way to bullet-proof my code against
> these pathological cases, without too much hinderance (this is some
> code whose efficiency is critical...)
Some possibilities to reduce an integer k to the interval [0, N):
abs (k mod N)
(if N > 0, k mod N is guaranteed to be in (-N,N) )
(k land max_int) mod N
("land max_int" masks the sign bit off and preserves all other
bits, reducing k to [0, max_int])
The latter is a bit more efficient, as it avoids the conditional in "abs".
If N is a power of 2, you can just do
k land (N - 1)
For hash tables, it is easy to ensure that the size of the array is a
power of 2. However, the formula above assumes that the low bits of k
have good distribution, which is not always the case if k is computed
with a simple hash function. You can also "mix up" some of the high
bits before taking the low bits, e.g.
(k lxor (k lsr 16)) land (N - 1)
Hope this helps,
- Xavier
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] Hashing failure
2012-12-21 10:02 ` Xavier Leroy
@ 2012-12-21 19:03 ` Martin Jambon
0 siblings, 0 replies; 6+ messages in thread
From: Martin Jambon @ 2012-12-21 19:03 UTC (permalink / raw)
To: Xavier Leroy; +Cc: caml-list
On Fri 21 Dec 2012 02:02:38 AM PST, Xavier Leroy wrote:
> On 12/20/2012 09:18 PM, Thomas Braibant wrote:
>
>> [Reducing a hash value to a [0, N) interval ]
>> My question is : is there a safe way to bullet-proof my code against
>> these pathological cases, without too much hinderance (this is some
>> code whose efficiency is critical...)
>
> Some possibilities to reduce an integer k to the interval [0, N):
>
> abs (k mod N)
> (if N > 0, k mod N is guaranteed to be in (-N,N) )
>
> (k land max_int) mod N
> ("land max_int" masks the sign bit off and preserves all other
> bits, reducing k to [0, max_int])
Is the two's complement representation of integers guaranteed by all
OCaml implementations? It would be helpful if the manual was explicit
about it.
Martin
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2012-12-21 19:03 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <CAHR=VkzBhR5FMsWO-_BVzYV4yhSARP8rg7G3b=jezU6OOOg0jQ@mail.gmail.com>
2012-12-20 20:18 ` [Caml-list] Hashing failure Thomas Braibant
2012-12-20 20:32 ` [Caml-list] " Thomas Braibant
2012-12-20 21:01 ` Martin Jambon
2012-12-21 7:58 ` [Caml-list] " Jacques Garrigue
2012-12-21 10:02 ` Xavier Leroy
2012-12-21 19:03 ` Martin Jambon
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox