* [Caml-list] Hashing failure [not found] <CAHR=VkzBhR5FMsWO-_BVzYV4yhSARP8rg7G3b=jezU6OOOg0jQ@mail.gmail.com> @ 2012-12-20 20:18 ` Thomas Braibant 2012-12-20 20:32 ` [Caml-list] " Thomas Braibant ` (2 more replies) 0 siblings, 3 replies; 6+ messages in thread From: Thomas Braibant @ 2012-12-20 20:18 UTC (permalink / raw) To: caml-list Hi list, I have a piece of code that looks like this. let hkey = H.hash d in (* a custom hash function *) let index = hkey mod (Array.length t.table) in let bucket = t.table.(index) in (* 1 *) the line marked with 1 was generating an index out of bounds exception. It appeared that it was because index was negative, a problem I knew of when using mod ... So I added an abs around index let hkey = H.hash d in let index = abs (hkey mod (Array.length t.table)) in let bucket = t.table.(index) in (* 1 *) But it yields the same exception... This seemed completely unbelievable, till I realized that - x mod y may return a negative value if x is less than 0 - abs x may return a negative value if x is min_int So I tried to reproduce this behavior with the following code: let t = Array.create 17 0;; let hkey = min_int in let index = (abs hkey) mod (Array.length t) in t.(index);; which yields an out of bound. Bingo. It turns out that some value of the length of t will result in an error, and some will not (e.g., 16) ... My question is : is there a safe way to bullet-proof my code against these pathological cases, without too much hinderance (this is some code whose efficiency is critical...) Thomas ^ permalink raw reply [flat|nested] 6+ messages in thread
* [Caml-list] Re: Hashing failure 2012-12-20 20:18 ` [Caml-list] Hashing failure Thomas Braibant @ 2012-12-20 20:32 ` Thomas Braibant 2012-12-20 21:01 ` Martin Jambon 2012-12-21 7:58 ` [Caml-list] " Jacques Garrigue 2012-12-21 10:02 ` Xavier Leroy 2 siblings, 1 reply; 6+ messages in thread From: Thomas Braibant @ 2012-12-20 20:32 UTC (permalink / raw) To: caml-list The second piece of code should read > let hkey = H.hash d in > let index = (abs hkey) mod (Array.length t.table) in > let bucket = t.table.(index) in (* 1 *) Sorry for the noise. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] Re: Hashing failure 2012-12-20 20:32 ` [Caml-list] " Thomas Braibant @ 2012-12-20 21:01 ` Martin Jambon 0 siblings, 0 replies; 6+ messages in thread From: Martin Jambon @ 2012-12-20 21:01 UTC (permalink / raw) To: Thomas Braibant; +Cc: caml-list On Thu 20 Dec 2012 12:32:07 PM PST, Thomas Braibant wrote: > The second piece of code should read >> let hkey = H.hash d in >> let index = (abs hkey) mod (Array.length t.table) in >> let bucket = t.table.(index) in (* 1 *) > > Sorry for the noise. You can use (if hkey = min_int then 0 else abs hkey). It is guaranteed to work and won't cost much. Martin ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] Hashing failure 2012-12-20 20:18 ` [Caml-list] Hashing failure Thomas Braibant 2012-12-20 20:32 ` [Caml-list] " Thomas Braibant @ 2012-12-21 7:58 ` Jacques Garrigue 2012-12-21 10:02 ` Xavier Leroy 2 siblings, 0 replies; 6+ messages in thread From: Jacques Garrigue @ 2012-12-21 7:58 UTC (permalink / raw) To: Thomas Braibant; +Cc: caml-list On 2012/12/21, at 5:18, Thomas Braibant <thomas.braibant@gmail.com> wrote: > Hi list, > > I have a piece of code that looks like this. > > let hkey = H.hash d in (* a custom hash function *) > let index = hkey mod (Array.length t.table) in > let bucket = t.table.(index) in (* 1 *) > > the line marked with 1 was generating an index out of bounds > exception. It appeared that it was because index was negative, a > problem I knew of when using mod ... So I added an abs around index > > let hkey = H.hash d in > let index = abs (hkey mod (Array.length t.table)) in > let bucket = t.table.(index) in (* 1 *) > > But it yields the same exception... > > This seemed completely unbelievable, till I realized that > - x mod y may return a negative value if x is less than 0 > - abs x may return a negative value if x is min_int What about just ensuring that your hash function returns a positive value to start with. An easy way to do that is let hkey = hkey land max_int This avoids using abs altogether. (I now understand why the standard hash function is 30-bit…) Jacques Garrigue ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] Hashing failure 2012-12-20 20:18 ` [Caml-list] Hashing failure Thomas Braibant 2012-12-20 20:32 ` [Caml-list] " Thomas Braibant 2012-12-21 7:58 ` [Caml-list] " Jacques Garrigue @ 2012-12-21 10:02 ` Xavier Leroy 2012-12-21 19:03 ` Martin Jambon 2 siblings, 1 reply; 6+ messages in thread From: Xavier Leroy @ 2012-12-21 10:02 UTC (permalink / raw) To: caml-list On 12/20/2012 09:18 PM, Thomas Braibant wrote: > [Reducing a hash value to a [0, N) interval ] > My question is : is there a safe way to bullet-proof my code against > these pathological cases, without too much hinderance (this is some > code whose efficiency is critical...) Some possibilities to reduce an integer k to the interval [0, N): abs (k mod N) (if N > 0, k mod N is guaranteed to be in (-N,N) ) (k land max_int) mod N ("land max_int" masks the sign bit off and preserves all other bits, reducing k to [0, max_int]) The latter is a bit more efficient, as it avoids the conditional in "abs". If N is a power of 2, you can just do k land (N - 1) For hash tables, it is easy to ensure that the size of the array is a power of 2. However, the formula above assumes that the low bits of k have good distribution, which is not always the case if k is computed with a simple hash function. You can also "mix up" some of the high bits before taking the low bits, e.g. (k lxor (k lsr 16)) land (N - 1) Hope this helps, - Xavier ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] Hashing failure 2012-12-21 10:02 ` Xavier Leroy @ 2012-12-21 19:03 ` Martin Jambon 0 siblings, 0 replies; 6+ messages in thread From: Martin Jambon @ 2012-12-21 19:03 UTC (permalink / raw) To: Xavier Leroy; +Cc: caml-list On Fri 21 Dec 2012 02:02:38 AM PST, Xavier Leroy wrote: > On 12/20/2012 09:18 PM, Thomas Braibant wrote: > >> [Reducing a hash value to a [0, N) interval ] >> My question is : is there a safe way to bullet-proof my code against >> these pathological cases, without too much hinderance (this is some >> code whose efficiency is critical...) > > Some possibilities to reduce an integer k to the interval [0, N): > > abs (k mod N) > (if N > 0, k mod N is guaranteed to be in (-N,N) ) > > (k land max_int) mod N > ("land max_int" masks the sign bit off and preserves all other > bits, reducing k to [0, max_int]) Is the two's complement representation of integers guaranteed by all OCaml implementations? It would be helpful if the manual was explicit about it. Martin ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2012-12-21 19:03 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <CAHR=VkzBhR5FMsWO-_BVzYV4yhSARP8rg7G3b=jezU6OOOg0jQ@mail.gmail.com> 2012-12-20 20:18 ` [Caml-list] Hashing failure Thomas Braibant 2012-12-20 20:32 ` [Caml-list] " Thomas Braibant 2012-12-20 21:01 ` Martin Jambon 2012-12-21 7:58 ` [Caml-list] " Jacques Garrigue 2012-12-21 10:02 ` Xavier Leroy 2012-12-21 19:03 ` Martin Jambon
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox