* [Caml-list] Hashtbl.hash and Hashtbl.hash_param @ 2002-08-23 15:24 sebastien FURIC 2002-08-23 15:45 ` sebastien FURIC 2002-08-27 8:24 ` Xavier Leroy 0 siblings, 2 replies; 7+ messages in thread From: sebastien FURIC @ 2002-08-23 15:24 UTC (permalink / raw) To: OCaml Mailing list What kind of algorithm is used to compute the hash code of objects in O'Caml ? Hashtbl.hash (List.map (fun x -> Random.int 100) [1;2;3;4;5;6;7;8;9;10]);; always returns 0 (Hashtbl.hash_param has the same properties) which is a poor result ! What can I do if I don't know the size of the list in advance ? Cheers, Sebastien. ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Caml-list] Hashtbl.hash and Hashtbl.hash_param 2002-08-23 15:24 [Caml-list] Hashtbl.hash and Hashtbl.hash_param sebastien FURIC @ 2002-08-23 15:45 ` sebastien FURIC 2002-08-23 16:27 ` Florian Douetteau 2002-08-27 8:24 ` Xavier Leroy 1 sibling, 1 reply; 7+ messages in thread From: sebastien FURIC @ 2002-08-23 15:45 UTC (permalink / raw) To: OCaml Mailing list sebastien FURIC a écrit : > > What kind of algorithm is used to compute the hash code of objects in > O'Caml ? > > Hashtbl.hash (List.map (fun x -> Random.int 100) > [1;2;3;4;5;6;7;8;9;10]);; > always returns 0 (Hashtbl.hash_param has the same properties) which is > a poor result ! > > What can I do if I don't know the size of the list in advance ? Of course, I know I could use functors to specify my own hash function. While prototyping I just would prefer an acceptable Hashtbl.hash function to avoid writing "boring" code. Sebastien. ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Caml-list] Hashtbl.hash and Hashtbl.hash_param 2002-08-23 15:45 ` sebastien FURIC @ 2002-08-23 16:27 ` Florian Douetteau 0 siblings, 0 replies; 7+ messages in thread From: Florian Douetteau @ 2002-08-23 16:27 UTC (permalink / raw) To: sebastien FURIC; +Cc: OCaml Mailing list On Fri, 23 Aug 2002, sebastien FURIC wrote: After a quick inspection in the source byterun/hash.c, it appears that hash_param C implementation decrements a C static variable 'counter' whenever an object is considered and returns 0 whenever this counter reach 0 (that's a way to handle cycles). As a consequence (..details omitted..), in the case of the list type, if the length of your list is greater that the initial value of the counter, hash will always return 0. A quick fix is to use: hash_param <number_bigger_than_the_length_of_any_list> 100 instead of hash (== hash_param 10 100) By the way, beware that hash_param is not thread-safe, since it uses a C global variable. -- Florian Douetteau > > > sebastien FURIC a écrit : > > > > What kind of algorithm is used to compute the hash code of objects in > > O'Caml ? > > > > Hashtbl.hash (List.map (fun x -> Random.int 100) > > [1;2;3;4;5;6;7;8;9;10]);; > > always returns 0 (Hashtbl.hash_param has the same properties) which is > > a poor result ! > > > > What can I do if I don't know the size of the list in advance ? > > Of course, I know I could use functors to specify my own hash function. > While prototyping I just would prefer an acceptable Hashtbl.hash > function to avoid writing "boring" code. > > Sebastien. > ------------------- > To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr > Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Caml-list] Hashtbl.hash and Hashtbl.hash_param 2002-08-23 15:24 [Caml-list] Hashtbl.hash and Hashtbl.hash_param sebastien FURIC 2002-08-23 15:45 ` sebastien FURIC @ 2002-08-27 8:24 ` Xavier Leroy 2002-08-27 9:59 ` jeanmarc.eber 2002-08-27 16:12 ` Blair Zajac 1 sibling, 2 replies; 7+ messages in thread From: Xavier Leroy @ 2002-08-27 8:24 UTC (permalink / raw) To: sebastien FURIC; +Cc: caml-list > What kind of algorithm is used to compute the hash code of objects in > O'Caml ? > > Hashtbl.hash (List.map (fun x -> Random.int 100) > [1;2;3;4;5;6;7;8;9;10]);; > always returns 0 (Hashtbl.hash_param has the same properties) which is > a poor result ! Yes, this is disappointing. To understand what's going on, here is how "Hashtbl.hash_param v count limit" works: - v is traversed, depth-first. - "Interesting" information found at each node is hashed, e.g. string node -> hash code of string integer -> the integer itself constructor block -> integer tag of constructor (Some nodes have no interesting information, e.g. certain custom blocks.) - The hash values for each node are combined with a simple linear congruence. Moreover, to prevent infinite descent in cyclic values, and ensure that hashing doesn't take too long, the traversal is stopped when either - "count" interesting nodes were found, or - "limit" nodes (interesting or not) were traversed. Now, for your example [1;2;3;4;5;6;7;8;9;10], the interesting nodes and their associated hash values are - the integers 1 to 10, with same hash values; - and the 10 occurrences of the "::" constructor, which correspond to 0-tagged blocks, with hash values 0. The fly in the ointment is that the traversal is done right-to-left, hence the hash values of interest are encountered in the following order: 0 ......... 0 10 9 8 7 6 5 4 3 2 1 ------------- -------------------- the :: cells the list contents Hence, with count = 10, the traversal stops at the cons cells, and doesn't even look at the list contents! Result: a 0 hash value. There are several ways to remedy this behavior, such as ignoring zero-tagged blocks, or doing breadth-first traversal. However, we need to think twice before changing the hashing function, because this would cause trouble to users that store hashtables in files using output_value/input_value: if the hash function changes before writing and reading, the hashtable read becomes unusable. Hence, a request for OCaml users: if you use hashtables whose keys are structured data (not just strings or integers), *and* your program stores hashtables to files, *and* it's important for you that these persistent hashtables can be read back with future versions of OCaml, then please drop me a line. - Xavier Leroy ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Caml-list] Hashtbl.hash and Hashtbl.hash_param 2002-08-27 8:24 ` Xavier Leroy @ 2002-08-27 9:59 ` jeanmarc.eber 2002-08-27 10:58 ` Alain Frisch 2002-08-27 16:12 ` Blair Zajac 1 sibling, 1 reply; 7+ messages in thread From: jeanmarc.eber @ 2002-08-27 9:59 UTC (permalink / raw) To: Xavier Leroy; +Cc: sebastien FURIC, caml-list Quoting Xavier Leroy <xavier.leroy@inria.fr>: > > However, we need to think twice before changing the hashing function, > because this would cause trouble to users that store hashtables in > files using output_value/input_value: if the hash function changes > before writing and reading, the hashtable read becomes unusable. > > Hence, a request for OCaml users: if you use hashtables whose keys are > structured data (not just strings or integers), *and* your program > stores hashtables to files, *and* it's important for you that these > persistent hashtables can be read back with future versions of OCaml, > then please drop me a line. > That is an important point that should, I think, at least be clearly said. Personally, I always thought that values written with output_value (more generally marshaled ocaml values) were only guaranteed to be compatible for a given version of ocaml. I never considered output_value as a "long term" storing solution, but only a "short term" one (good example: *.cmo ans *.cmi files generated by the ocaml compiler), not to speak about calculated hash values... Personally, I *want* the ocaml team to be able to change internal representation, optimize hash functions etc in the hope that this produces an even better system! (BTW, I may be wrong, but didnt some tags change between 3.04 and 3.05, but maybe this didnt change marshaled values ?) More generally, the concept and importance of 100% backward compatibility should be discussed. I can not hope for 100% backward compatibility and hope for big progresses on the ocaml compiler... no ? If people really want 100% compatibilty, they should stay with an ocaml version. Conclusion: personally, I dont want progress of the compiler made difficult by a 100% backward compatibility "religion". What do other users of ocaml think about it? (I agree that this is of course a question that is as old as the existence of computer languages: its more a question about what stage of development we think ocaml has reached now) Jean-Marc Eber LexiFi ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Caml-list] Hashtbl.hash and Hashtbl.hash_param 2002-08-27 9:59 ` jeanmarc.eber @ 2002-08-27 10:58 ` Alain Frisch 0 siblings, 0 replies; 7+ messages in thread From: Alain Frisch @ 2002-08-27 10:58 UTC (permalink / raw) To: jeanmarc.eber; +Cc: Caml list On Tue, 27 Aug 2002 jeanmarc.eber@lexifi.com wrote: > Conclusion: personally, I dont want progress of the compiler made difficult by > a 100% backward compatibility "religion". What do other users of ocaml think > about it? I strongly agree with you, especially concerning the situation at issue (improving Hashtbl.hash). You're right that it is explicitely stated in the manual, for Marshal: << The format for the byte sequences is compatible across all machines for a given version of Objective Caml. >> -- Alain ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Caml-list] Hashtbl.hash and Hashtbl.hash_param 2002-08-27 8:24 ` Xavier Leroy 2002-08-27 9:59 ` jeanmarc.eber @ 2002-08-27 16:12 ` Blair Zajac 1 sibling, 0 replies; 7+ messages in thread From: Blair Zajac @ 2002-08-27 16:12 UTC (permalink / raw) To: Xavier Leroy; +Cc: sebastien FURIC, caml-list Xavier Leroy wrote: > > However, we need to think twice before changing the hashing function, > because this would cause trouble to users that store hashtables in > files using output_value/input_value: if the hash function changes > before writing and reading, the hashtable read becomes unusable. > > Hence, a request for OCaml users: if you use hashtables whose keys are > structured data (not just strings or integers), *and* your program > stores hashtables to files, *and* it's important for you that these > persistent hashtables can be read back with future versions of OCaml, > then please drop me a line. I agree with the sentiment that OCaml should not be held up by backwards compatibility. But I think adding a version header to the files that indicates the particular hashing function used would allow newer OCaml's to read older files. The newer OCaml's would only write the new version of the files. This would require that OCaml binary keep the old hash functions around, but this could be made a configure option to enable this feature. I'm reminded of Perl's Storable module which serves much the same function and had a method to read old Storable files but only write new versioned files. Best, Blair -- Blair Zajac <blair@orcaware.com> Web and OS performance plots - http://www.orcaware.com/orca/ ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2002-08-27 16:12 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2002-08-23 15:24 [Caml-list] Hashtbl.hash and Hashtbl.hash_param sebastien FURIC 2002-08-23 15:45 ` sebastien FURIC 2002-08-23 16:27 ` Florian Douetteau 2002-08-27 8:24 ` Xavier Leroy 2002-08-27 9:59 ` jeanmarc.eber 2002-08-27 10:58 ` Alain Frisch 2002-08-27 16:12 ` Blair Zajac
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox