* [Caml-list] Specialized dictionaries
From: Marcin 'Qrczak' Kowalczyk @ 2001-11-05 10:06 UTC
To: caml-list

I need dictionaries indexed by ints which must be very fast. I'm
afraid that there is an overhead in using Hashtbl.t, since the
generic hash function must recognize that the value is immediate
instead of using it as a hash directly.

Is it worth doing something about it? What to do? I could copy the first
half of hashtbl.ml and replace all occurrences of the function hash by
land'ing with 0x3FFFFFFF (so the value is nonnegative and mod gives
nonnegative results). Any better idea?

-- 
 __("<    Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
 \__/
  ^^                      QRCZAK

-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr  Archives: http://caml.inria.fr
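A minimal sketch of the masking idea from the message above, written for illustration rather than taken from hashtbl.ml; `bucket_index` and `naive_index` are hypothetical names. It shows why the mask matters: OCaml's `mod` takes the sign of its left operand, so a negative key would otherwise yield a negative bucket index. On a 64-bit OCaml the mask also discards bits 30 and above, so keys that agree in their low 30 bits land in the same bucket.

```ocaml
(* Bucket index as proposed: masking with 0x3FFFFFFF makes the dividend
   nonnegative, so the result of [mod] is nonnegative too. *)
let bucket_index ~buckets key = (key land 0x3FFFFFFF) mod buckets

(* Using the raw key: [mod] keeps the sign of [key], so negative keys
   would produce negative (invalid) array indices. *)
let naive_index ~buckets key = key mod buckets
```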
* Re: [Caml-list] Specialized dictionaries
From: Xavier Leroy @ 2001-11-05 10:19 UTC
To: Marcin 'Qrczak' Kowalczyk; +Cc: caml-list

> I need dictionaries indexed by ints which must be very fast. I'm
> afraid that there is an overhead in using Hashtbl.t, since the
> generic hash function must recognize that the value is immediate
> instead of using it as a hash directly.
>
> Is it worth doing something about it? What to do? I could copy the first
> half of hashtbl.ml and replace all occurrences of the function hash by
> land'ing with 0x3FFFFFFF (so the value is nonnegative and mod gives
> nonnegative results). Any better idea?

No, no need to copy anything, just unleash the power of functors!

        module IntHashtbl =
          Hashtbl.Make(struct
                         type t = int
                         let equal = (==)
                         let hash x = x land 0x3FFFFFFF
                       end)

I'm not sure the performance gain is significant, but it's worth a try.

- Xavier Leroy
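To show that the functor application yields an ordinary hash table interface, here is Xavier's `IntHashtbl` again with a few calls on it. The `names` table and `lookup` helper are hypothetical, added only for illustration.

```ocaml
(* Xavier's instantiation: a Hashtbl specialised to int keys. *)
module IntHashtbl = Hashtbl.Make (struct
  type t = int
  let equal = ( == )               (* physical equality is exact on ints *)
  let hash x = x land 0x3FFFFFFF   (* nonnegative hash, no generic hashing *)
end)

(* Hypothetical example: type ids mapped to type names. *)
let names : string IntHashtbl.t = IntHashtbl.create 16

let () =
  IntHashtbl.replace names 0 "int";
  IntHashtbl.replace names 1 "string";
  IntHashtbl.replace names (-5) "negative keys are fine"

(* Same operations as the polymorphic Hashtbl, minus the key argument types. *)
let lookup id = try Some (IntHashtbl.find names id) with Not_found -> None
```

Client code keeps using `create`, `replace`, `find` and friends exactly as with `Hashtbl`; only the module name changes.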
* Re: [Caml-list] Specialized dictionaries
From: Jean-Christophe Filliatre @ 2001-11-05 10:32 UTC
To: Marcin 'Qrczak' Kowalczyk; +Cc: caml-list


Marcin 'Qrczak' Kowalczyk writes:
 > I need dictionaries indexed by ints which must be very fast. I'm
 > afraid that there is an overhead in using Hashtbl.t, since the
 > generic hash function must recognize that the value is immediate
 > instead of using it as a hash directly.
 >
 > Is it worth doing something about it? What to do? I could copy the first
 > half of hashtbl.ml and replace all occurrences of the function hash by
 > land'ing with 0x3FFFFFFF (so the value is nonnegative and mod gives
 > nonnegative results). Any better idea?

As suggested by Xavier regarding your other question, you can
instantiate Hashtbl.Make accordingly:

======================================================================
module IntHashtbl = Hashtbl.Make(struct
                                   type t = int
                                   let equal = (==)
                                   let hash n = n land 0x3FFFFFFF
                                 end)
======================================================================

To be even more efficient, I'm afraid you have to follow your idea,
that is to inline this hash function in your own copy of hashtbl.ml.

-- 
Jean-Christophe Filliatre (http://www.lri.fr/~filliatr)
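What "your own copy of hashtbl.ml" with the hash inlined might look like, as a hypothetical sketch: this is not the standard library's code, and it only provides `create`, `replace`, `find` and `mem` (no removal or iteration). The hash `key land 0x3FFFFFFF` and the int equality test are written directly in the lookup path, so no functor argument is called per lookup.

```ocaml
(* Buckets are association lists specialised to int keys. *)
type 'a bucket = Empty | Cons of int * 'a * 'a bucket

type 'a t = { mutable size : int; mutable data : 'a bucket array }

let create n = { size = 0; data = Array.make (max n 1) Empty }

(* Inlined hash: the mask keeps the dividend nonnegative, so [mod] is too. *)
let index data key = (key land 0x3FFFFFFF) mod Array.length data

(* Double the bucket array and move every binding to its new bucket. *)
let resize t =
  let ndata = Array.make (2 * Array.length t.data) Empty in
  let rec move = function
    | Empty -> ()
    | Cons (k, v, rest) ->
      let i = index ndata k in
      ndata.(i) <- Cons (k, v, ndata.(i));
      move rest
  in
  Array.iter move t.data;
  t.data <- ndata

(* [k] is statically an int, so [k = key] compiles to an integer test. *)
let rec find_in key = function
  | Empty -> raise Not_found
  | Cons (k, v, rest) -> if k = key then v else find_in key rest

let find t key = find_in key t.data.(index t.data key)

let mem t key =
  match find t key with _ -> true | exception Not_found -> false

(* Rebuild a bucket with [key] rebound to [v]; Not_found if [key] is absent. *)
let rec replace_in key v = function
  | Empty -> raise Not_found
  | Cons (k, w, rest) ->
    if k = key then Cons (k, v, rest) else Cons (k, w, replace_in key v rest)

let replace t key v =
  let i = index t.data key in
  let bucket = t.data.(i) in
  match replace_in key v bucket with
  | updated -> t.data.(i) <- updated
  | exception Not_found ->
    t.data.(i) <- Cons (key, v, bucket);
    t.size <- t.size + 1;
    (* Keep the average bucket length at most 2. *)
    if t.size > 2 * Array.length t.data then resize t
```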
* Re: [Caml-list] Specialized dictionaries
From: Florian Hars @ 2001-11-05 17:36 UTC
To: Jean-Christophe Filliatre; +Cc: Marcin 'Qrczak' Kowalczyk, caml-list

On Mon, Nov 05, 2001 at 11:32:51AM +0100, Jean-Christophe Filliatre wrote:
> Marcin 'Qrczak' Kowalczyk writes:
> > I need dictionaries indexed by ints which must be very fast.
>
> To be even more efficient, I'm afraid you have to follow your idea,
> that is to inline this hash function in your own copy of hashtbl.ml.

Wouldn't the Patricia Trees (from the
"the-name-of-the-author-currently-escapes-me"-department :-)) mentioned on
http://www.lri.fr/~filliatr/software.en.html be useful in this case (unless
the problem needs the in-place update available with Hashtbl)?
The documentation claims that "The
performances are always better than the standard library's module
[Set], except for linear insertion (building a set by insertion of
consecutive integers)."

Or is Hashtbl faster still?

Yours, Florian Hars.
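For readers unfamiliar with Patricia trees over integers, here is a minimal sketch of the little-endian variant described by Okasaki and Gill ("Fast Mergeable Integer Maps"). It is written for illustration and is not the Ptmap code from the page above, whose interface may differ; only `find` and `add` are shown. Each branch stores the common low-order bits of its keys (the prefix) and the single bit on which its two subtrees differ.

```ocaml
type 'a t =
  | Empty
  | Leaf of int * 'a
  | Branch of int * int * 'a t * 'a t  (* prefix, branching bit, left, right *)

let zero_bit k m = k land m = 0
let lowest_bit x = x land (-x)
(* Lowest bit at which two prefixes differ. *)
let branching_bit p0 p1 = lowest_bit (p0 lxor p1)
(* Keep only the bits of [p] strictly below the branching bit [m]. *)
let mask p m = p land (m - 1)
let match_prefix k p m = mask k m = p

(* Lookup follows one bit per branch; the leaf check rejects absent keys. *)
let rec find k = function
  | Empty -> raise Not_found
  | Leaf (j, x) -> if k = j then x else raise Not_found
  | Branch (_, m, l, r) -> find k (if zero_bit k m then l else r)

(* Combine two trees whose prefixes [p0] and [p1] disagree. *)
let join p0 t0 p1 t1 =
  let m = branching_bit p0 p1 in
  if zero_bit p0 m then Branch (mask p0 m, m, t0, t1)
  else Branch (mask p0 m, m, t1, t0)

(* Persistent insertion: the original tree is left untouched. *)
let add k x t =
  let rec ins = function
    | Empty -> Leaf (k, x)
    | Leaf (j, _) as t ->
      if j = k then Leaf (k, x) else join k (Leaf (k, x)) j t
    | Branch (p, m, t0, t1) as t ->
      if match_prefix k p m then
        if zero_bit k m then Branch (p, m, ins t0, t1)
        else Branch (p, m, t0, ins t1)
      else join k (Leaf (k, x)) p t
  in
  ins t
```

Lookups do only bit tests and integer comparisons, which is one plausible reason such a structure can compete with a hash table on small int keys.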
* Re: [Caml-list] Specialized dictionaries
From: Sven @ 2001-11-05 17:54 UTC
To: Florian Hars
Cc: Jean-Christophe Filliatre, Marcin 'Qrczak' Kowalczyk, caml-list

On Mon, Nov 05, 2001 at 06:36:54PM +0100, Florian Hars wrote:
> On Mon, Nov 05, 2001 at 11:32:51AM +0100, Jean-Christophe Filliatre wrote:
> > Marcin 'Qrczak' Kowalczyk writes:
> > > I need dictionaries indexed by ints which must be very fast.
> >
> > To be even more efficient, I'm afraid you have to follow your idea,
> > that is to inline this hash function in your own copy of hashtbl.ml.
>
> Wouldn't the Patricia Trees (from the
> "the-name-of-the-author-currently-escapes-me"-department :-)) mentioned on
> http://www.lri.fr/~filliatr/software.en.html be useful in this case (unless
> the problem needs the in-place update available with Hashtbl)?
> The documentation claims that "The
> performances are always better than the standard library's module
> [Set], except for linear insertion (building a set by insertion of
> consecutive integers)."

The standard library [Set] is a functional balanced binary tree, if I
am not wrong. It is quite fast, but depending on the application it
will not be faster than the hash table; that's why we have hash tables.

Sven Luther
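The property that separates the standard library's tree-based modules from Hashtbl is persistence, which is what Florian's remark about in-place update refers to. A small sketch using `Map.Make` (a sibling of `Set`, also built on balanced binary trees) over int keys; `IntMap`, `v1` and `v2` are names chosen for illustration.

```ocaml
(* An int-keyed persistent dictionary from the standard library. The
   annotation makes the comparison an integer comparison. *)
module IntMap = Map.Make (struct
  type t = int
  let compare (x : int) (y : int) = compare x y
end)

(* Adding returns a new map; [v1] is unchanged after [v2] is built. *)
let v1 = IntMap.(empty |> add 1 "one" |> add 2 "two")
let v2 = IntMap.add 3 "three" v1
```

Each `add` costs O(log n) and shares most of its structure with the old map, whereas `Hashtbl.replace` updates one table in place.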
* Re: [Caml-list] Specialized dictionaries
From: Marcin 'Qrczak' Kowalczyk @ 2001-11-05 18:18 UTC
To: caml-list

Mon, 5 Nov 2001 18:36:54 +0100, Florian Hars <florian@hars.de> writes:

> Wouldn't the Patricia Trees (from the
> "the-name-of-the-author-currently-escapes-me"-department :-)) mentioned on
> http://www.lri.fr/~filliatr/software.en.html be useful in this case (unless
> the problem needs the in-place update available with Hashtbl)?

Hey, it's faster! One program runs in 4.4s instead of 5.3s. Thanks!

I'm using these dictionaries for dispatching on types in a dynamically
typed language compiled to OCaml. So updates are rare, dictionaries are
small and they contain small integers, but lookups are very frequent.

There are also rarely used dictionaries indexed by pairs of integers,
and Hashtbl should be OK for them.
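A hypothetical sketch of the kind of dispatch described here and elaborated later in the thread: each dispatched function keeps a table from type id to implementation, the common path is one lookup, and the rare update happens when a type is first seen and its implementation is inherited from a supertype. The record fields, `resolve`, and the `supertype` function are assumptions made for illustration, not Marcin's actual compiler.

```ocaml
module IntTbl = Hashtbl.Make (struct
  type t = int
  let equal (x : int) (y : int) = x = y
  let hash x = x land 0x3FFFFFFF
end)

(* A dispatched function: implementations known so far, indexed by type
   id, plus a (hypothetical) way to walk up the type hierarchy. *)
type 'impl dispatched = {
  name : string;
  methods : 'impl IntTbl.t;
  supertype : int -> int option;
}

(* Frequent path: one table lookup. Rare path: resolve at the supertype,
   then cache the result under [ty] so the next call takes the fast path. *)
let rec resolve f ty =
  match IntTbl.find f.methods ty with
  | impl -> impl
  | exception Not_found ->
    (match f.supertype ty with
     | None ->
       invalid_arg (Printf.sprintf "%s: no method for type %d" f.name ty)
     | Some super ->
       let impl = resolve f super in
       IntTbl.replace f.methods ty impl;
       impl)
```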
* Re: [Caml-list] Specialized dictionaries
From: Nicolas George @ 2001-11-05 18:24 UTC
To: caml-list

On quintidi 15 Brumaire, year CCX, Marcin 'Qrczak' Kowalczyk wrote:

> So updates are rare, dictionaries are
> small and they contain small integers, but lookups are very frequent.

What about using a simple array for that?
* Re: [Caml-list] Specialized dictionaries
From: Marcin 'Qrczak' Kowalczyk @ 2001-11-05 20:56 UTC
To: caml-list

Mon, 5 Nov 2001 19:24:06 +0100, Nicolas George <nicolas.george@ens.fr> writes:

>> So updates are rare, dictionaries are
>> small and they contain small integers, but lookups are very frequent.
>
> What about using a simple array for that?

They usually contain small integers, but theoretically these integers
can grow large. If many types are created in a program, then it would
be wasteful to allocate large arrays for each dispatched function
which uses a single type with a large number.

Perhaps some heuristic could use an array for the initial segment
of numbers (which correspond to types created earlier) and another
dictionary for the rest, but it would complicate something that is
being done purely for fun and is meant to be simple. More importantly,
small differences such as loading modules in a different order could
have large effects; I don't like treating old types and young types in
very different ways.

I've heard about packing multiple dispatch tables into one large array.
Well, it's complicated, and it's hard to perform dynamic updates if
slots are shared by different functions. Updates are rare, but they do
occur: for example, when a dispatched function is used at a type for
the first time and the implementation was found at its supertype.

I don't know...
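The heuristic mentioned (and rejected) above, sketched for concreteness: ids below a fixed `prefix` go into a dense array, everything else into a Hashtbl. The `hybrid` type and its functions are hypothetical; the sketch also makes the objection visible, since whether a type takes the fast path depends only on how early its id was allocated.

```ocaml
type 'a hybrid = {
  dense : 'a option array;        (* ids 0 .. prefix-1: one array access *)
  sparse : (int, 'a) Hashtbl.t;   (* all other ids *)
}

let create ~prefix =
  { dense = Array.make prefix None; sparse = Hashtbl.create 8 }

let in_dense t k = k >= 0 && k < Array.length t.dense

let replace t k v =
  if in_dense t k then t.dense.(k) <- Some v else Hashtbl.replace t.sparse k v

let find t k =
  if in_dense t k then
    (match t.dense.(k) with Some v -> v | None -> raise Not_found)
  else Hashtbl.find t.sparse k
```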
* Re: [Caml-list] Specialized dictionaries
From: Sven @ 2001-11-06 6:53 UTC
To: Marcin 'Qrczak' Kowalczyk; +Cc: caml-list

On Mon, Nov 05, 2001 at 08:56:40PM +0000, Marcin 'Qrczak' Kowalczyk wrote:
> Mon, 5 Nov 2001 19:24:06 +0100, Nicolas George <nicolas.george@ens.fr> writes:
>
> >> So updates are rare, dictionaries are
> >> small and they contain small integers, but lookups are very frequent.
> >
> > What about using a simple array for that?
>
> They usually contain small integers, but theoretically these integers
> can grow large. If many types are created in a program, then it would
> be wasteful to allocate large arrays for each dispatched function
> which uses a single type with a large number.
>
> Perhaps some heuristic could use an array for the initial segment
> of numbers (which correspond to types created earlier) and another
> dictionary for the rest, but it would complicate something that is
> being done purely for fun and is meant to be simple. More importantly,
> small differences such as loading modules in a different order could
> have large effects; I don't like treating old types and young types in
> very different ways.
>
> I've heard about packing multiple dispatch tables into one large array.
> Well, it's complicated, and it's hard to perform dynamic updates if
> slots are shared by different functions. Updates are rare, but they do
> occur: for example, when a dispatched function is used at a type for
> the first time and the implementation was found at its supertype.

What about using a datatype with several arrays, with a maximum number
of entries per array or something like that, and then having a series
of such arrays, or an array of arrays? You would just need a division
and a modulo operation to find the right array and get the value. If
you choose the right maximum size (a power of two), you could even get
away with only bit shifts and masks, which are not so expensive, and
two indirections instead of one. If you do it right, you could even
have the datatype grow incrementally based on your needs. That will
only work if your numbers are contiguous, though.

That said, I had the impression that, as OCaml is optimized for
functional datatypes, using a functional datatype would be friendlier
to the GC, and thus maybe faster.

Friendly,

Sven Luther
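Sven's two-level array can be sketched as follows, under the assumption (ours, not his) of fixed chunks of 2^6 = 64 entries: the chunk number is `k lsr bits` and the offset is `k land (chunk_size - 1)`. Chunks are allocated lazily and the outer array grows by doubling, so the structure grows incrementally as he suggests; all names here are hypothetical.

```ocaml
let bits = 6
let chunk_size = 1 lsl bits  (* 64 entries per chunk *)

(* Outer array of optional chunks; a chunk is allocated on first use. *)
type 'a t = { mutable chunks : 'a option array option array }

let create () = { chunks = [||] }

(* Two indirections: outer array, then chunk. Shift and mask replace the
   division and modulo because the chunk size is a power of two. *)
let find t k =
  let c = k lsr bits in
  if k < 0 || c >= Array.length t.chunks then raise Not_found
  else
    match t.chunks.(c) with
    | None -> raise Not_found
    | Some chunk ->
      (match chunk.(k land (chunk_size - 1)) with
       | Some v -> v
       | None -> raise Not_found)

let replace t k v =
  if k < 0 then invalid_arg "replace: negative key";
  let c = k lsr bits in
  if c >= Array.length t.chunks then begin
    (* Grow the outer array to at least c + 1 slots, at least doubling. *)
    let n = max (c + 1) (2 * Array.length t.chunks) in
    let bigger = Array.make n None in
    Array.blit t.chunks 0 bigger 0 (Array.length t.chunks);
    t.chunks <- bigger
  end;
  let chunk =
    match t.chunks.(c) with
    | Some chunk -> chunk
    | None ->
      let chunk = Array.make chunk_size None in
      t.chunks.(c) <- Some chunk;
      chunk
  in
  chunk.(k land (chunk_size - 1)) <- Some v
```

Because only used chunks are allocated, a single large id costs one 64-entry chunk plus one outer slot per 64 ids below it, rather than a full-length array.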
* Re: [Caml-list] Specialized dictionaries
From: Marcin 'Qrczak' Kowalczyk @ 2001-11-06 0:35 UTC
To: caml-list

Mon, 5 Nov 2001 19:24:06 +0100, Nicolas George <nicolas.george@ens.fr> writes:

>> So updates are rare, dictionaries are
>> small and they contain small integers, but lookups are very frequent.
>
> What about using a simple array for that?

I tested how fast an 'a option array is, allocated big enough for the
test and with bounds checking disabled. Surprisingly, it's only 5%
faster than the Patricia tree, measuring the whole program, which does
many lookups but also other things like computing Fibonacci numbers.

The difference would obviously be larger if the actual dictionaries had
more entries (the ones I used happened to have 10, 2 and 3 entries),
but now I feel that this part is optimized enough.

Here is the mutable version of Ptmap I'm using:

    module Typetbl = struct
      type 'a t = 'a Ptmap.t ref
      let create _ = ref Ptmap.empty
      let add dict k v = dict := Ptmap.add k v (!dict)
      let find dict k = Ptmap.find k (!dict)
      let mem dict k = Ptmap.mem k (!dict)
      let replace dict k v = dict := Ptmap.add k v (!dict)
      let clear dict = dict := Ptmap.empty
    end

And here is the quick & dirty array wrapper:

    module Typetbl = struct
      type 'a t = 'a option array
      let create _ = Array.make 100 None
      let add dict k v = dict.(k) <- Some v
      let find dict k =
        match dict.(k) with
        | Some v -> v
        | None -> raise Not_found
      let mem dict k =
        match dict.(k) with
        | Some _ -> true
        | None -> false
      let replace dict k v = dict.(k) <- Some v
      let clear dict =
        for i = 0 to 99 do dict.(i) <- None done
    end
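Both `Typetbl` modules above expose the same operations, so one way to compare them fairly is to write the client once against a signature. The sketch below is a hypothetical harness: `TYPETBL`, `HashTypetbl` and `Bench` are names made up here, and `HashTypetbl` merely stands in for the Ptmap and array versions (which should match the signature too, since their `create` ignores its argument). Timings from `Sys.time` will vary by machine.

```ocaml
(* The interface both Typetbl implementations above provide. *)
module type TYPETBL = sig
  type 'a t
  val create : int -> 'a t
  val add : 'a t -> int -> 'a -> unit
  val find : 'a t -> int -> 'a
  val mem : 'a t -> int -> bool
  val replace : 'a t -> int -> 'a -> unit
  val clear : 'a t -> unit
end

(* A stand-in implementation using the polymorphic Hashtbl. *)
module HashTypetbl : TYPETBL = struct
  type 'a t = (int, 'a) Hashtbl.t
  let create n = Hashtbl.create n
  let add = Hashtbl.add
  let find = Hashtbl.find
  let mem = Hashtbl.mem
  let replace = Hashtbl.replace
  let clear = Hashtbl.clear
end

(* The benchmark is written once and applied to each implementation. *)
module Bench (T : TYPETBL) = struct
  (* Fill keys 0..n-1, then do [rounds] passes of lookups over them.
     Returns the number of successful lookups and the CPU time spent. *)
  let run ~n ~rounds =
    let tbl = T.create n in
    for k = 0 to n - 1 do T.replace tbl k (k * 2) done;
    let t0 = Sys.time () in
    let hits = ref 0 in
    for _ = 1 to rounds do
      for k = 0 to n - 1 do
        if T.find tbl k = k * 2 then incr hits
      done
    done;
    (!hits, Sys.time () -. t0)
end
```

With the array version, `n` must stay at or below 100, because that wrapper always allocates exactly 100 slots.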
* Re: [Caml-list] Specialized dictionaries
From: Marcin 'Qrczak' Kowalczyk @ 2001-11-05 11:49 UTC
To: caml-list

Mon, 5 Nov 2001 11:19:56 +0100, Xavier Leroy <xavier.leroy@inria.fr> writes:

> No, no need to copy anything, just unleash the power of functors!
>
>         module IntHashtbl =
>           Hashtbl.Make(struct
>                          type t = int
>                          let equal = (==)
>                          let hash x = x land 0x3FFFFFFF
>                        end)

OK, I tried:

    Implementation                  | Test1 | Test2
    --------------------------------+-------+-------
    Hashtbl.t                       | 7.40s | 6.45s
    Hashtbl.Make(...)               | 3.62s | 5.35s
    hashtbl.ml specialized for ints | 2.37s | 5.00s

Test1 is a small program which does nothing but dictionary lookups.
Test2 is a real program where I use dictionaries.

It happens that

    let equal = (==)
    let equal (x : int) (y : int) = x = y

are fast, whereas

    let equal x y = x = y
      (* module constrained by Hashtbl.HashedType with type t = int *)
    let equal : int -> int -> bool = (=)

are slow. The compiler doesn't insert the specialized equality if it's
not immediately applied, or if its type is constrained only by a
module signature.

I'm going to use the functorial version: a 7% loss of performance
with respect to the specialized version is acceptable.
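The fast and slow `equal` definitions above, as complete functor instantiations that can be timed side by side. The module names and the `Loop`/`time` helpers are hypothetical. The speed difference is what was measured with the 2001 compiler; a current compiler (especially with flambda) may specialise or inline differently, so treat any timings as something to re-measure.

```ocaml
(* Reported fast: physical equality, exact on immediate ints. *)
module FastA = Hashtbl.Make (struct
  type t = int
  let equal = ( == )
  let hash x = x land 0x3FFFFFFF
end)

(* Reported fast: [=] applied to arguments annotated as int. *)
module FastB = Hashtbl.Make (struct
  type t = int
  let equal (x : int) (y : int) = x = y
  let hash x = x land 0x3FFFFFFF
end)

(* Reported slow: [equal] is polymorphic here; its type is fixed to int
   only by the functor's signature, so generic comparison was used. *)
module Slow = Hashtbl.Make (struct
  type t = int
  let equal x y = x = y
  let hash x = x land 0x3FFFFFFF
end)

(* Fill keys 0..n-1 (each bound to itself), then sum [rounds] passes of lookups. *)
module Loop (H : Hashtbl.S with type key = int) = struct
  let run n rounds =
    let tbl = H.create n in
    for k = 0 to n - 1 do H.replace tbl k k done;
    let sum = ref 0 in
    for _ = 1 to rounds do
      for k = 0 to n - 1 do sum := !sum + H.find tbl k done
    done;
    !sum
end

(* Run [f], print the CPU time it took under [label], return its result. *)
let time label f =
  let t0 = Sys.time () in
  let result = f () in
  Printf.printf "%s: %.2fs\n%!" label (Sys.time () -. t0);
  result
```

For example, `let module L = Loop (Slow) in time "Slow" (fun () -> L.run 1000 10_000)` times one variant; repeat with `FastA` and `FastB` to compare.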
* Re: [Caml-list] Specialized dictionaries
From: Julian Assange @ 2001-11-05 23:40 UTC
To: Marcin 'Qrczak' Kowalczyk; +Cc: caml-list

Do these dictionaries change? If not, you could consider searching for
a perfect hash algorithm.

-- 
Julian Assange        |If you want to build a ship, don't drum up people
                      |together to collect wood or assign them tasks and
proff@iq.org          |work, but rather teach them to long for the endless
proff@gnu.ai.mit.edu  |immensity of the sea. -- Antoine de Saint Exupery
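A minimal sketch of Julian's suggestion for a dictionary that no longer changes: brute-force search for a multiplier that sends every key to a distinct slot, then answer lookups with one multiply, mask, `mod` and a key check. The hash shape `((k * mult) land 0x3FFFFFFF) mod size`, the search limits, and all names are assumptions chosen for illustration; this is a collision-free (not minimal) perfect hash. Keys must be distinct, and keys that agree in their low 30 bits can never be separated, in which case `build` gives up with `Invalid_argument`.

```ocaml
type 'a perfect = { mult : int; keys : int array; values : 'a option array }

let slot mult size k = ((k * mult) land 0x3FFFFFFF) mod size

(* Does [mult] send every key to a different slot of a [size]-slot table? *)
let collision_free mult size keys =
  let used = Array.make size false in
  List.for_all
    (fun k ->
      let i = slot mult size k in
      if used.(i) then false else (used.(i) <- true; true))
    keys

(* Try multipliers 1..max_mult at the current size, doubling the table
   whenever none of them works, up to a fixed size limit. *)
let build ?(max_mult = 1000) bindings =
  let keys = List.map fst bindings in
  let rec search size mult =
    if size > 1 lsl 16 then invalid_arg "build: no perfect hash found"
    else if mult > max_mult then search (2 * size) 1
    else if collision_free mult size keys then (size, mult)
    else search size (mult + 1)
  in
  let size, mult = search (max 1 (List.length keys)) 1 in
  let ks = Array.make size 0 and vs = Array.make size None in
  List.iter
    (fun (k, v) ->
      let i = slot mult size k in
      ks.(i) <- k;
      vs.(i) <- Some v)
    bindings;
  { mult; keys = ks; values = vs }

(* One slot computation; the stored key rejects absent keys. *)
let find t k =
  let i = slot t.mult (Array.length t.keys) k in
  match t.values.(i) with
  | Some v when t.keys.(i) = k -> v
  | _ -> raise Not_found
```

The search is only worth its cost when the table is built once and queried many times; any update means rebuilding.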