* [Caml-list] Specialized dictionaries
@ 2001-11-05 10:06 Marcin 'Qrczak' Kowalczyk
2001-11-05 10:19 ` Xavier Leroy
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Marcin 'Qrczak' Kowalczyk @ 2001-11-05 10:06 UTC (permalink / raw)
To: caml-list
I need dictionaries indexed by ints which must be very fast. I'm
afraid that there is an overhead in using Hashtbl.t such that the
generic hash function must recognize that the value is immediate
instead of using it as a hash directly.
Is it worth doing something about it? What to do? I could copy the first
half of hashtbl.ml and replace all occurrences of the function hash by
land'ing with 0x3FFFFFFF (so the value is nonnegative and mod gives
nonnegative results). Any better idea?
--
__("< Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
\__/
^^
QRCZAK
-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
* Re: [Caml-list] Specialized dictionaries
2001-11-05 10:06 [Caml-list] Specialized dictionaries Marcin 'Qrczak' Kowalczyk
@ 2001-11-05 10:19 ` Xavier Leroy
2001-11-05 10:32 ` Jean-Christophe Filliatre
` (2 subsequent siblings)
3 siblings, 0 replies; 12+ messages in thread
From: Xavier Leroy @ 2001-11-05 10:19 UTC (permalink / raw)
To: Marcin 'Qrczak' Kowalczyk; +Cc: caml-list
> I need dictionaries indexed by ints which must be very fast. I'm
> afraid that there is an overhead in using Hashtbl.t such that the
> generic hash function must recognize that the value is immediate
> instead of using it as a hash directly.
>
> Is it worth doing something about it? What to do? I could copy the first
> half of hashtbl.ml and replace all occurrences of the function hash by
> land'ing with 0x3FFFFFFF (so the value is nonnegative and mod gives
> nonnegative results). Any better idea?
No, no need to copy anything, just unleash the power of functors!
    module IntHashtbl = Hashtbl.Make(struct
      type t = int
      let equal = (==)
      let hash x = x land 0x3FFFFFFF
    end)
I'm not sure the performance gain is significant, but it's worth a try.
- Xavier Leroy
* Re: [Caml-list] Specialized dictionaries
2001-11-05 10:06 [Caml-list] Specialized dictionaries Marcin 'Qrczak' Kowalczyk
2001-11-05 10:19 ` Xavier Leroy
@ 2001-11-05 10:32 ` Jean-Christophe Filliatre
2001-11-05 17:36 ` Florian Hars
[not found] ` <9s6j7c$i6r$1@qrnik.zagroda>
[not found] ` <9s5pe7$5k6$1@qrnik.zagroda>
2001-11-05 23:40 ` Julian Assange
3 siblings, 2 replies; 12+ messages in thread
From: Jean-Christophe Filliatre @ 2001-11-05 10:32 UTC (permalink / raw)
To: Marcin 'Qrczak' Kowalczyk; +Cc: caml-list
Marcin 'Qrczak' Kowalczyk writes:
> I need dictionaries indexed by ints which must be very fast. I'm
> afraid that there is an overhead in using Hashtbl.t such that the
> generic hash function must recognize that the value is immediate
> instead of using it as a hash directly.
>
> Is it worth doing something about it? What to do? I could copy the first
> half of hashtbl.ml and replace all occurrences of the function hash by
> land'ing with 0x3FFFFFFF (so the value is nonnegative and mod gives
> nonnegative results). Any better idea?
As suggested by Xavier regarding your other question, you can
instantiate Hashtbl.Make accordingly:
======================================================================
module IntHashtbl = Hashtbl.Make(struct
type t = int
let equal = (==)
let hash n = n land 0x3FFFFFFF
end)
======================================================================
To be even more efficient, I'm afraid you have to follow your idea,
that is to inline this hash function in your own copy of hashtbl.ml.
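
A minimal usage sketch of the functor instance above, for readers who have not used Hashtbl.Make before. The mask `max_int` is an assumption swapped in for 0x3FFFFFFF so that the sign bit is cleared on both 32-bit and 64-bit platforms; the keys and values are made up for illustration.

```ocaml
(* Int-keyed hash table obtained by instantiating the Hashtbl.Make functor. *)
module IntHashtbl = Hashtbl.Make (struct
  type t = int
  (* Physical equality is safe for immediate values such as ints. *)
  let equal = ( == )
  (* Assumption: max_int replaces the 0x3FFFFFFF mask from the thread;
     it keeps the result nonnegative whatever the word size. *)
  let hash x = x land max_int
end)

let () =
  let tbl = IntHashtbl.create 16 in
  IntHashtbl.add tbl 42 "forty-two";
  IntHashtbl.add tbl (-7) "minus seven";
  assert (IntHashtbl.find tbl 42 = "forty-two");
  assert (IntHashtbl.mem tbl (-7));
  assert (not (IntHashtbl.mem tbl 0))
```

The resulting module exposes the same operations as the generic Hashtbl (create, add, find, mem, replace, ...), with the key type fixed to int.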
--
Jean-Christophe Filliatre (http://www.lri.fr/~filliatr)
* Re: [Caml-list] Specialized dictionaries
[not found] ` <9s5pe7$5k6$1@qrnik.zagroda>
@ 2001-11-05 11:49 ` Marcin 'Qrczak' Kowalczyk
0 siblings, 0 replies; 12+ messages in thread
From: Marcin 'Qrczak' Kowalczyk @ 2001-11-05 11:49 UTC (permalink / raw)
To: caml-list
Mon, 5 Nov 2001 11:19:56 +0100, Xavier Leroy <xavier.leroy@inria.fr> pisze:
> No, no need to copy anything, just unleash the power of functors!
>
>     module IntHashtbl = Hashtbl.Make(struct
>       type t = int
>       let equal = (==)
>       let hash x = x land 0x3FFFFFFF
>     end)
Ok, I tried:
Implementation | Test1 | Test2
---------------------------------+-------+-------
Hashtbl.t | 7.40s | 6.45s
Hashtbl.Make(...) | 3.62s | 5.35s
hashtbl.ml specialized for ints | 2.37s | 5.00s
Test1 is a small program which does nothing but dictionary lookups.
Test2 is a real program where I use dictionaries.
It happens that
    let equal = (==)
    let equal (x : int) (y : int) = x = y
are fast, whereas
    let equal x y = x = y
    (* module constrained by Hashtbl.HashedType with type t = int *)
    let equal : int -> int -> bool = (=)
are slow. The compiler doesn't insert the specialized version of equality
if it is not immediately applied, or if its type is constrained only
by the module signature.
I'm going to use the functorial version: 7% loss of performance
wrt. the specialized version is acceptable.
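
The two styles of `equal` can be put side by side roughly as follows. This is only a sketch for repeating the comparison: the module names, the `time` helper and the table size are hypothetical, and a newer compiler may well specialize both forms, so no particular timings should be expected.

```ocaml
(* Reported fast above: the argument types are known where (=) is applied. *)
module FastKey = struct
  type t = int
  let equal (x : int) (y : int) = x = y
  let hash x = x land max_int
end

(* Reported slow above: (=) is bound without being applied. *)
module SlowKey = struct
  type t = int
  let equal : int -> int -> bool = ( = )
  let hash x = x land max_int
end

module Fast = Hashtbl.Make (FastKey)
module Slow = Hashtbl.Make (SlowKey)

(* Hypothetical helper: CPU seconds spent running [f]. *)
let time f =
  let t0 = Sys.time () in
  f ();
  Sys.time () -. t0

let n = 1_000
let fast = Fast.create n
let slow = Slow.create n

let () =
  for i = 0 to n - 1 do
    Fast.add fast i i;
    Slow.add slow i i
  done

let () =
  let t_fast =
    time (fun () ->
      for _pass = 1 to 1_000 do
        for i = 0 to n - 1 do ignore (Fast.find fast i) done
      done)
  and t_slow =
    time (fun () ->
      for _pass = 1 to 1_000 do
        for i = 0 to n - 1 do ignore (Slow.find slow i) done
      done)
  in
  Printf.printf "annotated equal: %.3fs, unapplied (=): %.3fs\n" t_fast t_slow
```

Both tables return the same results; only the cost of each key comparison may differ.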
--
__("< Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
\__/
^^
QRCZAK
* Re: [Caml-list] Specialized dictionaries
2001-11-05 10:32 ` Jean-Christophe Filliatre
@ 2001-11-05 17:36 ` Florian Hars
2001-11-05 17:54 ` Sven
[not found] ` <9s6j7c$i6r$1@qrnik.zagroda>
1 sibling, 1 reply; 12+ messages in thread
From: Florian Hars @ 2001-11-05 17:36 UTC (permalink / raw)
To: Jean-Christophe Filliatre; +Cc: Marcin 'Qrczak' Kowalczyk, caml-list
On Mon, Nov 05, 2001 at 11:32:51AM +0100, Jean-Christophe Filliatre wrote:
> Marcin 'Qrczak' Kowalczyk writes:
> > I need dictionaries indexed by ints which must be very fast.
>
> To be even more efficient, I'm afraid you have to follow your idea,
> that is to inline this hash function in your own copy of hashtbl.ml.
Wouldn't the Patricia Trees (from the
"the-name-of-the-author-currently-escapes-me"-department :-)) mentioned on
http://www.lri.fr/~filliatr/software.en.html be useful in this case (unless
the problem needs the in-place update available with Hashtbl)?
The documentation claims that "The
performances are always better than the standard library's module
[Set], except for linear insertion (building a set by insertion of
consecutive integers)."
Or is Hashtbl faster still?
Yours, Florian Hars.
* Re: [Caml-list] Specialized dictionaries
2001-11-05 17:36 ` Florian Hars
@ 2001-11-05 17:54 ` Sven
0 siblings, 0 replies; 12+ messages in thread
From: Sven @ 2001-11-05 17:54 UTC (permalink / raw)
To: Florian Hars
Cc: Jean-Christophe Filliatre, Marcin 'Qrczak' Kowalczyk, caml-list
On Mon, Nov 05, 2001 at 06:36:54PM +0100, Florian Hars wrote:
> On Mon, Nov 05, 2001 at 11:32:51AM +0100, Jean-Christophe Filliatre wrote:
> > Marcin 'Qrczak' Kowalczyk writes:
> > > I need dictionaries indexed by ints which must be very fast.
> >
> > To be even more efficient, I'm afraid you have to follow your idea,
> > that is to inline this hash function in your own copy of hashtbl.ml.
>
> Wouldn't the Patricia Trees (from the
> "the-name-of-the-author-currently-escapes-me"-department :-)) mentioned on
> http://www.lri.fr/~filliatr/software.en.html be useful in this case (unless
> the problem needs the in-place update available with Hashtbl)?
> The documentation claims that "The
> performances are always better than the standard library's module
> [Set], except for linear insertion (building a set by insertion of
> consecutive integers)."
The standard library [Set] is a functional balanced binary tree, if I am not
wrong. It is quite fast, but depending on the application it will not be
faster than a hash table; that's why we have hash tables.
Sven Luther
* Re: [Caml-list] Specialized dictionaries
[not found] ` <9s6j7c$i6r$1@qrnik.zagroda>
@ 2001-11-05 18:18 ` Marcin 'Qrczak' Kowalczyk
2001-11-05 18:24 ` Nicolas George
[not found] ` <9s6m53$k16$1@qrnik.zagroda>
0 siblings, 2 replies; 12+ messages in thread
From: Marcin 'Qrczak' Kowalczyk @ 2001-11-05 18:18 UTC (permalink / raw)
To: caml-list
Mon, 5 Nov 2001 18:36:54 +0100, Florian Hars <florian@hars.de> pisze:
> Wouldn't the Patricia Trees (from the
> "the-name-of-the-author-currently-escapes-me"-department :-)) mentioned on
> http://www.lri.fr/~filliatr/software.en.html be useful in this case (unless
> the problem needs the in-place update available with Hashtbl)?
Hey, it's faster! One program runs in 4.4s instead of 5.3s. Thanks!
I'm using these dictionaries for dispatching on types in a dynamically
typed language compiled to OCaml. So updates are rare, dictionaries are
small and they contain small integers, but lookups are very frequent.
There are also rarely used dictionaries indexed by pairs of integers
and Hashtbl should be OK for them.
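
For readers unfamiliar with the data structure, here is a rough sketch of a little-endian Patricia tree over int keys, in the style described by Okasaki and Gill. It is not the Ptmap source; the module name `Pat` is hypothetical and only `add` and `find` are shown.

```ocaml
module Pat = struct
  type 'a t =
    | Empty
    | Leaf of int * 'a
    | Branch of int * int * 'a t * 'a t  (* prefix, branching bit, left, right *)

  let empty = Empty

  (* True when the bit selected by mask [m] is 0 in [k]. *)
  let zero_bit k m = k land m = 0

  (* Lowest set bit of a nonzero integer. *)
  let lowest_bit x = x land (-x)

  (* First bit, from the least significant end, where two keys differ. *)
  let branching_bit p0 p1 = lowest_bit (p0 lxor p1)

  (* Bits of [p] below the branching bit [m]. *)
  let mask p m = p land (m - 1)

  let match_prefix k p m = mask k m = p

  (* A lookup follows one bit test per branch and compares the key once. *)
  let rec find k = function
    | Empty -> raise Not_found
    | Leaf (j, x) -> if k = j then x else raise Not_found
    | Branch (_, m, l, r) -> find k (if zero_bit k m then l else r)

  (* Combine two trees whose prefixes [p0] and [p1] differ. *)
  let join p0 t0 p1 t1 =
    let m = branching_bit p0 p1 in
    if zero_bit p0 m then Branch (mask p0 m, m, t0, t1)
    else Branch (mask p0 m, m, t1, t0)

  let rec add k x = function
    | Empty -> Leaf (k, x)
    | Leaf (j, _) as t ->
      if j = k then Leaf (k, x) else join k (Leaf (k, x)) j t
    | Branch (p, m, l, r) as t ->
      if match_prefix k p m then
        if zero_bit k m then Branch (p, m, add k x l, r)
        else Branch (p, m, l, add k x r)
      else join k (Leaf (k, x)) p t
end

let () =
  let t = Pat.(empty |> add 3 "a" |> add 10 "b" |> add 2 "c") in
  assert (Pat.find 10 t = "b");
  assert (Pat.find 2 t = "c")
```

With only a handful of small keys the tree is a few branches deep, which fits the case of small dictionaries with very frequent lookups.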
--
__("< Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
\__/
^^
QRCZAK
* Re: [Caml-list] Specialized dictionaries
2001-11-05 18:18 ` Marcin 'Qrczak' Kowalczyk
@ 2001-11-05 18:24 ` Nicolas George
[not found] ` <9s6m53$k16$1@qrnik.zagroda>
1 sibling, 0 replies; 12+ messages in thread
From: Nicolas George @ 2001-11-05 18:24 UTC (permalink / raw)
To: caml-list
Le quintidi 15 brumaire, an CCX, Marcin 'Qrczak' Kowalczyk a écrit :
> So updates are rare, dictionaries are
> small and they contain small integers, but lookups are very frequent.
What about using a simple array for that?
* Re: [Caml-list] Specialized dictionaries
[not found] ` <9s6m53$k16$1@qrnik.zagroda>
@ 2001-11-05 20:56 ` Marcin 'Qrczak' Kowalczyk
2001-11-06 6:53 ` Sven
2001-11-06 0:35 ` Marcin 'Qrczak' Kowalczyk
1 sibling, 1 reply; 12+ messages in thread
From: Marcin 'Qrczak' Kowalczyk @ 2001-11-05 20:56 UTC (permalink / raw)
To: caml-list
Mon, 5 Nov 2001 19:24:06 +0100, Nicolas George <nicolas.george@ens.fr> pisze:
>> So updates are rare, dictionaries are
>> small and they contain small integers, but lookups are very frequent.
>
> What about using a simple array for that?
They usually contain small integers, but theoretically these integers
can go large. If many types are created in a program, then it would
be wasteful to allocate large arrays for each dispatched function
which uses a single type with a large number.
Perhaps some heuristic could use an array for the initial segment
of numbers (which correspond to types created earlier) and another
dictionary for the rest, but it would complicate what is being
done purely for fun and for being simple. More importantly, small
differences such as loading modules in a different order could have
large effects; I don't like treating old types and young types in a
very different way.
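
For concreteness, the heuristic in the previous paragraph might look like the sketch below. The module name `Hybrid`, the `cutoff` parameter and the use of the generic Hashtbl for the overflow part are all assumptions made for illustration.

```ocaml
module Hybrid = struct
  type 'a t = {
    small : 'a option array;        (* keys 0 .. cutoff-1, direct indexing *)
    large : (int, 'a) Hashtbl.t;    (* every other key *)
  }

  let create cutoff =
    { small = Array.make cutoff None; large = Hashtbl.create 8 }

  let in_small d k = k >= 0 && k < Array.length d.small

  let replace d k v =
    if in_small d k then d.small.(k) <- Some v
    else Hashtbl.replace d.large k v

  let find d k =
    if in_small d k then
      match d.small.(k) with
      | Some v -> v
      | None -> raise Not_found
    else Hashtbl.find d.large k

  let mem d k =
    if in_small d k then d.small.(k) <> None
    else Hashtbl.mem d.large k
end

let () =
  let d = Hybrid.create 32 in
  Hybrid.replace d 5 "early type";
  Hybrid.replace d 100_000 "late type";
  assert (Hybrid.find d 5 = "early type");
  assert (Hybrid.find d 100_000 = "late type")
```

Keys below the cutoff cost one array access, the others a hash lookup, which is exactly the uneven treatment of old and young types objected to above.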
I've heard about packing multiple dispatch tables in a large array.
Well, it's complicated, and it's hard to perform dynamic updates if
slots are used by different functions. Updates are rare but they do
occur - for example if a dispatched function is used at a type for
the first time and the implementation was found at its supertype.
I don't know...
--
__("< Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
\__/
^^
QRCZAK
* Re: [Caml-list] Specialized dictionaries
2001-11-05 10:06 [Caml-list] Specialized dictionaries Marcin 'Qrczak' Kowalczyk
` (2 preceding siblings ...)
[not found] ` <9s5pe7$5k6$1@qrnik.zagroda>
@ 2001-11-05 23:40 ` Julian Assange
3 siblings, 0 replies; 12+ messages in thread
From: Julian Assange @ 2001-11-05 23:40 UTC (permalink / raw)
To: Marcin 'Qrczak' Kowalczyk; +Cc: caml-list
Do these dictionaries change? If not you could consider searching for
a perfect hash algorithm.
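
One very simple way to search for such a hash, sketched under the assumption that the key set is fixed, distinct and nonnegative: try successive moduli until no two keys share a slot. The function names are hypothetical, and real perfect-hash generators use more refined searches.

```ocaml
(* Smallest m >= number of keys such that k mod m differs for every key.
   Assumes distinct, nonnegative keys; terminates once m exceeds the
   largest key at the latest. *)
let find_modulus keys =
  let distinct m =
    let seen = Array.make m false in
    List.for_all
      (fun k ->
        let slot = k mod m in
        if seen.(slot) then false
        else begin
          seen.(slot) <- true;
          true
        end)
      keys
  in
  let rec search m = if distinct m then m else search (m + 1) in
  search (max 1 (List.length keys))

(* Collision-free table: each slot holds at most one (key, value) pair. *)
let build bindings =
  let m = find_modulus (List.map fst bindings) in
  let table = Array.make m None in
  List.iter (fun (k, v) -> table.(k mod m) <- Some (k, v)) bindings;
  table

(* One modulo, one array access and one comparison per lookup. *)
let find table k =
  if k < 0 then raise Not_found
  else
    match table.(k mod Array.length table) with
    | Some (k', v) when k' = k -> v
    | _ -> raise Not_found

let () =
  let table = build [ (3, "a"); (10, "b"); (17, "c") ] in
  assert (find table 10 = "b")
```

Any change to the key set means running the search again, so this only pays off if the dictionaries really do not change.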
--
Julian Assange |If you want to build a ship, don't drum up people
|together to collect wood or assign them tasks and
proff@iq.org |work, but rather teach them to long for the endless
proff@gnu.ai.mit.edu |immensity of the sea. -- Antoine de Saint Exupery
* Re: [Caml-list] Specialized dictionaries
[not found] ` <9s6m53$k16$1@qrnik.zagroda>
2001-11-05 20:56 ` Marcin 'Qrczak' Kowalczyk
@ 2001-11-06 0:35 ` Marcin 'Qrczak' Kowalczyk
1 sibling, 0 replies; 12+ messages in thread
From: Marcin 'Qrczak' Kowalczyk @ 2001-11-06 0:35 UTC (permalink / raw)
To: caml-list
Mon, 5 Nov 2001 19:24:06 +0100, Nicolas George <nicolas.george@ens.fr> pisze:
>> So updates are rare, dictionaries are
>> small and they contain small integers, but lookups are very frequent.
>
> What about using a simple array for that?
I tested how fast an 'a option array is, allocated big enough for the
test and with bounds checking disabled. Surprisingly it's only 5%
faster than the Patricia tree, measuring the whole program which does
many lookups but also other things like computing Fibonacci numbers.
The difference would be obviously larger if actual dictionaries had
more entries (the ones I used happened to have 10, 2 and 3), but now
I feel that this part is optimized enough.
Here is the mutable version of Ptmap I'm using:
    module Typetbl =
      struct
        type 'a t = 'a Ptmap.t ref
        let create _ = ref Ptmap.empty
        let add dict k v = dict := Ptmap.add k v (!dict)
        let find dict k = Ptmap.find k (!dict)
        let mem dict k = Ptmap.mem k (!dict)
        let replace dict k v = dict := Ptmap.add k v (!dict)
        let clear dict = dict := Ptmap.empty
      end
And here is the quick & dirty array wrapper:
    module Typetbl =
      struct
        type 'a t = 'a option array
        let create _ = Array.make 100 None
        let add dict k v = dict.(k) <- Some v
        let find dict k = match dict.(k) with
          | Some v -> v
          | None -> raise Not_found
        let mem dict k = match dict.(k) with
          | Some _ -> true
          | None -> false
        let replace dict k v = dict.(k) <- Some v
        let clear dict = for i = 0 to 99 do dict.(i) <- None done
      end
--
__("< Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
\__/
^^
QRCZAK
* Re: [Caml-list] Specialized dictionaries
2001-11-05 20:56 ` Marcin 'Qrczak' Kowalczyk
@ 2001-11-06 6:53 ` Sven
0 siblings, 0 replies; 12+ messages in thread
From: Sven @ 2001-11-06 6:53 UTC (permalink / raw)
To: Marcin 'Qrczak' Kowalczyk; +Cc: caml-list
On Mon, Nov 05, 2001 at 08:56:40PM +0000, Marcin 'Qrczak' Kowalczyk wrote:
> Mon, 5 Nov 2001 19:24:06 +0100, Nicolas George <nicolas.george@ens.fr> pisze:
>
> >> So updates are rare, dictionaries are
> >> small and they contain small integers, but lookups are very frequent.
> >
> > What about using a simple array for that?
>
> They usually contain small integers, but theoretically these integers
> can go large. If many types are created in a program, then it would
> be wasteful to allocate large arrays for each dispatched function
> which uses a single type with a large number.
>
> Perhaps some heuristic could use an array for the initial segment
> of numbers (which correspond to types created earlier) and another
> dictionary for the rest, but it would complicate what is being
> done purely for fun and for being simple. More importantly, small
> differences such as loading modules in a different order could have
> large effects; I don't like treating old types and young types in a
> very different way.
>
> I've heard about packing multiple dispatch tables in a large array.
> Well, it's complicated, and it's hard to perform dynamic updates if
> slots are used by different functions. Updates are rare but they do
> occur - for example if a dispatched function is used at a type for
> the first time and the implementation was found at its supertype.
What about using a datatype with several arrays, with a maximum number of
entries per array or something like that, and then having a series of such
arrays, or an array of arrays? You would just need a division and a
modulo operation to pick the right array and get the value. If you choose the
right maximum number, you could even get away with only bit shifts, which are
not so expensive, and two indirections instead of one.
If you do it right, you could even have the datatype grow incrementally based
on your needs. That will work only if your numbers are contiguous, though.
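
A rough sketch of that idea, assuming nonnegative keys and a chunk size of 64 (both arbitrary choices for illustration, as is the module name `Chunked`):

```ocaml
module Chunked = struct
  let bits = 6                     (* 2^6 = 64 slots per chunk *)
  let chunk_size = 1 lsl bits
  let low_mask = chunk_size - 1

  (* Outer array of chunks; an empty array marks a chunk not yet allocated. *)
  type 'a t = { mutable chunks : 'a option array array }

  let create () = { chunks = [||] }

  let replace d k v =
    if k < 0 then invalid_arg "Chunked.replace: negative key";
    let hi = k lsr bits and lo = k land low_mask in
    (* Grow the outer array only as far as the largest key seen requires. *)
    if hi >= Array.length d.chunks then begin
      let grown = Array.make (hi + 1) [||] in
      Array.blit d.chunks 0 grown 0 (Array.length d.chunks);
      d.chunks <- grown
    end;
    (* Allocate the chunk itself on first use. *)
    if Array.length d.chunks.(hi) = 0 then
      d.chunks.(hi) <- Array.make chunk_size None;
    d.chunks.(hi).(lo) <- Some v

  (* A shift, a mask and two indirections per lookup. *)
  let find d k =
    if k < 0 then raise Not_found;
    let hi = k lsr bits and lo = k land low_mask in
    if hi >= Array.length d.chunks then raise Not_found
    else
      let chunk = d.chunks.(hi) in
      if Array.length chunk = 0 then raise Not_found
      else
        match chunk.(lo) with
        | Some v -> v
        | None -> raise Not_found
end

let () =
  let d = Chunked.create () in
  Chunked.replace d 3 "a";
  Chunked.replace d 200 "b";
  assert (Chunked.find d 3 = "a");
  assert (Chunked.find d 200 = "b")
```

Only chunks that actually contain a key are allocated, but the outer array still grows with the largest key, so sparse large keys remain wasteful.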
That said, I had the impression that, as OCaml is optimized for functional
datatypes, a functional datatype will be friendlier to the GC, and thus
maybe faster.
Friendly,
Sven Luther