From: "Harrison, John R" <john.r.harrison@intel.com>
To: Berke Durak <berke.durak@exalead.com>
Cc: Caml-list List <caml-list@inria.fr>
Subject: RE: [Caml-list] Canonical Set/Map datastructure?
Date: Fri, 7 Mar 2008 10:13:32 -0700 [thread overview]
Message-ID: <DCC19446A892D84FBB89AE7C94F0C04E01D99539DC@azsmsx501.amr.corp.intel.com> (raw)
In-Reply-To: <47D11442.6090409@exalead.com>
[-- Attachment #1: Type: text/plain, Size: 2308 bytes --]
Hi Berke,
| Thanks for your code. I'm sorry I thought you maintained a separate
| hash table. It's very interesting code and I'll give it a try.
It would in fact have been more efficient to use Jean-Christophe
Filliatre's implementation of Patricia trees instead of writing my
own. However, it was important to me to release the code under a
BSD-like license. For what it's worth, I attach a version of my code
that's extracted from its context so it can be used independently. (I
replaced a few of my pet functions with standard library ones, and I
hope I didn't screw anything up in the process.)
| - The theoretical worst-case performance of hash-based data structures
| can be attained if the hash function has bad dispersal. As the code
| relies on Hashtbl.hash, which does not hash deeply, this could
| potentially be a proble, in particular if your keys have long "common
| prefixes" that are not distinguished by the hash function. It might
| work well in practice but I feel a little uncomfortable.
Yes, sure. My applications are mainly in theorem proving and symbolic
computation where this isn't an issue, and I can imagine it might not
be suitable elsewhere.
| - Also, it does not preserve the natural order for keys, and that
| is particularly bad, because I often use, for instance,
| float-indexed maps or sets as a substitute for heaps.
When I was looking at this area last time (maybe just following the
references from the paper I cited) I came across a kind of mixed
heap/tree structure with distinct "horizontal" and "vertical"
orderings. That might be something to consider. But once again there
is a bad worst-case performance if the two orders happen to be
correlated.
| The paper with the inverse cubic lower bound is *very* interesting; we
| don't see plausible lower bounds often in complexity theory,
| especially with such large assumptions (just bounded out-degree for
| the graph nodes).
|
| So it seems there is little hope to have a drop-in replacement for Set
| or Map that is compatible with the natural order of things, a.k.a.,
| Pervasives.compare.
Yes, I really found it striking that in a fundamental sense, you need
to pay the price of noncanonicality in order to get the guaranteed
O(log n) lookup.
John.
[-- Attachment #2: map.ml --]
[-- Type: application/octet-stream, Size: 10766 bytes --]
(* ------------------------------------------------------------------------- *)
(* Polymorphic finite partial functions via Patricia trees. *)
(* *)
(* The point of this strange representation is that it is canonical (equal *)
(* functions have the same encoding) yet reasonably efficient on average. *)
(* *)
(* (c) John Harrison 2003. *)
(* *)
(* See http://www.cl.cam.ac.uk/~jrh13/atp/OCaml/LICENSE.txt *)
(* *)
(* Idea due to Diego Olivier Fernandez Pons (OCaml list, 2003/11/10). *)
(* ------------------------------------------------------------------------- *)
type ('a,'b)func =
Empty
| Leaf of int * ('a*'b)list
| Branch of int * int * ('a,'b)func * ('a,'b)func;;
(* ------------------------------------------------------------------------- *)
(* Undefined function. *)
(* ------------------------------------------------------------------------- *)
let undefined = Empty;;
(* ------------------------------------------------------------------------- *)
(* In case of equality comparison worries, better use this. *)
(* ------------------------------------------------------------------------- *)
let is_undefined f =
match f with
Empty -> true
| _ -> false;;
(* ------------------------------------------------------------------------- *)
(* Operation analogous to "map" for lists. *)
(* ------------------------------------------------------------------------- *)
let mapf =
let rec map_list f l =
match l with
[] -> []
| (x,y)::t -> (x,f(y))::(map_list f t) in
let rec mapf f t =
match t with
Empty -> Empty
| Leaf(h,l) -> Leaf(h,map_list f l)
| Branch(p,b,l,r) -> Branch(p,b,mapf f l,mapf f r) in
mapf;;
(* ------------------------------------------------------------------------- *)
(* Operations analogous to "fold" for lists. *)
(* ------------------------------------------------------------------------- *)
let foldl =
let rec foldl_list f a l =
match l with
[] -> a
| (x,y)::t -> foldl_list f (f a x y) t in
let rec foldl f a t =
match t with
Empty -> a
| Leaf(h,l) -> foldl_list f a l
| Branch(p,b,l,r) -> foldl f (foldl f a l) r in
foldl;;
let foldr =
let rec foldr_list f l a =
match l with
[] -> a
| (x,y)::t -> f x y (foldr_list f t a) in
let rec foldr f t a =
match t with
Empty -> a
| Leaf(h,l) -> foldr_list f l a
| Branch(p,b,l,r) -> foldr f l (foldr f r a) in
foldr;;
(* ------------------------------------------------------------------------- *)
(* Application. *)
(* ------------------------------------------------------------------------- *)
let applyd =
let rec apply_listd l d x =
match l with
(a,b)::t -> let c = Pervasives.compare x a in
if c = 0 then b else if c > 0 then apply_listd t d x else d x
| [] -> d x in
fun f d x ->
let k = Hashtbl.hash x in
let rec look t =
match t with
Leaf(h,l) when h = k -> apply_listd l d x
| Branch(p,b,l,r) when (k lxor p) land (b - 1) = 0
-> look (if k land b = 0 then l else r)
| _ -> d x in
look f;;
let apply f = applyd f (fun x -> failwith "apply");;
let tryapplyd f a d = applyd f (fun x -> d) a;;
let tryapplyl f x = tryapplyd f x [];;
let defined f x = try apply f x; true with Failure _ -> false;;
(* ------------------------------------------------------------------------- *)
(* Undefinition. *)
(* ------------------------------------------------------------------------- *)
let undefine =
let rec undefine_list x l =
match l with
(a,b as ab)::t ->
let c = Pervasives.compare x a in
if c = 0 then t
else if c < 0 then l else
let t' = undefine_list x t in
if t' == t then l else ab::t'
| [] -> [] in
fun x ->
let k = Hashtbl.hash x in
let rec und t =
match t with
Leaf(h,l) when h = k ->
let l' = undefine_list x l in
if l' == l then t
else if l' = [] then Empty
else Leaf(h,l')
| Branch(p,b,l,r) when k land (b - 1) = p ->
if k land b = 0 then
let l' = und l in
if l' == l then t
else if is_undefined l' then r
else Branch(p,b,l',r)
else
let r' = und r in
if r' == r then t
else if is_undefined r' then l
else Branch(p,b,l,r')
| _ -> t in
und;;
(* ------------------------------------------------------------------------- *)
(* Redefinition and combination. *)
(* ------------------------------------------------------------------------- *)
let (|->),combine =
let ldb x y = let z = x lxor y in z land (-z) in
let newbranch p1 t1 p2 t2 =
let b = ldb p1 p2 in
let p = p1 land (b - 1) in
if p1 land b = 0 then Branch(p,b,t1,t2)
else Branch(p,b,t2,t1) in
let rec define_list (x,y as xy) l =
match l with
(a,b as ab)::t ->
let c = Pervasives.compare x a in
if c = 0 then xy::t
else if c < 0 then xy::l
else ab::(define_list xy t)
| [] -> [xy]
and combine_list op z l1 l2 =
match (l1,l2) with
[],_ -> l2
| _,[] -> l1
| ((x1,y1 as xy1)::t1,(x2,y2 as xy2)::t2) ->
let c = Pervasives.compare x1 x2 in
if c < 0 then xy1::(combine_list op z t1 l2)
else if c > 0 then xy2::(combine_list op z l1 t2) else
let y = op y1 y2 and l = combine_list op z t1 t2 in
if z(y) then l else (x1,y)::l in
let (|->) x y =
let k = Hashtbl.hash x in
let rec upd t =
match t with
Empty -> Leaf (k,[x,y])
| Leaf(h,l) ->
if h = k then Leaf(h,define_list (x,y) l)
else newbranch h t k (Leaf(k,[x,y]))
| Branch(p,b,l,r) ->
if k land (b - 1) <> p then newbranch p t k (Leaf(k,[x,y]))
else if k land b = 0 then Branch(p,b,upd l,r)
else Branch(p,b,l,upd r) in
upd in
let rec combine op z t1 t2 =
match (t1,t2) with
Empty,_ -> t2
| _,Empty -> t1
| Leaf(h1,l1),Leaf(h2,l2) ->
if h1 = h2 then
let l = combine_list op z l1 l2 in
if l = [] then Empty else Leaf(h1,l)
else newbranch h1 t1 h2 t2
| (Leaf(k,lis) as lf),(Branch(p,b,l,r) as br) ->
if k land (b - 1) = p then
if k land b = 0 then
let l' = combine op z lf l in
if is_undefined l' then r else Branch(p,b,l',r)
else
let r' = combine op z lf r in
if is_undefined r' then l else Branch(p,b,l,r')
else
newbranch k lf p br
| (Branch(p,b,l,r) as br),(Leaf(k,lis) as lf) ->
if k land (b - 1) = p then
if k land b = 0 then
let l' = combine op z l lf in
if is_undefined l' then r else Branch(p,b,l',r)
else
let r' = combine op z r lf in
if is_undefined r' then l else Branch(p,b,l,r')
else
newbranch p br k lf
| Branch(p1,b1,l1,r1),Branch(p2,b2,l2,r2) ->
if b1 < b2 then
if p2 land (b1 - 1) <> p1 then newbranch p1 t1 p2 t2
else if p2 land b1 = 0 then
let l = combine op z l1 t2 in
if is_undefined l then r1 else Branch(p1,b1,l,r1)
else
let r = combine op z r1 t2 in
if is_undefined r then l1 else Branch(p1,b1,l1,r)
else if b2 < b1 then
if p1 land (b2 - 1) <> p2 then newbranch p1 t1 p2 t2
else if p1 land b2 = 0 then
let l = combine op z t1 l2 in
if is_undefined l then r2 else Branch(p2,b2,l,r2)
else
let r = combine op z t1 r2 in
if is_undefined r then l2 else Branch(p2,b2,l2,r)
else if p1 = p2 then
let l = combine op z l1 l2 and r = combine op z r1 r2 in
if is_undefined l then r
else if is_undefined r then l else Branch(p1,b1,l,r)
else
newbranch p1 t1 p2 t2 in
(|->),combine;;
(* ------------------------------------------------------------------------- *)
(* Special case of point function. *)
(* ------------------------------------------------------------------------- *)
let (|=>) = fun x y -> (x |-> y) undefined;;
(* ------------------------------------------------------------------------- *)
(* Grab an arbitrary element. *)
(* ------------------------------------------------------------------------- *)
let rec choose t =
match t with
Empty -> failwith "choose: completely undefined function"
| Leaf(h,l) -> List.hd l
| Branch(b,p,t1,t2) -> choose t1;;
(* ------------------------------------------------------------------------- *)
(* Mapping to sorted-list representation of the graph, domain and range. *)
(* ------------------------------------------------------------------------- *)
let rec uniq l =
match l with
x::(y::_ as t) -> let t' = uniq t in
if Pervasives.compare x y = 0 then t' else
if t'==t then l else x::t'
| _ -> l;;
let setify l = uniq(List.sort Pervasives.compare l);;
let graph f = setify (foldl (fun a x y -> (x,y)::a) [] f);;
let dom f = setify(foldl (fun a x y -> x::a) [] f);;
let ran f = setify(foldl (fun a x y -> y::a) [] f);;
(* ------------------------------------------------------------------------- *)
(* Idiom for a mapping zipping domain and range lists. *)
(* ------------------------------------------------------------------------- *)
let fpf xs ys = List.fold_right2 (|->) xs ys undefined;;
(* ------------------------------------------------------------------------- *)
(* Install a (trivial) printer for finite partial functions. *)
(* ------------------------------------------------------------------------- *)
let print_fpf (f:('a,'b)func) = print_string "<func>";;
#install_printer print_fpf;;
next prev parent reply other threads:[~2008-03-07 17:13 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-03-05 16:49 Berke Durak
2008-03-05 17:16 ` [Caml-list] " Brian Hurt
2008-03-05 17:27 ` Alain Frisch
2008-03-05 19:53 ` Jean-Christophe Filliâtre
2008-03-05 20:03 ` Jon Harrop
2008-03-05 21:56 ` Alain Frisch
2008-03-06 7:45 ` Jean-Christophe Filliâtre
2008-03-05 17:34 ` Harrison, John R
2008-03-06 9:53 ` Berke Durak
2008-03-06 17:36 ` Harrison, John R
2008-03-07 10:09 ` Berke Durak
2008-03-07 17:13 ` Harrison, John R [this message]
2008-03-07 10:19 ` Alain Frisch
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=DCC19446A892D84FBB89AE7C94F0C04E01D99539DC@azsmsx501.amr.corp.intel.com \
--to=john.r.harrison@intel.com \
--cc=berke.durak@exalead.com \
--cc=caml-list@inria.fr \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox