* [Caml-list] Multi-keyed lookup table? @ 2003-08-07 19:41 Matt Gushee 2003-08-07 20:16 ` Brian Hurt 2003-08-07 22:19 ` John Max Skaller 0 siblings, 2 replies; 23+ messages in thread From: Matt Gushee @ 2003-08-07 19:41 UTC (permalink / raw) To: caml-list Hello, all-- I am trying to decide on a data structure that allows efficient lookup of fonts according to various font properties. I am thinking that the data structures describing fonts will be something like this: type font_weight = | Medium | Bold | Light type font_style = | Roman | Italic | Oblique type font_width = | Normal | Expanded | Condensed type encoding = string * int (* e.g. : ("iso8859",15)) type font_ref = string (* reference to the font file *) type font_family = ((font_weight * font_style * font_width * encoding) * font_ref) list (* Example font family *) let arial = [ ((Medium, Roman, Normal, ("iso10646",1)), "arial.ttf"); ((Bold, Roman, Normal, ("iso10646",1)), "arialb.ttf"); ... ];; (* based on the 5 types used in CSS and other Web standards *) type font_class = | Serif | SansSerif | Monospaced | Cursive | Fantasy What I would like to have is a data structure that contains descriptions of all fonts available on a particular system, such that an application can do something like: get_font_by_class ~weight:Bold ~style:Italic ~encoding:("iso8859",1) SansSerif or get_font_by_family ~weight:Bold "Helvetica" And obtain the font file name that matches the specified characteristics. As I mentioned above, efficient lookup is important; efficient creation of the data structure is probably not important; and since there are no large objects involved, and the data refers to font collections that may have hundreds of members, but probably not thousands, I don't think memory usage is really an issue. So, does anyone have an idea what sort of data structure would work best? TIA for your suggestions. -- Matt Gushee When a nation follows the Way, Englewood, Colorado, USA Horses bear manure through mgushee@havenrock.com its fields; http://www.havenrock.com/ When a nation ignores the Way, Horses bear soldiers through its streets. --Lao Tzu (Peter Merel, trans.) ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Caml-list] Multi-keyed lookup table? 2003-08-07 19:41 [Caml-list] Multi-keyed lookup table? Matt Gushee @ 2003-08-07 20:16 ` Brian Hurt 2003-08-07 21:49 ` Yaron Minsky 2003-08-07 22:19 ` John Max Skaller 1 sibling, 1 reply; 23+ messages in thread From: Brian Hurt @ 2003-08-07 20:16 UTC (permalink / raw) To: Matt Gushee; +Cc: caml-list On Thu, 7 Aug 2003, Matt Gushee wrote: > Hello, all-- > > I am trying to decide on a data structure that allows efficient lookup > of fonts according to various font properties. I am thinking that the > data structures describing fonts will be something like this: In the general case, this is hard. In this specific case, you might consider just hard coding your levels. So you'd end up with a data structure like: All font / | \ / | \ medium bold light <-- pick the weight / | \ / | \ / | \ / | \ Roman Italic Oblique <-- pick the style / | \ / | \ / | \ / | \ / | \ Normal Expanded Condensed <-- pick the width | [ ...; 15; ... ] <-- pick the size | | * <-- pick the family So you'd end up with something like: type font_tree = { medium: style_tree; bold: style_tree; light: style_tree }; type style_tree = { roman: width_tree; italic: width_tree; oblique: width_tree }; type witdth_tree = ... type family_tree = (string, size_tree) Hashtbl.t type size_tree = (int, string) Hashtbl.t let font_list tree weight style width family size = let list_of_hashtbl tbl = let f _ s t = s :: t in List.rev (Hashtbl.fold f tbl []) let get_size sztree = match size with | "*" -> list_of_hashtbl sztree | _ -> try [ Hashtbl.find sztree (int_of_string size) ] with Not_found -> [] in ... in match weight with | "*" -> (get_style tbl.medium) @ (get_style tbl.bold) @ (get_style tbl.light) | "medium" -> get_style tbl.medium | "bold" -> get_style tbl.bold ... For bonus points remove the @'s by accumulating. Brian > > > type font_weight = > | Medium > | Bold > | Light > type font_style = > | Roman > | Italic > | Oblique > type font_width = > | Normal > | Expanded > | Condensed > type encoding = string * int (* e.g. : ("iso8859",15)) > type font_ref = string (* reference to the font file *) > > type font_family = > ((font_weight * font_style * font_width * encoding) * font_ref) > list > > (* Example font family *) > let arial = [ > ((Medium, Roman, Normal, ("iso10646",1)), "arial.ttf"); > ((Bold, Roman, Normal, ("iso10646",1)), "arialb.ttf"); > ... > ];; > > (* based on the 5 types used in CSS and other Web standards *) > type font_class = > | Serif > | SansSerif > | Monospaced > | Cursive > | Fantasy > > > What I would like to have is a data structure that contains descriptions > of all fonts available on a particular system, such that an application > can do something like: > > get_font_by_class ~weight:Bold ~style:Italic ~encoding:("iso8859",1) SansSerif > > or > > get_font_by_family ~weight:Bold "Helvetica" > > And obtain the font file name that matches the specified > characteristics. As I mentioned above, efficient lookup is important; > efficient creation of the data structure is probably not important; and > since there are no large objects involved, and the data refers to font > collections that may have hundreds of members, but probably not > thousands, I don't think memory usage is really an issue. > > So, does anyone have an idea what sort of data structure would work > best? > > TIA for your suggestions. > > > ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Caml-list] Multi-keyed lookup table? 2003-08-07 20:16 ` Brian Hurt @ 2003-08-07 21:49 ` Yaron Minsky 2003-08-07 22:26 ` John Max Skaller 0 siblings, 1 reply; 23+ messages in thread From: Yaron Minsky @ 2003-08-07 21:49 UTC (permalink / raw) To: caml-list This might be too simplistic, but how about creating a union type the includes all of the different things you might want to index on, and then use that as the key to a hashtable. The efficiency of that would hinge on the efficiency of the hash function, I would think. I would expect it to be simple to implement and pretty quick. Yaron ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Caml-list] Multi-keyed lookup table? 2003-08-07 21:49 ` Yaron Minsky @ 2003-08-07 22:26 ` John Max Skaller [not found] ` <D4DBD8568F05D511A1C20002A55C008C11AC05E6@uswaumsx03medge.med.ge.com> 0 siblings, 1 reply; 23+ messages in thread From: John Max Skaller @ 2003-08-07 22:26 UTC (permalink / raw) To: Yaron Minsky; +Cc: caml-list Yaron Minsky wrote: > This might be too simplistic, but how about creating a union type the > includes all of the different things you might want to index on, and then > use that as the key to a hashtable. The efficiency of that would hinge on > the efficiency of the hash function, I would think. I would expect it to > be simple to implement and pretty quick. That's actually a fairly neat trick to embody multiple index tables in one data structure. The main problem is probably that hashtables aren't ordered, so there is no way of getting all the values of a particular 'index' quickly. A Map does not have this problem .. but then it doesn't hash .. where's that cake I've just eaten .. -- John Max Skaller, mailto:skaller@ozemail.com.au snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia. voice:61-2-9660-0850 ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <D4DBD8568F05D511A1C20002A55C008C11AC05E6@uswaumsx03medge.med.ge.com>]
* Re: [Caml-list] Multi-keyed lookup table? [not found] ` <D4DBD8568F05D511A1C20002A55C008C11AC05E6@uswaumsx03medge.med.ge.com> @ 2003-08-08 21:30 ` Matt Gushee 2003-08-08 22:13 ` Brian Hurt ` (2 more replies) 0 siblings, 3 replies; 23+ messages in thread From: Matt Gushee @ 2003-08-08 21:30 UTC (permalink / raw) To: caml-list Thanks for the very informative answers. Even if I end up not exactly following any of your suggestions, this thread has given me some very good information about data structures in general. On Thu, Aug 07, 2003 at 03:16:49PM -0500, Brian Hurt wrote: > > In the general case, this is hard. In this specific case, you might > consider just hard coding your levels. So you'd end up with a data > structure like: > > All font > / | \ > / | \ > medium bold light <-- pick the weight > / | \ > [ ..... ] Interesting idea, and not one that I had considered at all. It seems quite efficient, too. On the other hand, it looks like the fixed search path would make it rather hard to implement fallback rules in case an exact match isn't found. It seems to me that that's fairly important for fonts: there are many situations where it is preferable to produce some output with some fonts not quite right rather than producing nothing. Maybe it should be up to the application to handle that. But then if my library only implements an exact-match-or-nothing approach, you could easily end up an application that has to handle a whole lot of Not_found exceptions. I'd rather create a library that can do a closest-match kind of thing. On Thu, Aug 07, 2003 at 05:49:30PM -0400, Yaron Minsky wrote: > This might be too simplistic, but how about creating a union type the > includes all of the different things you might want to index on, and then > use that as the key to a hashtable. The efficiency of that would hinge on > the efficiency of the hash function, I would think. I would expect it to > be simple to implement and pretty quick. You mean something like: type font_spec = (font_family * font_weight * font_style * font_width * encoding) ? I thought of doing something like that, but then how would you handle something like: (_, Bold, Italic, _, _) (* Not intended to represent real code, just illustrating the concept *) ? I was actually thinking of doing a little bitmask voodoo, e.g. let candidates = List.filter (fun key font -> (key land pattern) == pattern) all_fonts (BTW, why isn't there an Array.filter function?) On Fri, Aug 08, 2003 at 08:19:36AM +1000, John Max Skaller wrote: > > Since you have 'arbitrary' keying, the ideal data structure > is clearly a relational database :-) Now why didn't I think of that? I may actually follow that semi-suggestion. Or rather, having thought through my idea a little more, it becomes clear that having a good runtime data structure is only part of the problem. A persistence mechanism is also important, and if we use, say, an RDBMS or DBM, it probably doesn't make much sense to layer an OCaml data structure on top of that. Or does it? Does caching have much value when the data being cached is a bunch of short strings? Anyway, I think what I'm going to do is--since what I have in mind is a pretty simple utility that does one thing--to whip together a library with several backends--Dbm, Postgres, etc.--and see how they work out. Again, thanks for the ideas. -- Matt Gushee When a nation follows the Way, Englewood, Colorado, USA Horses bear manure through mgushee@havenrock.com its fields; http://www.havenrock.com/ When a nation ignores the Way, Horses bear soldiers through its streets. --Lao Tzu (Peter Merel, trans.) ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Caml-list] Multi-keyed lookup table? 2003-08-08 21:30 ` Matt Gushee @ 2003-08-08 22:13 ` Brian Hurt [not found] ` <005d01c35e51$7c927200$6628f9c1@zofo> 2003-08-11 11:51 ` [Caml-list] Multi-keyed lookup table? Remi Vanicat 2 siblings, 0 replies; 23+ messages in thread From: Brian Hurt @ 2003-08-08 22:13 UTC (permalink / raw) To: Matt Gushee; +Cc: caml-list On Fri, 8 Aug 2003, Matt Gushee wrote: > On Thu, Aug 07, 2003 at 03:16:49PM -0500, Brian Hurt wrote: > > > > In the general case, this is hard. In this specific case, you might > > consider just hard coding your levels. So you'd end up with a data > > structure like: > > > > All font > > / | \ > > / | \ > > medium bold light <-- pick the weight > > / | \ > > [ ..... ] > > Interesting idea, and not one that I had considered at all. It seems > quite efficient, too. On the other hand, it looks like the fixed search > path would make it rather hard to implement fallback rules in case an > exact match isn't found. It seems to me that that's fairly important for > fonts: there are many situations where it is preferable to produce some > output with some fonts not quite right rather than producing nothing. You certainly can handle * rules by just concatenating results from the different children. You could also do something like: match weight with "medium" -> let t = get_style tree.medium ... in match t with _ :: _ -> (* we have at least one font *) t | [] -> (* try getting something in bold *) let t = get_style tree.bold in ... Brian ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <005d01c35e51$7c927200$6628f9c1@zofo>]
* Re: [Caml-list] Array.filter (was Multi-keyed lookup table?) [not found] ` <005d01c35e51$7c927200$6628f9c1@zofo> @ 2003-08-09 16:57 ` Matt Gushee 2003-08-09 18:48 ` ijtrotts 0 siblings, 1 reply; 23+ messages in thread From: Matt Gushee @ 2003-08-09 16:57 UTC (permalink / raw) To: caml-list On Sat, Aug 09, 2003 at 10:36:06AM +0200, Jean-Baptiste Rouquier wrote: > >BTW, why isn't there an Array.filter function? > There is a lot of such general purpose functions that one would like to have > already implemented. But having all of these in the standard library would > make it less readable. Sure. I just thought that lists and arrays are rather similar data structures, and to the extent that they are similar, their respective modules should have similar APIs. -- Matt Gushee When a nation follows the Way, Englewood, Colorado, USA Horses bear manure through mgushee@havenrock.com its fields; http://www.havenrock.com/ When a nation ignores the Way, Horses bear soldiers through its streets. --Lao Tzu (Peter Merel, trans.) ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Caml-list] Array.filter (was Multi-keyed lookup table?) 2003-08-09 16:57 ` [Caml-list] Array.filter (was Multi-keyed lookup table?) Matt Gushee @ 2003-08-09 18:48 ` ijtrotts 2003-08-10 19:53 ` Michal Moskal ` (2 more replies) 0 siblings, 3 replies; 23+ messages in thread From: ijtrotts @ 2003-08-09 18:48 UTC (permalink / raw) To: caml-list On Sat, Aug 09, 2003 at 10:57:20AM -0600, Matt Gushee wrote: > On Sat, Aug 09, 2003 at 10:36:06AM +0200, Jean-Baptiste Rouquier wrote: > > >BTW, why isn't there an Array.filter function? > > There is a lot of such general purpose functions that one would like to have > > already implemented. But having all of these in the standard library would > > make it less readable. > > Sure. I just thought that lists and arrays are rather similar data > structures, and to the extent that they are similar, their respective > modules should have similar APIs. It might help to create a file (mods.ml) with your favorite additions and modifications to the standard library. Then you can open Mods in any files where the modifications are needed. # module Array = struct include Array let filter f a = Array.init (Array.length a) (fun i -> f a.(i)) end;; # Array.filter (fun x -> x * x) [| 1;2;3 |];; - : int array = [|1; 4; 9|] Hope this helps, Issac -- Issac Trotts Programmer Center for Neuroscience University of California, Davis ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Caml-list] Array.filter (was Multi-keyed lookup table?) 2003-08-09 18:48 ` ijtrotts @ 2003-08-10 19:53 ` Michal Moskal 2003-08-10 2:34 ` ijtrotts ` (2 more replies) 2003-08-10 20:55 ` [Caml-list] Array.filter Jean-Baptiste Rouquier 2003-08-10 22:29 ` [Caml-list] Array.filter (was Multi-keyed lookup table?) Shawn Wagner 2 siblings, 3 replies; 23+ messages in thread From: Michal Moskal @ 2003-08-10 19:53 UTC (permalink / raw) To: caml-list On Sat, Aug 09, 2003 at 11:48:51AM -0700, ijtrotts@ucdavis.edu wrote: > # module Array = struct > include Array > let filter f a = Array.init (Array.length a) (fun i -> f a.(i)) > end;; > # Array.filter (fun x -> x * x) [| 1;2;3 |];; > - : int array = [|1; 4; 9|] This is map, not filter. Writing filter is more difficult, since you need to first make list of results and then put them into array (or use Obj.magic tricks). -- : Michal Moskal :: http://www.kernel.pl/~malekith : GCS {C,UL}++++$ a? !tv : When in doubt, use brute force. -- Ken Thompson : {E-,w}-- {b++,e}>+++ h ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Caml-list] Array.filter (was Multi-keyed lookup table?) 2003-08-10 19:53 ` Michal Moskal @ 2003-08-10 2:34 ` ijtrotts 2003-08-11 5:48 ` David Brown 2003-08-10 20:23 ` Marcin 'Qrczak' Kowalczyk [not found] ` <200308102222.16369.qrczak@knm.org.pl> 2 siblings, 1 reply; 23+ messages in thread From: ijtrotts @ 2003-08-10 2:34 UTC (permalink / raw) To: caml-list On Sun, Aug 10, 2003 at 09:53:07PM +0200, Michal Moskal wrote: > On Sat, Aug 09, 2003 at 11:48:51AM -0700, ijtrotts@ucdavis.edu wrote: > > # module Array = struct > > include Array > > let filter f a = Array.init (Array.length a) (fun i -> f a.(i)) > > end;; > > # Array.filter (fun x -> x * x) [| 1;2;3 |];; > > - : int array = [|1; 4; 9|] > > This is map, not filter. Writing filter is more difficult, since you > need to first make list of results and then put them into array (or use > Obj.magic tricks). Yikes, my bad. Here you go: # let filter f a = let n = ref 0 in for i = 0 to Array.length a - 1 do if f a.(i) then incr n done; let k = ref 0 in Array.init !n (fun i -> while not (f a.(!k)) do incr k done; incr k; a.(!k-1));; val filter : ('a -> bool) -> 'a array -> 'a array = <fun> # let odd = fun x -> x land 1 = 1;; val odd : int -> bool = <fun> # filter odd [| 1;2;3;4;5;6;7;8;9 |];; - : int array = [|1; 3; 5; 7; 9|] Luckily, you don't have to make a list or use Obj.magic. Issac -- Issac Trotts Programmer Center for Neuroscience University of California, Davis ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Caml-list] Array.filter (was Multi-keyed lookup table?) 2003-08-10 2:34 ` ijtrotts @ 2003-08-11 5:48 ` David Brown 2003-08-10 18:53 ` ijtrotts 0 siblings, 1 reply; 23+ messages in thread From: David Brown @ 2003-08-11 5:48 UTC (permalink / raw) To: Caml List; +Cc: ijtrotts On Sat, Aug 09, 2003 at 07:34:35PM -0700, ijtrotts@ucdavis.edu wrote: > # let filter f a = > let n = ref 0 in > for i = 0 to Array.length a - 1 do if f a.(i) then incr n done; > let k = ref 0 in > Array.init !n > (fun i -> while not (f a.(!k)) do incr k done; incr k; a.(!k-1));; Unfortunately, this calls f twice on each element. If that solution is not acceptable, then the results of f a.(i) could be cached in some type of bit array (perhaps in a string), and then that string iterated again. This function also poorly if f a.(i) isn't really functional, and returns different results. For example: List.filter (fun _ -> Random.bool) data will return roughly half of the items, randomly. This would cause array bounds problems about half the time with the above code. Dave Brown ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Caml-list] Array.filter (was Multi-keyed lookup table?) 2003-08-11 5:48 ` David Brown @ 2003-08-10 18:53 ` ijtrotts 0 siblings, 0 replies; 23+ messages in thread From: ijtrotts @ 2003-08-10 18:53 UTC (permalink / raw) To: caml-list On Sun, Aug 10, 2003 at 10:48:08PM -0700, David Brown wrote: > On Sat, Aug 09, 2003 at 07:34:35PM -0700, ijtrotts@ucdavis.edu wrote: > > > # let filter f a = > > let n = ref 0 in > > for i = 0 to Array.length a - 1 do if f a.(i) then incr n done; > > let k = ref 0 in > > Array.init !n > > (fun i -> while not (f a.(!k)) do incr k done; incr k; a.(!k-1));; > > Unfortunately, this calls f twice on each element. If that solution is > not acceptable, then the results of f a.(i) could be cached in some type > of bit array (perhaps in a string), and then that string iterated again. > > This function also poorly if f a.(i) isn't really functional, and > returns different results. For example: > > List.filter (fun _ -> Random.bool) data > > will return roughly half of the items, randomly. This would cause array > bounds problems about half the time with the above code. > > Dave Brown > Good idea. This would be more robust: let filter f arr = let n = ref 0 in let tmp = Array.map (fun x -> if f x then (incr n; 1) else 0) arr in let k = ref 0 in Array.init !n (fun i -> while tmp.(!k) <> 1 do incr k done; incr k; arr.(!k-1));; or... let filter2 f arr = let n = ref 0 in let tmp = String.create (Array.length arr) in for i = 0 to Array.length arr - 1 do tmp.[i] <- if f arr.(i) then (incr n; '1') else '0' done; let k = ref 0 in Array.init !n (fun i -> while tmp.[!k] <> '1' do incr k done; incr k; arr.(!k-1));; -- Issac Trotts Programmer Center for Neuroscience University of California, Davis ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Caml-list] Array.filter (was Multi-keyed lookup table?) 2003-08-10 19:53 ` Michal Moskal 2003-08-10 2:34 ` ijtrotts @ 2003-08-10 20:23 ` Marcin 'Qrczak' Kowalczyk 2003-08-10 2:37 ` ijtrotts [not found] ` <200308102222.16369.qrczak@knm.org.pl> 2 siblings, 1 reply; 23+ messages in thread From: Marcin 'Qrczak' Kowalczyk @ 2003-08-10 20:23 UTC (permalink / raw) To: caml-list Dnia nie 10. sierpnia 2003 21:53, Michal Moskal napisał: > This is map, not filter. Writing filter is more difficult, since you > need to first make list of results and then put them into array (or use > Obj.magic tricks). let filter pred arr = let sz = Array.length arr in if sz == 0 then arr else let rec loop i j = if i >= sz then Array.make j arr.(0) else let x = arr.(i) in if pred x then begin let result = loop (i + 1) (j + 1) in result.(j) <- x; result end else loop (i + 1) j in loop 0 0 Untested. Doesn't make a list on heap but uses O(result_length) stack. -- __("< Marcin Kowalczyk \__/ qrczak@knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/ ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Caml-list] Array.filter (was Multi-keyed lookup table?) 2003-08-10 20:23 ` Marcin 'Qrczak' Kowalczyk @ 2003-08-10 2:37 ` ijtrotts 0 siblings, 0 replies; 23+ messages in thread From: ijtrotts @ 2003-08-10 2:37 UTC (permalink / raw) To: caml-list On Sun, Aug 10, 2003 at 10:23:05PM +0200, Marcin 'Qrczak' Kowalczyk wrote: > Dnia nie 10. sierpnia 2003 21:53, Michal Moskal napisa?: > > > This is map, not filter. Writing filter is more difficult, since you > > need to first make list of results and then put them into array (or use > > Obj.magic tricks). > > let filter pred arr = > let sz = Array.length arr in > if sz == 0 then arr else > let rec loop i j = > if i >= sz then Array.make j arr.(0) else > let x = arr.(i) in > if pred x then begin > let result = loop (i + 1) (j + 1) in > result.(j) <- x; > result > end else loop (i + 1) j in > loop 0 0 > > Untested. Doesn't make a list on heap but uses O(result_length) stack. Mine is tested (on a small example), is shorter, and takes O(1) stack :o). Issac -- Issac Trotts Programmer Center for Neuroscience University of California, Davis ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <200308102222.16369.qrczak@knm.org.pl>]
* Re: [Caml-list] Array.filter (was Multi-keyed lookup table?) [not found] ` <200308102222.16369.qrczak@knm.org.pl> @ 2003-08-10 20:43 ` Michal Moskal 2003-08-10 21:59 ` Ville-Pertti Keinonen 0 siblings, 1 reply; 23+ messages in thread From: Michal Moskal @ 2003-08-10 20:43 UTC (permalink / raw) To: caml-list On Sun, Aug 10, 2003 at 10:22:16PM +0200, Marcin 'Qrczak' Kowalczyk wrote: > Dnia nie 10. sierpnia 2003 21:53, Michal Moskal napisał: > > > > # Array.filter (fun x -> x * x) [| 1;2;3 |];; > > > - : int array = [|1; 4; 9|] > > > > This is map, not filter. Writing filter is more difficult, since you > > need to first make list of results and then put them into array (or use > > Obj.magic tricks). > > let filter pred arr = > let sz = Array.length arr in > if sz == 0 then arr else > let rec loop i j = > if i >= sz then Array.make j arr.(0) else > let x = arr.(i) in > if pred x then begin > let result = loop (i + 1) (j + 1) in > result.(j) <- x; > result > end else loop (i + 1) j in > loop 0 0 > > Untested. Doesn't make a list on heap but uses O(result_length) stack. It will bomb for large arrays. List.filter will not (BTW I don't understand why List.filter uses List.rev to be tail recursive and List.map does not). Another thought would be to create bool array for predicate results, fill it, counting how much space do you need, and then create result array. There is also third way, to run predicate twice, but it is not guaranteed safe in presence of side effects. -- : Michal Moskal :: http://www.kernel.pl/~malekith : GCS {C,UL}++++$ a? !tv : When in doubt, use brute force. -- Ken Thompson : {E-,w}-- {b++,e}>+++ h ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Caml-list] Array.filter (was Multi-keyed lookup table?) 2003-08-10 20:43 ` Michal Moskal @ 2003-08-10 21:59 ` Ville-Pertti Keinonen 0 siblings, 0 replies; 23+ messages in thread From: Ville-Pertti Keinonen @ 2003-08-10 21:59 UTC (permalink / raw) To: caml-list On Sun, Aug 10, 2003 at 10:43:19PM +0200, Michal Moskal wrote: > It will bomb for large arrays. List.filter will not (BTW I don't > understand why List.filter uses List.rev to be tail recursive and List.map > does not). > > Another thought would be to create bool array for predicate results, fill > it, counting how much space do you need, and then create result array. > > There is also third way, to run predicate twice, but it is not guaranteed > safe in presence of side effects. How about just creating a result array the same size as the original (initially filled with item 0 of the original, special-cased for empty arrays) and Array.sub:ing it for the result? I'd think that would be the obvious method. Creating new instances of a value shouldn't have any noticable side-effects. Anyway, filter for arrays as well as strings is one of the more useful things that might be desirable as part of the standard library. Functions for mapping and folding strings might also be useful. While the OCaml standard library is pretty good, it seems fairly sparse compared to e.g. the functionality offered by Common Lisp sequence functions, which work on lists, arrays and strings. Implementing the necessary functionality is easy, but an advantage of having more things as part of the standard lirbary would be the reduction in LOC for OCaml solutions to programming tasks used to compare programming languages (e.g. Prechelt's study, although that didn't include OCaml). OCaml isn't otherwise much more verbose than Common Lisp (ignoring macros) or various scripting languages, but for array and string manipulation, it can end up looking much worse than it needs to. Plenty of people who don't know many languages seem to believe that static typing and/or explicit lexical scoping inevitably make languages verbose, and that dynamically typed scripting languages with idiosyncratic scoping rules and huge performance hits are the only way around that. ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Caml-list] Array.filter 2003-08-09 18:48 ` ijtrotts 2003-08-10 19:53 ` Michal Moskal @ 2003-08-10 20:55 ` Jean-Baptiste Rouquier 2003-08-11 9:46 ` Michal Moskal 2003-08-10 22:29 ` [Caml-list] Array.filter (was Multi-keyed lookup table?) Shawn Wagner 2 siblings, 1 reply; 23+ messages in thread From: Jean-Baptiste Rouquier @ 2003-08-10 20:55 UTC (permalink / raw) To: caml-list I didn't think I would have any answers ! I prefer reading "worl domination thread", even if I don't have time to help for the moment. Perhaps the really missing thing is one guy to coordinate the work. I have a wish along #unload : the symetric of open, even if the [open] scope is the current struct. So we can [open] a module in the main file, and [close] it in the middle of the same file. > It might help to create a file (mods.ml) with your favorite additions >and modifications to the standard library. I called it Complements in my precedent mail. let filter test a = let result = (Array.fold_left (fun accu elt -> if test elt then elt :: accu else accu) [] a) in Array.of_list (List.rev result) # let _ = filter (fun x -> x mod 2 = 0) [|0; 1; 2; 3; 4; 5|];; - : int array = [|0; 2; 4|] Constant stack but O(n) heap. Jean-Baptiste Rouquier ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Caml-list] Array.filter 2003-08-10 20:55 ` [Caml-list] Array.filter Jean-Baptiste Rouquier @ 2003-08-11 9:46 ` Michal Moskal 0 siblings, 0 replies; 23+ messages in thread From: Michal Moskal @ 2003-08-11 9:46 UTC (permalink / raw) To: caml-list On Sun, Aug 10, 2003 at 10:55:36PM +0200, Jean-Baptiste Rouquier wrote: > let filter test a = > let result = (Array.fold_left > (fun accu elt -> > if test elt then elt :: accu else accu) > [] a) in > Array.of_list (List.rev result) I did some timing for several versions. 1% elements selected: 0 0.26s real 0.21s user 0.01s system 1 0.46s real 0.37s user 0.01s system 2 1.18s real 0.85s user 0.06s system 3 0.92s real 0.66s user 0.10s system 4 0.57s real 0.43s user 0.03s system 5 0.52s real 0.44s user 0.00s system 1/2 elements selected: 0 0.29s real 0.19s user 0.03s system 1 2.53s real 2.04s user 0.10s system 2 1.49s real 1.05s user 0.11s system 3 1.31s real 0.98s user 0.12s system 4 8.55s real 6.87s user 0.10s system 5 1.48s real 1.15s user 0.05s system 0. just create random array, substract from subseqent results 1. version posted by Qrczak, allocates space on the heap 2. allocate bool array for predicate results 3. allocate array, and then sub it 4. your version 5. allocate first 32 elemnts for results, when it's filled, store array on list, allocate 64 and so on. At the end concat resulting arrays. 5. version doesn't seem to have worst case (it's slower then fastest versions in given situation, but not by much). #v+ let filter1 pred arr = let sz = Array.length arr in if sz == 0 then arr else let rec loop i j = if i >= sz then Array.make j arr.(0) else let x = arr.(i) in if pred x then begin let result = loop (i + 1) (j + 1) in result.(j) <- x; result end else loop (i + 1) j in loop 0 0 let filter2 f ar = let sz = Array.length ar in if sz == 0 then ar else let tmp = Array.create sz false in let rec loop len i = if i >= sz then len else if f ar.(i) then (tmp.(i) <- true; loop (len + 1) (i + 1)) else loop len (i + 1) in let len = loop 0 0 in let res = Array.create len ar.(0) in let rec loop len i = if i >= sz then res else if tmp.(i) then (res.(len) <- ar.(i); loop (len + 1) (i + 1)) else loop len (i + 1) in loop 0 0 let filter3 f ar = let sz = Array.length ar in if sz == 0 then ar else let tmp = Array.create sz ar.(0) in let rec loop len i = if i >= sz then Array.sub tmp 0 len else if f ar.(i) then (tmp.(len) <- ar.(i); loop (len + 1) (i + 1)) else loop len (i + 1) in loop 0 0 let filter4 test a = let result = (Array.fold_left (fun accu elt -> if test elt then elt :: accu else accu) [] a) in Array.of_list (List.rev result) let filter5 f ar = let sz = Array.length ar in if sz == 0 then ar else let rec loop acc cur i pos = if i >= sz then if pos == 0 then Array.concat (List.rev acc) else let cut = Array.sub cur 0 pos in Array.concat (List.rev (cut :: acc)) else if pos >= Array.length cur then loop (cur :: acc) (Array.make (Array.length cur * 2) ar.(0)) i 0 else if f ar.(i) then (cur.(pos) <- ar.(i); loop acc cur (i + 1) (pos + 1)) else loop acc cur (i + 1) pos in loop [] (Array.make 32 ar.(0)) 0 0 let main () = let ar = Array.init 1000000 (fun _ -> Random.int 100) in let test f = for i = 1 to 10 do Printf.printf "%d\n" (Array.length (f (fun x -> x > 50) ar)) done in match Sys.argv.(1) with | "0" -> () | "1" -> test filter1 | "2" -> test filter2 | "3" -> test filter3 | "4" -> test filter4 | "5" -> test filter5 | _ -> assert false ;; main () #v- -- : Michal Moskal :: http://www.kernel.pl/~malekith : GCS {C,UL}++++$ a? !tv : When in doubt, use brute force. -- Ken Thompson : {E-,w}-- {b++,e}>+++ h ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Caml-list] Array.filter (was Multi-keyed lookup table?) 2003-08-09 18:48 ` ijtrotts 2003-08-10 19:53 ` Michal Moskal 2003-08-10 20:55 ` [Caml-list] Array.filter Jean-Baptiste Rouquier @ 2003-08-10 22:29 ` Shawn Wagner 2 siblings, 0 replies; 23+ messages in thread From: Shawn Wagner @ 2003-08-10 22:29 UTC (permalink / raw) To: caml-list On Sat, Aug 09, 2003 at 11:48:51AM -0700, ijtrotts@ucdavis.edu wrote: > On Sat, Aug 09, 2003 at 10:57:20AM -0600, Matt Gushee wrote: > > On Sat, Aug 09, 2003 at 10:36:06AM +0200, Jean-Baptiste Rouquier wrote: > > > >BTW, why isn't there an Array.filter function? > > > There is a lot of such general purpose functions that one would like to have > > > already implemented. But having all of these in the standard library would > > > make it less readable. > > > > Sure. I just thought that lists and arrays are rather similar data > > structures, and to the extent that they are similar, their respective > > modules should have similar APIs. > > It might help to create a file (mods.ml) with your favorite additions > and modifications to the standard library. Then you can open Mods > in any files where the modifications are needed. > Shameless plug: I have a library that's little more than this. Simple, basic things missing from the standard libraries that I keep running across a need for. Other people no doubt do the same, as there is a lot missing from the standard library (Even such fundamental things as searching for a substring a la C's strstr()!) http://raevnos.pennmush.org/code/ocaml.html#extlib -- Shawn Wagner shawnw@speakeasy.org ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Caml-list] Multi-keyed lookup table? 2003-08-08 21:30 ` Matt Gushee 2003-08-08 22:13 ` Brian Hurt [not found] ` <005d01c35e51$7c927200$6628f9c1@zofo> @ 2003-08-11 11:51 ` Remi Vanicat 2 siblings, 0 replies; 23+ messages in thread From: Remi Vanicat @ 2003-08-11 11:51 UTC (permalink / raw) To: caml-list Matt Gushee <matt@gushee.net> writes: > On Thu, Aug 07, 2003 at 05:49:30PM -0400, Yaron Minsky wrote: >> This might be too simplistic, but how about creating a union type the >> includes all of the different things you might want to index on, and then >> use that as the key to a hashtable. The efficiency of that would hinge on >> the efficiency of the hash function, I would think. I would expect it to >> be simple to implement and pretty quick. > > You mean something like: > > type font_spec = (font_family * font_weight * font_style * font_width > * encoding) I believe he mean : type font_spec = | Font_family of font_family | Font_weight of font_weight | Font_style of font_style | Font_width of font_width | Encodinf of encoding then each font is put into the hashtable several time (one for each characteristic) -- Rémi Vanicat vanicat@labri.u-bordeaux.fr http://dept-info.labri.u-bordeaux.fr/~vanicat ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Caml-list] Multi-keyed lookup table? 2003-08-07 19:41 [Caml-list] Multi-keyed lookup table? Matt Gushee 2003-08-07 20:16 ` Brian Hurt @ 2003-08-07 22:19 ` John Max Skaller 2003-08-12 6:34 ` Florian Hars 1 sibling, 1 reply; 23+ messages in thread From: John Max Skaller @ 2003-08-07 22:19 UTC (permalink / raw) To: Matt Gushee; +Cc: caml-list Matt Gushee wrote: > Hello, all-- > > I am trying to decide on a data structure that allows efficient lookup > of fonts according to various font properties. I am thinking that the > data structures describing fonts will be something like this: > So, does anyone have an idea what sort of data structure would work > best? > > TIA for your suggestions. Since you have 'arbitrary' keying, the ideal data structure is clearly a relational database :-) I personally suggest the brain dead equivalent, assuming you have a *finite* set of fonts available: an array of all the fonts in random order, and just use a linear search. It's likely for a small number of fonts (<1000) that this is faster than any other method due to caching issues: arrays outperform ALL other data structures for small number of entries. You might optimise this in two ways: cache some results, on the basis the same request is made many times in one program. [possibly making the cache persistent] And you might be able to index the most commonly used keys, such as font-family...much like you'd do in a database. You might gain some advantage in the comparisons for equality using the == operator (physical equality) provided you can ensure equal values have the same physical representation (eg: let helvetica = "Helvetica" [.. helvetica .. ] [.. helvetica .. ] which would mean you'd have to match the incoming request against the available font-families: let bff = match ff with | "Helvetica" -> helvetica .. so you can do the comparisons like if bff == font.(i).ff ...... This is an address comparison, and is faster than a string comparison. There's a good chance the comparisons will dominate, rather than the time taken to read each entry from the database. That means an index would give you an order of magnitude speed improvement .. in theory .. but its hard to know which keys to index. I think I'd be building a model with no indexes, encapsulating it so that you can add indexes transparently later. Then profile/performance test to see where the bottlenecks are... I suspect it is client dependent (some clients will key by font-family, other may choose font class ...) -- John Max Skaller, mailto:skaller@ozemail.com.au snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia. voice:61-2-9660-0850 ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Caml-list] Multi-keyed lookup table? 2003-08-07 22:19 ` John Max Skaller @ 2003-08-12 6:34 ` Florian Hars 2003-08-12 9:58 ` Michael Wohlwend 0 siblings, 1 reply; 23+ messages in thread From: Florian Hars @ 2003-08-12 6:34 UTC (permalink / raw) To: caml-list John Max Skaller wrote: > It's likely for a small number of fonts (<1000) that this > is faster than any other method due to caching issues: xfonsel says something like "10064 names match" on my workstation. Yours, Florian. ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Caml-list] Multi-keyed lookup table? 2003-08-12 6:34 ` Florian Hars @ 2003-08-12 9:58 ` Michael Wohlwend 0 siblings, 0 replies; 23+ messages in thread From: Michael Wohlwend @ 2003-08-12 9:58 UTC (permalink / raw) To: caml-list how about using buddy-trees or tv-trees for that? It would be surely the most oversized solution (if it works) - but with nice data structures :-) maybe more interresting than the rest of the program... Michael tv-trees: http://citeseer.nj.nec.com/rd/43499374%2C17891%2C1%2C0.25%2CDownload/http://citeseer.nj.nec.com/cache/papers/cs/1096/http:zSzzSzwww.cs.umd.eduzSz~kilinzSztvtree.pdf/lin94tvtree.pdf buddy-trees: http://www.vldb.org/conf/1990/P590.PDF ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2003-08-12 9:58 UTC | newest] Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2003-08-07 19:41 [Caml-list] Multi-keyed lookup table? Matt Gushee 2003-08-07 20:16 ` Brian Hurt 2003-08-07 21:49 ` Yaron Minsky 2003-08-07 22:26 ` John Max Skaller [not found] ` <D4DBD8568F05D511A1C20002A55C008C11AC05E6@uswaumsx03medge.med.ge.com> 2003-08-08 21:30 ` Matt Gushee 2003-08-08 22:13 ` Brian Hurt [not found] ` <005d01c35e51$7c927200$6628f9c1@zofo> 2003-08-09 16:57 ` [Caml-list] Array.filter (was Multi-keyed lookup table?) Matt Gushee 2003-08-09 18:48 ` ijtrotts 2003-08-10 19:53 ` Michal Moskal 2003-08-10 2:34 ` ijtrotts 2003-08-11 5:48 ` David Brown 2003-08-10 18:53 ` ijtrotts 2003-08-10 20:23 ` Marcin 'Qrczak' Kowalczyk 2003-08-10 2:37 ` ijtrotts [not found] ` <200308102222.16369.qrczak@knm.org.pl> 2003-08-10 20:43 ` Michal Moskal 2003-08-10 21:59 ` Ville-Pertti Keinonen 2003-08-10 20:55 ` [Caml-list] Array.filter Jean-Baptiste Rouquier 2003-08-11 9:46 ` Michal Moskal 2003-08-10 22:29 ` [Caml-list] Array.filter (was Multi-keyed lookup table?) Shawn Wagner 2003-08-11 11:51 ` [Caml-list] Multi-keyed lookup table? Remi Vanicat 2003-08-07 22:19 ` John Max Skaller 2003-08-12 6:34 ` Florian Hars 2003-08-12 9:58 ` Michael Wohlwend
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox