From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id p8KADMjU021784 for ; Tue, 20 Sep 2011 12:13:22 +0200 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AhkBAPVleE7U436rimdsb2JhbABChFmjARQBAQEKCQ0HEgUjgVMBAQUjVhALDgwCJgICVwYTCYdwBKI8kiCBLIRAgREEjCeMQowT X-IronPort-AV: E=Sophos;i="4.68,410,1312149600"; d="scan'208";a="109709888" Received: from moutng.kundenserver.de ([212.227.126.171]) by mail4-smtp-sop.national.inria.fr with ESMTP; 20 Sep 2011 12:13:16 +0200 Received: from office1.lan.sumadev.de (dslb-084-059-072-098.pools.arcor-ip.net [84.59.72.98]) by mrelayeu.kundenserver.de (node=mreu4) with ESMTP (Nemesis) id 0MeptE-1QhiN23m2q-00OHIy; Tue, 20 Sep 2011 12:13:09 +0200 Received: from [192.168.5.106] (dslb-084-059-072-098.pools.arcor-ip.net [84.59.72.98]) by office1.lan.sumadev.de (Postfix) with ESMTPSA id 98E23C00C7; Tue, 20 Sep 2011 12:13:08 +0200 (CEST) From: Gerd Stolpmann To: Ian Zimmerman Cc: caml-list@inria.fr In-Reply-To: <87sjnsl2f0.fsf@foolinux.dyndns.org> References: <87sjnsl2f0.fsf@foolinux.dyndns.org> Content-Type: text/plain; charset="UTF-8" Date: Tue, 20 Sep 2011 12:13:07 +0200 Message-ID: <1316513587.16477.11.camel@thinkpad> Mime-Version: 1.0 X-Mailer: Evolution 2.28.3 Content-Transfer-Encoding: 7bit X-Provags-ID: V02:K0:mErusM53rOCf+XcIQmspUUCzkzL5aGUyfOF7Sofq9y9 bYxeukGx7ti9dzS8FLSvo4xwv3+J3q2n1UGU6ay+t+uQ8Vtp4K 67Bft8ISG1DPGMOygmvRLZLVw6t106hZeJPL6odl847Be3TRWF AJBU+F8KZziI9BwgwJv/AdUY4mldV7j7ZtUV+5qvcYQdDGMRLs x7gFzGFEfUnJAc4/EJVPSNmPnH+RL7bUcCGk3oFT1nked4/b3l Ovz/6tB0OGQXvfIsBbuM7+Xm4b54RyONWUg7IhbczUUPhkcZWA VUL2gS09L+B8iYvAoee3d7b0ZYbcZJH60HgatgSCjizSBGATPu 4Rk9G9erL120n2SbwolRG49IeVYwqVxyHCZXOfxGb Subject: Re: [Caml-list] Hashtbl or Set? Am Montag, den 19.09.2011, 13:09 -0700 schrieb Ian Zimmerman: > I need a somewhat large (thousands) set of strings, created once during > startup and never modified after. What is a better choice, a (string, > unit) Hashtbl.t or the Set module? If the Set module still uses trees > as it did when I was young :-), access will be logarithmic versus > constant for Hashtbl. But on the other hand a hash function must > examine all of every string while the comparison of 2 strings stops at > the first nonmatching character. > > I am thinking about the time to build the set as well as the probing > time. Nothing has changed so far. Besides naive Hashtbl and Set you have at least two more options: Hashtbl with self-defined hash function (i.e. one that stops after the n-th char), and Set with a hash-based comparison function (i.e. compare first the hashes of the strings, and only if the hashes are equal, compare the strings). In my experience, for a large constant set a Hashtbl usually works best. Set is usually only better when you can profit from functional updates. Gerd > > -- > Ian Zimmerman > gpg public key: 1024D/C6FF61AD > fingerprint: 66DC D68F 5C1B 4D71 2EE5 BD03 8A00 786C C6FF 61AD > Rule 420: All persons more than eight miles high to leave the court. > -- ------------------------------------------------------------ Gerd Stolpmann, Darmstadt, Germany gerd@gerd-stolpmann.de Creator of GODI and camlcity.org. Contact details: http://www.camlcity.org/contact.html Company homepage: http://www.gerd-stolpmann.de *** Searching for new projects! Need consulting for system *** programming in Ocaml? Gerd Stolpmann can help you. ------------------------------------------------------------