From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id QAA02080; Mon, 19 Nov 2001 16:34:59 +0100 (MET) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id QAA08631 for ; Mon, 19 Nov 2001 16:34:58 +0100 (MET) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.11.1/8.10.0) with ESMTP id fAJFYdb27333; Mon, 19 Nov 2001 16:34:39 +0100 (MET) Received: (from xleroy@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id QAA08106; Mon, 19 Nov 2001 16:34:39 +0100 (MET) Date: Mon, 19 Nov 2001 16:34:39 +0100 From: Xavier Leroy To: "Marcin 'Qrczak' Kowalczyk" Cc: caml-list@inria.fr Subject: Re: [Caml-list] Hashtbl and max_len Message-ID: <20011119163439.C7102@pauillac.inria.fr> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: ; from qrczak@knm.org.pl on Sun, Nov 18, 2001 at 02:55:17PM +0000 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk > The initial_size argument of Hashtbl.create has much bigger impact on > efficiency than I thought. > [...] > IMHO max_len shouldn't grow that much. Why does it grow at all when > the current size is not near Sys.max_array_length? The original strategy was indeed to bound the length of buckets by a constant, and resize when a bucket exceeds this length. However, this results in excessive resizing when the hashing causes collisions, or (worse!) the same key is repeatedly inserted in the hash table. Hence, as a stopgap, we decided to increase the maximal bucket length at each resizing. There are two problems with this strategy: - as you noticed, this causes the initial size to impact performance significantly; - checking for long buckets becomes linear in the size of the table, resulting in quadratic behavior for long sequences of insertions. The latter problem was the last nail in the coffin of this implementation. In the working sources, we now keep track of the size (total number of elements inserted in the table) and base resizing decisions on this. In case the hashing is good (all buckets have approximately the same length), this results in constant length for buckets. If the hashing is not good, the performances degrade more gracefully than with the old scheme. - Xavier Leroy PS. Thanks for pointing out the typos in the working implementation. The dangers of cut-and-paste programming... ------------------- Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr