From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=AWL autolearn=disabled version=3.1.3 Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by yquem.inria.fr (Postfix) with ESMTP id 8B629BC6B for ; Thu, 29 Nov 2007 20:06:51 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ao8CAO+bTkfUnw6Dnmdsb2JhbACCOY0GAgEBBwQGKQ X-IronPort-AV: E=Sophos;i="4.23,230,1194217200"; d="scan'208";a="6321769" Received: from pih-relay04.plus.net ([212.159.14.131]) by mail3-smtp-sop.national.inria.fr with ESMTP; 29 Nov 2007 20:06:51 +0100 Received: from [80.229.56.224] (helo=beast.local) by pih-relay04.plus.net with esmtp (Exim) id 1Ixoip-0003EX-Fu; Thu, 29 Nov 2007 19:06:51 +0000 From: Jon Harrop Organization: Flying Frog Consultancy Ltd. To: Jean-Christophe =?iso-8859-1?q?Filli=E2tre?= Subject: Re: [Caml-list] Help with simple ocaml memoization problem Date: Thu, 29 Nov 2007 18:57:43 +0000 User-Agent: KMail/1.9.5 References: <200711290811.29007.jon@ffconsultancy.com> <474E7F2B.6090007@lri.fr> In-Reply-To: <474E7F2B.6090007@lri.fr> Cc: caml-list@yquem.inria.fr MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Message-Id: <200711291857.43917.jon@ffconsultancy.com> X-Spam: no; 0.00; ocaml:01 memoization:01 filliatre:01 ocaml:01 stdlib:01 hash:01 hash:01 red-black:01 stdlib:01 integers:01 unrolling:01 node:01 nodes:01 30%.:98 height:98 On Thursday 29 November 2007 08:58, Jean-Christophe Filli=E2tre wrote: > Jon Harrop wrote: > > The Map implementation in the OCaml stdlib is also quite inefficient. I > > did a little benchmark once and discovered that Maps actually waste more > > space than Hashtbls. > > I find it unfair to compare an imperative and a persistence data > structure for performances. I agree. > Of course you are going to use some extra=20 > space if you need to keep old versions of the data stuctures valid. > But you are sharing *a lot* among the various versions. So if you are > manipulating several sets/maps with common ancestors at the same time, > you are saving memory w.r.t. other data structures such as hash tables. True, my benchmark was a drop-in replacement with no sharing. > Of course, if you are using a single data structure, in a linear way, > then yes a hash table is probably more efficient (provided you have a > good hash function, which is not always easy to achieve). > > Regarding implementation of ocaml maps, I wouldn't say that it is > inefficient: I did my own benchmarls (on sets, but this is the same > code) and found that ocaml AVLs are really efficient, on the contrary. > It usually beats other implementations (e.g. red-black trees from the > SML stdlib), or even specialized structures such as Patricia trees (when > keys are integers) on some operations. I found that by manually unrolling with a Node1 constructor for single-elem= ent=20 nodes you can reduce GC load and increase performance by ~30%. Perhaps "badly optimized" would have been a better phrase. For example, the= =20 Map implementation in the OCaml stdlib manually inlines the height function= =20 even thought it makes relatively little difference to performance: GC load = is=20 the real killer for most immutable data structures. =2D-=20 Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?e