From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=AWL autolearn=disabled version=3.1.3 Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by yquem.inria.fr (Postfix) with ESMTP id 4941FBBC1 for ; Wed, 5 Mar 2008 23:02:40 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AgAAACanzkfUGyodi2dsb2JhbACQeQEBAQgEBgkICAkHmkM X-IronPort-AV: E=Sophos;i="4.25,452,1199660400"; d="scan'208";a="23410363" Received: from smtp3-g19.free.fr ([212.27.42.29]) by mail4-smtp-sop.national.inria.fr with ESMTP; 05 Mar 2008 23:02:40 +0100 Received: from smtp3-g19.free.fr (localhost.localdomain [127.0.0.1]) by smtp3-g19.free.fr (Postfix) with ESMTP id 6F45317B57D; Wed, 5 Mar 2008 23:02:39 +0100 (CET) Received: from [192.168.0.3] (rke75-3-82-229-183-156.fbx.proxad.net [82.229.183.156]) by smtp3-g19.free.fr (Postfix) with ESMTP id 41ADC17B568; Wed, 5 Mar 2008 23:02:39 +0100 (CET) Message-ID: <47CF1726.4090507@frisch.fr> Date: Wed, 05 Mar 2008 22:56:54 +0100 From: Alain Frisch User-Agent: Thunderbird 2.0.0.12 (Windows/20080213) MIME-Version: 1.0 To: Jon Harrop Cc: caml-list@yquem.inria.fr Subject: Re: [Caml-list] Canonical Set/Map datastructure? References: <47CECF23.1020508@exalead.com> <47CED80A.1010504@frisch.fr> <200803052003.46517.jon@ffconsultancy.com> In-Reply-To: <200803052003.46517.jon@ffconsultancy.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam: no; 0.00; frisch:01 frisch:01 ocaml's:01 hash-consing:01 memoization:01 hash:01 hash-consing:01 marshaling:01 invariants:01 equality:01 equality:01 wrote:01 wrote:01 caml-list:01 strings:01 Jon Harrop wrote: > On Wednesday 05 March 2008 17:27:38 Alain Frisch wrote: >> Patricia trees work fine when the set elements can easily be represented >> as strings of bits. >> ... >> structural equality = physical equality = set equality > > This is very interesting. What are the disadvantages? I'm guessing: slow value > creation for small values and heavy memory consumption. I haven't done any serious benchmark comparing Patricia trees and OCaml's Set module. For the situations where we're using hash-consed Patricia trees, it is simply not an option to use balanced trees (that would turn almost linear algorithms into quadratic-or-more ones). Of course, hash-consing (and memoization of set operations) means extra tables and lookups, so there will be some overhead. Last time I checked, it was much more efficient to use regular hash tables rather than weak ones, but this might have changed since then. Another disadvantage is that hash-consing usually does not play well with marshaling (e.g. you need to manually traverse unmarshaled values to restore invariants). Alain