From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id A5E35D565 for ; Wed, 27 Jul 2005 18:04:23 +0200 (CEST) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id j6RG4Ntb012166 for ; Wed, 27 Jul 2005 18:04:23 +0200 Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id SAA24901 for ; Wed, 27 Jul 2005 18:04:22 +0200 (MET DST) Received: from wetware.wetware.com (wetware.wetware.com [209.218.58.1]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id j6RG4L9s012158 for ; Wed, 27 Jul 2005 18:04:22 +0200 Received: from [69.12.155.90] (helo=[10.0.1.5]) by wetware.wetware.com with esmtp (Exim 4.43) id 1DxoOK-0003Mn-L6 for caml-list@inria.fr; Wed, 27 Jul 2005 09:04:20 -0700 Mime-Version: 1.0 (Apple Message framework v733) In-Reply-To: <200507271012.02020.jon@ffconsultancy.com> References: <200507271012.02020.jon@ffconsultancy.com> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: <133225B6-7E0E-4FAE-9EFA-4AE9650BDD3E@wetware.com> Content-Transfer-Encoding: 7bit From: james woodyatt Subject: Re: [Caml-list] Set union/inter/diff efficiency Date: Wed, 27 Jul 2005 09:04:19 -0700 To: Ocaml Trade X-Mailer: Apple Mail (2.733) X-Miltered: at concorde with ID 42E7B087.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Miltered: at concorde with ID 42E7B085.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; woodyatt:01 jhw:01 wetware:01 caml-list:01 implements:01 ocaml:01 heap:01 heap:01 seq:01 seq:01 sequences:01 intersection:01 woodyatt:01 jhw:01 wetware:01 X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.3 On 27 Jul 2005, at 02:12, Jon Harrop wrote: > > Does anyone have any ideas or references on how the union/inter/ > diff functions > of the Set module could be optimised by accepting a sequence of > sets rather > than a pair at a time? For example, if A overlaps B overlaps C but > A does not > overlap C then it is probably quicker to compute the union "(A U C) > U B" > rather than "A U B U C". > > Better still, does anyone have a replacement Set module which > implements this > functionality? No, but you could maybe make an extension more easily using my OCaml NAE core foundation library. Here is the pseudo-code for set union that I would try: Make a heap of sets [Cf_heap.of_seq]. Map into a sequence of sets [Cf_heap.to_seq]. Map into a sequence of element sequences [Cf_seq.map, Cf_set.to_seq_incr]. Load into a queue. While queue is not empty, Take an element sequence from the queue. Take an element from the head of the sequence. If there is no output yet, or the element is greater than current output, then Output the element If the element sequence tail is not empty, then Push the element sequence tail onto the queue End while You could do similar things for difference and intersection. I'm not optimistic that this will actually improve performance. Beating the implementation in the standard library is tricky and harder than one might think. -- j h woodyatt markets are only free to the people who own them.