From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by yquem.inria.fr (Postfix) with ESMTP id 4C029BC37 for ; Sun, 3 Jan 2010 00:16:52 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AsAAAH5iP0vUnwdjjmdsb2JhbACbRQEBAQEJCwgJEQa1L4QxBA X-IronPort-AV: E=Sophos;i="4.47,490,1257116400"; d="scan'208";a="40636043" Received: from relay.pcl-ipout01.plus.net ([212.159.7.99]) by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/RC4-SHA; 03 Jan 2010 00:16:52 +0100 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApoEAENiP0vUnw4U/2dsb2JhbADRNIQxBA Received: from pih-relay08.plus.net ([212.159.14.20]) by relay.pcl-ipout01.plus.net with ESMTP; 02 Jan 2010 23:16:51 +0000 Received: from [87.114.35.173] (helo=leper.local) by pih-relay08.plus.net with esmtp (Exim) id 1NRDDH-0006OQ-7a for caml-list@yquem.inria.fr; Sat, 02 Jan 2010 23:16:51 +0000 From: Jon Harrop Organization: Flying Frog Consultancy Ltd. To: caml-list Subject: Prevalence of duplicate references in the heap Date: Sun, 3 Jan 2010 00:31:18 +0000 User-Agent: KMail/1.9.9 MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <201001030031.18944.jon@ffconsultancy.com> X-Plusnet-Relay: 5def3fc768d6f924b92b8754395c57c1 X-Spam: no; 0.00; struct:01 run-time:01 pointer:01 pointer:01 hash:01 ocaml:01 hash:01 ocaml:01 frog:98 heap:01 heap:01 data:02 data:02 represented:02 idiomatic:02 I took an odd design decision with HLVM and made references a struct of run-time type, metadata (e.g. array length), pointer to mark state and pointer to data. So every reference consumes 4x32=128 bits rather than the usual 32 bits but heap-allocated values no longer require a header. My performance results really surprised me. For example, the "gc" benchmark in the HLVM test suite fills a hash table that is represented by an array spine containing references to array buckets. Despite having (fat) references everywhere, HLVM is 2.2x faster than OCaml on x86. The main disadvantage of HLVM's approach is probably that every duplicate reference now duplicates the header information, wasting 96 bits. However, I do not believe references are duplicated in the heap very often. Both trees and hash tables contain many references but none are duplicated. So I'm wondering if anyone has studied the makeup of heaps in idiomatic OCaml code and could point me to data on the proportion of the heap typically consumed by duplicate references, i.e. how much space is HLVM likely to waste? Many thanks, -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e