From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=AWL autolearn=disabled version=3.1.3 Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by yquem.inria.fr (Postfix) with ESMTP id 4FCAFBC6C for ; Wed, 14 Nov 2007 13:55:41 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AgAAAC9/OkfAXQImh2dsb2JhbACPAAEICik X-IronPort-AV: E=Sophos;i="4.21,416,1188770400"; d="scan'208";a="19278833" Received: from discorde.inria.fr ([192.93.2.38]) by mail4-smtp-sop.national.inria.fr with ESMTP; 14 Nov 2007 13:55:40 +0100 Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id lAECtHP9031396 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=OK) for ; Wed, 14 Nov 2007 13:55:40 +0100 X-IronPort-AV: E=Sophos;i="4.21,416,1188770400"; d="scan'208";a="4227534" Received: from estephe.inria.fr (HELO [128.93.11.95]) ([128.93.11.95]) by mail2-relais-roc.national.inria.fr with ESMTP; 14 Nov 2007 13:55:40 +0100 Message-ID: <473AF04C.7030107@inria.fr> Date: Wed, 14 Nov 2007 13:55:40 +0100 From: Xavier Leroy User-Agent: Thunderbird 1.5.0.7 (X11/20060915) MIME-Version: 1.0 To: Vladimir Shabanov Cc: caml-list@inria.fr Subject: Re: [Caml-list] OCaml runtime using too much memory in 64-bit Linux References: <4731F5D1.2070405@janestcapital.com> <1194459622.15728.94.camel@localhost.localdomain> <47320E10.1050307@janestcapital.com> <200711140520.46016.romain.beauxis@gmail.com> <8ef825670711140403m464c01detff5537abb2211946@mail.gmail.com> In-Reply-To: <8ef825670711140403m464c01detff5537abb2211946@mail.gmail.com> X-Enigmail-Version: 0.94.0.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Miltered: at discorde with ID 473AF035.001 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; ocaml:01 runtime:01 run-time:01 malloc:01 allocating:01 distros:01 hash:01 ocaml's:01 pointers:01 ocaml:01 equality:01 polymorphic:01 polymorphic:01 heap:01 heap:01 Hello, Concerning this issue with large page tables on 64-bit architectures, I opened a problem report on the bug-tracking system to help gather more information. I'd like to ask all members of this list that reported the problem to kindly visit http://caml.inria.fr/mantis/view.php?id=4448 and add the required information as a note. That will help pinpointing the problem. Some more explanation on what's going on. The Caml run-time system needs to keep track of which memory areas belong to the major heap, and uses a page table for this purpose, with a dense representation (an array of bytes). If the major heap areas are closely spaced, this table is very small compared with the size of the heap itself. However, if these areas are widely spaced in the addressing space, the table can get big. For 32-bit platforms, this isn't much of a problem since the maximum size of the page table is 1 megabytes. For 64-bit platforms, the sky is the limit, however. So far, the only 64-bit platform where this has been a problem in the past is Linux with glibc, where blocks allocated by malloc() can come either from sbrk() or mmap(), two areas that are spaced several *exa*bytes apart. We worked around the problem by allocating all major heap areas directly with mmap(), obtaining closely spaced addresses. Apparently, this trick is no longer working on some systems, but I need to understand better which ones exactly. (I suspect some Linux distros that applied address randomization patches to the stock Linux kernel.) So, please provide feedback in the BTS. If the problem is confirmed, there are several ways to go about it. One is to implement the page table with a sparse data structure, e.g. a hash table. However, the major GC and some primitives like polymorphic equality perform *lots* of page table lookups, so a performance hit is to be expected. The other is to revise OCaml's data representations so that the GC and polymorphic primitives no longer need to know which pointers fall in the major heap. This seems possible in principle, but will take quite a bit of work and break a lot of badly written C/OCaml interface code. You've been warned :-) - Xavier Leroy