From: Warren Harris
To: Peter Hawkins
Cc: OCaml
Subject: Re: [Caml-list] gc overhead
Date: Tue, 2 Mar 2010 23:06:32 -0800

Peter,

Thanks, this is excellent info. I've been using both gprof and shark and
understand the trade-offs. What I was really looking for is a way to provide
a simple live "GC overhead" number that we could graph along with a bunch of
other server-health stats for our Zenoss monitors. It looks like I'd need to
hack my runtime a bit to get that, though.
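In the meantime, a rough sketch of the best I can do without touching the
runtime (function name and output format are just illustrative): export the
stock Gc module's counters and graph the deltas. These are allocation and
collection counts, not time actually spent in the collector, which is the
number I'm really after:

let report_gc_stats () =
  (* Rough sketch.  Gc.quick_stat is cheap: it reads the GC counters
     without walking the heap, so it is safe to call from a monitoring
     loop. *)
  let s = Gc.quick_stat () in
  Printf.printf
    "minor_words=%.0f promoted_words=%.0f major_words=%.0f \
     minor_collections=%d major_collections=%d heap_words=%d compactions=%d\n"
    s.Gc.minor_words s.Gc.promoted_words s.Gc.major_words
    s.Gc.minor_collections s.Gc.major_collections
    s.Gc.heap_words s.Gc.compactions

(* Call once per polling interval from the same loop that feeds the other
   health stats, and graph the per-interval deltas. *)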
Warren

On Mar 2, 2010, at 4:55 PM, Peter Hawkins wrote:

> Hi...
>
> On Tue, Mar 2, 2010 at 3:08 PM, Warren Harris wrote:
>>
>> Peter - gprof with ocaml works quite well:
>> http://caml.inria.fr/pub/docs/manual-ocaml/manual031.html
>
> I'm fully aware of gprof and OCaml's support of profiling.
>
> OCaml's profiling support works by adding calls to the _mcount library
> function at the entry point to every compiled function, which takes
> approximately 10 instructions on x86 (pushes and pops to save registers,
> and a call instruction). The _mcount function records function call
> counts and is also responsible for producing the call graph. Separately,
> the profile library samples the program counter at some frequency, which
> lets us work out in which functions the program is spending its time.
>
> Using OCaml's profiling support has three problems:
> 1) programs compiled with profiling are slower,
> 2) the profiling instrumentation itself distorts the resulting profile, and
> 3) the call graph accounting is inaccurate.
>
> Let's discuss each of these in turn.
>
> Problem (1) is simply that your program has extra overhead from all of
> those _mcount calls, which occur on every function invocation. You can't
> turn them off, and you can't make them happen less frequently. It's an
> all-or-nothing proposition. It would be unusual to include profiling
> instrumentation in a production system.
>
> Problem (2) is a little more subtle. Recall that the profiling
> instrumentation adds ~10 instructions to the start of each function,
> regardless of its size. For a large function this may be a negligible
> overhead, but for a small function, say one that was only 5 or 10
> instructions to begin with, it is substantial. Since we determine how
> much time is spent in each function by sampling the program counter,
> small and frequently called functions will appear to take relatively
> longer than larger functions in the resulting profile. Small functions
> are common in OCaml code, so we should see an appreciable amount of
> distortion.
>
> Problem (3) is a criticism of the _mcount mechanism in general. For each
> function f(), the profiler knows (a) how long we spent executing f() in
> total, and (b) how many times each of f()'s callers invoked f(). We do
> not know how much time f() spent executing on behalf of any given
> caller. If we assume that all of f()'s invocations took approximately
> the same amount of time, we can use the caller counts to approximate the
> time spent executing f() on behalf of each caller. However, the
> assumption that f() always takes approximately the same amount of time
> is not necessarily a good one, and I think it's an especially bad
> assumption in a functional program.
>
> These problems are avoided by using a sampling profiler like oprofile or
> shark, which samples an _uninstrumented_ binary at a chosen frequency.
> Because the binary is unmodified, we can turn profiling on and off on a
> running system, avoiding point (1); furthermore, we can adjust the
> sampling rate so that the profiling overhead is low enough to be
> tolerable. Since no instrumentation is added to the program, the
> resulting profile does not suffer from the distortion of point (2). Some
> profilers (e.g. shark on Mac OS X) can deal with point (3) as well: all
> we need to do is record a complete stack trace at sampling time.
>
> My point was that oprofile or one of its cousins (e.g. shark) is
> probably adequate for your needs. You can set the sampling rate low
> enough that your service can run more or less as normal. To determine GC
> overhead, you simply need to look at the total amount of time spent in
> the various GC functions of the runtime.
>
> Peter
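P.S. To make point (2) concrete for anyone else reading along, the kind of
small function in question is something like the hypothetical accessor
below: only a handful of instructions of real work, so the roughly
ten-instruction _mcount prologue that -p adds is a comparable cost on every
call.

(* Illustrative sketch only: a tiny, frequently called function whose share
   of a gprof profile would be inflated by the fixed per-call
   instrumentation, relative to larger functions. *)
type point = { x : float; y : float }

let norm2 p = p.x *. p.x +. p.y *. p.y

let sum_norms pts = List.fold_left (fun acc p -> acc +. norm2 p) 0.0 pts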