Mailing list for all users of the OCaml language and system.
From: "Markus Mottl" <markus.mottl@gmail.com>
To: "Loup Vaillant" <loup.vaillant@gmail.com>
Cc: "Caml mailing list" <caml-list@yquem.inria.fr>,
	ocaml-users@janestcapital.com
Subject: Re: [Caml-list] The GC is not collecting... my mistake?
Date: Tue, 6 Nov 2007 12:31:07 -0500
Message-ID: <f8560b80711060931h20702a72j918c1f857faa78e1@mail.gmail.com>
In-Reply-To: <6f9f8f4a0711060451g1219a880gd711d997043b016@mail.gmail.com>

On 11/6/07, Loup Vaillant <loup.vaillant@gmail.com> wrote:
> I thought the GC could collect the first values of my streams when the
> program doesn't need them any more, but it doesn't seem to be the case.
> Unfortunately, I was unable to reduce my problem to a proper minimum
> example, so I send it all.

Funny, my colleagues and I are also currently investigating a space
leak in OCaml.  Here is a short example:

file: foo.ml
---------------------------------------------------------------------------------------------
let alloc () = String.create 1, String.create 2
let alloc_loop () = for i = 1 to 1_000_000 do ignore (alloc ()) done
let print_len str = Printf.printf "%d\n%!" (String.length str)
let finaliser str = Printf.printf "finalized\n%!"

let main1 () =
  let a, b = alloc () in
  Gc.finalise finaliser a;
  print_len a;
  alloc_loop ();
  print_len b

let main2 () =
  let a, b = alloc () in
  Gc.finalise finaliser a;
  print_len a;
  let b_ref = ref b in
  alloc_loop ();
  print_len !b_ref

let () = if Sys.argv.(1) = "1" then main1 () else main2 ()
---------------------------------------------------------------------------------------------

If you compile this to native code, running "foo 1" will print "1"
followed by "2", i.e. the finaliser never runs.  If you run "foo 2" it
will print "1", then "finalized", then "2".  Byte code will only print
"1" and "2" in either case, which is odd.
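
For readers on current OCaml releases, where String.create has been
deprecated in favor of Bytes.create, the test program can be adapted as
sketched below.  This is only an assumed port of foo.ml: the switch to
Bytes, the "_i" loop variable and the argument matching at the end are
changes made for illustration, and nothing here claims that recent
compilers still produce the output described above.

```ocaml
(* Assumed port of foo.ml to the Bytes API; not the original program. *)
let alloc () = Bytes.create 1, Bytes.create 2

(* Allocate many short-lived tuples to give the GC reasons to run. *)
let alloc_loop () = for _i = 1 to 1_000_000 do ignore (alloc ()) done

let print_len b = Printf.printf "%d\n%!" (Bytes.length b)
let finaliser (_ : Bytes.t) = Printf.printf "finalized\n%!"

(* Fields bound directly by matching the tuple. *)
let main1 () =
  let a, b = alloc () in
  Gc.finalise finaliser a;
  print_len a;
  alloc_loop ();
  print_len b

(* Same, but b is copied into an intermediate reference first. *)
let main2 () =
  let a, b = alloc () in
  Gc.finalise finaliser a;
  print_len a;
  let b_ref = ref b in
  alloc_loop ();
  print_len !b_ref

(* Do nothing unless called as "foo 1" or "foo 2". *)
let () =
  match Sys.argv with
  | [| _; "1" |] -> main1 ()
  | [| _; "2" |] -> main2 ()
  | _ -> ()
```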

Obviously, OCaml does not reclaim the tuple during the allocation loop
even though it could (and IMHO should).  This can introduce
substantial space leaks as happened to us.

It seems that OCaml keeps tuples around as long as you can still
access a binding that was created by matching the tuple.  Though
deferring access to tuple elements might slightly improve performance,
because there is no need to push things on the stack, etc., it can
make reasoning about space usage quite hard.

My colleagues and I agree that the current implementation violates
user expectations.  People will generally assume that heap objects
which are only referenced by a binding will go away as soon as they
cannot be accessed anymore.  The workaround as shown above in main2
(using intermediate references) is cumbersome and obviously doesn't
work with byte code anyway.

The remaining question is how the correct behavior could be
efficiently implemented.  I have no idea how the code generator and
runtime interact, but my guess is that some simple heuristics could be
used to decide whether to defer access to tuple elements or not:

Tuple access can always be deferred as long as the tuple as a whole
could still be referenced in the same function (e.g. in the case of a
"let x, y, z as tpl" binding, where "tpl" may still be used).

If there is no function call (after inlining) or loop between the
creation of the bindings and their last use, then the access to the
tuple fields can also be deferred.  This would be beneficial e.g. in
the presence of branches that only use some of the bound variables,
but the user wants to create the match in one place only for
convenience.
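
The two cases above might look as follows at source level.  The
function names are hypothetical and serve only to make the two
situations concrete; the sketch says nothing about what the compiler
actually generates for them.

```ocaml
(* Case 1: the tuple as a whole stays reachable through the alias tpl,
   so the tuple has to be kept alive anyway. *)
let sum_and_keep triple =
  let (x, y, z) as tpl = triple in
  (x + y + z, tpl)

(* Case 2: no function call or loop between the binding and the last
   use; each branch uses only one of the bound variables. *)
let pick flag pair =
  let a, b = pair in
  if flag then a else b
```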

Otherwise the tuple could be copied into the stack (or even
registers), and we use this transient object for access.  As soon as
some element cannot be accessed anymore, we simply overwrite the
corresponding slot on the stack (or the register) with an atomic value
(e.g. Val_unit) to destroy the reference to the object so that the GC
can reclaim it.
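
The slot-clearing idea can be modeled in plain OCaml.  This is only a
source-level analogy under stated assumptions, not how the code
generator works: the types "slot" and "frame" and the functions
"load_tuple", "use" and "demo" are hypothetical, and the constructor
Dead stands in for the atomic value (e.g. Val_unit) mentioned above.

```ocaml
(* Hypothetical model of stack slots holding the fields of a tuple. *)
type 'a slot = Live of 'a | Dead  (* Dead plays the role of Val_unit *)

type ('a, 'b) frame = {
  mutable fst_slot : 'a slot;
  mutable snd_slot : 'b slot;
}

(* Copy the fields out of the tuple; the tuple itself is not kept. *)
let load_tuple (a, b) = { fst_slot = Live a; snd_slot = Live b }

let use = function
  | Live v -> v
  | Dead -> invalid_arg "slot already cleared"

let demo () =
  let frame = load_tuple (Bytes.create 1, Bytes.create 2) in
  let len_a = Bytes.length (use frame.fst_slot) in
  (* Last use of the first field is past: drop the reference. *)
  frame.fst_slot <- Dead;
  let len_b = Bytes.length (use frame.snd_slot) in
  frame.snd_slot <- Dead;
  (len_a, len_b)
```

Once a slot is overwritten, the frame no longer refers to the field it
held, which is the property the proposal relies on.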

I hope this provides some food for thought.  Are there any plans to
fix this problem?

Best regards,
Markus

-- 
Markus Mottl        http://www.ocaml.info        markus.mottl@gmail.com


