* The GC is not collecting... my mistake?
@ 2007-11-06 12:51 Loup Vaillant
2007-11-06 14:46 ` [Caml-list] " Dominique Martinet
2007-11-06 17:31 ` Markus Mottl
0 siblings, 2 replies; 6+ messages in thread
From: Loup Vaillant @ 2007-11-06 12:51 UTC (permalink / raw)
To: Caml mailing list
Hello,
I am trying to solve a problem with minimum space complexity, while
still being as pure as I can. I re-wrote streams (functional ones) for
that matter. My problem is that I can't accurately guess the space and
time complexity of my program.
The program depend mainly on two parameters: "l" (rather small) and
"n" (quite big). I though the time complexity of my program was O(l*n)
at worst (I may be right), and the space complexity was O(l) (I was
outright wrong).
I thought the GC could collect the first values of my streams when the
program don't need them any more, but it doesn't seem to be the case.
Unfortunately, I was unable to reduce my problem to a proper minimum
example, so I send it all.
Note: I am confident that my program is semantically correct (given
plenty of time and space). The problem it solves is taken from the
last contest in:
http://www.france-ioi.org
Thank you all,
Loup
Here is my program (the functions are not declared in the correct
order, for readability reasons):
(* MAIN PROGRAM *)
let main () =
let l = scan_int () in (* "l" is positive, rather small (100 at most) *)
let n = scan_int () (* "n" is positive, quite big (100_000 at most) *)
in
(* big function composition, applied to "n" *)
(sscan_n_int
>>> ((listify
>>> take (n-l+2)
>>> map (take l))
&&& drop l)
>>> uncurry2 combine
>>> filter (fun (l, e) ->
head l = e
&& not (exists ((<) e) l))
>>> length
>>> show_int) n
(* exec *)
let _ = main ()
(* HELPER COMBINATORS (like Haskell's arrows) *)
let (>>>) f g x = g (f x)
let (&&&) f g x = (f x, g x)
let uncurry2 f (x, y) = f x y
(* SCAN & PRINT *)
let scan_int () = Scanf.scanf " %d" (fun x -> x)
let show_int x = Printf.printf "%d\n" x
let rec sscan_n_int = function
| 0 -> empty
| n -> let i = scan_int() in
lazy(Cons (i, sscan_n_int (n-1)))
(* STREAM DEFINITION *)
type 'a cell =
| Empty
| Cons of 'a * 'a stream
and 'a stream = 'a cell lazy_t
let empty = lazy Empty
let head: 'a stream -> 'a =
fun l -> match Lazy.force l with
| Empty -> failwith "empty stream"
| Cons(x, _) -> x
let tail: 'a stream -> 'a stream =
fun l -> match Lazy.force l with
| Empty -> failwith "empty stream"
| Cons(_, l) -> l
let is_empty: 'a stream -> bool =
fun l -> match Lazy.force l with
| Empty -> true
| Cons(_, _) -> false
(* STREAM LIBRARY *)
let rec listify l =
if is_empty l then empty else
lazy (Cons (l, listify(tail l)))
let rec take n l = match n with
| n when n <= 0 -> empty
| n ->
if is_empty l then empty else
lazy (Cons (head l, take (n-1) (tail l)))
let rec drop n l = match n with
| 0 -> l
| n ->
if is_empty l then empty
else drop (n-1) (tail l)
let rec combine m l =
if is_empty m || is_empty l then empty
else lazy(Cons ((head m, head l), combine (tail m) (tail l)))
let rec map f l =
if is_empty l then empty
else lazy(Cons (f (head l), map f (tail l)))
let rec filter p l =
if is_empty l then empty
else
let hd = head l in
let tl = tail l in
if p hd
then lazy(Cons (hd, filter p tl))
else filter p tl
let rec _length acc l =
if is_empty l then acc
else _length (1 + acc) (tail l)
let length l = _length 0 l
let rec exists p l =
not (is_empty l) &&
(p (head l) || exists p (tail l))
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] The GC is not collecting... my mistake?
2007-11-06 12:51 The GC is not collecting... my mistake? Loup Vaillant
@ 2007-11-06 14:46 ` Dominique Martinet
2007-11-06 17:31 ` Markus Mottl
1 sibling, 0 replies; 6+ messages in thread
From: Dominique Martinet @ 2007-11-06 14:46 UTC (permalink / raw)
To: Loup Vaillant; +Cc: Caml mailing list
Hello,
(It did pass, I typed all of what you see next, then though it wasn't
worth it since I didn't answer your question anyway. Well, since you
though that didn't work, I might as well send it :P)
I'm not used to your way of programming (with arrows and such), and
guessing you were the one subscribe as "loup", I assume you're
considering the first Banderole problem. (my answer not fitting the
second one, heh)
Here's what I did, and what seemed the most logical to me at that time
(a plain iterative "check everything" function ; the check being
tail-recursive and dropping as soon as there's a problem). The time
computation requirement was 1s, and I had at worst 0.23, so that's
still quite usable I guess. (Time complexity should be O(l*n) worst
case and space one would be O(n) as I write everything in an array.
(That's n and not l, which is much worse :P))
I had some time at the end and planned to try the second one, but with
l=100_000 and n=2_000_000 (worst possibe thing they could ask us to
do), my program took 30 seconds to compute, and I don't think it
respects the space one either :P (using ulimit -v 2000)
let scan_int () = Scanf.scanf " %d" (fun x->x)
let show_int x = Printf.printf "%d\n" x
let _ =
let longueur,nb_pics=Scanf.scanf " %d %d" (fun x y -> x,y)
and compteur = ref 0 in
let tab_hauteurs = Array.make nb_pics 0 in
for i=0 to nb_pics - 1 do
tab_hauteurs.(i) <- scan_int ()
done;
let rec check i decallage =
if decallage < longueur
then if tab_hauteurs.(i+decallage) <= tab_hauteurs.(i)
then check i (decallage + 1)
else false
else tab_hauteurs.(i+decallage) = tab_hauteurs.(i)
in
for i=0 to nb_pics - longueur - 1 do
if check i 0 then incr compteur
done;
show_int !compteur
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] The GC is not collecting... my mistake?
2007-11-06 12:51 The GC is not collecting... my mistake? Loup Vaillant
2007-11-06 14:46 ` [Caml-list] " Dominique Martinet
@ 2007-11-06 17:31 ` Markus Mottl
2007-11-07 9:13 ` Loup Vaillant
2007-11-07 9:42 ` Alain Frisch
1 sibling, 2 replies; 6+ messages in thread
From: Markus Mottl @ 2007-11-06 17:31 UTC (permalink / raw)
To: Loup Vaillant; +Cc: Caml mailing list, ocaml-users
On 11/6/07, Loup Vaillant <loup.vaillant@gmail.com> wrote:
> I thought the GC could collect the first values of my streams when the
> program don't need them any more, but it doesn't seem to be the case.
> Unfortunately, I was unable to reduce my problem to a proper minimum
> example, so I send it all.
Funny, my colleagues and I are also currently investigating a space
leak in OCaml. Here is a short example:
file: foo.ml
---------------------------------------------------------------------------------------------
let alloc () = String.create 1, String.create 2
let alloc_loop () = for i = 1 to 1_000_000 do ignore (alloc ()) done
let print_len str = Printf.printf "%d\n%!" (String.length str)
let finaliser str = Printf.printf "finalized\n%!"
let main1 () =
let a, b = alloc () in
Gc.finalise finaliser a;
print_len a;
alloc_loop ();
print_len b
let main2 () =
let a, b = alloc () in
Gc.finalise finaliser a;
print_len a;
let b_ref = ref b in
alloc_loop ();
print_len !b_ref
let () = if Sys.argv.(1) = "1" then main1 () else main2 ()
---------------------------------------------------------------------------------------------
If you compile this to native code, running "foo 1" will print "1"
followed by "2". If you run "foo 2" it will print "1", then
"finalized", then "2". Byte code will only print "1" and "2" in any
case - hm, weird.
Obviously, OCaml does not reclaim the tuple during the allocation loop
even though it could (and IMHO should). This can introduce
substantial space leaks as happened to us.
It seems that OCaml keeps tuples around as long as you can still
access a binding that was created by matching the tuple. Though
deferring access to tuple elements might slightly improve performance,
because there is no need to push things on the stack, etc., it can
make reasoning about space usage quite hard.
My colleagues and I agree that the current implementation violates
user expectations. People will generally assume that heap objects
which are only referenced by a binding will go away as soon as they
cannot be accessed anymore. The workaround as shown above in main2
(using intermediate references) is cumbersome and obviously doesn't
work with byte code anyway.
The remaining question is how the correct behavior could be
efficiently implemented. I have no idea how the code generator and
runtime interact, but my guess is that some simple heuristics could be
used whether to defer access to tuple elements or not:
Tuple access can always be deferred as long as the tuple as a whole
could still be referenced in the same function (e.g. in the case of a
"let x, y, z as tpl" binding, where "tpl" may still be used).
If there is no function call (after inlining) or loop between the
creation of the bindings and their last use, then the access to the
tuple fields can also be deferred. This would be beneficial e.g. in
the presence of branches that only use some of the bound variables,
but the user wants to create the match in one place only for
convenience.
Otherwise the tuple could be copied into the stack (or even
registers), and we use this transient object for access. As soon as
some element cannot be accessed anymore, we simply overwrite the
corresponding slot on the stack (or the register) with an atomic value
(e.g. Val_unit) to destroy the reference to the object so that the GC
can reclaim it.
I hope this provides some food for thought. Are there any plans to
fix this problem?
Best regards,
Markus
--
Markus Mottl http://www.ocaml.info markus.mottl@gmail.com
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] The GC is not collecting... my mistake?
2007-11-06 17:31 ` Markus Mottl
@ 2007-11-07 9:13 ` Loup Vaillant
2007-11-07 9:42 ` Alain Frisch
1 sibling, 0 replies; 6+ messages in thread
From: Loup Vaillant @ 2007-11-07 9:13 UTC (permalink / raw)
To: Markus Mottl; +Cc: Caml mailing list, ocaml-users
2007/11/6, Markus Mottl <markus.mottl@gmail.com>:
> On 11/6/07, Loup Vaillant <loup.vaillant@gmail.com> wrote:
> > I thought the GC could collect the first values of my streams when the
> > program don't need them any more, but it doesn't seem to be the case.
> > Unfortunately, I was unable to reduce my problem to a proper minimum
> > example, so I send it all.
>
> Funny, my colleagues and I are also currently investigating a space
> leak in OCaml. Here is a short example:
>
> (* snip *)
>
> Obviously, OCaml does not reclaim the tuple during the allocation loop
> even though it could (and IMHO should). This can introduce
> substantial space leaks as happened to us.
Ouch. I wonder if this problem is related to mine (there are tuples in
my code). If so, a non disposed tuple may prevent the disposal of my
entire stream...
(Unless I Did make a mistake about the space complexity of my program.)
Thank you,
Loup
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] The GC is not collecting... my mistake?
2007-11-06 17:31 ` Markus Mottl
2007-11-07 9:13 ` Loup Vaillant
@ 2007-11-07 9:42 ` Alain Frisch
2007-11-07 15:36 ` Markus Mottl
1 sibling, 1 reply; 6+ messages in thread
From: Alain Frisch @ 2007-11-07 9:42 UTC (permalink / raw)
To: Markus Mottl; +Cc: Loup Vaillant, ocaml-users, Caml mailing list
Markus Mottl wrote:
> If you compile this to native code, running "foo 1" will print "1"
> followed by "2". If you run "foo 2" it will print "1", then
> "finalized", then "2". Byte code will only print "1" and "2" in any
> case - hm, weird.
For bytecode, this is an expected behavior: the back-end does not emit
any information that would allow the runtime system to know which value
on the stack is still live (at each possible GC point). The behavior in
native code is an optimization which lets the GC reclaim more memory,
but AFAIK, this is not specified. The safe assumption is that a value
identifier forces the value to remain live in all its syntactic scope
(this is not a formal definition; for instance, a tail call terminates
the scope of identifiers defined at the call site).
> Obviously, OCaml does not reclaim the tuple during the allocation loop
> even though it could (and IMHO should). This can introduce
> substantial space leaks as happened to us.
I agree this might be surprising, but since I don't see the behavior
changing for bytecode anyway, I don't think it is worth dealing with
this case in native code (any program that relies on the improved
behavior you ask for would have an unexpected behavior in bytecode).
The proper solution might be to reflect in the syntactic scope your
desire to see some value reclaimed:
let main2 () =
let b =
let a, b = alloc () in
Gc.finalise finaliser a;
print_len a;
b
in
(* Here a is no longer visible and can thus be reclaimed. *)
alloc_loop ();
print_len b
This works in bytecode as well.
Alain
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Caml-list] The GC is not collecting... my mistake?
2007-11-07 9:42 ` Alain Frisch
@ 2007-11-07 15:36 ` Markus Mottl
0 siblings, 0 replies; 6+ messages in thread
From: Markus Mottl @ 2007-11-07 15:36 UTC (permalink / raw)
To: Alain Frisch; +Cc: Loup Vaillant, Caml mailing list
On 11/7/07, Alain Frisch <alain@frisch.fr> wrote:
> The safe assumption is that a value
> identifier forces the value to remain live in all its syntactic scope
> (this is not a formal definition; for instance, a tail call terminates
> the scope of identifiers defined at the call site).
The situation is actually even worse than that, e.g.:
let _, b = expr in
...
In the case above the first element of the tuple and the tuple itself
that "expr" evaluate to will remain live until "b" is accessed even
though they are not bound to names. This is certainly very
unintuitive behavior. One would at least expect that values that
never get bound and aren't reachable through other bound values aren't
considered live.
> I agree this might be surprising, but since I don't see the behavior
> changing for bytecode anyway, I don't think it is worth dealing with
> this case in native code (any program that relies on the improved
> behavior you ask for would have an unexpected behavior in bytecode).
At least what concerns us we are not overly concerned about optimal
behavior in byte code, because everybody that cares about performance
(also memory-wise) uses native code anyway. I personally wouldn't
mind if byte code behaved differently wrt. GC-behavior.
> The proper solution might be to reflect in the syntactic scope your
> desire to see some value reclaimed:
True, but this quickly becomes cumbersome, and the user may easily
forget to do it in all places. Space leaks are generally extremely
hard to spot.
It seems like an easy thing to do to implement the intuitive notion
that a bound variable is considered live in an expression if it occurs
freely in this expression. This is still not perfect, because
liveness analysis is generally undecidable. But it surely comes close
to the heuristics that humans probably use to reason about liveness in
their code and would thus not violate their expectations.
Regards,
Markus
--
Markus Mottl http://www.ocaml.info markus.mottl@gmail.com
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2007-11-07 15:36 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-11-06 12:51 The GC is not collecting... my mistake? Loup Vaillant
2007-11-06 14:46 ` [Caml-list] " Dominique Martinet
2007-11-06 17:31 ` Markus Mottl
2007-11-07 9:13 ` Loup Vaillant
2007-11-07 9:42 ` Alain Frisch
2007-11-07 15:36 ` Markus Mottl
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox