* GC with finalisation?
@ 1999-08-24 3:36 John Skaller
1999-08-26 21:09 ` Markus Mottl
1999-08-27 7:13 ` STARYNKEVITCH Basile
0 siblings, 2 replies; 7+ messages in thread
From: John Skaller @ 1999-08-24 3:36 UTC (permalink / raw)
To: caml-list
I could use some advice on this. I'm implementing
a Python interpreter/compiler in Ocaml. Python class
instances support __del__ methods -- that is, destructors.
CPython uses reference counting for collection and finalisation.
JPython leverages Java's collector and finalisation semantics.
My current ocaml implementation leverages the ocaml collector,
which doesn't support finalisation (as far as I know).
I seem to have several choices.
1) implement reference counting for finalisation
* disadvantage: same problem as CPython: doesn't resolve
circular references properly
* disadvantage: overhead and coding complexity
* advantage: synchronous finalisation
2) Implement a full resource collector
* disadvantage: overhead and complexity
* disadvantage: probably won't support synchronous finalisation
3) provide BOTH reference counting and a collector (as Perl does)
* disadvantage: overhead and complexity
* advantage: finalises everything
* advantage: synchonous finalisation in all cases
that CPython provides it
I am interested in any comments on how to solve this problem.
Order of finalisation can be important in Python.
I have a major literate programming tool (Interscript) written in Python,
the compiler/interpreter is being built to make it go faster.
I use finalisation semantics extensively (tables of information
are generated at the end of document processing, etc).
While I might be able to rewrite it to avoid automatic finalisation,
it would be nice to support the wide class of other Python programs
that rely on it.
-------------------------------------------------------
John Skaller email: skaller@maxtal.com.au
http://www.maxtal.com.au/~skaller
phone: 61-2-96600850
snail: 10/1 Toxteth Rd, Glebe NSW 2037, Australia
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: GC with finalisation?
1999-08-24 3:36 GC with finalisation? John Skaller
@ 1999-08-26 21:09 ` Markus Mottl
1999-08-27 5:10 ` John Skaller
1999-08-27 7:13 ` STARYNKEVITCH Basile
1 sibling, 1 reply; 7+ messages in thread
From: Markus Mottl @ 1999-08-26 21:09 UTC (permalink / raw)
To: John Skaller; +Cc: OCAML
> I seem to have several choices.
>
> 1) implement reference counting for finalisation
>
> * disadvantage: same problem as CPython: doesn't resolve
> circular references properly
>
> * disadvantage: overhead and coding complexity
>
> * advantage: synchronous finalisation
The main problem with reference counting is speed. It is possible to
resolve circular references in linear time, but the constant overhead
and the associated memory requirements are simply too large.
Anyway, there is a nice fortune cookie pointing out potential problems
with recursive datastructures when doing reference counting:
One day a student came to Moon and said, "I understand how to make
a better garbage collector. We must keep a reference count of the
pointers to each cons."
Moon patiently told the student the following story -- "One day a
student came to Moon and said, "I understand how to make a better
garbage collector..."
[snip GC approaches]
I would leave all the work to the OCaml-GC: use module "Weak" to register
values (eg.: objects) to be finalized. Then check these arrays from
time to time and call appropriate finalization functions for objects
that have been reclaimed.
Regards,
Markus
--
Markus Mottl, mottl@miss.wu-wien.ac.at, http://miss.wu-wien.ac.at/~mottl
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: GC with finalisation?
1999-08-26 21:09 ` Markus Mottl
@ 1999-08-27 5:10 ` John Skaller
0 siblings, 0 replies; 7+ messages in thread
From: John Skaller @ 1999-08-27 5:10 UTC (permalink / raw)
To: Markus Mottl; +Cc: OCAML
At 22:09 26/08/99 +0100, Markus Mottl wrote:
>> I seem to have several choices.
>> 1) implement reference counting for finalisation
>The main problem with reference counting is speed.
I was hoping to avoid it ..
>It is possible to
>resolve circular references in linear time, but the constant overhead
>and the associated memory requirements are simply too large.
I think this is less of a problem than actually
writing the collector, and then changing all my code
to use it.
>I would leave all the work to the OCaml-GC: use module "Weak" to register
>values (eg.: objects) to be finalized. Then check these arrays from
>time to time and call appropriate finalization functions for objects
>that have been reclaimed.
That is an interesting idea -- but it won't work,
because the destructor must be called _before_ the object
is destroyed by the system. For example, one of my objects
maintains the table of contents for a document,
and when the object becomes unreachable, it is printed
to a file as HTML. Obviously, the object still has to exist
at this point, and retain all attributes.
However, your idea of using weak arrays may
be useful after all, if I can create a 'symbol' for the
object, and put that into the weak array: the symbols
contains lists of symbols, representing objects the
object depends on. The collector will now collect
symbols, and by checking the weak array I can
execute destructors, and then release it
for collection (by removing it from a corresponding
'strong' array).
The problem is: it will be slower than
reference counting, and not be entirely synchronous.
The advantage is it will finalise unreachable
objects with circular references. Also,
reference counting can be added anyhow,
to make non-circular finalisation synchronous.
Hmm. Comments?
-------------------------------------------------------
John Skaller email: skaller@maxtal.com.au
http://www.maxtal.com.au/~skaller
phone: 61-2-96600850
snail: 10/1 Toxteth Rd, Glebe NSW 2037, Australia
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: GC with finalisation?
1999-08-24 3:36 GC with finalisation? John Skaller
1999-08-26 21:09 ` Markus Mottl
@ 1999-08-27 7:13 ` STARYNKEVITCH Basile
1999-08-28 4:29 ` John Skaller
1999-08-28 21:49 ` Markus Mottl
1 sibling, 2 replies; 7+ messages in thread
From: STARYNKEVITCH Basile @ 1999-08-27 7:13 UTC (permalink / raw)
To: John Skaller; +Cc: caml-list
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=us-ascii, Size: 1919 bytes --]
>>>>> "John" == John Skaller <skaller@maxtal.com.au> writes:
John> I could use some advice on this. I'm implementing a Python
John> interpreter/compiler in Ocaml. Python class instances
John> support __del__ methods -- that is, destructors.
3 choices are then proposed. I suggest a fourth one:
4) patch the existing ocaml 2.02 garbage collector to allow finalized
objects containing (for example) a single Ocaml value:
this is not very difficult; you'll have to create a new tag, i.e. add
#define Finalwithvalue_tag 249
to mlvalues.h
then grep for Final_tag in the source, and add the case for
Finalwithvalue_tag (which should handle as value its second word
after the header, the first being the finalizer).
Notice that ocaml finalizer (including those alreday provided thru
Final_tag) should *not* allocate any value in the Ocaml heap (in
contrast with Java finalizers).
Your python finalizer could, for example, add the just finalized
Python object to a list -which is malloc-ed and free-d in C- (or
perhaps send it on a pipe to yourself) which should be scanned at
appropriate moments.
On the other hand, I consider that finalized objects supported by the
language (such as provided by Java & Python) are a mistake. In my
opinion, finalized objects should not have a user-provided finalizer,
but should only deal with system resources (file descriptors, X11
windows, coded only in C)
N.B. Any opinions expressed here are only mine, and not of my organization.
N.B. Les opinions exprimees ici me sont personnelles et n engagent pas le CEA.
---------------------------------------------------------------------
Basile STARYNKEVITCH ---- Commissariat à l Energie Atomique
DTA/LETI/DEIN/SLA * CEA/Saclay b.528 (p111f) * 91191 GIF/YVETTE CEDEX * France
phone: 1,69.08.60.55; fax: 1.69.08.83.95 home: 1,46.65.45.53
email: Basile point Starynkevitch at cea point fr
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: GC with finalisation?
1999-08-27 7:13 ` STARYNKEVITCH Basile
@ 1999-08-28 4:29 ` John Skaller
1999-08-28 21:49 ` Markus Mottl
1 sibling, 0 replies; 7+ messages in thread
From: John Skaller @ 1999-08-28 4:29 UTC (permalink / raw)
To: STARYNKEVITCH Basile; +Cc: caml-list
At 09:13 27/08/99 +0200, STARYNKEVITCH Basile wrote:
>>>>>> "John" == John Skaller <skaller@maxtal.com.au> writes:
>
> John> I could use some advice on this. I'm implementing a Python
> John> interpreter/compiler in Ocaml. Python class instances
> John> support __del__ methods -- that is, destructors.
>
>3 choices are then proposed. I suggest a fourth one:
>
> 4) patch the existing ocaml 2.02 garbage collector to allow finalized
> objects containing (for example) a single Ocaml value:
>
>this is not very difficult;
I had thought of that, but I didn't include it in my list
for two reasons:
1) I do not have the experience to undertake this task
without losing confidence in the resulting system. I have that
confidence now because I know the existing collector was written
by experts and has been tested by many users.
2) It will mean my clients cannot build my product,
without installing and compiling a patched version of Ocaml:
bytecode versions won't work either.
>you'll have to create a new tag, i.e. add
>
> #define Finalwithvalue_tag 249
>
>to mlvalues.h
>
> then grep for Final_tag in the source, and add the case for
> Finalwithvalue_tag (which should handle as value its second word
> after the header, the first being the finalizer).
>
>Notice that ocaml finalizer (including those alreday provided thru
>Final_tag) should *not* allocate any value in the Ocaml heap (in
>contrast with Java finalizers).
I see. This could be a problem in the first instance,
since the __del__ methods of Python class instances can
execute arbitrary code, invoking my ocaml Python interpreter:
this will certainly cause heap allocations.
[This does not rule out a solution using indirection]
>On the other hand, I consider that finalized objects supported by the
>language (such as provided by Java & Python) are a mistake.
I have a strong tendancy to agree with you; but I have
a problem implementing a language which does support automatic
finalisation, which I myself have used heavily. So I need
some kind of solution anyhow.
I have a tendancy _not_ to patch ocaml for just this reason:
however, I do wonder how other people handle finalisation issues;
including (but not limited to) releasing system resources.
>In my opinion, finalized objects should not have a user-provided finalizer,
>but should only deal with system resources (file descriptors, X11
>windows, coded only in C)
That has the disadvantage that it only handles system resources
for which you have a suitable implementation. It makes sense to
create 'user defined' resources, and manage them in ocaml.
It would seem that a module defining a 'generic' collector
with finalisation could help solve this and other related problems;
without compromising the ocaml memory collector.
Such a collector would have to be 'layered ontop' of
a system, instead of leveraging the ocaml collector, but that
would probably be an advantage in many ways: collecting
non-memory resources may well require different semantics
than optimised memory collection.
-------------------------------------------------------
John Skaller email: skaller@maxtal.com.au
http://www.maxtal.com.au/~skaller
phone: 61-2-96600850
snail: 10/1 Toxteth Rd, Glebe NSW 2037, Australia
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: GC with finalisation?
1999-08-27 7:13 ` STARYNKEVITCH Basile
1999-08-28 4:29 ` John Skaller
@ 1999-08-28 21:49 ` Markus Mottl
1 sibling, 0 replies; 7+ messages in thread
From: Markus Mottl @ 1999-08-28 21:49 UTC (permalink / raw)
To: STARYNKEVITCH Basile; +Cc: OCAML
> On the other hand, I consider that finalized objects supported by the
> language (such as provided by Java & Python) are a mistake. In my
> opinion, finalized objects should not have a user-provided finalizer,
> but should only deal with system resources (file descriptors, X11
> windows, coded only in C)
I once asked for finalizers (for objects) in this list, but I was pointed
to problems which I hadn't thought about: for example, user defined
finalizers might place the object to be reclaimed in a (global) reference
again - what is the garbage collector supposed to do in such cases?
Other kinds of side effects (eg. calling methods of other objects)
could lead to similar problems.
Another argument against finalization is the following: if your program
enters a very time-consuming loop in which it doesn't allocate anything,
the garbage collector will not be called and the finalization of
values/objects might be delayed for a long time.
Actually, this is even a problem in languages with stack allocated values:
as long as the value is not out of scope, the destructor will not be
called (unless the compiler manages to see that the object can be safely
finalized and deallocated earlier - which is probably more than tough).
Concurrent environments (eg. when doing network communication) may be
adversly effected by such delays.
Developers of libraries have no control about this, because this problem
depends on the way the programmer makes use of values/objects.
There are other means to "finalize" values: instead of using
destructors/finalizers, you could provide for higher-order functions
that do the job.
Eg. you want that a network connection is opened, used in some way and
closed again (but immediately!). This could look as follows:
let do_connection f =
let con = open_connection (...) in
f con;
close_connection con
If you provide only such functions to the user, he doesn't have to care
about closing connections or do some other finalization and he still
has the guarantee that connections are closed immediately - which might
allow the server on the other side of the connection to do something
more useful than wait...
Regards,
Markus Mottl
--
Markus Mottl, mottl@miss.wu-wien.ac.at, http://miss.wu-wien.ac.at/~mottl
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: GC with finalisation?
[not found] <3.0.6.32.19990830144523.00964e20@mail.triode.net.au>
@ 1999-08-30 12:31 ` Markus Mottl
0 siblings, 0 replies; 7+ messages in thread
From: Markus Mottl @ 1999-08-30 12:31 UTC (permalink / raw)
To: John Skaller; +Cc: OCAML
> >No, you need not care about references, because this will be done by the
> >GC of OCaml.
>
> We must be firing past each other. I have to 'model'
> all the references one python object has to other python
> objects using the Syms. No python object will be collected
> directly by ocaml, since they're all registered in a strong
> array, just to prevent this!
It's not the objects which are collected - it's their "Syms". The objects
may exist anywhere - but must exist in the finalizing closures of the
strong array.
> Instead, every binding/unbinding of references
> in the python objects has to be modelled using the Syms,
> so that inspecting the weak array of Syms, which ARE
> collected by ocaml, will allow me to invoke the finaliser,
> and THEN remove the final strong reference, to allow the
> now finalised object to be collected by ocaml.
>
> I can't just make a weak array of Syms, without
> the Syms refering to each other, because they'd
> all get collected immediately!
Where is the problem when several Syms get collected at once? As long
as objects (better: their handlers) get registered in the right order,
they will be finalized in the right order: simply, because the objects
contain a "Sym" to other objects (via the closure!) and thus, not
even the handlers of the other objects will be reclaimed before the
containing object!
This can, of course, lead to cycles which will never be reclaimed by
the OCaml-GC when the objects in the closures reference themselves
mutually. But this can be easily resolved by a "meta-GC":
You finalize all objects of reclaimed handlers. Then you put all
objects that are still available into a set. Now you traverse all
reachable objects from the root and fold'n'filter them out of the
set. The remaining set contains all objects which should be reclaimed
but reference themselves mutually -> these can now be removed from the
weak array and be finalized...
This last and a bit more time consuming pass (finding of cyclic
structures) needs not be conducted every time you do "normal"
finalization. You could use a "countdown" until this "cycle collection"
is launched.
> Right. And the 'Syms' will depend only on other Syms:
>
> type Sym = Sym list ref;;
I have rather thought about something like:
type 'a sym = Sym of 'a
class some_python_class = object
method finalize = print_endline "Finalized!"
end
let weak_ar = Weak.create 100
let strong_ar = Array.create 100 (Obj.magic 0)
let ar_size = ref (-1)
let register () =
let my_obj = new some_python_class in
let obj_handler = Sym my_obj in
Weak.set weak_ar 0 (Some obj_handler);
Array.set strong_ar 0 (fun _ -> my_obj#finalize);
incr ar_size;
obj_handler
let finalize () =
for i = 0 to !ar_size do
if not (Weak.check weak_ar i) then strong_ar.(i) ()
done
let _ =
ignore (register ()); (* Discards object_handler *)
Gc.full_major ();
finalize ()
This is a fast hack - eg. "finalize" doesn't remove the closure after
finalization, etc...
But the main idea should be clear. If you "register" the objects in the
order of their creation (naturally), they will also be finalized in the
right order if you traverse the weak array.
Note: if you remove the "ignore" in the main function and replace it by
some binding, the object will not get finalized! - This proves that it
is really due to the garbage collection that finalization happens.
The "full major collection" is called to force the object handler (Sym)
to be reclaimed. You need not do this during normal execution, but you
will have to before the program exits as it immediately does in this
small example.
Ah, and you could even catch exceptions in function "finalize"!
> >This guarantees that reclaimed elements of the weak array can be finalized
> >at once when traversing the array.
>
> The dependencies change dynamically, so it is not
> possible to order the Sym array like this: the ordering is not
> invariant.
The dependencies exist through object handlers - as long as the "real
object" is not reclaimed (it won't be as long as there is a closure
containing it), it will not "release" depending objects (better: their
"Sym"-handle). Thus, it is guaranteed that finalization respects this
order. Time of object creation is "only" the second "finalization order
criterion"...
Regards,
Markus Mottl
--
Markus Mottl, mottl@miss.wu-wien.ac.at, http://miss.wu-wien.ac.at/~mottl
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~1999-08-30 13:57 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1999-08-24 3:36 GC with finalisation? John Skaller
1999-08-26 21:09 ` Markus Mottl
1999-08-27 5:10 ` John Skaller
1999-08-27 7:13 ` STARYNKEVITCH Basile
1999-08-28 4:29 ` John Skaller
1999-08-28 21:49 ` Markus Mottl
[not found] <3.0.6.32.19990830144523.00964e20@mail.triode.net.au>
1999-08-30 12:31 ` Markus Mottl
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox