[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Hash Tables and GC



   Date: Sun, 11 Sep 88 13:12 EDT
   From: David L. Andre <DLA@JASPER.SCRC.Symbolics.COM>

       Date: Tue, 6 Sep 88 19:50 PDT
       From: Gregor.pa@Xerox.COM

       As I understand it, finalization is a mechanism for recording a function
       which the garbage collector should call on an object when it is about to
       be GC'd.

       I find it surprising that no one has talked about finalization during
       this discussion.  This may not be facility we want to add to Common
       Lisp, but as near as I can tell, weak pointers and finalization are the
       primitives you need to build these kind of hash tables.

       In addition, finalization is a useful mechanism to have direct access
       to.

  ... the function has to be called on an object which is in oldspace but it
   cannot migrate the object to copyspace before examining it.

  ...

   I understand how to implement this feature, but I don't see how to do it
   in an implemention-independent fashion in a copying garbage collector.
   This is significant since most garbage collectors on the market today
   are copying garbage collectors.  I also don't understand how to make it
   safe enough to be a language feature.

There is a single answer to both those questions:  Don't treat the
object any differently under after the finalization function has
run.  This means that unfinalized objects get moved to copyspace,
and so take two GC cycles to free.  (This is an acceptable cost,
to my mind:  You just pay double for finalizable memory.)

Here's how it would work from the client's point of view:  Finalizable
objects have a single bit of client-alterable state FINALIZABLE-P.  When
the GC is about to collect an object with this bit turned on, it instead
preserves the object in a central place.  After the GC has finished
running, all such objects get their FINALIZABLE-P bits turned off,
and they are then passed to their various finalization functions.
Those functions, in turn are free to manipulate the FINALIZABLE-P
bit; if after finalization the bit is turned off, and no references
remain, the next GC will treat the object like any other piece of
garbage.

Note that this model does not refer to exact nature of the underlying GC
technology.  It simply assumes there is a simple way to implement the
FINALIZABLE-P bit.  Note that if another GC happens while a finalization
function is running, even though the object's FINALIZABLE-P bit is
turned off, the object is retained, since it is referenced from the
stack.

It is not obvious to me when weak pointers to objects undergoing
finalization should be cleared.  I can think of cases supporting
either choice, of clearing the pointers during the first GC,
or during the second.  I lean towards clearing them during the
first GC, when the objects are being placed on the FINALIZATIONS
list.

Here's how you could implement this in a copying GC: In addition to a
FINALIZABLE-P bit, there is in fact a hidden slot called
FINALIZEABLE-CHAIN in the layout of finalizable objects.  (The bit can
maiybe be stored somewhere in the slot, instead of a tag.)  This slot
forms a linked list FINALIZABLE-LIST of all objects needing
finalization.  When the client turns the FINALIZABLE-P bit off, the
system is free to remove the object from the linked list when
convenient.  When the client turns the bit on, the system must ensure
that they are once again on the list.  At GC time, oldspace is scanned
as usual, but FINALIZABLE-LIST is not used as a root reference, nor is
the FINALIZABLE-CHAIN slot used to propagate referencing.  Just before
oldspace is flushed, however, FINALIZABLE-LIST is now used as a root
reference.  Any objects which are moved to copyspace at this time are
objects needing finalization.  Such objects are unlinked from the
FINALIZABLE-LIST (which by now is in newspace) but are placed on another
list called FINALIZATIONS.  It is this list that is traversed after the
GC, to call the finalization functions.

Note that these functions are called after the GC has finished
reconstructing the heap in copyspace.  In fact, there is no need to call
them immediately after the GC; the list could be processed at any
opportune moment, perhaps in a low-priority background process.

Some provision must be made to retain the FINALIZATIONS list across GCs
occurring during the actual process of finalization.  The simplest thing
to do is leave their FINALIZABLE-P bit turned on, so intervening GCs
just re-collect the FINALIZATIONS list from scratch.

By the way, a clean way to introduce finalization into CLOS would be to
have a finalizable superclass, which contained the hidden
FINALIZABLE-CHAIN slot, and supported an accessor for the FINALIZABLE-P
bit.  The generic function FINALIZE would be called to finalize such
objects, and subclasses could specialize it.

This design implies that some kinds of objects (e.g., cons cells) are
not finalizable; that's fine with me.  I use finalization, in C++,
mainly to deallocate memory when I'm done with it.  Lisp takes care of
that task, but there are other resources which both C++ and Lisp
finalization help manage, notably operating system channels for open
streams.  (Other examples: Temporary files or disk blocks, and resources
in coprocessors or servers.)  Therefore, finalization typically applies
to an object which serves as a container for the non-Lisp resources, and
it doesn't disturb me that such objects are a limited subset of all
possible Lisp objects.

				-- John