[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

:TEMPORARY hash tables; weak pointers



As long as we're still talking about weak pointers and related topics,
I should note that there seem to be two very different broad categories
of weak pointer usage.  (If you know a third, please speak up.)

1. Hash Consing
This kind of usage has been mentioned on this mailing list already.
The idea is that you need to add a slot to an object, but you don't
have control over the object's memory layout.  For example, CLOS
defines SYMBOL-CLASS as a SETF-able location for each symbol;
it behaves much like SYMBOL-NAME, SYMBOL-VALUE, etc.  (For most symbols,
the slot is unbound.)  Although SYMBOL-CLASS behaves semantically
just like a new slot in the per-symbol data structure, it is
almost certainly implemented differently than SYMBOL-VALUE.

Instead, the slot is often stored in a hash table, whose
keys are symbol addresses, and values are SYMBOL-CLASS
places.  This technique is general:  If there's no room
for an attribute in a data structure's contiguous block of memory,
allocate more space for the attribute somewhere else, and
record the new location in a hash table keyed by the old
location.

This technique requires weak pointers in the hash table keys.
Suppose a symbol is no longer used, and should be GC-able.
There is still a problem if the symbol has any hash-consed
slots (like SYMBOL-CLASS), since some hash table somewhere
records it as a key, and may thus prevent deletion of the
symbol.

It is clear that (:TEMPORARY T :TEST 'EQ) hash tables would
be well suited for hash consing.

2. Back Pointers
A completely different use of weak pointers, which I don't
recall seeing mentioned here recently, involves data structures
with back pointers.  In general, when a single central object
is referenced by many dependent objects, it is sometimes
necessary for the central object to have a list of back
pointers to the dependents.

The classic example has to do with text editor marks.  A "mark"
is a little pointer which records a position between two
characters in a string being edited.  Marks are more
interesting than simple integer positions, because they float:
If the first half of a string is deleted, all the marks into
the second half must continue to point at the same characters.
In practice, marks are ordered pairs of integers and strings,
so when the underlying string is shifted somehow, the editor
must be sure to visit all marks on that string, and update
their integer components.  This requires maintenance of
pointers which invert the mark-to-string relationship,
and the back pointers are in fact sets of pointers, since
string-to-mark is a one-to-many relationship.

The problem here is that marks are often consed and discarded
at a great rate.  But if a mark's back pointer is not
explicitly cleared, the string's reference to the mark will
keep the mark from being GC-ed.  The result is a performance
degradation, since the string is burdened with longer and
longer lists of useless marks to keep updated.

One thing that would help here is a "weak set" construct,
which would hold all of the back pointers.  The weak set
would support looping over its members, but would not
prevent reclamation of garbage objects.

You can build a weak set out of a (:TEMPORARY T :TEST 'EQ) hash
table by storing the back pointers as keys.  The values stored
on the keys are immaterial; T would do fine!  The only use of
GETHASH would be to SETF in new back pointers; the MAPHASH
function would perform the looping function.

I'm not completely comfortable with using hash tables this way;
it seems as waste of half of the hash table functionality.
Perhaps intellectual integrity requires a separate notion of
weak set?  I believe the following intriguing hash table extension
lets us keep our integrity, without introducing weak sets per se:
	(MAKE-HASH-TABLE &KEY ... (TYPE T) ...)

The new keyword :TYPE declares the type of all values to be stored
in the hash table.  Certain implementations may be able to optimize
the storage of numeric, character, or boolean types.  In addition, if
TYPE is a singleton (e.g., NULL or (MEMBER T)), certain
implementations may be able to use a significantly more efficient
representation.

So:
   (defun make-weak-set () (make-hash-table :temporary t :type 'null))
   (defun weak-set-adjoin (x s) (setf (gethash x s) nil) s)
   (defun weak-set-delete (x s) (remhash x s) s)
   (defun weak-set-member (x s) (not (gethash x s t)))
   (defun map-weak-set (f s) (maphash #'(lambda (k v) v (funcall f k)) s))

(What about declaring key types too?  Well, that's another story.
Note the key type is closely tied to the choices of test and
hash functions.)


So, those are the uses of weak pointers I know about.  What
else is there?

(Hope this doesn't beat the issue to death...)

					-- John