[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

hash table types



    Date: Sun, 12 May 85 08:51:09 EST
    From: George J. Carrette <GJC@MIT-MC>

    We figured that fixing the root of the problem, the lack of
    canonicalization in the hardware database would possibly cost
    enough time to break a production development window and put
    off the introduction of a new hardware item for a few months,
    with possible losses in sales of a million dollars or more
    for the year. Given this sort of lossage mode one cant stress
    enough the importance of having *some* way of getting
    compatibility between RELEASE N and RELEASE N+1. Remember
    that at some time in "zetalisp" EQUAL on type STRING meant
    STRING-EQUAL, which was for better or worse case insensitive.
    LMI release 2 and presumably also Symbolics release 6 does
    away with this. If there is nothing available which behaves
    in exactly the same way as the *old* EQUAL-HASH-TABLE then
    people can be screwed badly. Remember also that many ugly
    things that were left in Common-Lisp from maclisp days (such
    as the empty list being a SYMBOL, making the set of all lists
    and all symbols have a non-null intersection) were justified
    soley on the basis that there was no *automatic* way for
    people to detect if they were depending on such lossage and
    convert their programs. To quote Dave Moon: "Our customers
    just wont stand for it."

For that exact reason, "Our customers just won't stand for it",
Release 6 is compatible with Release 5 w.r.t. EQUAL-HASH-TABLEs,
i.e., case is ignoreed.  Release 6 provides hints on how to write
code that will work in both Release 6 and Release 7, requiring
recompilation and possibly minimal, if any conversion.  In
Release 7 there will most likely be a CASE-INSENSITIVE-HASH-TABLE
that uses CASE-INSENSITIVE-EQUAL as the predicate and
CASE-INSENSITIVE-EQUAL-HASH (unless EQUAL-HASH will continue to
ignore case).

More generally, and on the generic issue: Symbolics has had
generic hash tables for quite a while.  I don't know if Lisp
Machine Lisp is general has.  The definer of the hash table is
required to write two methods, the equality predicate and the
hash function.  They are quite useful.  All implementations must
do a dispatch, since the entrypoints are the same for all types
of hash tables.  In a flavor-based implementation, the dispatch
happens at the message send level.  In other implementations, my
guess is that FUNCALL is used where the function comes from some
defstruct slot.  In any case, generalizing does not appear to be
a hard thing to implement.

The external things I see needing changing are
 * An additional argument to make-hash-table.  My vote for a name
   is :hash-function.
 * If the hash function is not provided, it defaults from the
   predicate. 
 * It is an error not to supply a hash function if the predicate
   is not one of those builtin (currently, EQ, EQL and EQUAL).
   [Implementatyions may supply more builtins and opt not to
   signal the error.]
 * Requirements of the hash function: It must return an integer,
   possibly restricted to fixnum, possibly required to be
   positive.  [The LispM builtin hash functions attempt to return
   positive fixnums that use as many bits as possible.  This is
   so that there is more uniform distribution after it is MODed
   with the hash table modulus.]
 * Hints on how to write implementation independent, efficient
   hash functions.  This is the hardest part.  Maybe there are
   enough constants defined in the language to make this possible
   (e.g., bits-per-fixnum, etc).  Hmmm... Many LispM hash
   functions use the functions ROT (better called ROTATE-FIXNUM)
   and LSH (better called LOGICAL-SHIFT-FIXNUM).  They often use
   LOGXOR, but this is part of CLtL.  I have a memory of being
   told that ROT and LSH were consciously not part of the CL
   language because they weren't mathematical.  With names like
   ROTATE-FIXNUM and LOGICAL-SHIFT-FIXNUM maybe they should be
   put back?  [Also see Turing's biography about the hardware
   designers diking out his boolean instructions for similar
   reasons.] 
 * The use of MAKNUM, %POINTER or whatever it is for a particular
   implementation is to be forbidden for user suplied hash
   functions.  This is because the GC can move objects and change
   their addresses, and there is no way in CL to communicate this
   between the hash table and the GC.  Individual implementations
   may wish to implement such a communication and allow %POINTER
   and use it for builtin hash tables to improve efficiency.

Conclusion: generic hash tables are useful and probably simple to
implement.  User supplied hash functions are required but
difficult.

Additionaly, I think there should be a CASE-INSENSITIVE-EQUAL
function and a corresponding (builtin) hash table type.  I have
an uneasy feeling about EQUALP hash tables.  I think the
uneasiness comes in the traversing of arbitrary array structure
that EQUALP implies and the possible inefficiency.  The example
needs I have so far seen for EQUALP are dealt with by
CASE-INSENSITIVE-EQUAL.