
10-May-85 0146 hoey@nrl-aic Re: EQUALP hash tables (and more ...)



Received: from NRL-AIC.ARPA by SU-AI.ARPA with TCP; 10 May 85  01:46:39 PDT
Date: Fri, 10 May 1985 07:54:00 -0000
From: Dan Hoey <hoey@nrl-aic.ARPA>
Subject: Re: EQUALP hash tables (and more ...)
To: Jon White <JLW@SU-AI.ARPA>
Cc: Common-Lisp@SU-AI.ARPA
Message-Id: <484559691/hoey@nrl-aic>

    Date: 09 May 85  2356 PDT
    From: Jon White <JLW@SU-AI.ARPA>

    ... a more generalized form of hashing ... two more "properties" of
    hashtables -- the "equivalencing" function, and the "numericalizing"
    function....

    For "equivalence" function, the user may supply #'eq, #'string=, or
    even something random like #'=-mod-5;

Bravo!  If CL is to have hash tables, it should have these.  The
existing lack of this functionality is embarrassing in an otherwise
orthogonal, extensible language.
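
Concretely, one might imagine an interface along these lines; the
:hash-function keyword and these argument conventions are purely
illustrative -- nothing of the sort is in the CLM:

(make-hash-table :test #'string=            ; the "equivalencing" function
                 :hash-function #'sxhash)   ; the "numericalizing" function

;; JLW's =-mod-5 example fits the same mold, provided the hash
;; function respects the equivalence it is paired with:
(make-hash-table :test #'(lambda (x y) (= (mod x 5) (mod y 5)))
                 :hash-function #'(lambda (n) (mod n 5)))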

    ... the "numericalizing" function is simply how to reduce a general
    s-expression into a reasonable sized fixnum....

I would leave off ``reasonable sized''; certainly gethash is capable of
hashing a fixnum value to its favorite range.
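
That is, the table can always fold a full-range fixnum down to its
own size.  A hedged sketch, with internals that are entirely
hypothetical:

(defstruct htab                          ; hypothetical table internals
  hash-function
  (buckets (make-array 97 :initial-element nil)))

(defun bucket-index (table key)
  ;; Fold the user's full-range fixnum into the table's own range.
  (mod (funcall (htab-hash-function table) key)
       (length (htab-buckets table))))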

    I [believe] hashing on EQ means using the pointer address...
    but nothing in the CLM would prevent an implementation from using sxhash
    instead.

Two problems with this.  First, as you note, the key could get
modified, and a modified key no longer hashes to its original bucket.
I'm not sure you noticed, but this means that in order for GETHASH to
fail reliably, it would have to examine the whole table for modified
structures.  Second, I think SXHASH might not terminate on circular
structures, though this caveat is not in the manual (and it, or its
denial, should be).
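
To spell out the first problem: after mutation the key is still EQ to
itself, but its SXHASH typically changes, so an EQ table that hashed
via SXHASH would lose track of it.  A small demonstration (hedged --
an implementation is free to make SXHASH insensitive to this):

(let ((key (list 'a 'b)))
  (let ((before (sxhash key)))
    (setf (second key) 'c)
    (values (eq key key)                ; => T   (same object)
            (= before (sxhash key)))))  ; => NIL (hash changed, typically)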

Dan

Subject: SXHASH on circular structures?
From: Jon White <JLW@SU-AI.ARPA>
In-reply-to: your message of 10-May-85  3:54:51 EDT

Yes, one might implement SXHASH in a manner that would loop on circular
structures, but there is no requirement to do so.  In fact, the CLM is
so vague about sxhash that (defun sxhash (x) 1) is a perfectly
legitimate definition!

I think there has been a common misconception that sxhash must
descend into every node of a structure; indeed, the first
implementation, in pdp10 MacLisp over a decade ago, did just that
(and I'm no doubt the guilty party).  But more recently, I (and
others) have experimented with definitions which are depth-limited.
[Similarly, during the NIL development, both NIL and pdp10 MacLisp
got depth-limited definitions for a number of functions, solely as a
(slow) way to stop unbounded recursion; this was done primarily as a
low-overhead, fail-safe way to detect circularities.  However, a
fairly large limit has to be allowed, and thus this idea has a
loophole for "blam" lists given to EQUAL; but it effectively, and
fairly quickly, stops circularities in LENGTH, NTH, and a few
others.]
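
For concreteness, a depth-limited hash in this spirit is only a few
lines.  This sketch is mine, not a transcription of any of the
implementations mentioned:

(defun limited-sxhash (x &optional (depth 5))
  ;; DEPTH shrinks on every recursive step, so even circular
  ;; structure bottoms out instead of looping.
  (cond ((not (consp x)) (sxhash x))   ; leaves: defer to the system hash
        ((zerop depth) 0)              ; cutoff: deeper structure is ignored
        (t (logand (+ (* 31 (limited-sxhash (car x) (1- depth)))
                      (limited-sxhash (cdr x) (1- depth)))
                   most-positive-fixnum))))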

But the purpose of SXHASH is to provide a user interface to the
"numericalizing" function needed for hash arrays; there is no need to
make it more discriminating than is useful for providing a moderately
random distribution of collision chains in hash tables.  SXHASH is a
many-to-one mapping, and there is no reason to assume that,
statistically speaking, the depth cutoff is more damaging than the
accidental collisions within a particular dataset -- one might think
of this "accidental coinciding" as the inherent noise in the
information of the problem.

Of course, too small a limit on the depth would certainly be
noticeable in the *average* speed of hash-table operations, since the
collision chains would grow; but exactly where that boundary lies may
be difficult to determine.  [I wouldn't seriously suggest using a
constant function, or even one that merely discriminated on the data
type -- but can anyone seriously suggest that depth 1000 will be
statistically different from depth 100?  I would like to know.]
For some months now, Lucid Common Lisp has been running with variations on
the limited-depth sxhash, and not only does it meet the CLM requirements,
but it seems to meet our day-to-day needs.

-- JonL --