[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
EQUAL, and hash tables, and value/object distinctions
- To: jrose@Sun.COM, edsel!jonl@labrea.stanford.edu
- Subject: EQUAL, and hash tables, and value/object distinctions
- From: Michael Greenwald <Greenwald@STONY-BROOK.SCRC.Symbolics.COM>
- Date: Thu, 30 Jun 88 16:36 EDT
- Cc: goldman@vaxa.isi.edu, common-lisp@sail.stanford.edu
- In-reply-to: <8806301940.AA01804@lukasiewicz.sun.com>
Date: Thu, 30 Jun 88 12:40:54 PDT
From: jrose@Sun.COM (John Rose)
Two remarks on EQUAL and hash tables, from a hash table fan.
...
Date: Wed, 29 Jun 88 18:57:09 PDT
From: Jon L White <edsel!jonl@labrea.stanford.edu>
As you pointed out in subsequent discussion, a user-supplied EQUAL method,
whether for CLOS classes or as a defstruct option, leaves open the question
of mechanically verifying that the alleged predicate is reflexive, symmetric,
and transitive. Another major problem with using hash-tables on objects
with non-default EQUAL methods is that the SXHASH function must be similarly
extended for these data types; SXHASH must obey the property that
(equal x y) imples (= (sxhash x) (sxhash x))
See CLtL p285. At one time, there was a suggestion that hash tables admit
a user-supplied equivalence predicate; but it never got very far for just
this reason, that the equivalence predicate must be *** paired up** with an
appropriate "numericalizer" function, and many implementors felt that users
wouldn't understand these issues and wouldn't be able to construct up their
own versions of sxhash to go with their alleged equivalence predicates.
Yuck. Who are these brilliant Implementors, that we may all worship
at their feet? For they are the curators of such profound knowledge as
(EQUALITY-PREDICATE X Y) ==> (= (HASH-FN X) (HASH-FN Y))
Yes, but the equality-predicate must be transitive over the domain of
possible keys in order for a hash-function to work.
For example, one cannot construct a hash-function for #'= unless you limit
the valid keys to not include both a floating point number and two
integers that both coerce to that same float.
(i.e.
(let* ((a (expt 2 56))
(b (1+ a))
(c (float a)))
(values (= a b) (= a c) (= b c)))
=> NIL T T
)
I'm not voting for condescension, nor for protecting programmers from
themselves. I just want to point out that it is slightly more
complicated than you make it seem.
My personal opinion is that CLtL should have allowed both the predicate
and hash-function to be defined once it included hash tables in the
language spec. In a real application the programmer will probably want
to construct a custom hash function anyway, even for :TEST 'EQ. If the
table has a known, or typical, set of keys, that might allow a much more
efficient, or more collision-free, hash function.
Hash tables aren't so complicated that one can't construct them
yourself. My guess (and only a guess) is that the only reason they're
included in the language spec is to give a portable way of hashing on
%pointer where that might be useful. SXHASH would have been enough, and
that allows easy implementation of the common case of EQUAL hash tables,
too.
The minimalist in me still wonders why hash tables were included in
CLtL.
Seriously, I hear an alarming amount of condescension in the above reasoning.
The above logical implication is ALL THAT IS NEEDED to construct an
appropriate hash function. It is the sole and fundamental insight which
makes hash tables work; I have relied on it for years, building many a
hash table in various extensible frameworks. What's the problem of
requiring a hash-table builder from promising that his hash and equality
functions bear the simple relation to each other? And if he can't
handle that, what chance does he have of building a working stream type,
or making sense out of CLOS? (Note that many vendors have supplied more
or less hairy stream definition facilities, which usually require
invariants much more complex than the hash invariant above.)