[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

*To*: edsel!jonl@labrea.stanford.edu*Subject*: EQUAL, and hash tables, and value/object distinctions*From*: jrose@Sun.COM (John Rose)*Date*: Thu, 30 Jun 88 12:40:54 PDT*Cc*: goldman@vaxa.isi.edu, common-lisp@sail.stanford.edu*In-reply-to*: Jon L White's message of Wed, 29 Jun 88 18:57:09 PDT <8806300157.AA15565@bhopal.lucid.com>

Two remarks on EQUAL and hash tables, from a hash table fan. First, if EQUAL is going to descend structures and arrays to test for isomorphism, it should compare hash tables for equality according to their contents. That is, it should verify that two possibly-equal hash tables have the same set of keys, and that for each key the two stored values are EQUAL. Key equality testing must be done using each table's key comparison function. [Rationale & discussion are below, on the next page. But first I've got to respond to something else.] Date: Wed, 29 Jun 88 18:57:09 PDT From: Jon L White <edsel!jonl@labrea.stanford.edu> From: goldman@vaxa.isi.edu Date: Fri, 24 Jun 88 18:15:43 PDT ... If programmers to extend EQUAL in CLOS style -- which means with arbitrary procedural definitions for their new classes -- could EQUAL hash tables be implemented efficiently? How would they hash values belonging to the new classes? As you pointed out in subsequent discussion, a user-supplied EQUAL method, whether for CLOS classes or as a defstruct option, leaves open the question of mechanically verifying that the alleged predicate is reflexive, symmetric, and transitive. Another major problem with using hash-tables on objects with non-default EQUAL methods is that the SXHASH function must be similarly extended for these data types; SXHASH must obey the property that (equal x y) imples (= (sxhash x) (sxhash x)) See CLtL p285. At one time, there was a suggestion that hash tables admit a user-supplied equivalence predicate; but it never got very far for just this reason, that the equivalence predicate must be *** paired up** with an appropriate "numericalizer" function, and many implementors felt that users wouldn't understand these issues and wouldn't be able to construct up their own versions of sxhash to go with their alleged equivalence predicates. Yuck. Who are these brilliant Implementors, that we may all worship at their feet? For they are the curators of such profound knowledge as (EQUALITY-PREDICATE X Y) ==> (= (HASH-FN X) (HASH-FN Y)) If we mere Users cannot be trusted with such wisdom, how are we to navigate the Streams of Extensibility, or handle the Objects of Genericity? Oh great Implementors, save us, we pray, from overmuch functionality. Seriously, I hear an alarming amount of condescension in the above reasoning. The above logical implication is ALL THAT IS NEEDED to construct an appropriate hash function. It is the sole and fundamental insight which makes hash tables work; I have relied on it for years, building many a hash table in various extensible frameworks. What's the problem of requiring a hash-table builder from promising that his hash and equality functions bear the simple relation to each other? And if he can't handle that, what chance does he have of building a working stream type, or making sense out of CLOS? (Note that many vendors have supplied more or less hairy stream definition facilities, which usually require invariants much more complex than the hash invariant above.) As for "mechanical verification" of the hashing and equality invariants, that's a red herring. Since when do we mechanically verify any of the required properties of functional arguments to Lisp primitives? (I'm thinking particularly of the :KEY and :TEST arguments to sequence functions.) Just say that unless the supplied functions satisfy the required conditions "it is an error". The "pairing up" of equality and hashing could be done cleanly in CLOS by defining generic equality and hashing methods for particular hash table types. [SXHASH is required to be "good enough" for the EQUAL equivalence releation, and for subsets of that relation such as STRING=, but it is unlikely that any vendor makes it "good enough" for, say, EQUALP. Lucid has EQUALP hash tables, but they don't use SXHASH.] -- JonL -- ! Proposal: EQUAL should check for isomorphism of hash tables. The effect of this rule is to treat hash tables like lists, structures, and arrays for equality testing, under the following reasonable theory of the semantics of lists, structures, arrays, and hash tables: The meaning of any such data structure is determined by a small set of (call them) "access keys" and a value stored under each access key. The set of access keys for each kind of data structure is {CAR,CDR}, a set of slot access functions, an initial setquence of non-negative integers, and a set of Lisp objects, respectively. Other than the differences in these "access keys", all four of the above data structures are tuple types, in that they store a finite set of values, each accessible under its own key. An equality function which performs an isomorphism check on any of those tuple types should perform it on all of them, or have a very specific reason not to. If two hash tables differ not in their contents but in their implementation parameters (such as size or equality function), we must choose whether to compare structure abstractly (looking only at contents) or concretely (looking also at implementation parameters). This question comes up with respect to array structural equality also. For example, do we allow a fill pointer to define an array's length (i.e., its "access key" set), or do we look at the length of the underlying allocated storage? What about comparing an (array t) having a fill pointer, with a string? I believe the people working on this problem will come to a more or less abstract viewpoint, and if so, that viewpoint should be applied consistently to hash tables. An interesting question arises when two hash tables have different key equality predicates, and (suppose) we've already decided to treat equality predicates as implementation details, and ignore them directly. Here's one answer: Declare two hash tables EQUAL if they include each other. That is, for every key K in hash table A, require that (EQUAL (GETHASH A K) (GETHASH B K *UNIQUE-UNKNOWN-VALUE*)) and the same for every key in table B. Possible specific reasons not to test for hash table isomorphism, with comments or rebuttals in brackets: (1) It's too hard to implement. [No, it's actually pretty easy to implement this test portably using a MAPHASH over each table; each MAPHASH tests one direction of inclusion by performing GETHASHs.] (2) It's too slow to implement. [Any recursive descent takes a long time to do. But hash tables can probably be made faster to compare than corresponding CONS structures, because they can use saved key hash values to advantages. A hash table's key set can be hashed by combining all its key hash values.] (3) Recursive descent into hash tables is too unexpected. [For programmers unfamiliar with hash tables, many of their properties are going to be pretty surprising. Programmers who understand their nature and use are also going to understand their correspondence with lists, structures, and (especially) arrays; it is these programmers whose expectations we should design for.] (4) Hash tables are different because we expect frequent side effects on them, and so momentary isomorphism between two of them is not significant. [This is the best argument against, I think. See below.] About (4): Cons cells, structures, arrays, and hash tables can all be used either as pure values (that is, no updates after initial construction, and object identity is not important) or as updatable objects (updates are expected, and shared object identities are crucial). Under current Lisp practice, hash tables are rarely used as pure values. (Cons cells usually are, and structures and arrays occasionally are.) We may call types which are used as pure values "value-like" and types which are used as shared objects "object-like". Notice that EQUAL and EQL differ in their theory of cons cells: EQUAL treats them as pure values, and EQL as sharable objects. I believe a consistent way of accounting for this is to say that the behavior of EQUAL and EQL differ for types which have this dual nature, of value-like and object-like. (For types which are only value-like, such as complex numbers, EQL compares substructures, just as EQUAL does. For types which are only object-like, such as streams, EQUAL compares object identity, just as EQL does.) So, if you believe all that about EQUAL and EQL, the question of EQUAL on hash tables gets down to this: Are hash tables so object-like that no one is likely to think of them as pure values? Consider this: Hash tables can be used to implement sets, if you need to optimize the speed of the set-inclusion operator. Sets are value-like types. Hash tables can be used to implement finite functions; if they themselves are hashable and comparable, one can build higher-order functional systems out of them (and get memoization as a bonus). -- John

**Follow-Ups**:**EQUAL, and hash tables, and value/object distinctions***From:*Michael Greenwald <Greenwald@STONY-BROOK.SCRC.Symbolics.COM>

**EQUAL, and hash tables, and value/object distinctions***From:*Jon L White <edsel!jonl@labrea.stanford.edu>

**References**:**[Re: EQUAL]***From:*Jon L White <edsel!jonl@labrea.stanford.edu>

- Prev by Date:
**[Re: EQUAL]** - Next by Date:
**EQUAL, and hash tables, and value/object distinctions** - Previous by thread:
**[Re: EQUAL]** - Next by thread:
**EQUAL, and hash tables, and value/object distinctions** - Index(es):