[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

constant folding/smashing



A flurry of messages was generated last week (on another mailing list)
by a poor common lisp programmer whose program
apparently quit working (the way he expected) due to a change in the way
the compiler dealt with quoted lists.  The user was relying on two
properties of the old compiler:  
  i) he relied on a particular result of destructively modifying
     elements of a quoted list.

  ii) he relied on the compiler keeping distinct lexical occurrences of
      quoted conses as distinct (non-EQ) values.  The compiler had 
      been modified to collapse certain EQUAL quoted lists so that they
      became EQ.  One could no longer write
      (let ((unique-value-one '(nil))
	    (unique-value-two '(nil)))
	...)
      and use EQ to distinguish two values.

The responses correctly suggested that programs which relied on the former
compiler treatment could not be expected to be portable(across common lisp
implementations or across releases).  But it should be noted that the
reason for this is NOT that the "semantics of Common Lisp " makes that
clear, or that the particular vendor's lisp documentation gave any
indication that this programming practice was unsafe.
CLtL does not make it at all clear what a program is deemed to
be saying when it uses DEFCONSTANT or QUOTE.  The decisions made by
compiler writers seem to be based on years of lore about how these
constructs "should" be used, rather than on any specification of what they
mean.

For example, it is plausible that if I write
		      (let ((c1 '((a b) (c d))))
	                 <body>)

a compiler may "simplify" (CAAR c1) to 'a.  (i.e., if the value of a
constant is a list, its CARs/CDRs, arbitrarily deep, are not supposed to
be altered.)  But, as far as anything I can find WRITTEN DOWN goes, it is
just as PLAUSIBLE that if I write
			    (let ((c2 'X)) <body>)
I am declaring the PROPERTY-LIST of the symbol X to be unchanging, although
I doubt that anyone has yet employed that assumption in a compiler.

As correctly pointed out in one of the responses, the question of what 
comprises the boundary of "constantness" of constants is distinct
from the issue of "constant collapsing".  The latter is a question of
what EQUIVALENCE predicates can be imposed on lexically distinct constants
in a program.  Common Lisp provides 4 universal equivalence predicates
(EQ, EQL, EQUAL, and EQUALP), as well as some type-specific equivalences
(=, char=, string=).   The only reference to this issue I could find in
CLtL is a mention, under DEFCONSTANT, that an implementation may replace
lexically distinct references to a constant with EQL copies of its value
(an  authorization to actually use a WEAKER equivalence than one might
expect!)  This certainly leaves many other transformations in limbo as to 
legitimacy.

I believe that it is NOT acceptable for these issues of program meaning to
be undefined to the extent they currently are.

Following is a cut at a proposal for clarifying these issues:

I. The meaning of a constant is based solely on the S-expression 
representation of a program, not on its textual representation.
I doubt this controversial, but it does mean one can't rely on intuitions
about looking at the "source code" that appears inside a QUOTE.
[It does not matter if a constant is introduced by
			     '(a b #,(foo) c)
as opposed to
		  '(a b #,(make-myrecord :f1 1 :f2 2) c)
or
		     '(a b #s<myrecord :f1 1 :f2 2> c)
]

II.  I would define a new equivalence predicate, SAME-BOUNDARY.
SAME-BOUNDARY would be specified for each datatype in the language.  For
some, it could be defined in terms of existing equivalence predicates
(e.g., for integers, it would be EQ.)    For other datatypes of the
language, certain accessors are specified to define the boundary.  E.g.,
CAR/CDR for CONSes, SYMBOL-PNAME and SYMBOL-PACKAGE for SYMBOLs, all fields
for untyped defstructs, no accessors for PACKAGEs...
  SAME-BOUNDARY would then be recursively defined in the "obvious" way.

[Perhaps EQUALP could be clarified to be this, or perhaps its use is
already too wide spread do do that.
There should also be a version of COPY that yields a SAME-BOUNDARY copy.]

IT IS AN ERROR for a program to modify any part of a constant to a value
that is not in the same SAME-BOUNDARY equivalence class as its original
value.  An implemenation may replace a constant at any time with
a SAME-BOUNDARY copy; programs may thus rely on "pieces" of constants
only up to SAME-BOUNDARY equivalence.

  An implementation may introduce declarations or proclamations that
EXTEND the set of accessors defining the boundary of constanthood.

III.  Lexically distinct references to a "variable" declared by defconstant
may be replaced by SAME-BOUNDARY copies.

IV.  Lexically distinct occurrences of constant expressions
-- (QUOTE x), or self-evaluating expressions -- may be replaced
by SAME-BOUNDARY copies.

V.  Lexically distinct occurrences of constant expressions
may NOT be collapsed so that an equivalence predicate  P holds
between them provided P is required by Common Lisp to be STRONGER
than SAME-BOUNDARY over the datatype of the expressions.
[E.g., an implementation may collapse distinct integer constants
to become EQ because
  i.) SAME-BOUNDARY is defined as EQL over the integers, and
 ii.) CL does not require EQ to be stronger than EQL over the integers.
]