[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

arrays and vectors (long carefully-thought-out message)



I prefer the "simple switch" to the "RPG memorial" proposal, with one
modification to be found below.  The reason for this preference is that
it makes the "good" name, STRING for example, refer to the general class
of objects, relegating the efficiency decision to a modifier ("simple").
The alternative makes the efficiency issue too visible to the casual user,
in my opinion.  You have to always be thinking "do I only want this to
work for efficient strings, which are called strings, or should it work
for all kinds of strings, which are called arrays of characters?".
Better to say, "well this works for strings, and hmm, is it worth
restricting it to simple-strings to squeeze out maximal efficiency"?

Lest this seem like I am trying to sabotage the efficiency of Lisp
implementations that are stuck with "stock" hardware, consider the
following:

In the simple switch proposal, how is (MAKE-ARRAY 100) different from
(MAKE-ARRAY 100 :SIMPLE T)?  In fact, there is only one difference--it is
an error to use ADJUST-ARRAY-SIZE on the latter array, but not on the
former.  Except for this, simpleness consists, simply, of the absence of
options.  This suggests to me that the :SIMPLE option be flushed, and
instead a :ADJUSTABLE-SIZE option be added (see, I pronounce the colons).
Even on the Lisp machine, where :ADJUSTABLE-SIZE makes no difference, I
think it would be an improvement, merely for documentation purposes.  Now
everything makes sense: if you don't ask for any special features in your
arrays, you get simple ones, which is consistent with the behavior of the
sequence functions returning simple arrays always.  And if some
implementation decides they need the sequence functions to return
non-simple arrays, they can always add additional keywords to them to so
specify.  The only time you need to know about the word "simple" at all is
if you are making type declarations for efficiency, in which case you have
to decide whether to declare something to be a STRING or a SIMPLE-STRING.
And it makes sense that the more restrictive declaration be a longer word.
This also meets RPG's objection, which I think boils down to the fact
that he thought it was stupid to have :SIMPLE T all over his programs.
He was right.

I'm fairly sure that I don't understand the portability issues that KMP
brought up (I don't have a whole lot of time to devote to this).  But I
think that in my proposal STRINGP and SIMPLE-STRINGP are never the same
in any implementation; for instance, in the Lisp machine STRINGP is true
of all strings, while SIMPLE-STRINGP is only true of those that do not
have fill-pointers.  If we want to legislate that the :ADJUSTABLE-SIZE
option is guaranteed to turn off SIMPLE-STRINGP, I expect I can dig up
a bit somewhere to remember the value of the option.  This would in fact
mean that simple-ness is a completely implementation-independent concept,
and the only implementation-dependence is how much (if any) efficiency
you gain by using it, and how much of that efficiency you get for free
and how much you get only if you make declarations.

Perhaps the last sentence isn't obvious to everyone.  On the LM-2 Lisp
machine, a simple string is faster than a non-simple string for many
operations.  This speed-up happens regardless of declarations; it is a
result of a run-time dispatch to either fast microcode or slow microcode.
On the VAX with a dumb compiler and no tuning, a simple string is only
faster if you make declarations.  On the VAX with a dumb compiler but some
obvious tuning of sequence and string primitives to move type checks out of
inner loops (making multiple copies of the inner loop), simple strings are
faster for these operations, but still slow for AREF unless you make a type
declaration.  On the VAX with a medium-smart compiler that does the same
sort of tuning on user functions, simple strings are faster for user
functions, too, if you only declare (OPTIMIZE SPEED) [assuming that the
compiler prefers space over speed by default, which is the right choice in
most implementations], and save space as well as time if you go whole hog
and make a type declaration.  On the 3600 Lisp machine, you have sort of a
combination of the first case and the last case.

I also support the #* syntax for bit vectors, rather than the #" syntax.
It's probably mere temporal accident that the simple switch proposal
uses #" while the RPG memorial proposal uses #*.

To sum up:

A vector is a 1-dimensional array.  It prints as #(foo bar) or #<array...>
depending on the value of a switch.

A string is a vector of characters.  It always prints as "foo".  Unlike
all other arrays, strings self-evaluate and are compared by EQUAL.

A bit-vector is a vector of bits.  It always prints as #*101.  Since as
far as I can tell these are redundant with integers, perhaps like integers
they should self-evaluate and be compared by EQUAL.  I don't care.

A simple-vector, simple-string, or simple-bit-vector is one of the above
with none of the following MAKE-ARRAY (or MAKE-STRING) options specified:

	:FILL-POINTER
	:ADJUSTABLE-SIZE
	:DISPLACED-TO
	:LEADER-LENGTH, :LEADER-LIST (in implementations that offer them)

There are type names and predicates for the three simple array types.  In
some implementations using the type declaration gets you more efficient
code that only works for that simple type, which is why these are in the
language at all.  There are no user-visible distinctions associated with
simpleness other than those implied by the absence of the above MAKE-ARRAY
options.