[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Arrays and vectors (again)



Several people have stated that they dislike my earlier proposal because
it uses the good names (VECTOR, STRING, BIT-VECTOR, VREF, CHAR, BIT) on
general 1-D arrays, and makes the user say "simple" when he wants one of
the more specialized high-efficiency versions.  This makes extra work
for users, who will want simple vectors at least 95% of the time.  In
addition, there is the argument that simple vectors should be thought of
as a first-class data-type (in implementations that provide them) and
not as a mere degenerate form of array.

Just to see what it looks like, I have re-worked the earlier proposal to
give the good names to the simple forms.  This does not really eliminate
any of the classes in the earlier proposal, since each of those classes
had some attributes or operations that distinguished it from the others.

Since there are getting to be a lot of proposals around, we need some
nomencalture for future discussions.  My first attempt, with the
user-settable :PRINT option should be called the "print-switch"
proposal; the next one, with the heavy use of the :SIMPLE switch should
be the "simple-switch" proposal; this one can be called the "RPG
memorial" proposal.  Let me know what you think about this vs. the
simple-switch version -- I can live with either, but I really would like
to nail this down pretty soon so that we can get on with the
implementation.

**********************************************************************

Arrays can be 1-D or multi-D.  All arrays can be created by MAKE-ARRAY
and can be accessed with AREF.  Storage is done via SETF of an AREF.
1-D arrays are special, in that they are also of type SEQUENCE, and can
be referenced by ELT.  Also, only 1-D arrays can have fill pointers.

Some implementations may provide a special, highly efficient
representation for simple 1-D arrays, which will be of type VECTOR.  A
vector is 1-dimensional, cannot have a fill pointer, cannot be
displaced, and cannot be altered in size after its creation.  To get a
vector, you use the :VECTOR keyword to MAKE-ARRAY with a non-null value.
If there are any conflicting options specified, an error is signalled.
The MAKE-VECTOR form is equivalent to MAKE-ARRAY with :VECTOR T.

A STRING is a VECTOR whose element-type (specified by the :ELEMENT-TYPE
keyword) is STRING-CHAR.  Strings are special in that they print using
the "..." syntax, and they are legal inputs to a class of "string
functions".  Actually, these functions accept any 1-D array whose
element type is STRING-CHAR.  This more general class is called a
CHAR-SEQUENCE. 

A BIT-VECTOR is a VECTOR whose element-type is BIT, alias (MOD 2).
Bit-vectors are special in that they print using the #*... syntax, and
they are legal inputs to a class of boolean bit-vector functions.
Actually, these functions accept any 1-D array whose element-type is
BIT.  This more general class is called a BIT-SEQUENCE.

All arrays can be referenced via AREF, but in some implementations
additional efficiency can be obtained by declaring certain objects to be
vectors, strings, or bit-vectors.  This can be done by normal
type-declarations or by special accessing forms.  The form (VREF v n) is
equivalent to (AREF (THE VECTOR v) n).  The form (CHAR s n) is
equivalent to (AREF (THE STRING s) n).  The form (BIT b n) is equivalent
to (AREF (THE BIT-VECTOR b) n).

If an implementation does not support vectors, the :VECTOR keyword is
ignored except that the error is still signalled on inconsistent cases;
The additional restrictions on vectors are not enforced.  MAKE-VECTOR is
treated just like the equivalent make-array.  VECTORP is true of every
1-D array, STRINGP of every CHAR-SEQUENCE, and BIT-VECTOR of every
BIT-SEQUENCE.

CHAR-SEQUENCEs, including strings, self-eval; all other arrays cause an
error when passed to EVAL.  EQUAL descends into CHAR-SEQUENCEs, but not into
any other arrays.  EQUALP descends into arrays of all kinds, comparing
the corresponding elements with EQUALP.  EQUALP is false if the array
dimensions are not the same, but it is not sensitive to the element-type
of the array, whether it is a vector, etc.  In comparing the dimensions of
vectors, EQUALP uses the length from 0 to the fill pointer; it does not
look at any elements beyond the fill pointer.

The set of type-specifiers required for all of this is ARRAY, VECTOR,
STRING, BIT-VECTOR, SEQUENCE, CHAR-SEQUENCE, and BIT-SEQUENCE.
Each of these has a corresponding type-P predicate, and each can be
specified in list from, along with the element-type and dimension(s).

MAKE-ARRAY takes the following keywords: :ELEMENT-TYPE, :INITIAL-VALUE,
:INITIAL-CONTENTS, :FILL-POINTER, :DISPLACED-TO, :DISPLACED-INDEX-OFFSET,
and :VECTOR.

The following functions are redundant, but should be retained for
clarity and emphasis in code: MAKE-VECTOR, MAKE-STRING, MAKE-BIT-VECTOR.
MAKE-VECTOR takes a single length argument, along with :ELEMENT-TYPE,
:INITIAL-VALUE, and :INITIAL-CONTENTS.  MAKE-STRING and MAKE-BIT-VECTOR
are like MAKE-VECTOR, but do not take the :ELEMENT-TYPE keyword, since
the element-type is implicit.

If the :VECTOR keyword is not specified to MAKE-ARRAY or related forms,
the default is NIL.  However, sequences produced by random forms such as
CONCATENATE are vectors.

Strings always are printed using the "..." syntax.  Bit-vectors always
are printed using the #*... syntax.  Other vectors always print using
the #(...) syntax.  Note that in the latter case, any element-type
restriction is lost upon readin, since this form always produces a
vector of type T when it is read.  However, the new vector will be
EQUALP to the old one.  The #(...) syntax observes PRINLEVEL,
PRINLENGTH, and SUPPRESS-ARRAY-PRINTING.  The latter switch, if non-NIL,
causes the array to print in a non-readable form: #<ARRAY...>.

CHAR-SEQUENCEs print out as though they were strings, using the "..."
syntax.  BIT-SEQUENCES print out as BIT-STRINGS, using the #*... syntax.
All other arrays print out using the #nA(...) syntax, where n is the
number of dimensions and the list is actually a list of lists of lists,
nested n levels deep.  The array elements appear at the lowest level.
The #A syntax also observes PRINLEVEL, PRINLENGTH, and
SUPPRESS-ARRAY-PRINTING.  The #A format reads in as a non-displaced
array of element-type T.

Note that when an array is printed and read back in, the new version is
EQUALP to the original, but some information about the original is lost:
whether the original was a vector or not, element type restrictions,
whether the array was displaced, whether there was a fill pointer, and
the identity of any elements beyond the fill-pointer.  This choice was
made to favor ease of interactive use; if the user really wants to
preserve in printed form some complex data structure containing more
complex arrays, he will have to develop his own print format and printer.