LISP has both pure and reference data types



Tonight, while thinking about my work interfacing PSL to Fortran, and about
recent discussions of how in CL you aren't supposed to modify any structure
that was quoted in source code, it occurred to me that LISP in general has
two distinct classes of data, a point that hasn't been raised in any manual
I've read: (1) pure data, readonly: you are assured nobody will change it
out from under you, but to compute a new value you must allocate new memory;
(2) reference data, readwrite, designed to efficiently share side-effects
and/or to efficiently compute new values in place without having to allocate
a new place to put the newly-computed value. You can't mix the two. If you
try to modify a datum that somebody else expects to be pure, all hell breaks
loose: constants change out from under programs, or you get a memory exception
from trying to modify a readonly page. But if you go to the bother of making
reference data and then never modify it in place, instead copying the whole
thing to new memory every time you change something, your program runs
really slowly.
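
A minimal sketch of the distinction in CL, using hypothetical names
(*TEMPLATE*, BUMP-FIRST, BUMPED-TEMPLATE) that come from no manual or
implementation: the quoted list is treated as pure, and only a fresh copy
of it is modified in place.

```lisp
;; Hypothetical names throughout; a sketch, not any implementation's API.
(defparameter *template* '(0 0 0))  ; quoted literal: treat as pure data

(defun bump-first (cells)
  "Destructively increment the first element of CELLS (reference data)."
  (incf (first cells))
  cells)

(defun bumped-template ()
  "Copy the pure template into fresh memory, then modify the copy in place."
  (bump-first (copy-list *template*)))
```

Calling (BUMP-FIRST *TEMPLATE*) directly would be the forbidden mix: it
tries to modify a datum that the rest of the program expects to be constant.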

Examples of pure data are: integers and reals (and complexes and rationals
in CL), strings that are given by syntax "...", and any other expression
that is given in source code explicitly or generated by macro expansion.
Examples of reference data are: all arrays constructed at runtime (which
generally means *all* arrays, since most LISPs, CL's #(...) vector syntax
aside, don't have a syntax for giving arrays explicitly in source), any other
non-atomic object that is created at runtime, value cells, property-list
cells, function cells.

Perhaps LISP (read CL and PSL) should make the distinction explicit, perhaps
even having parallel datatypes for those which are supposed to be readonly
(written once when allocated, thence readonly until GC'd) and those which are
supposed to be modifiable. For example, you could create an integer cell
which had an initial value, but whose value could be changed out from under
the functions that referenced it. The side effects of changing that integer
could then be shared more efficiently than by sharing the name of a global
or lexical variable whose value cell was modified to point to different
integers, and you could safely pass that integer cell to FORTRAN or other
software that wanted to modify the value in place. You could also keep
around oldstyle integers that weren't supposed to be changed; any FORTRAN
interface or other software could automatically know to copy such an
integer into a reference-integer cell before proceeding with the code that
modified it, leaving the oldstyle integer unmunged. Having explicit data
types for pure-integer versus modifiable-integer would be cleaner than
trapping memory exceptions on attempts to modify a readonly page, and
implementable on more systems.
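
Here is a rough sketch of how such a pair of types might look in CL today.
INT-CELL, CALL-MODIFYING-ROUTINE and DOUBLE-IN-PLACE are hypothetical names
invented for illustration, and the "foreign" routine is just an ordinary
LISP function standing in for FORTRAN.

```lisp
;; Hypothetical sketch: a modifiable integer cell next to ordinary integers.
(defstruct (int-cell (:constructor make-int-cell (value)))
  (value 0 :type integer))

(defun call-modifying-routine (routine arg)
  "Call ROUTINE on a modifiable cell and return that cell.
An INT-CELL is passed through as-is, so its side effects are shared.
A plain (pure) integer is first copied into a fresh cell, so the
caller's integer is left untouched."
  (let ((cell (if (int-cell-p arg) arg (make-int-cell arg))))
    (funcall routine cell)
    cell))

(defun double-in-place (cell)
  "Stand-in for a foreign routine that modifies its argument in place."
  (setf (int-cell-value cell) (* 2 (int-cell-value cell))))
```

The interface decides by type whether a copy is needed, which is the part
that currently has to be tracked by hand.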

Meanwhile, it should be documented that LISP currently doesn't have such
distinct data types, either in the language or in most implementations, so
it's up to the user to keep track manually of which values are safe to change
and which aren't (and thus must be copied wholesale before being passed to
a routine which will modify them in place for efficiency).

Also, perhaps we should have some new functions that do arithmetic in place,
to support efficient arithmetic in cases where shared type-declared lexical
variables aren't efficient enough or aren't general enough?
Thus in addition to Value <- (+ A B C ... Z), which can generate reasonably
efficient code when all arguments are declared integer or real, we could
have (+InPlace Result A B C ... Z), which requires all arguments including
Result to be declared of the same type, and likewise for the other primitive
arithmetic functions. +InPlace would not have to do a number-cons, even
if Result was a place deep inside some s-expression that was assumed
(user's responsibility currently) to already contain a modifiable number
of the correct type.
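
One way +InPlace could be approximated in CL as it stands, assuming the
"modifiable number" is represented as a one-element specialized array
(FLOAT-CELL and MAKE-FLOAT-CELL are hypothetical names; whether the store
really avoids a number-cons depends on how the implementation represents
such arrays):

```lisp
;; Hypothetical sketch: modifiable double-floats as one-element arrays.
(deftype float-cell () '(simple-array double-float (1)))

(defun make-float-cell (x)
  "Allocate a fresh modifiable cell holding X as a double-float."
  (make-array 1 :element-type 'double-float
                :initial-element (coerce x 'double-float)))

(defun +in-place (result &rest cells)
  "Store the sum of the values in CELLS into RESULT and return RESULT.
All arguments, including RESULT, must be FLOAT-CELLs."
  (declare (type float-cell result))
  (let ((sum 0d0))
    (declare (type double-float sum))
    (dolist (c cells)
      (incf sum (aref (the float-cell c) 0)))
    ;; The sum is complete before the store, so RESULT may also be an input.
    (setf (aref result 0) sum)
    result))
```

For example, (+in-place r a b) overwrites the contents of R with A + B and
allocates no new cell for the answer.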
-------