READ and "illegal" characters
- To: SEB1525@draper.com, common-lisp@SAIL.STANFORD.EDU
- Subject: READ and "illegal" characters
- From: Michael Greenwald <Greenwald@STONY-BROOK.SCRC.Symbolics.COM>
- Date: Tue, 30 Aug 88 12:46 EDT
- In-reply-to: The message of 25 Aug 88 13:36 EDT from "Steve Bacher (Batchman)" <SEB1525@draper.com>
> Date: Thu, 25 Aug 88 13:36 EDT
> From: "Steve Bacher (Batchman)" <SEB1525@draper.com>
> Taking the description of the CL reader at face value, I infer that an
> "illegal" character may occur in a symbol name if it is preceded by a
> backslash ("single escape"), but not if it occurs inside a pair of
> vertical bars ("multiple escape"). This seems strange. Is it merely
> an oversight, or is it intentional?
There appear to be two kinds of "illegal" character: (a) a character
whose syntactic type is "illegal", and (b) a constituent character with
the "illegal" attribute.
The manual specifies only illegal characters of type (b). Since a
programmer cannot directly set the syntactic type of a character (it
can only be copied from another character with set-syntax-from-char),
it is up to implementors to allow or disallow an "illegal" syntactic
type (type (a) illegal characters) in their implementations.
Step 9 on page 337 specifically says that the reader performs "one of
the following actions" according to the character's >syntactic< type.
If you want multiple escapes to behave identically to single escapes,
you can choose to give no character the "illegal" syntactic type (make
all such characters whitespace, or constituents with the "illegal"
attribute). This sidesteps, rather than settles, the question of whether
CLtL means to treat single and multiple escapes differently, but it does
mean that the question isn't >significant<.
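The asymmetry in question, and the workaround of giving no character the "illegal" syntactic type, can be seen in a toy model of the token-accumulation steps. The following Python sketch is hypothetical: the names `Syntax`, `read_token`, and `ReaderError`, the dictionary readtable, and the treatment of characters missing from it as constituents are assumptions for illustration, not CLtL's or any implementation's interface. It omits case conversion and macro-character dispatch, and it assumes that a constituent with the "illegal" attribute is rejected only when it appears unescaped.

```python
from enum import Enum, auto


class Syntax(Enum):
    """Syntactic types a character can have in this toy readtable."""
    CONSTITUENT = auto()
    WHITESPACE = auto()
    TERMINATING_MACRO = auto()
    NON_TERMINATING_MACRO = auto()
    SINGLE_ESCAPE = auto()
    MULTIPLE_ESCAPE = auto()
    ILLEGAL = auto()  # type (a): an "illegal" syntactic type


class ReaderError(Exception):
    pass


def read_token(text, readtable, illegal_attr=frozenset(),
               default=Syntax.CONSTITUENT):
    """Accumulate one token from the start of text (a simplified model of
    steps 8 and 9; assumes text begins at a token).

    readtable maps characters to Syntax; characters missing from it get
    `default` (hypothetically, constituent).  illegal_attr holds constituent
    characters with the "illegal" attribute (type (b)).
    Returns (token, remaining_text).
    """
    buf = []
    i = 0
    in_bars = False  # True while an odd number of multiple escapes has been seen
    while i < len(text):
        ch = text[i]
        kind = readtable.get(ch, default)
        if kind is Syntax.ILLEGAL:
            # Type (a): an error both inside and outside vertical bars.
            raise ReaderError(f"illegal character {ch!r}")
        if kind is Syntax.SINGLE_ESCAPE:
            # The next character is taken literally, whatever its syntax type.
            if i + 1 >= len(text):
                raise ReaderError("end of input after single escape")
            buf.append(text[i + 1])
            i += 2
            continue
        if kind is Syntax.MULTIPLE_ESCAPE:
            in_bars = not in_bars
            i += 1
            continue
        if not in_bars:
            if kind in (Syntax.WHITESPACE, Syntax.TERMINATING_MACRO):
                break  # the token ends here; the delimiter is left unread
            if ch in illegal_attr:
                # Type (b): rejected only when it is not escaped.
                raise ReaderError(f"unescaped illegal constituent {ch!r}")
        # Inside bars, constituents, macros and whitespace are all kept as-is.
        buf.append(ch)
        i += 1
    if in_bars:
        raise ReaderError("end of input inside vertical bars")
    return "".join(buf), text[i:]
```

With `'\x01'` given the illegal syntactic type, `read_token("|a\x01b|", rt)` signals an error while the backslash form `a\` followed by `'\x01'` and `b` succeeds; reassigning `'\x01'` to a constituent with the "illegal" attribute makes the two forms read alike.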
Notice, though, that portability of printed representation isn't an
issue here, because none of the standard characters have an "illegal"
syntax type.
> This is a potentially significant problem, because it means that the
> printer must slashify "illegal" characters by preceding each one
> individually with a backslash, rather than simply surrounding the
> entire name with vertical bars. For some implementations (i.e. mine),
> it is easier to embar the entire name once it is determined that funny
> characters are present somewhere in the name.
> Is it intended that all characters not listed in the table as constituent,
> macro, etc. are "illegal"? Or may an implementation treat them all as
> constituent characters?
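The two printing strategies being contrasted can be sketched as follows. This is a hypothetical Python illustration: `escape_each`, `embar`, `print_symbol_name`, and the caller-supplied `special` set are invented names, and deciding which characters belong in `special` (including lowercase letters, in a reader that upcases) is left to the caller. The only fixed rule assumed is the usual Common Lisp one that, between vertical bars, just `|` and `\` themselves need a backslash.

```python
def escape_each(name, special):
    """Precede every character in `special` (plus '|' and '\\') with its own
    backslash."""
    out = []
    for ch in name:
        if ch in special or ch in "|\\":
            out.append("\\")
        out.append(ch)
    return "".join(out)


def embar(name):
    """Surround the whole name with vertical bars; inside them only
    '|' and '\\' themselves need a backslash."""
    inner = "".join("\\" + ch if ch in "|\\" else ch for ch in name)
    return "|" + inner + "|"


def print_symbol_name(name, special, use_bars=True):
    """Return name unchanged if it needs no escaping, else apply one strategy.

    Embarring is only safe if the reader accepts every character of the
    name between vertical bars, which is exactly what is in doubt when a
    character has the "illegal" syntactic type.
    """
    if not any(ch in special or ch in "|\\" for ch in name):
        return name
    return embar(name) if use_bars else escape_each(name, special)
```

Under the reading of CLtL described above, a printer would have to take the `use_bars=False` path whenever a name contains a character of "illegal" syntactic type, even though the embarred form is simpler to produce.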
I believe the latter must be correct (the implementor can choose the
syntactic type of every non-standard character); otherwise it would be
impossible to read symbols written in, for example, Japanese.