cs proposal comments
- To: Common Lisp mailing <common-lisp@sail.stanford.edu>
- Subject: cs proposal comments
- From: Thom Linden <baggins@IBM.com>
- Date: Wed, 22 Feb 89 03:48:56 PST
>> From: sandra%defun@cs.utah.edu (Sandra J Loosemore)
>> Subject: comments on character proposal
>>
>> Getting rid of bits and fonts (section 2.1) seems like a very good
>> idea to me. I would argue for deleting these "features" completely
>> instead of merely deprecating them, because there now seems to be
>> general agreement that the whole idea was brain-damaged in the first
>> place, plus it's just about impossible to use them portably anyway
>> (since implementations are free not to support them). Deprecating the
>> features would simply perpetuate the current sad state of affairs into
>> the ANSI standard.
I deleted Appendix B from the proposal. The attribute check list is
incorporated into the character chapter as implementation dependent.
>>
>> I am not at all sure why we need to standardize the idea of character
>> registries at all, much less state that a character can only belong to
>> one registry, or define a standard set of registries. What does having
>> registries buy the user, other than perhaps a way to test whether a
>> character belongs to one or not? Why isn't it sufficient just to say
>> that implementations can support extended characters, and leave it at
>> that?
The registries are introduced to allow an application a portable
way to name, compose and decompose characters. Currently, there is
no way to do this in any programming language. There are other
possibilities. One is simply to label all characters uniquely;
another is to define a universal coded character set and use
its numeric codes to 'name' characters. I don't think using
numbers for naming characters is useful since I'll always forget
what character 34539 actually is! Registries seem to provide a
framework for useful categorization of characters. It also
avoids the current mess that the coded character set standards
are in.
>>
>> I'm confused about how you propose to handle characters that appear in
>> more than one character repertoire, and whether characters with accent
>> marks are considered distinct from characters without accents. For
>> example, is the French "C" with a cedilla distinct from a normal
>> French "C", and is that distinct from the standard-char "C"?
We handle characters that appear in more than one repertoire by
using registries. No character appears in more than one registry.
The constituents of the registries are not defined by Common LISP.
I believe that in most environments today, it is recognized that
characters with accents are distinct from their vanilla cousins.
As we have proposed registries, they contain semantically
distinct characters.
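
To make the naming idea concrete, here is a minimal sketch of one way
to model it. It is not part of the proposal: the registry names, the
character labels and the helper find_char are all hypothetical, and the
proposal deliberately leaves the registry contents undefined. A
character is named by a (registry, label) pair, so it belongs to
exactly one registry by construction, and an accented character gets
its own label.

```python
from typing import NamedTuple


class RegisteredChar(NamedTuple):
    """A character named by the registry it lives in plus a label."""
    registry: str
    label: str


# Hypothetical registry contents, for illustration only.
REGISTRIES = {
    "LATIN": {"C", "C-CEDILLA", "N", "N-TILDE"},
    "GREEK": {"ALPHA", "BETA"},
}


def find_char(registry, label):
    """Look a character up by name; fail if the registry lacks it."""
    if label not in REGISTRIES.get(registry, set()):
        raise KeyError(f"{label} is not in registry {registry}")
    return RegisteredChar(registry, label)


plain_c = find_char("LATIN", "C")
cedilla_c = find_char("LATIN", "C-CEDILLA")
print(plain_c != cedilla_c)  # the accented character is distinct
```

Note that nothing here depends on a numeric code, which is the point
of naming characters through registries.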
>>
>> The way the document describes things now, it seems like the Common
>> Lisp standard would have to include a statement of exactly what
>> characters belong in each of the standard registries listed in section
>> 2.2. Otherwise, implementors might go off and define their own
>> character registries that happen to include some characters that ought
>> to belong in one of these standard registries. For instance, the machine
>> I happen to be sitting in front of right now supports an 8-bit native
>> character set, and it seems perfectly reasonable for a Lisp running on
>> this machine to include all 256 characters in its base character set,
>> but some of those might actually be supposed to live off in some other
>> registry.
The registries are independent of any coded character sets.
In particular, coded character sets are not registries. Your base
repertoire (set of 256 characters) is possibly drawn from
several registries.
You are correct that, lacking an international (or ANSI) standard
for character registries, an implementation could define
a single registry containing all supported characters. It could
also define NO registries and use only the conventional naming
of characters. I expect an implementation taking the no-cost way
would choose the second approach. On the other hand, an
implementation supporting text processing across international
boundaries is more likely to define some reasonable registries
e.g., Latin, Greek, etc.
>>
>> Also in section 2.2, why is it necessary for there to be a total
>> ordering, or even a partial ordering, of all characters? It seems
>> like CHAR< and friends are not very useful except when comparing base
>> characters anyway. It seems like it would be difficult to get things
>> like the Spanish N-with-twiddle character to collate correctly anyway,
>> given the constraints you have put on how character codes are derived
>> and the requirement that CHAR< be just like < on the char-codes.
Right. This is now removed.
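
The collation problem raised above can be shown with a short sketch.
It assumes character codes in which n-with-tilde has a code above
plain "o" (as in ISO 8859/1, where it is 241), and it uses a
hypothetical lower-case Spanish alphabet table; neither is taken from
the proposal.

```python
N_TILDE = "\u00f1"  # n-with-tilde, code 241
words = ["nube", N_TILDE + "u", "oso"]

# Ordering purely by character code, which is what CHAR< would give
# if it had to agree with < on the char-codes.
by_code = sorted(words, key=lambda w: [ord(ch) for ch in w])

# Hypothetical Spanish collation table: n-with-tilde sits between n and o.
SPANISH_ALPHABET = "abcdefghijklmn" + N_TILDE + "opqrstuvwxyz"
RANK = {ch: i for i, ch in enumerate(SPANISH_ALPHABET)}


def spanish_key(word):
    """Sort key based on alphabet position rather than character code."""
    return [RANK[ch] for ch in word]


by_spanish = sorted(words, key=spanish_key)

print(by_code)     # n-with-tilde word sorts last, after "oso"
print(by_spanish)  # n-with-tilde word sorts between "nube" and "oso"
```

Any ordering tied to the codes puts the n-with-tilde word after "oso",
so a correct collation has to come from a separate table.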
>>
>> It doesn't seem like STANDARD-CHAR-P belongs in the list of character
>> predicates on p. 9, since no extended characters can possibly be
>> STANDARD-CHAR-P anyway.
Right. This is now removed.
>>
>> The stuff in section 2.3 seems mostly reasonable to me. It's not really
>> clear why you need GENERAL-STRING (as distinct from STRING) and
>> SIMPLE-GENERAL-STRING (as distinct from SIMPLE-STRING). Again, some
>> rationale would be helpful.
GENERAL-STRING means (VECTOR CHARACTER). This is not the meaning of
STRING (a union type). I agree that GENERAL-STRING is not much
of an abbreviation over (VECTOR CHARACTER). It still seems somewhat
more mnemonic.
>>
>> In section 2.4, the general idea of specifying an external character
>> encoding to OPEN seems reasonable. However, I'm confused by the
>> business about having more than one coded character set mixed
>> together. If a character appears in more than one coded character
>> set, which encoding takes precedence? It seems like this has not been
>> well thought-out. Also, seeing as though we have just voted down a
>> proposal to add an EXTERNAL-WIDTH function, it seems like a very bad
>> idea to lump it in here.
Some encoding schemes allow disjoint coded character sets to
coexist. That is, a given character would appear in one but not
the other. For example, an ISO8859/1 coded character set could
coexist with a coded character set for Chinese.
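
As a rough illustration of such coexistence, the sketch below tags
each character with the coded character set that encodes it. The tags
"L" and "C" and the helpers encode_mixed and decode_mixed are
hypothetical; real schemes typically switch sets with escape sequences
instead. One further assumption: the two codecs used here overlap on
ASCII, so the sketch simply tries the Latin set first and uses the
Chinese set only for characters the Latin set lacks.

```python
# Hypothetical designators, one tag per coded character set.
CODED_SETS = [("L", "latin-1"), ("C", "gb2312")]


def encode_mixed(text):
    """Return one (tag, bytes) pair per character of text."""
    out = []
    for ch in text:
        for tag, codec in CODED_SETS:
            try:
                out.append((tag, ch.encode(codec)))
                break
            except UnicodeEncodeError:
                continue  # this set lacks the character, try the next
        else:
            raise ValueError(f"no coded character set contains {ch!r}")
    return out


def decode_mixed(segments):
    """Rebuild the text, decoding each segment with its own set."""
    codecs = dict(CODED_SETS)
    return "".join(data.decode(codecs[tag]) for tag, data in segments)


text = "caf\u00e9 \u4e2d\u6587"
segments = encode_mixed(text)
print([tag for tag, _ in segments])
print(decode_mixed(segments) == text)
```

Because every segment carries its own tag, the decoder never has to
guess which coded character set a byte sequence belongs to.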
As for External-width, it was part of our subcommittee discussions
long before the recent stream proposal. It will be a separate
item in the list of character votes.
>>
>> Now for the general comments.
>>
>> One thing that is not clear to me from reading this document is how
>> much of it has already been standardized by ISO. I share Larry's
>> concern that we might standardize one thing, and then have ISO go off
>> and standardize something completely different. I think it's a
>> mistake to try to second-guess what ISO might do.
The revision might make this clearer. I think this is a
red herring anyhow. As a programming language committee
we need to specify what is useful in the context of LISP. We
can't expect a coded character set committee to figure it out.
On the other hand, we can influence what gets standardized
by defining our framework. The ISO Prolog std committee is
interested in what we define.
>>
>> I am also concerned about trying to standardize things that have not yet
>> been implemented. I think it's a mistake to try to do language design
>> in a standards committee.
>>
>> Finally, I have some problems with the presentation of your proposal.
>> One problem, as I mentioned at the meeting, is that you've made it an
>> all-or-nothing package, and I can't vote for the whole thing because
>> there are some parts of it that do not seem appropriate, even though I
>> would support some of the other changes individually. The other
>> problem is that Appendix A is virtually unreadable. Some of the
>> conceptual changes involve wording changes to several passages, and I
>> know that there are some other changes in the appendix that are not
>> mentioned in the introductory blurb at all. Is it totally impossible
>> to recast the changes in standard cleanup format proposals? The
>> advantage of that format is that it presents more context, including a
>> clear statement of why the existing CLtL behavior is "broken" and a
>> rationale for the proposed change.
There will be several votes regarding this proposal. I don't
intend to rewrite the document in a cleanup format.
>>
>> I know that we adopted things like the CLOS document that were
>> presented as single mega-proposals, but those were primarily additions
>> to the language and what you are proposing is essentially a large
>> number of incompatible changes. I'm having a hard time identifying
>> what all of those changes are.
>>
Actually, I don't think it's as large a number of changes as you
imply. In any case, the vote split should help this out.