[no subject]
- To: common-lisp@SU-AI.ARPA
- From: STEELE@TL-20A.ARPA
- Date: Thu 13 Sep 84 13:37:51-EDT
The definitions of characters and strings in Common Lisp assume the
English language and lexicographic ordering. Do you contemplate any
changes to the definition to allow it to work better with other
natural languages? Are you aware of any Lisp implementations which
have addressed this issue?
Thanks,
Kathy Kwinn
There are three issues I can think of here offhand. One is that certain
alphabets contain extra (or different) letters, such as the Scandinavian
slashed-O and AE ligature. Common LISP allows an implementation to
extend the character set, and some of the extra characters may be
alphabetic. Such extra characters may be of type string-char as well.
So these should fit in nicely with the current definitions of such
functions as alpha-char-p and string-capitalize. The primary difficulty
here is that if one uses the national variants of the standard ASCII
code, these extra letters take the ASCII codes normally used in America
for "[", "]", and so on, which are required standard Common LISP
characters.
The second issue is accents, such as the accents acute and grave, the
circumflex, the cedilla, the umlaut, and so on. These cause difficulties
no matter how you look at it, especially in string-capitalize, and Common
LISP currently does not address these at all. (As noted in the manual,
these cause problems in English, too: (string-capitalize "don't") =>
"Don'T", not "Don't".)
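The third issue is lexicographic ordering. The Common LISP ordering is
actually relatively loose: it fixes only the order of the letters within
each case and of the digits, and leaves the rest to the implementation.
I believe that lexicographic ordering for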
European languages is already difficult even using ASCII because of
accents, extra letters, and special rules (such as "ch" being considered
a single letter for ordering purposes in Spanish dictionaries). Common
LISP does not solve these problems, but is no worse than existing
practice.
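
Two of the behaviors described above can be checked directly. The
following is only an illustrative sketch: capitalize-demo and
sort-by-char-order are hypothetical names made up for this example, and
the Spanish words are sample data, not anything from the manual.

```lisp
;; string-capitalize treats any non-alphanumeric character as a word
;; break, so the apostrophe in "don't" starts a new "word".
(defun capitalize-demo ()
  (list (string-capitalize "don't")
        (string-capitalize "hello world")))

;; (capitalize-demo) => ("Don'T" "Hello World")

;; string< compares strings one character at a time using char<, so it
;; knows nothing about multi-character letters such as Spanish "ch".
(defun sort-by-char-order (words)
  ;; SORT is destructive, so sort a fresh copy of the list.
  (sort (copy-list words) #'string<))

;; (sort-by-char-order '("cosa" "chico" "dama"))
;;   => ("chico" "cosa" "dama")
```

A traditional Spanish dictionary, which treats "ch" as a separate letter
following "c", would instead list these as cosa, chico, dama.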
--Guy
-------