
Re: A multi-byte character extension proposal

	Date: Tue, 23 Jun 87 12:54:31 PDT
	From: "Thomas Linden (Thom)" <baggins@ibm.com>
	        Are characters with different codes always syntactically
	           Can the standard character #\( have two different codes,
	corresponding, for example, to two different external file system
	representations of that character?  
(because you said 'the standard character'...)
We cannot say anything about the different representations in files.
Some implementations may read 'similar' characters into one internal
code, but others may not.
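
To illustrate the difference, here is a minimal sketch in Python (not Lisp, and not part of the JEIDA proposal). It assumes that Unicode full-width forms stand in for the double-byte characters and that NFKC normalization is the folding rule. The names `read_folded` and `read_distinct` are hypothetical.

```python
import unicodedata

def read_folded(text: str) -> str:
    """Fold compatibility variants (e.g. full-width forms) into one internal code.

    Assumption: NFKC normalization plays the role of the implementation's
    own mapping from external codes to internal codes.
    """
    return unicodedata.normalize("NFKC", text)

def read_distinct(text: str) -> str:
    """Keep every external code distinct, as other implementations might."""
    return text

narrow = "(list abc)"
wide = "\uff08list abc\uff09"  # same text with full-width parentheses

# A folding reader sees the two inputs as identical; a non-folding one does not.
print(read_folded(wide) == read_folded(narrow))
print(read_distinct(wide) == read_distinct(narrow))
```

Whether a reader folds or not is exactly the kind of choice left to each implementation here.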
	Does the JEIDA proposal permit two different string-chars to have
	the same print glyph, '(' for example, but different syntactical properties?
We did not discuss the issue your question raises,
because characters which have the same print glyph but
different syntactical properties are outside our scope.
There are several Japanese characters which have similar glyphs,
but their glyphs are not 'the same' (except for blank characters).
			Is it
	allowable to map both of these sets of codes into the one,
	internal Lisp character code set when inputting data to Lisp, and
	adopt our own conventions for translating output back to single
	and double byte?
	An elaboration of the previous question: Is it possible for an
	implementation to represent all of the standard characters internally
	with 2-byte codes, and to map some 2-byte character codes and some
	1-byte character codes in system files onto the same set of 2-byte
	internal codes for the standard characters when read into Lisp?
	The English copy we saw of the proposal did not contain section 4.4.
	Based on our own translation from the original in Japanese, this
	section seems to discuss implementation issues. 
Since we could not reach a good conclusion on the issue,
section 4.4 of the early draft in Japanese was deleted.
The proposal leaves implementors a great deal of freedom.
	seem to be two possible treatments of double byte characters.  The
	first is the case where a double-byte character can be a standard
	character.  The second is where a double-byte character cannot be
	a standard character.
I think so too.
Implementation dependent.

						Is the difference
	between option 1 and option 2 whether the Lisp system would
	recognize a single-byte version and a double-byte version
	of this symbol-name in the same file as referring to the same
	(EQ) symbol?
	          1.  (list    abc    /fg   " xy " )
	          2.  (list    abc    /fg   " xy " )
	             --    ----   -----   ---    ----
	          3.  (list    abc    /fg   " xy " )
	             ------------------------   -----
We tried to select exactly one of the three 'options' above,
but we found we cannot decide until the ISO-related standardization
of Japanese character representation is complete.
I cannot understand what you said.
I cannot imagine a situation like "there is a character which has the same print glyph but a different code."
Implementation dependent.
A standard-character may be single-byte or multi-byte,
according to the definition of the implementation.
	Is section 4.4 a part of the proposal to ANSI? 
	If you could elaborate (in English) on the content of section
	4.4, we would greatly appreciate it.
Please ask IBM Japan (your subsidiary) about the complex issues
behind section 4.4 of the early draft in Japanese.
We need more observations on other languages, file systems,
operating systems, and the refinement of the JIS character set definition
itself before we can make a firm guideline on the matter.
Your interpretation is compatible with our proposal.
	If a Lisp system supports a large character code set, need it allow
	every character of type string-char to have a non-constituent syntax
	type defined in the readtable, or is the proposal's default that
	only standard characters need be represented in the readtable?
CLtL says (22.1.5 page  360):
"every character of type string-char must be represented in the readtable."
The members felt that, since we extended the definition of string-char to include
Japanese characters, a natural interpretation of CLtL implies that
the readtable must have more than 64k 'logical' entries.
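
One way to read 'logical' here is that an entry need not be stored physically. The following is a minimal sketch in Python (not Lisp), assuming a sparse table in which only characters with special syntax are stored and every other character falls back to a default constituent type. The names `make_readtable` and `syntax_type` and the syntax-type strings are hypothetical, not taken from CLtL or the proposal.

```python
CONSTITUENT = "constituent"

def make_readtable() -> dict:
    """Build a sparse readtable holding only the non-default entries."""
    return {
        "(": "terminating-macro",
        ")": "terminating-macro",
        " ": "whitespace",
    }

def syntax_type(readtable: dict, ch: str) -> str:
    """Return the syntax type of ch; unlisted characters are constituents.

    dict.get is used so that a lookup never adds a physical entry.
    """
    return readtable.get(ch, CONSTITUENT)

readtable = make_readtable()
kanji = "\u6f22"  # a kanji character, standing in for any double-byte character

print(syntax_type(readtable, "("))
print(syntax_type(readtable, kanji))
print(len(readtable))  # physical entries stay small
```

Every character still has a defined syntax type, so the table is logically complete even though it stores only three entries.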
	  Thom Linden
Masayuki Ida

PS: Our proposal is the result of several Japanese and US CL implementations.
Though we welcome any opinions on our proposal, I feel
the final decision will be made by ANSI, JIS, and ISO.
One of the members of our WG will attend the X3J13 meeting, since
I cannot leave my university next week.
He is a very active member and he knows the process.
He is scheduled to give a presentation on this issue at the X3J13 meeting.