Bug 270

Summary: Additional charset additions to Compound Text Spec
Product: XStandards
Reporter: Alan Coopersmith <alan.coopersmith>
Component: ICCCM
Assignee: Paul Anderson <pma>
Status: RESOLVED MOVED
QA Contact: Xorg Project Team <xorg-team>
Severity: normal
Priority: high
CC: Markus.Kuhn
Version: X11R6.6   
Hardware: All   
OS: All   
Attachments: Diff showing XFree86 changes to spec document

Description Alan Coopersmith 2004-03-04 10:13:27 UTC
XFree86 has extended the Compound Text 1.1 specification (xc/doc/spec/CTEXT)
to include the character sets for UTF-8, ISO 8859-10, ISO 8859-13, ISO 8859-14,
ISO 8859-15, ISO 8859-16, and JIS X0212-1990.
Comment 1 Alan Coopersmith 2004-03-04 10:14:50 UTC
Created attachment 121
Diff showing XFree86 changes to spec document
Comment 2 Markus Kuhn 2004-09-22 05:59:52 UTC
Looks fine to me in principle, but it could be slightly rephrased to clarify why
one must not simply feed untranslated UTF-8 strings into CTEXT.

The current X.Org CTEXT specification explicitly forbids the addition of
something like UTF-8 in section 6: "ISO registered 'other coding systems' are
not used in Compound Text; extended segments are the only mechanism for non-2022
encodings."

UTF-8 is exactly an example of such an 'other coding system' ["coding system
different from that of ISO/IEC 2022"] in the ISO registry:

http://www.itscj.ipsj.or.jp/ISO-IR/

To switch between ISO 2022 and UTF-8, one has to use the ESC sequences reserved
for leaving ISO 2022 (ESC %G) and returning to it (ESC %@).
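
As a rough illustration of the two escape sequences mentioned above, here is a minimal Python sketch that brackets a UTF-8 run with ESC %G and ESC %@. The function name wrap_utf8_segment is hypothetical, and the sketch only shows the byte-level framing; it is not the CTEXT encoder from the attached diff.

```python
ESC = b"\x1b"
ENTER_UTF8 = ESC + b"%G"  # leave ISO 2022 and switch to UTF-8
LEAVE_UTF8 = ESC + b"%@"  # return to ISO 2022


def wrap_utf8_segment(text: str) -> bytes:
    """Frame `text` as a UTF-8 run between the two switching sequences.

    Hypothetical helper for illustration only.
    """
    return ENTER_UTF8 + text.encode("utf-8") + LEAVE_UTF8


if __name__ == "__main__":
    print(wrap_utf8_segment("\u00e9"))
```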

Adding UTF-8 to CTEXT as an additional encoding option will not simplify
anything. Any recipient of CTEXT still needs to implement the whole thing, with
all the associated baggage of megabytes of conversion tables. The addition makes
it clear that only characters for which there wasn't already an existing
encoding option in CTEXT are allowed to be encoded in UTF-8. That restriction is
critical in the interest of backwards compatibility. The text could perhaps
be rephrased a bit to strengthen this: for each encoding option, the X11
release in which it was added should be mentioned, and a general rule should
state that each character should be encoded with the oldest (and perhaps even
smallest) encoding option that covers it. This will make sure that CTEXT is not
abused by simply feeding raw UTF-8 strings into it.
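
The "oldest encoding option that covers it" rule can be sketched as follows. This is only an illustration under stated assumptions: the ordered list ENCODING_OPTIONS is a made-up stand-in (three Python codec names, oldest first) and not the real list of CTEXT encodings or their X11 releases, and both function names are hypothetical.

```python
# Assumed ordering, oldest option first. Illustrative only: the real CTEXT
# spec lists many more encodings, and their release order is not given here.
ENCODING_OPTIONS = ["iso8859_1", "iso8859_15", "utf_8"]


def oldest_covering_encoding(ch: str) -> str:
    """Return the first (oldest) option that can encode the character."""
    for codec in ENCODING_OPTIONS:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            continue
        return codec
    raise ValueError(f"no encoding option covers {ch!r}")


def split_into_runs(text: str) -> list[tuple[str, str]]:
    """Group consecutive characters that share the same chosen option."""
    runs: list[tuple[str, str]] = []
    for ch in text:
        codec = oldest_covering_encoding(ch)
        if runs and runs[-1][0] == codec:
            runs[-1] = (codec, runs[-1][1] + ch)
        else:
            runs.append((codec, ch))
    return runs


if __name__ == "__main__":
    for codec, run in split_into_runs("caf\u00e9 \u20ac5 \u4e2d"):
        print(codec, repr(run))
```

With this ordering, UTF-8 is only chosen for characters that none of the earlier options can represent, which is the restriction the comment argues for.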

In practice, UTF-8 should be used cleanly and on its own in UTF8_STRING. This
makes it easier for applications to negotiate between UTF8_STRING and CTEXT as
distinct encoding options, as opposed to muddling the two together into a single
type.

Perhaps CTEXT should even be explicitly deprecated, with a pointer to
UTF8_STRING as the preferred alternative.
Comment 3 Tollef Fog Heen 2010-02-09 13:45:30 UTC
Adding missing QA contact
Comment 4 GitLab Migration User 2019-02-21 10:28:27 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/doc/xorg-docs/issues/4.