XFree86 has extended the Compound Text 1.1 specification (xc/doc/spec/CTEXT)
to include the character sets for UTF-8, ISO 8859-10, ISO 8859-13, ISO 8859-14,
ISO 8859-15, ISO 8859-16, and JIS X0212-1990.
Created attachment 121
Diff showing XFree86 changes to spec document
Looks fine to me in principle, but it could be rephrased slightly to clarify why
one must not simply feed untranslated UTF-8 strings into CTEXT.
The current X.Org CTEXT specification explicitly forbids the addition of
something like UTF-8 in section 6: "ISO registered 'other coding systems' are
not used in Compound Text; extended segments are the only mechanism for non-2022
UTF-8 is exactly an example of such an 'other coding system' ["coding system
different from that of ISO/IEC 2022"] in the ISO registry:
To switch between ISO 2022 and UTF-8, one has to use the ESC sequences reserved
for leaving ISO 2022 (ESC %G) and returning to it (ESC %@).
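As a rough illustration of that switching mechanism, here is a minimal Python sketch that brackets a UTF-8 run with ESC %G and ESC %@ and splits a byte stream back into its parts. The function names are hypothetical, and this shows only the plain ISO 2022 escape pair mentioned above; it is not taken from the XFree86 diff and does not claim to match how the extended spec actually frames UTF-8 inside CTEXT.

```python
ESC = b"\x1b"
UTF8_ENTER = ESC + b"%G"      # leave ISO 2022, enter UTF-8
ISO2022_RETURN = ESC + b"%@"  # return to ISO 2022


def wrap_utf8(text: str) -> bytes:
    """Bracket a UTF-8 run with the leave/return escape sequences."""
    return UTF8_ENTER + text.encode("utf-8") + ISO2022_RETURN


def split_segments(data: bytes):
    """Yield ('iso2022', bytes) and ('utf8', str) pairs in stream order.

    Assumes the UTF-8 payload itself never contains the ESC %@ bytes.
    """
    pos = 0
    while pos < len(data):
        start = data.find(UTF8_ENTER, pos)
        if start == -1:
            # No further UTF-8 section: the rest is ISO 2022 data.
            yield ("iso2022", data[pos:])
            return
        if start > pos:
            yield ("iso2022", data[pos:start])
        body = start + len(UTF8_ENTER)
        end = data.find(ISO2022_RETURN, body)
        if end == -1:
            # Unterminated UTF-8 section: treat the remainder as UTF-8.
            yield ("utf8", data[body:].decode("utf-8"))
            return
        yield ("utf8", data[body:end].decode("utf-8"))
        pos = end + len(ISO2022_RETURN)
```

Even in this toy form, the receiver has to track which coding system is active at every point in the stream, which is part of why mixing the two is not a simplification.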
Adding UTF-8 to CTEXT as an additional encoding option will not simplify
anything. Any recipient of CTEXT still needs to implement the whole thing, with
all the associated baggage of megabytes of conversion tables. The addition makes
it clear that only characters for which there wasn't already an existing
encoding option in CTEXT are allowed to be encoded in UTF-8. That restriction is
critical for backwards compatibility. The text could perhaps be rephrased to
strengthen it: for each encoding option, the X11 release in which it was added
should be mentioned, and a general rule should state that each character should
be encoded with the oldest (and perhaps even smallest) encoding option that
covers it. This will make sure that CTEXT is not abused by simply feeding raw
UTF-8 strings into it.
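The proposed "oldest encoding option that covers it" rule could be sketched roughly as follows. The preference list here is a hypothetical, shortened stand-in: the real order would have to come from the X11 release history of each CTEXT encoding, and real Compound Text designates character sets via escape sequences rather than Python codec names. The sketch only shows the selection logic, with UTF-8 as the last resort.

```python
# Hypothetical preference order, oldest option first (illustration only).
ENCODING_ORDER = ["iso8859_1", "iso8859_2", "iso8859_15", "utf_8"]


def pick_encoding(ch: str, order=ENCODING_ORDER) -> str:
    """Return the first encoding in `order` that can represent `ch`."""
    for name in order:
        try:
            ch.encode(name)
            return name
        except UnicodeEncodeError:
            continue
    raise ValueError(f"no encoding option covers {ch!r}")


def segment(text: str):
    """Group consecutive characters that share the same chosen encoding."""
    runs = []
    for ch in text:
        name = pick_encoding(ch)
        if runs and runs[-1][0] == name:
            runs[-1] = (name, runs[-1][1] + ch)
        else:
            runs.append((name, ch))
    return runs
```

Under this rule a string made only of Latin-1 characters never touches the UTF-8 option, so an older recipient that predates the extension still decodes it unchanged.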
In practice, UTF-8 should be used cleanly and on its own in UTF8_STRING. This
makes it easier for applications to negotiate between UTF8_STRING and CTEXT as
distinct encoding options, as opposed to muddling the two together into a single
Perhaps CTEXT should even be explicitly deprecated, with a pointer to
UTF8_STRING as the preferred alternative.
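Treating the two as distinct options makes the negotiation itself simple. A minimal sketch, assuming the selection owner's offered targets are already available as a list of atom names (the function name and the preference order are illustrative assumptions, not taken from any X library):

```python
# Assumed preference order: UTF8_STRING first, COMPOUND_TEXT as a fallback.
PREFERRED_TARGETS = ["UTF8_STRING", "COMPOUND_TEXT", "STRING"]


def choose_target(offered):
    """Return the most preferred text target the owner offers, or None."""
    offered_set = set(offered)
    for target in PREFERRED_TARGETS:
        if target in offered_set:
            return target
    return None
```

A requestor that finds UTF8_STRING among the offered targets never needs to parse CTEXT at all; the CTEXT path remains only for older peers.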
Adding missing QA contact
-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/doc/xorg-docs/issues/4.