XFree86 has extended the Compound Text 1.1 specification (xc/doc/spec/CTEXT) to include the character sets for UTF-8, ISO 8859-10, ISO 8859-13, ISO 8859-14, ISO 8859-15, ISO 8859-16, and JIS X0212-1990.
Created attachment 121: Diff showing XFree86 changes to spec document
Looks fine to me in principle, but it could be rephrased slightly to clarify why one must not simply feed untranslated UTF-8 strings into CTEXT.

The current X.Org CTEXT specification explicitly forbids the addition of something like UTF-8 in section 6: "ISO registered 'other coding systems' are not used in Compound Text; extended segments are the only mechanism for non-2022 encodings." UTF-8 is exactly an example of such an 'other coding system' ["coding system different from that of ISO/IEC 2022"] in the ISO registry (http://www.itscj.ipsj.or.jp/ISO-IR/). To switch between ISO 2022 and UTF-8, one has to use the ESC sequences reserved for leaving ISO 2022 (ESC %G) and returning to it (ESC %@).

Adding UTF-8 to CTEXT as an additional encoding option will not simplify anything. Any recipient of CTEXT still needs to implement the whole thing, with all the associated baggage of megabytes of conversion tables. The addition makes it clear that only characters for which no encoding option already existed in CTEXT may be encoded in UTF-8. That restriction is critical for backwards compatibility. The text could perhaps be rephrased to strengthen this: for each encoding option, the X11 release in which it was added should be mentioned, and a general rule should state that each character must be encoded with the oldest (and perhaps even smallest) encoding option that covers it. This ensures that CTEXT is not abused by simply feeding raw UTF-8 strings into it.

In practice, UTF-8 should be used cleanly and on its own in UTF8_STRING. This makes it easier for applications to negotiate between UTF8_STRING and CTEXT as distinct encoding options, rather than muddling the two together into a single type. Perhaps CTEXT should even be explicitly deprecated, with a pointer to UTF8_STRING as the preferred alternative.
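To make the proposed "oldest encoding option first" rule concrete, here is a minimal Python sketch of a CTEXT encoder that emits a UTF-8 segment (ESC %G ... ESC %@) only for characters that no ISO 8859 part covers. Several things are assumptions, not taken from the spec or the XFree86 diff: the function name `encode_ctext` is hypothetical, the charset table is a small subset listed in an assumed age order, and GR is conservatively re-designated after each UTF-8 segment instead of relying on state being restored by ESC %@. The designation sequences use the ISO-IR final bytes for the 96-character right halves (ESC - F), and the codec names are Python standard library codecs. Control characters are not validated beyond rejecting C1.

```python
ESC = b"\x1b"

# Hypothetical subset of 96-character GR sets, in an ASSUMED "oldest first"
# order. Each entry: (Python codec name, final byte F of "ESC - F").
GR_CHARSETS = [
    ("latin_1", b"A"),    # ISO 8859-1, the CTEXT default in GR
    ("iso8859_2", b"B"),  # ISO 8859-2
    ("iso8859_5", b"L"),  # ISO 8859-5 (Cyrillic)
    ("iso8859_7", b"F"),  # ISO 8859-7 (Greek)
]


def encode_ctext(text: str) -> bytes:
    """Sketch: encode each character with the first (oldest) listed option
    that covers it; fall back to a UTF-8 segment only when none does."""
    out = bytearray()
    gr = b"A"     # CTEXT strings start with ISO 8859-1 designated to GR
    pending = []  # characters collected for a single UTF-8 segment

    def flush() -> None:
        nonlocal gr
        if pending:
            out.extend(ESC + b"%G" + "".join(pending).encode("utf-8") + ESC + b"%@")
            pending.clear()
            gr = None  # assumption: re-designate GR after leaving UTF-8

    for ch in text:
        cp = ord(ch)
        if 0x80 <= cp < 0xA0:
            raise ValueError(f"C1 control U+{cp:04X} not handled in this sketch")
        if cp < 0x80:  # ASCII stays in GL, the oldest option of all
            flush()
            out.append(cp)
            continue
        for codec, final in GR_CHARSETS:
            try:
                encoded = ch.encode(codec)
            except UnicodeEncodeError:
                continue  # this charset does not cover ch; try the next one
            flush()
            if gr != final:  # designate the charset into GR only when needed
                out.extend(ESC + b"-" + final)
                gr = final
            out.extend(encoded)
            break
        else:
            pending.append(ch)  # no listed ISO 8859 part covers ch
    flush()
    return bytes(out)
```

For example, `encode_ctext("a→é")` keeps `a` as ASCII, wraps only the arrow in a UTF-8 segment, and then re-designates ISO 8859-1 for `é`, so a raw UTF-8 string never passes through unchanged.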
Adding missing QA contact
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/doc/xorg-docs/issues/4.