Bug 101230 - font/encodings: Update GB18030 to 2005 version for non-BMP unicode support
Alias: None
Product: xorg
Classification: Unclassified
Component: Lib/Xfont
Version: git
Hardware: All All
Importance: medium minor
Assignee: Xorg Project Team
QA Contact: Xorg Project Team
Depends on:
Reported: 2017-05-30 06:43 UTC by Mingye Wang (Arthur2e5)
Modified: 2018-08-10 20:16 UTC

Comment Mingye Wang (Arthur2e5) 2017-05-30 06:43:58 UTC
The Chinese GB 18030 standard defines a four-byte code to cover the parts of Unicode that are not mapped by its one-byte and two-byte codes (the latter are roughly GBK, except that the Euro sign moves from \x80 [cp936] to \xA2\xE3). In the 2000 version of GB 18030, this four-byte expansion is limited to the BMP minus surrogates; in the 2005 version, it covers the entire Unicode range (minus surrogates) up to U+10FFFF. With a spec update, people can do emojis in telnet with a legacy-ish encoding!
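
For illustration, here is a minimal Python sketch of how the four-byte form can reach the supplementary planes. It assumes the usual GB 18030 four-byte layout (first and third bytes in 0x81..0xFE, second and fourth bytes in 0x30..0x39) and a linear mapping that places U+10000 at 0x90 0x30 0x81 0x30. The function and constant names are made up for this example and do not come from libXfont or the encodings files.

```python
# Sketch of the GB 18030-2005 four-byte arithmetic for supplementary planes.
# All names here are illustrative; they are not taken from libXfont.

FIRST_SUPPLEMENTARY = 0x10000
LAST_CODE_POINT = 0x10FFFF

# Linear index of the four-byte sequence 0x90 0x30 0x81 0x30 (assumed to be
# the code for U+10000). Each first-byte step spans 10 * 126 * 10 codes.
SUPPLEMENTARY_BASE = (0x90 - 0x81) * 10 * 126 * 10


def encode_supplementary(code_point: int) -> bytes:
    """Map a code point in U+10000..U+10FFFF to its four-byte code."""
    if not FIRST_SUPPLEMENTARY <= code_point <= LAST_CODE_POINT:
        raise ValueError("not a supplementary-plane code point")
    index = code_point - FIRST_SUPPLEMENTARY + SUPPLEMENTARY_BASE
    index, b4 = divmod(index, 10)    # fourth byte: 0x30..0x39
    index, b3 = divmod(index, 126)   # third byte: 0x81..0xFE
    b1, b2 = divmod(index, 10)       # second byte: 0x30..0x39
    return bytes((0x81 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4))


def decode_supplementary(data: bytes) -> int:
    """Inverse of encode_supplementary for a single four-byte code."""
    b1, b2, b3, b4 = data  # raises ValueError unless exactly four bytes
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError("not a well-formed four-byte code")
    index = (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)
    code_point = index - SUPPLEMENTARY_BASE + FIRST_SUPPLEMENTARY
    if not FIRST_SUPPLEMENTARY <= code_point <= LAST_CODE_POINT:
        raise ValueError("four-byte code is outside the supplementary range")
    return code_point
```

Under these assumptions the supplementary planes occupy one contiguous run of four-byte codes, so no lookup table is needed for them; only the BMP portion requires explicit mapping data.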

The 2005 upgrade is largely backwards compatible with the 2000 spec. The one exception is a swap between \xA8\xBC (now U+1E3F ḿ) and \x81\x35\xF4\x37 (now a provisional PUA code point, U+E7C7), which addresses a Unicode addition. Since the higher-range areas are still largely unpopulated, most of the change would simply involve expanding the 2000.1 file and renaming everything to 2005.
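
As a rough sketch of that swap, the Python snippet below models only the two affected entries. It assumes that the 2000 table carries the opposite assignment (\xA8\xBC as U+E7C7 and \x81\x35\xF4\x37 as U+1E3F); the dictionary and function names are hypothetical and are not part of the encodings files.

```python
# Illustrative only: the two entries assumed to differ between the 2000 and
# 2005 mappings. Keys are byte sequences, values are Unicode code points.

TWO_BYTE = b"\xa8\xbc"
FOUR_BYTE = b"\x81\x35\xf4\x37"

# Assumed 2000-era assignments for the two affected codes.
GB18030_2000_AFFECTED = {
    TWO_BYTE: 0xE7C7,   # PUA placeholder in the 2000 mapping
    FOUR_BYTE: 0x1E3F,  # U+1E3F reachable only via four bytes
}


def upgrade_to_2005(decode_table: dict) -> dict:
    """Return a copy of a 2000-style decode table with the two entries swapped."""
    table = dict(decode_table)
    table[TWO_BYTE], table[FOUR_BYTE] = table[FOUR_BYTE], table[TWO_BYTE]
    return table
```

Applied to a full 2000-style table, the same swap would leave every other entry untouched.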

Keep in mind, though, that an upcoming GB 18030 update is likely to address a few extra Unicode updates (see https://github.com/whatwg/encoding/issues/27#issuecomment-287745429 ). The range part is still unlikely to change: it already reaches U+10FFFF, so it cannot get any wider unless Unicode itself does.
Comment 1 GitLab Migration User 2018-08-10 20:16:22 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/lib/libxfont/issues/5.

