We've been shipping this patch in Debian for some time now. The problem description from the patch header is reproduced below. Note the licensing issue mentioned there; we've been shipping the patch anyway because the header also documents the method by which it was generated and updated.

This patch is by Denis Barbier.

WARNING: do not recode this file, it contains UTF-8 characters.

The X11 protocol states that Unicode keysyms are in the range 0x01000100 - 0x0110FFFF. If the result of composing characters is a Unicode codepoint, X returns the corresponding Unicode keysym, which is its Unicode codepoint augmented by 0x01000000. Latin-1 characters must not appear with their Unicode codepoints in compose files, otherwise the returned composed character lies in the range 0x01000000 - 0x010000FF, which is not valid.

There are two solutions: either fix the composing routines to return 0xZZ instead of 0x010000ZZ (where Z is a hexadecimal digit), or replace each U00ZZ by its corresponding keysym in the compose files. The latter is more logical and less error prone, so the compose files will be patched.

Many applications accept these invalid Unicode keysyms, but a few of them don't, most notably xemacs. Only UTF-8 locales are affected.

This has been fixed very recently in XFree86 CVS (but not in xorg), but for licensing reasons that patch is not grabbed. Instead, automatic conversion is performed by:

    sed -e '/XK_LATIN1/,/XK_LATIN1/!d' /usr/X11R6/include/X11/keysymdef.h \
      | grep -v deprecated | grep 0x0 \
      | sed -e 's/0x0/U0/' -e 's/XK_//' \
      | awk '{ printf "s/\\b%s\\b/%s/ig\n", $3, $2; }' > sedfile
    for f in *.UTF-8
    do
      sed -f sedfile $f > $f.tmp && mv $f.tmp $f
    done
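To make the arithmetic in the description concrete, here is a minimal sketch of the mapping it calls for. The function name codepoint_to_keysym is hypothetical (it is not part of Xlib or of the patch), and treating every codepoint up to 0xFF as "keysym equals codepoint" is a simplification of the Latin-1 keysym range.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: map a Unicode codepoint to
# the keysym value the description above says X should return.
# Assumption: codepoints 0x00-0xFF map to themselves (the 0xZZ form);
# everything above gets the 0x01000000 Unicode-keysym offset.
codepoint_to_keysym() {
    cp=$(( $1 ))                              # decimal or 0x-prefixed hex
    if [ "$cp" -le 255 ]; then
        printf '0x%08X\n' "$cp"               # Latin-1: 0xZZ, no offset
    else
        printf '0x%08X\n' $(( cp + 0x01000000 ))
    fi
}

codepoint_to_keysym 0xE9     # prints 0x000000E9 (not the invalid 0x010000E9)
codepoint_to_keysym 0x1EC    # prints 0x010001EC
```

Under these assumptions, only compose entries whose result is written as U00ZZ end up in the invalid range; a result such as U01EC already yields a valid Unicode keysym and is left alone by the conversion.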
Created attachment 6118: Compose fix for Latin-1
Sorry about the phenomenal bug spam, guys. Adding xorg-team@ to the QA contact so bugs don't get lost in future.
I do not understand this part:

"The X11 protocol states that Unicode keysyms are in the range 0x01000100 - 0x0110FFFF. If the result of composing characters is a Unicode codepoint, X returns the corresponding Unicode keysym, which is its Unicode codepoint augmented by 0x01000000. Latin-1 characters must not appear with their Unicode codepoints in compose files, otherwise the returned composed character lies in the range 0x01000000 - 0x010000FF which is not valid."

1. Isn't the "result of composing characters" always a Unicode codepoint? Do you mean whether the result is a single Unicode codepoint or two Unicode codepoints?

2. Do you mean that

    <Multi_key> <macron> <U01EA> : "Ǭ" U01EC # LATIN CAPITAL LETTER O WITH OGONEK AND MACRON

must not be changed to

    <Multi_key> <macron> <U100001EA> : "Ǭ" U01EC # LATIN CAPITAL LETTER O WITH OGONEK AND MACRON

?

3. Is the short description of the change: for every Unicode keysym that is shown in the compose file as <U00??>, make it <U100001??>?

4. Should the same change as in (3) be made for keysymdef.h as well?
Script applied and result pushed in commit 4b0a14521449dfce8b4347bd17243efd1d3eae2d.