Bug 7417 - Compose fix for Latin-1
Summary: Compose fix for Latin-1
Status: CLOSED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Lib/Xlib
Version: git
Hardware: x86 (IA32) Linux (All)
Importance: high normal
Assignee: Xorg Project Team
QA Contact: Xorg Project Team
 
Reported: 2006-07-04 08:35 UTC by David Nusinow
Modified: 2007-12-05 16:46 UTC

Attachments
Compose fix for Latin-1 (35.91 KB, patch)
2006-07-04 08:36 UTC, David Nusinow

Description David Nusinow 2006-07-04 08:35:43 UTC
We've been shipping this patch in Debian for some time now. The problem
description from the patch header is reproduced below. You may want to note the
licensing issue it mentions; we have been shipping the patch anyway because the
header also gives the method by which it was generated and updated.

This patch is by Denis Barbier.

WARNING: do not recode this file, it contains UTF-8 characters.

The X11 protocol states that Unicode keysyms are in the range
0x01000100 - 0x0110FFFF.  If the result of composing characters
is a Unicode codepoint, X returns the corresponding Unicode
keysym, which is its Unicode codepoint augmented by 0x01000000.
Latin-1 characters must not appear with their Unicode codepoints
in compose files, otherwise the returned composed character lies
in the range 0x01000000 - 0x010000FF which is not valid.
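
The keysym rule described above can be sketched as a small shell function.
This is only an illustration: the name keysym_for_codepoint is hypothetical
and is not an Xlib function. It assumes, as in keysymdef.h, that a Latin-1
keysym has the same value as its character code.

```shell
# Hypothetical helper, not part of Xlib: print the keysym for a code point.
keysym_for_codepoint() {
    cp=$(( $1 ))
    if [ "$cp" -lt 256 ]; then
        # Latin-1: the keysym equals the character code (0xZZ)
        printf '0x%04x\n' "$cp"
    elif [ "$cp" -le $(( 0x10FFFF )) ]; then
        # Unicode keysym: code point plus 0x01000000
        printf '0x%08x\n' $(( cp + 0x01000000 ))
    else
        echo "not a Unicode code point: $1" >&2
        return 1
    fi
}

keysym_for_codepoint 0xE9     # Latin-1 (eacute): prints 0x00e9
keysym_for_codepoint 0x01EC   # beyond Latin-1: prints 0x010001ec
```

Adding 0x01000000 to 0xE9 would instead give 0x010000e9, which falls in the
invalid range 0x01000000 - 0x010000FF mentioned above.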

There are two solutions: either fix the composing routines to return
0xZZ instead of 0x010000ZZ (where Z is a hexadecimal digit),
or replace U00ZZ by the corresponding keysyms in compose files.
The latter is more logical and less error prone, so the compose
files will be patched.
Many applications accept these invalid Unicode keysyms, but a few of
them don't, most notably xemacs.  Only UTF-8 locales are affected.

This was fixed very recently in XFree86 CVS (but not in xorg),
but for licensing reasons, that fix is not reused here.
Instead, the conversion is performed automatically by:
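  # Build sed rules that map each Latin-1 Unicode keysym (U00ZZ) to its named keysym
  sed -e '/XK_LATIN1/,/XK_LATIN1/!d' /usr/X11R6/include/X11/keysymdef.h \
  | grep -v deprecated | grep 0x0 \
  | sed -e 's/0x0/U0/' -e 's/XK_//' \
  | awk '{ printf "s/\\b%s\\b/%s/ig\n", $3, $2; }' > sedfile
  # Apply the rules to every UTF-8 compose file in the current directory
  for f in *.UTF-8
  do
    sed -f sedfile "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  done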
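
To make the effect of the conversion concrete, here is a minimal sketch that
runs the same sed/awk steps on one sample line. The keysymdef.h line and the
compose line are assumed examples, not copied from the actual files, and GNU
sed is assumed because the generated rule uses \b and the I flag.

```shell
# Assumed sample keysymdef.h line, turned into a sed rule by the same steps
rule=$(printf '%s\n' '#define XK_eacute 0x00e9' \
  | sed -e 's/0x0/U0/' -e 's/XK_//' \
  | awk '{ printf "s/\\b%s\\b/%s/ig\n", $3, $2; }')
echo "$rule"

# Hypothetical compose line, rewritten with that rule (GNU sed assumed)
line='<Multi_key> <apostrophe> <e> : "é" U00E9 # LATIN SMALL LETTER E WITH ACUTE'
fixed=$(printf '%s\n' "$line" | sed -e "$rule")
echo "$fixed"
```

The generated rule is case-insensitive, so U00E9 in the compose line is
matched by U00e9 from the header and replaced by the keysym name eacute.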
Comment 1 David Nusinow 2006-07-04 08:36:05 UTC
Created attachment 6118
Compose fix for Latin-1
Comment 2 Daniel Stone 2007-02-27 01:32:47 UTC
Sorry about the phenomenal bug spam, guys.  Adding xorg-team@ to the QA contact so bugs don't get lost in future.
Comment 3 Simos Xenitellis 2007-06-22 17:41:15 UTC
I do not understand this part:

"The X11 protocol states that Unicode keysyms are in the range
0x01000100 - 0x0110FFFF.  If the result of composing characters
is a Unicode codepoint, X returns the corresponding Unicode
keysym, which is its Unicode codepoint augmented by 0x01000000.
Latin-1 characters must not appear with their Unicode codepoints
in compose files, otherwise the returned composed character lies
in the range 0x01000000 - 0x010000FF which is not valid."

1. Isn't the "result of composing characters" always a Unicode codepoint? Do you mean here whether the result is a single Unicode codepoint or two Unicode codepoints?
2. Do you mean that 

<Multi_key> <macron> <U01EA> 	: "Ǭ"   U01EC # LATIN CAPITAL LETTER O WITH OGONEK AND MACRON

must not be changed to 

<Multi_key> <macron> <U100001EA> 	: "Ǭ"   U01EC # LATIN CAPITAL LETTER O WITH OGONEK AND MACRON ?

3. The short description of the change is, for every Unicode keysym that is shown in the compose file as <U00??>, make it <U100001??> ?

4. Of course do the above - (3) - for keysymdef.h as well?
Comment 4 James Cloos 2007-08-18 14:34:11 UTC
Script applied and result pushed in commit 4b0a14521449dfce8b4347bd17243efd1d3eae2d.

