Bug 11930

Summary: Compose file problem with some Greek accents
Product: xorg Reporter: Brice Goglin <brice.goglin>
Component: Lib/Xlib Assignee: James Cloos <cloos>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium CC: adia, mat, mfabian, simos.bugzilla, sndirsch
Version: git   
Hardware: Other   
OS: All   
URL: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=436923
Whiteboard:
Bug Depends on:    
Bug Blocks: 13275    
Attachments:
Description                                                    Flags
nls/el_GR.UTF-8/Compose.pre with added sequences               none
nls/el_GR.UTF-8/Compose.pre with only psili/dasia keysyms      none

Description Brice Goglin 2007-08-10 08:57:37 UTC
Reported yesterday by Jan Willem Stumpel on the Debian BTS, seems to apply to libX11 1.0.3 as well as git. He says:

This problem is recent, but I do not know when it started. The Compose file (nls/en_US.UTF-8/Compose) now has wrong definitions for the Greek DASIA and PSILI symbols. So for instance when the keyboard is switched to Greek polytonic, "a no longer produces an alpha with dasia (ἁ).
Instead, with the Greek polytonic keyboard, the " key now produces a combining diacritical (which is supposed to be placed _after_ the character it combines with). Unfortunately, as is well known, combining diacriticals are very tricky things; many apps and fonts lack support for them.

The problem can be cured by globally replacing, in the Compose file,
    U10000313 --> U0313
and U10000314 --> U0314

Perhaps the fundamental solution is to introduce new named keysyms (proposal:
dead_dasia for "314" and dead_psili for "313") for use in both files.
Comment 1 Alexandros Diamantidis 2007-11-16 04:45:25 UTC
> Perhaps the fundamental solution is to introduce new named keysyms (proposal:
> dead_dasia for "314" and dead_psili for "313") for use in both files.

Now that new keysyms have been defined (see bug #9306), other places that use
the old ones should be changed to reflect this. The el_GR.UTF-8/Compose file
currently has two lines for each psili/dasia compose sequence, one with
dead_horn/dead_ogonek and one with U0313/U0314. The new keysyms are
dead_psili/dead_dasia.

The only question is, should the new sequences *replace* the old ones,
or should they just be added to the existing ones for backwards
compatibility? I'm attaching two versions of the changed file, corresponding
to the two possibilities. Someone more qualified than me should select the
best one to use.

BTW, I also took the opportunity to fix spacing in the file, so that
columns would align. That's why I'm not attaching patches, since the real
changes would be drowned in the whitespace ones. Sorry if this causes
trouble for those reviewing the changes.
Comment 2 Alexandros Diamantidis 2007-11-16 04:48:10 UTC
Created attachment 12592 [details]
nls/el_GR.UTF-8/Compose.pre with added sequences

This one adds a new line with the new keysyms. For example...

<dead_horn> <Greek_alpha>                               : "ἀ"  U1f00
<U0313> <Greek_alpha>                                   : "ἀ"  U1f00
<dead_psili> <Greek_alpha>                              : "ἀ"  U1f00
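
The "added sequences" variant can be sketched as a small generator that emits one line per prefix keysym. This is only an illustration: `compose_lines` and `PSILI_PREFIXES` are hypothetical names, not part of the attachment or of libX11, and the column width is a guess at the file's alignment.

```python
# Hypothetical helper, not part of the attachment: emit one Compose line per
# prefix keysym for a given base letter and precomposed result character.
PSILI_PREFIXES = ["dead_horn", "U0313", "dead_psili"]  # the two old spellings plus the new keysym


def compose_lines(prefixes, base_keysym, result_char):
    lines = []
    for prefix in prefixes:
        lhs = "<%s> <%s>" % (prefix, base_keysym)
        # The padding width (55) is an assumption about the file's column layout.
        lines.append('%s : "%s"  U%04x' % (lhs.ljust(55), result_char, ord(result_char)))
    return lines


if __name__ == "__main__":
    # U+1F00 is GREEK SMALL LETTER ALPHA WITH PSILI.
    for line in compose_lines(PSILI_PREFIXES, "Greek_alpha", "\u1f00"):
        print(line)
```

Dropping the first two entries from the prefix list would give the shape of the "only psili/dasia keysyms" variant.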
Comment 3 Alexandros Diamantidis 2007-11-16 04:54:14 UTC
Created attachment 12593 [details]
nls/el_GR.UTF-8/Compose.pre with only psili/dasia keysyms

Same as the previous attachment with all lines using the old keysyms removed.
Comment 4 Stefan Dirsch 2007-11-16 05:16:22 UTC
Could this be related to Bug #5129?
Comment 5 Alexandros Diamantidis 2007-11-16 06:00:00 UTC
(In reply to comment #4)
> Could this be related to Bug #5129?

Yes, in that the same changes should be applied to en_US.UTF-8/Compose too. Apart from that, I see that the polytonic Greek part of that file has many other
problems that should definitely be fixed. I'll take a look at this and post
a patch there.

Thanks!

Comment 6 Simos Xenitellis 2007-11-16 16:57:16 UTC
Vassilis Vasaitis wrote a script to create the section for Greek polytonic. His website is not active anymore, however I put the script online at
http://planet.ellak.gr/misc/polytonic-compose.pl

Would it make sense to get the Greek polytonic section created as the output of the script only?
Comment 7 Mike FABIAN 2007-11-22 07:32:06 UTC
Stefan Dirsch> Could this be related to Bug #5129?

Yes, I reopened bug #5129  just because of this problem.

Somebody replaced all Uxxxx with U1000xxxx in the compose file recently
because of a perceived conflict which didn’t really exist. 

This change must be reverted because it was wrong.
Comment 8 James Cloos 2007-12-04 04:05:53 UTC
I used a script similar to the one in the log for libX11.git commit c76d30253f1483ac8200ad5c032a818907e65030 to add dead_psili and dead_dasia entries to the en_US.UTF-8 and el_GR.UTF-8 Compose.pre files.

The en file has entries like <U10000313> where the el file has <U0313>.  en was changed by commit 4c3e34bece7402f08139d34d1ef5834e3cf533c7; I'll update el to match later today.

If anyone wants to work on making the build use http://planet.ellak.gr/misc/polytonic-compose.pl from Simos’ comment above to generate the Compose.pre at tar-creation-time, please add it here!
Comment 9 Alexandros Diamantidis 2007-12-04 12:24:54 UTC
In fact, you should make the opposite change - that is, keep el as it is now and 
update en. Unicode keysym names have the following meaning: Uxxxx, where xxxx is 
a hexadecimal string between 100 and 10ffff, corresponds to keysym with value 
0x01000000 + xxxx. So, entries like <U10000313> are incorrect - the correct 
keysym for Unicode character U+0313 is <U0313>.

Comment 13 on bug #5129 is saying this, too.
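
As a minimal sketch of the rule above (not the libX11 implementation), the mapping from a Unicode keysym name to its value could be written like this. The function name `unicode_keysym_value` is made up for illustration; the only facts it relies on are the 0x01000000 offset and the 100..10ffff range stated above.

```python
import re

UNICODE_KEYSYM_BASE = 0x01000000  # offset given in the explanation above


def unicode_keysym_value(name):
    """Return the keysym value for a name such as 'U0313', or None if invalid.

    Hypothetical helper for illustration only.
    """
    match = re.fullmatch(r"U([0-9A-Fa-f]+)", name)
    if match is None:
        return None
    code_point = int(match.group(1), 16)
    # Only code points between 0x100 and 0x10ffff have Unicode keysyms.
    if not 0x100 <= code_point <= 0x10FFFF:
        return None
    return UNICODE_KEYSYM_BASE + code_point


if __name__ == "__main__":
    for name in ("U0313", "U10000313"):
        value = unicode_keysym_value(name)
        print(name, "invalid" if value is None else hex(value))
```

Under this reading, <U10000313> is rejected because 0x10000313 lies far above 0x10ffff, while <U0313> maps to keysym 0x01000313.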

Here's a Perl one-liner to fix this:

perl -pe 's/U1([0-9A-Fa-f]{7})/sprintf "U%04X", hex $1/ge'

...although the en file has various other problems, and at least for the 
polytonic Greek part it will be better, I think, to dump the current entries
and recreate it with an appropriate script.
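
For anyone who prefers not to use Perl, the same substitution can be sketched in Python. This is meant to mirror the one-liner above, not replace it; `fix_unicode_keysyms` is a hypothetical name, and like the Perl version it rewrites any "U1" followed by seven hex digits wherever it appears in the text.

```python
import re

# Same pattern as the Perl one-liner: "U1" followed by exactly seven hex digits.
_BAD_NAME = re.compile(r"U1([0-9A-Fa-f]{7})")


def fix_unicode_keysyms(text):
    """Rewrite names like U10000313 to U0313 (hypothetical helper)."""
    return _BAD_NAME.sub(lambda m: "U%04X" % int(m.group(1), 16), text)


if __name__ == "__main__":
    import sys
    # Filter stdin to stdout, like `perl -pe`.
    for line in sys.stdin:
        sys.stdout.write(fix_unicode_keysyms(line))
```

Names that are already short, such as U1f00, are left alone because fewer than seven hex digits follow the leading "U1".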

The current el Compose seems mostly correct to me, although I haven't tested
it yet. The only strange thing I noticed is sequences like...

<dead_dasia> <dead_ogonek>
<U0313> <dead_horn>

...etc. - that is, entries mixing different incorrect keysyms for aspirations. They don't cause any problems, but since a given keyboard layout will either have the correct new keysyms for psili and dasia, or will have one of the old incorrect pairs, they are unnecessary.
Comment 10 James Cloos 2007-12-04 13:15:38 UTC
(In reply to comment #9)
> Unicode keysym names have the following meaning: Uxxxx, where xxxx is 
> a hexadecimal string between 100 and 10ffff, corresponds to keysym with value 
> 0x01000000 + xxxx. So, entries like <U10000313> are incorrect - the correct 
> keysym for Unicode character U+0313 is <U0313>.

OK.  I see in libX11/src/KeysymStr.c (cf: http://cgit.freedesktop.org/xorg/lib/libX11/tree/src/KeysymStr.c at the end of the file) code which does exactly that; I’ll push a fix throughout the nls files in libX11.git.

It looks like code points beyond UFFFF use UXXXXXXXX (a ten octet buffer is allocated rather than a six octet buffer).  I’ll ensure any of those are also correct.
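
A rough Python sketch of the reverse direction (keysym value to name) follows. It is not a transcription of KeysymStr.c: the function name is hypothetical, and reading the six and ten octet buffers as "U" plus four or eight hex digits plus a terminator is an assumption. Shorter spellings such as U1XXXX denote the same code point under the Uxxxx rule quoted above.

```python
def unicode_keysym_name(keysym):
    """Return a 'U...' name for a Unicode keysym value, or None otherwise.

    Hypothetical helper; digit widths are inferred from the buffer sizes
    mentioned above, not copied from libX11.
    """
    if not 0x01000100 <= keysym <= 0x0110FFFF:
        return None
    code_point = keysym & 0xFFFFFF
    # Assumption: 4 hex digits up to U+FFFF, 8 hex digits beyond that.
    width = 8 if code_point > 0xFFFF else 4
    return "U{:0{}X}".format(code_point, width)


if __name__ == "__main__":
    print(unicode_keysym_name(0x01000313))
    print(unicode_keysym_name(0x0101D11E))
```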
Comment 11 James Cloos 2007-12-04 14:29:34 UTC
The U1000XXXX → UXXXX and U1001XXXX → U1XXXX issue is fixed in commit 438d02ebc08ee171cf1d3936f4c81050d428ab92.

Please reopen if I missed anything.
Comment 12 Simos Xenitellis 2008-01-10 14:55:34 UTC
Thanks James.

On a somewhat similar note, I filed Bug 14013 to add "dead_perispomeni". Currently Greek Polytonic uses dead_tilde which corresponds to 0x303 (Unicode), and not 0x342 (Perispomeni). 
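
To see the difference between the code points involved, the Unicode character names can be looked up with Python's standard `unicodedata` module. This is just a quick check, and `describe` is a throwaway helper name.

```python
import unicodedata


def describe(code_point):
    """Return 'U+XXXX NAME' for a code point (throwaway helper)."""
    return "U+%04X %s" % (code_point, unicodedata.name(chr(code_point)))


if __name__ == "__main__":
    # dead_tilde's character, the perispomeni, and the psili/dasia combining marks.
    for cp in (0x0303, 0x0342, 0x0313, 0x0314):
        print(describe(cp))
```

U+0303 is the generic combining tilde, while U+0342 is the dedicated Greek perispomeni.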

I am posting this in case any subscribers to this bug are also interested.
