Bug 18751

Summary: Add compose sequences from gtk+ to X.Org
Product: xorg Reporter: Simos Xenitellis <simos.bugzilla>
Component: Lib/Xlib (data) Assignee: Xorg Project Team <xorg-team>
Status: RESOLVED MOVED QA Contact: Xorg Project Team <xorg-team>
Severity: minor    
Priority: low CC: alister.hood, bensberg, jeremyhu, monnier, reinouts
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard: 2011BRB_Reviewed
Attachments:
Description                                                                Flags
Compose sequences (originating from gtk+)                                  none
Patch with additions from gtk+                                             none
Improved patch with 10 less additions as was discussed                     none
adds sequences for skull and crossbones, umbrella, and up and down arrows  none

Description Simos Xenitellis 2008-11-27 17:34:09 UTC
Created attachment 20644
Compose sequences (originating from gtk+)

The compose sequences in the attachment used to exist in gtk+ 
but are not found in the current X.Org Compose (en_US.UTF-8) file.

A large group of these compose sequences are of the form

<Multi_key> <A> <acute>                         : "Á" U00C1
<Multi_key> <a> <acute>                         : "á" U00E1

i.e. 'letter' first, then 'punctuation'.
X.Org's Compose file does not have sequences of this type (it has 'punctuation' first, then 'letter').
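To illustrate the two orderings, here is a minimal sketch of a hypothetical helper (not part of libX11 or gtk+) that turns a punctuation-first two-key entry into its letter-first counterpart. It assumes each entry sits on one line in the `<Multi_key> <key1> <key2> : "result" keysym` form shown above, and it only swaps when the second key is a single-letter keysym. Any generated line would still have to be checked for conflicts before being added, as the rest of this report shows.

```python
import re
from typing import Optional

# Matches a two-key entry such as:  <Multi_key> <acute> <A> : "Á" Aacute
# group(1) = first keysym, group(2) = second keysym, group(3) = ": ..." part
TWO_KEY = re.compile(r'^<Multi_key>\s+<(\w+)>\s+<(\w+)>(\s*:.*)$')

def swapped(line: str) -> Optional[str]:
    """Return the entry with its two keys swapped, or None when the line is
    not a punctuation-first entry ending in a single-letter keysym."""
    m = TWO_KEY.match(line)
    if m is None:
        return None
    first, second, rest = m.groups()
    # Only swap "<punctuation> <letter>": multi-character first keysym
    # (e.g. "acute"), single-character alphabetic second keysym (e.g. "A").
    if not (len(first) > 1 and len(second) == 1 and second.isalpha()):
        return None
    return f"<Multi_key> <{second}> <{first}>{rest}"
```

For example, `swapped('<Multi_key> <acute> <A> : "Á" Aacute')` yields the letter-first form `<Multi_key> <A> <acute> : "Á" Aacute`.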

What I would like is feedback on which sequences are OK to add to X.Org's Compose file.
I can then file individual reports, etc.

references: http://bugzilla.gnome.org/show_bug.cgi?id=557420
Comment 1 Alan Coopersmith 2009-03-12 19:51:55 UTC
Bug #3138 also points out that the ISO-8859-1 locales have the compose entries
in both orders, but the UTF-8 ones only have the one order.
Comment 2 Jeremy Huddleston Sequoia 2011-10-03 17:20:28 UTC
If you can provide a patch to add all these changes, I'll certainly review it.
Comment 3 Jeremy Huddleston Sequoia 2011-10-03 19:21:56 UTC
*** Bug 3138 has been marked as a duplicate of this bug. ***
Comment 4 Pander 2011-12-30 03:48:55 UTC
To be exact, the proposed patch from Simos originates from
  http://svn.gnome.org/viewvc/gtk%2B/trunk/gtk/gtk-compose-lookaside.txt?view=markup

Please use that URL, as it points to trunk. When this has been fixed, please also report back to Simos so that GTK+/GNOME doesn't have to use its own fix and can use the upstream Compose file from Xorg directly.

Jeremy, is the above URL workable, or do you want a proper patch? If you want a patch file with the sequences in the proper sections and with the proper UTF-8 comments, please let me know and I will provide it.

Fixing this will reconcile compose key sequences between GNOME and other X11-based desktop environments.
Comment 5 Jeremy Huddleston Sequoia 2012-01-02 20:21:55 UTC
Please provide a patch to libX11 (such as 'git format-patch HEAD^' after you commit your changes).  You should email this to the xorg-devel mailing list for review.
Comment 6 Pander 2012-01-05 06:33:32 UTC
Working on it as we speak. It is a lot of boring manual labour, but worth it in the end. I will post the patch to the list when it is done.
Comment 7 Pander 2012-01-06 08:45:44 UTC
Created attachment 55217
Patch with additions from gtk+

General note
============
All proposed sequences have been put in the correct place in the upstream file. Whitespace has been fixed where needed and comments (Unicode names) have been added (a lot of work :S). Many automated checks have also been run to filter out the conflicting sequences listed below and to improve the overall quality, including for some existing definitions. The check script is available for those who want to use it.
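The check script itself is not reproduced in this report. As a rough sketch of the first step any such check needs, the hypothetical `parse_line` below splits one Compose entry into its key sequence, result string, optional keysym and comment. It assumes the single-line format used throughout this report; it does not handle escaped quotes inside the result string, and `include` directives, blank lines and comment lines simply return None.

```python
import re
from typing import NamedTuple, Optional, Tuple

class Entry(NamedTuple):
    keys: Tuple[str, ...]   # e.g. ("Multi_key", "d", "equal")
    text: str               # quoted result string, e.g. "₫"
    keysym: Optional[str]   # optional keysym or U<hex> after the string
    comment: Optional[str]  # text after '#', e.g. "DONG SIGN"

# <key> <key> ... : "result" [keysym] [# comment]
LINE = re.compile(
    r'^\s*((?:<\w+>\s*)+):\s*"([^"]*)"\s*(\w+)?\s*(?:#\s*(.*?))?\s*$'
)

def parse_line(line: str) -> Optional[Entry]:
    m = LINE.match(line)
    if m is None:
        return None  # blank line, comment, include, or unsupported syntax
    keys = tuple(re.findall(r'<(\w+)>', m.group(1)))
    return Entry(keys, m.group(2), m.group(3), m.group(4))
```

Usage: `parse_line('<Multi_key> <d> <equal> : "₫" U20ab # DONG SIGN')` gives an `Entry` with keys `("Multi_key", "d", "equal")`, text `"₫"`, keysym `"U20ab"` and comment `"DONG SIGN"`.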

Preparing this patch took a lot of work. I greatly appreciate the work done in gtk+ on gtk-compose-lookaside.txt, because it offers me, as a GNOME user, the sequences I really need. Even though it took a while to merge these upstream (for whatever reason), hopefully this patch will fix a lot and gtk+ can eventually use the Xorg definitions directly. I completely understand that choosing is not easy, and I hope this patch will pave the way to unique and optimal compose sequence definitions.


Benefit
=======
Worst-case scenario: gtk-compose-lookaside.txt goes from the current 413 exceptions to only the 32 omitted sequences listed below. Best-case scenario: gtk-compose-lookaside.txt is emptied and GNOME (Unity) uses the same compose sequences as Xorg, KDE, etc.


Changes in upstream
===================
This
  <Multi_key> <d> <minus>          	: "₫"   U20ab # DONG SIGN
has been changed into
  <Multi_key> <d> <equal>          	: "₫"   U20ab # DONG SIGN
and the following has been added to balance it
  <Multi_key> <equal> <d>          	: "₫"   U20ab # DONG SIGN
in order to remove the conflict with
  <Multi_key> <d> <minus>          	: "đ"   dstroke # LATIN SMALL LETTER D WITH STROKE
The latter belongs to a larger series of sequences consisting of a letter followed by <minus>, which are also more likely to be used. Using <equal> instead of <minus> for the dong currency sign is also supported by the fact that <equal> is used a lot for currencies, especially for those that, like this one, contain two horizontal lines.
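The <d> <minus> clash above is an exact-duplicate conflict: one key sequence bound to two different results. A minimal sketch of detecting such clashes, assuming entries have already been reduced to (keys, result) pairs; `find_conflicts` is a hypothetical name, not part of any existing tool.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Keys = Tuple[str, ...]

def find_conflicts(entries: Iterable[Tuple[Keys, str]]) -> Dict[Keys, List[str]]:
    """Map every key sequence bound to more than one distinct result to the
    sorted list of those results."""
    seen = defaultdict(set)
    for keys, result in entries:
        seen[keys].add(result)
    # Keep only key sequences that produce two or more different strings.
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}
```

With the three entries discussed above, only `("Multi_key", "d", "minus")` is reported, bound to both "đ" and "₫"; `("Multi_key", "d", "equal")` is unambiguous.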

These
  <Multi_key> <o> <apostrophe> <A> 	: "Ǻ"   U01FA # LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE
  <Multi_key> <o> <apostrophe> <a> 	: "ǻ"   U01FB # LATIN SMALL LETTER A WITH RING ABOVE AND ACUTE
have been changed into
  <Multi_key> <asterisk> <apostrophe> <A> 	: "Ǻ"   U01FA # LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE
  <Multi_key> <asterisk> <apostrophe> <a> 	: "ǻ"   U01FB # LATIN SMALL LETTER A WITH RING ABOVE AND ACUTE
in order to allow the following proposed sequences without conflict:
  <Multi_key> <O> <apostrophe> 		: "Ó"   Oacute # LATIN CAPITAL LETTER O WITH ACUTE
  <Multi_key> <o> <apostrophe> 		: "ó"   oacute # LATIN SMALL LETTER O WITH ACUTE

These
  <Multi_key> <U> <comma> <E>      	: "Ḝ"   U1E1C # LATIN CAPITAL LETTER E WITH CEDILLA AND BREVE
  <Multi_key> <U> <comma> <e>      	: "ḝ"   U1E1D # LATIN SMALL LETTER E WITH CEDILLA AND BREVE
have been changed into
  <Multi_key> <U> <space> <comma> <E>  	: "Ḝ"   U1E1C # LATIN CAPITAL LETTER E WITH CEDILLA AND BREVE
  <Multi_key> <U> <space> <comma> <e>  	: "ḝ"   U1E1D # LATIN SMALL LETTER E WITH CEDILLA AND BREVE
in order to allow the following proposed sequences without conflict:
  <Multi_key> <U> <comma> 		: "Ų"   U0172 # LATIN CAPITAL LETTER U WITH OGONEK
  <Multi_key> <u> <comma> 		: "ų"   U0173 # LATIN SMALL LETTER U WITH OGONEK
Note that this change is in line with the usage of <space> <comma> for semicolon, which is used more often.
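Both changes above avoid a prefix conflict: a two-key entry such as <Multi_key> <o> <apostrophe> is the leading part of the three-key <Multi_key> <o> <apostrophe> <A>, and, as described in this report, the shorter entry then blocks the longer one. A minimal sketch of finding such pairs, assuming key sequences are given as tuples of keysym names; `find_blocked` is a hypothetical name.

```python
from typing import Iterable, List, Tuple

Keys = Tuple[str, ...]

def find_blocked(sequences: Iterable[Keys]) -> List[Tuple[Keys, Keys]]:
    """Return (shorter, longer) pairs in which the shorter key sequence is a
    proper prefix of the longer one."""
    seqs = set(sequences)
    blocked = []
    for longer in sorted(seqs):
        # Test every proper prefix of this sequence against the defined set.
        for n in range(1, len(longer)):
            if longer[:n] in seqs:
                blocked.append((longer[:n], longer))
    return blocked
```

With the original <o> <apostrophe> <A> entry plus the proposed <o> <apostrophe>, one pair is reported; after moving Ǻ to <asterisk> <apostrophe> <A>, the pair disappears.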


These two have been removed
  <Multi_key> <quotedbl> <backslash> 	: "〝"   U301d # REVERSED DOUBLE PRIME QUOTATION MARK
  <Multi_key> <quotedbl> <slash>   	: "〞"   U301e # DOUBLE PRIME QUOTATION MARK
because these compose sequences are easy to "stumble upon" when users are trying to find the opening and closing double quotes. They are CJK characters, which are rarely supported by Latin fonts, and the Unicode Consortium advises *not* to use them in non-CJK text. I would like to reintroduce them with a J or Q prefix in a separate proposal that supports a complete range of Japanese compose key sequences.


Ignored from downstream
=======================
The following sequences from the downstream proposal have been ignored because no free compose sequence was available for them: these key sequences are already used in larger series of sequences, which are also more likely to be used. The ignored sequences are:
  <Multi_key> <C> <slash> 		: "¢"   cent # CENT SIGN
  <Multi_key> <slash> <C> 		: "¢"   cent # CENT SIGN
  <Multi_key> <L> <equal> 		: "£"   sterling # POUND SIGN
  <Multi_key> <equal> <L> 		: "£"   sterling # POUND SIGN
  <Multi_key> <l> <equal> 		: "£"   sterling # POUND SIGN
  <Multi_key> <equal> <l> 		: "£"   sterling # POUND SIGN
  <Multi_key> <underscore> <A> 		: "ª"   ordfeminine # FEMININE ORDINAL INDICATOR
  <Multi_key> <A> <underscore> 		: "ª"   ordfeminine # FEMININE ORDINAL INDICATOR
  <Multi_key> <underscore> <a> 		: "ª"   ordfeminine # FEMININE ORDINAL INDICATOR
  <Multi_key> <a> <underscore> 		: "ª"   ordfeminine # FEMININE ORDINAL INDICATOR
  <Multi_key> <underscore> <O> 		: "º"   masculine # MASCULINE ORDINAL INDICATOR
  <Multi_key> <O> <underscore> 		: "º"   masculine # MASCULINE ORDINAL INDICATOR
  <Multi_key> <underscore> <o> 		: "º"   masculine # MASCULINE ORDINAL INDICATOR
  <Multi_key> <o> <underscore> 		: "º"   masculine # MASCULINE ORDINAL INDICATOR
  <Multi_key> <period> <period> 		: "˙"   U02D9 # DOT ABOVE
  <Multi_key> <c> <o> 			: "©"   copyright # COPYRIGHT SIGN
  <Multi_key> <c> <O> 			: "©"   copyright # COPYRIGHT SIGN
  <Multi_key> <exclam> <S> 		: "§"   section # SECTION SIGN
  <Multi_key> <exclam> <s> 		: "§"   section # SECTION SIGN
  <Multi_key> <asciicircum> <0> 		: "°"   degree # DEGREE SIGN
  <Multi_key> <0> <asciicircum> 		: "°"   degree # DEGREE SIGN
Plenty of alternative sequences that are just as easy to use are already available.

The following sequences have been ignored from the proposal because they are blocking another sequence that is part of a longer series of sequences:
  <Multi_key> <parenleft> <r> 		: "®"   registered # REGISTERED SIGN
  <Multi_key> <parenleft> <c> 		: "©"   copyright # COPYRIGHT SIGN
  <Multi_key> <parenleft> <A> 		: "Ă"   U0102 # LATIN CAPITAL LETTER A WITH BREVE
  <Multi_key> <parenleft> <a> 		: "ă"   U0103 # LATIN SMALL LETTER A WITH BREVE
  <Multi_key> <parenleft> <G> 		: "Ğ"   U011E # LATIN CAPITAL LETTER G WITH BREVE
  <Multi_key> <parenleft> <g> 		: "ğ"   U011F # LATIN SMALL LETTER G WITH BREVE
Plenty of alternative sequences that are just as easy to use are already available.

The following sequences have been ignored from the proposal because they would block complete series of sequences, which are in turn part of an even longer series of sequences:
  <Multi_key> <asciicircum> <underscore> 	: "¯"   macron # MACRON
  <Multi_key> <Greek_iota> <apostrophe> 	: "ί"   U03AF # GREEK SMALL LETTER IOTA WITH TONOS
Plenty of alternative sequences that are just as easy to use are already available.

The following sequences have been ignored from the proposal because they would be blocked by existing, shorter upstream sequences, which are more likely to be used:
  <Multi_key> <quotedbl> <apostrophe> <Greek_upsilon> 	: "ΰ"   U03B0 # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
  <Multi_key> <quotedbl> <apostrophe> <Greek_iota> 	: "ΐ"   U0390 # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
  <Multi_key> <quotedbl> <apostrophe> <space>	: "΅"   U0385 # GREEK DIALYTIKA TONOS
Plenty of alternative sequences that are just as easy to use are already available.


Note to downstream
==================
This downstream line contains a typo (the quoted character "ΐ" is U0390, not U03B0):
  <Greek_accentdieresis> <Greek_upsilon>		: "ΐ" U03B0
It has been corrected to the following, but was ignored anyway (see above):
  <Greek_accentdieresis> <Greek_upsilon>	: "ΰ"   U03B0 # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
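The typo above is a mismatch between the quoted character and its U<hex> keysym, which is mechanical to detect. A minimal sketch, assuming keysyms of the form U<hex>; named keysyms such as dstroke or Uacute are not checked. `codepoint_matches` is a hypothetical name.

```python
def codepoint_matches(text: str, keysym: str) -> bool:
    """For a keysym of the form U<hex>, check that the quoted string is
    exactly that one code point. Other keysyms are not checked (True)."""
    if keysym.startswith("U"):
        try:
            cp = int(keysym[1:], 16)
        except ValueError:
            return True  # a named keysym such as "Uacute"; skipped here
        return len(text) == 1 and ord(text) == cp
    return True
```

Applied to the downstream line, `codepoint_matches("ΐ", "U03B0")` is False, while the corrected `codepoint_matches("ΰ", "U03B0")` is True.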


Note to upstream
================
All Unicode names have been verified against UnicodeData.txt from the Unicode Consortium. Only the following entries, which already existed upstream, could not be matched:
  <dead_currency> <U> 		      : "圓"   U5713              # YUAN / WEN
  <dead_currency> <u> 		      : "元"   U5143              # YUAN / WEN
  <dead_currency> <Y> 		      : "円"   U5186              # YEN
Best to leave them as is.
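Name verification can be approximated with Python's standard `unicodedata` module instead of reading UnicodeData.txt directly. This is a swapped-in technique, not necessarily what check.py does. It also suggests why the CJK entries above come out as unknown: their comments are readings ("YUAN / WEN", "YEN"), while Unicode names these ideographs only by code point (UnicodeData.txt lists them as ranges), e.g. CJK UNIFIED IDEOGRAPH-5713. `check_name` is a hypothetical name.

```python
import unicodedata

def check_name(text: str, comment: str) -> bool:
    """True if `text` is a single code point whose Unicode name equals the
    entry's comment. unicodedata.name() raises ValueError for unnamed code
    points unless a default is given, so "" is passed as the default."""
    if len(text) != 1:
        return False
    return unicodedata.name(text, "") == comment
```

For example, `check_name("₫", "DONG SIGN")` passes, while `check_name("圓", "YUAN / WEN")` fails because the official name is `CJK UNIFIED IDEOGRAPH-5713`.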


References
==========
Upstream bug report
  https://bugs.freedesktop.org/show_bug.cgi?id=18751
Downstream bug report
  https://bugzilla.gnome.org/show_bug.cgi?id=666710
Related upstream bug report
  https://bugs.freedesktop.org/show_bug.cgi?id=44312
Comment 8 Pander 2012-03-02 07:47:05 UTC
Created attachment 57923
Improved patch with 10 less additions as was discussed
Comment 9 Pander 2012-03-02 07:48:17 UTC
The latest patch does not include the following lines, as discussed on the mailing list:

<Multi_key> <0> <c> 			: "©"   copyright # COPYRIGHT SIGN
<Multi_key> <0> <C> 			: "©"   copyright # COPYRIGHT SIGN
<Multi_key> <c> <0> 			: "©"   copyright # COPYRIGHT SIGN
<Multi_key> <C> <0> 			: "©"   copyright # COPYRIGHT SIGN
<Multi_key> <s> <0> 			: "§"   section # SECTION SIGN
<Multi_key> <0> <s> 			: "§"   section # SECTION SIGN
<Multi_key> <S> <0> 			: "§"   section # SECTION SIGN
<Multi_key> <0> <S> 			: "§"   section # SECTION SIGN
<Multi_key> <0> <X> 			: "¤"   currency # CURRENCY SIGN
<Multi_key> <X> <0> 			: "¤"   currency # CURRENCY SIGN
<Multi_key> <0> <x> 			: "¤"   currency # CURRENCY SIGN
<Multi_key> <x> <0> 			: "¤"   currency # CURRENCY SIGN
<Multi_key> <exclam> <p> 		: "¶"   paragraph # PILCROW SIGN
<Multi_key> <exclam> <P> 		: "¶"   paragraph # PILCROW SIGN

The patch has also been generated against the latest git HEAD as of 2012-03-02.
Comment 10 James Cloos 2012-03-14 15:13:49 UTC
Pushed as 91bcce48d94792f78333d2aea73961cc2e739d2e.

Additional fix pushed as 62d42953893f93a98db0504eaf06d650ceaf5811:

    Fix the gtk+ additions
    
    (Some of) the Dstroke and dstroke entries already were present as U011[01],
    even though XK_Dstroke and XK_dstroke are part of the latin2 set in keysymdef.h.
    
    The addition of <Multi_key> <o> <apostrophe> as a postfix version of
    <Multi_key> <apostrophe> <o> blocks the existing entries for ǻ and Ǻ.
    That prevents its and <Multi_key> <O> <apostrophe>’s addition.
Comment 11 James Cloos 2012-03-14 15:18:55 UTC
Julien notes that http://patchwork.freedesktop.org/patch/3289/
wanted to use <Multi_key> <O> <X> for ☠ U2620 # SKULL AND CROSSBONES.

I’m inclined to undo that addition from here for ☠’s benefit.

Thoughts?
Comment 12 Pander 2012-03-19 06:23:13 UTC
Do you also want to use the reverse sequence for skull and crossbones? That would lead to less confusing sequences.
Comment 13 Pander 2012-03-19 07:20:33 UTC
Some housekeeping also needs to be done on several warnings and errors. These are listed by running this new version of check.py, see http://pastebin.com/AmqxYPn5
Comment 14 Julien Cristau 2012-03-19 11:35:56 UTC
(In reply to comment #12)
> Do you also want to use the reverse sequence for skull and crossbones? That
> would lead to less confusing sequences.

i don't think we should add two sequences where one works just fine, no.
Comment 15 James Cloos 2012-03-20 08:23:24 UTC
>>>>> "b" == bugzilla-daemon  <bugzilla-daemon@freedesktop.org> writes:

>> Do you also want to use the reverse sequence for skull and crossbones? That
>> would lead to less confusing sequences.

b> i don't think we should add two sequences where one works just fine, no.

I think he meant instead of, not in addition to.
Comment 16 Benno Schulenberg 2013-09-08 16:06:13 UTC
Created attachment 85440
adds sequences for skull and crossbones, umbrella, and up and down arrows

(In reply to comment #15)
> >> [Pander wrote:] Do you also want to use the reverse sequence for
> >> skull and crossbones? That would lead to less confusing sequences.
> 
> > [Julien wrote:] i don't think we should add two sequences where
> > one works just fine, no.
> 
> [James wrote:] I think he meant instead of, not in addition to.

I think Pander meant in addition to, because the sequences with <O> <X> and with <X> <O> already exist, both producing the currency sign.  If the first starts to produce a skull with crossbones, and the latter still the sign for monies... that would be somewhat confusing.  Better have the reversed sequence do the same.
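Benno's point generalizes to a consistency check: when a two-key sequence and its reverse are both defined, they should normally produce the same result. A minimal sketch, assuming a hypothetical table that maps key tuples (with <Multi_key> omitted) to result strings; `asymmetric_pairs` is likewise a hypothetical name.

```python
from typing import Dict, List, Tuple

def asymmetric_pairs(table: Dict[Tuple[str, ...], str]) -> List[tuple]:
    """Report two-key sequences whose reverse is also defined but yields a
    different result. Each pair is reported once, keys in sorted order."""
    out = []
    for keys, result in table.items():
        if len(keys) != 2:
            continue
        a, b = keys
        rev = table.get((b, a))
        # a < b ensures (a, b) and (b, a) are not both reported.
        if rev is not None and rev != result and a < b:
            out.append(((a, b), result, rev))
    return out
```

If <O> <X> were changed to produce ☠ while <X> <O> kept producing ¤, this would report `(("O", "X"), "☠", "¤")`, which is exactly the confusion described above.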
Comment 17 GitLab Migration User 2018-08-10 20:11:52 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/lib/libx11/issues/53.
