The Compose file en_US.UTF-8 has some conflicts (same compose sequence, different resulting character). These are conflicts, that is, there are two identical compose sequences in the same Compose file that produce different characters.

A. WARNING: Same keysyms for
   1: LATIN_CAPITAL_LETTER_O_WITH_MACRON_AND_ACUTE and
   2: GREEK_UPSILON_WITH_ACUTE_AND_HOOK_SYMBOL
   (GDK_Multi_key, GDK_apostrophe, GDK_Omacron, 0, 0)
<Multi_key> <apostrophe> <Omacron> : "Ṓ" U1E52 # LATIN CAPITAL LETTER O WITH MACRON AND ACUTE
<Multi_key> <apostrophe> <U03d2> : "ϓ" U03D3 # GREEK UPSILON WITH ACUTE AND HOOK SYMBOL

The issue is that Omacron has the value of "0x03d2", which conflicts with a character from the Greek Unicode block.

B. WARNING: Same keysyms for
   1: COLON_SIGN and
   2: CENT_SIGN
   (GDK_Multi_key, GDK_slash, GDK_C, 0, 0)
<Multi_key> <slash> <C> : "₡" U20a1 # COLON SIGN
<Multi_key> <slash> <C> : "¢" U00A2 # CENT SIGN

C. WARNING: Same keysyms for
   1: COLON_SIGN and
   2: CENT_SIGN
   (GDK_Multi_key, GDK_C, GDK_slash, 0, 0)
<Multi_key> <C> <slash> : "₡" U20a1 # COLON SIGN
<Multi_key> <C> <slash> : "¢" U00A2 # CENT SIGN

D. WARNING: Same keysyms for
   1: LATIN_CAPITAL_LETTER_O_WITH_MACRON_AND_ACUTE and
   2: GREEK_UPSILON_WITH_ACUTE_AND_HOOK_SYMBOL
   (GDK_Multi_key, GDK_acute, GDK_Omacron, 0, 0)
<Multi_key> <acute> <Omacron> : "Ṓ" U1E52 # LATIN CAPITAL LETTER O WITH MACRON AND ACUTE
<Multi_key> <acute> <U03d2> : "ϓ" U03D3 # GREEK UPSILON WITH ACUTE AND HOOK SYMBOL
Interesting.

Wrt A&D, <Omacron> should not conflict with <U03d2>; <U03d2> should be defined as 0x10003d2, not as 0x03d2, ya? Is this instead a bug in the code that generated the list of conflicts?

Wrt B&C, cent and colon have the obvious fix of using /c and c/ for ¢ and using /C and C/ for ₡. I bet most users would expect cents to use a minuscule c and colon a majuscule anyway.
(In reply to comment #1)
> Interesting.
>
> Wrt A&D, <Omacron> should not conflict with <U03d2>; <U03d2> should be
> defined as 0x10003d2, not as 0x03d2, ya? Is this instead a
> bug in the code that generated the list of conflicts?

"Omacron" is defined in
http://cvs.freedesktop.org/xorg/xc/include/keysymdef.h?view=markup
In particular:
#define XK_Omacron 0x03d2 /* U+014C LATIN CAPITAL LETTER O WITH MACRON */

Therefore, it's a problem in keysymdef.h. It would make sense to me for Omacron to have the value 0x014C. Why does it have a different value? If for some reason it needs to be offset by 0x1000000, could you please add a patch?

> Wrt B&C, cent and colon have the obvious fix of using /c and c/ for ¢ and
> using /C and C/ for ₡. I bet most users would expect cents to use
> a minuscule c and colon a majuscule anyway.

Could you please patch this up as well?
(In reply to comment #2)
> (In reply to comment #1)
> > Interesting.
> >
> > Wrt A&D, <Omacron> should not conflict with <U03d2>; <U03d2> should be
> > defined as 0x10003d2, not as 0x03d2, ya? Is this instead a
> > bug in the code that generated the list of conflicts?
>
> "Omacron" is defined in
> http://cvs.freedesktop.org/xorg/xc/include/keysymdef.h?view=markup
> In particular:
> #define XK_Omacron 0x03d2 /* U+014C LATIN CAPITAL LETTER O WITH MACRON */
>
> Therefore, it's a problem in keysymdef.h.
> It would make sense to me for Omacron to have the value 0x014C. Why does
> it have a different value?
> If for some reason it needs to be offset by 0x1000000, could you please add a
> patch?

Existing legacy keysyms are considered part of the protocol and will not -- not -- be changed. Before Unicode came around, characters like Omacron were defined with arbitrary keysyms (e.g. 0x03D2). Now Unicode has come, we can all use that, but we cannot change the value of legacy keysyms. So, if you want to use a Unicode value for which no keysym is defined, you offset it by 0x10000000. So, U03D2 is 0x100003D2. Omacron is 0x000003D2. So there's no conflict.

> > Wrt B&C, cent and colon have the obvious fix of using /c and c/ for ¢ and
> > using /C and C/ for ₡. I bet most users would expect cents to use
> > a minuscule c and colon a majuscule anyway.
>
> Could you please patch this up as well?

Please submit a diff to the Compose file with the results of whatever you come up with. I think you're best-placed to deal with this kind of thing.
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > > Interesting.
> > >
> > > Wrt A&D, <Omacron> should not conflict with <U03d2>; <U03d2> should be
> > > defined as 0x10003d2, not as 0x03d2, ya? Is this instead a
> > > bug in the code that generated the list of conflicts?
> >
> > "Omacron" is defined in
> > http://cvs.freedesktop.org/xorg/xc/include/keysymdef.h?view=markup
> > In particular:
> > #define XK_Omacron 0x03d2 /* U+014C LATIN CAPITAL LETTER O WITH MACRON */
> >
> > Therefore, it's a problem in keysymdef.h.
> > It would make sense to me for Omacron to have the value 0x014C. Why does
> > it have a different value?
> > If for some reason it needs to be offset by 0x1000000, could you please add a
> > patch?
>
> Existing legacy keysyms are considered part of the protocol and will not -- not
> -- be changed. Before Unicode came around, characters like Omacron were defined
> with arbitrary keysyms (e.g. 0x03D2). Now Unicode has come, we can all use
> that, but we cannot change the value of legacy keysyms. So, if you want to use
> a Unicode value for which no keysym is defined, you offset it by 0x10000000.
> So, U03D2 is 0x100003D2. Omacron is 0x000003D2. So there's no conflict.
>
> > > Wrt B&C, cent and colon have the obvious fix of using /c and c/ for ¢ and
> > > using /C and C/ for ₡. I bet most users would expect cents to use
> > > a minuscule c and colon a majuscule anyway.
> >
> > Could you please patch this up as well?
>
> Please submit a diff to the Compose file with the results of whatever you come
> up with. I think you're best-placed to deal with this kind of thing.

Thus, is the request for a patch that will add 0x100000 (if not already added) to all Unicode keysyms in the Compose file? I am happy to do it, and specifically to provide a script for this, as the patch will be very big.
Sure, adding all the missing entries sounds fine (though to resolve *this* bug, one would still need to remove some of the multiple definitions, no?). I just need something whose result I can apply to the tree.
Created attachment 5353
Patch to remove conflict between cent/colon sequence

Multi_key + C + slash = Colon
Multi_key + C + slash = cent
Multi_key + slash + C = Colon
Multi_key + slash + C = cent

These are conflicts, so, following discussion with Daniel, we remove the option for C (capital C) + slash to produce cent. For consistency, we do the same for C + bar.
(In reply to comment #5)
> sure, adding all the missing entries sounds fine (though to resolve *this* bug,
> one would still need to remove some of the multiple definitions, no?). i just
> need something that I can apply the result of to the tree.

Quite luckily, the "conflicts" are only those shown in this bug report, and they are very few. By "conflict" I mean the situation where two identical sequences produce different characters. Because of this, one of the two sequences is not available at all to the end user.

There are also "multiple" definitions, that is, two different sequences producing the same character. That is a kind of redundancy. These are not fatal and can be accepted. The Hebrew section has several of those.
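The distinction drawn above between conflicts and multiple definitions can be sketched in a few lines of Python. This is only an illustration, not the script that produced the warnings in this report: the function names and the `SAMPLE` text are made up for the example, and the regular expression assumes the simple `<key> <key> : "result"` line shape quoted in this thread (it does not handle escaped quotes or include directives).

```python
import re
from collections import defaultdict

# Matches simple rules such as:
#   <Multi_key> <slash> <C> : "₡" U20a1 # COLON SIGN
# Assumption: results never contain an escaped double quote.
_RULE = re.compile(r'^\s*((?:<[^>]+>\s*)+):\s*"([^"]*)"')


def parse_rules(text):
    """Return a list of (key_sequence_tuple, result_string) pairs."""
    rules = []
    for line in text.splitlines():
        match = _RULE.match(line)
        if match:
            keys = tuple(re.findall(r"<([^>]+)>", match.group(1)))
            rules.append((keys, match.group(2)))
    return rules


def find_conflicts(rules):
    """Same sequence, different results: one of the results is unreachable."""
    by_sequence = defaultdict(set)
    for keys, result in rules:
        by_sequence[keys].add(result)
    return {k: v for k, v in by_sequence.items() if len(v) > 1}


def find_redundant(rules):
    """Different sequences, same result: harmless redundancy."""
    by_result = defaultdict(set)
    for keys, result in rules:
        by_result[result].add(keys)
    return {r: s for r, s in by_result.items() if len(s) > 1}


# Hypothetical excerpt modelled on warning B plus two lowercase variants.
SAMPLE = """
<Multi_key> <slash> <C> : "₡" U20a1 # COLON SIGN
<Multi_key> <slash> <C> : "¢" U00A2 # CENT SIGN
<Multi_key> <slash> <c> : "¢" U00A2 # CENT SIGN
<Multi_key> <c> <slash> : "¢" U00A2 # CENT SIGN
"""

if __name__ == "__main__":
    rules = parse_rules(SAMPLE)
    print("conflicts:", find_conflicts(rules))
    print("redundant:", find_redundant(rules))
```

Note that this compares keysym names only. Whether <Omacron> and <U03d2> really collide depends on their numeric keysym values, which a name-based check like this one cannot see.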
Created attachment 5357
Updated Compose.pre with 0x10000000 added to Unicode keysyms.

This is the updated Compose.pre. Because the spacing between the fields changed, there is no benefit in providing it as a patch. I reapplied the Unicode names of the characters from UnicodeData.txt.
committed to git
(In reply to comment #9)
> committed to git

Is that both patches?
1. https://bugs.freedesktop.org/attachment.cgi?id=5353
2. https://bugs.freedesktop.org/attachment.cgi?id=5357
no, because #5357 is a complete file, and AFAICT doesn't have the problem #5353 was fixing
Daniel Stone> So, if you want to use a Unicode value for which
Daniel Stone> no keysym is defined, you offset it by 0x10000000.

There is a typo here. The offset is 0x01000000!

Daniel Stone> So, U03D2 is 0x100003D2. Omacron is 0x000003D2.

Again the same typo. U03D2 is 0x010003D2.

Daniel Stone> So there's no conflict.

Yes.
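The arithmetic behind this correction can be checked with a short Python sketch. The constant and function names are invented for the illustration; the only values taken from this thread are the 0x01000000 offset and the XK_Omacron definition quoted from keysymdef.h.

```python
# Offset added to a Unicode code point to form its keysym value,
# as corrected in this comment (0x01000000, not 0x10000000).
UNICODE_KEYSYM_OFFSET = 0x01000000

# Legacy keysym value quoted from keysymdef.h earlier in this report.
XK_OMACRON = 0x03D2


def unicode_keysym(code_point):
    """Return the keysym value for a Unicode code point such as 0x03D2."""
    return UNICODE_KEYSYM_OFFSET + code_point


if __name__ == "__main__":
    u03d2 = unicode_keysym(0x03D2)
    print("U03D2   = 0x%08X" % u03d2)        # U03D2   = 0x010003D2
    print("Omacron = 0x%08X" % XK_OMACRON)   # Omacron = 0x000003D2
    print("conflict:", u03d2 == XK_OMACRON)  # conflict: False
```

The two values share their low 16 bits, which is presumably why a tool that compares only the short form reports them as the same keysym.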
The entries using <U100xxxxx> in the Compose file attached in comment #8 are wrong. They just don’t work. <U000xxxxx> is correct.
Simos Xenitellis> These are conflicts, that is, there are two
Simos Xenitellis> identical compose sequences in the same Compose
Simos Xenitellis> file that produce different characters.

Simos Xenitellis> A. WARNING: Same keysyms for
Simos Xenitellis> 1: LATIN_CAPITAL_LETTER_O_WITH_MACRON_AND_ACUTE and
Simos Xenitellis> 2: GREEK_UPSILON_WITH_ACUTE_AND_HOOK_SYMBOL
Simos Xenitellis> (GDK_Multi_key, GDK_apostrophe, GDK_Omacron, 0, 0)
Simos Xenitellis> <Multi_key> <apostrophe> <Omacron> : "Ṓ" U1E52 # LATIN CAPITAL LETTER O WITH MACRON AND ACUTE
Simos Xenitellis> <Multi_key> <apostrophe> <U03d2> : "ϓ" U03D3 # GREEK UPSILON WITH ACUTE AND HOOK SYMBOL

I don’t see a conflict here. I can have both entries in the Compose file and both of them work. The keysyms do not conflict.
For example, if I add a line like

    keysym 5 = 5 percent 0x010003d3 Omacron

to my ~/.Xmodmap for testing, I can type U+03D3 with AltGr+5 and Omacron with Shift+AltGr+5. And using that, I verified that both Compose sequences work and do *not* conflict.
see also http://bugzilla.novell.com/show_bug.cgi?id=337760 for some more problems in the current Compose file.
The other problems mentioned in http://bugzilla.novell.com/show_bug.cgi?id=337760, apart from the <U100xxxxx> -> <U000xxxxx> problem, are apparently already fixed in the latest git checkout.
Created attachment 12264
libX11/nls/en_US.UTF-8/Compose.pre.diff

Patch against libX11/nls/en_US.UTF-8/Compose.pre
(In reply to comment #15)
> For example, if I add a line like
>
> keysym 5 = 5 percent 0x010003d3 Omacron
>
> to my ~/.Xmodmap for testing, I can type U+03D3 with AltGr+5 and Omacron
> with Shift+AltGr+5. And using that, I verified that both Compose sequences
> work and do *not* conflict.

> Simos Xenitellis> <Multi_key> <apostrophe> <Omacron> : "Ṓ" U1E52 # LATIN CAPITAL LETTER O WITH MACRON AND ACUTE
> Simos Xenitellis> <Multi_key> <apostrophe> <U03d2> : "ϓ" U03D3 # GREEK UPSILON WITH ACUTE AND HOOK SYMBOL

You mean that for the first line one would have to press
Multi_key + apostrophe + AltGr + 5 : Ṓ
and for the second line
Multi_key + apostrophe + Shift + AltGr + 5 : ϓ
and it works?

It would look strange to me if it worked, but I believe we have different things in mind when discussing this.
Simos Xenitellis> You mean that for the first line one would have to press
Simos Xenitellis> Multi_key + apostrophe + AltGr + 5 : Ṓ
Simos Xenitellis> and for the second line
Simos Xenitellis> Multi_key + apostrophe + Shift + AltGr + 5 : ϓ
Simos Xenitellis> and it works?

Yes, exactly.
(In reply to comment #20)
> Simos Xenitellis> You mean that for the first line one would have to press
> Simos Xenitellis> Multi_key + apostrophe + AltGr + 5 : Ṓ
> Simos Xenitellis> and for the second line
> Simos Xenitellis> Multi_key + apostrophe + Shift + AltGr + 5 : ϓ
> Simos Xenitellis> and it works?
>
> Yes, exactly.

I am not quite sure if the average user will be able to make it with these sequences.

Those "duplicates" came about when I wrote a script to convert the Xorg Compose file into a format that the GTK+ Input Method can recognise. In GTK+ IM, the above "duplicates" do not work; the second in the duplicate is always hidden.

The same Xorg Compose file is replicated in SCIM (AFAIK), and my guess is that this problem also exists there.
Simos Xenitellis> I am not quite sure if the average user will be able
Simos Xenitellis> to make it with these sequences.

That is a different problem from them being duplicates.

Simos Xenitellis> Those "duplicates" came about when I wrote a script
Simos Xenitellis> to convert the Xorg Compose file into a format that
Simos Xenitellis> the GTK+ Input Method can recognise. In GTK+ IM, the
Simos Xenitellis> above "duplicates" do not work; the second in the
Simos Xenitellis> duplicate is always hidden.

Isn’t this a GTK+ bug then?

Simos Xenitellis> The same Xorg Compose file is replicated in SCIM
Simos Xenitellis> (AFAIK), and my guess is that this problem
Simos Xenitellis> also exists there.

SCIM (and, by the way, Qt as well) has the Compose file hardcoded. In the case of SCIM, the author of SCIM converted the Xorg Compose file to the header file “scim_compose_key_data.h”, which is compiled into SCIM. Therefore, unfortunately, the Compose handling in Xorg and SCIM may differ slightly, because the current “scim_compose_key_data.h” was created years ago and the Xorg Compose file has since been changed somewhat.

The author of SCIM wanted to hardcode this to make it fast. That might be a good idea. Nevertheless, it is not nice that this can cause subtle differences in the compose handling.

I am currently working on an improvement to SCIM that parses the Xorg Compose file on the system where SCIM is compiled and compiles the result into SCIM. That is probably a reasonable compromise between speed and making the Compose handling behave the same in Xorg and SCIM.
Please see also the original comment in bug #11930:

Alexandros Diamantidis> The problem can be cured by globally replacing, in the Compose file,
Alexandros Diamantidis> U10000313 --> U0313
Alexandros Diamantidis> and U10000314 --> U0314

*All* appearances of U1000xxxx should be replaced with U0000xxxx (or, probably even better, the shorter Uxxxx; both work, but U1000xxxx does *not* work). U100xxxxx should be replaced with Uxxxxx (or U000xxxxx). That is what my patch does.
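For illustration, the global replacement described here could be scripted roughly as follows. This is a hedged sketch, not the patch attached to this bug: the function name is hypothetical, and the pattern assumes the broken names always have the form <U100> followed by exactly five hex digits inside angle brackets.

```python
import re

# Broken names look like <U10000313> or <U1001D15E>: a spurious "100"
# prefix followed by five hex digits that hold the real code point.
_BROKEN_NAME = re.compile(r"<U100([0-9A-Fa-f]{5})>")


def fix_unicode_keysym_names(line):
    """Rewrite <U100xxxxx> as <Uxxxx> (or <Uxxxxx> above U+FFFF)."""
    def shorten(match):
        code_point = int(match.group(1), 16)
        # At least four hex digits, five when the code point needs them.
        return "<U%04X>" % code_point
    return _BROKEN_NAME.sub(shorten, line)


if __name__ == "__main__":
    # Made-up sample line using a name mentioned in this comment.
    print(fix_unicode_keysym_names("<Multi_key> <U10000313> <a>"))
```

Names that are already in the short form, such as <U0313>, do not match the pattern and are left untouched, so the function can be run over a whole file line by line.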
Matthias, please commit Mike's patch.
The U1000XXXX → UXXXX and U1001XXXX → U1XXXX issue is fixed in commit 438d02ebc08ee171cf1d3936f4c81050d428ab92. I believe that covers the last issues here? Is this one ready to close?
James Cloos> I believe that covers the last issues here? James Cloos> Is this one ready to close? Yes, I also think this was the last issue here and this bug can be closed.
Verified. This is fixed in git. Thanks Mike!