Bug 50895

Summary: Update code for apostrophe from U+2019 (quotation mark) to U+02BC (letter)
Product: xkeyboard-config
Reporter: Volodymyr M. Lisivka <vlisivka>
Component: General
Assignee: xkb
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: medium
CC: arysin, vlisivka
Version: unspecified
Hardware: All
OS: All
Whiteboard:
Attachments:
  Change U2019 to U02BC for apostrophe in ua(unicode).
  Adding aprostrophe U+02BC to ua(unicode) layout
  Change U2019 to U02BC for apostrophe in ua(unicode).
  Move U+0027 apostrophe to first level and add U+02BC in ua(unicode) layout

Description Volodymyr M. Lisivka 2012-06-08 13:23:31 UTC
Created attachment 62817 [details] [review]
Change U2019 to U02BC for apostrophe in ua(unicode).

Please apply the attached patch to change the code point for the Ukrainian letter apostrophe from U+2019 (RIGHT SINGLE QUOTATION MARK) to U+02BC (MODIFIER LETTER APOSTROPHE).

U+02BC (letter apostrophe) used to be poorly supported in software, so U+2019 (quotation mark) was chosen as a temporary substitute. The situation has changed: I see no problems with U+02BC in Fedora 16. Moreover, because U+02BC is a letter, it serves its purpose much better in most cases, such as selecting a word by double-clicking or spell checking (except in LibreOffice, where it is supported as well as the other variants of the apostrophe).
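To illustrate the word-splitting point, here is a minimal Python sketch. It assumes that Python's `\w` matching is a fair stand-in for how an application decides what belongs to a word; real toolkits and spell checkers use their own rules, so this only shows the principle. The helper names are made up for the example.

```python
import re
import unicodedata

# The three apostrophe candidates discussed in this bug.
APOSTROPHES = {
    "U+0027": "\u0027",  # ASCII apostrophe
    "U+2019": "\u2019",  # right single quotation mark
    "U+02BC": "\u02bc",  # modifier letter apostrophe
}


def describe(ch):
    """Return the Unicode general category and character name."""
    return unicodedata.category(ch), unicodedata.name(ch)


def split_words(text):
    """Split text into runs of word characters (a stand-in for word selection)."""
    return re.findall(r"\w+", text)


def demo():
    for label, ch in APOSTROPHES.items():
        word = "м" + ch + "ята"
        print(label, describe(ch), split_words(word))


if __name__ == "__main__":
    demo()
```

Only U+02BC has a letter category (Lm), so only that variant keeps the word in one piece under this rule; the other two are punctuation and split it at the apostrophe.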

Moreover, U+02BC is the character chosen for the Ukrainian letter apostrophe in the national domain system by the representative organization UANIC (see http://uanic.net/node/204 for details; the text is in Russian, the email is in English). They are also providing modified keyboard layouts for Windows users (see 

Please apply the patch (see attachment), which was created by Andriy Rysin <arysin@gmail.com> (see http://linux.org.ua/cgi-bin/yabb/YaBB.pl?num=1189996822/210 , response #212, for the discussion in Ukrainian).

This patch changes the key code for the apostrophe from U+2019 to U+02BC and adds the U+2019 quotation mark to the third level of key <t>.
Comment 1 Andriy Rysin 2012-06-08 19:00:46 UTC
The discussion about changing the available apostrophes in the ua(unicode) layout was held on the Ukrainian Linux group (http://linux.org.ua/cgi-bin/yabb/YaBB.pl?num=1189996822/195#201 - the conversation is in Ukrainian) and a couple of other sites, and so far the majority of participants agreed that U+02BC cannot serve as the main apostrophe for the Ukrainian layout.
Even though support for this apostrophe has improved over time, there are still problems handling it in some software and fonts. That is fixable, even if not immediately. But the main problem is that pretty much all existing Ukrainian texts use the ASCII apostrophe (0x27), and a very small part use U+2019. So if users search for Ukrainian words with U+02BC, search engines like Google will return no matches (instead they will return matches without the apostrophe, which is not what the user wants); the same problem will be present in internal website searches, forum searches, etc.
As Google and other search engines treat U+2019 and 0x27 as equal, those two are more interchangeable (so the existing U+2019, even though not the "right" apostrophe for Ukrainian, was less of a problem); with U+02BC that is not the case. The same is true for local searches in documents etc.: for users who are not familiar with all the peculiarities of the available apostrophes, searching existing documents with U+02BC as the default apostrophe will be a nightmare.
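The local search problem can be shown with a small Python sketch. It assumes a plain substring search, which is roughly what a simple "find in document" does, and compares it with a search that first folds the apostrophe variants to 0x27. The function names and the folding table are hypothetical and not taken from any particular program.

```python
# Apostrophe variants that a tolerant search could fold to the ASCII apostrophe.
APOSTROPHE_VARIANTS = "\u2019\u02bc"
FOLD_TABLE = str.maketrans({ch: "'" for ch in APOSTROPHE_VARIANTS})


def fold_apostrophes(text):
    """Replace U+2019 and U+02BC with U+0027."""
    return text.translate(FOLD_TABLE)


def naive_find(haystack, needle):
    """Exact substring search: code points must match one to one."""
    return needle in haystack


def folded_find(haystack, needle):
    """Substring search after folding apostrophes on both sides."""
    return fold_apostrophes(needle) in fold_apostrophes(haystack)


if __name__ == "__main__":
    document = "У мене п'ять яблук"  # existing text typed with 0x27
    query = "п\u02bcять"             # query typed with U+02BC
    print(naive_find(document, query))   # exact match fails
    print(folded_find(document, query))  # folded match succeeds
```

Software that does not fold the variants behaves like `naive_find`, which is the situation described above for users who switch to U+02BC.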
So the proposal agreed on by most participants in that discussion was to put 0x27 on the first level and U+02BC on the second level of the <TLDE> key. This way inexperienced users will still be able to work as before, but they will be able to type Ukrainian domain names with an apostrophe by using Shift. As many Ukrainian domain names, especially ones with an apostrophe, are not expected in the near future, having it on the second level should be a good solution.
Also, U+2019 is moved to the bottom of the keyboard, to the fourth level of the letter N key, so that it can easily be typed if there is a need to search in Ukrainian texts that use this apostrophe, but is not too close to the more appropriate apostrophes and does not confuse users who don't explicitly need it.
Comment 2 Andriy Rysin 2012-06-08 19:02:56 UTC
Created attachment 62828 [details] [review]
Adding aprostrophe U+02BC to ua(unicode) layout

Please disregard my first patch provided by Volodymyr M. Lisivka - it was a first draft, and it has since been agreed that it is not a good solution.
Comment 3 Yuri Chornoivan 2012-06-08 22:08:59 UTC
It would be better to use the patch by Andriy Rysin, as the U+02BC apostrophe is unsupported by Ukrainian TeX symbol maps. Its glyph is unavailable in most free and proprietary fonts.

U+02BC is traditionally used in Bodo, Dogri, and Maithili, not in Ukrainian Cyrillic, so it needs a lot of time for adoption.
Comment 4 Volodymyr M. Lisivka 2012-06-09 02:11:53 UTC
Created attachment 62840 [details] [review]
Change U2019 to U02BC for apostrophe in ua(unicode).

Reverting changes made by strangers.
Comment 5 Volodymyr M. Lisivka 2012-06-09 03:24:26 UTC
I checked Google search for all five variants of the apostrophe. The first three (U02BC, U2019, and ASCII) are supported equally well:

https://www.google.com.ua/search?q=%D0%BC%CA%BC%D1%8F%D1%82%D0%B0&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:uk:unofficial&client=firefox-a
https://www.google.com.ua/search?q=%D0%BC%E2%80%99%D1%8F%D1%82%D0%B0&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:uk:unofficial&client=firefox-a
https://www.google.com.ua/search?q=%D0%BC%27%D1%8F%D1%82%D0%B0&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:uk:unofficial&client=firefox-a

For the other variants (double quote and back quote), the Google search engine proposes a substitution:

https://www.google.com.ua/search?q=%D0%BC%22%D1%8F%D1%82%D0%B0&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:uk:unofficial&client=firefox-a
https://www.google.com.ua/search?q=%D0%BC%60%D1%8F%D1%82%D0%B0&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:uk:unofficial&client=firefox-a
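The query URLs above differ only in the percent-encoded UTF-8 bytes of the apostrophe. A minimal Python sketch using the standard `urllib.parse.quote` reproduces the `q=` values for the three apostrophe variants; the helper name `encoded_query` is made up for the example.

```python
from urllib.parse import quote


def encoded_query(apostrophe):
    """Percent-encode the word "м?ята" with the given apostrophe character."""
    return quote("м" + apostrophe + "ята")


if __name__ == "__main__":
    print(encoded_query("\u02bc"))  # %D0%BC%CA%BC%D1%8F%D1%82%D0%B0
    print(encoded_query("\u2019"))  # %D0%BC%E2%80%99%D1%8F%D1%82%D0%B0
    print(encoded_query("\u0027"))  # %D0%BC%27%D1%8F%D1%82%D0%B0
```

U+02BC is encoded as the two bytes CA BC, U+2019 as the three bytes E2 80 99, and the ASCII apostrophe as the single byte 27.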


I checked TeXlive from Fedora 16: support for U02BC can be added with the following command (the first argument is the code point in hexadecimal):

\DeclareUnicodeCharacter{02BC}{'}

However, as I can see here: http://lists.debian.org/debian-tex-maint/2012/04/msg00392.html , this problem is already fixed in the latest version of TeXLive.


Users of recent versions of Windows (Vista) and Mac have reported that U02BC is supported by their OSes. A Ukrainian keyboard for Windows with U02BC as the apostrophe can be downloaded here: http://uanic.net/node/219 (page in Russian, direct link: http://uanic.net/layout_ukr.rar ).

So I see no major problems with support for U02BC in Fedora 16 or other OSes. Moreover, U2019 and the ASCII apostrophe are not recommended, because they break a word into two pieces, which causes problems with spell checking in most programs (e.g. gEdit, gnome-terminal, etc.), except LibreOffice and similar software.
Comment 6 Andriy Rysin 2012-06-09 06:20:31 UTC
Google gives different results for Ukrainian words with U+02BC than for 0x27 (or U+2019). E.g. if you search for «привʼязаний» (with U+02BC), you will only see words without any apostrophe at all - only «привязаний». That's definitely not what the user wants.
Comment 7 Volodymyr M. Lisivka 2013-06-10 11:23:31 UTC
After 1 year (09.Jun.2012-10.Jun.2013) of voting on the linux.org.ua site, see http://linux.org.ua/cgi-bin/yabb/YaBB.pl?num=1189996822, we have the following results:

  U+02BC (ʼ)   26 (41.9%)
  U+0027 (')   18 (29%)
  U+2019 (’)   18 (29%)

So I assume that the Ukrainian community supports these changes.

After more than a year of testing on my own notebook, I see no technical problems with U+02BC at all.
Comment 8 Sergey V. Udaltsov 2013-06-15 21:20:30 UTC
That voting looks convincing. Can anyone explain to me why we should not consider the matter settled?
Comment 9 Andriy Rysin 2013-06-16 03:37:14 UTC
Just two small notes:
1) Making U+02BC the default is a big change for those who currently use U+2019 and 0x27, as shown in the next point. So the votes for making U+02BC the default are 26, and against are 36.
2) A simple example of why 02BC can't be a good default (it was valid a year ago, is still valid now, and I am afraid will still be valid for years to come):
Let's google the word "п'ять" (five) with three different apostrophes:
https://www.google.com/search?q="п'ять" (0x27) - 3 070 000 results
https://www.google.com/search?q="п’ять" (0x2019) - 3 070 000 results
https://www.google.com/search?q="пʼять" (0x02BC) - 24 200 results

I don't think I can trust the apostrophe which returns me 24k results (instead of 3 million) for simple Ukrainian numeral. :)

A lot of software (e.g. Firefox, LibreOffice.org...) treats 0x27 and 0x2019 similarly (e.g. when spellchecking, 0x2019 is converted to 0x27). This is not the case with 0x02BC, so expect many raised eyebrows if 0x02BC suddenly becomes the default for unsuspecting users. :)

The results could be even worse in other places, e.g. http://uk.wikipedia.org gives the right article on "п'ять" with 0x27, but it gives 608 other results with 0x2019, and finds only 1 (unrelated) article with 0x2BC.

Hopefully this shows why 0x2BC is a pretty poor choice as the default for most users.

Now, that does not mean we can't add it to the keyboard; the only question is finding a good position. So far I see only 2 approaches:
1) leave the current apostrophes where they are and add 0x02BC somewhere else, e.g. on the 4th level of one of the number keys
2) make 0x27 the default apostrophe, put 0x2BC above it (instead of 0x2019) to stress that it is the preferred one if you really don't like 0x27, and move 0x2019 somewhere else (but close) for people who have started to use it, as it has been used quite a bit since it was introduced

I prefer 1) for now, so that (those few) people can start using 0x2BC if they need to and other users don't get surprised. And when 0x02BC reaches the level of support the other two have, we can talk about making it the default.
Comment 10 Sergey V. Udaltsov 2015-05-13 22:51:59 UTC
This one seems a bit overlooked by everybody. Volodymyr, are you still around? Could you comment on Andriy's observations?
Comment 11 Volodymyr M. Lisivka 2015-05-14 22:19:12 UTC
There is not too much to comment on.

Google will learn the new character for the apostrophe in the same way as it learned the five other characters.

Voting results on linux.org.ua:

  U+02BC (ʼ)   26 (41.9%)
  U+0027 (')   18 (29%)
  U+2019 (’)   18 (29%)

26 against 18.

Spell checking and other software work out of the box because it is just another letter, nothing special at all.

There will be more words with the proper apostrophe once we all move to the new standard.

Currently, the new apostrophe has been added to the Ukrainian keyboard only in Cyanogen (Android).

BTW: I registered domain mʼясо.укр (dig xn--m-z6a27ila3e.укр), but it is not bound to any host (my bad).
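For reference, the ASCII form of a label like the one above can be derived with Python's built-in `punycode` codec. This is a minimal sketch: it applies raw Punycode to a single label and skips the normalization and validity checks that a full IDNA implementation performs. The helper names are made up for the example.

```python
ACE_PREFIX = "xn--"


def to_punycode_label(label):
    """Encode one Unicode label to its ASCII form (raw Punycode, no IDNA checks)."""
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def from_punycode_label(ace):
    """Decode an ASCII label produced by to_punycode_label back to Unicode."""
    if not ace.startswith(ACE_PREFIX):
        raise ValueError("not an ACE label: " + ace)
    return ace[len(ACE_PREFIX):].encode("ascii").decode("punycode")


if __name__ == "__main__":
    label = "m\u02bcясо"  # Latin "m", U+02BC, then Cyrillic "ясо"
    ace = to_punycode_label(label)
    print(ace)
    print(from_punycode_label(ace) == label)
```

Because the label starts with a Latin "m", that letter is carried over unchanged before the hyphen, and the U+02BC and the Cyrillic letters are packed into the part after it.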
Comment 12 Andriy Rysin 2015-05-23 18:57:51 UTC
I don't think we want to overestimate Google's learning abilities: I repeated the queries I posted above almost exactly 2 years ago for the word "п’ять" (five):
https://www.google.com/search?q=%22%D0%BF%CA%BC%D1%8F%D1%82%D1%8C%22 (0x02BC) — 7,480 results
https://www.google.com/search?q=%22%D0%BF%E2%80%99%D1%8F%D1%82%D1%8C%22 (0x2019) — 1,030,000 results
https://www.google.com/search?q=%22%D0%BF%27%D1%8F%D1%82%D1%8C%22 (0x27) — 1,050,000 results
As one can see, in 2 years the situation didn't change at all: the simple Ukrainian word "five" with the (theoretically correct) apostrophe 0x02BC gives almost no results.

And unfortunately it's not just Google: I got the same picture on yahoo.com, with 213 000 results for 0x27 and 0x2019 and 1720 results for 0x02BC. Other search engines give pretty much similar results.

So although I don't oppose adding this apostrophe to the keyboard, I would strongly advise against making it the main one: I have a hard time seeing people often needing to type domain names that are not even bound to a host, but many people do search the web quite regularly.

The patch I proposed earlier (https://bugs.freedesktop.org/attachment.cgi?id=62828) puts 0x27 as the base apostrophe (as it's still the most used one), 0x02BC above it, and 0x2019 in another place, just in case somebody needs to search in texts typed with 0x2019 (although, now thinking about it, I'd rather put 0x2019 on the 3rd level of the '/" key than on the 4th level of t).
Comment 13 Sergey V. Udaltsov 2015-05-26 19:39:43 UTC
Volodymyr, I am inclined to agree with Andriy. After all, 18+18>26, even without looking at the huge difference in Google. We cannot afford SURPRISING people with unexpected behaviour. And we do not have any power over Google: I am absolutely not sure it will learn the right character today or tomorrow. So while I agree that having the right character is essential, I am afraid we cannot make it the default, even if it is more correct in theory.

So, I am going to apply the patch #1

Andriy, unfortunately that patch is not applicable to the git head. Could you pls check?

Thank you
Comment 14 Andriy Rysin 2015-05-27 01:20:07 UTC
I asked a couple of communities for comments on this change and for preferred locations for the (deprecated) U2019; I hope to get some feedback.
In any case, I'll create a new patch and attach it here in a couple of days.
Comment 15 Andriy Rysin 2015-05-27 23:56:43 UTC
Created attachment 116104 [details] [review]
Move U+0027 apostrophe to first level and add U+02BC in ua(unicode) layout
Comment 16 Michael Zajac 2015-08-10 19:06:02 UTC
Google’s reported number of results at the top of the first page is a lie. If you click through the results to the very last page, you get a very different result. If you click the “some results were omitted” link, you get something else again:

https://www.google.com/search?q=%22%D0%BF%CA%BC%D1%8F%D1%82%D1%8C%22 (0x02BC)
First page: “About 7,010 results”
Last page: “Page 23 of about 217 results”
Last page with omitted results included: “Page 67 of about 7,000 results”

https://www.google.com/search?q=%22%D0%BF%E2%80%99%D1%8F%D1%82%D1%8C%22 (0x2019)
First page: “About 992,000 results”
Last page: “Page 22 of about 211 results”
Last page with omitted results included: “Page 81 of about 978,000 results”

https://www.google.com/search?q=%22%D0%BF%27%D1%8F%D1%82%D1%8C%22 (0x27)
First page: “About 989,000 results”
Last page: “Page 22 of about 212 results”
Last page with omitted results included: “Page 81 of about 994,000 results”

I’m not sure how to interpret this, but none of Google’s results support the validity of the 7,000/1,000,000 figures.
Comment 17 Sergey V. Udaltsov 2015-09-09 23:49:32 UTC
Thank you Andriy, your patch is committed.
Comment 18 Andriy Rysin 2015-09-10 04:13:11 UTC
Thanks Sergey. I'll apply the same change to the Windows and Mac versions of the layout (http://r2u.org.ua/wiki/keyboard/UkrainianUnicode) some time soon.
Comment 19 Volodymyr M. Lisivka 2015-09-10 13:47:38 UTC
It looks like I need to add a ua(true_unicode) layout, with 0x2bc (primary) and 0x2019 (as the regular Unicode single quote), to solve the problem without disturbing Andriy's patch.
