Bug 29465 - Remove locl substitutions for Romanian
Status: RESOLVED FIXED
Alias: None
Product: DejaVu
Classification: Unclassified
Component: General
Version: unspecified
Hardware: All All
Importance: medium normal
Assignee: Deja Vu bugs
Reported: 2010-08-09 11:40 UTC by Mihai Capotă
Modified: 2011-02-19 10:20 UTC

Attachments
Firefox rendering both with and without locl at the same time (69.73 KB, image/png)
2010-08-09 11:40 UTC, Mihai Capotă

Description Mihai Capotă 2010-08-09 11:40:34 UTC
Created attachment 37740: Firefox rendering both with and without locl at the same time

DejaVu currently uses locl substitution for Romanian, as introduced with SVN revisions 2258, 2259 and 2260.

These substitutions are causing confusion.

The problem with Romanian letters Ș and Ț is ongoing; it is in no way historical. The best example I can think of is the Romanian Wikipedia still using the old Unicode code points for compatibility reasons, and even automatically transforming the new code points in edited text to the old code points. [1]

Furthermore, the implementation of locl is not uniform. For example, Firefox and Chromium use locl in the GUI, but don't use it for content. See the attached screenshot from Ubuntu 10.04.

In these circumstances, the false consistency introduced by the locl substitutions only makes things worse by confusing people about the code points they are using.

[1] (Romanian) http://ro.wikipedia.org/wiki/Wikipedia:Diacriticele_vechi_%C8%99i_noi
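
To make the code point confusion concrete, here is a minimal Python sketch that reports which encoding a piece of text actually uses, regardless of how a font draws it. The function name `classify_diacritics` and the two constants are hypothetical, introduced only for illustration; the code relies only on the standard `unicodedata` module.

```python
import unicodedata

# The two competing encodings of the Romanian letters discussed in this bug.
LEGACY_CEDILLA = "\u015E\u015F\u0162\u0163"  # S/s/T/t with cedilla
COMMA_BELOW = "\u0218\u0219\u021A\u021B"     # S/s/T/t with comma below


def classify_diacritics(text):
    """Return (code point, Unicode name, kind) for each s/t diacritic in text."""
    report = []
    for ch in text:
        if ch in LEGACY_CEDILLA:
            kind = "cedilla"
        elif ch in COMMA_BELOW:
            kind = "comma below"
        else:
            continue  # not one of the eight characters of interest
        report.append((f"U+{ord(ch):04X}", unicodedata.name(ch), kind))
    return report


if __name__ == "__main__":
    # Two spellings that a font with the locl substitution may draw identically.
    for sample in ("\u015Fi", "\u0219i"):
        print(classify_diacritics(sample))
```

With the locl substitution active, both samples can look the same on screen, yet the report shows they contain different code points.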
Comment 1 Ben Laenen 2010-08-09 12:42:36 UTC
The problem is that many documents still use the S cedilla instead of S comma. Too many to ignore. And I agree it would be better if we could drop the locl on s cedilla, but we're far from that.

Trying to enforce it by changing the font doesn't work anyway. People use whatever they get when pressing the key on their keyboard, so it's the keyboard layout that has to be changed if it still doesn't produce the right characters in some operating systems.


The screenshot shows two different fonts in the webpage for s cedilla and s comma btw. That means that the font used for rendering s cedilla doesn't have an s comma, so the s comma is displayed in a fallback font.
Comment 2 Mihai Capotă 2011-02-15 07:17:13 UTC
Sorry for not responding for such a long time. The default font in my distribution (Ubuntu) has changed and I was focused on making sure the new font is right for Romanian.

Would you please read the discussion on the Ubuntu locl substitution bug (no account required)?
https://bugs.launchpad.net/ubuntu-font-family/+bug/635615
Comment 3 Ben Laenen 2011-02-17 04:00:24 UTC
I'm willing to change this and remove the locl substitution rules. It's certainly a hack, which was included to improve documents in Romanian using the wrong code points. But I guess the time has come to stop trying to fix a Unicode issue from many years ago.

I still wonder why Wikipedia wouldn't use the correct code points though. Looks to me like not all Romanians agree with this then...
Comment 4 Mihai Capotă 2011-02-17 05:35:54 UTC
Actually, Wikipedia has also changed since I reported this bug. If you check the link I mentioned initially [1], you will see that the recommendation is to use the correct code points when editing. Furthermore, Wikipedia will automatically change the old code points into the new ones.

I did some more reading and I think I understand where this glyph substitution idea came from. Even after the new code points were created, the Unicode standard recommended the use of the comma glyphs for Romanian to represent the cedilla code points. This is what Unicode 5.2 (October 2009) says:

"The form with the cedilla is preferred in Turkish, and the form with the comma below is preferred in Romanian. The characters with explicit commas below are provided to permit the distinction from characters with a cedilla."

This recommendation about forms/glyphs was only removed in Unicode 6 (October 2010), which states:

"The Unicode Standard provides unambiguous representations for all of the forms, for example, U+0219 ș latin small letter s with comma below versus U+015F ş latin small letter s with cedilla. In modern usage, the preferred representation of Romanian text is with U+0219 ș latin small letter s with comma below, while Turkish data is represented with U+015F ş latin small letter s with cedilla."

[1] http://ro.wikipedia.org/wiki/Wikipedia:Diacriticele_vechi_%C8%99i_noi
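
For illustration, converting the old code points into the new ones can be sketched in a few lines of Python. This is an assumption about how such a conversion might look, not Wikipedia's actual implementation; the names `CEDILLA_TO_COMMA` and `to_comma_below` are hypothetical.

```python
# Map each legacy cedilla code point to its comma-below counterpart.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015E": "\u0218",  # S with cedilla -> S with comma below
    "\u015F": "\u0219",  # s with cedilla -> s with comma below
    "\u0162": "\u021A",  # T with cedilla -> T with comma below
    "\u0163": "\u021B",  # t with cedilla -> t with comma below
})


def to_comma_below(text):
    """Rewrite cedilla code points as comma-below code points.

    Only appropriate for text known to be Romanian: Turkish text
    legitimately uses the s-cedilla code points.
    """
    return text.translate(CEDILLA_TO_COMMA)


if __name__ == "__main__":
    legacy = "\u015Fi \u0163ar\u0103"      # "si tara" typed with cedilla code points
    print(to_comma_below(legacy))          # same words, comma-below code points
```

Unlike a locl substitution, this changes the stored text itself, so the result does not depend on the font or on the application language.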
Comment 5 Ben Laenen 2011-02-18 09:00:22 UTC
I have just removed the locl features for these glyphs. Please test it out (either compile SVN or wait until the next daily snapshot becomes available) to make sure it works as you expect (we have a new release upcoming in one week's time).

Ben
Comment 6 Mihai Capotă 2011-02-19 10:20:58 UTC
Tested with snapshot dejavu-lgc-fonts-ttf-2.32-20110219-2464. I confirm it works as expected. I can now see the difference between cedilla and comma characters regardless of application language. Thank you.

