Created attachment 21483 Fix Sinhala coverage Hi, I've attached a small patch to fix the Sinhala coverage in fontconfig. The patch also contains a reference that describes the Unicode Sinhala code chart. cya, #
Fixed in my tree: Created commit c5c0279: Fix Sinhala coverage (bug #19288) 1 files changed, 4 insertions(+), 5 deletions(-)
Created attachment 22408 Patch correcting the previous patch While the original patch adds some needed characters, there are also some historic characters in the .orth file not in modern use that should be removed. Attaching a documented patch that should be applied over the original patch.
Hi Roozbeh, The document you refer to in your patch [1] is a draft of SLS1134:2004:SCCII. The relevant document is actually SCCII Part 2. Your patch is incorrect; however, you have raised a valid point regarding some of the letters and diacritics that no longer appear in modern literature. This has been addressed by defining three levels of compliance for fonts. I will contact the SCCII Part 2 authors and check which of the three levels are applicable in fontconfig's case. cya, # [1] http://std.dkuug.dk/JTC1/SC2/WG2/docs/n2737.pdf
(In reply to comment #3) > Your patch is incorrect, How is it incorrect?! It applies on top of your patch and removes the letters that are not in modern use.
In my tree: commit 21d1fafbc2970060e0e99f7c8b68b018e43d821e Author: Roozbeh Pournader <roozbeh@gmail.com> Date: Sun Feb 1 18:52:41 2009 -0800 Remove Sinhala characters not in modern use (bug #19288)
Created attachment 22577 Fix Sinhala coverage v2 Hi, Sorry, I still have not been able to contact the main author of SLS1134:PART2:2007, but I did manage to discuss it with a member of the committee that approves these standards. In fontconfig's case, we want to enforce only level 1 compliance, since it is the minimum required, instead of the level 3 compliance in my previous patch. For more details please read the explanatory text within the updated patch. Thanks, #
Roozbeh Pournader <roozbeh@gmail.com> 2009-01-31 18:25:06 PST wrote: > How is it incorrect?! You mean other than it having absolutely no relationship to the relevant standard? It looks like you created your own standard! It was definitely *not* targeted at level 3 or level 2 compliance. For level 1 compliance, your patch *incorrectly* removed: U+0D8E (iruuyanna) U+0DA6 (sanyaka jayanna) and your patch *incorrectly* kept: U+0DDF (gayanukitta) cya, #
Any new patch should be on top of Roozbeh's patch since I've committed that.
Created attachment 22579 Fix errors introduced by Roozbeh Pournader to Sinhala coverage (commit 21d1fafbc2970060e0e99f7c8b68b018e43d821e) Hi Behdad, I clearly stated Roozbeh's patch was incorrect, but you checked it in regardless! Anyway, as you requested, I've taken my second patch (https://bugs.freedesktop.org/attachment.cgi?id=22577) and generated a patch that applies over Roozbeh's patch (https://bugs.freedesktop.org/attachment.cgi?id=22408). cya, #
(In reply to comment #9) > Created an attachment (id=22579) > Fix errors introduced by Roozbeh Pournader to Sinhala coverage (commit > 21d1fafbc2970060e0e99f7c8b68b018e43d821e) > > Hi Behdad, > > I clearly stated Roozbeh's patch was incorrect, but you checked it in > regardless! Roozbeh is my reference for orth files. You have to convince him. I'll check in what he gives me. What's the big deal with committing something incorrect anyway? It's not like Roozbeh's patch was totally and completely wrong. Anyway, I'd rather keep the commented out lines in the file. Please attach a new patch with Sinhala characters not desired commented out and preferably explained. > Anyway, as you requested, I've taken my second patch > (https://bugs.freedesktop.org/attachment.cgi?id=22577) and generated a patch > that applies over Roozbeh's patch > (https://bugs.freedesktop.org/attachment.cgi?id=22408). > > cya, > # >
--- Comment #10 from Behdad Esfahbod <freedesktop@behdad.org> 2009-02-04 10:44:31 PST --- > Anyway, I'd > rather keep the commented out lines in the file. Please attach a new patch > with Sinhala characters not desired commented out and preferably explained. The "not desired" characters are already listed conveniently in one spot:
+# Level 1 compliance can be described as level 3 with the exclusion of:
+# U+0D8F ILUYANNA (independent vowel)
+# U+0D90 ILUUYANNA (independent vowel)
+# U+0DDF GAYANUKITTA (dependent vowel)
+# U+0DF3 DIGA GAYANUKITTA (dependent vowel)
+# U+0DF4 KUNDDALIYA (Punctuation)
cya, #
Please add them back to their place in the sorted order of characters.
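Behdad's request amounts to keeping every Sinhala codepoint in sorted order and commenting out the ones Level 1 drops, rather than deleting them. A minimal Python sketch of that transformation, assuming `#` starts a comment in an orth file (as in the patch excerpt quoted in the previous comment); `render_orth` is a hypothetical helper, and the exclusion table is just the five characters that excerpt lists, not the contents of any committed si.orth:

```python
# Hypothetical helper, not part of fontconfig: emit orth-style lines in
# sorted codepoint order, commenting out excluded characters instead of
# deleting them, so the file still records what was left out and why.

# The five characters the patch excerpt lists as excluded at Level 1.
LEVEL1_EXCLUSIONS = {
    0x0D8F: "ILUYANNA (independent vowel)",
    0x0D90: "ILUUYANNA (independent vowel)",
    0x0DDF: "GAYANUKITTA (dependent vowel)",
    0x0DF3: "DIGA GAYANUKITTA (dependent vowel)",
    0x0DF4: "KUNDDALIYA (punctuation)",
}


def render_orth(codepoints, excluded):
    """Return one line per distinct codepoint, lowercase hex, ascending."""
    lines = []
    for cp in sorted(set(codepoints)):
        if cp in excluded:
            # Keep the entry in its sorted position but disable it with '#'.
            lines.append(f"#{cp:04x}\t# {excluded[cp]}")
        else:
            lines.append(f"{cp:04x}")
    return lines


if __name__ == "__main__":
    # Illustrative input only: the independent vowels U+0D8D..U+0D90.
    for line in render_orth([0x0D90, 0x0D8E, 0x0D8F, 0x0D8D], LEVEL1_EXCLUSIONS):
        print(line)
```

Keeping the disabled entries in place, instead of removing them, lets a later reviewer see exactly which codepoints were dropped and the stated reason for each.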
Created attachment 22587 Fix errors introduced by Roozbeh Pournader to Sinhala coverage (commit 21d1fafbc2970060e0e99f7c8b68b018e43d821e) Hi Behdad, Anything else? cya, #
commit e12e058fff54f3ca9289703dba4709f559abcf66 Author: Behdad Esfahbod <behdad@behdad.org> Date: Wed Feb 4 15:58:36 2009 -0500 Update Sinhala orthography (#19288) Patch from Harshula Jayasuriya.
Behdad, thanks for committing the corrections. If someone feels the urge to modify the Sinhala coverage again without expertise in Sinhala, please first contact someone who is familiar with the SLS1134 standard at: sinhala-technical at lists.sourceforge.net and: ltrl at ucsc.cmb.ac.lk cya, #
I don't like the approach, and especially some of the language used in the discussion. Beyond that, though, the last patch itself and the way it was derived seem problematic to me. It's important to note that fontconfig is not claiming conformance to SLS 1134 (or any of its Levels), is not seeking certification from SLS, and has its own rules for which characters to put in its orthography files. Also, we cannot rely on a single contributor's judgment. We need to be provided with rationales too. Getting back to the technicalities, these are the questions that would need answering: 1) Is U+0D8E SINHALA LETTER IRUUYANNA used in modern Sinhala? If not, why should fontconfig include it in its font requirements? According to what we know from the only freely available part of the SLS 1134 [http://std.dkuug.dk/JTC1/SC2/WG2/docs/n2737.pdf], "ඎ [U+0D8E] also does not occur in present usage, but its corresponding vocalic stroke, ෲ [U+0DF2] is used; for example, කර්තෲ [U+0D9A U+0DBB U+0DCA U+0DAD U+0DF2]". 2) Is U+0DA6 SINHALA LETTER SANYAKA JAYANNA used in modern Sinhala? If not, why should fontconfig include it in its font requirements? According to the same document, "The consonant ඦ [U+0DA6] (ndja) is included although it is not found in contemporary writing." I understand that you are referring to a part of SLS 1134. But we need to know the rationale for the choice made in SLS 1134. (I'm OK with removing U+0DDF SINHALA VOWEL SIGN GAYANUKITTA. It rarely appears by itself: it usually is a part of U+0DDE SINHALA VOWEL SIGN KOMBUVA HAA GAYANUKITTA.)
Roozbeh, which chars do you want to see removed again? Removing chars is easy to justify. Adding is harder.
(In reply to comment #17) > Roozbeh, which chars do you want to see removed again? Removing chars is easy > to justify. Adding is harder. The controversial characters are U+0D8E and U+0DA6. These two characters appear to be even rarer than U+0DDF (and I agree with Harshula that U+0DDF should not be in the orth file).
commit 48fd5244d049f3470a4b5be480bcd835820d17fa Author: Behdad Esfahbod <behdad@behdad.org> Date: Thu Feb 5 23:37:16 2009 -0500 Further update Sinhala orthography (#19288)
> --- Comment #16 from Roozbeh Pournader <roozbeh@gmail.com> 2009-02-05 00:51:55 PST --- > It's important to note that fontconfig is not claiming conformance to SLS 1134 > (or any of its Levels), is not seeking certification from SLS, and has its own > rules for which characters to put in its orthography files. SLS1134:PART2:2007 describes the minimal requirements for a font to be considered to support Unicode Sinhala. The standard is written and vetted by those who actually know something about the Sinhala script and language. > Also, we cannot rely on a single contributor's judgment. We need to be > provided with rationales too. I completely agree! It is important to take into account the decisions made by linguists and standards bodies with expertise in Sinhala, instead of a single contributor who is unfamiliar with Sinhala. > 1) Is U+0D8E SINHALA LETTER IRUUYANNA used in modern Sinhala? If not, why > should fontconfig include it in its font requirements? According to what we > know from the only freely available part of the SLS 1134 > [http://std.dkuug.dk/JTC1/SC2/WG2/docs/n2737.pdf], "ඎ [U+0D8E] also does not > occur in present usage, but its corresponding vocalic stroke, > ෲ [U+0DF2] is used; for example, කර්තෲ [U+0D9A U+0DBB U+0DCA U+0DAD U+0DF2]". Firstly, you continue to quote from an old draft. If you want to quote from SLS1134, please first obtain it from SLSI. Secondly, independent vowel iruuyanna is needed to describe the usage of the corresponding dependent vowel [1][2]. This would be *obvious* to anyone who has looked at literature on Sinhala Grammar or the Sinhala script. That is why the iru/iruu independent and dependent vowels are required as a set at Level 1 compliance. Similarly, ilu/iluu independent and dependent vowels are required as a set at Level 3 compliance. > 2) Is U+0DA6 SINHALA LETTER SANYAKA JAYANNA used in modern Sinhala? If not, why > should fontconfig include it in its font requirements?
According to the same > document, "The consonant ඦ [U+0DA6] (ndja) is included although it is not > found in contemporary writing." Firstly, yes, apparently it is used in modern *colloquial* language. Secondly, it is needed to describe the structure of the Sinhala script as it is the nasalised form of the unaspirated voiced palatal consonant. Again, this should be *obvious* to anyone with knowledge of the Sinhala script. > I understand that you are referring to a part of SLS 1134. But we need to know > the rationale for the choice made in SLS 1134. Note, I did *not* write the SLS1134 standard, however, I did help in the review process. > (I'm OK with removing U+0DDF SINHALA VOWEL SIGN GAYANUKITTA. It rarely appears > by itself: it usually is a part of U+0DDE SINHALA VOWEL SIGN KOMBUVA HAA > GAYANUKITTA.) It can not appear "by itself", it is a dependent vowel. Gayanukitta has to be preceded by a consonant or particular independent vowels. cya, # [1] Gunasekara, A. M. (1891). A Comprehensive Grammar of the Sinhalese Language. [2] Karunatillake, W. S. (1998). An Introduction to Spoken Sinhala.
I think I've heard both sides enough by now. I think the orth file in my tree now is adequate enough. I don't have more time to spend on a couple characters... Thanks.
Roozbeh, do you have a response to comment 20? It sounds reasonable for fontconfig orth to follow the minimum Level 1 requirements for Sinhala fonts set out in the latest Sri Lanka Standard, which Harshula is referencing.
(In reply to comment #22) > Roozbeh, do you have a response to comment 20? Yes, I have good answers. But it's personally very hard for me to continue a normal discussion and ignore the insulting remarks made by the reporter. > It sounds reasonable for fontconfig orth to follow the minimum Level 1 > requirements for Sinhala fonts set out in the latest Sri Lanka Standard, which > Harshula is referencing. Not necessarily. In the meantime, I have received copies of all the parts of SLS 1134. Part 2, which the reporter is referring to here, is only for testing software that claims to be supporting SLS 1134. fontconfig is no such software. Similar standards exist for various other languages, and fontconfig is not following them either, not for lack of access, but because of different use cases. For example, the Persian (fa) orthography does not strictly follow the Iranian national standard for informational interchange (which Behdad and I co-authored): the actual Persian text out there does not frequently use some of the characters that the national standard prescribes. The same is true of various similar standards from Europe. fontconfig is not a font claiming to support SLS 1134 Part 2 Level 1, so SLS 1134 Part 2 is irrelevant here, while main SLS 1134 is relevant, since we are using it as a source of information only (the published version of main SLS 1134 contains the same text I referred to in comment #16, on its page 6). Also, when SLS 1134 Part 2 talks about supporting ඎ (using its glyph only, and not its Unicode codepoint), it is not referring to the Unicode character U+0D8E, it is talking about the conceptual vowel vocalic rr. The conceptual vowel is represented using U+0D8E only when it's an independent vowel (which "does not occur in present usage" according to main part of SLS 1134), but as U+0DF2 when it's a dependent vowel sign (that "is used").
To prove my point, one can confirm that SLS 1134 Part 2 Level 1 does not list U+0DF2 or any other dependent vowel sign (which we actually list in the orth file). Finally, we can't detect SLS 1134 Part 2 Level 1 font support with a simple orth file anyway: "All combinations of the above consonants with the above vowels shall be supported. Consonant-vowel combinations with the yansaya (යංශය), rakaransaya (රකාරංශය) and repaya shall also be supported. [...] A Level 1 font should support the ක්ෂ (ksha) conjunct [...]". We can't detect any of that with orth files presently. We are just using a very simple heuristic in fontconfig.
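The "very simple heuristic" described here is a per-codepoint check: an orth file lists codepoints, and a font counts as covering the language if its character map contains them. A minimal Python sketch of that idea, assuming orth lines are single hex codepoints or `a-b` ranges with `#` comments (real orth files may contain other directives, which this ignores) and that the font's cmap is already available as a set of codepoints (for example, the keys of fontTools' `TTFont(...).getBestCmap()`). `parse_orth` and `missing_codepoints` are hypothetical names, and fontconfig's actual matching logic may differ in detail. Because only individual codepoints are checked, nothing here can express the consonant-vowel combination or ක්ෂ conjunct requirements quoted above.

```python
# Hypothetical sketch of a per-codepoint coverage check, not fontconfig's code.


def parse_orth(text):
    """Collect required codepoints from orth-style text.

    Assumed syntax: one hex codepoint ("0d85") or range ("0d82-0d83") per
    line; everything after '#' is a comment, so commented-out entries drop out.
    """
    required = set()
    for raw in text.splitlines():
        entry = raw.split("#", 1)[0].strip()
        if not entry:
            continue  # blank line or pure comment
        if "-" in entry:
            first, last = (int(part, 16) for part in entry.split("-", 1))
            required.update(range(first, last + 1))
        else:
            required.add(int(entry, 16))
    return required


def missing_codepoints(required, font_codepoints):
    """Required codepoints the font lacks, in ascending order."""
    return sorted(required - set(font_codepoints))


if __name__ == "__main__":
    orth = "0d82-0d83\n0d85\n#0d8e\t# IRUUYANNA\n"
    required = parse_orth(orth)
    # A toy "font" that lacks both U+0D83 and U+0D8E.
    font = {0x0D82, 0x0D85}
    print([f"U+{cp:04X}" for cp in missing_codepoints(required, font)])
```

In this model, commenting out an entry (as was done for U+0D8E and U+0DA6) simply removes it from the requirement, so a font lacking that codepoint still passes; that is the trade-off argued over in the rest of this thread.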
> --- Comment #23 from Roozbeh Pournader <roozbeh@gmail.com> 2009-02-16 20:19:00 PST --- > For example, the Persian (fa) orthography does not strictly follow the > Iranian national standard for informational interchange (which Behdad and I > co-authored): the actual Persian text out there does not frequently use some of > the characters that the national standard prescribes. So, does the Iranian National Standard contain an explicit definition of the minimal requirements for a font to be considered valid? Or perhaps, a tiered set of compliance levels, or was that overlooked? > fontconfig is not a font claiming to support SLS 1134 Part 2 Level 1, so > SLS 1134 Part 2 is irrelevant here, while main SLS 1134 is relevant, since > we are using it as a source of information only 1) Fontconfig is the component that determines if a font is a valid Sinhala font. 2) SLS1134:Part2 describes the minimal requirements for a font to be a valid Sinhala font. Therefore, SLS1134:Part2 is actually *more* relevant to this issue than the descriptive text in SLS1134 which exists to provide background information to those unfamiliar with Sinhala. It is illogical for you to keep asserting that the descriptive text in SLS1134 is more relevant than the SLS1134:Part2 *standard* that explicitly states the minimal requirements for a Sinhala font. > (the published version of main SLS 1134 contains the same text I > referred to in comment #16, on its page 6). That is not true! The descriptive text in the final published SLS1134 is: ----------------------------------------------------------------------- 3. ඎ also does not occur in present usage, but its corresponding vowel sign, ෲ is used; for example, ශාස්තෲන්. ----------------------------------------------------------------------- That is how I knew you were *not* quoting from the final published version.
> Also, when SLS 1134 Part 2 talks about supporting ඎ (using its glyph only, > and not its Unicode codepoint), it is not referring to the Unicode character > U+0D8E, it is talking about the conceptual vowel vocalic rr. The conceptual > vowel is represented using U+0D8E only when it's an independent vowel (which > "does not occur in present usage" according to main part of SLS 1134), but as > U+0DF2 when it's a dependent vowel sign (that "is used"). To prove my point, > one can confirm that SLS 1134 Part 2 Level 1 does not list U+0DF2 or any other > dependent vowel sign (which we actually list in the orth file). What?! That is a creative interpretation! You are completely *incorrect*; it is referring to ඎ (U+0D8E - independent vowel iruuyanna). SLS1134:Part2 clearly states: ----------------------------------------------------------------------- A font supporting SLS 1134 at Level 1 shall represent all the following Sinhala letters. a) Vowels අ, ආ, ඇ, ඈ, ඉ, ඊ, උ, ඌ, ඍ, ඎ, එ, ඒ, ඓ, ඔ, ඕ, ඖ, ං, ඃ b) Consonants ක, ඛ, ග, ඝ, ඞ ඟ, ච, ඡ, ජ, ඣ, ඥ, ඤ, ඦ, ට, ඨ, ඩ, ඪ, ණ, ඬ, ත, ථ, ද, ධ, න, ඳ, ප, ඵ, බ, භ, ම, ඹ, ය, ර, ල, ව, ශ, ෂ, ස, හ, ළ, ෆ, c) and Sinhala Characters given in Table 1. All combinations of the above consonants with the above vowels shall be supported. Consonant-vowel combinations with the yansaya(යංශය) rakaransaya (රකාරංශය) and repaya shall also be supported. ... ----------------------------------------------------------------------- Firstly, the aforementioned extract from SLS1134:Part2 lists all the independent vowels (including ඎ (U+0D8E)) and all the consonants required by a Sinhala font. Secondly, it goes on to state "All combinations of the above consonants with the above vowels shall be supported", i.e., all the dependent vowels corresponding to the listed independent vowels must be supported. Thirdly, did you bother to look at Table 1? Look at Row 10 (excluding the row containing the column titles). Guess what? It explicitly states "0D8E" for ඎ!
> Finally, we can't detect SLS 1134 Part 2 Level 1 font support with a simple > orth file anyway: "All combinations of the above consonants with the above > vowels shall be supported. Consonant-vowel combinations with the yansaya > (යංශය), rakaransaya (රකාරංශය) and repaya shall also be > supported. [...] A Level 1 font should support the ක්ෂ (ksha) conjunct > [...]". We can't detect any of that with orth files presently. We are just > using a very simple heuristic in fontconfig. Yes, that is quite obvious! The point is that the simple rules we use should be in accordance with SLS1134:Part2 when technically feasible. There is absolutely no technical barrier to including U+0D8E (independent vowel IRUUYANNA). I note that you were unable to construct any arguments to support your stance on excluding U+0DA6 SANYAKA JAYANNA. Is it fair to assume that you now agree with me on including U+0DA6 SANYAKA JAYANNA? cya, #
(In reply to comment #24) > [...] or was that overlooked? [...] You are continuing your insulting tone. You are trying to claim that I'm too incompetent to participate. And since Sinhala is your language, not mine, I supposedly don't understand the script and its requirements. Then, you're taking it to the next level and trying to say that I was probably incompetent to write the Iranian national standards too. I'm leaving this conversation.
Jens, as I said I'm satisfied with what I've committed so far, and this conversation is at best less than decent. I'm not interested in working on any bugs involving Harshula anymore, unless he starts talking professionally and with technical merit. Roozbeh has been doing localization work for years, he is actively involved in the Unicode Technical Committee, and knows his stuff. I'll take any bits he gives me.
Hi Behdad, > --- Comment #26 from Behdad Esfahbod <freedesktop@behdad.org> 2009-02-17 19:25:49 PST --- > Jens, as I said I'm satisfied with what I've committed so far, and this > conversation is at best less than decent. I'm not interested in working on any > bugs involving Harshula anymore, unless he starts talking professionally and > with technical merit. Roozbeh has been doing localization work for years, he > is actively involved in the Unicode Technical Committee, and knows his stuff. > I'll take any bits he gives me. 1) I feel this whole process has been *extremely* unprofessional. The outcome of this bug was determined by Roozbeh and you by ignoring the established Sri Lankan National Standards and the "technical merits" of my stance. This is made evident when you said, "Roozbeh is my reference for orth files. You have to convince him. I'll check in what he gives me. What's the big deal with committing something incorrect anyway?" 2) I'm sure Roozbeh has expertise in some localization areas. However, he does not have expertise when it comes to Sinhala. He did not even have the relevant documents when he formed his initial opinion. I have not witnessed this level of contempt and disregard for an established standard and an expert in Sinhala orthography. 3) I spent many hours satisfying every single one of your iterative requests and providing a technical argument with evidence to support my stance. I am very disappointed by how unprofessionally I have been treated. cya, #
I believe I've fixed this in 2.7.0. Please reopen otherwise.
Hi Behdad, The commit 967267556c762d2746f819eca85f3c59fbb95875 you applied is incorrect. Simply reverting it would make si.orth correct and compliant with SLS1134:Part2. What further supporting evidence would you like me to provide? cya, #
Harshula, As Roozbeh said repeatedly, we don't care about SLS1134:Part2. What's wrong about the current orth file and why is what you propose an improvement?
Hi Behdad, > As Roozbeh said repeatedly, we don't care about SLS1134:Part2. How can you just ignore a national standard that contains a section specifically defining the minimum set of glyphs required for a font to be considered valid? > What's wrong about the current orth file and why is what you propose an improvement? These two codepoints should *not* be commented out: 1) 0d8e IRUUYANNA 2) 0da6 SANYAKA JAYANNA Independent vowel IRUUYANNA (1) is needed to describe the usage of the corresponding dependent vowel. That is why the iru/iruu independent and dependent vowels are required as a set at Level 1 compliance. You cannot remove just one member of that set of four. Similarly, ilu/iluu independent and dependent vowels are required as a set at Level 3 compliance. SANYAKA JAYANNA (2) is apparently used in modern *colloquial* language. Furthermore, it is needed to describe the structure of the Sinhala script as it is the nasalised form of the unaspirated voiced palatal consonant (JAYANNA). Fontconfig determines if a font is a valid Sinhala font by examining its coverage. By *not* including (1) and (2) in this coverage check, you are allowing incomplete fonts to be recognised as valid. cya, #
(In reply to comment #31) > Fontconfig determines if a font is a valid Sinhala font by examining its > coverage. By *not* including (1) and (2) in this coverage check, you are > allowing incomplete fonts to be recognised as valid. Sure, that's a known compromise. Almost all orth files make that kind of compromise. Leaving REOPENED, whatever...
Hi Behdad, >Sure, that's a known compromise. Almost all orth files make that kind of >compromise. That's precisely why fontconfig should be employing Level 1 compliance. It is the compromised coverage intended for software and font developers. Level 3 is the uncompromising full compliance. Out of curiosity, what is your technical reasoning for commenting out: 1) 0d8e IRUUYANNA 2) 0da6 SANYAKA JAYANNA cya, #
As I said many times, you have to convince Roozbeh, not me.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/fontconfig/fontconfig/issues/87.