Summary: | Add ortho file for locale mni_IN | ||
---|---|---|---|
Product: | fontconfig | Reporter: | Parag <panemade> |
Component: | orth | Assignee: | Roozbeh Pournader <roozbeh> |
Status: | RESOLVED FIXED | QA Contact: | Behdad Esfahbod <freedesktop> |
Severity: | normal | ||
Priority: | highest | CC: | akira, freedesktop, panemade |
Version: | 2.8 | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
Bug Depends on: | 42900 | ||
Bug Blocks: | |||
Attachments: |
Ortho file (attachment 33715)
Manipuri language Orthography file (attachment 57384)
Manipuri language orthography file (attachment 57576) |
Description
Parag
2009-12-14 23:06:02 UTC
Created attachment 33715
Ortho file

Ping for feedback on the attached patch. Can we have a fontconfig update before the Final freeze of the Fedora 13 development cycle?

Behdad, the Fedora 13 Final freeze is due tomorrow. Can we have a new fontconfig built in Fedora 13 before that?

Fedora 14 has already been out for a month and we are still waiting for this fix to be included in fontconfig. Please, can someone look into this, commit the attached patch, and release a new fontconfig? Thanks.

The locale file for this orthography is not yet in glibc. We are working with language people to get the locale file into glibc. But I have not seen any documentation on orthography that would prevent adding this orthography in advance.

As per fontconfig-devel.sgml: "Fontconfig has orthographies for all of the ISO 639-1 languages". This mni locale code is part of ISO 639-1.

ping behdad

I have updated the Indic ortho files in a local git checkout and copied that directory to http://paragn.fedorapeople.org/fontconfig/

Can you push a git tree out (on github or something)? The uploaded tree is unusable for me.

(In reply to comment #9)
> Can you push a git tree out (on github or something)? The uploaded tree is
> unusable for me.

Does the following repo look OK?
https://github.com/pnemade/fontconfig-indic-ortho

No. Clone the fontconfig tree. Commit your various ortho files there, and push into github. Read any of the many tutorials on the web for howto.

(In reply to comment #11)
> No. Clone the fontconfig tree. Commit your various ortho files there, and
> push into github. Read any of the many tutorials on the web for howto.

Hope the following will help you.
https://github.com/pnemade/fontconfig/tree/indic-ortho

Looks better. Although what would be really useful would be for each individual logical change to be in its own commit with its own bug reference, so I can simply review them one by one and cherry pick.

(In reply to comment #13)
> Looks better. Although what would be really useful would be for each
> individual logical change to be in its own commit with its own bug reference,
> so I can simply review them one by one and cherry pick.

Done at https://github.com/pnemade/fontconfig/tree/indic-orth

Ok. What are the changes there doing? I see most are just comments, but others add new characters to the orth files. What's the justification?

(In reply to comment #15)
> Ok. What are the changes there doing? I see most are just comments, but
> others add new characters to the orth files. What's the justification?

Yes. With above changes I tried to add comments to all the characters and tried to match character class categories using its Unicode range as per language's own Unicode chart. All above updates are based on Unicode 5.2 chart.

Added few characters as per given in its own Unicode 5.2 chart, so extending range on some lines or creating new lines in ortho files.

If extending existing ortho files with just modification to split character range as per their classes is not worth to push in upstream then I will re-create new fontconfig tree to just include new ortho files and also drop new characters from 5.2 standard, missing in current ortho files.

Is there any plan to update the ortho files to the latest Unicode standard in the future? Or is it not a necessary step for fontconfig to keep its ortho files updated?

(In reply to comment #16)
> Yes. With above changes I tried to add comments to all the characters and tried
> to match character class categories using its Unicode range as per language's
> own Unicode chart. All above updates are based on Unicode 5.2 chart.
>
> Added few characters as per given in its own Unicode 5.2 chart, so extending
> range on some lines or creating new lines in ortho files.
>
> If extending existing ortho files with just modification to split character
> range as per their classes is not worth to push in upstream then I will
> re-create new fontconfig tree to just include new ortho files and also drop new
> characters from 5.2 standard, missing in current ortho files.

First of all, Unicode 6.0 is already out for quite a while. Then, lots of the characters added in newer versions of Unicode are just added because they are used in some rare, historic, minority, or extinct orthographies. We should rarely update the orthography files to the latest version of Unicode, as many fonts still do not support them.

I understand that this is an improvement over existing ortho files (which don't have good info on the Indic orthographies anyway), but it's far from what we need. What we need is actually going in and checking which languages use which characters in each block commonly, and what the fonts do.

Randomly checking one file, Hindi (hi.orth), I see that the changes add U+0951 and U+0952 which are Vedic marks, which I believe are only used in Sanskrit transliterations. What we need for each language, is references and justifications for inclusion or exclusion of each character inside the orthography file.

(In reply to comment #18)
> (In reply to comment #16)
> > Yes. With above changes I tried to add comments to all the characters and tried
> > to match character class categories using its Unicode range as per language's
> > own Unicode chart. All above updates are based on Unicode 5.2 chart.
> >
> > Added few characters as per given in its own Unicode 5.2 chart, so extending
> > range on some lines or creating new lines in ortho files.
> >
> > If extending existing ortho files with just modification to split character
> > range as per their classes is not worth to push in upstream then I will
> > re-create new fontconfig tree to just include new ortho files and also drop new
> > characters from 5.2 standard, missing in current ortho files.
>
> First of all, Unicode 6.0 is already out for quite a while. Then, lots of the
> characters added in newer versions of Unicode are just added because they are
> used in some rare, historic, minority, or extinct orthographies. We should
> rarely update the orthography files to the latest version of Unicode, as many
> fonts still do not support them.
>
> I understand that this is an improvement over existing ortho files (which don't
> have good info on the Indic orthographies anyway), but it's far from what we
> need. What we need is actually going in and checking which languages use which
> characters in each block commonly, and what the fonts do.
>
> Randomly checking one file, Hindi (hi.orth), I see that the changes add U+0951
> and U+0952 which are Vedic marks, which I believe are only used in Sanskrit
> transliterations. What we need for each language, is references and
> justifications for inclusion or exclusion of each character inside the
> orthography file.

Thanks for your brief answer. I will probably close all the enhancement bugs, resolving them as NOTABUG.

Is there any good documentation available on how to add a new ortho file or modify an existing one? Can upstream please add it somewhere so that it can help contributors learn how to add or modify ortho files?

(In reply to comment #20)
> Is there any good documentation available on how to add new ortho file or
> modify existing one? Can upstream please add it somewhere so that it can be
> help contributors to know how they can add/modify ortho files?

We always want *improvements* to orth files. However, in none of your changes did you explain why it is an *improvement*. As Roozbeh pointed out, merely adding newly encoded Unicode characters is not an improvement.

Maybe you are misunderstanding what the orth files are supposed to contain: they are not an exhaustive list of characters used in a language. They list the base minimum set of characters that a font should have for the font to be widely considered useful for that language.

Hope that helps.

The mni_IN glibc locale will use Bengali as its script.

Created attachment 57384
Manipuri language Orthography file

The characters required for the Manipuri language are given on the sixth page of the reference file http://tdil-dc.in/tdildcMain/articles/283709Script_Grammar_for_Manipuri.pdf. The reference file is certified by Manipuri Sahitya Parishad.

Are there any references that contain the Unicode codepoints for validating it?

Created attachment 57576
Manipuri language orthography file

Yes, unfortunately the code points are not given in the script grammar PDF. The linguists may have overlooked them. I have encircled those characters in the Unicode Bengali script block: http://pravins.fedorapeople.org/Manipuri-characters.pdf

Fixed in c7a671ab

Thanks a lot. |
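
To illustrate the point made in the thread that an orth file lists the minimum set of characters a font must have, rather than every character a language can use, here is a rough Python sketch of how such a list could be checked against a font's coverage. It is not fontconfig's own code. The function names `parse_orth` and `missing_from_font`, and the sample entries, are hypothetical, and the sketch assumes a simplified orth syntax: `#` comments, one hexadecimal code point or one `start-end` range per line, with `include` lines skipped.

```python
def parse_orth(text):
    """Return the set of code points listed in orth-style text.

    Assumed (simplified) syntax: '#' starts a comment, each remaining
    line is one hex code point or a 'start-end' hex range. Lines that
    begin with 'include' are ignored in this sketch.
    """
    required = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("include"):
            continue
        first, _, last = line.partition("-")
        start = int(first, 16)
        end = int(last, 16) if last else start
        required.update(range(start, end + 1))
    return required


def missing_from_font(required, font_coverage):
    """Return the required code points that the font does not cover."""
    return sorted(required - font_coverage)


# Hypothetical excerpt for illustration only; NOT the real mni.orth.
SAMPLE = """
# a few Bengali-block code points
0981-0983   # signs
0985-098C   # independent vowels
09BC        # nukta
"""

if __name__ == "__main__":
    required = parse_orth(SAMPLE)
    # Pretend font that covers U+0981..U+09BB but lacks U+09BC.
    font = set(range(0x0981, 0x09BC))
    for cp in missing_from_font(required, font):
        print(f"missing U+{cp:04X}")
```

Under this reading, every character added to an orth file raises the bar for every font claiming to support the language, which is why the reviewers asked for a reference justifying each included character.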