Bug 25652 - Add ortho file for locale mni_IN
Summary: Add ortho file for locale mni_IN
Status: RESOLVED FIXED
Alias: None
Product: fontconfig
Classification: Unclassified
Component: orth
Version: 2.8
Hardware: Other All
Importance: highest normal
Assignee: Roozbeh Pournader
QA Contact: Behdad Esfahbod
Depends on: 42900
Reported: 2009-12-14 23:06 UTC by Parag
Modified: 2012-02-24 00:16 UTC (History)

Attachments
Ortho file (2.02 KB, patch)
2010-03-03 03:07 UTC, Parag
Manipuri language Orthography file (1.35 KB, text/plain)
2012-02-21 03:52 UTC, Pravin
Manipuri language orthography file (1.46 KB, text/plain)
2012-02-23 23:14 UTC, Pravin

Description Parag 2009-12-14 23:06:02 UTC
Please add an ortho file for locale mni_IN, a language written in the
Bengali script.


http://en.wikipedia.org/wiki/Manipuri_language
Comment 1 Parag 2010-03-03 03:07:03 UTC
Created attachment 33715
Ortho file
Comment 2 Parag 2010-03-17 20:21:52 UTC
Ping for feedback on attached patch.
Comment 3 Parag 2010-04-05 03:40:38 UTC
Can we have a fontconfig update before the Final freeze of the Fedora 13 development cycle?
Comment 4 Parag 2010-05-02 21:48:15 UTC
Behdad,
The Fedora 13 Final freeze is due tomorrow. Can we have a new fontconfig built in Fedora 13 before that?
Comment 5 Parag 2010-12-08 01:31:23 UTC
Fedora 14 has already been out for a month, and we are still waiting for this fix to be included in fontconfig.

Please, can someone look into this, commit the attached patch, and release a new fontconfig?

Thanks.
Comment 6 Parag 2010-12-10 01:49:37 UTC
The locale file for this orthography is not yet in glibc. We are working with language experts to get the locale file into glibc. However, I have not seen any documentation suggesting that an orthography cannot be added before its locale exists.

As per fontconfig-devel.sgml
"Fontconfig has orthographies for all of the ISO 639-1 languages" 

The mni language code is a three-letter ISO 639-2 code (Manipuri has no two-letter ISO 639-1 code).
Comment 7 Parag 2011-03-14 22:58:30 UTC
ping behdad
Comment 8 Parag 2011-03-14 23:01:06 UTC
I have updated the Indic ortho files in a local git checkout and copied that
directory to http://paragn.fedorapeople.org/fontconfig/
Comment 9 Behdad Esfahbod 2011-03-15 08:46:06 UTC
Can you push a git tree out (on github or something)?  The uploaded tree is unusable for me.
Comment 10 Parag 2011-03-16 07:55:19 UTC
(In reply to comment #9)
> Can you push a git tree out (on github or something)?  The uploaded tree is
> unusable for me.

Does the following repo look OK?
https://github.com/pnemade/fontconfig-indic-ortho
Comment 11 Behdad Esfahbod 2011-03-16 09:54:02 UTC
No.  Clone the fontconfig tree.  Commit your various ortho files there, and push to GitHub.  Read any of the many tutorials on the web for how to do this.
Comment 12 Parag 2011-03-16 22:45:29 UTC
(In reply to comment #11)
> No.  Clone the fontconfig tree.  Commit your various ortho files there, and
> push into github.  Read any of the many tutorials on the web for howto.

I hope the following helps.
https://github.com/pnemade/fontconfig/tree/indic-ortho
Comment 13 Behdad Esfahbod 2011-03-18 13:22:53 UTC
Looks better.  Although what would be really useful would be for each individual logical change to be in its own commit with its own bug reference, so I can simply review them one by one and cherry pick.
Comment 14 Parag 2011-03-22 02:32:21 UTC
(In reply to comment #13)
> Looks better.  Although what would be really useful would be for each
> individual logical change to be in its own commit with its own bug reference,
> so I can simply review them one by one and cherry pick.

Done at https://github.com/pnemade/fontconfig/tree/indic-orth
Comment 15 Behdad Esfahbod 2011-03-22 11:58:12 UTC
Ok.  What are the changes there doing?  I see most are just comments, but others add new characters to the orth files.  What's the justification?
Comment 16 Parag 2011-03-22 23:35:15 UTC
(In reply to comment #15)
> Ok.  What are the changes there doing?  I see most are just comments, but
> others add new characters to the orth files.  What's the justification?

Yes. With the above changes I tried to add comments for all the characters and to match the character class categories to the Unicode ranges given in each language's own Unicode chart. All of the updates are based on the Unicode 5.2 charts.

I also added a few characters listed in each script's Unicode 5.2 chart, which meant extending the range on some lines or creating new lines in the ortho files.

If modifying the existing ortho files just to split character ranges by class is not worth pushing upstream, then I will re-create the fontconfig tree so that it includes only the new ortho files, and I will drop the Unicode 5.2 characters that are missing from the current ortho files.
Comment 17 Parag 2011-03-22 23:46:04 UTC
Is there any plan to update the ortho files to the latest Unicode standard in the future? Or is it not necessary for fontconfig to keep its ortho files updated?
Comment 18 Roozbeh Pournader 2011-03-23 15:03:44 UTC
(In reply to comment #16)
> Yes. With above changes I tried to add comments to all the characters and tried
> to match character class categories using its Unicode range as per language's
> own Unicode chart. All above updates are based on Unicode 5.2 chart.
> 
> Added few characters as per given in its own Unicode 5.2 chart, so extending
> range on some lines or creating new lines in ortho files.
> 
> If extending existing ortho files with just modification to split character
> range as per their classes is not worth to push in upstream then I will
> re-create new fontconfig tree to just include new ortho files and also drop new
> characters from 5.2 standard, missing in current ortho files.

First of all, Unicode 6.0 has already been out for quite a while. Second, many of the characters added in newer versions of Unicode are added only because they are used in some rare, historic, minority, or extinct orthographies. We should rarely update the orthography files to the latest version of Unicode, as many fonts still do not support those characters.

I understand that this is an improvement over existing ortho files (which don't have good info on the Indic orthographies anyway), but it's far from what we need. What we need is actually going in and checking which languages use which characters in each block commonly, and what the fonts do.

Randomly checking one file, Hindi (hi.orth), I see that the changes add U+0951 and U+0952, which are Vedic marks that I believe are only used in Sanskrit transliterations. What we need for each language is references and justifications for the inclusion or exclusion of each character in the orthography file.
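
For anyone who wants to check what a particular code point is before arguing for or against it, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `describe` is made up for this illustration and is not part of fontconfig. The names returned come from the Unicode database bundled with the Python interpreter, which may be a different Unicode version from the one discussed here.

```python
import unicodedata


def describe(codepoints):
    """Return (U+XXXX label, character name, general category) per code point.

    `describe` is a hypothetical helper for this example only.
    """
    rows = []
    for cp in codepoints:
        ch = chr(cp)
        # name() takes a default for code points that have no name.
        name = unicodedata.name(ch, "<unnamed>")
        rows.append((f"U+{cp:04X}", name, unicodedata.category(ch)))
    return rows


if __name__ == "__main__":
    # The two code points mentioned for hi.orth in this comment.
    for label, name, category in describe([0x0951, 0x0952]):
        print(label, name, category)
```

This only identifies the characters; it says nothing about whether a language commonly uses them, which still needs an external reference.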
Comment 19 Parag 2011-03-24 23:36:08 UTC
(In reply to comment #18)
> (In reply to comment #16)
> > Yes. With above changes I tried to add comments to all the characters and tried
> > to match character class categories using its Unicode range as per language's
> > own Unicode chart. All above updates are based on Unicode 5.2 chart.
> > 
> > Added few characters as per given in its own Unicode 5.2 chart, so extending
> > range on some lines or creating new lines in ortho files.
> > 
> > If extending existing ortho files with just modification to split character
> > range as per their classes is not worth to push in upstream then I will
> > re-create new fontconfig tree to just include new ortho files and also drop new
> > characters from 5.2 standard, missing in current ortho files.
> 
> First of all, Unicode 6.0 is already out for quite a while. Then, lots of the
> characters added in newer versions of Unicode are just added because they are
> used in some rare, historic, minority, or extinct orthographies. We should
> rarely update the orthography files to the latest version of Unicode, as many
> fonts still do not support them.
> 
> I understand that this is an improvement over existing ortho files (which don't
> have good info on the Indic orthographies anyway), but it's far from what we
> need. What we need is actually going in and checking which languages use which
> characters in each block commonly, and what the fonts do.
> 
> Randomly checking one file, Hindi (hi.orth), I see that the changes add U+0951
> and U+0952 which are Vedic marks, which I believe are only used in Sanskrit
> transliterations. What we need for each language, is references and
> justifications for inclusion or exclusion of each character inside the
> orthography file.

Thanks for your brief answer. I will probably close all of the enhancement bugs as NOTABUG.
Comment 20 Parag 2011-03-25 21:55:04 UTC
Is there any good documentation available on how to add a new ortho file or modify an existing one? Can upstream please add it somewhere, so that contributors know how they can add or modify ortho files?
Comment 21 Behdad Esfahbod 2011-03-28 14:12:21 UTC
(In reply to comment #20)
> Is there any good documentation available on how to add new ortho file or
> modify existing one? Can upstream please add it somewhere so that it can be
> help contributors to know how they can add/modify ortho files?

We always want *improvements* to orth files.  However, in none of your changes did you explain why it is an *improvement*.  As Roozbeh pointed out, merely adding newly encoded Unicode characters is not an improvement.

Maybe you are misunderstanding what the orth files are supposed to contain: they are not an exhaustive list of characters used in a language.  They list the bare minimum set of characters that a font should have for the font to be widely considered useful for that language.

Hope that helps.
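
To illustrate the "bare minimum set" idea described above, here is a rough Python sketch, not fontconfig's actual implementation. It assumes a simplified orth-style text format: `#` starts a comment, and each remaining line is either a single hexadecimal code point or a `START-END` range. Other directives such as `include` lines are deliberately not handled. The function names and the sample data are hypothetical; the sample is not the real Manipuri orthography file.

```python
import re

# Assumed line shape: "0985" or "0981-0983" (hex, optional range).
_LINE = re.compile(r"^([0-9A-Fa-f]{4,6})(?:\s*-\s*([0-9A-Fa-f]{4,6}))?$")


def parse_orth(text):
    """Parse simplified orth-style text into a set of required code points."""
    required = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = _LINE.match(line)
        if match is None:
            raise ValueError(f"unsupported line: {raw!r}")
        start = int(match.group(1), 16)
        end = int(match.group(2), 16) if match.group(2) else start
        if end < start:
            raise ValueError(f"reversed range: {raw!r}")
        required.update(range(start, end + 1))
    return required


def missing_from_font(required, font_coverage):
    """Return the sorted required code points that the font does not cover."""
    return sorted(required - set(font_coverage))


# Hypothetical sample data, NOT the contents of any real orth file.
SAMPLE = """
# sample orthography
0981-0983   # three combining signs
0985        # one letter
"""

if __name__ == "__main__":
    required = parse_orth(SAMPLE)
    font_coverage = {0x0981, 0x0982, 0x0985}  # made-up font coverage
    for cp in missing_from_font(required, font_coverage):
        print(f"font lacks U+{cp:04X}")
```

Under this model a font is considered useful for the language only if nothing is missing, which is why every extra character in the file raises the bar for all fonts.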
Comment 22 Parag 2012-02-20 21:59:18 UTC
The mni_IN glibc locale will use Bengali as its script.
Comment 23 Pravin 2012-02-21 03:52:54 UTC
Created attachment 57384
Manipuri language Orthography file

The sixth page of the reference file http://tdil-dc.in/tdildcMain/articles/283709Script_Grammar_for_Manipuri.pdf lists the characters required for the Manipuri language.

The reference file is certified by the Manipuri Sahitya Parishad.
Comment 24 Akira TAGOH 2012-02-21 04:22:56 UTC
Are there any references that contain the Unicode code points, so that it can be validated?
Comment 25 Pravin 2012-02-23 23:14:14 UTC
Created attachment 57576
Manipuri language orthography file

Unfortunately, the code points are not given in the script grammar PDF. The linguists may have left them out.

I have circled those characters in the Unicode Bengali script block chart: http://pravins.fedorapeople.org/Manipuri-characters.pdf
Comment 26 Akira TAGOH 2012-02-23 23:51:03 UTC
Fixed in c7a671ab
Comment 27 Pravin 2012-02-24 00:16:11 UTC
Thanks a lot.

