25652 – Add ortho file for locale mni_IN

Summary: Add ortho file for locale mni_IN

Status:	RESOLVED FIXED

Alias:	None

Product:	fontconfig
Classification:	Unclassified
Component:	orth
Version:	2.8
Hardware:	Other All

Importance:	highest normal
Assignee:	Roozbeh Pournader
QA Contact:	Behdad Esfahbod

Depends on:	42900

Reported:	2009-12-14 23:06 UTC by Parag
Modified:	2012-02-24 00:16 UTC
CC List:	3 users

Attachments
Ortho file (2.02 KB, patch) 2010-03-03 03:07 UTC, Parag
Manipuri language Orthography file (1.35 KB, text/plain) 2012-02-21 03:52 UTC, Pravin
Manipuri language orthography file (1.46 KB, text/plain) 2012-02-23 23:14 UTC, Pravin

Description Parag 2009-12-14 23:06:02 UTC

Please add an ortho file for the mni_IN locale, which is for a Bengali-script-based language.


http://en.wikipedia.org/wiki/Manipuri_language

Comment 1 Parag 2010-03-03 03:07:03 UTC

Created attachment 33715
Ortho file

Comment 2 Parag 2010-03-17 20:21:52 UTC

Ping for feedback on the attached patch.

Comment 3 Parag 2010-04-05 03:40:38 UTC

Can we have a fontconfig update before the Final freeze of the Fedora 13 development cycle?

Comment 4 Parag 2010-05-02 21:48:15 UTC

Behdad,
The Fedora 13 Final freeze is due tomorrow. Can we have a new fontconfig built in Fedora 13 before that?

Comment 5 Parag 2010-12-08 01:31:23 UTC

Fedora 14 has been out for a month and we are still waiting for this fix to be included in fontconfig.

Could someone please look into this, commit the attached patch, and release a new fontconfig?

Thanks.

Comment 6 Parag 2010-12-10 01:49:37 UTC

The locale file for this language is not yet in glibc; we are working with the language community to get it into glibc. However, I have not seen any documentation on orthographies that would prevent adding this orthography in advance.

According to fontconfig-devel.sgml:
"Fontconfig has orthographies for all of the ISO 639-1 languages" 

The mni code is part of ISO 639 (it is an ISO 639-2 code; Manipuri has no two-letter ISO 639-1 code).

Comment 7 Parag 2011-03-14 22:58:30 UTC

ping behdad

Comment 8 Parag 2011-03-14 23:01:06 UTC

I have updated the Indic ortho files in my local git checkout and copied that
directory to http://paragn.fedorapeople.org/fontconfig/

Comment 9 Behdad Esfahbod 2011-03-15 08:46:06 UTC

Can you push a git tree out (on github or something)?  The uploaded tree is unusable for me.

Comment 10 Parag 2011-03-16 07:55:19 UTC

(In reply to comment #9)
> Can you push a git tree out (on github or something)?  The uploaded tree is
> unusable for me.

Does the following repo look OK?
https://github.com/pnemade/fontconfig-indic-ortho

Comment 11 Behdad Esfahbod 2011-03-16 09:54:02 UTC

No.  Clone the fontconfig tree.  Commit your various ortho files there, and push into github.  Read any of the many tutorials on the web for how to do that.

Comment 12 Parag 2011-03-16 22:45:29 UTC

(In reply to comment #11)
> No.  Clone the fontconfig tree.  Commit your various ortho files there, and
> push into github.  Read any of the many tutorials on the web for howto.

I hope the following helps.
https://github.com/pnemade/fontconfig/tree/indic-ortho

Comment 13 Behdad Esfahbod 2011-03-18 13:22:53 UTC

Looks better.  Although what would be really useful would be for each individual logical change to be in its own commit with its own bug reference, so I can simply review them one by one and cherry pick.

Comment 14 Parag 2011-03-22 02:32:21 UTC

(In reply to comment #13)
> Looks better.  Although what would be really useful would be for each
> individual logical change to be in its own commit with its own bug reference,
> so I can simply review them one by one and cherry pick.

Done at https://github.com/pnemade/fontconfig/tree/indic-orth

Comment 15 Behdad Esfahbod 2011-03-22 11:58:12 UTC

Ok.  What are the changes there doing?  I see most are just comments, but others add new characters to the orth files.  What's the justification?

Comment 16 Parag 2011-03-22 23:35:15 UTC

(In reply to comment #15)
> Ok.  What are the changes there doing?  I see most are just comments, but
> others add new characters to the orth files.  What's the justification?

Yes. With the above changes I tried to add comments for all the characters and to group them by character class according to each language's own Unicode chart. All of these updates are based on the Unicode 5.2 charts.

I also added a few characters listed in each script's Unicode 5.2 chart, which extends the ranges on some lines or creates new lines in the ortho files.

If extending the existing ortho files just to split character ranges by class is not worth pushing upstream, I will create a new fontconfig tree that includes only the new ortho files and drops the new Unicode 5.2 characters that are missing from the current ortho files.
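
One way to produce the per-character comments described above is to generate them from the Unicode character database. A minimal sketch, assuming the orth convention that `#` starts a comment running to the end of the line; `annotate` is a hypothetical helper, and Python's `unicodedata` reflects whatever Unicode version the interpreter ships with (not necessarily 5.2).

```python
import unicodedata

def annotate(codepoints):
    """Return orth-style lines, one per codepoint, each ending in a
    '#' comment that gives the Unicode character name."""
    lines = []
    for cp in sorted(set(codepoints)):
        # The default avoids ValueError for code points that have no name
        # (e.g. unassigned ones) in Python's bundled Unicode database.
        name = unicodedata.name(chr(cp), "<no name>")
        lines.append(f"{cp:04x}\t# {name}")
    return lines

for line in annotate([0x0985, 0x0986, 0x09CD]):
    print(line)
```

Generating the comments rather than typing them keeps the names consistent with the Unicode database, but it says nothing about whether a character belongs in the file at all.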

Comment 17 Parag 2011-03-22 23:46:04 UTC

Is there any plan to update the ortho files to the latest Unicode standard in the future, or is keeping them up to date not necessary for fontconfig?

Comment 18 Roozbeh Pournader 2011-03-23 15:03:44 UTC

(In reply to comment #16)
> Yes. With above changes I tried to add comments to all the characters and tried
> to match character class categories using its Unicode range as per language's
> own Unicode chart. All above updates are based on Unicode 5.2 chart.
> 
> Added few characters as per given in its own Unicode 5.2 chart, so extending
> range on some lines or creating new lines in ortho files.
> 
> If extending existing ortho files with just modification to split character
> range as per their classes is not worth to push in upstream then I will
> re-create new fontconfig tree to just include new ortho files and also drop new
> characters from 5.2 standard, missing in current ortho files.

First of all, Unicode 6.0 has already been out for quite a while. Second, many of the characters added in newer versions of Unicode were added only because they are used in rare, historic, minority, or extinct orthographies. We should rarely update the orthography files to the latest version of Unicode, as many fonts still do not support those characters.

I understand that this is an improvement over the existing ortho files (which don't have good information on the Indic orthographies anyway), but it's far from what we need. What we actually need is to check which characters in each block are commonly used by which languages, and what the fonts support.

Randomly checking one file, Hindi (hi.orth), I see that the changes add U+0951 and U+0952, which are Vedic marks that I believe are only used in Sanskrit transliterations. What we need for each language is references and justifications for the inclusion or exclusion of each character in the orthography file.
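
Reviewing a proposed orth change character by character starts from the set difference between the old and new files. A minimal sketch, assuming a simplified orth parser (hex code points, `lo-hi` ranges, `#` comments; `include` lines are skipped) rather than fontconfig's own fc-lang; the helper names and the file fragments are hypothetical, not the real hi.orth.

```python
import unicodedata

def parse_orth(text):
    """Parse a simplified orth file into a set of codepoints."""
    cps = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        if not line or line.startswith("include"):
            continue                           # includes not followed here
        if "-" in line:
            lo, hi = (int(x, 16) for x in line.split("-", 1))
            cps.update(range(lo, hi + 1))
        else:
            cps.add(int(line, 16))
    return cps

def added_chars(old_text, new_text):
    """Return (codepoint, name) pairs present in new but not in old."""
    added = parse_orth(new_text) - parse_orth(old_text)
    return [(cp, unicodedata.name(chr(cp), "<no name>")) for cp in sorted(added)]

# Hypothetical fragments, for illustration only.
old = "0905-0914\t# independent vowels\n"
new = old + "0951-0952\t# stress signs\n"
for cp, name in added_chars(old, new):
    print(f"U+{cp:04X} {name}")
```

Each line this prints is one character that, per the comment above, needs a reference justifying its inclusion.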

Comment 19 Parag 2011-03-24 23:36:08 UTC

(In reply to comment #18)
> (In reply to comment #16)
> > Yes. With above changes I tried to add comments to all the characters and tried
> > to match character class categories using its Unicode range as per language's
> > own Unicode chart. All above updates are based on Unicode 5.2 chart.
> > 
> > Added few characters as per given in its own Unicode 5.2 chart, so extending
> > range on some lines or creating new lines in ortho files.
> > 
> > If extending existing ortho files with just modification to split character
> > range as per their classes is not worth to push in upstream then I will
> > re-create new fontconfig tree to just include new ortho files and also drop new
> > characters from 5.2 standard, missing in current ortho files.
> 
> First of all, Unicode 6.0 is already out for quite a while. Then, lots of the
> characters added in newer versions of Unicode are just added because they are
> used in some rare, historic, minority, or extinct orthographies. We should
> rarely update the orthography files to the latest version of Unicode, as many
> fonts still do not support them.
> 
> I understand that this is an improvement over existing ortho files (which don't
> have good info on the Indic orthographies anyway), but it's far from what we
> need. What we need is actually going in and checking which languages use which
> characters in each block commonly, and what the fonts do.
> 
> Randomly checking one file, Hindi (hi.orth), I see that the changes add U+0951
> and U+0952 which are Vedic marks, which I believe are only used in Sanskrit
> transliterations. What we need for each language, is references and
> justifications for inclusion or exclusion of each character inside the
> orthography file.

Thanks for the explanation. I will probably close all the enhancement bugs as NOTABUG.

Comment 20 Parag 2011-03-25 21:55:04 UTC

Is there any good documentation on how to add a new ortho file or modify an existing one? Could upstream add it somewhere to help contributors learn how to add or modify ortho files?

Comment 21 Behdad Esfahbod 2011-03-28 14:12:21 UTC

(In reply to comment #20)
> Is there any good documentation available on how to add new ortho file or
> modify existing one? Can upstream please add it somewhere so that it can be
> help contributors to know how they can add/modify ortho files?

We always want *improvements* to orth files.  However, in none of your changes did you explain why it is an *improvement*.  As Roozbeh pointed out, merely adding newly encoded Unicode characters is not an improvement.

Maybe you are misunderstanding what the orth files are supposed to contain: they are not an exhaustive list of the characters used in a language.  They list the minimum set of characters that a font should have in order to be widely considered useful for that language.

Hope that helps.
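
Read this way, a font qualifies for a language when it covers every character in the orth file, which is a subset test. A minimal sketch, assuming the font's coverage is already available as a set of integers (extracting it from a real font is out of scope) and using a simplified orth parser (hex code points, `lo-hi` ranges, `#` comments, `include` lines skipped); fontconfig's own matching may differ in detail, and `parse_orth` and `missing_for_lang` are hypothetical names.

```python
def parse_orth(text):
    """Parse a simplified orth file into a set of codepoints."""
    cps = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("include"):
            continue
        if "-" in line:
            lo, hi = (int(x, 16) for x in line.split("-", 1))
            cps.update(range(lo, hi + 1))
        else:
            cps.add(int(line, 16))
    return cps

def missing_for_lang(font_cps, orth_text):
    """Codepoints required by the orth file that the font lacks.
    An empty list means the font meets the minimum set."""
    return sorted(parse_orth(orth_text) - set(font_cps))

orth = "0985-0986\n09cd\t# virama\n"
font = {0x0985, 0x0986}
print([f"U+{cp:04X}" for cp in missing_for_lang(font, orth)])
```

This is also why adding rarely used characters to an orth file has a cost: every extra entry can disqualify otherwise usable fonts.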

Comment 22 Parag 2012-02-20 21:59:18 UTC

The mni_IN glibc locale will use Bengali as its script.

Comment 23 Pravin 2012-02-21 03:52:54 UTC

Created attachment 57384
Manipuri language Orthography file

The sixth page of the reference file http://tdil-dc.in/tdildcMain/articles/283709Script_Grammar_for_Manipuri.pdf lists the characters required for the Manipuri language.

The reference file is certified by the Manipuri Sahitya Parishad.

Comment 24 Akira TAGOH 2012-02-21 04:22:56 UTC

Are there any references that contain the Unicode code points, for validating it?

Comment 25 Pravin 2012-02-23 23:14:14 UTC

Created attachment 57576
Manipuri language orthography file

Yes; unfortunately, the code points are not listed in the script grammar PDF. Perhaps the linguist left them out.

I have circled those characters in the Unicode Bengali block chart: http://pravins.fedorapeople.org/Manipuri-characters.pdf
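
Going from characters circled in a code chart to orth file lines mostly means collapsing consecutive code points into `lo-hi` ranges. A minimal sketch with a hypothetical helper `to_orth_lines`; the input list is illustrative and is not the Manipuri character set from the attachment.

```python
def to_orth_lines(codepoints):
    """Collapse codepoints into orth lines: runs of consecutive values
    become 'lo-hi' ranges, isolated values stay as single entries."""
    lines = []
    cps = sorted(set(codepoints))
    i = 0
    while i < len(cps):
        j = i
        # Extend the run while the next codepoint is consecutive.
        while j + 1 < len(cps) and cps[j + 1] == cps[j] + 1:
            j += 1
        if i == j:
            lines.append(f"{cps[i]:04x}")
        else:
            lines.append(f"{cps[i]:04x}-{cps[j]:04x}")
        i = j + 1
    return lines

print("\n".join(to_orth_lines([0x0985, 0x0986, 0x0987, 0x09CD, 0x0981])))
```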

Comment 26 Akira TAGOH 2012-02-23 23:51:03 UTC

Fixed in c7a671ab

Comment 27 Pravin 2012-02-24 00:16:11 UTC

Thanks a lot.
