Presently, the language model for fontconfig is based on the RFC 3066 model of language tags, which is not future-proof. The language model should switch to that of the current BCP 47 (RFC 4646), or to an extended version that adds ISO 639-3 codes (RFC 4646bis); ISO 639-3 codes are already in use in various localization communities.
The main difference would be supporting script subtags. An HTML author may have Kurdish text in the Arabic script without caring, or knowing, whether it is for Iran or Iraq, so they can tag that part of the page as ku-Arab. The browser can then request fonts for ku-Arab from fontconfig, instead of fontconfig automagically guessing that it is probably ku-IR.
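To make the subtag structure concrete, here is a minimal sketch (a hypothetical helper, not fontconfig API) that splits a tag like ku-Arab or sr-Latn-RS into its language, script, and region subtags, following the BCP 47 shape rules: language is 2-3 lowercase letters, script is 4 letters in title case, region is 2 letters or 3 digits.

```python
# Minimal BCP 47 subtag splitter (illustrative only, not fontconfig API).
def parse_tag(tag):
    parts = tag.replace("_", "-").split("-")
    result = {"language": parts[0].lower(), "script": None, "region": None}
    for p in parts[1:]:
        if len(p) == 4 and p.isalpha() and result["script"] is None:
            # 4-letter subtag in this position is a script (e.g. Arab, Latn)
            result["script"] = p.title()
        elif (len(p) == 2 and p.isalpha()) or (len(p) == 3 and p.isdigit()):
            # 2-letter or 3-digit subtag is a region (e.g. IR, RS, 419)
            result["region"] = p.upper()
    return result

print(parse_tag("ku-Arab"))      # {'language': 'ku', 'script': 'Arab', 'region': None}
print(parse_tag("sr-Latn-RS"))   # {'language': 'sr', 'script': 'Latn', 'region': 'RS'}
```

With this split, a matcher can compare on the script subtag directly instead of inferring the script from the region.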
For existing glibc locales, and for applications that use that data to render fonts (Pango?), "@" would translate to "-x-", converting locales like sr_RS@latin to valid BCP 47 tags like "sr-RS-x-latin", which we can support as an alias/copy of sr-Latn. This way, we would be handling glibc modifiers as what they really are: private-use subtags. Alternatively, we can ask applications to convert sr_RS@latin to sr-Latn-RS themselves somehow, which we would then match against our sr-Latn.
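The "@" to "-x-" conversion described above could be sketched like this (hypothetical helper names and alias table, assuming the alias/copy scheme proposed here, not any existing fontconfig API):

```python
# Treat glibc "@modifier" as a BCP 47 private-use subtag, then alias the
# private-use form to the proper script-tagged form (illustrative table).
PRIVATE_USE_ALIASES = {
    "sr-RS-x-latin": "sr-Latn",
    "sr-x-latin": "sr-Latn",
}

def glibc_to_bcp47(locale):
    """Convert e.g. 'sr_RS@latin' to 'sr-RS-x-latin', then resolve aliases."""
    locale = locale.split(".")[0]          # drop any '.UTF-8' codeset suffix
    if "@" in locale:
        base, modifier = locale.split("@", 1)
        tag = base.replace("_", "-") + "-x-" + modifier
    else:
        tag = locale.replace("_", "-")
    return PRIVATE_USE_ALIASES.get(tag, tag)

print(glibc_to_bcp47("sr_RS@latin"))   # -> sr-Latn (via the alias table)
print(glibc_to_bcp47("de_DE.UTF-8"))   # -> de-DE
```

The alias table is the library's escape hatch: applications keep passing raw glibc names, and the private-use tag is quietly folded into the script-tagged orthography.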
This would also make maintaining the orth files somewhat easier and cleaner, as languages written in different scripts would no longer be separated by country, but by script. We would have a ku-Arab, with ku-IR and ku-IQ aliasing/including it; a ku-Latn, with ku-TR aliasing it; and a ku-Cyrl, with no alias until we find more information on which countries write Kurdish in Cyrillic.
Over time, country-based language tags are supposed to disappear, so if somebody really wants ku-IR, they SHOULD say "ku-Arab-IR", which makes matching and everything else much easier too.
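The Kurdish example above can be sketched as a lookup keyed by script-tagged orthographies, with the country tags resolving through an alias table (all names and the coverage strings are illustrative, not actual fontconfig orth data):

```python
# Orthographies keyed by language-script, as proposed above (illustrative).
ORTH = {
    "ku-Arab": {"coverage": "arabic-script charset"},
    "ku-Latn": {"coverage": "latin-script charset"},
    "ku-Cyrl": {"coverage": "cyrillic-script charset"},
}

# Country-tagged requests alias the script-tagged orthographies.
ALIASES = {"ku-IR": "ku-Arab", "ku-IQ": "ku-Arab", "ku-TR": "ku-Latn"}

def lookup_orth(tag):
    """Resolve a requested tag to its orthography, via aliases if needed."""
    tag = ALIASES.get(tag, tag)
    return ORTH.get(tag)

print(lookup_orth("ku-IR"))    # resolves to the ku-Arab orthography
print(lookup_orth("ku-Arab"))  # direct hit, same result
```

Under this model, one orth file per script serves every country, and the per-country entries shrink to one-line aliases.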
What about taking the orthography information from the relevant section in the CLDR? (http://www.unicode.org/cldr/ & http://live.gnome.org/LocaleProject)
It will make it easier for the people providing the information as well.
(In reply to comment #1)
> What about taking the orthography information from the relevant section in the
> CLDR? (http://www.unicode.org/cldr/ & http://live.gnome.org/LocaleProject)
At the moment, the orthography data in fontconfig is both more correct and more complete than CLDR's orthography data. But generally, yes, when the orthography data in CLDR starts to match fontconfig's quality, we should consider switching to that, as it would be easier to maintain.
But that is a different bug. This bug is about the model, not the data.
I'm interested in this idea. We may need to support the language tags fully as defined in RFC 5646, perhaps because of Bug #35809. That would allow collecting the information from combinations of subtags, if the requirements can be reduced to tags, and it may help determine the best-matching fonts when strict patterns are required.
One might not like having an additional external dependency, but I wrote a library to deal with language tags; I hope it may help, or one could implement their own code based on it:
-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/fontconfig/fontconfig/issues/50.