I would like to use fc-match to figure out how substitution works when the
application fills in the charset field of a pattern. For example, I see that
filenames in my browser are rendered in different fonts depending on whether
they are in English or Russian.
I tried running
fc-match "Bitstream Vera Sans:charset=something"
but I got a segmentation fault. I tried fc-list : charset to figure out what I
can put there, but its output gave no clue.
Yes, there's no human-readable string representation for charsets. Now that the
cache doesn't use strings, perhaps we can replace the old nasty representation with
I fixed the segfault at least; still remaining to be decided is how to present
charsets in a sensible fashion.
Maybe supporting well-known charset names like ISO8859-* would be more useful, or Unicode block names, since no font covers everything in the world.
(In reply to comment #3)
> Maybe supporting well-known charset names like ISO8859-* would be more
> useful, or Unicode block names, since no font covers
> everything in the world.
Nah, ISO8859-* is not that interesting, and would need data tables that I really want to see die forever. Unicode blocks are not interesting because of all the holes and rare characters. You rarely find any font supporting a full block, except maybe for the ASCII and Latin1 blocks.
Sure. Well, a side effect of supporting this might be a chance to improve how fonts are rated, so that better fonts get selected. Right now fontconfig has one orth file per language. I think this direction is right, because rendering characters with a different font per charset, as we saw with X core fonts, was really ugly. However, it has a dilemma of strict orthography vs. lazy orthography, as in Bug#17619. We still need some input through the fontconfig config to determine which one users prefer in terms of quality etc. Still, how many charsets for a specific language a font supports is measurable, and supporting more charsets should be preferred.
For example, Japanese has several charsets, such as JIS X 0201, JIS X 0208, JIS X 0212, and JIS X 0213, plus some revisions of them. 0201 and 0208 are a must for supporting Japanese, while 0212 and 0213 may be optional in most cases, but are nice to have.
So I'd suggest having separate tables for charsets, linked from the orth file with some information indicating whether each one is mandatory or optional, then rating them differently and selecting the better fonts accordingly. Or maybe it would even be good to have a way to do this per character code in the config. Well, it's off topic for this issue, though.
Author: Behdad Esfahbod <firstname.lastname@example.org>
Date: Thu Jul 3 17:52:54 2014 -0400
Change charset parse/unparse format to be human readable
Previous format was unusable. New format is ranges of hex values.
To choose space character and Latin capital letters for example:
$ fc-pattern ':charset=20 41-5a'
Pattern has 1 elts (size 16)
0000: 00000000 00000001 07fffffe 00000000 00000000 00000000 00000000 00000000
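
To make the correspondence between the range syntax and the dumped page concrete, here is a minimal Python sketch. It is not fontconfig's parser; the helper names (parse_charset, page_words, format_page) are hypothetical, and the layout it assumes (256 code points per page, stored as eight 32-bit words with the lowest code point in the least significant bit) is inferred only from the output shown above.

```python
def parse_charset(spec):
    """Parse a string such as '20 41-5a' into a set of code points.

    Each whitespace-separated token is a single hex value or an
    inclusive 'first-last' hex range.
    """
    codepoints = set()
    for token in spec.split():
        first, _, last = token.partition("-")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo
        if hi < lo:
            raise ValueError(f"descending range: {token}")
        codepoints.update(range(lo, hi + 1))
    return codepoints


def page_words(codepoints, page=0):
    """Return the eight 32-bit words for one 256-code-point page (assumed layout)."""
    words = [0] * 8
    for cp in codepoints:
        if cp >> 8 == page:
            offset = cp & 0xFF
            # Word index is offset / 32; bit index is offset % 32.
            words[offset >> 5] |= 1 << (offset & 31)
    return words


def format_page(codepoints, page=0):
    """Format one page the way the dump above prints it."""
    words = page_words(codepoints, page)
    return f"{page:04x}: " + " ".join(f"{w:08x}" for w in words)


print(format_page(parse_charset("20 41-5a")))
```

Under those assumptions, the space character (0x20) is bit 0 of the second word, and the capitals 0x41-0x5a are bits 1 through 26 of the third word, which is where 00000001 and 07fffffe come from.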