7597 – add syntax to enter charset on commandline tools (and conf file?)

Status:	RESOLVED FIXED

Alias:	None

Product:	fontconfig
Classification:	Unclassified
Component:	library
Version:	2.3
Hardware:	Other Linux (All)

Importance:	high enhancement
Assignee:	fontconfig-bugs
QA Contact:	Behdad Esfahbod

Blocks:	8100

Reported:	2006-07-22 08:12 UTC by andu
Modified:	2014-07-03 22:03 UTC
CC List:	2 users

Description andu 2006-07-22 08:12:50 UTC

I would like to use fc-match to figure out how substitution works when the
application fills in the charset field of a pattern. For example, I see that
filenames in my browser use different fonts depending on whether they are in
English or Russian. I tried issuing

fc-match "Bitstream Vera Sans:charset=something"

but I got a 'segmentation fault'. I tried fc-list : charset to figure out what I
can put there, but it gave no hint.

Comment 1 Keith Packard 2006-09-01 12:16:57 UTC

Yes, there's no human-readable string representation for charsets. Now that the
cache doesn't use strings, perhaps we can replace the old nasty representation with
something sensible.

Comment 2 Keith Packard 2006-09-02 20:27:11 UTC

I fixed the segfault at least; still remaining to be decided is how to present
charsets in a sensible fashion.

Comment 3 Akira TAGOH 2011-09-04 19:32:30 UTC

Maybe supporting well-known charset names like ISO8859-* would be more useful, or the block names in Unicode, since there are no fonts covering everything in the world.

Comment 4 Behdad Esfahbod 2011-09-06 06:55:54 UTC

(In reply to comment #3)
> maybe supporting the well-known charsets name like ISO8859-* would be more
> useful. or the block name in Unicode since there are no fonts covering
> everything in the world.

Nah, ISO8859-* is not that interesting, and would need data tables that I really want to see die forever.  Unicode blocks are not interesting because of all the holes and rare characters.  You rarely find any font supporting a full block, except for the ASCII and Latin1 blocks maybe.

Comment 5 Akira TAGOH 2011-09-06 18:26:08 UTC

Sure. Well, a side effect of supporting this might be the possibility of improving how fonts are rated, so that better fonts get selected. Right now fontconfig has the orth files per language. I think this direction is right, because rendering characters with a different font per charset, as we saw with X core fonts, was really ugly. However, it has the dilemma of strict orthography vs. lazy orthography, as in Bug#17619. We still need some input through the fontconfig config to determine which one people prefer in terms of quality etc., but how many charsets for a specific language a font supports is measurable, and supporting more charsets should be preferred.

For example, there are several charsets for Japanese, such as JIS X 0201, JIS X 0208, JIS X 0212 and JIS X 0213, plus some revisions of them. 0201 and 0208 are a must for supporting Japanese, while 0212 and 0213 may be optional in most cases, though nice to have.

So I'd suggest having separate tables for charsets, linked to the orth file with some information indicating whether each one is mandatory or optional, then giving them different ratings and selecting the better fonts accordingly. Or it might even be good to have a way to do this per character code in the config. Well, that's off topic for this issue though.
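
To make the mandatory/optional rating idea concrete, here is a rough sketch in Python. Nothing in it is fontconfig code: `coverage`, `score_font`, the `optional_weight` value and the tiny code point sets are all hypothetical stand-ins chosen for illustration, not real JIS table contents. It assumes a font is rejected unless it fully covers every mandatory table, and earns a partial bonus for each optional table.

```python
def coverage(font_chars, charset):
    """Fraction of `charset` that `font_chars` contains (1.0 if charset is empty)."""
    if not charset:
        return 1.0
    return len(font_chars & charset) / len(charset)


def score_font(font_chars, tables, optional_weight=0.25):
    """Rate a font against (charset, mandatory) pairs.

    Hypothetical rule: return None if any mandatory table is not fully
    covered; otherwise add 1.0 per mandatory table and
    optional_weight * coverage per optional table.
    """
    score = 0.0
    for charset, mandatory in tables:
        fraction = coverage(font_chars, charset)
        if mandatory:
            if fraction < 1.0:
                return None
            score += 1.0
        else:
            score += optional_weight * fraction
    return score


# Toy tables: placeholder code points only, NOT actual charset contents.
MANDATORY = {0x3042, 0x3044}
OPTIONAL = {0x4E02, 0x4E04}
TABLES = [(MANDATORY, True), (OPTIONAL, False)]

font_a = {0x3042, 0x3044, 0x4E02}   # mandatory table plus half of the optional one
font_b = {0x3042, 0x3044}           # mandatory table only
font_c = {0x3042}                   # incomplete mandatory coverage

for name, chars in [("a", font_a), ("b", font_b), ("c", font_c)]:
    print(name, score_font(chars, TABLES))
```

With these toy inputs, font a rates higher than font b, and font c is rejected. Whether rejection or a heavy penalty is the right treatment for a missing mandatory character is exactly the strict vs. lazy orthography question above.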

Comment 6 Behdad Esfahbod 2014-07-03 22:03:41 UTC

commit e708e97c351d3bc9f7030ef22ac2f007d5114730
Author: Behdad Esfahbod <behdad@behdad.org>
Date:   Thu Jul 3 17:52:54 2014 -0400

    Change charset parse/unparse format to be human readable
    
    Previous format was unusable.  New format is ranges of hex values.
    To choose space character and Latin capital letters for example:
    
    $ fc-pattern ':charset=20 41-5a'
    Pattern has 1 elts (size 16)
        charset:
        0000: 00000000 00000001 07fffffe 00000000 00000000 00000000 00000000 00000000
    (s)
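
For readers who want to check the example by hand, here is a small standalone Python sketch of the format as described in the commit message: whitespace-separated hex values or `first-last` ranges. It is not the fontconfig parser. The names `parse_charset`, `page_words` and `format_page` are made up for this illustration, and the bit layout (eight 32-bit words per 256-character page, lowest code point in the least significant bit) is an assumption inferred from the sample output above.

```python
def parse_charset(spec):
    """Parse a string such as '20 41-5a' into a set of code points."""
    codepoints = set()
    for token in spec.split():
        first, _, last = token.partition("-")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo
        if hi < lo:
            raise ValueError(f"descending range: {token!r}")
        codepoints.update(range(lo, hi + 1))
    return codepoints


def page_words(codepoints, page=0):
    """Return the eight 32-bit words covering one 256-character page."""
    words = [0] * 8
    for cp in codepoints:
        if cp >> 8 == page:
            offset = cp & 0xFF
            # Word index is offset // 32; bit index is offset % 32.
            words[offset >> 5] |= 1 << (offset & 31)
    return words


def format_page(codepoints, page=0):
    """Format one page in the same style as the dump line shown above."""
    words = page_words(codepoints, page)
    return f"{page:04x}: " + " ".join(f"{w:08x}" for w in words)


print(format_page(parse_charset("20 41-5a")))
# prints: 0000: 00000000 00000001 07fffffe 00000000 00000000 00000000 00000000 00000000
```

The space character (0x20) sets bit 0 of the second word, and 0x41-0x5a sets bits 1 through 26 of the third word, which matches the dump in the commit message.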
