Summary: | Support for sFamilyClass font attribute to allow searching by serif style | ||
---|---|---|---|
Product: | fontconfig | Reporter: | Eric Wasylishen <ewasylishen> |
Component: | library | Assignee: | Akira TAGOH <akira> |
Status: | RESOLVED MOVED | QA Contact: | Behdad Esfahbod <freedesktop> |
Severity: | enhancement | ||
Priority: | medium | CC: | akira, arthur200126, fontconfig-bugs, freedesktop, mpsuzuki, stefan.bruens |
Version: | 2_1 | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Bug Depends on: | |||
Bug Blocks: | 30225 | ||
Attachments: |
Panose and sFamilyClass values of TrueType fonts included with Ubuntu
a patchset to cache IBM sFamilyClass & Panose, query them in pattern, and sort the fonts by them.
revised patchset to cache IBM sFamilyClass & Panose, query them in pattern, and sort the fonts by them.
Description
Eric Wasylishen
2010-08-10 20:34:22 UTC
We can read the Panose values also. Do you feel like bringing this up on the mailing list? My immediate reaction is:
- What to do for non-OpenType fonts
- How accurate are these values across fonts available for free?

Oops, I missed this discussion.
Behdad, what kinds of fonts do you have in mind when you say "non-OpenType"?

a) older TrueType on MacOS, without OS/2 table
b) non-sfnt-housed PostScript fonts, like PS Type1, CID-keyed fonts, etc
c) non-sfnt-housed bitmap fonts, like BDF, PCF, etc

(In reply to comment #2)
> Oops, I missed this discussion.
> Behdad, what kinds of fonts do you have in mind when you say "non-OpenType"?
>
> a) older TrueType on MacOS, without OS/2 table
> b) non-sfnt-housed PostScript fonts, like PS Type1, CID-keyed fonts, etc
> c) non-sfnt-housed bitmap fonts, like BDF, PCF, etc

All of these. fontconfig is not OpenType-only.

Hmm, I think a synthesis of sFamilyClass and Panose for fonts without OS/2 is being requested. Is this the role of FreeType2, or of fontconfig? If you think it's the role of FreeType2, please let me know and I will discuss it with the FreeType maintainers.

According to the IBM typeface classification in the OpenType spec: http://www.microsoft.com/typography/otspec/ibmfc.htm "0" means "no classification", indicating that the font provides no classification information. This is exactly the case for a font without an OS/2 table. So, I propose to use 0x0000 as the fallback value of sFamilyClass for fonts without OS/2.

In the Panose spec: http://www.panose.com/ProductsServices/pan1.aspx (see the end of the page), section 1.5 describes 2 values to be used when correct Panose values are unavailable: "0" (any) and "1" (no fit). According to the Panose spec, "0" (any) should be used when the font selection/substitution system can synthesize a typeface fitting the requested parameters from this font resource, such as a CFF Multiple Master font. "1" (no fit) should be used to disable the evaluation of the parameter.
The typical case is an Arabic typeface (Panose defines the classification parameters only for Latin script), so "1 1 1 1 1 1 1 1 1 1" (all parameters set to 1 (no fit)) should be used to disable Panose evaluation completely. So "1 1 1 1 1 1 1 1 1 1" would be a reasonable fallback value for Panose.

For the weight and proportion parameters, there is a possibility that we can synthesize values more detailed than "1" by using non-OS/2 info. But I'm not sure that it is a better solution, because the typeface classification (text/display, handwritten, decorative, pictorial or other) is needed to determine how to parameterize the weight and proportion for Panose. According to the Panose spec, text/display & decorative typefaces can hold an 8-level parameter for proportion, but handwritten & pictorial typefaces can hold a 2-level parameter for proportion. If we cannot determine the typeface category, we cannot determine the number of levels used to classify the proportion in Panose.

What do you think about the idea of having fallback values for sFamilyClass & Panose? Another reasonable attitude would be that fontconfig returns anything for the font without OS/2 table.

I still want to see how we may want to query these before I can make up my mind one way or another.

Oh, it seems that I failed to understand what you wanted to clarify with your previous question. So, should I write a draft to look up a font by sFamilyClass and/or Panose values?

All the information you have provided so far has been quite useful. I'm thinking about it. But at the end of the day I don't want fontconfig to provide arbitrary information that you cannot search on. For example (it belongs to the other bug but I'll say it here), if we add license information, you should be able to match on GPL fonts, etc. One thing is definitely not right: if you have to list all fonts and find the one you want yourself by walking over them and checking the fontconfig-provided attributes.

Created attachment 38886
Panose and sFamilyClass values of TrueType fonts included with Ubuntu

I attached a listing of Panose and sFamilyClass values for the default TrueType fonts on my Ubuntu 10.04 system, just to give an idea of what sort of data real fonts carry. Most have Panose values; only a few have sFamilyClass values.

My two cents are that Panose seems to be an attempt at a complete font matching system like Fontconfig itself, except based entirely on objective, geometric measurements of the fonts. See http://www.monotypeimaging.com/ProductsServices/pan2.aspx . It seems to me to be orthogonal to the sFamilyClass values, which are subjective and let the font designer indicate how the font fits into historical classifications. Personally, I find the sFamilyClass values more interesting; they're more human-friendly. However, searching by sFamilyClass wouldn't be that useful on my Ubuntu system right now without some more of the fonts having that metadata.

# thanks Eric, now I'm writing my reply to Behdad.

Before all, thank you for posting your investigation. When I check the TrueType fonts bundled with Windows 7 in my lab, there are 481 fonts, and 336 fonts (ca. 70%) have non-zero sFamilyClass values. About your comment that most free fonts on Ubuntu do not have meaningful values in sFamilyClass, there might be a chicken-and-egg problem. I think most free software currently does not care about sFamilyClass and Panose at all, so free font developers have little motivation to define them carefully. Also, the current variety of typefaces in free fonts is not wide enough to use the detailed classification of sFamilyClass, I think.

Behdad, I understand the point is: "fontconfig is designed to provide a compact API set to look up fonts by their attributes. If the information is hard to use for font search via a compact API (e.g. getting long free text and grepping it in the fontconfig client), it should not be handled by fontconfig, so it should not be cached."
Although some people may have a different view, I think it is one of the most reasonable attitudes.

# Some people want to use fontconfig as a database
# collecting the data copied from fonts in the system:
# some data can be searched, other data are just
# readable and cannot be searched at all.

To fit your view about what data should be handled by fontconfig, I think a draft API set to look up a font by sFamilyClass & Panose should be written by the people who want to use them. If some properties are not used, or conflict with other existing properties (e.g. weight, proportion), they should not be cached. Is this the right direction?

Something along those lines I guess, yes...

Created attachment 39011
a patchset to cache IBM sFamilyClass & Panose, query them in pattern, and sort the fonts by them.
Here is a patch set to cache IBM sFamilyClass & Panose,
query them in a pattern, and sort the fonts by their values.
I would like to hear comments on whether such an API is a
reasonable interface to query sFamilyClass & Panose.
The patchset includes 3 patches:
------------------------------------------------
1) fontconfig_cache-familyclass+panose_20100928a_addrange8.diff
This patch introduces a new value type "FcRange8", which
consists of two 8-bit values: "base", the most preferred
value, and "limit", the worst acceptable value.
sFamilyClass and Panose are collections of 8-bit integers
(reading the specs carefully, I guess 4 bits would suffice,
but TTF spends 8 bits on each integer), and usually
clients are interested in giving a range of acceptable
values for each integer, instead of giving an exact value
for each integer. For example, to choose Serif fonts by
sFamilyClass, the higher 8 bits of sFamilyClass should be
0x01-0x07.
Although it is possible to pack two 8-bit integers into an
existing type (e.g. FcChar32), I introduced a new type to
avoid tricky coding in this proof of concept.
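
As an illustration of the base/limit idea described above, here is a minimal sketch in C. The patch's real FcRange8 definition is not shown in this thread, so the struct layout and the names `Range8`, `range8_distance` and `is_serif_class` are assumptions made only for this example.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the patch's FcRange8 (real layout not shown here). */
typedef struct {
    uint8_t base;   /* most preferred value */
    uint8_t limit;  /* worst value that is still acceptable */
} Range8;

/*
 * Distance of `value` from the preferred end of the range:
 * 0 when value == base, growing toward limit, -1 when out of range.
 */
int range8_distance(Range8 r, uint8_t value)
{
    uint8_t lo = r.base < r.limit ? r.base : r.limit;
    uint8_t hi = r.base < r.limit ? r.limit : r.base;

    if (value < lo || value > hi)
        return -1;
    return abs((int)value - (int)r.base);
}

/* Serif test from the text: the high byte of sFamilyClass is in 0x01-0x07. */
int is_serif_class(uint16_t sFamilyClass)
{
    const Range8 serif = { 0x01, 0x07 };
    uint8_t class_id = (uint8_t)(sFamilyClass >> 8);

    return range8_distance(serif, class_id) >= 0;
}
```

A matcher built on this could sort candidates by the returned distance and drop those that return -1; whether the actual patch scores ranges this way is not stated here.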
-------------------------------------------------
2) fontconfig_cache-familyclass+panose_20100928a_addfamilyclass+panose.diff
This patch adds the handler for sFamilyClass and Panose.
It loads sFamilyClass and Panose from the OS/2 table,
splits them into their component 8-bit integers, and caches
them as "fclass", "fsubclass", "panose0", "panose1"...
The naming convention should be improved in the future.
If the font has no OS/2 table, the sFamilyClass values
fall back to 0 and the Panose values fall back to 1,
as I discussed in a previous comment.
It also appends the range matching rules to the end of
the _FcMatcher[] table used by FcCompareValueList().
With patches 1) + 2) applied,
fc-match -s :fclass=0x01-0xFF family fclass
will show a list that reflects the subtle effect of the
sFamilyClass values.
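
The split-and-fallback step described above can be sketched as follows. This is not the code from the patch: `FontClassInfo` and `classify_font` are hypothetical names, and the caller is assumed to have already read the OS/2 fields (FreeType's TT_OS2 structure exposes them as `sFamilyClass` and `panose[10]`).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PANOSE_LEN 10

/* Hypothetical container for the values cached as fclass, fsubclass, panose0..9. */
typedef struct {
    uint8_t fclass;              /* high byte of sFamilyClass */
    uint8_t fsubclass;           /* low byte of sFamilyClass */
    uint8_t panose[PANOSE_LEN];  /* the ten Panose digits */
} FontClassInfo;

/*
 * has_os2 is zero for fonts without an OS/2 table (Type1, BDF, PCF, ...).
 * In that case sFamilyClass falls back to 0 ("no classification") and
 * every Panose digit falls back to 1 ("no fit"), as proposed in the thread.
 */
FontClassInfo classify_font(int has_os2, int16_t sFamilyClass,
                            const uint8_t *panose)
{
    FontClassInfo info;

    if (!has_os2 || panose == NULL) {
        info.fclass = 0;
        info.fsubclass = 0;
        memset(info.panose, 1, PANOSE_LEN);
        return info;
    }

    uint16_t raw = (uint16_t)sFamilyClass;
    info.fclass = (uint8_t)(raw >> 8);
    info.fsubclass = (uint8_t)(raw & 0xFF);
    memcpy(info.panose, panose, PANOSE_LEN);
    return info;
}
```

For example, an sFamilyClass of 0x0805 splits into class 8 and subclass 5, while a font without OS/2 yields class 0 and ten Panose digits of 1.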
-----------------------------------------------
3) fontconfig_cache-familyclass+panose_20100928a_addxmlsupport.diff
In patch 2), the effect of sFamilyClass & Panose is
very subtle, because the rules are appended to the
end of the _FcMatcher[] table. If the client wants to
prioritize the effect of sFamilyClass & Panose, the
preferences in the _FcMatcher[] table must be edited.
This patch adds private functions(*) that edit
the strong/weak preferences in the _FcMatcher[]
table, and a new XML element <matcher> to
invoke them. With this patch applied,
an XML element like:
<matcher>
<edit name="fclass" mode="assign_replace" binding="same"><int> 0</int></edit>
<edit name="fsubclass" mode="assign_replace" binding="same"><int> 2</int></edit>
<edit name="panose0" mode="assign_replace" binding="same"><int> 1</int></edit>
<edit name="panose1" mode="assign_replace" binding="same"><int> 3</int></edit>
<edit name="panose2" mode="assign_replace" binding="same"><int> 4</int></edit>
<edit name="panose3" mode="assign_replace" binding="same"><int> 5</int></edit>
<edit name="panose4" mode="assign_replace" binding="same"><int> 6</int></edit>
<edit name="panose5" mode="assign_replace" binding="same"><int> 7</int></edit>
<edit name="panose6" mode="assign_replace" binding="same"><int> 8</int></edit>
<edit name="panose7" mode="assign_replace" binding="same"><int> 9</int></edit>
<edit name="panose8" mode="assign_replace" binding="same"><int>10</int></edit>
<edit name="panose9" mode="assign_replace" binding="same"><int>11</int></edit>
</matcher>
makes the preferences for sFamilyClass & Panose
higher. Public functions to do this per application,
without changing configuration files, would also be needed.
By the way, I wish _FcMatcher[] were exposed to clients so that the user could configure the table. Is this a bad idea?

(In reply to comment #13)
> By the way, I wish _FcMatcher[] were exposed to clients
> so that the user could configure the table.
> Is this a bad idea?

I think there's a bug open about it already.

Created attachment 39120
revised patchset to cache IBM sFamilyClass & Panose, query them in pattern, and sort the fonts by them.
Sorry, I forgot to include "fcrange.c" in the previous tarball.
Here is the revised tarball of the 3 patches described in
the previous message.
(In reply to comment #14)
> (In reply to comment #13)
> > By the way, I wish _FcMatcher[] were exposed to clients
> > so that the user could configure the table.
> > Is this a bad idea?
>
> I think there's a bug open about it already.

Is Bug 19375 "RFE: Add an API to get the binding type of values" the one you were thinking of? I wrote a patch that replaces _FcMatchers[] with a dynamically allocated linked list, so that the client can insert/delete the rules used in font matching. The approach might be different from what Karl (the submitter of bug 19375) was thinking, but his motivation might be similar to mine. Should I go there and continue the discussion?

*** Bug 30225 has been marked as a duplicate of this bug. ***

http://cgit.freedesktop.org/~tagoh/fontconfig/commit/?h=panose-sfamilyclass-support
This is just a prototype for this idea. I think adding raw data for sFamilyClass and/or Panose may be complicated. I added the familyclass element to the cache instead. That should be simple enough and works as expected.

Interesting. Did you survey fonts to see how many have bogus values for these?

In fact, this code has been tested since Fedora 18 and was borrowed from libeasyfc, which is the backend of fonts-tweak-tool. As far as I know from reports, there were only one or two cases where a font was classified unexpectedly, and that was fixed in the font. There might be more, but it may be a good start. Due to the limited availability of those properties, Type1 and BDF fonts are classified as "unknown" at this moment, FWIW.

That's great.

I'm not sure when I'll merge this into master since it requires bumping the cache version again. That could be in fontconfig-ng or when we have more features that require the bump.

I'm planning to merge those changes into master shortly. If anyone has any comments, please let me know.

After thinking more, it may help a lot for applications which have their own font selection or require manual font selection.
But it doesn't help at this moment for creating a config file. We may need to think about how to set a priority for familyclass in the cache.

Aside from the SFNT sFamilyClass/Panose values, a more primitive form of self-identification is found in PFM files that accompany some PostScript Type1 fonts. This single-byte identification is named "dfPitchAndFamily" in Adobe's documentation[1]; structurally, the higher four bits encode a generic class, while the least significant bit is set for proportional fonts.

[1]: https://www.adobe.com/content/dam/acom/en/devnet/font/pdfs/5178.PFM.pdf#page=7

A fuller definition of the higher four bits can be found in the fontforge source[2]. (Yes, Windows FNT files have that byte too...) It includes 0x00 for "don't care", 0x10 for "serif", 0x20 for "sans", 0x30 for "fixed" (mono? why is it separate from the least significant bit?), 0x40 for "script", and 0x50 for "decorative".

[2]: https://github.com/fontforge/fontforge/blob/b9149c1/fontforge/winfonts.c#L95-L102

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/fontconfig/fontconfig/issues/93.
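
For reference, the dfPitchAndFamily layout described in the PFM comment above can be sketched in C as follows. The bit values come from that comment; the enum and function names are hypothetical, chosen only for this example, and values not listed in the comment are mapped to an "unknown" bucket.

```c
#include <stdint.h>

/* Generic classes named in the comment; identifiers are illustrative only. */
typedef enum {
    PFM_DONTCARE,
    PFM_SERIF,
    PFM_SANS,
    PFM_FIXED,
    PFM_SCRIPT,
    PFM_DECORATIVE,
    PFM_UNKNOWN      /* any high nibble not listed in the comment */
} PfmFamily;

/* Decode the generic class from the higher four bits of dfPitchAndFamily. */
PfmFamily pfm_family(uint8_t pitch_and_family)
{
    switch (pitch_and_family & 0xF0) {
    case 0x00: return PFM_DONTCARE;
    case 0x10: return PFM_SERIF;
    case 0x20: return PFM_SANS;
    case 0x30: return PFM_FIXED;
    case 0x40: return PFM_SCRIPT;
    case 0x50: return PFM_DECORATIVE;
    default:   return PFM_UNKNOWN;
    }
}

/* The least significant bit is set for proportional fonts. */
int pfm_is_proportional(uint8_t pitch_and_family)
{
    return (pitch_and_family & 0x01) != 0;
}
```

With this decoding, a byte of 0x21 would read as a proportional sans font, and 0x30 as a fixed-class font with the proportional bit clear.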