Bug 29497 - Support for sFamilyClass font attribute to allow searching by serif style
Summary: Support for sFamilyClass font attribute to allow searching by serif style
Status: RESOLVED MOVED
Alias: None
Product: fontconfig
Classification: Unclassified
Component: library
Version: 2_1
Hardware: Other All
Importance: medium enhancement
Assignee: Akira TAGOH
QA Contact: Behdad Esfahbod
URL:
Whiteboard:
Keywords:
Duplicates: 30225
Depends on:
Blocks: 30225
 
Reported: 2010-08-10 20:34 UTC by Eric Wasylishen
Modified: 2018-08-20 21:52 UTC
CC List: 6 users

See Also:


Attachments
Panose and sFamilyClass values of TrueType fonts included with Ubuntu (14.25 KB, text/plain)
2010-09-22 11:40 UTC, Eric Wasylishen
a patchset to cache IBM sFamilyClass & Panose, query them in pattern, and sort the fonts by them. (7.15 KB, application/octet-stream)
2010-09-28 07:36 UTC, suzuki toshiya
revised patchset to cache IBM sFamilyClass & Panose, query them in pattern, and sort the fonts by them. (8.03 KB, application/octet-stream)
2010-10-02 10:51 UTC, suzuki toshiya

Description Eric Wasylishen 2010-08-10 20:34:22 UTC
The OpenType spec has a field called sFamilyClass in the required OS/2 table, which allows the font designer to classify the font based on a "family class" (e.g. serif, sans-serif) and "family subclass" (e.g. modern, old style). 

See:

http://www.microsoft.com/typography/otspec/os2.htm#fc

and:

http://www.microsoft.com/typography/otspec/ibmfc.htm


It would be really nice if Fontconfig read this field from fonts and allowed searching based on it (probably treating it as two separate fields: maybe FC_FAMILY_CLASS and FC_FAMILY_SUBCLASS?)
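As a minimal sketch of the proposed split (the helper names below are hypothetical, not fontconfig or FreeType API), the 16-bit sFamilyClass value, as exposed for example by FreeType's `TT_OS2::sFamilyClass`, separates into the two suggested fields with a byte split: the high byte is the class ID and the low byte the subclass ID. The class names follow the IBM classification page linked above.

```c
#include <stddef.h>
#include <stdint.h>

/* Class IDs 0-14 of the IBM Font Family Classification (high byte of
 * sFamilyClass); 6, 11, 13 and 14 are reserved. */
static const char *const ibm_class_names[] = {
    "No Classification", "Oldstyle Serifs", "Transitional Serifs",
    "Modern Serifs", "Clarendon Serifs", "Slab Serifs", "Reserved",
    "Freeform Serifs", "Sans Serif", "Ornamentals", "Scripts", "Reserved",
    "Symbolic", "Reserved", "Reserved"
};

/* The two fields proposed above (FC_FAMILY_CLASS / FC_FAMILY_SUBCLASS). */
typedef struct {
    uint8_t family_class;    /* high byte: class ID */
    uint8_t family_subclass; /* low byte: subclass ID within that class */
} FamilyClass;

static FamilyClass split_family_class(int16_t sFamilyClass)
{
    /* The OS/2 field is signed; reinterpret it as 16 raw bits first. */
    uint16_t raw = (uint16_t) sFamilyClass;
    FamilyClass fc = { (uint8_t) (raw >> 8), (uint8_t) (raw & 0xFF) };
    return fc;
}

/* Human-readable class name; IDs outside the table are treated as reserved. */
static const char *family_class_name(uint8_t family_class)
{
    size_t n = sizeof ibm_class_names / sizeof ibm_class_names[0];
    return family_class < n ? ibm_class_names[family_class] : "Reserved";
}
```

For example, `split_family_class(0x0802)` yields class 8 ("Sans Serif") with subclass 2; searching "by serif style" then amounts to filtering on the class byte alone.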

In my opinion, this kind of classification would be really useful to present to users for choosing fonts. I would be happy to write a patch if you think this is a good idea. :-)
Comment 1 Behdad Esfahbod 2010-08-12 04:42:45 UTC
We can read the Panose values also.

Do you feel like bringing this up on the mailing list?  My immediate reaction is:

- What to do for non-OpenType fonts

- How accurate are these values across fonts available for free?
Comment 2 suzuki toshiya 2010-09-16 04:37:58 UTC
Oops, I missed this discussion.
Behdad, which kinds of fonts do you have in mind when you say "non-OpenType"?

a) older TrueType on MacOS, without OS/2 table
b) non-sfnt-housed PostScript fonts, like PS Type1, CID-keyed fonts, etc
c) non-sfnt-housed bitmap fonts, like BDF, PCF, etc
Comment 3 Behdad Esfahbod 2010-09-20 16:49:11 UTC
(In reply to comment #2)
> Oops, I missed this discussion.
> Behdad, which kinds of fonts do you have in mind when you say "non-OpenType"?
> 
> a) older TrueType on MacOS, without OS/2 table
> b) non-sfnt-housed PostScript fonts, like PS Type1, CID-keyed fonts, etc
> c) non-sfnt-housed bitmap fonts, like BDF, PCF, etc

All of these.  fontconfig is not OpenType-only.
Comment 4 suzuki toshiya 2010-09-22 06:01:20 UTC
Hmm, so I take it that sFamilyClass and Panose values need to be
synthesized for fonts without an OS/2 table. Is this the role of
FreeType2, or of fontconfig? If you think it's the role of FreeType2,
please let me know, and I will discuss it with the FreeType maintainers.

According to IBM typeface classification in OpenType spec:
  http://www.microsoft.com/typography/otspec/ibmfc.htm
"0" means "no classification", indicating that the font provides
no information about its classification. This is exactly the case
for a font without an OS/2 table, so I propose to use 0x0000
as the fallback value of sFamilyClass for fonts without OS/2.

In Panose spec:
  http://www.panose.com/ProductsServices/pan1.aspx
(see the end of the page), section 1.5 describes 2 values to
be used when correct Panose values are unavailable: "0" (any)
and "1" (no fit).

According to the Panose spec,
"0" (any) should be used when the font selection/substitution
system can synthesize a typeface fitting the requested
parameters from this font resource, as with a CFF Multiple Master font.

"1" (no fit) should be used to disable the evaluation of the
parameter. The typical case is an Arabic typeface (Panose
defines the classification parameters only for Latin script),
where "1 1 1 1 1 1 1 1 1 1" (all parameters set to 1, "no fit")
should be used to disable Panose evaluation completely.
So "1 1 1 1 1 1 1 1 1 1" would be a reasonable fallback value
for Panose.
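A minimal sketch of how a matcher could honour the two special values, assuming a naive per-digit absolute difference (the hypothetical `panose_distance` below is not the official PANOSE mapping algorithm, which weights digits): any digit that is 0 ("any") or 1 ("no fit") on either side is skipped, so the all-ones fallback vector neither penalizes nor favors a font.

```c
#include <stdint.h>
#include <stdlib.h>

#define PANOSE_DIGITS 10
#define PANOSE_NO_FIT 1  /* values 0 ("any") and 1 ("no fit") are <= this */

/* Hypothetical distance between a requested and an actual Panose vector.
 * Digits that are 0 or 1 on either side are not evaluated.  Returns the sum
 * of absolute differences over the evaluated digits and, if 'evaluated' is
 * non-NULL, stores how many digits were actually compared. */
static int panose_distance(const uint8_t want[PANOSE_DIGITS],
                           const uint8_t have[PANOSE_DIGITS],
                           int *evaluated)
{
    int sum = 0, n = 0;
    for (int i = 0; i < PANOSE_DIGITS; i++) {
        if (want[i] <= PANOSE_NO_FIT || have[i] <= PANOSE_NO_FIT)
            continue;  /* "any" or "no fit": skip this parameter */
        sum += abs((int) want[i] - (int) have[i]);
        n++;
    }
    if (evaluated)
        *evaluated = n;
    return sum;
}
```

Reporting the number of evaluated digits lets a caller tell "perfect match" (distance 0 over ten digits) apart from "no information" (distance 0 over zero digits), which is exactly the situation the all-ones fallback creates.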

For the weight and proportion parameters, we might be able to
synthesize values more detailed than "1" from non-OS/2
information. But I'm not sure that is a better solution,
because the typeface category (text/display, handwritten,
decorative, pictorial, or other) is needed to determine how to
parameterize weight and proportion in Panose.
According to the Panose spec, text/display and decorative
typefaces use an 8-level proportion parameter, but
handwritten and pictorial typefaces use a 2-level one.
If we cannot determine the typeface category, we cannot
determine how many levels to use when classifying the
proportion in Panose.
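To make the dependency concrete, here is a sketch under the assumptions stated in the comment above: Panose digit 0 is the family kind (2 Latin Text, 3 Latin Hand Written, 4 Latin Decorative, 5 Latin Symbol), and digit 3 has 8 meaningful levels for text and decorative kinds but only 2 (proportional vs. monospaced) for handwritten and symbol/pictorial kinds. `synthesize_panose_digit3` is a hypothetical helper showing that even a simple monospace flag from non-OS/2 data maps to different digit values depending on the kind, and to "no fit" when the kind is unknown.

```c
#include <stdint.h>

/* Panose digit 0 ("family kind") values. */
enum {
    PANOSE_FAMILY_LATIN_TEXT = 2,
    PANOSE_FAMILY_LATIN_HANDWRITTEN = 3,
    PANOSE_FAMILY_LATIN_DECORATIVE = 4,
    PANOSE_FAMILY_LATIN_SYMBOL = 5
};
#define PANOSE_NO_FIT 1

/* Number of meaningful levels (values 2 and up) of Panose digit 3 for a
 * family kind, or 0 when the kind is unknown ("any", "no fit", ...). */
static int panose_digit3_levels(uint8_t family_kind)
{
    switch (family_kind) {
    case PANOSE_FAMILY_LATIN_TEXT:
    case PANOSE_FAMILY_LATIN_DECORATIVE:
        return 8;  /* values 2..9 */
    case PANOSE_FAMILY_LATIN_HANDWRITTEN:
    case PANOSE_FAMILY_LATIN_SYMBOL:
        return 2;  /* values 2..3: proportional or monospaced */
    default:
        return 0;
    }
}

/* Hypothetical: derive digit 3 from a monospace flag found outside OS/2. */
static uint8_t synthesize_panose_digit3(uint8_t family_kind, int monospaced)
{
    switch (family_kind) {
    case PANOSE_FAMILY_LATIN_TEXT:
    case PANOSE_FAMILY_LATIN_DECORATIVE:
        /* Only "monospaced" (9) is derivable; the 7 other levels are not. */
        return monospaced ? 9 : PANOSE_NO_FIT;
    case PANOSE_FAMILY_LATIN_HANDWRITTEN:
    case PANOSE_FAMILY_LATIN_SYMBOL:
        return monospaced ? 3 : 2;
    default:
        return PANOSE_NO_FIT;  /* unknown kind: cannot pick a scale */
    }
}
```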

What do you think about having fallback values for
sFamilyClass & Panose? Another reasonable attitude would be
that fontconfig returns anything for fonts without an OS/2 table.
Comment 5 Behdad Esfahbod 2010-09-22 10:42:39 UTC
I still want to see how we may want to query these before I can make my mind one way or another.
Comment 6 suzuki toshiya 2010-09-22 10:56:06 UTC
Oh, it seems that I failed to understand what you wanted to clarify with
your previous question. So, should I write a draft for looking up a font
by sFamilyClass and/or Panose values?
Comment 7 Behdad Esfahbod 2010-09-22 11:02:02 UTC
All the information you have provided so far has been quite useful.  I'm thinking about it.  But at the end of the day I don't want fontconfig to provide arbitrary information that you cannot search on.

For example, (it belongs to the other bug but I'll say here), if we add license information, you should be able to match on GPL fonts, etc.

One thing is definitely not right: having to list all fonts and find the one you want yourself by walking over them and checking the fontconfig-provided attributes.
Comment 8 Eric Wasylishen 2010-09-22 11:40:35 UTC
Created attachment 38886 [details]
Panose and sFamilyClass values of TrueType fonts included with Ubuntu

I attached a listing of Panose and sFamilyClass values on the default TrueType fonts on my Ubuntu 10.04 system, just to give an idea of what sort of data are on real fonts. Most have Panose values, only a few have sFamilyClass values.

My two cents are that Panose seems to be an attempt at a complete font matching system like Fontconfig itself, except based entirely on objective, geometric measurements of the fonts. See http://www.monotypeimaging.com/ProductsServices/pan2.aspx . It seems to me to be orthogonal to the sFamilyClass values, which are subjective and let the font designer indicate how the font fits into historical classifications. Personally, I find the sFamilyClass values more interesting; they're more human-friendly. 

However, searching by sFamilyClass wouldn't be that useful on my Ubuntu system right now without some more of the fonts having that metadata.
Comment 9 suzuki toshiya 2010-09-22 12:07:37 UTC
# Thanks Eric; I'm now writing my reply to Behdad.

Before all, thank you for posting your investigation.

When I checked the TrueType fonts bundled with Windows 7
in my lab, there were 481 fonts, and 336 of them (ca. 70%)
had non-zero sFamilyClass values.

About your comment that most free fonts on Ubuntu
do not have meaningful sFamilyClass values,
there might be a chicken-and-egg problem. I think currently
most free software does not care about sFamilyClass
and Panose at all, so free font developers have little
motivation to define them carefully. Also, I think the current
variety of typefaces among free fonts is not wide enough to
need the detailed classification of sFamilyClass.
Comment 10 suzuki toshiya 2010-09-22 12:15:14 UTC
Behdad,

I understand the point to be: "fontconfig is designed to
provide a compact API set to look up fonts by
their attributes. If some information does not lend itself
to font search via a compact API (e.g. long free text that a
fontconfig client would have to grep), it should not
be handled by fontconfig, and so it should not be cached."

Although some people may have a different view,
I think it is one of the most reasonable attitudes.
# Some people want to use fontconfig as a database
# collecting the data copied from fonts in the system:
# some data can be searched, other data are just
# readable and cannot be searched at all.

To fit your view about what data should be handled
by fontconfig, I think a draft API set to look up
a font by sFamilyClass & Panose should be written
by the people who want to use them. If some properties
are not used, or conflict with other existing
properties (e.g. weight, proportion), they should not
be cached. Is this the right direction?
Comment 11 Behdad Esfahbod 2010-09-22 12:26:01 UTC
Something along those lines I guess, yes...
Comment 12 suzuki toshiya 2010-09-28 07:36:55 UTC
Created attachment 39011 [details]
a patchset to cache IBM sFamilyClass & Panose, query them in pattern, and sort the fonts by them.

Here is a patch set to cache IBM sFamilyClass & Panose,
query them in patterns, and sort fonts by their values.
I would like to hear comments on whether such an API is a
reasonable interface for querying sFamilyClass & Panose.

The patchset includes 3 patches:

------------------------------------------------
1) fontconfig_cache-familyclass+panose_20100928a_addrange8.diff

This patch introduces a new value type "FcRange8", which
consists of two 8-bit values: "base", the most preferred
value, and "limit", the worst acceptable value.

sFamilyClass and Panose are collections of 8-bit integers
(reading the specs carefully, I guess 4 bits would suffice,
but TrueType spends 8 bits on each integer), and clients
are usually interested in giving a range of acceptable
values for each integer, instead of an exact value.
For example, to choose serif fonts by sFamilyClass, the
high byte of sFamilyClass should be in the range 0x01-0x07.

Although it is possible to pack two 8-bit integers into an
existing type (e.g. FcChar32), I introduced a new type to avoid
tricky coding in this proof of concept.
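The actual FcRange8 definition lives in the attached patch and is not reproduced here; the following is a hypothetical stand-in (`Range8` and its helpers are illustrative names) sketching one plausible reading of "base" and "limit": a value is acceptable if it lies between the two, and its cost is its distance from the preferred base. It also shows the packing alternative mentioned above, with `uint32_t` standing in for FcChar32.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for FcRange8: preferred value and worst acceptable
 * value.  The limit may lie on either side of the base. */
typedef struct {
    uint8_t base;
    uint8_t limit;
} Range8;

/* Distance of 'value' from the preferred base, or -1 if 'value' lies
 * outside the acceptable interval between base and limit. */
static int range8_distance(Range8 r, uint8_t value)
{
    uint8_t lo = r.base < r.limit ? r.base : r.limit;
    uint8_t hi = r.base < r.limit ? r.limit : r.base;
    if (value < lo || value > hi)
        return -1;
    return abs((int) value - (int) r.base);
}

/* The packing alternative: both bytes stored in one 32-bit value. */
static uint32_t range8_pack(Range8 r)
{
    return ((uint32_t) r.base << 8) | r.limit;
}

static Range8 range8_unpack(uint32_t packed)
{
    Range8 r = { (uint8_t) (packed >> 8), (uint8_t) (packed & 0xFF) };
    return r;
}
```

With `Range8 serif = { 0x01, 0x07 }`, a font whose class byte is 0x08 (sans serif) is rejected, while 0x01 is a perfect match and 0x05 is acceptable at distance 4. The explicit struct makes that intent readable, which is the stated reason for not hiding the pair inside an existing type.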

-------------------------------------------------
2) fontconfig_cache-familyclass+panose_20100928a_addfamilyclass+panose.diff

This patch adds the handling of sFamilyClass and Panose.
It loads sFamilyClass and Panose from the OS/2 table,
splits them into component 8-bit integers, and caches
them as "fclass", "fsubclass", "panose0", "panose1", ...
The naming convention should be improved in the future.
If the font has no OS/2 table, the sFamilyClass values
fall back to 0 and the Panose values fall back to 1,
as discussed in my previous comment.
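A minimal sketch of this splitting step, assuming a small hypothetical `Os2Fields` struct in place of a real OS/2 table (FreeType exposes the same two fields as `TT_OS2::sFamilyClass` and `TT_OS2::panose[10]`); a NULL pointer models a font without OS/2 and produces the fallback values described above. The property names are the ones used in the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the two OS/2 fields involved. */
typedef struct {
    int16_t sFamilyClass;
    uint8_t panose[10];
} Os2Fields;

#define N_CLASS_PROPS 12  /* fclass, fsubclass, panose0 .. panose9 */

/* Split the OS/2 values into twelve 8-bit properties.  os2 == NULL means
 * "no OS/2 table": class 0 (no classification), Panose all 1 (no fit). */
static void split_os2_fields(const Os2Fields *os2, uint8_t out[N_CLASS_PROPS])
{
    if (os2 == NULL) {
        out[0] = 0;
        out[1] = 0;
        for (int i = 0; i < 10; i++)
            out[2 + i] = 1;
        return;
    }
    uint16_t raw = (uint16_t) os2->sFamilyClass;
    out[0] = (uint8_t) (raw >> 8);    /* fclass */
    out[1] = (uint8_t) (raw & 0xFF);  /* fsubclass */
    for (int i = 0; i < 10; i++)
        out[2 + i] = os2->panose[i];  /* panose0 .. panose9 */
}

/* Property name for slot i; buf is only used for the panoseN names. */
static const char *class_prop_name(int i, char *buf, size_t len)
{
    if (i == 0)
        return "fclass";
    if (i == 1)
        return "fsubclass";
    snprintf(buf, len, "panose%d", i - 2);
    return buf;
}
```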

It also appends the range matching rules to the end of
the _FcMatcher[] table used by FcCompareValueList().
After applying patches 1) and 2),

fc-match -s :fclass=0x01-0xFF family fclass

will show a list that reflects the (subtle) effect of the
sFamilyClass values.
-----------------------------------------------
3) fontconfig_cache-familyclass+panose_20100928a_addxmlsupport.diff

In patch 2), the effect of sFamilyClass & Panose is
very subtle, because the rules are appended to the
end of the _FcMatcher[] table. If the client wants to
prioritize sFamilyClass & Panose, the priorities
in the _FcMatcher[] table must be edited.
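Why a rule at the end of the table has only a subtle effect can be sketched as follows, assuming a simplified model of priority-ordered matching (the names here are hypothetical, not fontconfig internals): each candidate gets one score per property in priority order, and candidates are compared lexicographically, so the last property only breaks ties left by every property before it.

```c
#include <stddef.h>

/* Hypothetical priority order: family, weight, fclass (appended last). */
#define N_PRIORITIES 3

typedef struct {
    const char *name;
    double score[N_PRIORITIES];  /* lower is better, in priority order */
} Candidate;

/* Lexicographic comparison: the first differing priority decides. */
static int compare_candidates(const Candidate *a, const Candidate *b)
{
    for (size_t i = 0; i < N_PRIORITIES; i++) {
        if (a->score[i] < b->score[i]) return -1;
        if (a->score[i] > b->score[i]) return 1;
    }
    return 0;
}

/* Best candidate under the lexicographic order, or NULL if n == 0. */
static const Candidate *best_candidate(const Candidate *c, size_t n)
{
    const Candidate *best = n ? &c[0] : NULL;
    for (size_t i = 1; i < n; i++)
        if (compare_candidates(&c[i], best) < 0)
            best = &c[i];
    return best;
}
```

A font that matches the requested family but has the wrong class (scores {0, 0, 1}) still beats one with the right class but the wrong family ({1, 0, 0}); fclass only decides between fonts that tie on family and weight. Moving fclass earlier in the order, which is what the table-editing patch allows, changes that outcome.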

This patch adds private functions(*) for editing
the strong/weak preferences in the _FcMatcher[]
table, and a new XML element <matcher> to
invoke them. With this patch applied,
an XML element like:

  <matcher>
    <edit name="fclass"    mode="assign_replace" binding="same"><int> 0</int></edit>
    <edit name="fsubclass" mode="assign_replace" binding="same"><int> 2</int></edit>
    <edit name="panose0"   mode="assign_replace" binding="same"><int> 1</int></edit>
    <edit name="panose1"   mode="assign_replace" binding="same"><int> 3</int></edit>
    <edit name="panose2"   mode="assign_replace" binding="same"><int> 4</int></edit>
    <edit name="panose3"   mode="assign_replace" binding="same"><int> 5</int></edit>
    <edit name="panose4"   mode="assign_replace" binding="same"><int> 6</int></edit>
    <edit name="panose5"   mode="assign_replace" binding="same"><int> 7</int></edit>
    <edit name="panose6"   mode="assign_replace" binding="same"><int> 8</int></edit>
    <edit name="panose7"   mode="assign_replace" binding="same"><int> 9</int></edit>
    <edit name="panose8"   mode="assign_replace" binding="same"><int>10</int></edit>
    <edit name="panose9"   mode="assign_replace" binding="same"><int>11</int></edit>
  </matcher>

raises the priority of sFamilyClass & Panose.
Public functions to do the same per application,
without changing configuration files, would also be needed.
Comment 13 suzuki toshiya 2010-09-28 07:41:53 UTC
By the way, I wish _FcMatcher[] were exposed to
clients so that users could configure the table.
Is this a bad idea?
Comment 14 Behdad Esfahbod 2010-09-28 11:36:15 UTC
(In reply to comment #13)
> By the way, I wish _FcMatcher[] were exposed to
> clients so that users could configure the table.
> Is this a bad idea?

I think there's a bug open about it already.
Comment 15 suzuki toshiya 2010-10-02 10:51:36 UTC
Created attachment 39120 [details]
revised patchset to cache IBM sFamilyClass & Panose, query them in pattern, and sort the fonts by them.

Sorry, I forgot to include "fcrange.c" in the previous tarball.
Here is the revised tarball of the 3 patches described in
my previous message.
Comment 16 suzuki toshiya 2010-10-02 11:06:59 UTC
(In reply to comment #14)
> (In reply to comment #13)
> > By the way, I wish _FcMatcher[] were exposed to
> > clients so that users could configure the table.
> > Is this a bad idea?
> 
> I think there's a bug open about it already.

Is bug 19375 "RFE: Add an API to get the binding type
of values" the one you had in mind? I wrote a patch to
replace _FcMatchers[] with a dynamically allocated
linked list so that clients can insert/delete rules
for font matching. The approach might be different
from what Karl (the submitter of bug 19375) was
thinking of, but his motivation might be similar to mine.
Should I go there and continue the discussion?
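The patch itself is not attached here, so the following is only a sketch of the data structure described: a hypothetical `MatchRule` singly linked list in priority order (head = highest priority) that a client can edit at run time, in place of a static table. All names are illustrative.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical matcher rule; real rules would also carry a compare function
 * and strong/weak priorities. */
typedef struct MatchRule {
    const char *object;  /* property name, e.g. "family", "fclass" */
    struct MatchRule *next;
} MatchRule;

/* Insert a rule at position pos (0 = highest priority; a pos past the end
 * appends).  Returns 0 on success, -1 on allocation failure. */
static int rule_insert(MatchRule **head, const char *object, size_t pos)
{
    MatchRule *r = malloc(sizeof *r);
    if (!r)
        return -1;
    r->object = object;
    MatchRule **link = head;
    while (pos-- > 0 && *link)
        link = &(*link)->next;
    r->next = *link;
    *link = r;
    return 0;
}

/* Remove the first rule for 'object'.  Returns 1 if one was removed. */
static int rule_remove(MatchRule **head, const char *object)
{
    for (MatchRule **link = head; *link; link = &(*link)->next) {
        if (strcmp((*link)->object, object) == 0) {
            MatchRule *dead = *link;
            *link = dead->next;
            free(dead);
            return 1;
        }
    }
    return 0;
}

/* Priority of 'object' (0 = highest), or -1 if it has no rule. */
static int rule_priority(const MatchRule *head, const char *object)
{
    for (int i = 0; head; head = head->next, i++)
        if (strcmp(head->object, object) == 0)
            return i;
    return -1;
}

static void rule_free_all(MatchRule **head)
{
    while (*head) {
        MatchRule *next = (*head)->next;
        free(*head);
        *head = next;
    }
}
```

Using a pointer-to-link (`MatchRule **`) keeps insertion at the head, in the middle, and at the tail on one code path, which is the main convenience a list offers over a fixed array here.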
Comment 17 Behdad Esfahbod 2011-03-14 15:14:35 UTC
*** Bug 30225 has been marked as a duplicate of this bug. ***
Comment 18 Akira TAGOH 2013-09-18 10:07:13 UTC
http://cgit.freedesktop.org/~tagoh/fontconfig/commit/?h=panose-sfamilyclass-support

This is just a prototype of this idea. I think adding the raw sFamilyClass and/or Panose data may be complicated, so instead I added a familyclass element to the cache. That should be simple enough, and it works as expected.
Comment 19 Behdad Esfahbod 2013-09-18 14:56:06 UTC
Interesting.  Did you survey fonts to see how many have bogus values for these?
Comment 20 Akira TAGOH 2013-09-19 03:24:45 UTC
In fact, this code has been tested since Fedora 18; it was borrowed from libeasyfc, which is the backend of fonts-tweak-tool. As far as the reports I received go, there were only one or two cases where a font was classified unexpectedly, and those were fixed in the fonts themselves. There might be more, but it may be a good start.

FWIW, because those properties are not available for them, Type1 and BDF fonts are classified as "unknown" at the moment.
Comment 21 Behdad Esfahbod 2013-09-19 18:32:17 UTC
That's great.
Comment 22 Akira TAGOH 2013-09-24 09:23:15 UTC
I'm not sure when I'll merge this into master, since it requires bumping the cache version again. That could happen in fontconfig-ng, or when we have more features that require the bump.
Comment 23 Akira TAGOH 2014-03-26 08:00:30 UTC
I'm planning to merge these changes into master shortly. If anyone has any comments, please let me know.
Comment 24 Akira TAGOH 2014-09-25 10:55:00 UTC
After thinking more about it: this may help a lot for applications that have their own font selection or require manual font selection, but at the moment it doesn't help with writing config files. We may need to think about how to set a priority for familyclass in the cache.
Comment 25 Mingye Wang (Arthur2e5) 2018-03-23 21:03:39 UTC
Aside from the SFNT sFamilyClass/Panose values, a more primitive form of self-identification is found in the PFM files that accompany some PostScript Type1 fonts. This single byte is named "dfPitchAndFamily" in Adobe's documentation[1]; structurally, the higher four bits encode a generic class, while the least significant bit is set for proportional fonts.
   [1]: https://www.adobe.com/content/dam/acom/en/devnet/font/pdfs/5178.PFM.pdf#page=7

A fuller definition of the higher four bits can be found in fontforge source[2]. (Yes, Windows FNT files have that byte too...) It includes 0x00 for "don't care", 0x10 for "serif", 0x20 for "sans", 0x30 for "fixed" (mono? why is it separate from the least significant bit?), 0x40 for "script", and 0x50 for "decorative".
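Decoding that byte is straightforward; this sketch uses only the values listed in the comment above (hypothetical helper names; 0x30 is kept as its own "fixed" class, separate from the pitch bit, exactly as listed).

```c
#include <stdint.h>

/* Generic family in the high four bits of dfPitchAndFamily. */
typedef enum {
    PFM_FAMILY_DONT_CARE  = 0x00,
    PFM_FAMILY_SERIF      = 0x10,
    PFM_FAMILY_SANS       = 0x20,
    PFM_FAMILY_FIXED      = 0x30,
    PFM_FAMILY_SCRIPT     = 0x40,
    PFM_FAMILY_DECORATIVE = 0x50
} PfmFamily;

static PfmFamily pfm_family(uint8_t pitch_and_family)
{
    return (PfmFamily) (pitch_and_family & 0xF0);
}

/* Least significant bit: set for proportional (variable-pitch) fonts. */
static int pfm_is_proportional(uint8_t pitch_and_family)
{
    return pitch_and_family & 0x01;
}

static const char *pfm_family_name(uint8_t pitch_and_family)
{
    switch (pfm_family(pitch_and_family)) {
    case PFM_FAMILY_DONT_CARE:  return "don't care";
    case PFM_FAMILY_SERIF:      return "serif";
    case PFM_FAMILY_SANS:       return "sans";
    case PFM_FAMILY_FIXED:      return "fixed";
    case PFM_FAMILY_SCRIPT:     return "script";
    case PFM_FAMILY_DECORATIVE: return "decorative";
    default:                    return "unknown";
    }
}
```

For example, a byte of 0x11 decodes to a proportional serif font, which could feed the same coarse class a font would get from the high byte of sFamilyClass.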

  [2]: https://github.com/fontforge/fontforge/blob/b9149c1/fontforge/winfonts.c#L95-L102
Comment 26 GitLab Migration User 2018-08-20 21:52:04 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/fontconfig/fontconfig/issues/93.

