Bug 90330 - Preserve binding when preparing patterns
Status: RESOLVED MOVED
Alias: None
Product: fontconfig
Classification: Unclassified
Component: library
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: fontconfig-bugs
QA Contact: Behdad Esfahbod
Reported: 2015-05-06 00:56 UTC by Behdad Esfahbod
Modified: 2018-08-20 21:49 UTC

Attachments
Patch (5.50 KB, patch)
2015-05-06 01:58 UTC, Behdad Esfahbod

Description Behdad Esfahbod 2015-05-06 00:56:27 UTC
In FcFontRenderPrepare(), matching values are always added with binding=strong.  If we change that to retain the binding of the matched value and also add API to get binding, then we can have a recommended way to determine whether font fallback happened.
Comment 1 Behdad Esfahbod 2015-05-06 01:24:10 UTC
Looks like it's harder than I expected...
Comment 2 Behdad Esfahbod 2015-05-06 01:58:21 UTC
Created attachment 115569
Patch

Sample patch attached.  It slightly slows down matching, but should be fine.

We might need to go loosen up some of the binding="same" from 30-metric-aliases.conf after we expose this.

With this patch:

behdad:src 0$ ../fc-match/fc-match 'arial,nazli' --sort --verbose | grep 'family:' | head
	family: "Arial"(s)
	family: "Arial"(s)
	family: "Liberation Sans"(s)
	family: "Nimbus Sans L"(w)
	family: "DejaVu Serif"(w)
	family: "DejaVu Serif"(w)
	family: "DejaVu Serif"(w)
	family: "DejaVu Serif"(w)
	family: "Kinnari"(w)
	family: "Norasi"(w)

without this patch:

behdad:src 0$ ../fc-match/fc-match 'arial,nazli' --sort --verbose | grep 'family:' | head
	family: "Arial"(s)
	family: "Arial"(s)
	family: "Liberation Sans"(s)
	family: "Nimbus Sans L"(s)
	family: "DejaVu Serif"(s)
	family: "DejaVu Serif"(s)
	family: "DejaVu Serif"(s)
	family: "DejaVu Serif"(s)
	family: "Kinnari"(s)
	family: "Norasi"(s)

Check the bindings: (s) is strong, (w) is weak.
Comment 3 Behdad Esfahbod 2015-05-06 01:59:41 UTC
API to get binding is bug 19375.
We don't have API to set binding either, but that's less critical.
Comment 4 Behdad Esfahbod 2015-05-06 02:15:52 UTC
Since bindings are not public API right now, we might as well add a new value, so we'd have:

  binding=strong -> match
  binding=weak   -> approximate
  binding=??     -> fallback

or maybe use weak for fallback and add a semi-strong for approximates.  That might reduce the number of configurations we would need to adjust.
Comment 5 Behdad Esfahbod 2015-05-06 07:43:48 UTC
The website fontfamily.io has a similar model, where it reports supported, aliased, or OS default:

E.g.:

  http://fontfamily.io/times
Comment 6 Karl Tomlinson 2015-05-07 05:04:40 UTC
(In reply to Behdad Esfahbod from comment #4)
> Since bindings are not public API right now, we might as well add a new
> value, so we'd have:
> 
>   binding=strong -> match
>   binding=weak   -> approximate
>   binding=??     -> fallback
> 
> or maybe use weak for fallback and add a semi-strong for approximates.  That
> might reduce the number of configurations we would need to adjust.

The current behavior of families with weak bindings being considered only after families with strong bindings may be worth remembering here.
Comment 7 Behdad Esfahbod 2015-05-07 06:32:35 UTC
(In reply to Karl Tomlinson from comment #6)
> (In reply to Behdad Esfahbod from comment #4)
> > Since bindings are not public API right now, we might as well add a new
> > value, so we'd have:
> > 
> >   binding=strong -> match
> >   binding=weak   -> approximate
> >   binding=??     -> fallback
> > 
> > or maybe use weak for fallback and add a semi-strong for approximates.  That
> > might reduce the number of configurations we would need to adjust.
> 
> The current behavior of families with weak bindings being considered only
> after families with strong bindings may be worth remembering here.

Right... I'm really hesitant to touch the matcher, but want to address the bigger question of the match-quality this time.

Let's hear from Akira as well.  If he doesn't have time to work on it, I might give it a try myself.
Comment 8 Karl Tomlinson 2015-05-07 08:21:02 UTC
(In reply to Behdad Esfahbod from comment #7)
> Right... I'm really hesitant to touch the matcher, but want to address the
> bigger question of the match-quality this time.

Oh, yes.  I didn't mean to imply that matching behaviour needed changing, but I assumed that distinguishing alias and fallback quality might require new binding strengths in <alias> rules.

Perhaps not if the original families in the pattern could have a binding > strong.

The weak match behaviour also comes into play with any changes to 30-metric-aliases.conf.
Comment 9 Behdad Esfahbod 2015-05-07 08:41:29 UTC
I can think of four different levels of matchness:

1. match: user requested Arial and I found it,

2. approximate: user requested Arial and I found Liberation Sans,

3. fallback: user requested Arial and I found this sans-serif Persian font,

4. no match: user requested Arial, here's some font that covers some characters no other font supports, and it doesn't have anything to do with Arial.

The last one is easy to add, it's just all the matching fonts that have score 0 for both family-strong and family-weak match.  First one is also easy, that's what my patch does.

Right now 2 is also marked as match, that's because 30-metric-aliases.conf does binding="same".  If we remove that, then 2 and 3 will become the same.  I like to try to distinguish them.

So, yes, I think adding more levels to the bindings is possible.  We can then decide how to bucket them.  We can still bucket them into strong and weak and have the exact same matching algorithm that we have right now, or if we really wanted to, we can add more buckets.
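
To make the four levels and the bucketing idea concrete, here is a minimal C sketch. It is only an illustration under stated assumptions, not fontconfig code: `BIND_NONE` is a hypothetical "no match" binding value, `MatchLevel`, `classify` and `in_strong_bucket` are made-up names, and telling level 1 from level 2 by comparing family names is an assumption of the sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical binding values. BIND_NONE is not a fontconfig binding;
 * it stands for "matched neither the strong nor the weak families". */
typedef enum { BIND_NONE, BIND_WEAK, BIND_STRONG } Binding;

/* The four levels of matchness listed above. */
typedef enum {
    LEVEL_MATCH = 1,
    LEVEL_APPROXIMATE,
    LEVEL_FALLBACK,
    LEVEL_NO_MATCH
} MatchLevel;

/* Case-insensitive ASCII comparison of two family names. */
bool same_family(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
        a++;
        b++;
    }
    return *a == *b;
}

/* Assumption: a strong binding with a different family name means the font
 * came in through a binding="same" alias, i.e. it is an approximate match. */
MatchLevel classify(const char *requested, const char *found, Binding binding)
{
    if (binding == BIND_NONE)
        return LEVEL_NO_MATCH;
    if (binding == BIND_WEAK)
        return LEVEL_FALLBACK;
    return same_family(requested, found) ? LEVEL_MATCH : LEVEL_APPROXIMATE;
}

/* One possible bucketing: collapse the levels back into the two buckets
 * the matcher uses today, so the matching algorithm stays unchanged. */
bool in_strong_bucket(MatchLevel level)
{
    return level == LEVEL_MATCH || level == LEVEL_APPROXIMATE;
}
```

With this bucketing, `classify("Arial", "Liberation Sans", BIND_STRONG)` gives `LEVEL_APPROXIMATE`, which still lands in the strong bucket, so the matcher would behave as it does now while clients get the finer distinction.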
Comment 11 Urs Liska 2015-05-08 19:19:57 UTC
I don't understand the technical details of this, but wanted to note here that GNU LilyPond would also greatly benefit from this.

We are improving our mechanism to switch to alternative music fonts, and it is absolutely necessary to determine if a requested font is present on the user's system. Random fallback fonts returned by fontconfig don't help us; we have to know exactly whether the font is present, or else apply our own fallback and use LilyPond's application font.
Comment 12 Karl Tomlinson 2015-05-10 01:15:57 UTC
(In reply to Behdad Esfahbod from comment #9)
> Right now 2 is also marked as match, that's because 30-metric-aliases.conf
> does binding="same".

That might be fine, because the client can check whether the family name is a perfect match.

> If we remove that, then 2 and 3 will become the same. 
> I like to try to distinguish them.

Yes, distinguishing these would be useful.  It is much harder for the client to distinguish these.

Also, binding="same" is very useful.
Changing from same to weak binding in 30-metric-aliases would mean that "Arial, Droid Sans Fallback" would change to preferring Droid Sans Fallback over Arimo and Liberation Sans, for example.
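
The reordering can be illustrated with a small C model. This is a simplified sketch, not the real matcher: it assumes that all strong families are considered before any weak family and ignores every other property (such as lang). `FamilyEntry` and `order_by_binding` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { BIND_WEAK, BIND_STRONG } Binding;

typedef struct {
    const char *family;
    Binding binding;
} FamilyEntry;

/* Copy `in` to `out` so that all strong entries come first, keeping the
 * original relative order inside each group. Returns the number of
 * entries written, which is always n. */
size_t order_by_binding(const FamilyEntry *in, size_t n, FamilyEntry *out)
{
    size_t k = 0;
    assert((in != NULL && out != NULL) || n == 0);
    for (size_t i = 0; i < n; i++)
        if (in[i].binding == BIND_STRONG)
            out[k++] = in[i];
    for (size_t i = 0; i < n; i++)
        if (in[i].binding == BIND_WEAK)
            out[k++] = in[i];
    return k;
}
```

Feeding it Arial (strong), Arimo (weak), Liberation Sans (weak), Droid Sans Fallback (strong), which is roughly what the pattern would look like if the metric aliases became weak, yields Arial, Droid Sans Fallback, Arimo, Liberation Sans. With binding="same" the aliases inherit Arial's strong binding and the list order is left as it was.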
Comment 13 Akira TAGOH 2015-05-11 06:20:56 UTC
The idea sounds good but:

(In reply to Behdad Esfahbod from comment #9)
> I can think of four different levels of matchness:
> 
> 1. match: user requested Arial and I found it,
> 
> 2. approximate: user requested Arial and I found Liberation Sans,
> 
> 3. fallback: user requested Arial and I found this sans-serif Persian font,
> 
> 4. no match: user requested Arial, here's some font that covers some
> characters no other font supports, and it doesn't have anything to do with
> Arial.

Unless we consider dropping the sans-serif that we add in 49-sansserif.conf, we won't see the result in 4; it will always be shown as a fallback in 3.

Or we could make it conditional, to query a font without a fallback, perhaps like:

<match target="pattern">
   <test name="fallback">
      <bool>true</bool>
   </test>
   <test qual="all" name="family" compare="not_eq">
      <string>sans-serif</string>
   </test>
   ...
</match>

> 
> The last one is easy to add, it's just all the matching fonts that have
> score 0 for both family-strong and family-weak match.  First one is also
> easy, that's what my patch does.
> 
> Right now 2 is also marked as match, that's because 30-metric-aliases.conf
> does binding="same".  If we remove that, then 2 and 3 will become the same. 
> I like to try to distinguish them.
> 
> So, yes, I think adding more levels to the bindings is possible.  We can
> then decide how to bucket them.  We can still bucket them into strong and
> weak and have the exact same matching algorithm that we have right now, or
> if we really wanted to, we can add more buckets.
Comment 14 bungeman 2015-05-12 13:35:07 UTC
(In reply to Behdad Esfahbod from comment #9)
> I can think of four different levels of matchness:
> 
> 1. match: user requested Arial and I found it,
> 
> 2. approximate: user requested Arial and I found Liberation Sans,
> 
> 3. fallback: user requested Arial and I found this sans-serif Persian font,
> 
> 4. no match: user requested Arial, here's some font that covers some
> characters no other font supports, and it doesn't have anything to do with
> Arial.
> 
> The last one is easy to add, it's just all the matching fonts that have
> score 0 for both family-strong and family-weak match.  First one is also
> easy, that's what my patch does.
> 
> Right now 2 is also marked as match, that's because 30-metric-aliases.conf
> does binding="same".  If we remove that, then 2 and 3 will become the same. 
> I like to try to distinguish them.
> 
> So, yes, I think adding more levels to the bindings is possible.  We can
> then decide how to bucket them.  We can still bucket them into strong and
> weak and have the exact same matching algorithm that we have right now, or
> if we really wanted to, we can add more buckets.

I can think of one more, just to muddy the waters, which is 'preferred'. This is when Arial is requested, but the configuration says prefer Liberation Sans. In some sense this is a 'match', even though the returned font may in some sense be unrelated to the requested font. Maybe this is a magic value of setting both the 'match' and 'approximate' bits of the match.
Comment 15 bungeman 2015-05-12 13:38:19 UTC
(In reply to Behdad Esfahbod from comment #10)
> See Chrome needing this feature for example:
> 
> https://code.google.com/p/chromium/codesearch#chromium/src/third_party/skia/src/ports/SkFontConfigInterface_direct.cpp&q=cousine&sq=package:chromium&l=178

Note that the 'real' FontConfig back-end (not yet used by Chromium) in Skia goes through some work to try to implement something like this, see https://code.google.com/p/chromium/codesearch#chromium/src/third_party/skia/src/ports/SkFontMgr_fontconfig.cpp&q=SkFontMgr&sq=package:chromium&l=727 for onMatchFamilyStyle, remove_weak, and is_weak there. It would be great to have the matching strength, or even just the value strength, exposed in FontConfig.
Comment 16 Behdad Esfahbod 2015-05-12 19:05:07 UTC
(In reply to bungeman from comment #14)
> I can think of one more, just to muddy the waters, which is 'preferred'.
> This is when Arial is requested, but the configuration says prefer
> Liberation Sans. In some sense this is a 'match', even though the returned
> font may in some sense be unrelated to the requested font. Maybe this is a
> magic value of setting both the 'match' and 'approximate' bits of the match.

That was what I called "approximate".  How is your 'preferred' level different?
Comment 17 Behdad Esfahbod 2015-05-12 19:07:29 UTC
Previously I said that we don't have any API to set or get bindings.  That's incorrect.  There is FcPatternAddWeak() already.
Comment 18 Behdad Esfahbod 2015-05-12 19:15:33 UTC
Copying from mailing list discussion:

On 15-05-09 03:35 AM, Raimund Steger wrote:
> On 05/08/15 20:29, Behdad Esfahbod wrote:
>> Hi all,
>>
>> I'm sure most of you who have been around know that it's been a very common
>> request to want to know whether fontconfig found a match for a request or just
>> fallbacks.  I've been thinking about this a lot recently, and like to make
>> sure we fix it this time.
>
> This sounds interesting. The recent discussion about LilyPond reminded me that
> while fontconfig does have FcFontList(3) and FcFontSetList(3) to query fonts
> without any scoring at all,
>

> (1) this is normally not exposed by higher-level libraries;
>
> (2) this doesn't sort regular variants near the top (obviously), so
> FcFontMatch(3) has to be called at some point anyway;

Right.  I was thinking last night that we might even be able to close the gap
between FcFontList and FcFontSort in the future.  If for every specified
element in the pattern the user can say how strict of a requirement that is,
one end will become FcFontList, the other FcFontSort.  Users can then have,
say, strict requirement for FC_SCALABLE, but non-strict for others, that
essentially filters out bitmap fonts, a feature many clients need and
currently implement incorrectly by calling FcFontSort() and filtering out the
bitmap fonts.
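
As a rough illustration of the per-element strictness idea, here is a minimal C sketch. Nothing in it is fontconfig API: `Candidate`, its `distance` field (lower means closer) and `best_candidate` are hypothetical, and `scalable` merely stands in for FC_SCALABLE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    const char *family;
    bool scalable;    /* stands in for FC_SCALABLE */
    double distance;  /* hypothetical score: lower means a closer match */
} Candidate;

/* Return the index of the closest candidate, or -1 if none qualifies.
 * A strict requirement filters (FcFontList-like); everything else only
 * influences the ranking (FcFontSort-like). */
int best_candidate(const Candidate *c, size_t n, bool require_scalable)
{
    int best = -1;
    assert(c != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (require_scalable && !c[i].scalable)
            continue;  /* strict element: drop the font entirely */
        if (best < 0 || c[i].distance < c[best].distance)
            best = (int)i;
    }
    return best;
}
```

The point of doing the filtering inside the matcher is that a bitmap font can never win, whereas a client that sorts first and filters afterwards has to get the post-processing right on its own.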


> hence improving FcFontMatch(3) and FcFontSort(3) might indeed be worthwhile.
>
> At the moment, whether a configuration rule uses prepend_first v. prepend v.
> append v. append_last might carry a meaning of 'approximate' v. 'fallback'
> when a human reader examines the config, but not to the matching engine where
> everything is only a position in the property list. The 'fallback' boundary
> somewhere in that list is one purely of convention.

Correct.  I think we can use those rules to carry new information in the pattern.


> As long as the original documented purpose of the binding value in terms of
> fonts-conf(5) (that 'lang' could overrule 'family' in the match if the user
> explicitly stated the former, i. e. one of prioritizing property A over
> property B) is kept unchanged I think yes, it is possible to factor that
> meaning into the value.

Agreed.  Though, I still sometimes don't fully understand why that was desirable.


> To state it more explicitly, it is probably not a good idea to introduce
> strong family bindings into a pattern that didn't originally have them, by
> means of alias rules, only because we now think of binding as "how close is
> family A to family B". Such alias rules, where they specify binding="same"
> now, could only specify something like "same-or-lower" in the future --
> correct me if I'm wrong.

Spot on.  I was thinking about a binding="lower" kind of thing.  We can map
those internally to integers that keep going lower every time.
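
A minimal sketch of how such a mapping to integers could work, assuming a hypothetical binding="lower" next to the existing "same". The names `AliasBinding`, `STRENGTH_REQUESTED` and `chain_strength` are invented for illustration, and the choice of 0 for a family the user asked for directly is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* How an alias rule binds the family it adds, relative to its source. */
typedef enum { ALIAS_SAME, ALIAS_LOWER } AliasBinding;

/* Hypothetical strength of a family named directly in the request. */
enum { STRENGTH_REQUESTED = 0 };

/* Strength of a family reached through a chain of alias rules: "same"
 * keeps the strength, "lower" decrements it. The result can never
 * exceed the starting strength, i.e. it is always "same-or-lower". */
int chain_strength(int start, const AliasBinding *steps, size_t n)
{
    int strength = start;
    assert(steps != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (steps[i] == ALIAS_LOWER)
            strength--;
    return strength;
}
```

Because every step is either neutral or a decrement, an alias can never introduce a binding stronger than the one in the original pattern, which is the property asked for above.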
Comment 19 bungeman 2015-05-19 22:12:56 UTC
(In reply to Behdad Esfahbod from comment #16)
> That was what I called "approximate".  How is your 'preferred' level
> different?

I think what I meant by 'preferred' is like <prefer>, while 'approximate' seems more like <accept>, with <default> matching up with 'fallback'. In other words, with 'preferred' I know that, at least so far as the user is concerned, I got an actual 'perfect' match, even if the font data and resolved pattern disagree. It's not just <accept>able or 'approximate'; it's 'falling forward' as opposed to 'falling back'.
Comment 20 Behdad Esfahbod 2015-05-19 22:39:03 UTC
(In reply to bungeman from comment #19)
> (In reply to Behdad Esfahbod from comment #16)
> > (In reply to bungeman from comment #14)
> > > (In reply to Behdad Esfahbod from comment #9)
> > > > I can think of four different levels of matchness:
> > > > 
> > > > 1. match: user requested Arial and I found it,
> > > > 
> > > > 2. approximate: user requested Arial and I found Liberation Sans,
> > > > 
> > > > 3. fallback: user requested Arial and I found this sans-serif Persian font,
> > > > 
> > > > 4. no match: user requested Arial, here's some font that covers some
> > > > characters no other font supports, and it doesn't have anything to do with
> > > > Arial.
> > > > 
> > > > The last one is easy to add, it's just all the matching fonts that have
> > > > score 0 for both family-strong and family-weak match.  First one is also
> > > > easy, that's what my patch does.
> > > > 
> > > > Right now 2 is also marked as match, that's because 30-metric-aliases.conf
> > > > does binding="same".  If we remove that, then 2 and 3 will become the same. 
> > > > I like to try to distinguish them.
> > > > 
> > > > So, yes, I think adding more levels to the bindings is possible.  We can
> > > > then decide how to bucket them.  We can still bucket them into strong and
> > > > weak and have the exact same matching algorithm that we have right now, or
> > > > if we really wanted to, we can add more buckets.
> > > 
> > > I can think of one more, just to muddy the waters, which is 'preferred'.
> > > This is when Arial is requested, but the configuration says prefer
> > > Liberation Sans. In some sense this is a 'match', even though the returned
> > > font may in some sense be unrelated to the requested font. Maybe this is a
> > > magic value of setting both the 'match' and 'approximate' bits of the match.
> > 
> > That was what I called "approximate".  How is your 'preferred' level
> > different?
> 
> I think what I meant by 'preferred' is like <prefer>, while 'approximate'
> seems more like <accept>, with <default> matching up with 'fallback'. In
> other words, with 'preferred' I know that, at least so far as the user is
> concerned, I got an actual 'perfect' match, even if the font data and
> resolved pattern disagree. It's not just <accept>able or 'approximate'; it's
> 'falling forward' as opposed to 'falling back'.

<prefer> seems to only be used currently to implement virtual families (sans, serif, etc).  Also, if user asked for "Arial", it's hard to say anything other than Arial is a better match...

If user asked for "sans", however, it would be useful to tell them that indeed what we returned is the preferred sans font...

So yeah, I think I like adding additional meanings to <accept>, <default>, and <prefer>.

So now we have two competing proposals: binding strength, versus the accept/default/prefer mechanism.

In fact, now that I check, I think we are not consistent in how we use those things.  For example, 30-metric-aliases has these:

        <alias binding="same">
          <family>Nimbus Sans</family>
          <default>
          <family>Helvetica</family>
          </default>
        </alias>

I think the default should be accept instead.

So, <prefer> currently happens without binding="same".  I guess it should.  Generally, sounds like <accept> and <prefer> should go with binding="same", whereas <default> shouldn't.  Does that sound about right?
Comment 21 Karl Tomlinson 2015-05-19 23:31:49 UTC
(In reply to bungeman from comment #19)
> I think what I meant by 'preferred' is like <prefer>, while 'approximate'
> seems more like <accept>, with <default> matching up with 'fallback'. In
> other words, with 'preferred' I know that, at least so far as the user is
> concerned, I got an actual 'perfect' match, even if the font data and
> resolved pattern disagree. It's not just <accept>able or 'approximate'; it's
> 'falling forward' as opposed to 'falling back'.

I wonder whether this distinction is really necessary.  If the app does want
to act differently on knowledge of preferred vs accept, then it can look at
the edited match/sort pattern.

I suspect this could be difficult to prescribe and implement as a property on
a single family.  If the app requests "Favourite font", "Helvetica" and the
user prefers Arial over Helvetica, then I'm not sure that an Arial match is
really preferred rather than accepted.

I also suspect this would be overloading binding too much with something that
is really quite different.  (See next comment.)
Comment 22 Karl Tomlinson 2015-05-19 23:37:28 UTC
(In reply to Behdad Esfahbod from comment #20)
> <prefer> seems to only be used currently to implement virtual families
> (sans, serif, etc).

> So, <prefer> currently happens without binding="same".  I guess it should. 
> Generally, sounds like <accept> and <prefer> should go with binding="same",
> whereas <default> shouldn't.  Does that sound about right?

When using <prefer> with sans-serif, etc., the weak binding is important and
so I assume intentional.

If the app asks for "sans-serif", then the behavior with weak is that the font
provided supports the language.  binding="same" would remove that behavior and
always return the same font regardless of language.

I suspect that bindings don't map well to prefer/accept/default because
bindings are about whether language support should take priority.
prefer/accept/default determine order amongst existing families of the same
binding (though I'm not sure that the behavior of always preferring all strong
families over any weak families is intentional).

Adding a "no match" binding for use in FcFontRenderPrepare may still be a way
to provide the useful information that the font didn't match the family at
all.  I can't think of a conflict with existing binding usage.

> If user asked for "sans", however, it would be useful to tell them that
> indeed what we returned is the preferred sans font...

I think it is sufficient to distinguish "weak" for preferred font for this
language, and "no match" for random font which may or may not support the
language.

binding="same" is often important, however, to ensure that a weak
<accept> doesn't behave like a <default>. 

>         <alias binding="same">
>           <family>Nimbus Sans</family>
>           <default>
>           <family>Helvetica</family>
>           </default>
>         </alias>
> 
> I think the default should be accept instead.

Probably (without checking how that interacts with other aliases in that
file), because I assume the intention is that, if an app asks for "Nimbus
Sans", "Some fallback font", then the priority should be "Nimbus Sans",
"Helvetica", "Some fallback font", rather than the current "Nimbus Sans",
"Some fallback font", "Helvetica".  Perhaps there are counter examples, but
accept and default at least provide some degree of choice in behavior.
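
The difference in resulting priority can be modelled in a few lines of C. This is a sketch of the list editing only, with hypothetical names (`FamilyList`, `apply_alias`), and it assumes the behavior described above: accept places the new family right after the matched one, default appends it at the end of the list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FAMILIES 8

typedef enum { ALIAS_ACCEPT, ALIAS_DEFAULT } AliasKind;

typedef struct {
    const char *names[MAX_FAMILIES];
    size_t count;
} FamilyList;

/* Apply one alias rule to the family list. Returns 1 if `match` was found
 * and `added` was inserted, 0 if the list is full or `match` is absent. */
int apply_alias(FamilyList *list, const char *match, const char *added,
                AliasKind kind)
{
    assert(list != NULL && match != NULL && added != NULL);
    if (list->count >= MAX_FAMILIES)
        return 0;
    for (size_t i = 0; i < list->count; i++) {
        if (strcmp(list->names[i], match) != 0)
            continue;
        /* accept: right after the matched family; default: at the end */
        size_t pos = (kind == ALIAS_ACCEPT) ? i + 1 : list->count;
        /* shift the tail right by one to open a slot at pos */
        memmove(&list->names[pos + 1], &list->names[pos],
                (list->count - pos) * sizeof list->names[0]);
        list->names[pos] = added;
        list->count++;
        return 1;
    }
    return 0;
}
```

Starting from "Nimbus Sans", "Some fallback font", the accept variant produces "Nimbus Sans", "Helvetica", "Some fallback font", while the default variant produces "Nimbus Sans", "Some fallback font", "Helvetica".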
Comment 23 GitLab Migration User 2018-08-20 21:49:54 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new issue on our GitLab instance: https://gitlab.freedesktop.org/fontconfig/fontconfig/issues/73.