19376 – Optimize family name matching in sort and match functions

Bug 19376 - Optimize family name matching in sort and match functions

Summary: Optimize family name matching in sort and match functions

Status:	RESOLVED MOVED

Alias:	None

Product:	fontconfig
Classification:	Unclassified
Component:	library (show other bugs)
Version:	2_1
Hardware:	Other All

Importance:	high enhancement
Assignee:	fontconfig-bugs
QA Contact:	Behdad Esfahbod

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2009-01-03 00:07 UTC by Karl Tomlinson
Modified:	2018-08-20 21:45 UTC (History)
CC List:	2 users (show)

See Also:
i915 platform:
i915 features:

Attachments

Description Karl Tomlinson 2009-01-03 00:07:27 UTC

Currently the time for a FcFont(Set)Sort with trim = false (or for a
FcFont(Set)Match) I assume) is typically dominated by comparing family names.

By the time aliases are added to the sort pattern there are typically 80-100
family names in the pattern (p).  For 300 to 500 fonts (some of which will
have family names for each language) (n), that is p * n ~ 40,000
case-insensitive string comparisons. 

I think a sensible approach to optimizing this would be to sort the families
in the sort pattern (removing duplicates, recording the score for each family,
and remembering to consider the binding) so that each family lookup is
O(log p) and the total complexity is O((n + p) log p).

This would be faster whenever n >> log p, which is almost always the case.

(Behdad was considering something similar.  I think this is slightly different
from what Behdad had in mind, but perhaps he thought of this too.)

(I guess the same argument could be applied to any (sortable) property, but it
is typically only family properties that have so many values.)

(The sorted family list is essentially a cheap hash table.  I'm guessing that
looking up a family in a sorted list is going to be cheaper ~O(log p), assuming there are not too many similar family names), than generating a hash for the family O(length of string).)

Comment 1 Behdad Esfahbod 2009-01-03 17:20:42 UTC

What I had in mind was sorting both the fontset and the pattern family list and walk them together, giving a m+n algorithm.

The problem with both approaches is that, I think, some fonts can have multiple family names.  That's a pain...  Maybe initially I just create a hash table and stuff all the fonts into it.

Comment 2 Karl Tomlinson 2009-01-06 13:16:01 UTC

(In reply to comment #1)
> The problem with both approaches is that, I think, some fonts can have multiple
> family names.  That's a pain...

Multiple family names makes sorting the fonts inconvenient, but putting only the families from the match pattern into a hash table enables an algorithm of p + n complexity.  Because p is only of moderate size and so log p is small, a sort of (only) the match pattern families may well function at least as efficiently as hash lookups.

Comment 3 GitLab Migration User 2018-08-20 21:45:25 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/fontconfig/fontconfig/issues/34.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.