Bug 64766

Summary: Make fontconfig scanning faster
Product: fontconfig
Component: library
Reporter: nfxjfg
Assignee: fontconfig-bugs
QA Contact: Behdad Esfahbod <freedesktop>
Status: RESOLVED FIXED
Severity: normal
Priority: medium
CC: akira, eduard.braun2, freedesktop, iplaw67, khaledhosny, prahal, timo.teras
Version: unspecified
Hardware: Other
OS: All
Whiteboard:
Bug Depends on:
Bug Blocks: 100096
Attachments: WIP Patch; Update documentation

Description nfxjfg 2013-05-19 18:34:59 UTC
Scanning new fonts is pretty slow. There are two cases where this is quite noticeable: 1. adding big fonts from memory, and 2. adding/removing fonts to directories with many fonts. The 2nd case happens typically only on OSX, obscure Linux systems, or MS Windows, but the 1st is a problem on all platforms.

Looking at profiler output, most time when scanning a font is spent inside of freetype below FT_Load_Glyph. (Something about running the truetype bytecode interpreter...) Is that really needed for basic fontconfig functionality?

The 2nd case is worse: changing a single file in a directory containing fonts forces the whole directory to be scanned. Can't caching be done on a per-file basis?

For some users this is a deal breaker, and they're asking for hacks to remove fontconfig usage etc. It would be nice if fontconfig could solve this issue by optimizing the scan process and making the caching more clever.
Comment 1 Akira TAGOH 2013-05-20 08:32:33 UTC
(In reply to comment #0)
> Looking at profiler output, most time when scanning a font is spent inside
> of freetype below FT_Load_Glyph. (Something about running the truetype
> bytecode interpreter...) Is that really needed for basic fontconfig
> functionality?

That is required to let fontconfig know what glyph coverage a font has.

> The 2nd case is worse: changing a single file in a directory containing
> fonts forces the whole directory to be scanned. Can't caching be done on a
> per-file basis?

I don't see what you really mean. In fact, fontconfig does cache fonts per file.

> For some users this is a deal breaker, and they're asking for hacks to
> remove fontconfig usage etc. It would be nice if fontconfig could solve this
> issue by optimizing the scan process and making the caching more clever.

I don't know how often that happens, though. They may run into this kind of annoyance if no cache has been created before they start their fontconfig-based applications. That is exactly why fontconfig creates a cache.
Comment 2 Behdad Esfahbod 2013-05-20 12:08:42 UTC
(In reply to comment #1)
> (In reply to comment #0)
> > Looking at profiler output, most time when scanning a font is spent inside
> > of freetype below FT_Load_Glyph. (Something about running the truetype
> > bytecode interpreter...) Is that really needed for basic fontconfig
> > functionality?
> 
> That is required to let fontconfig know what the glyph coverage a font has.

We should force hinting off though.

> > The 2nd case is worse: changing a single file in a directory containing
> > fonts forces the whole directory to be scanned. Can't caching be done on a
> > per-file basis?
> 
> I don't see what you really mean. in fact fontconfig do cache a font per
> file.

When a directory changes, we rescan the entire directory to rebuild the cache for that dir.

> > For some users this is a deal breaker, and they're asking for hacks to
> > remove fontconfig usage etc. It would be nice if fontconfig could solve this
> > issue by optimizing the scan process and making the caching more clever.
> 
> I don't know how often it happens though, they may see something like
> annoyance in fontconfig without creating a cache before doing something on
> their fontconfig-based applications. this is why fontconfig is going to
> create a cache though.
Comment 3 Akira TAGOH 2013-05-22 03:44:27 UTC
(In reply to comment #2)
> We should force hinting off though.

We did it already, no?

static FcBool
FcFreeTypeCheckGlyph (FT_Face face, FcChar32 ucs4,
                      FT_UInt glyph, FcBlanks *blanks,
                      FT_Pos *advance,
                      FcBool using_strike)
{
    FT_Int          load_flags = FT_LOAD_IGNORE_GLOBAL_ADVANCE_WIDTH | FT_LOAD_NO_SCALE | FT_LOAD_NO_HINTING;

> When a directory changes, we rescan the entire directory to rebuild the
> cache for that dir.

I have no idea how this can be improved though.
Comment 4 Behdad Esfahbod 2013-05-22 18:31:45 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > We should force hinting off though.
> 
> We did it already, no?
> 
> static FcBool
> FcFreeTypeCheckGlyph (FT_Face face, FcChar32 ucs4,
>                       FT_UInt glyph, FcBlanks *blanks,
>                       FT_Pos *advance,
>                       FcBool using_strike)
> {
>     FT_Int          load_flags = FT_LOAD_IGNORE_GLOBAL_ADVANCE_WIDTH |
> FT_LOAD_NO_SCALE | FT_LOAD_NO_HINTING;

Someone should check freetype to make sure it doesn't load bytecode if it doesn't need it.


> > When a directory changes, we rescan the entire directory to rebuild the
> > cache for that dir.
> 
> I have no idea how this can be improved though.

Me neither :(.
Comment 5 nfxjfg 2013-05-22 20:43:26 UTC
> That is required to let fontconfig know what the glyph coverage a font has.

Somehow freetype seems to spend rather a lot of effort on that. I don't know how this stuff works, but to my naive eyes it looks like there's further optimization possible, though possibly that would require changes in freetype instead of fontconfig.

>> When a directory changes, we rescan the entire directory to rebuild the
>> cache for that dir.

> I have no idea how this can be improved though.

For example, the granularity of the cache could be changed and create a new cache for every 20 files or so. Or it could be made such that the old cache is reused for files that have not changed. No doubt, these changes would be intrusive to the cache code, but given the problems with font scanning times, I think it would be worth it.

Now that you have added a hash for each font, maybe it would be possible to cache the results for memory fonts as well? Hashing actually makes loading a font even slower, but since you're doing it anyway, maybe it could be used for that purpose as well.
Comment 6 Behdad Esfahbod 2013-05-23 01:14:15 UTC
(In reply to comment #5)
> > That is required to let fontconfig know what the glyph coverage a font has.
> 
> Somehow freetype seems to spend rather a lot of effort on that. I don't know
> how this stuff works, but to my naive eyes it looks like there's further
> optimization possible, though possibly that would require changes in
> freetype instead of fontconfig.

I may take a look.  We load all glyphs just to make sure the font actually has it, as opposed to having bogus or empty glyphs.  Maybe that level of paranoia is not quite needed these days...


> >> When a directory changes, we rescan the entire directory to rebuild the
> >> cache for that dir.
> 
> > I have no idea how this can be improved though.
> 
> For example, the granularity of the cache could be changed and create a new
> cache for every 20 files or so.

Each cache file would mean one more mmap() in every fontconfig-using client.  That's already bad enough...

> Or it could be made such that the old cache
> is reused for files that have not changed. No doubt, these changes would be
> intrusive to the cache code, but given the problems with font scanning
> times, I think it would be worth it.
>
> Now that you have added a hash for each font, maybe it would be possible to
> cache the results for memory fonts as well? Hashing actually makes loading a
> font even slower, but since you're doing it anyway, maybe it could be used
> for that purpose as well.

Yeah, maybe we can have a cache of individual fonts based on hashes.  Then we can rebuild directory caches based on those individual fonts much faster.

For memory fonts, is it really a big deal?  Why do people query those anyway?
Comment 7 nfxjfg 2013-11-15 19:33:05 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > Somehow freetype seems to spend rather a lot of effort on that. I don't know
> > how this stuff works, but to my naive eyes it looks like there's further
> > optimization possible, though possibly that would require changes in
> > freetype instead of fontconfig.
> 
> I may take a look.  We load all glyphs just to make sure the font actually
> has it, as opposed to having bogus or empty glyphs.  Maybe that level of
> paranoia is not quite needed these days...

Can't this be done when attempting to use the glyph?

> > For example, the granularity of the cache could be changed and create a new
> > cache for every 20 files or so.
> 
> Each cache file would mean one more mmap() in every fontconfig-using client.
> That's already bad enough...

Sure, you'll perhaps need more mmap calls. But that's much better than blocking an application on font scanning. You could also create a light weight central index (which can be easily reconstructed when a "sub"-cache changes), and map the cache file only when it's needed. Well, I don't know what exactly the font cache stores, but maybe this can serve as inspiration.

> > Or it could be made such that the old cache
> > is reused for files that have not changed. No doubt, these changes would be
> > intrusive to the cache code, but given the problems with font scanning
> > times, I think it would be worth it.
> >
> > Now that you have added a hash for each font, maybe it would be possible to
> > cache the results for memory fonts as well? Hashing actually makes loading a
> > font even slower, but since you're doing it anyway, maybe it could be used
> > for that purpose as well.
> 
> Yeah, maybe we can have a cache of individual fonts based on hashes.  Then
> we can rebuild directory caches based on those individual fonts much faster.
> 
> For memory fonts, is it really a big deal?  Why do people query those anyway?

Yes, it's a big deal. There are many use cases for memory fonts, such as embedding fonts in Matroska (mkv) files, web-fonts, fonts embedded in documents, special application-specific fonts.

By the way, I think adding font hashes just made font scanning slower for an obscure feature that probably will have only one user at best...
Comment 8 Behdad Esfahbod 2013-11-16 03:04:29 UTC
(In reply to comment #7)
>
> > For memory fonts, is it really a big deal?  Why do people query those anyway?
> 
> Yes, it's a big deal. There are many use cases for memory fonts, such as
> embedding fonts in Matroska (mkv) files, web-fonts, fonts embedded in
> documents, special application-specific fonts.

Most of those use cases don't need to pass the font to fontconfig at all.  They already have the font, they should just use it.
Comment 9 nfxjfg 2013-11-22 12:45:42 UTC
(In reply to comment #8)
> (In reply to comment #7)
> >
> > > For memory fonts, is it really a big deal?  Why do people query those anyway?
> > 
> > Yes, it's a big deal. There are many use cases for memory fonts, such as
> > embedding fonts in Matroska (mkv) files, web-fonts, fonts embedded in
> > documents, special application-specific fonts.
> 
> Most those usecases don't need to pass the font to fontconfig at all.  They
> already have the font, they should just use it.

That's not really true. They might have to do this because the text rendering framework they're using uses fontconfig, because they want substitution of missing characters, or because they get a set of fonts and still need to do font selection.
Comment 10 Behdad Esfahbod 2013-11-22 15:59:19 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > (In reply to comment #7)
> > >
> > > > For memory fonts, is it really a big deal?  Why do people query those anyway?
> > > 
> > > Yes, it's a big deal. There are many use cases for memory fonts, such as
> > > embedding fonts in Matroska (mkv) files, web-fonts, fonts embedded in
> > > documents, special application-specific fonts.
> > 
> > Most those usecases don't need to pass the font to fontconfig at all.  They
> > already have the font, they should just use it.
> 
> That's not really true. They might have to do this because the text
> rendering framework they're using uses fontconfig, because they want
> substitution of missing characters, or because they get a set of fonts and
> still need to do font selection.

Then I'd like to hear about such use cases.  That's what I'm saying.
Comment 11 nfxjfg 2013-11-22 16:16:20 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > (In reply to comment #8)
> > > (In reply to comment #7)
> > > >
> > > > > For memory fonts, is it really a big deal?  Why do people query those anyway?
> > > > 
> > > > Yes, it's a big deal. There are many use cases for memory fonts, such as
> > > > embedding fonts in Matroska (mkv) files, web-fonts, fonts embedded in
> > > > documents, special application-specific fonts.
> > > 
> > > Most those usecases don't need to pass the font to fontconfig at all.  They
> > > already have the font, they should just use it.
> > 
> > That's not really true. They might have to do this because the text
> > rendering framework they're using uses fontconfig, because they want
> > substitution of missing characters, or because they get a set of fonts and
> > still need to do font selection.
> 
> Then I like to hear about such usecases.  That's what I'm saying.

I already listed them above...
Comment 12 Akira TAGOH 2013-12-04 09:54:51 UTC
(In reply to comment #11)
> I already listed them above...

So are you saying fontconfig could provide much better fonts than the memory fonts? Isn't the embedded font supposed to be the better one, or at least the font the document author expects?
Comment 13 nfxjfg 2013-12-05 23:43:58 UTC
(In reply to comment #12)
> So are you saying fontconfig could provide much better fonts than memory
> fonts? isn't it supposed to be better or an expected font for document
> author at least?

Not sure what you mean. They're using fontconfig to select a font from multiple fonts embedded in a document, with fallback to system fonts.
Comment 14 Akira TAGOH 2013-12-06 03:31:01 UTC
(In reply to comment #13)
> (In reply to comment #12)
> > So are you saying fontconfig could provide much better fonts than memory
> > fonts? isn't it supposed to be better or an expected font for document
> > author at least?
> 
> Not sure what you mean. They're using fontconfig to select a font from
> multiple fonts embedded in a document, with fallback to system fonts.

Well, I have never seen any examples that embed extra fonts and allow selecting a font from among them. What for? Am I getting confused?

AFAIK they embed a font to provide the same representation on different platforms. I don't see the need if one wants to use the system fonts instead of the embedded fonts, nor for embedding extra fonts which may not be used in some cases.
Comment 15 Martin Herkt 2013-12-06 10:44:03 UTC
(In reply to comment #14)
> Well, I have never seen any examples that embedded extra fonts and allowing
> to select a font from those. for what? am I getting confused?
> 
> AFAIK they embed a font to provide same representation on different
> platforms. I don't see any needs if one wants to use the system fonts
> instead of the embedded fonts nor embedding extra fonts which may not be
> used in some cases.

Okay, let me try to explain this to you with a practical example.
There is a widely used subtitle format which only stores the names of the desired font family in its styles. Fonts are typically stored along with the subtitles, video and audio tracks in a container format, but not always. At playback time, you don't know which (if any!) of these files belong to the required font families, and you can't be sure they cover all the required glyphs, and you need to find the right one in a way similar to Windows because that's what most of those scripts expect. That is why one would query memory fonts with fontconfig.
Comment 16 Akira TAGOH 2013-12-06 11:15:53 UTC
(In reply to comment #15)
> Okay, let me try to explain this to you with a practical example.
> There is a widely used subtitle format which only stores the names of the
> desired font family in its styles. Fonts are typically stored along with the
> subtitles, video and audio tracks in a container format, but not always. At
> playback time, you don't know which (if any!) of these files belong to the
> required font families, and you can't be sure they cover all the required
> glyphs, and you need to find the right one in a way similar to Windows
> because that's what most of those scripts expect. That is why one would
> query memory fonts with fontconfig.

I don't think that is the memory fonts we are talking about. As you already admitted above, it just contains the name of a font family and does not embed the font itself. It looks similar to what most rich-format documents do, as in LibreOffice. Those aren't memory fonts.
Comment 17 Martin Herkt 2013-12-06 14:17:23 UTC
(In reply to comment #16)
>it just contains a name of font family, not embedding a font itself

It seems some clarification is necessary. Font files are attached in the media container (in addition, the subtitle file itself may actually contain embedded fonts as UUencoded files). We load those in memory.
Comment 18 Martin Herkt 2013-12-06 15:28:26 UTC
Why did this even turn into a debate about whether there are valid use cases for in-memory font queries? Regardless of that, the matter of fact is that fontconfig's scanning is slow enough to be a major nuisance to users even if there are no in-memory fonts - enough to completely break usability on non-Linux platforms (let me take this moment to remind you of http://lists.freedesktop.org/archives/fontconfig/2009-May/003156.html since this still happens very frequently and leads to people claiming that software crashes on their systems when it's really just fontconfig freezing the thing).

At this point, what fontconfig is used for is completely irrelevant, since the problem always manifests itself regardless of use case. This is serious enough that it must be solved by fontconfig/freetype, and the end user or application that uses fontconfig should not be expected to work around this deficiency.
Comment 19 Behdad Esfahbod 2014-05-08 18:31:42 UTC
I suggest we remove the "load every glyph to check the outline is not empty" part of fontconfig.

There are very legitimate uses of such empty glyphs in fonts that have GSUB.

Akira, do you agree?  This should give us a huge speedup.  I can work on that.
Comment 20 Akira TAGOH 2014-05-09 02:51:31 UTC
(In reply to comment #19)
> I suggest we remove the "load every glyph to check the outline is not empty"
> part of fontconfig.
> 
> There are very legitimate uses of such empty glyphs in fonts that have GSUB.
> 
> Akira, do you agree?  This should give us a huge speedup.  I can work on
> that.

If that can be done, it sounds really nice, but how would fontconfig estimate the charset coverage then?
Comment 21 Behdad Esfahbod 2014-05-09 16:12:13 UTC
(In reply to comment #20)
> (In reply to comment #19)
> > I suggest we remove the "load every glyph to check the outline is not empty"
> > part of fontconfig.
> > 
> > There are very legitimate uses of such empty glyphs in fonts that have GSUB.
> > 
> > Akira, do you agree?  This should give us a huge speedup.  I can work on
> > that.
> 
> If that can be done, that really sounds nice though, how does fontconfig
> estimate the charset coverage then?

Just accept whatever character is mapped in cmap.  The current code does that but then *also* checks the glyph outline of those mapped glyphs, and rejects any that has an empty outline, unless it's listed in the <blanks> tag.
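
To make the difference between the two policies concrete, here is a minimal sketch in plain C. It is not fontconfig code: the `MappedChar` record and both function names are hypothetical, and the outline check is reduced to a precomputed flag. It assumes glyph index 0 means "not mapped in cmap".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one cmap entry; not a fontconfig type. */
typedef struct {
    uint32_t ucs4;        /* Unicode code point */
    uint32_t glyph;       /* glyph index from cmap; 0 means "not mapped" */
    bool     has_outline; /* result of the (expensive) outline check */
} MappedChar;

/* Linear search of the <blanks> list: code points allowed to be empty. */
static bool in_blanks(uint32_t ucs4, const uint32_t *blanks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (blanks[i] == ucs4)
            return true;
    return false;
}

/* Behaviour described above: mapped in cmap AND (non-empty OR a listed blank). */
bool covered_with_outline_check(const MappedChar *c,
                                const uint32_t *blanks, size_t n)
{
    assert(c != NULL);
    if (c->glyph == 0)
        return false;
    return c->has_outline || in_blanks(c->ucs4, blanks, n);
}

/* Proposed simplification: trust cmap and skip the outline check. */
bool covered_cmap_only(const MappedChar *c)
{
    assert(c != NULL);
    return c->glyph != 0;
}
```

The two functions only disagree for a mapped code point whose glyph is empty and which is not listed in blanks, which is exactly the "bogus font" case the outline check was meant to catch.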

I'll ask Keith to comment here, but I'm guessing that 15 years ago bogus fonts were more common...

Or, we can easily change the code and test lots of fonts to see what kind of difference this makes at all.
Comment 22 Keith Packard 2014-05-09 17:08:37 UTC
I added the outline scanning code because we found many font files that were generated by subsetting a large font with extensive unicode coverage. The subsetting process would leave the code point map in place and only elide the actual glyph data.  So, the only way we could discover the true coverage of the font was to look at the actual glyphs.

If the failure mode was to simply select the wrong font for some code points, that might be fine. However, the failure mode is instead to display blank spaces for the missing glyphs.

These were not hand-hacked fonts, but rather commercial fonts from semi-reputable font foundries. The commit when this was added appears to be lost in time; having occurred in libXft, before fontconfig was split out back in 2002.

I'd suggest figuring out if FreeType could return this data faster than by parsing the whole outline; presumably there's a table present which indicates where the outline would be and how long it is?

As for speeding up re-scanning directories, the obvious solution would be to re-use the old cached data for fonts which haven't changed and copy that to the new cache file.
Comment 23 Behdad Esfahbod 2014-05-09 19:22:02 UTC
Ok, if that's still a valid reason, what we can do is parse loca/glyf tables ourselves.  For CFF, still rely on FreeType.  That should be fast enough for all purposes.

In fact, it's enough to parse loca only and walk over it.  I'll give it a try some time.
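
A minimal sketch of what "walk over loca" could look like, in plain C with no FreeType or fontconfig calls. The function names are hypothetical. It assumes the caller has already located the raw 'loca' table bytes, knows the glyph count, and knows whether the font uses the long format (head.indexToLocFormat of 1) or the short format (0, where each entry stores the offset divided by two). A glyph's data length is the difference between its loca entry and the next one, so equal neighbours mean the glyph has no data in 'glyf'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Font tables are big-endian. */
static uint32_t be16(const uint8_t *p)
{
    return ((uint32_t)p[0] << 8) | p[1];
}

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/* Byte offset of glyph i inside 'glyf'. Short format stores offset / 2. */
static uint32_t loca_offset(const uint8_t *loca, bool long_format, uint32_t i)
{
    return long_format ? be32(loca + 4u * i) : 2u * be16(loca + 2u * i);
}

/*
 * True if the glyph has a non-zero-length entry in 'glyf'.
 * 'loca' holds num_glyphs + 1 entries; a truncated table or an
 * out-of-range glyph index is reported as "no data".
 */
bool glyph_has_data(const uint8_t *loca, size_t loca_len, bool long_format,
                    uint32_t num_glyphs, uint32_t glyph)
{
    size_t entry_size = long_format ? 4 : 2;

    assert(loca != NULL);
    if (glyph >= num_glyphs)
        return false;
    if (((size_t)num_glyphs + 1) * entry_size > loca_len)
        return false;
    return loca_offset(loca, long_format, glyph + 1) >
           loca_offset(loca, long_format, glyph);
}
```

This only tells whether a glyph record exists. A non-zero length does not prove that the glyph draws anything, so the check is cheaper than loading the glyph but also weaker.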
Comment 24 Behdad Esfahbod 2014-05-16 21:26:56 UTC
Ok, I looked into this a bit.  First, I made some cleanups to the FC_HASH code.  It's as streamlined as it can be right now.  That said:

  * We spend more than half of the time in computing the hash,

  * Of the remaining time, we spend half verifying that each glyph has an outline,

The rest I suppose we spend computing the languages covered by the font...

So.  I think we might want to:

  * Reconsider the FC_HASH.  I could see a use for FC_HASH for speeding up cache rebuild, but given that querying the rest of the font is faster than computing the hash, it makes no sense.  So we're left with a FC_HASH that some clients might find useful, but so far we don't have anyone actually using it.  So we might as well remove it.

  * Alternatively, keep FC_HASH, but not compute it if file=NULL is passed to FcFreeTypeQueryFace.  Previously, file=NULL was NOT supported and one had to pass file="" or something like that.  I just made file=NULL happily not add FC_FILE.  I think we can use the same as a signal to skip FC_HASH calculation.  That helps with the "make querying memory fonts faster" goal at least, but not quite with the cache generation speed,

  * If font has loca/glyf tables, access those directly instead of loading glyphs from FreeType,

  * See where the rest of the time is going and speed that up.

What do people think about FC_HASH?  Should we just remove it?
Comment 25 nfxjfg 2014-05-16 21:32:39 UTC
> What do people think about FC_HASH?  Should we just remove it?

Well, AFAIK you just introduced it recently (breaking everything for everyone for a while), but yes, you should remove it again. Who the hell uses it, and why do they have to use fontconfig for that? IMHO that was a fringe-case feature request that made everything worse for 99% of all people, while probably making a single person happy.
Comment 26 Behdad Esfahbod 2014-05-16 21:42:51 UTC
Err.   Disabling the 2s wait in fc-cache, new numbers look much worse.  Caching a collection of > 1000 fonts in one single directory:

  - Currently: 8.5s
  - Without hash: 3s
  - Without FcFreeTypeCheckGlyph: 0.25s

I suggest we do this:

  - Deprecate and NOT compute FC_HASH,

  - Use loca table to reject glyphs with no outline.  For non-TrueType (CFF, bdf, pcf, etc) just accept whatever the font claims it covers,

  - Remove the 2s wait if file system is not FAT.  This one is not really important, but nice to have.

Doing these all makes querying memory fonts so fast that I won't hesitate recommending it to people anymore.

Unless someone has objections, I'll go ahead and push these out.
Comment 27 nfxjfg 2014-05-16 21:49:43 UTC
May I ask why there's a 2 seconds wait?
Comment 28 Behdad Esfahbod 2014-05-16 21:54:31 UTC
(In reply to comment #27)
> May I ask why there's a 2 seconds wait?

From fc-cache.c:

    /* 
     * Now we need to sleep a second  (or two, to be extra sure), to make
     * sure that timestamps for changes after this run of fc-cache are later
     * then any timestamps we wrote.  We don't use gettimeofday() because
     * sleep(3) can't be interrupted by a signal here -- this isn't in the
     * library, and there aren't any signals flying around here.
     */
    /* the resolution of mtime on FAT is 2 seconds */
    if (changed)
        sleep (2);
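
To see why the length of the wait depends on the timestamp resolution, here is a small model in plain C. It is not taken from fontconfig: the function names are hypothetical, timestamps are plain second counts, the file system is assumed to round stored times down to its granularity, and a font change is assumed to be noticed only if its stored mtime is strictly later than the stored time of the cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Timestamp as a file system with the given granularity would store it
 * (assumption: rounds down; valid for non-negative times). */
int64_t stored_mtime(int64_t seconds, int64_t granularity)
{
    assert(seconds >= 0 && granularity > 0);
    return seconds - (seconds % granularity);
}

/* A later change is detectable only if its stored mtime is strictly
 * greater than the stored mtime of the cache. */
bool change_is_visible(int64_t cache_written, int64_t font_changed,
                       int64_t granularity)
{
    return stored_mtime(font_changed, granularity) >
           stored_mtime(cache_written, granularity);
}
```

Under these assumptions, with a 2-second granularity a cache written at t=100 cannot be told apart from a font changed at t=101, while any change at t=102 or later is visible. With a 1-second granularity a change in the same second is still missed, which matches the "a second (or two, to be extra sure)" wording of the comment.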
Comment 29 L A Walsh 2014-05-19 15:16:57 UTC
(In reply to comment #3)
> I have no idea how this can be improved though.
----
Don't do it.

Only scan the files that are newer than the directory change time.
If no files are newer, no update.

If 1 file is newer (out of 3000), you copy the original cache file to a new
file using a read/write size of 16MB-256MB.

Then add the new entry at the point where it needs to be added (and delete
its old cached copy), all in 1 write to the new file, then copy the rest
of the file in large chunks.

Reconstructing the font cache on my system takes 20-25 minutes if I 
have 32-bit and 64-bit libraries.  With just 64-bit libs, it's a bit
over 10, with 32 being closer to 15.  Cygwin (which has access to
all of window's fonts) takes 15-20 on 32-bit, haven't tried it since I
switched to 64-bit.

Note -- this wait usually happens at inconvenient moments, like when I
go to start 'gvim' to do any work.  Vs. having it be done in background w/out
me waiting on it.

No reason why I should have to wait for the fontcache to be rebuilt.
Look at the design of 'locate'.  It can take half an hour to run on my system (multiple TB disks).  The old database remains and is usable by users while
the new one is built.  Only when the new one is finished and ready for use
is it moved in to replace the old version.  
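
A minimal sketch of that "build elsewhere, then swap in" pattern in standard C. The function name and the ".new" suffix are hypothetical, and this is not how fontconfig writes its caches. It assumes a POSIX system, where rename() replaces an existing destination atomically; on other platforms rename() may fail if the destination exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Write the new cache contents to "<path>.new", then rename it over
 * <path>. Readers that open <path> see either the complete old file or
 * the complete new one, never a half-written cache.
 * Returns 0 on success, -1 on failure (the old file is left untouched).
 */
int publish_cache(const char *path, const char *contents)
{
    char tmp[1024];

    assert(path != NULL && contents != NULL);

    int n = snprintf(tmp, sizeof tmp, "%s.new", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    FILE *f = fopen(tmp, "wb");
    if (f == NULL)
        return -1;

    size_t len = strlen(contents);
    bool ok = fwrite(contents, 1, len, f) == len;
    if (fclose(f) != 0)
        ok = false;

    if (ok && rename(tmp, path) == 0)
        return 0;

    remove(tmp); /* clean up the partial file on any failure */
    return -1;
}
```

A process that already has the old file open or mapped keeps using it, so the rebuild could in principle run in the background while applications start with the old cache.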

If you are scanning the dir and produce a copy so you are just changing
cached-content for new files, you'll be building your new copy in a
new location -- so any processes that start before it is built, will use
the old cache.

I'd like to point out, that font-data is architecture independent.  Turning
it into architecture *dependent* data that needs to be reconstructed for
32 and 64 bit on the same machine is a horrible step backwards in compatibility.

Computing any type of digest will slow things down tremendously.  As an example, I refer to my daily backups.  Using ***any type*** of compression (the lowest gzip settings were the last I tested, years ago, but lzop isn't
that much of an improvement) slowed backups down to between 5-10MB/s.
Without any compression -- ~200-250MB/s.  I gave up on compression and use
5 rotating buffers of 1GB each (notes in my backup script showed
1GB to be best: 512m => 220MB/s, 1g => 250MB/s, 2g => ~230MB/s).

Unless I'm sadly very mistaken, I'm pretty sure there are numerous ways
to speed up fontconfig.

Note -- some vendors are starting to install 1 font/directory.

That's only going to make the problem worse -- having to read through 3000
cache files in 3000 different dirs??   ARG!!!
Comment 30 Behdad Esfahbod 2014-07-04 17:57:12 UTC
(In reply to comment #22)
> I added the outline scanning code because we found many font files that were
> generated by subsetting a large font with extensive unicode coverage. The
> subsetting process would leave the code point map in place and only elide
> the actual glyph data.  So, the only way we could discover the true coverage
> of the font was to look at the actual glyphs.
> 
> If the failure mode was to simply select the wrong font for some code
> points, that might be fine. However, the failure mode is instead to display
> blank spaces for the missing glyphs.
> 
> These were not hand-hacked fonts, but rather commercial fonts from
> semi-reputable font foundries. The commit when this was added appears to be
> lost in time; having occurred in libXft, before fontconfig was split out
> back in 2002.
> 
> I'd suggest figuring out if FreeType could return this data faster than by
> parsing the whole outline; presumably there's a table present which
> indicates where the outline would be and how long it is?

Ok, I did a patch to not check outline for CFF fonts and only check the glyph offsets in 'loca' table for TrueType fonts.  This has significant speedup benefit mainly because of I/O.  It ends up reading less than 5 / 10 percent of the font, so it was a huge boost.

This approach correctly takes care of simple TrueType glyphs with no outline.  But I found many fonts have composite glyphs that reference a glyph that has no outlines.  Currently I can't detect that.  Another group was CFF chars mapping to cid1 which has no outline.  I might be able to special-case that, but am not very confident about it.

As such, I don't think this approach actually works.  But even with warm caches it's still a huge speedup.  For some 22k fonts:

BEFORE
real	1m18.376s
user	1m12.497s
sys	0m3.346s
AFTER
real	0m18.118s
user	0m12.997s
sys	0m2.654s

That suggests that there's still possible value in speeding up the check, either within FT_Load_Glyph, OR by implementing 'glyf' table access in fontconfig, ignoring bitmap fonts, and calling into FreeType for CFF fonts.  I'll give that a try.
Comment 31 Behdad Esfahbod 2014-07-04 17:58:07 UTC
Created attachment 102281
WIP Patch
Comment 32 Behdad Esfahbod 2014-07-04 18:16:16 UTC
(In reply to comment #30)

> That suggests that there's still possible value in speeding up the check
> either within FT_Load_Glyph, OR, implementing 'glyf' table access in
> fontconfig, ignore bitmap fonts, and call into FreeType for CFF fonts.  I'll
> give that a try.

Though, maybe not immediately...  Removing FC_HASH was a huge speed improvement so we're good for a while...
Comment 33 Timo Teräs 2014-11-24 11:17:31 UTC
(In reply to L A Walsh from comment #29)
> (In reply to comment #3)
> > I have no idea how this can be improved though.
> ----
> Don't do it.
> 
> Only scan the files that are newer than the directory change time.
> If no files are newer, no update.

Please don't do this. When fonts are installed via a package manager (dpkg, apt, yum, whatever), the new font file's timestamp is set to the packaging time, while the directory mtime is updated at install time. A newly installed file can therefore look older than the directory.

If possible use individual file's mtime and size, and compare against values stored in cache. If either differs, the font file is dirty and needs recalculation.
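
A minimal sketch of that per-file check in plain C. The `FileStamp` record and the function names are hypothetical and do not reflect the actual fontconfig cache layout. The point is that the comparison uses inequality rather than "newer than", so a file whose mtime went backwards (for example, set to packaging time) is still detected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-file record stored alongside the cached font data. */
typedef struct {
    const char *name;  /* file name within the font directory */
    int64_t     mtime; /* modification time, in seconds */
    int64_t     size;  /* file size, in bytes */
} FileStamp;

static const FileStamp *find_cached(const FileStamp *cache, size_t n,
                                    const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(cache[i].name, name) == 0)
            return &cache[i];
    return NULL;
}

/* A file needs rescanning if it is unknown to the cache, or if either
 * its mtime or its size differs from the cached values. */
bool font_is_dirty(const FileStamp *cache, size_t n, const FileStamp *on_disk)
{
    assert(on_disk != NULL);
    const FileStamp *old = find_cached(cache, n, on_disk->name);
    if (old == NULL)
        return true;
    return old->mtime != on_disk->mtime || old->size != on_disk->size;
}

/* Number of files in the directory listing that must be rescanned;
 * everything else could reuse its old cache entry. */
size_t count_dirty(const FileStamp *cache, size_t n_cache,
                   const FileStamp *disk, size_t n_disk)
{
    size_t dirty = 0;
    for (size_t i = 0; i < n_disk; i++)
        if (font_is_dirty(cache, n_cache, &disk[i]))
            dirty++;
    return dirty;
}
```

Entries that are in the old cache but no longer on disk would simply not be copied to the new cache; that part is left out of the sketch.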
Comment 34 nfxjfg 2014-11-24 11:40:50 UTC
To be fair, font scanning should just be fast. Even if the wait time is only seconds on the very first scan, the initial impression of your software might be ruined, and the user will look for something else. Taking dozens of minutes is a damn joke.

Fontconfig is really a serious liability. In libass, we've been trying to get rid of it on win32 and osx for a while now.
Comment 35 Behdad Esfahbod 2014-11-24 18:48:00 UTC
(In reply to nfxjfg from comment #34)
> To be fair, font scanning should just be fast. Even if the wait time is only
> seconds on the very first scan, the initial impression of your software
> might be ruined, and the user will look for something else. Taking dozens of
> minutes is a damn joke.
> 
> Fontconfig is really a serious liability. In libass, we've been trying to
> get rid of it on win32 and osx for a while now.

It wasn't designed to be used on win32 and osx.
Comment 36 nfxjfg 2014-11-24 18:58:59 UTC
(In reply to Behdad Esfahbod from comment #35)
> It wasn't designed to be used on win32 and osx.

I understand that, but it creates big problems for portable software.

In fact, it seems serious portable software should invent its own font abstraction layer, instead of using fontconfig&freetype. It would be nice if that were not the case.

Also, the fontconfig slowness actually does cause problems on Linux too. libass is slow at using external fonts embedded in subs (which happens frequently with the kind of files libass is meant to work with), and I bet there are other use cases where this can become a problem, such as webfonts.
Comment 37 Behdad Esfahbod 2014-11-24 21:50:01 UTC
(In reply to nfxjfg from comment #36)
> (In reply to Behdad Esfahbod from comment #35)
> > It wasn't designed to be used on win32 and osx.
> 
> I understand that, but it creates big problems for portable software.
> 
> In fact, it seems serious portable software should invent its own font
> abstraction layer, instead of using fontconfig&freetype. It would be nice if
> that were not the case.

You can always use Pango.


> Also, the fontconfig slowness actually does cause problems on Linux too.
> libass is slow at using external fonts embedded in subs (which happen
> frequently with the kind of files libass is meant to work with), and I bet
> there are other use-cases where this can become a problem, such as webfonts.

Not going to repeat what I said multiple times before.
Comment 38 Alex Thurgood 2015-01-03 17:38:04 UTC
Adding self to CC if not already on
Comment 39 Behdad Esfahbod 2015-05-12 19:23:51 UTC
For reference, I discussed using custom fonts with pango+fontconfig in detail here:
http://mces.blogspot.ca/2015/05/how-to-use-custom-application-fonts.html
Comment 40 Timo Teräs 2015-10-12 10:24:01 UTC
(In reply to Behdad Esfahbod from comment #26)
> I suggest we do this:
> 
>   - Deprecate and NOT compute FC_HASH,

This was pushed earlier, nice!

>   - Use loca table to reject glyphs with no outline.  For non-TrueType (CFF,
> bdf, pcf, etc) just accept whatever the font claims it covers,

Was this done already or not?

>   - Remove the 2s wait if file system is not FAT.  This one is not really
> important, but nice to have.

This is not done yet, and I'd really like it to go in before the next release. Would it be possible to push this out? I see that a recent commit now even takes the nanosecond field of mtime into account... this two-second delay is pretty harmful. If FAT is still a concern, there should be additional code to 'touch' the file until the time returned by stat() changes, or something similar.

> Doing these all makes querying memory fonts so fast that I won't hesitate
> recommending it to people anymore.
> 
> Unless someone has objections, I'll go ahead and push these out.

Please push them out :)
Comment 41 nfxjfg 2015-10-12 11:00:15 UTC
libass finally "solved" this problem by implementing native backends on OSX/win. Even on Linux, we've got a big performance win by not relying on fontconfig for memory fonts.
Comment 42 Akira TAGOH 2015-10-15 07:14:02 UTC
(In reply to Timo Teräs from comment #40)
> >   - Remove the 2s wait if file system is not FAT.  This one is not really
> > important, but nice to have.
> 
> This is not done yet. And I'd really like this to go in before the next
> release. Would it be possible to push this out? I see recent commit now even
> taking into account the nanosecond field of mtime... this two second delay
> is pretty much harmful. If FAT is of concern still, it should have
> additional code to 'touch' the file until the stat() returned time is
> changed, or something similar.

Hmm, it is easier to keep the wait only when the targeted FS is FAT. The current code has enough functionality to do that. The problem is that the wait itself does not seem to be meaningless: fc-cache isn't a singleton process, and it can easily conflict with other processes in the current code base.
Still, it might be better than doing nothing.
If anyone has a better idea, suggestions are welcome.
Otherwise, I may go ahead with checking the fs type as a workaround to decide whether fc-cache should wait for 2s.
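
As a rough illustration of the "check the fs type, then decide on the 2s wait" idea, here is a Linux-only Python sketch. It is not how fontconfig does it: the helper names, the use of /proc/mounts-style text as input, and the set of FAT type names are all assumptions for this example. The 2-second figure comes from FAT's modification-time granularity.

```python
# Filesystem type names treated as FAT here (assumption: Linux naming).
FAT_TYPES = {"vfat", "msdos"}

def fs_type_for(path, mounts_text):
    """Return the fs type of the longest mount point containing `path`.

    `mounts_text` is text in /proc/mounts format:
    device mountpoint fstype options dump pass
    Escaped characters in mount points (e.g. \\040) are not handled.
    """
    best_mp, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mp, fstype = fields[1], fields[2]
        prefix = mp.rstrip("/") + "/"
        inside = path == mp or path.startswith(prefix)
        # Prefer the deepest mount point; later entries win ties.
        if inside and len(mp) >= len(best_mp):
            best_mp, best_type = mp, fstype
    return best_type

def wait_seconds(path, mounts_text):
    """Wait 2s only on FAT, where mtime has 2-second granularity."""
    return 2.0 if fs_type_for(path, mounts_text) in FAT_TYPES else 0.0
```

On a real system the second argument could be the contents of /proc/mounts; passing it in as text keeps the sketch easy to try without touching the system.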
Comment 43 Behdad Esfahbod 2015-10-21 13:37:14 UTC
(In reply to Timo Teräs from comment #40)
> (In reply to Behdad Esfahbod from comment #26)

> >   - Use loca table to reject glyphs with no outline.  For non-TrueType (CFF,
> > bdf, pcf, etc) just accept whatever the font claims it covers,
> 
> Was this done already or not?

No, it proved to be problematic.
Comment 44 Mingye Wang (Arthur2e5) 2016-03-06 06:20:10 UTC
Regarding FC_HASH for OpenType and TrueType fonts, would checkSumAdjustment help?

Well, for me, everything I want from such checksumming can be covered by the PostScript Name and the Version Number of the font (at least for OpenType and TrueType)...
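
For readers unfamiliar with the field: checkSumAdjustment is a 32-bit value stored in the font's 'head' table, right after the table version and fontRevision. The Python sketch below only shows where the field lives. It assumes the caller has already extracted the raw 'head' table bytes; the function name is hypothetical, and nothing here implies the value is a reliable font identity.

```python
import struct

HEAD_MAGIC = 0x5F0F3CF5  # magicNumber defined for the 'head' table

def head_checksum_adjustment(head):
    """Read checkSumAdjustment from raw 'head' table bytes.

    Layout of the first 16 bytes (big-endian):
    majorVersion uint16, minorVersion uint16, fontRevision Fixed,
    checkSumAdjustment uint32, magicNumber uint32.
    """
    if len(head) < 16:
        raise ValueError("head table too short")
    _major, _minor, _revision, adjustment, magic = struct.unpack_from(
        ">HHiII", head, 0
    )
    if magic != HEAD_MAGIC:
        raise ValueError("bad magic number in head table")
    return adjustment
```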
Comment 45 Behdad Esfahbod 2016-03-06 18:39:07 UTC
(In reply to Arthur Wang from comment #44)
> Regarding FC_HASH for OpenType and TrueType fonts, will checkSumAdjustment
> help?
> 
> Well, to me all I want from such checksuming can be covered by PostScript
> Name and the Version Number of the Font (at least for OpenType and
> TrueType)...

None of those are guaranteed to be unique or set correctly to begin with...
Comment 46 Dan Kegel 2016-09-08 03:48:07 UTC
[Fun fact: fc-cache took over two minutes to run just now;
installing a fresh copy of ubuntu 16.04 on a netbook (amd e-350)
and then doing 'apt update; apt dist-upgrade' triggered it.
If this box had an SSD, font scanning might even have been a noticeable bottleneck.]
Comment 47 Behdad Esfahbod 2017-07-18 23:44:57 UTC
Motivated by bug 101820, I'd like to revive this work.

There are two options:

  - Do not bother detecting glyphs with no outline.  This will cause some broken fonts to result in inferior rendering.  I'm not sure how bad that is in practice.

  - Try to detect glyphs with no outline, but implement it ourselves instead of loading using FreeType.  Let's discuss CFF/TrueType separately.  Read https://bugs.freedesktop.org/show_bug.cgi?id=64766#c30 first to refresh.

    * TrueType: trying to detect empty composites would mean loading the glyf table and reading every glyph.  This will negate any I/O gains, but will still provide a speedup.  Alternatively, we can do what my previous patch did: detect and ignore empty single glyphs, but don't worry about empty composite glyphs.  That way we just read the loca table and not glyf.

    * CFF: Implementing full loading is not feasible.  However, maybe we can detect glyphs mapping to cid1 and only check that glyph.  For everything else assume it's non-empty.  Another approach would be to load the CharString INDEX and use that to detect empty glyphs (1-byte charstring).  That's similar to using the loca table only.  Not sure if this detects any cases in the wild.
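
To make the two table-only checks above concrete, here is a small Python sketch. It is an illustration under stated assumptions, not the patch discussed in this bug: the function names are made up, and the caller is assumed to have already extracted the raw loca table bytes (plus numGlyphs and the offset format from maxp/head) or the raw CharStrings INDEX bytes. In loca, a glyph has no outline data when its offset equals the next glyph's offset; in a CFF INDEX, an entry's length is the difference between consecutive offsets.

```python
import struct

def empty_glyphs_from_loca(loca, num_glyphs, long_format):
    """Return glyph IDs whose glyf entry has zero length.

    loca holds num_glyphs + 1 offsets. The short format stores
    offset / 2 as uint16; the long format stores uint32 offsets.
    """
    n = num_glyphs + 1
    if long_format:
        offsets = struct.unpack_from(">%dI" % n, loca, 0)
    else:
        offsets = [2 * o for o in struct.unpack_from(">%dH" % n, loca, 0)]
    return [g for g in range(num_glyphs) if offsets[g] == offsets[g + 1]]

def one_byte_charstrings(index):
    """Return indices of 1-byte entries in a CFF INDEX structure.

    INDEX layout: count (uint16), offSize (uint8),
    count + 1 offsets of offSize bytes each, then the data.
    """
    count = struct.unpack_from(">H", index, 0)[0]
    if count == 0:
        return []
    off_size = index[2]
    offsets = [
        int.from_bytes(index[3 + i * off_size: 3 + (i + 1) * off_size], "big")
        for i in range(count + 1)
    ]
    return [i for i in range(count) if offsets[i + 1] - offsets[i] == 1]
```

Neither function reads any outline data, which is the point: only the offset arrays are touched. As noted above for TrueType, this cannot see composite glyphs whose components are all empty.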

Then again, as Khaled has pointed out to me, this empty-glyph check is a liability for fonts that want to decompose glyphs into components, as the nominal glyph cannot be empty or fontconfig won't detect language coverage correctly.

So, maybe it is time to let this test go...  I'd like to hear what others think before going ahead and implementing this.  If we completely remove the check, that should make fontconfig scanning, like, over 10 times faster easily...
Comment 48 Akira TAGOH 2017-07-30 15:01:23 UTC
Just out of curiosity, isn't it feasible and realistic to modify FreeType itself to load things on demand rather than everything? If the existing code can be improved instead of implementing similar things again, that would be nice.
Comment 49 Behdad Esfahbod 2017-08-01 02:55:09 UTC
I also want to note that decoding color fonts is bringing fontconfig to its knees!

Somehow older versions of fontconfig (2.11.0; with a FreeType that *does* understand color fonts) also like to rescan my color fonts in each run, don't know why. :(  This means that any fontconfig-using app is currently taking five seconds to start for me.
Comment 50 Behdad Esfahbod 2017-08-01 03:00:49 UTC
(In reply to Akira TAGOH from comment #48)
> Just out of curiousity, modifying FreeType itself to load things on-demand
> but everything, isn't feasible and realistic? if it can be improved on the
> existing one instead of implementing similar things again, that would be
> nice though.

Things *are* loaded on demand in FreeType.  But we currently load every glyph.  That's what we are trying to avoid.
Comment 51 Behdad Esfahbod 2017-08-01 03:10:06 UTC
Akira, do you by any chance remember under what circumstances fontconfig 2.11.0 would create a cache file that it would then find invalid?  I'm hitting that and would like to track down which of the files in my ~/.fonts is causing it.  Thanks.
Comment 52 Behdad Esfahbod 2017-08-01 03:23:49 UTC
Humm.  For me, SFNSDisplay.ttf was causing it.  I had copied it into ~/.fonts to debug https://bugs.freedesktop.org/show_bug.cgi?id=101159

But what we thought was causing that bug did not exist in 2.11.0.  Baffled.  I also wonder if some of the "update storm that freezes desktop" issue is similar to this: creating invalid cache file...
Comment 53 Akira TAGOH 2017-08-01 05:27:05 UTC
Behdad, the only thing I remember around that version is that there were some attempts to address the race when updating caches. In the latest release (2.12.4 at this moment), most of those races should be gone thanks to checking the nanosecond field of mtime. As for the lag, the lag caused by that font should be fixed in 2.12.3, as the original reporter confirmed.
Comment 54 Behdad Esfahbod 2017-08-01 05:30:43 UTC
Thanks.  I'm specifically talking about SFNSDisplay.ttf causing invalid cache creation with 2.11.0 (and up to last'ish release).  Would be nice to track down what was causing it.  I believe that might be what others are hitting as well.  I don't think it's just the race...
Comment 55 Akira TAGOH 2017-08-01 05:47:38 UTC
Oh, sure. Then the first one should be a different issue; just ignore it. Anyway, it would be nice to update the base...
Comment 56 Behdad Esfahbod 2017-08-04 18:07:40 UTC
I pushed a proposed branch here:

https://github.com/behdad/fontconfig/commits/faster

Should be 10x faster... It throws away blanks and blank check completely, plus some other old cruft.

Documentation has not been updated re deprecated symbols.
Comment 57 Akira TAGOH 2017-08-08 10:06:52 UTC
Thanks. I had a quick look and I can see some differences in charset when comparing the caches from before and after.

This affects fonts in the texlive-lm package in Fedora, wqy-zenhei, sil-abyssinica, DejaVu fonts, and many others.
Comment 58 Jerry Casiano 2017-08-09 00:27:18 UTC
Created attachment 133399
Update documentation
Comment 59 Behdad Esfahbod 2017-08-09 22:32:21 UTC
(In reply to Akira TAGOH from comment #57)
> Thanks. simply tried to have a look and I can see some different result in
> charset when I compared caches of the before-and-after.
> 
> fonts in texlive-lm package in Fedora, wqy-zenhei, sil-abyssinica, DejaVu
> fonts, and many.

It does indeed change, if fonts have broken chars.  Care to look into a few of them?

I think we can provide the old functionality in a validity checker tool, to be used by font packagers in distros for example.
Comment 60 Behdad Esfahbod 2017-08-11 02:29:12 UTC
We are discussing this here:
https://lists.freedesktop.org/archives/fontconfig/2017-August/005986.html
Comment 61 Behdad Esfahbod 2017-09-12 21:10:45 UTC
I committed this.  Closing.  Please reopen if something breaks.
Comment 62 Akira TAGOH 2017-10-16 11:11:55 UTC
*** Bug 103291 has been marked as a duplicate of this bug. ***
