Bug 106618

Summary: Relocated caches not quite working
Product: fontconfig Reporter: Alexander Larsson <alexl>
Component: libraryAssignee: Akira TAGOH <akira>
Status: RESOLVED FIXED QA Contact: Behdad Esfahbod <freedesktop>
Severity: normal    
Priority: medium CC: akira, alexl, fontconfig-bugs, freedesktop, robert, sbergman
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: Add FcCacheAllocate() helper
Rewrite paths in caches earlier
Remove alias table
Rewrite paths in caches earlier
Rewrite paths in caches earlier
Remove alias table

Description Alexander Larsson 2018-05-22 13:47:44 UTC
Now that F28 and flatpak are starting to ship, we're finding issues with them. For example, the libreoffice flatpak can't find any fonts.

The reason for this is that libreoffice gets FC_FILE from the patterns, and this gets the host-side path (/usr/share/fonts/...) instead of the rewritten host-side path (/run/host/fonts/).

Its a bit weird though, because fc-list prints the correct path. In fact, if you run:
 FC_DEBUG=8191 fc-list -v 
In the sandbox you can see that the first time (FC_DEBUG) the pattern is printed its in /usr, but the final result is in /run/host/fonts.

I took the LO code and made a minimal example of printing the wrong (/usr) thing:

#include <stdio.h>
#include <string.h>
#include <fontconfig/fontconfig.h>

int
main(int argc, char *argv[])
{
  FcFontSet* fs = FcConfigGetFonts( FcConfigGetCurrent(), FcSetSystem );

  if (fs == NULL)
    return 1;

  for (int i = 0; i < fs->nfont; ++i)
    {
      FcPattern* p = fs->fonts[i];
      FcChar8* file = NULL;
      FcResult eFileRes  = FcPatternGetString(p, FC_FILE, 0, &file);

      FcPatternPrint (p);
      if (eFileRes == FcResultMatch)
        printf ("FC_FILE: %s\n", file);
    }

  return 0;
}

When is the rewriting happening? I think it needs to happen earlier.
Comment 1 Alexander Larsson 2018-05-22 14:05:31 UTC
If anyone wants help reproducing/debugging/testing this in flatpak, I can help out, but I have no idea where to start fixing this.
Comment 2 Akira TAGOH 2018-05-23 03:07:16 UTC
FcFontSet returned by FcConfigGetFonts() is actually the raw cache data which is mmap'ed. as we decided not to modify caches itself, this code won't work as expected. the path in FcPattern is modified when the copies of FcPattern from cache is being created in FcFontMatch and FcFontList. we need to change both fontconfig and LO to get this working. i.e. an API in fontconfig to obtain the path where points to /run/host/fonts, and use it in LO instead of looking up FC_FILE directly. hmm, it was there. but I've removed it because it wasn't used anywhere anymore, but anyway.

I'll work on it.
Comment 3 Stephan Bergmann 2018-05-23 06:58:59 UTC
My naive understanding is that a new "mode" has been implemented in fontconfig that maps between certain internal ("cached") pathnames and their external/client-visible counterparts.  And that the flatpak infrastructure somehow makes sure that a fontconfig instance run in the flatpak sandbox operates in that mode.

Then, I wonder why a new fontconfig API would be needed that apps like LibreOffice would need to adapt to.  Couldn't FcPatternGetString(...,FC_FILE,...) be modified to do any necessary pathname mapping when fontconfig is in that special mode?
Comment 4 Akira TAGOH 2018-05-23 07:17:21 UTC
This works as long as they use APIs which returns something they need to manage its lifecycle. FcPatternGet* just returns a reference in FcPattern and the instance of FcPattern in this case is the original cache data which can't be usually modified. so:

(In reply to Stephan Bergmann from comment #3)
> Then, I wonder why a new fontconfig API would be needed that apps like
> LibreOffice would need to adapt to.  Couldn't
> FcPatternGetString(...,FC_FILE,...) be modified to do any necessary pathname
> mapping when fontconfig is in that special mode?

In that way, FcPatternGetString has to create a copy of string. I don't mind either but LO has to change the code to free it as needed. also this affects all of applications. plus, such use case in the code isn't common way. thus new API sounds better to me to keep compatibility.
Comment 5 Stephan Bergmann 2018-05-23 07:30:27 UTC
But couldn't the lifetime of a fontconfig-internal mapped-pathname string, returned from FcPatternGetString(p,FC_FILE,...), be tied to the FcPattern *p internally in fontconfig?
Comment 6 Akira TAGOH 2018-05-23 07:35:29 UTC
(In reply to Stephan Bergmann from comment #5)
> But couldn't the lifetime of a fontconfig-internal mapped-pathname string,
> returned from FcPatternGetString(p,FC_FILE,...), be tied to the FcPattern *p
> internally in fontconfig?

Yes. therefore gave up to modify it because no advantage of mmap'ing cache files from the POV of memory footprint.
Comment 7 Stephan Bergmann 2018-05-23 07:44:37 UTC
(In reply to Akira TAGOH from comment #6)
> (In reply to Stephan Bergmann from comment #5)
> > But couldn't the lifetime of a fontconfig-internal mapped-pathname string,
> > returned from FcPatternGetString(p,FC_FILE,...), be tied to the FcPattern *p
> > internally in fontconfig?
> 
> Yes. therefore gave up to modify it because no advantage of mmap'ing cache
> files from the POV of memory footprint.

Sorry, I don't understand your response.  (I don't know much about fontconfig, nor about the motivations and details of the changes apparently made to fontconfig for the benefits of flatpak.  What I as a LO developer would like to avoid if possible is a change in the fontconfig API, where LO would have to support both the old and new API for the foreseeable future.  That's why I kept asking.  But sure, if there is no good possibility to avoid that, then so be it.)
Comment 8 Alexander Larsson 2018-05-23 07:47:00 UTC
I think the issue is that the FcPattern object is actually a raw pointer into the mmaped cache, and FcPatternDestroy () falls back on:

void
FcPatternDestroy (FcPattern *p)
{
    ...

    if (FcRefIsConst (&p->ref))
    {
	FcCacheObjectDereference (p);
	return;
    }

So, it never frees anything, it just finds the cache and unrefs that. Nor can we really modify the memory that the FcPattern * points to.

However, as long as applications used to work and work in the "normal" case we're never going to be able to get everything changed to work in flatpak if the code has to be rewritten, so we need a solution here.

I'll have a look and see what options i can figure out.
Comment 9 Alexander Larsson 2018-05-23 14:10:39 UTC
Created attachment 139711 [details] [review]
Add FcCacheAllocate() helper
Comment 10 Alexander Larsson 2018-05-23 14:11:06 UTC
Created attachment 139712 [details] [review]
Rewrite paths in caches earlier
Comment 11 Alexander Larsson 2018-05-23 14:11:20 UTC
Created attachment 139713 [details] [review]
Remove alias table
Comment 12 Alexander Larsson 2018-05-23 14:14:48 UTC
Here is a new patch series that rewrites the paths earlier. Its somewhat wasteful with space, as it makes copies of the FcPattern and FcPatternElt objects in the case where we're using a relocated directory (but not by default), but it still uses the FcValueLists from the cache. The values were 75% of the cache in my short tests.

Technically it would be possible to not duplicate the FcPatternElt, and instead make all the users of it special case FC_FILE, but that seems like a large change that easily can miss some corner case.

I'll apply this to a flatpak runtime build and try libreoffice against it.
Comment 13 Alexander Larsson 2018-05-23 17:35:10 UTC
Created attachment 139720 [details] [review]
Rewrite paths in caches earlier

Ugh, there was a bug in the second patch, replaced...
Comment 14 Alexander Larsson 2018-05-23 17:35:49 UTC
Note that the 2 last patches are in the wrong order now...
Comment 15 Alexander Larsson 2018-05-23 19:35:57 UTC
Ok, i built a version of the freedesktop.org 1.6 runtime with this, and the host fonts now work in the libreoffice flatpak.
I'm going to also build the gnome runtime and see if this also fixes https://github.com/flathub/org.mozilla.Thunderbird/issues/36
Comment 16 Behdad Esfahbod 2018-05-24 03:48:57 UTC
My original idea also was to do the mapping in FcPatternGetString() for FC_FILE, and hook the allocated string to a hashmap hooked to the FcCache for the pattern.  That would solve the lifecycle problem.

But Alex's solution, to premap everything, is simpler, even if slower and less memory-efficient.
Comment 17 Alexander Larsson 2018-05-24 06:10:43 UTC
I looked at just hooking FcPatternGetString, but that didn't seem safe to me. The element values are used in a lot of other ways too, like in FcPatternHash, FcPatternEqual and even in completely other places like FcNameUnparseEscaped. And all these directly access the elements array with no abstraction inbetween.

Instead of replacing the entire elements array we could have a single on for the FC_FILE, and then add an abstraction for looking up element N from the elements array, but that would be a far larger and more intrusive patch, and I wanted something safe that we can get in immediately because people are hitting this regression.
Comment 18 Alexander Larsson 2018-05-24 06:15:43 UTC
So, i rebuilt the gnome sdk with this in it and tried thunderbird, but it crashes with valgrind reporting:

==70== Warning: set address range perms: large range [0x20454b34b000, 0x20458b34b000) (noaccess)
==70== Invalid read of size 4
==70==    at 0xC949AA0: FcPatternReference (fcpat.c:1159)
==70==    by 0x1073D88F: ??? (in /app/lib/thunderbird-52.7.0/libxul.so)
...
==70==  Address 0x258e8d78 is 24 bytes inside a block of size 466 free'd
==70==    at 0x4A0906A: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==70==    by 0xC931E27: FcCacheRemoveUnlocked (fccache.c:648)
==70==    by 0xC931E27: FcDirCacheDisposeUnlocked (fccache.c:680)
==70==    by 0xC931E27: FcCacheObjectDereference (fccache.c:715)
==70==    by 0x1073FE0D: ??? (in /app/lib/thunderbird-52.7.0/libxul.so)
...
==70==  Block was alloc'd at
==70==    at 0x4A07EBD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==70==    by 0xC932489: FcCacheAllocate (fccache.c:730)
==70==    by 0xC936C11: FcConfigAddCache (fccfg.c:397)
...

So, there seems to be some refcounting issue in the patch.
Comment 19 Alexander Larsson 2018-05-24 07:13:15 UTC
Created attachment 139730 [details] [review]
Rewrite paths in caches earlier
Comment 20 Alexander Larsson 2018-05-24 07:13:34 UTC
Created attachment 139731 [details] [review]
Remove alias table
Comment 21 Alexander Larsson 2018-05-24 07:14:37 UTC
Ah, I forgot to special case the FcCacheObjectReference the same way i did the FcCacheObjectDereference. New version has that fix, plus some cleanup and is now in order.
Comment 22 Alexander Larsson 2018-05-24 07:15:27 UTC
And yes, this fixes both libreoffice and thunderbird, so i'm considering pushing a adding a copy of this patch to the fd.o runtime immediately.
Comment 23 Akira TAGOH 2018-05-24 10:56:23 UTC
Thanks for the patch. I'm writing a test case for this now. will spend more time for testing.
Comment 24 Alexander Larsson 2018-05-24 12:43:05 UTC
You don't really have to use flatpak to test this btw, just copy the .uuid and the fonts from /usr/share/fonts somewhere else.
Comment 25 Behdad Esfahbod 2018-05-24 21:08:46 UTC
Comment on attachment 139730 [details] [review]
Rewrite paths in caches earlier

Review of attachment 139730 [details] [review]:
-----------------------------------------------------------------

::: src/fcpat.c
@@ +378,5 @@
> +{
> +  /* We use a value to find the cache, instead of the FcPattern object
> +   * because the pattern itself may be a cache allocation if we rewrote the path,
> +   * so the p may not be in the cached region. */
> +  return FcPatternEltValues(&FcPatternElts (p)[0]);

What if pattern is empty. Doesn't this cause invalid memory access?

Also, there's an assumption that cached patterns have at least two items and FC_FILE is NOT the first one.  Document this?
Comment 26 Alexander Larsson 2018-05-25 06:14:36 UTC
Comment on attachment 139730 [details] [review]
Rewrite paths in caches earlier

Review of attachment 139730 [details] [review]:
-----------------------------------------------------------------

::: src/fcpat.c
@@ +378,5 @@
> +{
> +  /* We use a value to find the cache, instead of the FcPattern object
> +   * because the pattern itself may be a cache allocation if we rewrote the path,
> +   * so the p may not be in the cached region. */
> +  return FcPatternEltValues(&FcPatternElts (p)[0]);

This is only called for cached patterns, which I doubt are empty, and since the elements are in order, and FILE is index 21 after things like family, style and size I think this is unlikely to happen.

Maybe that should be documented and verified in the code though.
Comment 27 Alexander Larsson 2018-05-25 06:17:06 UTC
I'm adding this patch to the flatpak runtime for now.
Comment 29 Akira TAGOH 2018-05-25 06:26:25 UTC
Okay, tested and works fine. merged changes with test cases. thanks.
Comment 30 Alexander Larsson 2018-05-25 07:02:49 UTC
For the record, I still think the real fix is to change the cache format to use dynamic-library style indirection, like the GOT.

I.e. have the FC_FILE not be an byte offset, but an index into a separate table of byte offsets, and store the default table at the end of the cache file, properly page aligned. Then you mmap the cache in two chunks, once for the main part and the second for the offset table + strings. However, when you relocate the cache the other part is not mmaped, but instead recreated dynamically.

Thats a lot more work though, and this patch fixes things as-is.
Comment 31 Alexander Larsson 2018-05-25 07:06:26 UTC
It might even be possible to do in a compatible way by adding a new FC_FILE_INDIRECT object type and doing some magic with that.
Comment 32 Akira TAGOH 2018-05-25 07:18:48 UTC
We could consider implementing that when the cache version is going to be bumped next time perhaps.
Comment 33 Behdad Esfahbod 2018-05-25 22:37:21 UTC
Introducing FC_FILE_INDIRECT would be an API break that would require all users of the library to modify their code, so, not in scope.
Comment 34 Akira TAGOH 2018-05-25 23:36:47 UTC
(In reply to Behdad Esfahbod from comment #33)
> Introducing FC_FILE_INDIRECT would be an API break that would require all
> users of the library to modify their code, so, not in scope.

right. 'that' in my previous comment meant restructuring cache format.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.