Created attachment 134807 [details]
The broken cache file.
I faced an issue caused by broken cache files for fontconfig.
I don't know how fontconfig fails to generate proper caches.
But, somehow, fontconfig generated the broken cache which I attached in this bug report.
As I know, the broken cache file should be detected by FcCache*Valid functions and it should be regenerated.
But, the attached cache file was not caught by the *Valid functions. It caused crash and missing font rendering issues in our platform (Tizen).
The bug raised from fontconfig 2.12.1. I found a patch for FcCacheOffsetsValid() just after releasing 2.12.1.
But, after applying the patch, fontconfig still couldn't detect it properly.
As our investigation, font->num was 0 and font->size was 0 in FcCacheOffsetsValid() function with the issue case.
Is this acceptable scenario? Or can I make it return with FcFalse in that condition?
You can test the attached cache file with fc-cat. :)
(You need to change FcCacheTimeValid() function to always return FcTrue
if you want to test this cache file in random environments.)
Hm, okay. I can reproduce the crash when bypassing the check of FcCacheTimeValid. though this cache is totally bogus. how did you generate it? fc-cache does create the same one once you remove it? then do you have any patches applied to your fontconfig?
Anyway, FcCacheOffsetsValid will need more checking to detect such bogus caches right. will work on it.
Created attachment 134810 [details] [review]
The attached patch will validate this bogus cache though, I don't think this is really a correct fix for this. we should investigate why/how this was created and fix it.
Created attachment 134812 [details]
Normal (Proper) cache file.
I agree with your opinion.
We need to figure out how/why this broken cache was generated.
It should be fixed. And it will be great solution.
But, unfortunately, we couldn't figure out how it was generated.
This broken cache file was found from release binary of our product.
But, it was not created by any processes and fc-cache from engineer binary.
I will attach proper cache file. You can see this file is longer than the broken one.
The broken one looks like a cache writer is dead while it is being written.
We assumed that it could be happened if device is turned off (power off) while fontconfig is writing the cache. But, not proved.
Anyway, I'll ask product engineer to test product after applying your patch.
cache will be written atomically so even if something happened during writing the cache data into the file, it won't be overwritten by the broken one.
I mean what if very first cache writer is dead/stopped during writing cache file?
Anyway, we need to reproduce to create this broken one to verify what is happened. We couldn't reproduce creating the broken one above 2 month. :(
I only have a coredump from the product which had issue.
The cache file and crash callstack were extracted from the coredump.
If I can reproduce it again, I will raise another bug report or write a comment here.
(In reply to Youngbok Shin from comment #5)
> I mean what if very first cache writer is dead/stopped during writing cache
See FcDirCacheWrite. given that this was generated by fontconfig, this breakage happens only when the memory is corrupted.
The memory corruption could be one of possible reason. If the memory corruption is the main reason of this, I suppose that we could reproduce it easily or find other serious issues. But, I couldn’t still find memory corruption issues about this.
I found a write() call in FcDirCacheWrite() is used without calling fsync().
I supposed fsync() is called for better performance. The open-write-close code without fsync() or O_SYNC flag does not make any guarantee that data has been committed to disk. It can cause corrupted cache file when the device is shutdown by power outage. How about your opinion about open-write-close without fsync()?
Hmm, does it guarantee the file size?
Anyway, see the difference between the valid and the invalid one. you can see the difference at the earlier place and interestingly the invalid cache has non-zero value. this would means the situation isn't what you expect.
For your concern, maybe we can't have any solution for that because it isn't what we can control. we should rather do check carefully and simply drop the invalid cache. in that sense, I may want to merge https://bugs.freedesktop.org/attachment.cgi?id=134810 this proposed change.
> Hmm, does it guarantee the file size?
Do you mean fsync() guarantees synchronization without considering its file size?
Then, as I know, yes. It writes a whole buffer to the file descriptor at once. I'll be bad for performance.
I think even if we can see non-zero values at head of invalid cache file, we still need to consider the file corruption by power outage. If a file system couldn't create a whole file or write anything on a file by power outage, we wouldn't call it "corruption".
Here is a Q&A about these scenarios.
If we can't allow bad performance by calling fsync() for writing the big cache files (to prevent very very rare file corruption issues), I think we need to add more validation code. Just like your temporary patch for the issue.
I agree with you. Only we can do for my concern is adding more validation code. Thank you!
Okay, merged into master. let's see.