Created attachment 136640
blank United Kingdom tax return form
Forwarding from https://bugzilla.gnome.org/792393
The UK tax return form shows jumbled text when opened in Evince 3.18.2. It displays normally on Mac OS X with the default viewer, and I assume it does on Windows.
The jumbling starts on page 3, where oddly spaced commas replace most, but not all, of the text. The page footer is replaced with random letters.
On page 6, the footer is back to normal, but all the body text is replaced with random letters and numbers.
The file is a fillable form. The problem was first noted with a copy containing my personal data. The attachment is a blank copy that also shows the problem.
I've confirmed this with both pdftoppm and pdftocairo from poppler master. Running "pdftocairo -png 'blank return.pdf' bad" produces a page 3 with much of the text replaced with commas. Oddly, rendering just page 3 with "pdftocairo -png -f 3 -l 3 'blank return.pdf' good" works correctly.
Created attachment 136641
result of "pdftocairo -png 'blank return.pdf' bad"
This image shows an incorrect rendering of page 3 from running "pdftocairo -png 'blank return.pdf' bad".
Created attachment 136642
result of "pdftocairo -png -f 3 -l 3 'blank return.pdf' good"
This image shows the correct rendering of page 3 from running "pdftocairo -png -f 3 -l 3 'blank return.pdf' good".
CairoFontEngine.cc caches fonts by the indirect reference's object number and generation, on the assumption that these are unique. In this file, however, a font on page 2 and a font on page 3 share the same reference, so the cache returns the wrong font. Splash is probably doing something similar.
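To make the aliasing concrete, here is a minimal C++ sketch of a cache keyed only on (number, generation). `Ref`, `Font`, `FontCache` and `demonstrateAliasing` are hypothetical simplifications, not CairoFontEngine's actual classes, and the shared fake reference (0, 999999) is an assumed value chosen for illustration.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <tuple>

// Hypothetical stand-in for an indirect reference: object number + generation.
struct Ref {
    int num;
    int gen;
    bool operator<(const Ref &other) const {
        return std::tie(num, gen) < std::tie(other.num, other.gen);
    }
};

// Hypothetical font record; only the name matters for this illustration.
struct Font {
    std::string name;
};

// Cache keyed only on (num, gen): it assumes every font in the document
// has a distinct reference.
class FontCache {
public:
    // Returns the cached font for `ref`, or caches and returns `font` on a miss.
    std::shared_ptr<Font> lookup(const Ref &ref, const Font &font) {
        auto it = cache_.find(ref);
        if (it != cache_.end()) {
            return it->second;  // hit: `font` is ignored, even if it differs
        }
        auto created = std::make_shared<Font>(font);
        cache_.emplace(ref, created);
        return created;
    }

private:
    std::map<Ref, std::shared_ptr<Font>> cache_;
};

// Two different fonts that were given the same (assumed) fake Ref alias.
void demonstrateAliasing() {
    FontCache cache;
    const Ref fake{0, 999999};
    auto page2 = cache.lookup(fake, Font{"Page2Font"});
    auto page3 = cache.lookup(fake, Font{"Page3Font"});
    assert(page2 == page3);              // same cached object
    assert(page3->name == "Page2Font");  // page 3 renders with page 2's font
}
```

The sketch also hints at why rendering page 3 on its own works: page 2's font never enters the cache, so nothing is there to shadow page 3's font.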
Two different fonts end up with the same number and generation because neither font has a real indirect reference, owing to the way the PDF defines its resource and font dictionaries:
7 0 obj
<<
  /Resources 8 0 R
  ... other page entries ...
>>
endobj

8 0 obj
<<
  ... font dictionary entries ...
>>
endobj
The GfxFontDict constructor has code to generate a fake reference based on the /Font dictionary's object number, but that doesn't work in this PDF because the /Font dictionary doesn't have an indirect reference either.
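A simplified sketch of why that fallback collides, assuming (as in older xpdf-derived code) that the font's position in the /Font dictionary becomes the object number, and the generation is a six-digit value derived from the /Font dictionary's object number, or a fixed sentinel when there is none. The function name `makeFakeRefOld` and the exact constants are assumptions for illustration, not quoted from poppler.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for an indirect reference.
struct Ref {
    int num;
    int gen;
};

// Assumed pre-fix scheme: the font's index in the /Font dictionary is the
// object number; the generation comes from the /Font dictionary's own
// object number if it has one, otherwise from a fixed sentinel.
Ref makeFakeRefOld(int fontIndex, std::optional<int> fontDictNum) {
    assert(fontIndex >= 0);
    Ref r;
    r.num = fontIndex;
    // Real generation numbers have at most five digits, so six-digit
    // values cannot clash with a genuine reference.
    r.gen = fontDictNum ? 100000 + *fontDictNum : 999999;
    return r;
}
```

When neither the fonts nor their /Font dictionary have a reference, the first font on page 2 and the first font on page 3 both map to the same (num, gen) pair, which is the only key the font cache looks at.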
This appears to be fixed in XPDF 4.00 because the GfxFontDict constructor now includes code to generate the fake reference based on a hash instead.
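A sketch of the hash-based idea under stated assumptions: the font dictionary is reduced to a byte string, hashed with 32-bit FNV-1a, and the result (masked to 31 bits) becomes the fake object number with a fixed six-digit generation. xpdf 4.00 hashes the font object itself rather than a string, and `makeFakeRefHashed` and the generation value 100000 are hypothetical; the point is only that the key now depends on the font's content instead of its position.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for an indirect reference.
struct Ref {
    int num;
    int gen;
};

// 32-bit FNV-1a (offset basis 2166136261, prime 16777619).
std::uint32_t fnv1a32(const std::string &bytes) {
    std::uint32_t h = 2166136261u;
    for (unsigned char c : bytes) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

// Hypothetical hash-based fake reference. `serializedFontDict` stands in
// for a canonical byte form of the font dictionary. Identical dictionaries
// share a reference, which is harmless because they describe the same font.
Ref makeFakeRefHashed(const std::string &serializedFontDict) {
    Ref r;
    // Mask to 31 bits so the object number stays a non-negative int.
    r.num = static_cast<int>(fnv1a32(serializedFontDict) & 0x7fffffffu);
    r.gen = 100000;  // assumed six-digit value, outside the legal range
    assert(r.num >= 0);
    return r;
}
```

Collisions are still possible in principle, but they now require two different font dictionaries to hash to the same 31-bit value, rather than merely sitting at the same position in unreferenced /Font dictionaries.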
*** Bug 91004 has been marked as a duplicate of this bug. ***
Created attachment 136832
GfxFontDict: merge reference generation from xpdf 4.00
The GfxFontDict constructor generates a fake indirect reference if the
font dictionary doesn't have a real indirect reference. It sometimes
assigns the same reference to two different fonts, which leads to the
wrong font being used. XPDF 4.00 fixes this by using a hash of the font
data to create the fake reference.