Bug 63963 - pdftops - some fonts are encoded incorrectly in level2 postscript
Status: RESOLVED FIXED
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: poppler-bugs
 
Reported: 2013-04-26 14:37 UTC by Alex Korobkin
Modified: 2016-12-08 20:46 UTC

Attachments
test.pdf - original pdf file (56.26 KB, text/plain)
2013-04-26 14:37 UTC, Alex Korobkin
test-ps - result of pdftops level2 conversion (287.35 KB, application/postscript)
2013-04-26 14:38 UTC, Alex Korobkin
test.pdf - original pdf file (56.26 KB, application/pdf)
2013-04-26 14:39 UTC, Alex Korobkin
patch to fix the problem (8.55 KB, patch)
2016-07-06 02:02 UTC, William Bader
patch to fix the problem (8.54 KB, patch)
2016-07-06 02:16 UTC, William Bader
patch to fix the problem (7.70 KB, patch)
2016-07-06 02:30 UTC, William Bader
patch to fix the problem (7.72 KB, patch)
2016-09-04 00:18 UTC, William Bader
patch to fix the problem (7.85 KB, patch)
2016-09-04 00:33 UTC, William Bader
Revised patch that does not store maxValidGlyph in GfxFont or FoFiTrueType (5.61 KB, patch)
2016-12-01 04:05 UTC, William Bader

Description Alex Korobkin 2013-04-26 14:37:34 UTC
Created attachment 78523
test.pdf - original pdf file

Hi team, 

This sample PDF file, when converted to Level 2 PostScript by pdftops 0.22.3, cannot be processed by a printer or by the Ghostscript interpreter: the interpreter chokes on the embedded fonts and quits with a rangecheck error in the xyshow function.

The offending part of PS code is this:

/F9_0 13.3333 Tf
(\012\355)
[13.3333
0] Tj

The nice folks at Ghostscript explained to me that, after defining the Type 42 font /DejaVuSans_00, the PS code then does this:

16 dict begin
/FontName /DejaVuSans def
/FontType 0 def
/FontMatrix [1 0 0 1 0 0] def
/FMapType 2 def
/Encoding [
0
] def
/FDepVector [
/DejaVuSans_00 findfont
] def
FontName currentdict end definefont pop

As you can see, the font has a single dependent font, and it has map
type 2. Map type 2 means the () string passed to show has a font number
and a character number.  So the string (\012\355) means sub-font 10
character 237, hence the rangecheck error.
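
To make the arithmetic concrete, here is a minimal Python sketch of how an interpreter splits such a string under FMapType 2 (8/8 mapping). The helper names are made up for illustration, and the decoder only handles the octal escapes that appear in this report. The second byte, \355, is octal for decimal 237 (hex ED).

```python
import re

def decode_ps_string(body: str) -> bytes:
    """Decode the inside of a PostScript (...) string.

    Only \\ddd octal escapes are handled; that is all this report needs.
    """
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] == "\\":
            m = re.match(r"[0-7]{1,3}", body[i + 1:])
            if m is None:
                raise ValueError("only octal escapes are handled in this sketch")
            out.append(int(m.group(0), 8) & 0xFF)
            i += 1 + len(m.group(0))
        else:
            out.append(ord(body[i]))
            i += 1
    return bytes(out)

def split_8_8(data: bytes):
    """FMapType 2: each byte pair is (font number, character code)."""
    if len(data) % 2:
        raise ValueError("8/8 mapping needs an even number of bytes")
    return [(data[i], data[i + 1]) for i in range(0, len(data), 2)]

# The /Encoding array in the generated PostScript has exactly one entry.
encoding = [0]

for font_no, char_code in split_8_8(decode_ps_string(r"\012\355")):
    status = "ok" if font_no < len(encoding) else "rangecheck"
    print(font_no, char_code, status)
```

Font number 10 has no entry in a one-element Encoding array, which matches the rangecheck the interpreter reports.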

In the PDF, all of the calls to F0 (Arial) are of the form:

/F0 13.3333 Tf
1 0 0 -1 10 18 Tm
<0035> Tj

but the calls to F1 (DejaVu Sans) look like:

/F1 13.3333 Tf
1 0 0 -1 139 18 Tm
<0AED> Tj

So pdftops is copying the <0AED> literally to (\012\355), but not
embedding enough subfonts.  The other fonts coincidentally all use
glyphs in <0000>--<00FF> and miss the bug.
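
The literal copy described above can be sketched in Python as follows. This is only an illustration, not the pdftops code: the function names are invented, and every byte is octal-escaped here, whereas real output may leave printable bytes unescaped. It shows why codes in <0000>--<00FF> are harmless: their high byte, which FMapType 2 reads as the font number, is always 0.

```python
def hex_to_ps_literal(hex_string: str) -> str:
    """Render a PDF hex string such as '0AED' as an octal-escaped PS string."""
    data = bytes.fromhex(hex_string)
    return "(" + "".join("\\%03o" % b for b in data) + ")"

def high_bytes(hex_string: str):
    """High byte of each 16-bit code; FMapType 2 treats it as the font number."""
    data = bytes.fromhex(hex_string)
    return [data[i] for i in range(0, len(data), 2)]

for pdf_hex in ("0035", "0AED"):
    print(pdf_hex, hex_to_ps_literal(pdf_hex), high_bytes(pdf_hex))
```

The Arial code <0035> selects sub-font 0, which exists; the DejaVu Sans code <0AED> selects sub-font 10, which does not.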

I'm attaching both PDF and PS files for your consideration.
Comment 1 Alex Korobkin 2013-04-26 14:38:11 UTC
Created attachment 78524
test-ps - result of pdftops level2 conversion
Comment 2 Alex Korobkin 2013-04-26 14:39:30 UTC
Comment on attachment 78523
test.pdf - original pdf file

mistakenly attached as plain text, removing.
Comment 3 Alex Korobkin 2013-04-26 14:39:49 UTC
Created attachment 78525
test.pdf - original pdf file
Comment 4 James Cloos 2013-04-26 22:13:57 UTC
I provided the diagnosis that Alexei quoted, but I have not had time to work through the source to figure out why it fails as it does.

pdfinfo says the font is not subset, but having extracted it I see that it sort-of is.  The contents of the unused glyf routines have been elided, but only their contents.  The entries themselves remain.

ttx shows me that the 0x0AEDth glyph is uni200B (ZERO WIDTH SPACE).  Odd glyph to emit into a pdf, but pdftops should get it right anyway.

Using -level3 pdftops emits a CID font into the ps, which works properly.

I used git master to replicate the bug; it is not limited to 0.22.3.
Comment 5 Albert Astals Cid 2013-05-16 19:29:02 UTC
James, do you think you will have time to look at what we do wrong, or should someone else try to?
Comment 6 James Cloos 2013-05-16 23:51:53 UTC
I’ll at least look at the call graph.

The »if (font->isCIDFont())« block of PSOutputDev::drawString() is involved.

Maybe the if (uMap) block?

I need to recompile without optimization to know more, but there must be a disconnect between the subsetting code and the codeToGID() function.
Comment 7 William Bader 2016-07-03 06:08:46 UTC
I had a similar problem in https://bugs.freedesktop.org/show_bug.cgi?id=96644 , and I am close to a patch to fix it.
I think that FoFiTrueType.cc should remember how many glyphs it writes.
When PSOutputDev builds the resources, it can build a hash with the font name and the number of glyphs.
Then, when PSOutputDev::drawString() builds the output string, if it finds the number of glyphs in the hash and a code exceeds the last valid glyph, it can add dx and dy to the last pair of values in the dxdy array but otherwise ignore the code.

The problem is that in PS Level 1 or 2, FoFiTrueType makes a composite font, but some fonts have a lot of empty positions, so it writes glyphs only up to the last used one. For example, a font in test.pdf has nGlyphs 1674 but maxUsedGlyph 88. Writing all of the empty glyphs in FoFiTrueType would fix the problem, but the resulting PS file would be much larger. In that case, when the PS uses code 2797 (hex AED), it considers 0xA as the index of the composite font, and there will be enough fonts in the FDepVector.

I suppose that unused glyphs have a 0 size and leave no marks, so the alternative is to remember that the font has only 88 glyphs (actually FoFiTrueType rounds it up to 255) and then ignore the attempt to show code 2797 (except for adding in the dx and dy). It looks like these codes are used for alignment. Maybe some applications have poor subsetting logic that doesn't compact empty glyphs. The PDF in my bug report was made by 'Adobe InDesign CC 2015 (Windows)'.

William
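
The approach proposed in this comment can be sketched roughly as below. This is a Python illustration under stated assumptions, not the attached C++ patch: the function name and data layout are hypothetical, and the behaviour for an out-of-range code at the very start of a string (its advance is dropped here) is a guess, since the comment does not cover that case.

```python
from typing import List, Tuple

Advance = Tuple[float, float]

def drop_out_of_range_codes(
    codes: List[int],
    advances: List[Advance],
    max_valid_glyph: int,
) -> Tuple[List[int], List[Advance]]:
    """Skip codes above max_valid_glyph but keep their dx/dy.

    The advance of a skipped code is added to the last kept (dx, dy) pair,
    so the text that follows still lands in the same place.
    """
    kept_codes: List[int] = []
    kept_advances: List[Advance] = []
    for code, (dx, dy) in zip(codes, advances):
        if code <= max_valid_glyph:
            kept_codes.append(code)
            kept_advances.append((dx, dy))
        elif kept_advances:
            prev_dx, prev_dy = kept_advances[-1]
            kept_advances[-1] = (prev_dx + dx, prev_dy + dy)
        # Assumption: a skipped code with nothing before it is dropped
        # together with its advance.
    return kept_codes, kept_advances

codes, advances = drop_out_of_range_codes(
    [0x35, 0x0AED, 0x36],
    [(7.0, 0.0), (13.0, 0.0), (7.0, 0.0)],
    max_valid_glyph=255,
)
print(codes, advances)
```

With a limit of 255, code 0x0AED is never sent to the interpreter, and its advance is carried by the glyph before it.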
Comment 8 William Bader 2016-07-06 02:02:58 UTC
Created attachment 124925
patch to fix the problem

This patch should fix the problem without affecting the generated PostScript for other files.
Comment 9 William Bader 2016-07-06 02:16:46 UTC
Created attachment 124926
patch to fix the problem
Comment 10 William Bader 2016-07-06 02:30:34 UTC
Created attachment 124927
patch to fix the problem
Comment 11 William Bader 2016-09-04 00:18:40 UTC
Created attachment 126196
patch to fix the problem

This fixes a problem where the previous patch could fail on fonts where font->getName() returned NULL.
Comment 12 William Bader 2016-09-04 00:33:46 UTC
Created attachment 126197
patch to fix the problem

Add checks that font->getName() is not NULL.
Comment 13 Albert Astals Cid 2016-11-29 23:24:04 UTC
I don't like how you use GfxFont and FoFiTrueType as vessels for that maxValidGlyph. Isn't there some other way you can transport it?

For example, could it be an "output parameter" of FoFiTrueType::convertToType?
Comment 14 William Bader 2016-12-01 02:46:58 UTC
I pulled the current poppler git source, and when I build it unpatched and run pdftops on the test pdf for this bug, I get a lot of errors like the one below.

Syntax Error (927): Arg #0 to 'Tj' operator is wrong type (hexstring)

I suspect that is related to the recent patches by Jakub.
Is anyone looking into that?
Comment 15 William Bader 2016-12-01 04:05:17 UTC
Created attachment 128294
Revised patch that does not store maxValidGlyph in GfxFont or FoFiTrueType

I made a new version of the patch that passes maxValidGlyph as a parameter to FoFiTrueType::convertToType0() instead of storing it in GfxFont and FoFiTrueType.
This simplifies the patch but might be more expensive in PDFs with a lot of short strings because PSOutputDev::drawString() has to look up the value in a hash instead of caching it in GfxFont.
This patch is based on poppler-0.49.0 because the current git source does not build a working pdftops.
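
The trade-off mentioned here (a hash lookup on every drawString() call instead of a value cached on the font) might look roughly like this in Python. The class and method names are hypothetical and do not come from the patch; the None handling mirrors the getName() checks added in comments 11 and 12, and the limit of 255 is the rounded value from comment 7.

```python
from typing import Dict, Optional

class MaxGlyphTable:
    """Hypothetical stand-in for a per-output hash of font name to glyph limit."""

    def __init__(self) -> None:
        self._by_name: Dict[str, int] = {}

    def record(self, font_name: Optional[str], max_valid_glyph: int) -> None:
        # A font without a name cannot be keyed, so it is not recorded.
        if font_name is not None:
            self._by_name[font_name] = max_valid_glyph

    def lookup(self, font_name: Optional[str]) -> Optional[int]:
        # None means "no limit known": the caller emits every code unchanged.
        if font_name is None:
            return None
        return self._by_name.get(font_name)

table = MaxGlyphTable()
table.record("DejaVuSans", 255)  # filled in while the font resource is written
table.record(None, 255)          # nameless font: silently skipped

# One lookup per string drawn, which is the extra cost the comment mentions.
print(table.lookup("DejaVuSans"), table.lookup("Arial"), table.lookup(None))
```

Returning None for unknown or nameless fonts keeps the output for all other files unchanged, at the price of one dictionary lookup per string.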
Comment 16 William Bader 2016-12-01 06:17:32 UTC
I have a patch for the "Arg #0 to 'Tj' operator is wrong type (hexstring)" error at https://bugs.freedesktop.org/show_bug.cgi?id=98921
Comment 17 Albert Astals Cid 2016-12-08 20:46:29 UTC
Pushed.