Bug 93299 - Poppler fails to display pdf correctly
Summary: Poppler fails to display pdf correctly
Alias: None
Product: poppler
Classification: Unclassified
Component: general (show other bugs)
Version: unspecified
Hardware: Other All
: medium normal
Assignee: poppler-bugs
QA Contact:
: 79804 94837 (view as bug list)
Depends on:
Reported: 2015-12-08 19:02 UTC by Yan Pas
Modified: 2018-08-20 21:55 UTC (History)
4 users (show)

See Also:
i915 platform:
i915 features:

data (630.04 KB, application/zip)
2015-12-08 19:39 UTC, Yan Pas
Fallback to looking up glyph by Unicode code point (3.71 KB, patch)
2015-12-30 04:01 UTC, Jason Crain
Details | Splinter Review
bug97131-2.pdf (3.24 MB, application/download)
2015-12-30 17:31 UTC, Albert Astals Cid
Fallback to looking up glyph by Unicode code point (13.72 KB, patch)
2016-01-15 14:56 UTC, Jason Crain
Details | Splinter Review
example rendering (65.32 KB, image/png)
2016-01-18 02:18 UTC, Jason Crain
Fallback to looking up glyph by Unicode code point (with ICU) (11.65 KB, patch)
2016-02-04 15:48 UTC, Jason Crain
Details | Splinter Review
Fallback to looking up glyph by Unicode code point (without ICU) (7.10 KB, patch)
2016-02-04 15:50 UTC, Jason Crain
Details | Splinter Review

Description Yan Pas 2015-12-08 19:02:11 UTC
Poppler version 0.24.5-2ubuntu4
This pdf in qpdfview looks bad. Some symbols are lost, see the screenshot (chrome viewer on the right). Qpdfview version is 0.4.7
pdf: http://www.mathprofi.ru/kak_podobrat_chastnoe_reshenie_dy.pdf

I fisrtly reported it to qpdfview, screenshots may be obtained here https://bugs.launchpad.net/qpdfview/+bug/1523702
Comment 1 Yan Pas 2015-12-08 19:39:44 UTC
Created attachment 120417 [details]
Comment 2 Jason Crain 2015-12-30 04:01:25 UTC
Created attachment 120735 [details] [review]
Fallback to looking up glyph by Unicode code point

The glyphs don't display because the document uses a nonembedded font, and the document uses names like "Decyrillic", but the substitution font uses a name like "afii10021".  This patch does two things.  First, it provides the required Unicode characters to fontconfig so it can do a better job of finding a good font.  Second, it falls back to looking up the glyph ID by Unicode character if lookup by name or character code fails.
Comment 3 Albert Astals Cid 2015-12-30 17:31:28 UTC
Crashes with bug97131-2.pdf

Displaying page using "Splash" backend:  116 

Program received signal SIGSEGV, Segmentation fault.
NameToCharCode::hash (this=this@entry=0x6e04d0, name=name@entry=0xd20657079546500 <error: Cannot access memory at address 0xd20657079546500>) at /home/tsdgeos/devel/poppler/poppler/NameToCharCode.cc:112
112       for (p = name; *p; ++p) {
(gdb) bt
#0  0x00007ffff5f08380 in NameToCharCode::hash(char const*) (this=this@entry=0x6e04d0, name=name@entry=0xd20657079546500 <error: Cannot access memory at address 0xd20657079546500>) at /home/tsdgeos/devel/poppler/poppler/NameToCharCode.cc:112
#1  0x00007ffff5f08553 in NameToCharCode::lookup(char const*) (this=0x6e04d0, name=0xd20657079546500 <error: Cannot access memory at address 0xd20657079546500>) at /home/tsdgeos/devel/poppler/poppler/NameToCharCode.cc:95
#2  0x00007ffff5ef2039 in GlobalParams::mapNameToUnicodeText(char const*) (this=<optimized out>, charName=<optimized out>) at /home/tsdgeos/devel/poppler/poppler/GlobalParams.cc:858
#3  0x00007ffff5ef293e in GlobalParams::findSystemFontFile(GfxFont*, SysFontType*, int*, GooString*, GooString*) (base14Name=0x0, font=0x9ae460) at /home/tsdgeos/devel/poppler/poppler/GlobalParams.cc:1112
#4  0x00007ffff5ef293e in GlobalParams::findSystemFontFile(GfxFont*, SysFontType*, int*, GooString*, GooString*) (this=0x6e8410, font=font@entry=0x9ae460, type=type@entry=0x7fffffffc3d8, fontNum=fontNum@entry=0x7fffffffc3dc, substituteFontName=substituteFontName@entry=0x0, base14Name=base14Name@entry=0x0) at /home/tsdgeos/devel/poppler/poppler/GlobalParams.cc:1209
#5  0x00007ffff5ed3da5 in GfxFont::locateFont(XRef*, PSOutputDev*) (this=this@entry=0x9ae460, xref=<optimized out>, ps=ps@entry=0x0) at /home/tsdgeos/devel/poppler/poppler/GfxFont.cc:710
#6  0x00007ffff5f5a009 in SplashOutputDev::doUpdateFont(GfxState*) (this=this@entry=0x7fffffffcc10, state=state@entry=0x7c3440) at /home/tsdgeos/devel/poppler/poppler/SplashOutputDev.cc:1969
#7  0x00007ffff5f5a9c2 in SplashOutputDev::drawChar(GfxState*, double, double, double, double, double, double, unsigned int, int, unsigned int*, int) (this=0x7fffffffcc10, state=0x7c3440, x=<optimized out>, y=<optimized out>, dx=<optimized out>, dy=<optimized out>, originX=<optimized out>, originY=<optimized out>, code=1, nBytes=<optimized out>, u=<optimized out>, uLen=<optimized out>) at /home/tsdgeos/devel/poppler/poppler/SplashOutputDev.cc:2338
#8  0x00007ffff5ece823 in Gfx::doShowText(GooString*) (this=this@entry=0xc19820, s=0x835370) at /home/tsdgeos/devel/poppler/poppler/Gfx.cc:4066
#9  0x00007ffff5ecf105 in Gfx::opShowText(Object*, int) (this=0xc19820, args=0x7fffffffc7e0, numArgs=<optimized out>) at /home/tsdgeos/devel/poppler/poppler/Gfx.cc:3809
#10 0x00007ffff5ec752e in Gfx::go(bool) (this=this@entry=0xc19820, topLevel=topLevel@entry=true) at /home/tsdgeos/devel/poppler/poppler/Gfx.cc:763
#11 0x00007ffff5ec79d0 in Gfx::display(Object*, bool) (this=this@entry=0xc19820, obj=obj@entry=0x7fffffffcae0, topLevel=topLevel@entry=true) at /home/tsdgeos/devel/poppler/poppler/Gfx.cc:729
#12 0x00007ffff5f0da65 in Page::displaySlice(OutputDev*, double, double, int, bool, bool, int, int, int, int, bool, bool (*)(void*), void*, bool (*)(Annot*, void*), void*, bool) (this=0x7ccf80, out=0x7fffffffcc10, out@entry=0xffffffff00000000, hDPI=72, 
    hDPI@entry=0, vDPI=72, vDPI@entry=4,0413305021760614e-317, rotate=rotate@entry=0, useMediaBox=false, useMediaBox@entry=8, crop=crop@entry=true, sliceX=-1, 
    sliceX@entry=0, sliceY=-1, sliceW=-1, sliceH=-1, printing=false, abortCheckCbk=0x0, abortCheckCbkData=0x0, annotDisplayDecideCbk=0x0, annotDisplayDecideCbkData=0x0, copyXRef=true) at /home/tsdgeos/devel/poppler/poppler/Page.cc:599
#13 0x00007ffff5f152e9 in PDFDoc::displayPageSlice(OutputDev*, int, double, double, int, bool, bool, bool, int, int, int, int, bool (*)(void*), void*, bool (*)(Annot*, void*), void*, bool) (this=<optimized out>, out=0xffffffff00000000, 
    out@entry=0x7fffffffcc10, page=913297635, hDPI=0, hDPI@entry=72, vDPI=4,0413305021760614e-317, 
    vDPI@entry=72, rotate=rotate@entry=0, useMediaBox=useMediaBox@entry=false, crop=crop@entry=true, printing=255, sliceX=-1, sliceY=-1, sliceW=0, sliceH=0, abortCheckCbk=0x0, abortCheckCbkData=0x0, annotDisplayDecideCbk=
    0x0, annotDisplayDecideCbkData=0x1, copyXRef=true) at /home/tsdgeos/devel/poppler/poppler/PDFDoc.cc:504
#14 0x00007ffff7bc1731 in Poppler::Page::renderToImage(double, double, int, int, int, int, Poppler::Page::Rotation) const (this=this@entry=0x88fd60, xres=xres@entry=72, yres=yres@entry=72, x=x@entry=-1, y=y@entry=-1, w=w@entry=-1, h=h@entry=-1, rotate=Poppler::Page::Rotate0) at /home/tsdgeos/devel/poppler/qt4/src/poppler-page.cc:358
#15 0x0000000000405f04 in PDFDisplay::display() (this=0x7fffffffdc50) at /home/tsdgeos/devel/poppler/qt4/tests/test-poppler-qt4.cpp:65
#16 0x00007ffff6ba18c3 in QWidget::event(QEvent*) (this=0x7fffffffdc50, event=0x7fffffffd230) at kernel/qwidget.cpp:8435
#17 0x00007ffff6b4ccdc in QApplicationPrivate::notify_helper(QObject*, QEvent*) (this=this@entry=0x626c30, receiver=receiver@entry=0x7fffffffdc50, e=e@entry=0x7fffffffd230) at kernel/qapplication.cpp:4570
#18 0x00007ffff6b54f63 in QApplication::notify(QObject*, QEvent*) (this=<optimized out>, receiver=0x7fffffffdc50, e=0x7fffffffd230) at kernel/qapplication.cpp:4011
#19 0x00007ffff780385d in QCoreApplication::notifyInternal(QObject*, QEvent*) (this=0x7fffffffdc40, receiver=receiver@entry=0x7fffffffdc50, event=event@entry=0x7fffffffd230) at kernel/qcoreapplication.cpp:955
#20 0x00007ffff6b4b146 in qt_sendSpontaneousEvent(QObject*, QEvent*) (event=event@entry=0x7fffffffd230, receiver=receiver@entry=0x7fffffffdc50) at ../../include/QtCore/../../src/corelib/kernel/qcoreapplication.h:234
#21 0x00007ffff6b4b146 in qt_sendSpontaneousEvent(QObject*, QEvent*) (receiver=receiver@entry=0x7fffffffdc50, event=event@entry=0x7fffffffd230) at kernel/qapplication.cpp:5568
#22 0x00007ffff6bf4cc3 in QKeyMapper::sendKeyEvent(QWidget*, bool, QEvent::Type, int, QFlags<Qt::KeyboardModifier>, QString const&, bool, int, unsigned int, unsigned int, unsigned int, bool*) (keyWidget=keyWidget@entry=0x7fffffffdc50, grab=grab@entry=false, type=QEvent::KeyPress, code=16777237, modifiers=..., text=..., autorepeat=true, count=1, nativeScanCode=116, nativeVirtualKey=65364, nativeModifiers=0) at kernel/qkeymapper_x11.cpp:1866
#23 0x00007ffff6bf51d2 in QKeyMapperPrivate::translateKeyEvent(QWidget*, _XEvent const*, bool) (this=0x7fffffffd7a0, keyWidget=0x7fffffffdc50, event=0x0, grab=<optimized out>) at kernel/qkeymapper_x11.cpp:1836
#24 0x00007ffff6bcd2c1 in QApplication::x11ProcessEvent(_XEvent*) (this=0x7fffffffdc40, event=event@entry=0x7fffffffd7a0) at kernel/qapplication_x11.cpp:3641
#25 0x00007ffff6bf7b52 in x11EventSourceDispatch(GSource*, GSourceFunc, gpointer) (s=0x629720, callback=0x0, user_data=0x0) at kernel/qguieventdispatcher_glib.cpp:146
#26 0x00007ffff5268117 in g_main_context_dispatch (context=0x6282e0) at /build/glib2.0-IfQpAv/glib2.0-2.47.3/./glib/gmain.c:3154
#27 0x00007ffff5268117 in g_main_context_dispatch (context=context@entry=0x6282e0) at /build/glib2.0-IfQpAv/glib2.0-2.47.3/./glib/gmain.c:3769
#28 0x00007ffff5268370 in g_main_context_iterate (context=context@entry=0x6282e0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at /build/glib2.0-IfQpAv/glib2.0-2.47.3/./glib/gmain.c:3840
#29 0x00007ffff526841c in g_main_context_iteration (context=0x6282e0, may_block=may_block@entry=1) at /build/glib2.0-IfQpAv/glib2.0-2.47.3/./glib/gmain.c:3901
#30 0x00007ffff78341ee in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) (this=0x6281a0, flags=...) at kernel/qeventdispatcher_glib.cpp:450
#31 0x00007ffff6bf7c26 in QGuiEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) (this=<optimized out>, flags=...) at kernel/qguieventdispatcher_glib.cpp:204
#32 0x00007ffff78020d1 in QEventLoop::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) (this=this@entry=0x7fffffffdb80, flags=...) at kernel/qeventloop.cpp:149
#33 0x00007ffff7802445 in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) (this=this@entry=0x7fffffffdb80, flags=...) at kernel/qeventloop.cpp:204
#34 0x00007ffff7808429 in QCoreApplication::exec() () at kernel/qcoreapplication.cpp:1227
#35 0x00007ffff6b4af2c in QApplication::exec() () at kernel/qapplication.cpp:3828
#36 0x0000000000405445 in main(int, char**) (argc=2, argv=0x7fffffffddd8) at /home/tsdgeos/devel/poppler/qt4/tests/test-poppler-qt4.cpp:223
Comment 4 Albert Astals Cid 2015-12-30 17:31:58 UTC
Created attachment 120740 [details]
Comment 5 Jason Crain 2016-01-15 14:56:03 UTC
Created attachment 121066 [details] [review]
Fallback to looking up glyph by Unicode code point

Sorry about the crash.  That's what I get for testing with a subset of documents that happened to not have any CID fonts. :/

This version of the patch fixes the crash / invalid cast by checking for and filtering out CID fonts.  I think the CID case is already taken care of by requesting the language from fontconfig.  And I'm not completely sure how the CID charcode to unicode stuff works, but it looks like some of the maps are thousands of entries long, which would make this approach infeasable for CID fonts anyway.

While doing more testing, I also noticed that the previous patch caused some documents to have less appropriate fonts substituted.  An italic, bold, or light font might be replaced with a more generic font because it had better unicode coverage, even if the document didn't actually use any of the characters.  To prevent that, this patch adds a dependency on icu to look up the unicode character class and only add the character if it's alphabetic.  This also doesn't use the character set for a ZapfDingbats or Symbol font, because some documents used WinAnsiEncoding for these, or if fontconfig was able to find a font with the exact name as the document requested, such as MTExtra or ComicSansMS.

Also fixed a problem where because the buildFcPattern function modifies the font name from it's arguments in place, it may not work right if it's called a second time with the same arguments and can lose the bold/italic/etc modifiers from the font.
Comment 6 Albert Astals Cid 2016-01-16 12:07:10 UTC
I'm a bit hesitant on the ifdef for ICU, makes it for hard testing for the two codepaths. How much does it actually improve things?
Comment 7 Jason Crain 2016-01-18 02:18:14 UTC
Created attachment 121099 [details]
example rendering

(In reply to Albert Astals Cid from comment #6)
> I'm a bit hesitant on the ifdef for ICU, makes it for hard testing for the
> two codepaths. How much does it actually improve things?

You would prefer ICU be a mandatory dependency?  Or hesitant on having ICU be a dependency at all?

Completely removing the block of ICU/FcCharSet code, the majority of the text in this PDF (and a couple others I found) is probably going to be missing because there's a good chance that fontconfig is going to choose a font without Cyrillic glyphs.

If I keep the FcCharSet code but remove the ICU IsAlphabetic check, some documents will have regressions as fontconfig chooses different fonts.  Most of the differences are benign, replacing a font with another similar font, but some are not.  Examples in the attached image.

* http://www.epson.com/cmc_upload/0/000/044/561/10000XL_CatSheet.pdf
* http://upload.wikimedia.org/wikipedia/de/7/77/Wikipedia_2005_Inlaycard_small.pdf
Light font is replaced with a regular font because the light variant is missing some symbols like bullet, trademark, notequal, and others that are in the default encodings.

* https://www.ftb.ca.gov/forms/2013/13_540.pdf - numbers along the bottom of the page
* http://bugs.ghostscript.com/attachment.cgi?id=7435 - ACPIspec40a.pdf - code samples such as top of page 32
And a few other documents.  First choice font is missing a euro symbol so fontconfig picks a bold oblique font.  I'm starting to suspect this is a fontconfig bug because there are non-bold non-oblique fonts available further down the substitution list.

* http://acroeng.adobe.com/Test_Files/weblinks/Links_Bookmarks/FinalLinkTest.pdf
On page 3, a Bookman font is replaced with a sans-serif font because it's missing some math symbols.

If those aren't a problem I can do a simpler check, skip ICU/IsAlphabetic and only filter out Unicode private use area characters.  Or if it is a problem but you don't want to add ICU for something relatively simple like this, make my own blacklist of certain characters.  That might be better anyway because I think the ligatures (ff ffi ffl) are also causing some minor oddities and those are still alphabetic letters.
Comment 8 Albert Astals Cid 2016-01-25 22:36:23 UTC
I would preferably not use ICU but if it is needed make it mandatory.

The issues you mention don't seem very problematic, after all it's people's fault for producing files without the fonts embedded.

How hard would it be to have a patch that has mandatory ICU and one that hasn't ICU at all so we could see the difference?
Comment 9 Jason Crain 2016-02-04 15:48:51 UTC
Created attachment 121522 [details] [review]
Fallback to looking up glyph by Unicode code point (with ICU)

Patch with a mandatory dependency on ICU.
Comment 10 Jason Crain 2016-02-04 15:50:58 UTC
Created attachment 121523 [details] [review]
Fallback to looking up glyph by Unicode code point (without ICU)

Patch without ICU.
Comment 11 Carlos Garcia Campos 2016-02-04 17:08:41 UTC
If we make ICU a required dependency, it could be used to improve other things like some features of text output dev. The only problem of ICU is that it's a huge library and it could be too much in some embedded environments, but I'm favor of using making ICU a hard dep.
Comment 12 Adrian Johnson 2016-03-20 02:09:58 UTC
I would prefer it if kept the mandatory dependencies to a minimum. However if making ICU optional is going to make maintenance more difficult then just make it mandatory,

Something we could consider doing for optional but highly recommended dependencies is instead of auto detecting if the dependency exists, fail if it doesn't but provide a --disable-XXX option to allow building without it. This will prevent issues like bug 94586.
Comment 13 Jason Crain 2016-04-08 05:19:54 UTC
*** Bug 94837 has been marked as a duplicate of this bug. ***
Comment 14 Jason Crain 2016-04-08 05:32:49 UTC
*** Bug 79804 has been marked as a duplicate of this bug. ***
Comment 15 GitLab Migration User 2018-08-20 21:55:05 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/123.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.