Bug 18023 - pdftotext utility crashes on some PDF file(s), when poppler-data is not installed
Status: RESOLVED FIXED
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: All All
Importance: high major
Assignee: poppler-bugs
QA Contact:
URL: http://www.hk-lawyer.com/2008-1/Jan-1...
Reported: 2008-10-12 03:05 UTC by Mark Kaplan
Modified: 2008-10-12 05:25 UTC (History)

Description Mark Kaplan 2008-10-12 03:05:28 UTC
1. The crash happens only when the poppler-data decoding tables are not installed (they are optional, and one may choose not to install them because of licensing concerns).
2. The crash is the result of an unchecked null pointer dereference.
3. The crash occurs in the latest stable version as well as in previous versions.
4. The call stack is as follows:

Program terminated with signal 11, Segmentation fault.
#0  TextPage::beginWord (this=0x805dfb0, state=0x807ee30, x0=346.49509999999992, y0=717.5299) at TextOutputDev.cc:1958
1958      if (state->getFont()->getType() == fontType3) {
(gdb) bt
#0  TextPage::beginWord (this=0x805dfb0, state=0x807ee30, x0=346.49509999999992, y0=717.5299) at TextOutputDev.cc:1958
#1  0xb7e77bfe in TextPage::addChar (this=0x805dfb0, state=0x807ee30, x=346.49509999999992, y=717.5299, dx=0, dy=0, c=0, nBytes=1,
    u=0x80d8d48, uLen=2) at TextOutputDev.cc:2074
#2  0xb7e77d24 in TextOutputDev::endMarkedContent (this=0x805e310, state=0x807ee30) at TextOutputDev.cc:4663
#3  0xb7e00165 in Gfx::opEndMarkedContent (this=0x805ed78, args=0xbff34aa0, numArgs=0) at Gfx.cc:4200
#4  0xb7e027a1 in Gfx::execOp (this=0x805ed78, cmd=0xbff34c40, args=0xbff34aa0, numArgs=<value optimized out>) at Gfx.cc:766
#5  0xb7e0296d in Gfx::go (this=0x805ed78, topLevel=1) at Gfx.cc:637
#6  0xb7e085b7 in Gfx::display (this=0x805ed78, obj=0xbff34d1c, topLevel=1) at Gfx.cc:606
#7  0xb7e47206 in Page::displaySlice (this=0x8060990, out=0x805e310, hDPI=72, vDPI=72, rotate=0, useMediaBox=1, crop=0, sliceX=-1,
    sliceY=-1, sliceW=-1, sliceH=-1, printing=0, catalog=0x805ded8, abortCheckCbk=0, abortCheckCbkData=0x0, annotDisplayDecideCbk=0,
    annotDisplayDecideCbkData=0x0) at Page.cc:438
#8  0xb7e472d5 in Page::display (this=0x8060990, out=0x805e310, hDPI=72, vDPI=72, rotate=0, useMediaBox=1, crop=0, printing=0,
    catalog=0x805ded8, abortCheckCbk=0, abortCheckCbkData=0x0, annotDisplayDecideCbk=0, annotDisplayDecideCbkData=0x0) at Page.cc:367
#9  0xb7e49e9e in PDFDoc::displayPage (this=0x805db88, out=0x805e310, page=5, hDPI=72, vDPI=72, rotate=0, useMediaBox=1, crop=0,
    printing=0, abortCheckCbk=0, abortCheckCbkData=0x0, annotDisplayDecideCbk=0, annotDisplayDecideCbkData=0x0) at PDFDoc.cc:391
#10 0xb7e49f3a in PDFDoc::displayPages (this=0x805db88, out=0x805e310, firstPage=1, lastPage=10, hDPI=72, vDPI=72, rotate=0,
    useMediaBox=1, crop=0, printing=0, abortCheckCbk=0, abortCheckCbkData=0x0, annotDisplayDecideCbk=0, annotDisplayDecideCbkData=0x0)
    at PDFDoc.cc:406
#11 0x08049b48 in main (argc=Cannot access memory at address 0x0
) at pdftotext.cc:276


state->getFont() returns a null pointer, which is dereferenced without being checked.
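
To make the failure mode concrete, here is a minimal sketch of the unchecked call at TextOutputDev.cc:1958 next to a guarded form. MockFont, MockState and isType3Font are hypothetical stand-ins invented for this illustration; they are not poppler's classes, and this is not necessarily how the library handles the case.

```cpp
#include <cassert>

// Hypothetical stand-ins for poppler's font and graphics-state classes.
// Only the members needed to show the null dereference are modeled.
enum FontType { fontType1, fontType3 };

struct MockFont {
    FontType type;
    FontType getType() const { return type; }
};

struct MockState {
    MockFont *font;  // null when no usable font is available
    MockFont *getFont() const { return font; }
};

// Unsafe form, as in the backtrace above:
//   if (state->getFont()->getType() == fontType3) { ... }
// This dereferences the result of getFont() even when it is null.

// Guarded form: a missing font is treated as "not Type 3".
bool isType3Font(const MockState *state) {
    const MockFont *font = state->getFont();
    return font != nullptr && font->getType() == fontType3;
}
```

With this guard, a state whose font is null takes the non-Type 3 branch instead of crashing. Whether that is the right functional behaviour for text extraction is a separate question.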

5. grep shows a number of similar unchecked calls elsewhere in the poppler code:

poppler/ABWOutputDev.cc:  height = state->getFont()->getAscent() * state->getTransformedFontSize();
poppler/CairoOutputDev.cc:  LOG(printf ("updateFont() font=%s\n", state->getFont()->getName()->getCString()));
poppler/CairoOutputDev.cc:  if (state->getFont()->getType() == fontType3)
poppler/Gfx.cc:  wMode = state->getFont()->getWMode();
poppler/PSOutputDev.cc:        state->getFont()->getID()->num, state->getFont()->getID()->gen,
poppler/PSOutputDev.cc:  if (state->getFont()->getWMode()) {
poppler/TextOutputDev.cc:  if (state->getFont()->getType() == fontType3) {

6. The proposed patch fixes the crash, though I'm not sure it does the right thing functionally: it tries to substitute the missing font with another one...

 diff -Naur  poppler/TextOutputDev.orig.cc poppler/TextOutputDev.cc
--- poppler/TextOutputDev.orig.cc       2008-10-12 09:56:41.000000000 +0000
+++ poppler/TextOutputDev.cc    2008-10-12 09:58:19.000000000 +0000
@@ -1953,9 +1953,23 @@
     return;
   }

-  // compute the rotation
-  state->getFontTransMat(&m[0], &m[1], &m[2], &m[3]);
-  if (state->getFont()->getType() == fontType3) {
+  GfxFont * gfxFont = state->getFont();
+  if ( !gfxFont )
+  {
+    if ( !curFont )
+    {
+        updateFont(state);
+    }
+    gfxFont = curFont->gfxFont;
+  }
+  if ( !gfxFont )
+  {
+    //What else can I do???
+    return ;
+  }
+   // compute the rotation
+   state->getFontTransMat(&m[0], &m[1], &m[2], &m[3]);
+   if (gfxFont->getType() == fontType3) {
     fontm = state->getFont()->getFontMatrix();
     m2[0] = fontm[0] * m[0] + fontm[1] * m[2];
     m2[1] = fontm[0] * m[1] + fontm[1] * m[3];
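
Two details of the patch above are worth checking. It reads curFont->gfxFont right after calling updateFont(state) without testing curFont again, and the unchanged context line `fontm = state->getFont()->getFontMatrix();` still dereferences state->getFont() directly. A fallback chain that checks every step might look like the sketch below. MockFont, MockTextFontInfo and resolveFont are hypothetical names used only for illustration, and whether updateFont() can actually leave curFont null on this code path is an assumption, not something the report confirms.

```cpp
#include <cassert>

// Hypothetical stand-ins; not poppler's real types.
struct MockFont {
    int id;
};

struct MockTextFontInfo {
    MockFont *gfxFont;  // may be null
};

// Returns the first usable font, or nullptr if there is none.
// stateFont models the result of state->getFont();
// curFont models the TextPage member after any updateFont() call.
const MockFont *resolveFont(const MockFont *stateFont,
                            const MockTextFontInfo *curFont) {
    if (stateFont != nullptr) {
        return stateFont;
    }
    if (curFont != nullptr && curFont->gfxFont != nullptr) {
        return curFont->gfxFont;
    }
    return nullptr;  // caller must handle "no font at all"
}
```

A caller would then use the resolved pointer for every later access, including the font matrix lookup, rather than going back to state->getFont().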

Comment 1 Albert Astals Cid 2008-10-12 05:25:02 UTC
Thanks for the report. I've committed a different, simpler fix [1]; it's what is done throughout the poppler code.

For the other places, it seems it's OK to make the call, or at least it does not crash. Since it's not easy to think of a way to "fix" the code when the font is not there, I'll leave them as they are.

The AbiWord converter crashes, but not because of this problem; I will open a separate bug.

[1] http://cgit.freedesktop.org/poppler/poppler/commit/?h=poppler-0.10&id=d313c3029b09e043a5f68f7b0e7286aee2efbe13

