Bug 16885 - pdftotext crashed with SIGSEGV in TextPage::beginWord()
Summary: pdftotext crashed with SIGSEGV in TextPage::beginWord()
Status: RESOLVED FIXED
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: poppler-bugs
 
Reported: 2008-07-29 02:58 UTC by Sebastien Bacher
Modified: 2008-07-31 11:24 UTC

Attachments
Fix bug in TextOutputDev.cc (707 bytes, patch)
2008-07-29 04:00 UTC, Adrian Johnson

Description Sebastien Bacher 2008-07-29 02:58:42 UTC
This bug was originally reported at https://bugs.launchpad.net/ubuntu/+source/poppler/+bug/252854

"Crashed in the background while downloading some PDFs

0.8.4-1.1

#0  0x00007f54c1699ab1 in TextPage::beginWord (this=0x120e660, state=0x12bc0f0, x0=<value optimized out>, y0=<value optimized out>) at TextOutputDev.cc:1940
	m = {0, 0, 0, -0}
	rot = <value optimized out>
#1  0x00007f54c169b209 in TextPage::addChar (this=0x120e660, state=0x12bc0f0, x=<value optimized out>, y=<value optimized out>, dx=<value optimized out>, dy=<value optimized out>, 
    c=0, nBytes=1, u=<value optimized out>, uLen=<value optimized out>) at TextOutputDev.cc:2056
	x1 = <value optimized out>
	y1 = <value optimized out>
	w1 = <value optimized out>
	h1 = <value optimized out>
	base = 783
	sp = -495.42185000000006
	overlap = 0
	i = <value optimized out>
#2  0x00007f54c169b38f in TextOutputDev::endMarkedContent (this=0x1215640, state=0x12bc0f0) at TextOutputDev.cc:4645
	uniString = 0x11fefc0 "þÿ"
	length = 1
	i = <value optimized out>
#3  0x00007f54c1621e4c in Gfx::go (this=0x125f110, topLevel=1) at Gfx.cc:611
	timer = {start_time = {tv_sec = 1217320457, tv_usec = 382068}, end_time = {tv_sec = 4562254508917369341, tv_usec = 18266736}, active = 1}
	obj = {type = objCmd, {booln = 19012944, intg = 19012944, real = 9.3936424567034534e-317, string = 0x1221d50, name = 0x1221d50 "EMC", array = 0x1221d50, dict = 0x1221d50, 
    stream = 0x1221d50, ref = {num = 19012944, gen = 0}, cmd = 0x1221d50 "EMC"}}
	args = {{type = objNone, {booln = 19382976, intg = 19382976, real = 9.5764625557653816e-317, string = 0x127c2c0, name = 0x127c2c0 "P", array = 0x127c2c0, dict = 0x127c2c0, 
      stream = 0x127c2c0, ref = {num = 19382976, gen = 0}, cmd = 0x127c2c0 "P"}}, {type = objNone, {booln = 19604656, intg = 19604656, real = 9.6859870281354691e-317, 
      string = 0x12b24b0, name = 0x12b24b0 "0ó%\001", array = 0x12b24b0, dict = 0x12b24b0, stream = 0x12b24b0, ref = {num = 19604656, gen = 0}, cmd = 0x12b24b0 "0ó%\001"}}, {
    type = objNone, {booln = 594, intg = 594, real = 2.9347499362970045e-321, string = 0x252, name = 0x252 <Address 0x252 out of bounds>, array = 0x252, dict = 0x252, 
      stream = 0x252, ref = {num = 594, gen = 0}, cmd = 0x252 <Address 0x252 out of bounds>}}, {type = objNone, {booln = 783, intg = 783, real = 3.8685340069369604e-321, 
      string = 0x30f, name = 0x30f <Address 0x30f out of bounds>, array = 0x30f, dict = 0x30f, stream = 0x30f, ref = {num = 783, gen = 0}, 
      cmd = 0x30f <Address 0x30f out of bounds>}}, {type = objNone, {booln = 1030792151, intg = 1030792151, real = 2.605, string = 0x4004d70a3d70a3d7, 
      name = 0x4004d70a3d70a3d7 <Address 0x4004d70a3d70a3d7 out of bounds>, array = 0x4004d70a3d70a3d7, dict = 0x4004d70a3d70a3d7, stream = 0x4004d70a3d70a3d7, ref = {
        num = 1030792151, gen = 1074059018}, cmd = 0x4004d70a3d70a3d7 <Address 0x4004d70a3d70a3d7 out of bounds>}}, {type = objNone, {booln = -755914244, intg = -755914244, 
      real = -1.024, string = 0xbff0624dd2f1a9fc, name = 0xbff0624dd2f1a9fc <Address 0xbff0624dd2f1a9fc out of bounds>, array = 0xbff0624dd2f1a9fc, dict = 0xbff0624dd2f1a9fc, 
      stream = 0xbff0624dd2f1a9fc, ref = {num = -755914244, gen = -1074765235}, cmd = 0xbff0624dd2f1a9fc <Address 0xbff0624dd2f1a9fc out of bounds>}}, {type = objNone, {booln = 0, 
      intg = 0, real = 0, string = 0x0, name = 0x0, array = 0x0, dict = 0x0, stream = 0x0, ref = {num = 0, gen = 0}, cmd = 0x0}}, {type = objNone, {booln = 0, intg = 0, real = 0, 
      string = 0x0, name = 0x0, array = 0x0, dict = 0x0, stream = 0x0, ref = {num = 0, gen = 0}, cmd = 0x0}}, {type = objNone, {booln = 0, intg = 0, real = 0, string = 0x0, 
      name = 0x0, array = 0x0, dict = 0x0, stream = 0x0, ref = {num = 0, gen = 0}, cmd = 0x0}}, {type = objNone, {booln = 0, intg = 0, real = 0, string = 0x0, name = 0x0, 
      array = 0x0, dict = 0x0, stream = 0x0, ref = {num = 0, gen = 0}, cmd = 0x0}}, {type = objNone, {booln = 0, intg = 0, real = 0, string = 0x0, name = 0x0, array = 0x0, 
      dict = 0x0, stream = 0x0, ref = {num = 0, gen = 0}, cmd = 0x0}}, {type = objNone, {booln = 0, intg = 0, real = 0, string = 0x0, name = 0x0, array = 0x0, dict = 0x0, 
      stream = 0x0, ref = {num = 0, gen = 0}, cmd = 0x0}}, {type = objNone, {booln = 0, intg = 0, real = 0, string = 0x0, name = 0x0, array = 0x0, dict = 0x0, stream = 0x0, ref = {
        num = 0, gen = 0}, cmd = 0x0}}, {type = objNone, {booln = 0, intg = 0, real = 0, string = 0x0, name = 0x0, array = 0x0, dict = 0x0, stream = 0x0, ref = {num = 0, gen = 0}, 
      cmd = 0x0}}, {type = objNone, {booln = 0, intg = 0, real = 0, string = 0x0, name = 0x0, array = 0x0, dict = 0x0, stream = 0x0, ref = {num = 0, gen = 0}, cmd = 0x0}}, {
    type = objNone, {booln = 0, intg = 0, real = 0, string = 0x0, name = 0x0, array = 0x0, dict = 0x0, stream = 0x0, ref = {num = 0, gen = 0}, cmd = 0x0}}, {type = objNone, {
      booln = 0, intg = 0, real = 0, string = 0x0, name = 0x0, array = 0x0, dict = 0x0, stream = 0x0, ref = {num = 0, gen = 0}, cmd = 0x0}}, {type = objNone, {booln = 0, intg = 0, 
      real = 0, string = 0x0, name = 0x0, array = 0x0, dict = 0x0, stream = 0x0, ref = {num = 0, gen = 0}, cmd = 0x0}}, {type = objNone, {booln = 0, intg = 0, real = 0, 
      string = 0x0, name = 0x0, array = 0x0, dict = 0x0, stream = 0x0, ref = {num = 0, gen = 0}, cmd = 0x0}}, {type = objNone, {booln = 0, intg = 0, real = 0, string = 0x0, 
      name = 0x0, array = 0x0, dict = 0x0, stream = 0x0, ref = {num = 0, gen = 0}, cmd = 0x0}} <repeats 14 times>}
	numArgs = 0
	i = 2
	lastAbortCheck = 0
#4  0x00007f54c1628536 in Gfx::display (this=0x125f110, obj=0x7fffc9b75300, topLevel=1) at Gfx.cc:580
	obj2 = {type = objNone, {booln = 0, intg = 0, real = 0, string = 0x0, name = 0x0, array = 0x0, dict = 0x0, stream = 0x0, ref = {num = 0, gen = 0}, cmd = 0x0}}
	i = <value optimized out>
#5  0x00007f54c166de40 in Page::displaySlice (this=0x11c4140, out=0x1215640, hDPI=72, vDPI=72, rotate=<value optimized out>, useMediaBox=<value optimized out>, crop=0, sliceX=-1, 
    sliceY=-1, sliceW=-1, sliceH=-1, printing=0, catalog=0x1169320, abortCheckCbk=0, abortCheckCbkData=0x0, annotDisplayDecideCbk=0, annotDisplayDecideCbkData=0x0) at Page.cc:415
	gfx = (Gfx *) 0x125f110
	obj = {type = objStream, {booln = 19401584, intg = 19401584, real = 9.5856561293031955e-317, string = 0x1280b70, name = 0x1280b70 "p®\222ÁT\177", array = 0x1280b70, 
    dict = 0x1280b70, stream = 0x1280b70, ref = {num = 19401584, gen = 0}, cmd = 0x1280b70 "p®\222ÁT\177"}}
	annotList = <value optimized out>
	i = <value optimized out>
#6  0x00007f54c166dedd in Page::display (this=0x12bc0f0, out=0x7fffc9b74df0, hDPI=-0, vDPI=-1, rotate=-910733832, useMediaBox=-910733824, crop=-910733816, 
    printing=<value optimized out>, catalog=0x1169320, abortCheckCbk=0, abortCheckCbkData=0x0, annotDisplayDecideCbk=0, annotDisplayDecideCbkData=0x0) at Page.cc:344
No locals.
#7  0x00007f54c1671632 in PDFDoc::displayPages (this=0x1168e80, out=0x1215640, firstPage=<value optimized out>, lastPage=221, hDPI=72, vDPI=72, rotate=0, useMediaBox=1, crop=0, 
    printing=0, abortCheckCbk=0, abortCheckCbkData=0x0, annotDisplayDecideCbk=0, annotDisplayDecideCbkData=0x0) at PDFDoc.cc:388
	page = 82
#8  0x0000000000401ff2 in main (argc=3, argv=<value optimized out>) at pdftotext.cc:248
	doc = (PDFDoc *) 0x1168e80
	fileName = <value optimized out>
	textFileName = <value optimized out>
	ownerPW = <value optimized out>
	userPW = <value optimized out>
	textOut = (class TextOutputDev *) 0x1215640
	f = <value optimized out>
	uMap = (UnicodeMap *) 0x1168ca0
	info = {type = objNone, {booln = 0, intg = 0, real = 0, string = 0x0, name = 0x0, array = 0x0, dict = 0x0, stream = 0x0, ref = {num = 0, gen = 0}, cmd = 0x0}}
	ok = <value optimized out>
	p = <value optimized out>
	exitCode = <value optimized out>"
Comment 1 Adrian Johnson 2008-07-29 04:00:48 UTC
Created attachment 17958
Fix bug in TextOutputDev.cc

Attaching the PDF file would assist with debugging this problem.

From the stack trace I can see one problem. In frame #2 the string contains only the Unicode byte order mark (bytes FE FF). At this point in the code the length should be 0, not 1. A patch to fix this is attached.

Without the original PDF file I do not know if this patch will fix this bug.
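
To illustrate the length problem described above, here is a minimal sketch of counting the UTF-16BE code units that follow a byte order mark. It is not the attached patch, whose contents are not shown in this report; the helper names hasUtf16beBom and codeUnitsAfterBom are hypothetical and do not exist in poppler. The point is only that a string consisting of the two BOM bytes alone carries zero characters.

```cpp
#include <cstddef>

// Hypothetical helper (not poppler code): true when the buffer starts with
// the UTF-16BE byte order mark, i.e. the bytes 0xFE 0xFF.
constexpr bool hasUtf16beBom(const char *s, std::size_t nBytes) {
  return nBytes >= 2 &&
         static_cast<unsigned char>(s[0]) == 0xFE &&
         static_cast<unsigned char>(s[1]) == 0xFF;
}

// Hypothetical helper (not poppler code): number of 2-byte code units that
// follow the BOM. Returns 0 when there is no BOM or nothing after it; a
// trailing odd byte is ignored.
constexpr std::size_t codeUnitsAfterBom(const char *s, std::size_t nBytes) {
  return hasUtf16beBom(s, nBytes) ? (nBytes - 2) / 2 : 0;
}
```

With this sketch, a BOM-only buffer such as the "þÿ" string in frame #2 yields a count of 0, so a caller would have no characters to pass on to TextPage::addChar.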
Comment 2 Sebastien Bacher 2008-07-30 01:14:30 UTC
I've asked the submitter for the example file, but the description suggests that the PDF was still being downloaded and that the issue was rather due to the partial copy.
Comment 3 Albert Astals Cid 2008-07-31 11:24:35 UTC
I've committed Adrian's patch. We can't do much more without more information, so I'm closing the bug.

