Created attachment 59961
Patch to suppress wrong "Error: Invalid XRef entry" messages
I have a valid PDF for which I get several "Error: Invalid XRef entry" messages when I run, e.g., pdftoppm. The PDF is too large to attach to this bug report, so I will try to describe the problem:
The PDF has an object with a big string:
56 0 obj <</CharSet (þÿ^@\(^@/^@S^@/^@t^@/^@r^@/^@a^@/egmrnabdsl^@/^@e^@/^@n^@/^@b^@/^@u^@/psca^@e^@/yhhpne^@/^@T^@/^@i^@/^@f^@/^@c^@/^@h^@/^@o^@/^@w^@/^@k^@/^@N^@/^@s^@/^@C^@/^@l^@/^@D^@/inen^@/^@A^@/no^@e^@/hter^@e^@/ifev^@/^@x^@/wt^@o^@/^@m^@/ezor^@/ofru^@/^@E^@/epirdo^@/^@d^@/^@G^@/^@H^@/maepsrna^@d^@/^@K^@/dueieris^@s^@/^@g^@\)) /CapHeight 500 /Ascent 728 /Flags 32 /FontFile 58 0 R /ItalicAngle 0 /Descent -210 /XHeight 250 /FontName /ZJIGIZ+ArialMT,Bold /Leading 150 /FontBBox [-628 -376 2000 1010 ] /MaxWidth 2628 /AvgWidth 479 /Type /FontDescriptor /StemV 0 >> endobj
While parsing this string, Lexer::getObj exceeds the token buffer size (128 bytes), so xref->getNumEntry(curStr.streamGetPos()) is called to check whether the document is malformed and the string is growing too much.
XRef::getNumEntry walks over every xref entry to find the obj num for the current stream position, so it also calls XRef::getEntry with parameter 2. But in this PDF the obj nums from 2 to 4 are not used:
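To illustrate the kind of check described above, here is a minimal sketch, not the actual Lexer code. It assumes the lexer re-checks the owning object each time another 128 bytes of string data have been consumed. The function name readLiteralString and the objectAt callback (standing in for xref->getNumEntry) are hypothetical, and escape handling is heavily simplified.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

constexpr std::size_t kTokBufSize = 128;  // buffer size quoted in the report

// Reads a PDF literal string whose opening '(' has already been consumed.
// objectAt is a hypothetical stand-in for xref->getNumEntry(): it maps a
// stream position to the number of the object that contains it.
// Returns false if the input ends early or the string appears to run
// into a different object (a hint that the file is malformed).
bool readLiteralString(const std::string &data, std::size_t pos, int objNum,
                       const std::function<int(std::size_t)> &objectAt,
                       std::string &out) {
    assert(pos <= data.size());
    int depth = 1;               // nesting level of unescaped parentheses
    std::size_t sinceCheck = 0;  // bytes stored since the last sanity check
    while (pos < data.size()) {
        char c = data[pos++];
        if (c == '\\' && pos < data.size()) {
            c = data[pos++];     // simplified: keep the escaped byte verbatim
        } else if (c == '(') {
            ++depth;
        } else if (c == ')' && --depth == 0) {
            return true;         // matching close parenthesis found
        }
        out.push_back(c);
        if (++sinceCheck == kTokBufSize) {
            sinceCheck = 0;
            // The buffer filled up again: make sure we are still inside
            // the object we started in.
            if (objectAt(pos) != objNum) {
                return false;
            }
        }
    }
    return false;                // unterminated string
}
```

With a string like the CharSet value in object 56, the callback would run once per 128 stored bytes, which is why the xref lookup is triggered at all for this PDF.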
0000000002 65535 f
0000000015 00000 n
0000000236 00000 n
: : :
As a result, XRef::getEntry, e.g. with parameter 2, rescans the xref section, finds that the obj num is not used, and emits the error message.
Because getNumEntry is called only to check for malformed documents and returns the obj num for a given stream position, it shouldn't check for xrefEntryNone entries. The attached patch solves this.
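The idea of a position lookup that never triggers the diagnostic can be sketched roughly as follows. This is a simplified model under stated assumptions, not the attached patch: ToyXRef, Entry, EntryType and peekEntry are hypothetical names, and errors are collected in a vector instead of being printed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the real xref types.
enum class EntryType { Free, Uncompressed, None };

struct Entry {
    EntryType type;
    std::int64_t offset;
};

struct ToyXRef {
    std::vector<Entry> entries;
    std::vector<std::string> errors;  // collected instead of printed

    // Plain lookup for the position check: no rescan, no diagnostics.
    const Entry &peekEntry(std::size_t i) const {
        assert(i < entries.size());
        return entries[i];
    }

    // Lookup for a real fetch: complains about unused entries.
    const Entry &getEntry(std::size_t i) {
        assert(i < entries.size());
        if (entries[i].type == EntryType::None) {
            errors.push_back("Invalid XRef entry");
        }
        return entries[i];
    }

    // Returns the number of the in-use object with the largest offset
    // below pos, or -1 if there is none.
    int getNumEntry(std::int64_t pos) const {
        int res = -1;
        std::int64_t resOffset = -1;
        for (std::size_t i = 0; i < entries.size(); ++i) {
            const Entry &e = peekEntry(i);
            if (e.type != EntryType::Uncompressed) {
                continue;  // free or unused numbers cannot own a position
            }
            if (e.offset < pos && e.offset > resOffset) {
                res = static_cast<int>(i);
                resOffset = e.offset;
            }
        }
        return res;
    }
};
```

In this model, unused numbers such as 2 to 4 are simply skipped by getNumEntry, while a real fetch through getEntry still reports them.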
Wouldn't it be better just wrapping the
error(errSyntaxError, -1, "Invalid XRef entry");
with that new if? And call the if "complainAboutMissingEntry" ?
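For comparison, the alternative suggested here (keep the lookup path unchanged and only make the message conditional) might look like the sketch below. Only the flag name complainAboutMissingEntry and the message text come from this thread. SketchXRef, Slot, SlotType and the rescan counter are hypothetical, and the counter merely stands in for rescanning the xref section.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model; not the real XRef class.
enum class SlotType { Free, InUse, None };

struct Slot {
    SlotType type;
    long long offset;
};

class SketchXRef {
public:
    explicit SketchXRef(std::vector<Slot> slots) : slots_(std::move(slots)) {}

    // The flag defaults to true, so existing callers keep the old behavior.
    const Slot &getEntry(std::size_t i, bool complainAboutMissingEntry = true) {
        assert(i < slots_.size());
        if (slots_[i].type == SlotType::None) {
            ++rescans_;  // stand-in for rescanning the xref section
            if (complainAboutMissingEntry) {
                errors_.push_back("Invalid XRef entry");
            }
        }
        return slots_[i];
    }

    const std::vector<std::string> &errors() const { return errors_; }
    int rescans() const { return rescans_; }

private:
    std::vector<Slot> slots_;
    std::vector<std::string> errors_;
    int rescans_ = 0;
};
```

A caller that only wants the position check would pass false as the second argument. The rescan still happens in this variant, and only the message is suppressed.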
(In reply to comment #1)
> Wouldn't it be better just wrapping the
> error(errSyntaxError, -1, "Invalid XRef entry");
> with that new if? And call the if "complainAboutMissingEntry" ?
Possible, but I don't think it's really better: why is it necessary to rescan the complete xref section when just checking whether the current stream position still belongs to that object? I think this scanning code exists because normally a "real" fetch of that object is being done, and it then tries to locate the "missing" object, which is not necessary in this case IMHO.
I know what you mean, but there's a reconstructXRef that I find kind of scary, so I went the "really secure way". Sorry if I sound like a coward sometimes :D