Summary: | PDF hint table fails to load if PDFDoc is opened with stream start offset != 0 | ||
---|---|---|---|
Product: | poppler | Reporter: | Ole Liabø <seksfemfire> |
Component: | general | Assignee: | poppler-bugs <poppler-bugs> |
Status: | RESOLVED MOVED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Respect parent stream start offset when creating substream
attachment-27609-0.html poppler-0.42.0_hints.patch popplertest.cpp |
Do you have a document that shows the need for this patch?

This PDF can be used: https://partners.adobe.com/public/developer/en/tiff/TIFF6.pdf

I'm a little bit confused about it:
1. If I parse the PDF from comment 2, i.e. with /utils/pdftoppm -png -f 1 -l 1 -cropbox, I don't get any error messages.
2. The main stream of a PDF always starts with offset 0.
3. The PDF spec says about the H entry in the "Linearization Parameter Dictionary": offset1 shall be the offset of the primary hint stream from the beginning(!!!) of the file.
4. All offsets in poppler, including the xref offsets, are offsets from the beginning of a file and never add some hypothetical offset of substreams.

Created attachment 123292 [details]
attachment-27609-0.html

1. I would like to withdraw my original patch; it does not solve all the issues. Sorry for submitting it prematurely. The original bug still remains, though.
2. "The main stream of a PDF always starts with offset 0": This is true if the filename PDFDoc constructor is used. But my bug refers to the case where the PDFDoc is created with a BaseStream as input. That stream could have an offset != 0, e.g. if you have concatenated many PDFs into one file. Today the code does not handle this case, and this PDFDoc constructor should be avoided until it's fixed. I'm working on a more comprehensive patch.

Created attachment 123296 [details]
poppler-0.42.0_hints.patch

Here is a more comprehensive patch and a sample application. It fixes the issues I have seen when loading PDFs from a FileStream with startA != 0. Similar problems could exist in the other stream types.

Created attachment 123297 [details]
popplertest.cpp
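The disagreement above comes down to what "the beginning of the file" means when the PDF is not the whole file. The following C++ sketch models a container built by concatenating PDFs; `findPdfStarts` and `toContainerOffset` are hypothetical helpers, not poppler code. The assumption it illustrates is the one in the reply: every offset stored inside an embedded PDF (xref entries, the /H hint-stream offset) counts from that PDF's own first byte, so it must be shifted by the PDF's start position in the container.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: find where each PDF begins inside a buffer made by
// concatenating several PDFs. Naive on purpose: a "%PDF-" sequence inside
// binary stream data would also match, so a real tool should record the
// offsets when it builds the container instead of scanning for them.
std::vector<std::size_t> findPdfStarts(const std::string &container) {
    std::vector<std::size_t> starts;
    const std::string header = "%PDF-";
    for (std::size_t pos = container.find(header); pos != std::string::npos;
         pos = container.find(header, pos + header.size())) {
        starts.push_back(pos);
    }
    return starts;
}

// Offsets stored inside an embedded PDF are relative to that PDF's first
// byte, so add its start position to get an absolute container position.
std::size_t toContainerOffset(std::size_t pdfStart, std::size_t offsetInPdf) {
    return pdfStart + offsetInPdf;
}
```

Scanning for the header is only for illustration; in the setup described in this report, the application presumably already knows each start offset, since it is the value passed as startA when the FileStream is created.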
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/372.
Created attachment 123139 [details]
Respect parent stream start offset when creating substream

The problem is that hints.cc creates a substream from the original stream given to the PDFDoc class. The substream does not include the start offset of the original stream.

To reproduce:
1. Create a stream with startA != 0.
2. Create a PDFDoc with that stream as input.
3. If the PDF has hint tables, poppler outputs these warnings:

Syntax Warning: Failed parsing hints table object
Syntax Warning: Failed to get object num from hint tables for page 1
Syntax Warning: Failed parsing page 1 using hint tables
Syntax Warning: Failed to get object num from hint tables for page 1
Syntax Warning: Failed parsing page 1 using hint tables
Syntax Warning: Failed to get object num from hint tables for page 1

The attached patch fixes the issue.
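A minimal C++ model of the failure described above. `ByteStream`, `makeSubStream`, `openHintStream` and `openHintStreamBuggy` are hypothetical stand-ins, not poppler's real classes or signatures. The sketch assumes, as the report implies, that the substream factory takes an absolute position in the underlying file, so the caller has to add the parent stream's start offset itself; that is what the patch title "Respect parent stream start offset when creating substream" describes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical, simplified seekable stream over a container buffer.
// NOT poppler's BaseStream/FileStream; names are illustrative only.
// The buffer passed to the constructor must outlive the stream.
class ByteStream {
public:
    ByteStream(const std::string &data, std::size_t start, std::size_t length)
        : data_(&data), start_(start), length_(length) {}

    // Absolute position of this stream's first byte in the container.
    std::size_t getStart() const { return start_; }

    // Byte at `pos` (relative to this stream), or -1 past the end.
    int charAt(std::size_t pos) const {
        if (pos >= length_ || start_ + pos >= data_->size()) return -1;
        return static_cast<unsigned char>((*data_)[start_ + pos]);
    }

    // Takes an ABSOLUTE container position: the caller is responsible
    // for adding any parent start offset.
    ByteStream makeSubStream(std::size_t absStart, std::size_t length) const {
        return ByteStream(*data_, absStart, length);
    }

private:
    const std::string *data_;
    std::size_t start_;
    std::size_t length_;
};

// Buggy: treats the hint-stream offset (measured from the start of the PDF)
// as if it were an absolute position in the container.
ByteStream openHintStreamBuggy(const ByteStream &pdf, std::size_t hintsOffset,
                               std::size_t length) {
    return pdf.makeSubStream(hintsOffset, length);
}

// Fixed: shift the PDF-relative offset by the parent stream's start.
ByteStream openHintStream(const ByteStream &pdf, std::size_t hintsOffset,
                          std::size_t length) {
    return pdf.makeSubStream(pdf.getStart() + hintsOffset, length);
}
```

With a PDF embedded at byte 8 of a buffer, the fixed variant lands on the hint data, while the buggy variant reads bytes from the wrong place (here, the embedded PDF's header). A misread of this kind is a plausible source of the "Failed parsing hints table object" warnings; when start is 0 both variants agree, which would explain why the filename constructor shows no problem.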