Bug 95061 - PDF hint table fails to load if PDFDoc is opened with stream start offset != 0
Summary: PDF hint table fails to load if PDFDoc is opened with stream start offset != 0
Status: RESOLVED MOVED
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: All All
Importance: medium normal
Assignee: poppler-bugs
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-04-22 08:16 UTC by Ole Liabø
Modified: 2018-08-21 10:47 UTC (History)
0 users

See Also:


Attachments
Respect parent stream start offset when creating substream (813 bytes, text/plain)
2016-04-22 08:16 UTC, Ole Liabø
attachment-27609-0.html (2.48 KB, text/html)
2016-04-27 07:44 UTC, Ole Liabø
poppler-0.42.0_hints.patch (2.15 KB, text/x-patch)
2016-04-27 10:06 UTC, Ole Liabø
popplertest.cpp (2.15 KB, text/x-c++src)
2016-04-27 10:06 UTC, Ole Liabø

Description Ole Liabø 2016-04-22 08:16:11 UTC
Created attachment 123139
Respect parent stream start offset when creating substream

The problem is that hints.cc creates a substream from the original stream given to the PDFDoc class. The substream does not take the original stream's start offset into account.

To reproduce:
1. Create a stream with startA != 0.
2. Create a PDFDoc with that stream as input.
3. If the PDF has hint tables, poppler outputs these warnings:

Syntax Warning: Failed parsing hints table object
Syntax Warning: Failed to get object num from hint tables for page 1
Syntax Warning: Failed parsing page 1 using hint tables
Syntax Warning: Failed to get object num from hint tables for page 1
Syntax Warning: Failed parsing page 1 using hint tables
Syntax Warning: Failed to get object num from hint tables for page 1

The attached patch fixes the issue.
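
The offset mix-up described above can be illustrated with a toy model. This is a minimal sketch, not poppler code: `ToyStream` and both of its substream helpers are hypothetical names invented for this example, and the attached patch may address the problem differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a stream that covers part of a larger file.
// It is NOT poppler's BaseStream or FileStream.
struct ToyStream {
    const std::string *file;  // the whole underlying file
    std::size_t start;        // absolute offset of this stream's first byte
    std::size_t length;       // number of bytes covered by this stream

    // Return the bytes this stream covers.
    std::string read() const { return file->substr(start, length); }

    // Buggy variant: treats a document-relative offset as an absolute
    // file offset, so the parent's start offset is lost.
    ToyStream subStreamIgnoringStart(std::size_t docOffset, std::size_t len) const {
        return ToyStream{file, docOffset, len};
    }

    // Corrected variant: shifts the document-relative offset by the
    // parent's start offset before addressing the underlying file.
    ToyStream subStreamRespectingStart(std::size_t docOffset, std::size_t len) const {
        assert(docOffset + len <= length);  // substream must fit in the parent
        return ToyStream{file, start + docOffset, len};
    }
};
```

With a document that begins 8 bytes into the file, a hint stream at document offset 9 is found only by the second variant; the first one reads bytes that belong to an earlier part of the file.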
Comment 1 Albert Astals Cid 2016-04-25 22:56:13 UTC
Do you have a document that shows the need for this patch?
Comment 2 Ole Liabø 2016-04-26 09:10:40 UTC
This PDF can be used:

https://partners.adobe.com/public/developer/en/tiff/TIFF6.pdf
Comment 3 Thomas Freitag 2016-04-26 10:21:30 UTC
I'm a little bit confused about it:
1. If I parse the PDF of comment 2, e.g. with /utils/pdftoppm -png -f 1 -l 1 -cropbox, I don't get any error messages.
2. The main stream of a PDF always starts with offset 0.
3. The PDF spec says about the H entry in the "Linearization Parameter Dictionary": offset1 shall be the offset of the primary hint stream from the beginning(!!!) of the file.
4. All offsets in poppler, including the xref offsets, are offsets from the beginning of a file and never add some hypothetical offset of substreams.
Comment 4 Ole Liabø 2016-04-27 07:44:11 UTC
Created attachment 123292
attachment-27609-0.html

1. I would like to withdraw my original patch - it does not solve all the
issues. Sorry for submitting it prematurely. The original bug still remains
though.
2. "The main stream of a PDF always starts with offset 0": This is true if
the filename PDFDoc constructor is used. But my bug refers to the case where
the PDFDoc is created with a BaseStream as input. This stream could have an
offset != 0, for example if you have concatenated many PDFs into one file.
Today the code does not handle this case, and the use of this PDFDoc
constructor should be avoided until it's fixed. I'm working on a more
comprehensive patch.
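
To make the concatenated-file case concrete, here is a rough sketch of how the start offsets of several PDFs packed into one file might be located. `findPdfStarts` is a hypothetical helper written for this comment, not part of poppler, and it assumes each embedded document begins with a `%PDF-` header. The same byte sequence can also occur inside stream data, so treat this as a heuristic only.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: return the offset of every "%PDF-" header found in
// the contents of a container file. Each offset is a candidate value for
// the non-zero start offset (startA) discussed in this bug.
std::vector<std::size_t> findPdfStarts(const std::string &data) {
    static const std::string marker = "%PDF-";
    std::vector<std::size_t> starts;
    for (std::size_t pos = data.find(marker); pos != std::string::npos;
         pos = data.find(marker, pos + marker.size())) {
        starts.push_back(pos);
    }
    return starts;
}
```

Every offset after the first is non-zero, and a stream opened at such an offset is the situation in which the hint table offsets are resolved incorrectly.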

Comment 5 Ole Liabø 2016-04-27 10:06:19 UTC
Created attachment 123296
poppler-0.42.0_hints.patch

Here is a more comprehensive patch and a sample application. It fixes the
issues I have seen when loading PDFs from a FileStream with startA != 0.
Similar problems could exist in the other stream types.


Comment 6 Ole Liabø 2016-04-27 10:06:19 UTC
Created attachment 123297
popplertest.cpp
Comment 7 GitLab Migration User 2018-08-21 10:47:08 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe to and participate further in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/372.

