Bug 97068

Summary: document IDs in trailer dict should be printed as hex strings
Product: poppler Reporter: Jakub Alba <jakubalba>
Component: generalAssignee: poppler-bugs <poppler-bugs>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium CC: jakubalba, Thomas.Freitag
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: example PDF
setprops.c
[PATCH 1/2] introduced hex string as a new Object type and used it for file identifier
[PATCH 2/2] init new file identifier object as a hex string
[PATCH 1/2] introduced hex string as a new Object type and used it for file identifier
[PATCH 2/2] treat file identifier as a hex string, not a basic string
[PATCH 1/2] introduced hex string as a new Object type and used it for file identifier
[PATCH 1/2] introduced hex string as a new Object type and used it for file identifier
[PATCH 1/2] introduced hex string as a new Object type and used it for file identifier
[PATCH 1/2] introduced hex string as a new Object type and used it for file identifier
[PATCH 2/2] treat file identifier as a hex string, not a basic string

Description Jakub Alba 2016-07-24 21:14:32 UTC
Created attachment 125297 [details]
example PDF

Example 3.7 in the PDF reference v7 (page 98) shows an example of a trailer dictionary. Two file identifiers are printed as hex strings and this is what non-poppler PDF software usually does. Poppler now reads hex strings correctly, but then treats like ordinary strings and prints them as ordinary strings. Because of that e.g. even though after an incremental update the first file identifier should stay the same, what actually gets printed is junk. And it is printed between round brackets instead of angle brackets (as it is supposed to be printed). An example:

1. Compile the attached "setprops.c" with poppler-glib from the git tree.
I do it with: gcc -o setprops setprops.c -I/poppler/install/path/include/poppler/glib/ -L/poppler/install/path/lib/ -lpoppler -lpoppler-glib $(pkg-config --cflags --libs gio-2.0 poppler-cairo)
2. Run "LD_PRELOAD=/poppler/install/path/lib/libpoppler.so:/poppler/install/path/lib/libpoppler-glib.so ./setprops path_to_pdf", where path_to_pdf is a path to the attached PDF file.
3. Run "tail path_to_pdf"
4. Run "tail path_to_pdf.new"

Output of the 3rd step:

<<
/Info 718 0 R
/ID [<4e7fff73ae86ee9a299d94b54d10c453> <8b805dbbee6d2d4b94948ad9307e4328>]
/Root 643 0 R
/Size 719
>>
startxref
921857
%%EOF

Output of the 4th step:
xref
0 1
0000000001 65535 f
718 1
0000936400 00000 n
trailer
<</Size 719 /ID [(Ns)M) ( y[NS|@7) ] /Root 643 0 R /Prev 921857 /Info 718 0 R >> 
startxref
936697
%%EOF

Do the same with my patches (remember to recompile setprops.c) and the output of the 4th step is:
xref
0 1
0000000001 65535 f
718 1
0000936400 00000 n
trailer
<</Size 719 /ID [<4e7fff73ae86ee9a299d94b54d10c453><42d9ded05c067ac3a02ac0807ee11b6d>] /Root 643 0 R /Prev 921857 /Info 718 0 R >> 
startxref
936697
%%EOF

Looks fine and the first file identifier remained untouched (as it is supposed to).
Comment 1 Jakub Alba 2016-07-24 21:15:13 UTC
Created attachment 125298 [details]
setprops.c
Comment 2 Jakub Alba 2016-07-24 21:16:07 UTC
Created attachment 125299 [details] [review]
[PATCH 1/2] introduced hex string as a new Object type and used it for file identifier
Comment 3 Jakub Alba 2016-07-24 21:16:37 UTC
Created attachment 125300 [details] [review]
[PATCH 2/2] init new file identifier object as a hex string
Comment 4 Jakub Alba 2016-07-24 21:31:14 UTC
Created attachment 125301 [details] [review]
[PATCH 1/2] introduced hex string as a new Object type and used it for file identifier

Oops, forgot about the copyright info in the previous 1/2 patch.
Comment 5 Jakub Alba 2016-07-26 22:35:18 UTC
Created attachment 125343 [details] [review]
[PATCH 2/2] treat file identifier as a hex string, not a basic string

Fixed other parts of poppler to use hex strings when dealing with file identifiers.
Comment 6 Albert Astals Cid 2016-07-30 15:53:49 UTC
Please add the enum to the end not in the middle
Comment 7 Albert Astals Cid 2016-07-30 15:54:53 UTC
Thomas, this looks reasonable to me but since you wrote part of the "pdf writing" code i'd appreciate a quick review from your side if you have time.
Comment 8 Jakub Alba 2016-07-30 16:08:00 UTC
Created attachment 125439 [details] [review]
[PATCH 1/2] introduced hex string as a new Object type and used it for file identifier

(In reply to Albert Astals Cid from comment #6)
> Please add the enum to the end not in the middle

Done.
Comment 9 Albert Astals Cid 2016-07-30 16:13:36 UTC
Comment on attachment 125439 [details] [review]
[PATCH 1/2] introduced hex string as a new Object type and used it for file identifier

objTypeNames diff is wrong now i guess?
Comment 10 Jakub Alba 2016-07-30 16:22:22 UTC
Created attachment 125440 [details] [review]
[PATCH 1/2] introduced hex string as a new Object type and used it for file identifier

Yes. Sorry for that. Fixed.
Comment 11 Thomas Freitag 2016-08-02 11:37:46 UTC
(In reply to Albert Astals Cid from comment #7)
> Thomas, this looks reasonable to me but since you wrote part of the "pdf
> writing" code i'd appreciate a quick review from your side if you have time.

The patches are reasonable and look good for me
Comment 12 Albert Astals Cid 2016-08-02 21:30:08 UTC
After having a look at the first patch, i was wondering if we really need the "hex" variable in the union, can't we just use string?
Comment 13 Jakub Alba 2016-08-02 21:52:42 UTC
Created attachment 125498 [details] [review]
[PATCH 1/2] introduced hex string as a new Object type and used it for file identifier

(In reply to Albert Astals Cid from comment #12)
> After having a look at the first patch, i was wondering if we really need
> the "hex" variable in the union, can't we just use string?

I added it, because I'd had some problems when writing this code, but now, when I checked once again, it seems that using string is perfectly OK.
Comment 14 Jakub Alba 2016-10-22 18:57:59 UTC
Created attachment 127476 [details] [review]
[PATCH 1/2] introduced hex string as a new Object type and used it for file identifier
Comment 15 Jakub Alba 2016-10-22 18:58:35 UTC
Created attachment 127477 [details] [review]
[PATCH 2/2] treat file identifier as a hex string, not a basic string
Comment 16 Jakub Alba 2016-10-22 18:59:20 UTC
I've updated my name and e-mail in the copyright info and commit author info.
Comment 17 Albert Astals Cid 2016-11-25 10:23:44 UTC
(In reply to Jakub Alba from comment #13)
> Created attachment 125498 [details] [review] [review]
> [PATCH 1/2] introduced hex string as a new Object type and used it for file
> identifier
> 
> (In reply to Albert Astals Cid from comment #12)
> > After having a look at the first patch, i was wondering if we really need
> > the "hex" variable in the union, can't we just use string?
> 
> I added it, because I'd had some problems when writing this code, but now,
> when I checked once again, it seems that using string is perfectly OK.

So can you please update the patches to use string?
Comment 18 Jakub Alba 2016-11-26 11:14:01 UTC
(In reply to Albert Astals Cid from comment #17)
> (In reply to Jakub Alba from comment #13)
> > Created attachment 125498 [details] [review] [review] [review]
> > [PATCH 1/2] introduced hex string as a new Object type and used it for file
> > identifier
> > 
> > (In reply to Albert Astals Cid from comment #12)
> > > After having a look at the first patch, i was wondering if we really need
> > > the "hex" variable in the union, can't we just use string?
> > 
> > I added it, because I'd had some problems when writing this code, but now,
> > when I checked once again, it seems that using string is perfectly OK.
> 
> So can you please update the patches to use string?

They do use string (the variable in the union). They just still introduce a new enum element just to carry the information if the string is a hex string. Or am I missing something?
Comment 19 Albert Astals Cid 2016-11-26 16:34:02 UTC
You're right sorry, i forgot what this was about :D

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.