Bug 81724 - pdfseparate + pdfunite produce different pdftoppm renderings
Status: RESOLVED MOVED
Alias: None
Product: poppler
Classification: Unclassified
Component: utils
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: poppler-bugs
Reported: 2014-07-24 23:35 UTC by Albert Astals Cid
Modified: 2018-08-21 11:20 UTC

Attachments
said file (2.45 MB, text/plain), 2014-07-24 23:35 UTC, Albert Astals Cid
support output intents in pdfunite (4.06 KB, patch), 2014-08-19 11:17 UTC, Thomas Freitag
support output intents in pdfunite (4.08 KB, patch), 2014-08-19 11:23 UTC, Thomas Freitag
The second file (81.52 KB, application/x-download), 2014-10-09 09:24 UTC, Albert Astals Cid
support output intents, optional content and acroform in pdfunite (4.83 KB, patch), 2014-10-29 13:20 UTC, Thomas Freitag
support output intents, optional content and acroform in pdfunite (4.93 KB, patch), 2014-11-03 12:22 UTC, Thomas Freitag
The third file (176.12 KB, application/pdf), 2014-12-23 15:04 UTC, Albert Astals Cid
Don't change OC-references in image and xobject dictionaries with pdfunite (773 bytes, patch), 2015-01-06 16:03 UTC, Thomas Freitag
Don't change OC references in image an xobject dictionaries with pdfunite (7.33 KB, patch), 2015-09-14 14:05 UTC, Thomas Freitag

Description Albert Astals Cid 2014-07-24 23:35:31 UTC
Created attachment 103410
said file

Get the attached file

pdfseparate file.pdf hola
pdfunite hola hola2.pdf
pdftoppm -png file.pdf orig
pdftoppm -png hola2.pdf united
diff orig-1.png united-1.png

They are different :(

For most of the files I have around this works, but for this one (and some others, but let's go one by one) it doesn't.
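
The reproduction steps above could be scripted roughly as follows. This is only a sketch: the function names `roundtrip_commands` and `renders_identically` are made up for illustration, the poppler utilities are assumed to be on PATH, and the input is assumed to be a single-page PDF as in this report (so the output names `orig-1.png` and `united-1.png` match the commands above).

```python
import filecmp
import subprocess
from pathlib import Path


def roundtrip_commands(src, workdir):
    """Return the four commands from the report as argv lists."""
    work = Path(workdir)
    page = str(work / "hola")          # single extracted page
    united = str(work / "hola2.pdf")   # result of pdfunite
    return [
        ["pdfseparate", str(src), page],
        ["pdfunite", page, united],
        ["pdftoppm", "-png", str(src), str(work / "orig")],
        ["pdftoppm", "-png", united, str(work / "united")],
    ]


def renders_identically(src, workdir):
    """Run the round trip and compare the page-1 PNGs byte by byte."""
    for argv in roundtrip_commands(src, workdir):
        subprocess.run(argv, check=True)
    work = Path(workdir)
    # shallow=False forces a content comparison, like diff does.
    return filecmp.cmp(work / "orig-1.png", work / "united-1.png", shallow=False)
```

Calling `renders_identically("file.pdf", ".")` would then return False for the attached file if the renderings differ as described.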
Comment 1 Albert Astals Cid 2014-07-24 23:37:07 UTC
Thomas, you're the specialist in pdfseparate+pdfunite; can you have a look?
Comment 2 Thomas Freitag 2014-07-25 07:20:08 UTC
(In reply to comment #1)
> Thomas, you're the specialist in pdfseparate+pdfunite; can you have a look?

I put it on my TODO list.
Comment 3 Thomas Freitag 2014-08-18 11:38:41 UTC
I had a first quick look into it, and I think I know why it happens: the original file has a DestOutputProfile, i.e. an output intent. Since the patch for bug 34053 was committed on the 2nd of October 2013, we respect that in pdftoppm during rendering. I would guess that with a version before that commit the output would NOT be different.

pdfseparate respects output intents; I remember I created a patch to do that. That is okay, because an output intent (defined in the catalog dict) applies to every page of a multi-page PDF, so the pdftoppm output of hola and of the original file does NOT differ.

But what should pdfunite do if it merges pages with output intents? Examine if the output intents are always the same? Use the first or the last output intent for every page in the merged pdf?

Okay, in your test you use pdfunite with a single page, so in this very special case we could insert the output intent into the resulting PDF. But I would say this is not a bug but a feature of pdfunite.

To clarify what I mean, this is the interesting part of the original PDF:

164 0 obj
<< /Type /Catalog /Pages 165 0 R/ViewerPreferences <</Direction /L2R >>  /Metadata 162 0 R  /AcroForm 132 0 R  /OutputIntents [<</Info (ISO Coated) /S /GTS_PDFX /OutputConditionIdentifier (FOGRA27) /OutputCondition (Offsetdruck entsprechend ISO/DIS 12647-2:2003, OFCOM,  Positivplatte, Papiertyp 1 oder 2 \(gestrichenes Kunstdruckpapier, 115 g/m2\), Rasterweite 60/cm.) /DestOutputProfile 161 0 R /Type /OutputIntent /RegistryName (http://www.color.org) >> ] >>
endobj
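
As a rough illustration of how one might spot such an output intent in a dumped catalog object, here is a small sketch. It is an assumption-laden shortcut: it runs regular expressions over the raw object text (shortened from the catalog above) rather than using a real PDF parser, and the function name `output_intent_summary` is hypothetical.

```python
import re

# Shortened copy of the catalog object quoted above.
CATALOG = (
    "<< /Type /Catalog /Pages 165 0 R /OutputIntents [<</Info (ISO Coated) "
    "/S /GTS_PDFX /OutputConditionIdentifier (FOGRA27) "
    "/DestOutputProfile 161 0 R /Type /OutputIntent >> ] >>"
)


def output_intent_summary(catalog_text):
    """Return identifier and profile reference, or None without /OutputIntents."""
    if "/OutputIntents" not in catalog_text:
        return None
    ident = re.search(r"/OutputConditionIdentifier\s*\(([^)]*)\)", catalog_text)
    profile = re.search(r"/DestOutputProfile\s+(\d+)\s+(\d+)\s+R", catalog_text)
    return {
        "identifier": ident.group(1) if ident else None,
        # (object number, generation number) of the ICC profile stream
        "profile_ref": (int(profile.group(1)), int(profile.group(2))) if profile else None,
    }
```

A catalog without an /OutputIntents entry, like the one pdfunite wrote before the patch, yields None here.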
Comment 4 Albert Astals Cid 2014-08-18 20:36:18 UTC
> Examine if the output intents are always the same?

I think this would be useful; this way I can pdfseparate all the pages of a PDF, then pdfunite some of them, and still keep the same rendering.

I think this is a valid use case, no?
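
One possible reading of "examine if the output intents are always the same" is sketched below. This is a hypothetical policy written for illustration, not the behaviour of any attached patch; the function name `common_output_intent` and the choice to raise an error on a mismatch are assumptions.

```python
def common_output_intent(intents):
    """Pick the output intent for a merged PDF.

    intents holds one entry per input PDF; None means that input
    has no output intent. The shared value is returned when all
    inputs agree, otherwise the merge is refused.
    """
    if not intents:
        return None
    first = intents[0]
    if all(intent == first for intent in intents):
        return first
    raise ValueError("input PDFs carry different output intents")
```

With pages separated from one document, every entry is the same, so the merged file keeps the original output intent and should render like the source.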
Comment 5 Thomas Freitag 2014-08-19 11:17:51 UTC
Created attachment 104884
support output intents in pdfunite

I agree, it is a valid use case, and here is a patch that supports it.
Comment 6 Thomas Freitag 2014-08-19 11:23:12 UTC
Created attachment 104885
support output intents in pdfunite

Sorry, while reviewing it I found a missing free(); here is a correction.
Comment 7 Albert Astals Cid 2014-10-08 20:40:30 UTC
This indeed fixes the attached file. I have another file that still fails; how do you want to proceed? Commit this, open a new bug and continue there? Attach the file here and only commit at the end?
Comment 8 Thomas Freitag 2014-10-09 07:18:42 UTC
(In reply to Albert Astals Cid from comment #7)
> This indeed fixes the attached file. I have another file that still fails;
> how do you want to proceed? Commit this, open a new bug and continue there?
> Attach the file here and only commit at the end?

Attach the said PDF here, then I'll have a look at it first. If it has a different cause than the one I tried to fix, we can decide whether to commit in two steps; otherwise I would repair my patch first.
Comment 9 Albert Astals Cid 2014-10-09 09:24:34 UTC
Created attachment 107603
The second file

Here it comes: the buttons at the bottom render differently before and after separate+unite.
Comment 10 Thomas Freitag 2014-10-09 09:55:53 UTC
I had a quick look at it. It has no OutputIntents, so it is definitely another problem. I would guess it is the AcroForm, but I need more time to investigate.
So if you don't mind, you can commit the patch with the OutputIntents and I'll attach an additional patch here when I'm able to fix the new problem.
Comment 11 Albert Astals Cid 2014-10-09 20:28:52 UTC
OK, committed!
Comment 12 Thomas Freitag 2014-10-29 11:14:04 UTC
(In reply to Albert Astals Cid from comment #11)
> OK, committed!

Are you sure? Master doesn't contain my changes! I also can't find any commit message!
Comment 13 Thomas Freitag 2014-10-29 13:20:12 UTC
Created attachment 108625
support output intents, optional content and acroform in pdfunite

This handles not only output intents but also optional content and AcroForm in pdfunite.
Comment 14 Albert Astals Cid 2014-10-29 20:41:09 UTC
Obviously I did not commit it; I don't remember what happened :S

Well, I'll run the thing with your new patch and see how it goes :)
Comment 15 Albert Astals Cid 2014-11-03 10:39:49 UTC
This patch regresses https://bugsfiles.kde.org/attachment.cgi?id=40010

Without it, the separated+united file renders the same as the original one; with it, the two buttons on the top right are missing from the rendering of the united file.
Comment 16 Thomas Freitag 2014-11-03 12:22:47 UTC
Created attachment 108833
support output intents, optional content and acroform in pdfunite

Sorry, my first patch unintentionally removes all(!) page annotations in pdfunite.

This patch fixes that!
Comment 17 Albert Astals Cid 2014-11-03 18:33:50 UTC
OK, I'll delay testing this until the next release and get 0.28 out *now*

It's late late late already anyway :/
Comment 18 Albert Astals Cid 2014-12-23 14:56:37 UTC
Pushed the patch from comment 16. I'm trying to see whether I can find another file that has the issue or whether we should just close the bug.
Comment 19 Albert Astals Cid 2014-12-23 15:03:53 UTC
Ok, found another one, attaching
Comment 20 Albert Astals Cid 2014-12-23 15:04:58 UTC
Created attachment 111221
The third file
Comment 21 Thomas Freitag 2015-01-06 16:03:49 UTC
Created attachment 111857
Don't change OC-references in image and xobject dictionaries with pdfunite

Optional content properties are defined globally in the catalog. Therefore references to them in any image or xobject dictionary shouldn't be changed.

This patch fixes it.
Comment 22 Albert Astals Cid 2015-09-11 20:04:16 UTC
I get a regression on the rendering of page 134 of http://launchpadlibrarian.net/26748206/2001_these_mezouar.pdf when using "Don't change OC-references in image and xobject dictionaries with pdfunite" when doing pdfseparate + pdfunite

Can you confirm?
Comment 23 Thomas Freitag 2015-09-14 14:02:36 UTC
(In reply to Albert Astals Cid from comment #22)
> I get a regression on the rendering of page 134 of
> http://launchpadlibrarian.net/26748206/2001_these_mezouar.pdf when using
> "Don't change OC-references in image and xobject dictionaries with pdfunite"
> when doing pdfseparate + pdfunite
> 
> Can you confirm?

Yes, I can confirm :-D
The patch from comment 21 just looks for dict keys named "OC" and treats them as optional content. But the PDF from comment 22 uses a font named "OC" on page 134 ;-)
So when we find a key "OC" we have to check whether it really is optional content, which means that it points to an optional content group or an optional content membership dictionary.
Comment 24 Thomas Freitag 2015-09-14 14:05:42 UTC
Created attachment 118266
Don't change OC references in image an xobject dictionaries with pdfunite

This patch not only checks whether the OC key really references optional content, but also makes sure that the PDF uses optional content at all.
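
The two checks described here can be sketched as follows. This is not the code from the attached patch: it models PDF dictionaries as plain Python dicts, and the function name `is_optional_content_ref` is invented for illustration. The type names OCG and OCMD and the catalog key OCProperties come from the PDF specification.

```python
# /Type values of an optional content group and a membership dictionary.
OC_TYPES = {"OCG", "OCMD"}


def is_optional_content_ref(key, target, catalog):
    """Decide whether a dict entry is a genuine optional content reference.

    key is the dictionary key, target the (already resolved) object it
    points to, and catalog the document catalog, all as plain dicts.
    """
    if key != "OC":
        return False
    # The document must use optional content at all.
    if "OCProperties" not in catalog:
        return False
    # A font that merely happens to be named "OC" fails this test.
    return isinstance(target, dict) and target.get("Type") in OC_TYPES
```

Under this sketch, the font named "OC" on page 134 of the file from comment 22 would be treated as an ordinary reference and renumbered like any other object.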
Comment 25 GitLab Migration User 2018-08-21 11:20:24 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/628.