Summary: | pdfseparate + pdfunite produce different pdftoppm renderings | ||
---|---|---|---|
Product: | poppler | Reporter: | Albert Astals Cid <aacid> |
Component: | utils | Assignee: | poppler-bugs <poppler-bugs> |
Status: | RESOLVED MOVED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | Thomas.Freitag |
Version: | unspecified | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
Attachments: |
said file (attachment 103410)
support output intents in pdfunite (attachment 104884)
support output intents in pdfunite (attachment 104885)
The second file (attachment 107603)
support output intents, optional content and acroform in pdfunite (attachment 108625)
support output intents, optional content and acroform in pdfunite (attachment 108833)
The third file (attachment 111221)
Don't change OC-references in image and xobject dictionaries with pdfunite (attachment 111857)
Don't change OC references in image an xobject dictionaries with pdfunite (attachment 118266) |
Thomas you're the specialist in pdfseparate+pdfunite, can you have a look?

(In reply to comment #1)
> Thomas you're the specialist in pdfseparate+pdfunite, can you have a look?

I put it on my TODO list.

I had a first quick look into it, and I think I know why it happens: the original file has a DestOutputProfile, so an output intent. Since the patch for bug 34053 was committed on 2nd of October 2013, we respect that in pdftoppm during rendering. I would guess that with a version before that commit the output would NOT be different.

pdfseparate respects output intents; I remember I created a patch to do that. That is okay, because an output intent (defined in the catalog dict) is used for every page in a multi-page PDF, so the pdftoppm output between hola and orig is NOT different.

But what should pdfunite do if it merges pages with output intents? Examine if the output intents are always the same? Use the first or the last output intent for every page in the merged PDF?

Okay, in your test you use pdfunite with a single page, so in this very special case we could insert the output intent in the result PDF. But I would say this is not a bug but a feature of pdfunite.

To clarify what I mean, this is the interesting part of the original PDF:

    164 0 obj
    << /Type /Catalog
       /Pages 165 0 R
       /ViewerPreferences <</Direction /L2R >>
       /Metadata 162 0 R
       /AcroForm 132 0 R
       /OutputIntents [<<
          /Info (ISO Coated)
          /S /GTS_PDFX
          /OutputConditionIdentifier (FOGRA27)
          /OutputCondition (Offsetdruck entsprechend ISO/DIS 12647-2:2003, OFCOM, Positivplatte, Papiertyp 1 oder 2 \(gestrichenes Kunstdruckpapier, 115 g/m2\), Rasterweite 60/cm.)
          /DestOutputProfile 161 0 R
          /Type /OutputIntent
          /RegistryName (http://www.color.org)
       >> ]
    >>
    endobj

> Examine if the output intents are always the same?

I think this would be useful; this way I can pdfseparate all the pages of a PDF and then pdfunite some of them and still keep the same rendering.
I think this is a valid use case, no?
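The option discussed above, checking whether the output intents of all input files are the same before merging, can be sketched roughly as follows. This is only an illustration in Python, not poppler's code: the dict-based model of an output intent and the helper names `intent_key` and `pick_common_intent` are assumptions made for this example.

```python
from typing import Optional

# Hypothetical model: each input PDF is reduced to the list of output
# intent dictionaries found in its catalog (the list may be empty).
def intent_key(intent: dict) -> tuple:
    """Fields used in this sketch to decide whether two intents match."""
    return (
        intent.get("S"),
        intent.get("OutputConditionIdentifier"),
        intent.get("DestOutputProfile"),  # e.g. the raw ICC profile bytes
    )

def pick_common_intent(docs: list[list[dict]]) -> Optional[list[dict]]:
    """Return the shared output intents if every document agrees, else None.

    An empty list means all documents agree on having no output intent.
    """
    if not docs:
        return None
    reference = [intent_key(i) for i in docs[0]]
    for intents in docs[1:]:
        if [intent_key(i) for i in intents] != reference:
            return None  # mismatch: the caller has to decide what to do
    return docs[0]
```

With this model, pages separated from the same original all carry the same intent and can be united again without changing the rendering, while a mix of files with different (or missing) intents is reported as a mismatch.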
Created attachment 104884
support output intents in pdfunite

I agree, it is a valid use case, and here is the support for it.

Created attachment 104885
support output intents in pdfunite

Sorry, while reviewing it I found a missing free(); here is a correction.

This indeed fixes the attached file. I have another file that still fails; how do you want to proceed? Commit this, open a new bug and continue there? Attach the file here and only commit at the end?

(In reply to Albert Astals Cid from comment #7)
> This indeed fixes the attached file, i have another file that still fails,
> how do you want to proceed? Commit this, open a new bug and continue there?
> Attach the file here and only commit at the end?

Attach the said PDF here, then I'll have a look at it first. If it has a different cause from the one I tried to fix, we can decide whether we want to commit in 2 steps; otherwise I would repair my patch first.

Created attachment 107603
The second file

Here it comes; the buttons at the bottom get different renderings before and after separate+unite.
I had a quick look at it. It has no OutputIntents, therefore it is definitely another problem. I would guess it is the AcroForm, but I need more time to investigate. So if you don't mind, you can commit the patch with the OutputIntents, and I'll attach an additional patch here when I'm able to fix the new problem.

ok, committed!

(In reply to Albert Astals Cid from comment #11)
> ok, committed!

Are you sure? Master doesn't contain my changes! I also can't find any commit message!

Created attachment 108625
support output intents, optional content and acroform in pdfunite

This handles not only output intents but also optional content and acroform in pdfunite.

Obviously I did not commit it; I don't remember what happened :S
Well, I'll run the thing with your new patch and see how it goes :)

This patch regresses https://bugsfiles.kde.org/attachment.cgi?id=40010
Without it, the separated+united file renders the same as the original one; with it, the two buttons on the top right are missing from the render of the united file.

Created attachment 108833
support output intents, optional content and acroform in pdfunite

Sorry, my first patch unintentionally removes all(!) page annotations in pdfunite. This patch repairs that!

Ok, I'll delay testing this until the next release and get 0.28 out *now*.
It's late late late already anyway :/

Pushed the patch from comment 16.
I'm trying to see if I can find another file that has the issue, or whether we should just close the bug.

Ok, found another one, attaching.

Created attachment 111221
The third file
Created attachment 111857
Don't change OC-references in image and xobject dictionaries with pdfunite

Optional content properties are defined globally in the catalog. Therefore references to them in any image or xobject dictionary shouldn't be changed. This patch fixes it.

I get a regression in the rendering of page 134 of http://launchpadlibrarian.net/26748206/2001_these_mezouar.pdf when using "Don't change OC-references in image and xobject dictionaries with pdfunite" when doing pdfseparate + pdfunite.

Can you confirm?

(In reply to Albert Astals Cid from comment #22)
> I get a regression on the rendering of page 134 of
> http://launchpadlibrarian.net/26748206/2001_these_mezouar.pdf when using
> "Don't change OC-references in image and xobject dictionaries with pdfunite"
> when doing pdfseparate + pdfunite
>
> Can you confirm?

Yes, I can confirm :-D
The patch of comment 21 just looks for dict keys "OC" and treats them as optional content. But the PDF of comment 22 uses on page 134 a font named "OC" ;-) So in the case of a key "OC" we have to check whether it is really optional content, which means that it points to an optional content group or an optional content membership dictionary.

Created attachment 118266
Don't change OC references in image an xobject dictionaries with pdfunite

This patch not only checks whether the OC really references optional content, it also makes sure that the PDF uses optional content at all.

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/628.
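For reference, the check described in the discussion of the font named "OC" (treat an "OC" key as optional content only if it points to an optional content group or an optional content membership dictionary, and only if the PDF uses optional content at all) might look roughly like this. It is a sketch in Python, not the code of the attached patch: the object table, the `("ref", num)` tuples and the function name `is_optional_content_ref` are assumptions made for the example. The `/OCProperties` catalog key and the `/OCG` and `/OCMD` type names come from the PDF specification.

```python
def is_optional_content_ref(value, objects: dict, catalog: dict) -> bool:
    """Decide whether the value stored under an "OC" key is optional content.

    Hypothetical model: `objects` maps object numbers to dictionaries and an
    indirect reference is written as the tuple ("ref", object_number).
    """
    # Without /OCProperties in the catalog the PDF uses no optional content.
    if "OCProperties" not in catalog:
        return False
    # Only an indirect reference can point to an OCG or OCMD in this model.
    if not (isinstance(value, tuple) and len(value) == 2 and value[0] == "ref"):
        return False
    target = objects.get(value[1])
    if not isinstance(target, dict):
        return False
    # Optional content group or optional content membership dictionary.
    return target.get("Type") in ("OCG", "OCMD")
```

A font resource that merely happens to be named "OC" resolves to a dictionary with `/Type /Font`, so it is not mistaken for optional content.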
Created attachment 103410
said file

Get the attached file

    pdfseparate file.pdf hola
    pdfunite hola hola2.pdf
    pdftoppm -png file.pdf orig
    pdftoppm -png hola2.pdf united
    diff orig-1.png united-1.png

They are different :(

For most of the files I have around this works, but in this one (and some others, but let's go one by one) it doesn't.
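The steps above can also be scripted, for example to try several files in a row. The following is a minimal sketch in Python, assuming the poppler utilities are on the PATH and that the input has a single page, as in the report; the function names `repro_commands` and `renderings_differ` are made up for this example.

```python
import subprocess
from pathlib import Path

def repro_commands(src: str = "file.pdf") -> list[list[str]]:
    """The command lines from the report, in the order they are run."""
    return [
        ["pdfseparate", src, "hola"],
        ["pdfunite", "hola", "hola2.pdf"],
        ["pdftoppm", "-png", src, "orig"],
        ["pdftoppm", "-png", "hola2.pdf", "united"],
    ]

def renderings_differ(workdir: str = ".", src: str = "file.pdf") -> bool:
    """Run the round trip in workdir and compare the two page-1 renderings."""
    for argv in repro_commands(src):
        subprocess.run(argv, cwd=workdir, check=True)
    # Byte-wise comparison, which is what diff reports for binary files.
    orig = Path(workdir, "orig-1.png").read_bytes()
    united = Path(workdir, "united-1.png").read_bytes()
    return orig != united
```

Calling `renderings_differ()` in a directory that contains `file.pdf` returns `True` when the bug reproduces.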