Hello, I am working with Poppler version 0.24.5 installed via Homebrew on Mavericks. I have a PDF that is 203 MB, and each page separated (there are hundreds) via pdfseparate comes out at 197MB. The data in the PDF is sensitive, so if someone could e-mail me at cory at plangrid dot com, I could send them a Dropbox link. Thanks! Cory Lorenz
Created attachment 97897 [details] [review] remove annotations from separated pages The problem is the usage of page annotations: Because page annotations have references to the page which have a reference to the parent and so on and so on all objects become marked and are written to the output file, and it takes a lot of time because we get endless loops and it takes a while after the endless loop detection fires. Pdfseparate is just a small tool to separate single pages, and I think we can live with the fact that annoations are removed. Since acroforms can reference in their fields also annotations, I removed the acroform entry, too. @Albert: if You want to test the patch yourself I think I can forward the mail from Cory.
I'm not so sure it is a good idea dropping annotations tbh, i mean annotations are part of the page, as a user i'd be surprised if i asked the tool to extract page 3 and then the annotations weren't there.
Hi Thomas, are you talking about pdfmarks style annotations (e.g. lines, curves, etc)? In that case, we would need to keep them.
Created attachment 97953 [details] [review] just remove page annotations which reference a page not separated I'd already feared that You give such comments. But I couldn't resist the temptation to make it easy for me :-) So here a solution which removes just those page annotations which reference not separated pages. It was much harder to implement, and even not easy to test it. But I can open the separated PDF with acrobat reader as well as with okular. But I can't see any differences when open them to those I created with my former patch, beside that acrobat reader gives me the form message if I opened the first separated page (but I can't see a form, but that's the same as if I open the original page). @ALbert: Can You also have a look at the results? I'll forward Cory's mail.
Looks good, and there's interesting size changes. Well done as always :) Now the interesting question, do we want this is stable as a bugfix or is it a feature and should land to master? I can see it both ways and i'm undecided.
(In reply to comment #5) > Looks good, and there's interesting size changes. > > Well done as always :) > > Now the interesting question, do we want this is stable as a bugfix or is it > a feature and should land to master? > > I can see it both ways and i'm undecided. How would You classify a solution for "100% CPU usage"? I think, this is comparable. But I would appreciate if You classify it as a bugfix. Since the changes only affects pdfseparate/pdfunite the risk is negligible and we get a earlier feedback if we need additional adjustments.
(In reply to comment #6) > (In reply to comment #5) > > Looks good, and there's interesting size changes. > > > > Well done as always :) > > > > Now the interesting question, do we want this is stable as a bugfix or is it > > a feature and should land to master? > > > > I can see it both ways and i'm undecided. > > How would You classify a solution for "100% CPU usage"? I think, this is > comparable. > > But I would appreciate if You classify it as a bugfix. Since the changes > only affects pdfseparate/pdfunite the risk is negligible and we get a > earlier feedback if we need additional adjustments. I honestly think it's not exactly the same, and even for a 100% cpu bug it may be more of a feature than a bugfix, but let's commit it to stable anyway :)
Thanks guys, that patch looks like it does the trick!
Hi, Sorry to reopen the issue, but a bunch of files came up displaying similar behavior. I e-mailed them to Thomas, and can send them to anyone else who is interested. Thanks!
Created attachment 101431 [details] [review] additonal handling for annotationas The problem are still page annotations, but in the new PDF these annotaions were not referenced in the page dictionnary itself but in one of the stream dictionnaries of content, so I now changed the markDictionnary to handle them everywhere correctly.
Created attachment 102369 [details] [review] additonal handling for annotationas Reviewed this patch once again and didn't like the usage of public class members anymore, Use them as parameter of different methods, even if this is some more work.
Thomas your patch only applies to master and not to poppler-0.26. Do you want me to apply it only to master or can you provide also a patch for poppler-0.26 ?
(In reply to comment #12) > Thomas your patch only applies to master and not to poppler-0.26. Do you > want me to apply it only to master or can you provide also a patch for > poppler-0.26 ? I don't know when I get time to have a look on this again, so for me it would be sufficient if You just commit it to master.
master it is then :) Pushed!
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.