Summary: | Add new PDFWriter class for writing PDF with printing options | ||
---|---|---|---|
Product: | poppler | Reporter: | Adrian Johnson <ajohnson> |
Component: | general | Assignee: | poppler-bugs <poppler-bugs> |
Status: | RESOLVED MOVED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | bugzilla, felix+freedesktop, mkasik, rishi.is, till.kamppeter |
Version: | unspecified | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Add PDFWriter
pdftopdf test program Add DeflateStream glib API for PDFWriter encrypted document fix compress decrypted streams fix typo Use consistent line ending in PDFDoc Add PDFWriter glib API for PDFWriter PDFWriter changes to address review comments Font embedding Use of multiple PDFs by single PDFWriter 1 - Add PDFWriter 2 - pdftopdf test program 3 - Add DeflateStream 4 - glib API for PDFWriter 5 - encrypted document fix 6 - compress decrypted streams 7 - fix typo 8 - Use consistent line ending in PDFDoc 9 - gitignore 10 multiple documents |
Created attachment 88592 [details] [review] pdftopdf test program This patch is a test program for the PDFWriter class. Created attachment 88593 [details] [review] Add DeflateStream This patch adds a DeflateStream output stream filter for compressing the XObject streams in PDFWriter. Created attachment 88594 [details] [review] glib API for PDFWriter glib api for writing PDFs with printing options Created attachment 88595 [details] [review] encrypted document fix I'm not sure if this patch is correct. It seems to fix the problem with PDFWriter and encrypted documents. Created attachment 88596 [details] [review] compress decrypted streams This uses DeflateStream to recompress streams that have been decrypted. Created attachment 88597 [details] [review] fix typo Fix the incorrect spelling of 'Dictionary' in PDFDoc. Created attachment 88598 [details] [review] Use consistent line ending in PDFDoc Use only '\n' in PDFDoc. Created attachment 88957 [details] [review] Add PDFWriter Fix a bug. Created attachment 88958 [details] [review] glib API for PDFWriter Fix some bugs in glib api. I've tested this api with evince. https://bugzilla.gnome.org/show_bug.cgi?id=711761 contains the patches. Comment on attachment 88957 [details] [review] Add PDFWriter Review of attachment 88957 [details] [review]: ----------------------------------------------------------------- This looks great to me. There are several memory leaks and some other things that could be improved a bit. ::: poppler/PDFWriter.cc @@ +62,5 @@ > + pages.push_back(page); > +} > + > +struct PageScale { > + int paper; This can be unsigned @@ +104,5 @@ > + std::swap(width, height); > + > + // find smallest paper size that will fit this page > + std::priority_queue<PageScale> queue; > + for (int i = 0; i < (int)paperSizes.size(); i++) { Why not using unsigned for i instead of casting? @@ +186,5 @@ > + mediaSize->y2 = paperSizes[0].width; > + margins->x1 = paperSizes[0].bottom; > + margins->x2 = mediaSize->y2 - paperSizes[0].top; > + margins->y1 = paperSizes[0].left; > + margins->y2 = mediaSize->x2 - paperSizes[0].right; This block is repeated a lot, I wonder if we could add a helper function, something like setSizeAndMargins(PDFRectangle *mediaSize, PDFRectangle *margins, double with, double height, double top, double bottom, double left, double right); I could be static inline @@ +376,5 @@ > +void PDFWriter::writePageObject(int pageNum, int copy, PDFRectangle *mediaSize) > +{ > + PDFRectangle margins; > + Matrix ctm; > + Object ctmObj; This looks unused. @@ +432,5 @@ > + obj2.free(); > + } > + } else { > + writeObject(&contentsObj); > + } contentsObj.free(); @@ +445,5 @@ > + outputKey = gFalse; > + break; > + } > + excludeKey++; > + } I think this could be easier to read if you use a helper function to check whether the entry should be excluded: if (excludePageEntry(keyName)) continue; @@ +453,5 @@ > + writeObject(pageDict->getValNF(i, &obj1)); > + outputStr->printf("\n"); > + obj1.free(); > + } > + } pageObj.free(); @@ +490,5 @@ > + outputStr->printf("/Subtype /Form\n"); > + > + pageDict->lookup("MediaBox", &obj); > + outputStr->printf("/BBox "); > + writeObject(&obj); obj.free(); @@ +494,5 @@ > + writeObject(&obj); > + outputStr->printf("\n"); > + > + pageDict->lookup("Resources", &obj); > + if (!obj.isNull()) { This could be a single line: if (!pageDict->lookup("Resources", &obj)->isNull()) ... @@ +498,5 @@ > + if (!obj.isNull()) { > + outputStr->printf("/Resources "); > + writeObject(&obj); > + outputStr->printf("\n"); > + } obj.free(); @@ +501,5 @@ > + outputStr->printf("\n"); > + } > + > + pageDict->lookup("Group", &obj); > + if (!obj.isNull()) { This could be a single line too. @@ +505,5 @@ > + if (!obj.isNull()) { > + outputStr->printf("/Group "); > + writeObject(&obj); > + outputStr->printf("\n"); > + } obj.free(); @@ +552,5 @@ > + Stream *str = obj.getStream(); > + writeStream(str); > + } else { > + error(errSyntaxError, -1, "Weird page contents"); > + } obj.free(); @@ +586,5 @@ > + outputStr->printf("endobj\n"); > + > + beginIndirectObject(&resourcesRef); > + outputStr->printf("<< /XObject <<\n"); > + for (int i = 0; i < (int)xobjectRefs.size(); i++) You could use unsigned for i here too. @@ +592,5 @@ > + outputStr->printf(">> >>\n"); > + outputStr->printf("endobj\n"); > + > + GooString content; > + for (int i = 0; i < (int)xobjectRefs.size(); i++) { Ditto. @@ +632,5 @@ > + outputStr->printf("/Kids ["); > + if (collate) { > + for (int cp = 0; cp < copies; cp++) { > + if (reverse) { > + for (int pg = (int)pageRefs[0].size() - 1; pg >= 0; pg--) Ditto. @@ +635,5 @@ > + if (reverse) { > + for (int pg = (int)pageRefs[0].size() - 1; pg >= 0; pg--) > + outputStr->printf(" %d %d R", pageRefs[cp][pg].num, pageRefs[cp][pg].gen); > + } else { > + for (int pg = 0; pg < (int)pageRefs[0].size(); pg++) Ditto. @@ +641,5 @@ > + } > + } > + } else { > + if (reverse) { > + for (int pg = (int)pageRefs[0].size() - 1; pg >= 0; pg--) { Ditto. @@ +646,5 @@ > + for (int cp = 0; cp < copies; cp++) > + outputStr->printf(" %d %d R", pageRefs[cp][pg].num, pageRefs[cp][pg].gen); > + } > + } else { > + for (int pg = 0; pg < (int)pageRefs[0].size(); pg++) Ditto. @@ +662,5 @@ > + PDFRectangle mediaSize; > + int outputPageNum; > + int nupPageNum; // 0..numberUp-1 > + int pageNum; > + Object pageObj; This is unused. @@ +677,5 @@ > + // are not marked as they will be included in the XObjects created for > + // each page. > + outputPageNum = 1; > + nupPageNum = 0; > + for (int i = 0; i < (int)pages.size(); i++) { Ditto. @@ +686,5 @@ > + Object pageObj; > + Ref *refPage = doc->getCatalog()->getPageRef(pageNum); > + doc->getXRef()->fetch(refPage->num, refPage->gen, &pageObj); > + Dict *pageDict = pageObj.getDict(); > + doc->markPageObjects(pageDict, yRef, countRef, 0, numberUp > 1 ? gFalse : gTrue); You don't need a variable for this, you can just pass the dict to the function: doc->markPageObjects(pageObj.getDict(), ....); and then pageObj.free(); @@ +711,5 @@ > + pageRefs.push_back(refs); > + } > + outputPageNum = 1; > + nupPageNum = 0; > + for (int i = 0; i < (int)pages.size(); i++) { unsigned @@ +756,5 @@ > + delete trailerDict; > + delete yRef; > + delete countRef; > + > + outputStr->close(); delete outputStr; @@ +757,5 @@ > + delete yRef; > + delete countRef; > + > + outputStr->close(); > +} I think we could make this function more generic by writing a stream, and then add writeFile, that creates a FileOutpoutStream and calls the generic one passing the stream. That way frontends can add a method to write to a stream instead of creating a file in disk. ::: poppler/PDFWriter.h @@ +27,5 @@ > + > + // order to layout multiple pages on a sheet. > + // LR_TB = Left to Right then Top to Bottom > + // LR_BT = Left to Right then Bottom to Top etc > + enum NumberUpOrder { LR_TB, LR_BT, RL_TB, RL_BT, TB_LR, TB_RL, BT_LR, BT_RL }; Even if it's a bit longer I think it's easier to read if you use full words instead of abbreviations, and you don't need the comment explaining what the abbreviations stand for. ::: poppler/Stream.cc @@ +371,5 @@ > +void OutStream::format(const char *format, ...) > +{ > + va_list argptr; > + va_start (argptr, format); > + GooString *s = GooString::formatv(format, argptr); Can we do something like this? GooString s; s.appendfv(format, argptr); This way we avoid the alloc/dealloc of the GooString. Created attachment 89020 [details] [review] PDFWriter changes to address review comments I've addressed most of the review comments and have attached a patch with the changes. I'll wait for the other patches to be reviewed before rolling this patch back into the original PDFWriter patch so I only have to rebase all the patches once. The comments that I didn't fix are: 1) >@@ +186,5 @@ >> + mediaSize->y2 = paperSizes[0].width; >> + margins->x1 = paperSizes[0].bottom; >> + margins->x2 = mediaSize->y2 - paperSizes[0].top; >> + margins->y1 = paperSizes[0].left; >> + margins->y2 = mediaSize->x2 - paperSizes[0].right; > >This block is repeated a lot, I wonder if we could add a helper function, > something like setSizeAndMargins(PDFRectangle *mediaSize, PDFRectangle > *margins, double with, double height, double top, double bottom, double left, > double right); I could be static inline The problem I had with this is we will have a function with six parameters. When looking at a call to this function it will be hard to keep track which parameter does what. In the example quoted above the parameter list will end up being quite long so I don't see the value in turning it into a function. 2) signed vs unsigned > > + for (int i = 0; i < (int)paperSizes.size(); i++) { > Why not using unsigned for i instead of casting? I prefer to always use signed integers unless I am doing modular arithmetic or bit manipulation. Mixing signed and unsigned can produce unexpected results. So it is better to just use signed all the time. The cast adds an extra 5 characters but then so does replacing 'int' with 'unsigned'. >> + for (int pg = (int)pageRefs[0].size() - 1; pg >= 0; pg--) { > Ditto. If this is changed to unsigned the loop never exits. (In reply to comment #7) > Created attachment 88598 [details] [review] [review] > Use consistent line ending in PDFDoc > > Use only '\n' in PDFDoc. I need an english native speaker here :D ************ The carriage return (CR) and line feed (LF) characters, also called newline characters, are treated as end-of-line (EOL) markers. The combination of a carriage return followed immediately by a line feed is treated as one EOL marker. For the most part, EOL markers are treated the same as any other white-space characters. However, sometimes an EOL marker is required or recommended— that is, the following token must appear at the beginning of a line. ************ Is it saying that only a CRLF is a EOL marker or that all CR, LF or CRLF are 1 EOL marker? (In reply to comment #2) > Created attachment 88593 [details] [review] [review] > Add DeflateStream > > This patch adds a DeflateStream output stream filter for compressing the > XObject streams in PDFWriter. You are missing the CMake part and also you need to fail safely if zlib is not avaliable, no? Does this allow to write a pdf totally uncompressed? It's really helpful when you want to debug some files and at the moment i have a nasty patch around that does it, but it'd be cool if pdftopdf does it :=) >> The carriage return (CR) and line feed (LF) characters, also called newline
>> characters, are treated as end-of-line (EOL) markers. The combination of a
>> carriage return followed immediately by a line feed is treated as one EOL marker.
b> Is it saying that only a CRLF is a EOL marker or that all CR, LF or
b> CRLF are 1 EOL marker?
The latter.
(In reply to comment #14) > Does this allow to write a pdf totally uncompressed? It's really helpful > when you want to debug some files and at the moment i have a nasty patch > around that does it, but it'd be cool if pdftopdf does it :=) It should be easy to add this option. I'll look at it after I get the embedding of non-embedded fonts working as this is required for printing. Created attachment 89404 [details] [review] Font embedding This patch adds support for embedding non-embedded fonts. Currently it only supports Type1 fonts. The remaining issues I'm still working on are: - embedding of other font types - disable DeflateStream when zlib not available - cmake support for DeflateStream - pdftopdf option to uncompress streams Created attachment 89762 [details] [review] Use of multiple PDFs by single PDFWriter Hi, regarding https://bugs.freedesktop.org/show_bug.cgi?id=69965#c5, I've added ability to combine multiple PDFs in one PDFWriter. It is done mainly by change of addition of pages. They are added as "Page *page" in "addPage(Page *page)" now instead of "int page". It is also possible to add whole documents by "addDoc(PDFDoc *doc)". I've added corresponding glib API. Regards Marek (In reply to comment #18) > for (int i = 0; i < (int)pages.size(); i++) { >- pageNum = pages[i]; >+ pageNum = pages[i]->getNum(); >+ doc = pages[i]->getDoc(); > if (pageSet == ALL || > (pageSet == ODD && outputPageNum % 2 == 1) || > (pageSet == EVEN && outputPageNum % 2 == 0)) { > Object pageObj; > Ref *refPage = doc->getCatalog()->getPageRef(pageNum); > doc->getXRef()->fetch(refPage->num, refPage->gen, &pageObj); >- markPageObjects(pageObj.getDict(), numberUp > 1 ? gFalse : gTrue); >+ markPageObjects(doc, pageObj.getDict(), numberUp > 1 ? gFalse : gTrue); > if (fontEmbedRequired) >- replaceNonEmbeddedFonts(pageObj.getDict(), &pageObj, *refPage); >+ replaceNonEmbeddedFonts(doc, pageObj.getDict(), &pageObj, *refPage); > pageObj.free(); > } I don't see how this works. The objects from each document will be mixed together and overwrite each other. All pages from each document need to be marked and written separately with an offset applied to the object numbering of each document to prevent the object numbers from clashing. I have already updated my patches to include multiple documents but am still reworking the font embedding patch. I wanted to get it all working before reposting but I'll go ahead and post what I already have working. Created attachment 89774 [details] [review] 1 - Add PDFWriter Created attachment 89775 [details] [review] 2 - pdftopdf test program Created attachment 89776 [details] [review] 3 - Add DeflateStream Created attachment 89777 [details] [review] 4 - glib API for PDFWriter Created attachment 89778 [details] [review] 5 - encrypted document fix Created attachment 89779 [details] [review] 6 - compress decrypted streams Created attachment 89780 [details] [review] 7 - fix typo Created attachment 89781 [details] [review] 8 - Use consistent line ending in PDFDoc Created attachment 89782 [details] [review] 9 - gitignore Created attachment 89784 [details] [review] 10 multiple documents The glib API in this patch needs more work. It looks like the glib API from Marek's patch could be used. I'm not sure whether the addPage should take PopplerPage or a page number. I avoided using PopplerPage as I thought this would be less efficient. I'm still reworking the font embedding patch so the current patch will not apply. I'm also thinking of changing the embed fonts boolean to a choice of none/all/only embed non standard14 fonts. (In reply to comment #19) > (In reply to comment #18) > > for (int i = 0; i < (int)pages.size(); i++) { > >- pageNum = pages[i]; > >+ pageNum = pages[i]->getNum(); > >+ doc = pages[i]->getDoc(); > > if (pageSet == ALL || > > (pageSet == ODD && outputPageNum % 2 == 1) || > > (pageSet == EVEN && outputPageNum % 2 == 0)) { > > Object pageObj; > > Ref *refPage = doc->getCatalog()->getPageRef(pageNum); > > doc->getXRef()->fetch(refPage->num, refPage->gen, &pageObj); > >- markPageObjects(pageObj.getDict(), numberUp > 1 ? gFalse : gTrue); > >+ markPageObjects(doc, pageObj.getDict(), numberUp > 1 ? gFalse : gTrue); > > if (fontEmbedRequired) > >- replaceNonEmbeddedFonts(pageObj.getDict(), &pageObj, *refPage); > >+ replaceNonEmbeddedFonts(doc, pageObj.getDict(), &pageObj, *refPage); > > pageObj.free(); > > } > > I don't see how this works. The objects from each document will be mixed > together and overwrite each other. All pages from each document need to be > marked and written separately with an offset applied to the object numbering > of each document to prevent the object numbers from clashing. I checked just order of pages, I forgot to look at individual streams. > I have already updated my patches to include multiple documents but am still > reworking the font embedding patch. I wanted to get it all working before > reposting but I'll go ahead and post what I already have working. Thank you. Patches still not finished, right? (In reply to comment #32) > Patches still not finished, right? Yes, still working on it. *** Bug 69965 has been marked as a duplicate of this bug. *** (In reply to Adrian Johnson from comment #33) > (In reply to comment #32) > > Patches still not finished, right? > > Yes, still working on it. Hey Adrian, are you still working on it? This could would be very useful to write a PDF modification app (similar to PDFMod or PDF-shuffler). (In reply to Bastien Nocera from comment #35) > Hey Adrian, are you still working on it? This could would be very useful to > write a PDF modification app (similar to PDFMod or PDF-shuffler). Yes I am intending to finish this. Just need to find the time to work on it. Adrian, any chance that you finish this? It would be great for using in cups-filters to flatten out filled PDF forms. See https://bugs.linuxfoundation.org/show_bug.cgi?id=1315 Yes, I still intend to finish this. Really still any plan to finish this, Adrian? There was no new comment for more than a year. I'll revisit these patches. Maybe it would be easier to incrementally merge this functionality instead of trying to get everything working all at once. Last I recall I was trying to get the embedding of non-embedded fonts working. But this feature could be added later. +1 merging in small steps is always a good idea. -- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/174. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.
Created attachment 88591 [details] Add PDFWriter In order to get evince to print in PDF format without restricting the output to only what cairo supports we need a way to rewrite the PDF with the print settings from the print dialog applied. This will avoid stripping color management from the PDF, avoid cairo bugs, and result in much faster printing. The first patch adds a new PDFWriter class for writing the PDF file with the selected printing options. These options include, page selection, number of copies, collation, number-up, scaling, shrink/fit page, paper size selection, orientation etc.