Summary: | pack as many glyphs as possible in each cairo_show_glyphs() call | ||
---|---|---|---|
Product: | poppler | Reporter: | Pablo Rodríguez <freedesktop> |
Component: | cairo backend | Assignee: | Kristian Høgsberg <krh> |
Status: | RESOLVED FIXED | QA Contact: | cairo-bugs mailing list <cairo-bugs> |
Severity: | major | ||
Priority: | medium | CC: | carlosgc, freedesktop |
Version: | unspecified | ||
Hardware: | Other | ||
OS: | All | ||
URL: | http://ousia.iespana.es/pdf/tesis.utf8.pdf | ||
Whiteboard: | |||
Description
Pablo Rodríguez
2007-07-15 10:09:49 UTC
We don't actually use cairo for printing in poppler, but it would be a good idea. Not sure how the sizes are going to compare in that case, but it's worth a try. I'm moving this bug to poppler, but it's probably as much an evince bug/feature.

Thanks for the answer, Kristian.

Are you sure that evince-0.9.2 (with poppler-0.5.9 + cairo-backend) doesn't use cairo for printing to PDF files?

I printed a dissertation using evince and the output is 4.5 times bigger than the original file (this is the worst case I've seen):

$ ls -lh output.pdf TesisRosa-B5-Uni.pdf
-rw-r--r-- 1 ousia ousia 6,3M Jul 16 21:43 output.pdf
-rw-r--r-- 1 ousia ousia 1,4M Dec 1 2006 TesisRosa-B5-Uni.pdf

The PDF document shows that the creator and generator is cairo-1.4.10:

$ pdfinfo output.pdf
Creator: cairo 1.4.10 (http://cairographics.org)
Producer: cairo 1.4.10 (http://cairographics.org)
Tagged: no
Pages: 366
Encrypted: no
Page size: 595.276 x 841.89 pts (A4)
File size: 6581079 bytes
Optimized: no
PDF version: 1.4

And those CairoFonts should have been renamed by cairo itself.

$ pdffonts output.pdf
name                                 type              emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
CairoFont-8-0                        CID TrueType      yes no  yes   1105  0
NimbusRomanNo9L                      Type 1            yes no  yes   1110  0
NimbusSansL                          Type 1            yes no  yes   1115  0
NimbusRomanNo9L                      Type 1            yes no  yes   1120  0
CairoFont-0-0                        CID TrueType      yes no  yes   1126  0
CairoFont-1-0                        CID TrueType      yes no  yes   1132  0
NimbusSansL                          Type 1            yes no  yes   1137  0
CairoFont-4-0                        CID TrueType      yes no  yes   1143  0
NimbusSansL                          Type 1            yes no  yes   1148  0

Doesn't cairo generate the PDF file when printing from evince to a PDF file?

Thanks for your help,
Pablo

Oh, I guess I'm not up to date on things here. It's good that evince/poppler uses the cairo backend for printing, but the output is pretty big. Cairo used to output a lot of code for regular text, but that's been optimized since. I'm not sure what's wrong here, and I haven't worked with the cairo code in a while...

Carl, maybe someone else should be the owner of PDF backend bugs?

> I printed a dissertation using evince and the output is 4.5 times bigger than the
> original file (this is the worst case I've seen):
>
> $ ls -lh output.pdf TesisRosa-B5-Uni.pdf
> -rw-r--r-- 1 ousia ousia 6,3M Jul 16 21:43 output.pdf
> -rw-r--r-- 1 ousia ousia 1,4M Dec 1 2006 TesisRosa-B5-Uni.pdf
>
> The PDF document shows that the creator and generator is cairo-1.4.10:

Could you provide a link to the original PDF and the output from cairo?

> And those CairoFonts should have been renamed by cairo itself.
>
> $ pdffonts output.pdf
> name                                 type              emb sub uni object ID
> ------------------------------------ ----------------- --- --- --- ---------
> CairoFont-8-0                        CID TrueType      yes no  yes   1105  0
> NimbusRomanNo9L                      Type 1            yes no  yes   1110  0
> NimbusSansL                          Type 1            yes no  yes   1115  0
> NimbusRomanNo9L                      Type 1            yes no  yes   1120  0
> CairoFont-0-0                        CID TrueType      yes no  yes   1126  0
> CairoFont-1-0                        CID TrueType      yes no  yes   1132  0
> NimbusSansL                          Type 1            yes no  yes   1137  0
> CairoFont-4-0                        CID TrueType      yes no  yes   1143  0
> NimbusSansL                          Type 1            yes no  yes   1148  0
>

It looks like the embedded TrueType fonts in the original PDF had their "name" tables stripped out during subsetting, so the font names exist only in the PDF font dictionaries. When cairo embeds such a font in the new PDF, there is no font name available in the font itself, so the CairoFont-x-y name is used instead.

(In reply to comment #4)
> > I printed a dissertation using evince and the output is 4.5 times bigger than the
> > original file (this is the worst case I've seen):
> >
> > $ ls -lh output.pdf TesisRosa-B5-Uni.pdf
> > -rw-r--r-- 1 ousia ousia 6,3M Jul 16 21:43 output.pdf
> > -rw-r--r-- 1 ousia ousia 1,4M Dec 1 2006 TesisRosa-B5-Uni.pdf
> >
> > The PDF document shows that the creator and generator is cairo-1.4.10:
>
> Could you provide a link to the original PDF and the output from cairo?

You can find the original PDF at http://ousia.en.eresmas.com/TesisRosa-B5-Uni.pdf and the output from cairo at http://ousia.en.eresmas.com/output.pdf.

Please note that those files are released for testing purposes only. As soon as the files have been checked, I would appreciate a note so that I can erase them from the website.

Thanks for your work and your help,
Pablo

(In reply to comment #5)
> and the output from cairo at
> http://ousia.en.eresmas.com/output.pdf.

I get a page not found on this file.

(In reply to comment #6)
> (In reply to comment #5)
> > and the output from cairo at
> > http://ousia.en.eresmas.com/output.pdf.
>
> I get a page not found on this file.

Sorry, you can find it at http://ousia.iespana.es/pdf/output.pdf. At least it works for me now.

Pablo

What's happening here is that evince is calling cairo_show_glyphs() with one glyph at a time. Each time cairo_show_glyphs() is called, the PDF output selects the pattern, selects the font, and initializes the text matrix. The result is about 110 bytes of overhead per glyph before compression.

Sorry, but I'm not a developer and I don't understand the issue. Is this a bug in cairo or in the way evince invokes cairo?

Thanks for your help,
Pablo

Adrian, reassign to Poppler and let Jeff deal with it then?

(In reply to comment #10)
> Adrian, reassign to Poppler and let Jeff deal with it then?
>

Reassigning to poppler. Poppler should pack as many glyphs as possible into each show_glyphs() call to get efficient PS/PDF output from cairo.

Just in case it helps: using poppler-0.6.3, cairo-1.5.4 and evince 2.21.1, the resulting file is even bigger:

ls -lh otput.pdf TesisRosa-B5-Uni.pdf
-rw-r--r-- 1 ousia guest 6,6M 2007-12-21 17:11 otput.pdf
-rw-r--r-- 1 ousia guest 1,4M 2007-12-21 16:54 TesisRosa-B5-Uni.pdf

Pablo

Updating summary for a more accurate description.

This seems to be required for more efficient PS/PDF generation from cairo.

(In reply to comment #13)
> Updating summary for a more accurate description.
>
> This seems to be required for more efficient PS/PDF generation from cairo.

Both that, and it increases viewing performance too.

One of the problems with calling cairo_show_glyphs() with one glyph at a time is that when text knockout is true (the default) overlapping transparent glyphs in each text object will composite with each other. Poppler needs to call cairo_show_glyphs() with all glyphs in the text object to ensure that the glyphs do not composite with each other. If TK is false poppler needs to call cairo_show_glyphs() with one glyph at a time.

(In reply to comment #15)
> One of the problems with calling cairo_show_glyphs() with one glyph at a time
> is that when text knockout is true (the default) overlapping transparent glyphs
> in each text object will composite with each other. Poppler needs to call
> cairo_show_glyphs() with all glyphs in the text object to ensure that the
> glyphs do not composite with each other. If TK is false poppler needs to call
> cairo_show_glyphs() with one glyph at a time.
>

I should also add that PDF can change the font, font scale, and maybe other graphics state (the PDF reference is not clear on this) inside a text object. As cairo only supports cairo_show_glyphs() with the same font, font scale, and pattern, poppler should, when TK=true, draw all text in a group with CAIRO_OPERATOR_SOURCE and then paint the group onto the page. This would be slower and would result in image fallbacks when printing, so poppler should only do this if it is not possible to draw all the text in the text object with a single cairo_show_glyphs() call.

I've committed some changes to cairo that pack glyphs from multiple calls to show_glyphs into one string. Using your test case, the changes to the PDF output size are as follows:

Before:
1.4M TesisRosa-B5-Uni.pdf
6.3M output.pdf

After:
1.4M TesisRosa-B5-Uni.pdf
1.8M output.pdf

Many thanks for the improvement, Adrian.

Just out of curiosity (and I'm no PDF expert, so I don't know whether the following question is nonsense), wouldn't it be possible for poppler/cairo to generate a smaller PDF document than the original one?

Thanks again for your excellent work,
Pablo

(In reply to comment #18)
> Many thanks for the improvement, Adrian.
>
> Just out of curiosity (and I'm no PDF expert, so I don't know whether the
> following question is nonsense), wouldn't it be possible for poppler/cairo
> to generate a smaller PDF document than the original one?

Only if the original PDF creator was very inefficient in the way it generated the PDF and the particular inefficiencies are things that cairo can optimize, such as the string merging that I recently committed. Generally you are going to see an increase in size when doing PDF->Poppler->Cairo->PDF. Of course we would like to keep the increase as small as possible, and there are a few more optimizations that can be done to further reduce the size without losing any information. For example, keeping JPEG images in JPEG format is one such optimization planned for cairo.

But converting a PDF to PDF is generally not an interesting operation. You already have the file in PDF format. If you are looking to further reduce the size of your PDF, there are specialist tools for processing PDF files that can do this.

Is this still valid? Poppler uses a glyph array for every string, and cairo produces smaller PDF output files now. I think we can just close this.

(In reply to comment #5)
> You can find the original PDF at
> http://ousia.en.eresmas.com/TesisRosa-B5-Uni.pdf and the output from cairo at
> http://ousia.en.eresmas.com/output.pdf.
>
> Please note that those files are released for testing purposes only. As soon
> as the files have been checked, I would appreciate a note so that I can erase
> them from the website.

Thanks very much for the bug report, Pablo. You can certainly remove those files from the website now.

(In reply to comment #20)
> Is this still valid? Poppler uses a glyph array for every string, and cairo
> produces smaller PDF output files now. I think we can just close this.

It sure looks ready to close to me, so I'll go ahead and do that.

-Carl

I know at some point I noticed this bug, and at the time the poppler cairo code indeed seemed to do the right thing, but its higher level called it with one glyph at a time. The recent shrinkage in the PDF may just be caused by cairo now merging multiple show_glyphs() calls. So, unless someone can actually point to a commit that has fixed this, I think the issue should be investigated further.
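
As a rough illustration of the packing discussed in this thread, the Python sketch below groups consecutive glyphs that share font, scale, and colour into runs, so that each run could be handed to a single cairo_show_glyphs() call. It is not poppler's or cairo's code: the Glyph type, pack_runs(), call_overhead(), and the sample data are all hypothetical, and the only figure taken from the thread is the roughly 110 bytes of uncompressed overhead per call.

```python
from dataclasses import dataclass
from itertools import groupby

# Figure quoted in the thread: approximate uncompressed bytes that cairo's
# PDF output spends per cairo_show_glyphs() call (pattern, font, text matrix).
OVERHEAD_PER_CALL = 110


@dataclass(frozen=True)
class Glyph:
    """Hypothetical stand-in for one glyph plus the state it is drawn with."""
    index: int    # glyph index within the font
    font: str     # hypothetical font identifier
    size: float   # font scale
    color: tuple  # stands in for the cairo source pattern


def state_key(glyph):
    """State that must be identical for glyphs to share one call."""
    return (glyph.font, glyph.size, glyph.color)


def pack_runs(glyphs):
    """Group consecutive glyphs with the same font, size, and colour."""
    return [list(run) for _, run in groupby(glyphs, key=state_key)]


def call_overhead(n_calls):
    """Estimated uncompressed per-call overhead in bytes (assumed model)."""
    return n_calls * OVERHEAD_PER_CALL


# Sample text object (made up): 40 glyphs in one font, then 10 in another.
black = (0, 0, 0)
text = [Glyph(i, "F0", 10.0, black) for i in range(40)]
text += [Glyph(i, "F1", 10.0, black) for i in range(10)]

runs = pack_runs(text)
per_glyph_bytes = call_overhead(len(text))  # one call per glyph
packed_bytes = call_overhead(len(runs))     # one call per run

print(f"{len(text)} calls -> {per_glyph_bytes} bytes of overhead")
print(f"{len(runs)} calls -> {packed_bytes} bytes of overhead")
```

The sketch only models the per-call cost. It ignores the glyph data itself and compression, so it says nothing about final file sizes such as the 6.3M and 1.8M figures reported above.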