Bug 56858

Summary: Huge simple PDF displayed blank in poppler-glib-demo
Product: poppler
Reporter: Maxim Iorsh <iorsh>
Component: cairo backend
Assignee: poppler-bugs <poppler-bugs>
Status: RESOLVED MOVED
QA Contact:
Severity: normal
Priority: medium
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments: pdftocairo: limit image size
Downscale source image before creating a cairo image
Downscale source image before creating a cairo image
Downscale source image before creating a cairo image
Downscale source image before creating a cairo image
Downscale source image before creating a cairo image
Downscale source image before creating a cairo image

Description Maxim Iorsh 2012-11-08 07:38:34 UTC
A very large PDF produced by a Xerox wide scanner is displayed blank.
File location: https://docs.google.com/file/d/14hFFrjSSbiEcfML1sgOttcT865GC7tHqv27wiuEfQ-KlvuRfU67Dkj9E9JaM/edit

Basically it contains nothing but a 21590 x 161385 b/w bitmap

Note that Acrobat Reader fails to display it too. Okular displays it properly, but slowly. PDF-XChange Viewer for Windows displays it properly and very fast.
Comment 1 Adrian Johnson 2012-11-11 08:50:47 UTC
Cairo bug. Works fine with pdftoppm.
Comment 2 Adrian Johnson 2012-11-11 08:55:11 UTC
Created attachment 69892
pdftocairo: limit image size

Cairo images have a maximum width and height of 32767 pixels. When using pdftocairo to convert Shimshon40.pdf to png the default output size is 5398x40346 pixels which causes cairo to fail.

This patch checks if the limit is exceeded and prints a more informative message than the default cairo error message.
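As a rough illustration of the check described in this comment, the sketch below compares a requested output size against the 32767-pixel limit quoted above. It is written in Python for brevity and is not the code from the attachment; the function name `check_cairo_image_size` and the message wording are hypothetical.

```python
# Maximum width/height of a cairo image, as stated in this comment.
CAIRO_MAX_IMAGE_DIM = 32767


def check_cairo_image_size(width, height, limit=CAIRO_MAX_IMAGE_DIM):
    """Return an error message if either dimension exceeds the limit, else None.

    Hypothetical helper for illustration only; not part of poppler or cairo.
    """
    if width > limit or height > limit:
        return (
            f"Image size {width}x{height} exceeds the cairo limit of "
            f"{limit} pixels per side; try a lower resolution."
        )
    return None


if __name__ == "__main__":
    # Default output size reported for Shimshon40.pdf in this comment.
    print(check_cairo_image_size(5398, 40346))
```

The height of 40346 is what trips the check here; the width alone would be fine.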
Comment 3 Adrian Johnson 2012-11-11 09:00:10 UTC
Created attachment 69893
Downscale source image before creating a cairo image

Running pdftocairo at a reduced resolution, so that the output image is not too large, still fails to produce the correct output: the source image in the PDF has a height of 161385 pixels, which is too large for cairo.

This patch fixes this problem by scaling down source images before creating a cairo image surface.

Running pdftocairo -png -r 72 Shimshon40.pdf now produces the correct output.
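To make the size arithmetic concrete, here is a minimal Python sketch of one possible downscaling policy: shrink the source dimensions uniformly until the longest side fits the cairo limit. This is an assumption for illustration; the attached patch may choose the target size differently (for example, based on the destination size), and `fit_within_limit` is a hypothetical name.

```python
# Cairo image limit quoted in comment 2.
CAIRO_MAX_IMAGE_DIM = 32767


def fit_within_limit(width, height, limit=CAIRO_MAX_IMAGE_DIM):
    """Return (width, height) scaled down uniformly so neither side exceeds limit.

    Images that already fit are returned unchanged. Integer arithmetic is
    used so the result never overshoots the limit through rounding.
    """
    if width <= limit and height <= limit:
        return width, height
    longest = max(width, height)
    new_width = max(1, width * limit // longest)
    new_height = max(1, height * limit // longest)
    return new_width, new_height


if __name__ == "__main__":
    # Source bitmap size from the bug description.
    print(fit_within_limit(21590, 161385))
```

With the 21590 x 161385 bitmap from the description, the height is clamped to the limit and the width shrinks by the same factor.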
Comment 4 Maxim Iorsh 2012-11-11 21:00:38 UTC
Works now, thank you!

Performance note: rendering takes 69 seconds, after which the entire file is scrolled in no time. On the other hand, PDF-XChange Viewer for Windows displays much faster (3-4 seconds), but has some delays on scrolling, probably due to some render-by-demand mechanism.

Is this performance issue within the scope of poppler, or should API users (such as Okular) implement render-by-demand themselves using the available API?
Comment 5 Adrian Johnson 2012-11-12 10:37:15 UTC
(In reply to comment #4)
> Is this performance issue within the scope of poppler, or should API users
> (such as Okular) implement render-by-demand themselves using the
> available API?

It is possible for users of the glib API such as Evince to render smaller areas. It is just a case of setting the clip area before rendering. But this won't automatically make it render faster.

I did some testing with pdftocairo:

$ pdftocairo -png -r 72 Shimshon40.pdf out
rendering the full page (2591x19366 pixels) took 215 seconds

$ pdftocairo -png -r 72 -x 0 -y 0 -sz 2591 Shimshon40.pdf crop
rendering a 2591x2591 region (13% of the full page) took 144 seconds

Without having done any profiling, I'm guessing the main reason the cropped region took so long is that the entire source image is scaled down, instead of finding the region of the source image that corresponds to the cropped destination and scaling only that down.
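The optimization guessed at here can be sketched in Python as a mapping from the cropped destination rectangle back to source-image pixels. This is only an illustration under simplifying assumptions: the image is taken to cover the whole page with no rotation or offset, and `source_region_for_crop` is a hypothetical helper, not a poppler function.

```python
def ceil_div(a, b):
    """Integer division rounding up (for non-negative a and positive b)."""
    return -(-a // b)


def source_region_for_crop(crop, dest_size, src_size):
    """Map a destination crop to the source-image region that covers it.

    crop      -- (x, y, width, height) in destination pixels
    dest_size -- (width, height) of the full rendered page in pixels
    src_size  -- (width, height) of the source image in pixels
    Returns (x0, y0, x1, y1) in source pixels, rounded outwards and clamped.
    """
    x, y, w, h = crop
    dest_w, dest_h = dest_size
    src_w, src_h = src_size
    x0 = max(0, x * src_w // dest_w)
    y0 = max(0, y * src_h // dest_h)
    x1 = min(src_w, ceil_div((x + w) * src_w, dest_w))
    y1 = min(src_h, ceil_div((y + h) * src_h, dest_h))
    return x0, y0, x1, y1


if __name__ == "__main__":
    # Crop and page size from the tests above; source size from the description.
    print(source_region_for_crop((0, 0, 2591, 2591), (2591, 19366), (21590, 161385)))
```

For the 2591x2591 crop above, only about 13% of the source rows would need to be scaled, which matches the fraction of the page being rendered.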

There are no doubt plenty of opportunities for optimization, but given that PDFs this size are uncommon it is not a high priority for me. If you have more huge PDFs, I'm happy to include them in any testing and profiling I do to improve the performance of the cairo backend.
Comment 6 Carlos Garcia Campos 2012-11-12 18:52:00 UTC
Most of the time spent rendering a page goes into parsing the PDF streams. Tile rendering is currently possible from the API point of view, but in practice rendering one tile takes almost as long as rendering the whole page, so dividing the page into N tiles and rendering them separately ends up much slower than rendering the whole page. This is due to the poppler rendering model, which is based on parse + render. We would need a model that parses the page into intermediate objects once and then renders multiple times from those objects, similar to the render tree used by web engines.
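The difference between the two models can be shown with a toy Python sketch. Everything in it is hypothetical (the `Page` class, the tuple-based "operations", the tile format) and it does not reflect poppler's actual classes; it only counts how often the expensive parse step runs in each model.

```python
class Page:
    """Toy page whose 'parse' step stands in for expensive PDF stream parsing."""

    def __init__(self, stream):
        self.stream = stream
        self.parse_count = 0

    def parse(self):
        # Each operation is (x, y, label); a real parser would do far more work.
        self.parse_count += 1
        return [tuple(op) for op in self.stream]


def render_tile(ops, tile):
    """Return the operations whose origin falls inside tile (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = tile
    return [op for op in ops if x0 <= op[0] < x1 and y0 <= op[1] < y1]


def render_tiles_parse_each_time(page, tiles):
    """Parse + render model: every tile re-parses the page."""
    return [render_tile(page.parse(), tile) for tile in tiles]


def render_tiles_with_display_list(page, tiles):
    """Intermediate-object model: parse once, render each tile from the result."""
    display_list = page.parse()
    return [render_tile(display_list, tile) for tile in tiles]
```

Both functions return the same tiles, but the first one parses the page once per tile while the second parses it once in total.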
Comment 7 Carlos Garcia Campos 2012-11-12 18:53:01 UTC
Btw, I haven't looked at the patch in detail yet, but I guess the performance issues you mention are not introduced by the patch, right?
Comment 8 Maxim Iorsh 2012-11-12 19:18:04 UTC
Before the patch the failure was immediate, so it's hard to compare the two states in terms of time.

Another issue: rendering at 2x scale still fails with error

 Internal Error: cairo context error: invalid value (typically too big) for the size of the input (surface, pattern, etc.)

I presume the latter issue would be solved by tiling too (at least for screen, not for png cairo-based output). But of course, changing the entire engine is not a bug-fixing activity.
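As an illustration of the tiling idea, the Python sketch below splits an output area into rectangles whose sides each stay within the 32767-pixel cairo limit from comment 2. The 5182x38732 size in the usage example is an assumption (double the 72 dpi size of 2591x19366 reported in comment 5), and `tile_rects` is a hypothetical helper rather than anything in poppler.

```python
def tile_rects(width, height, max_dim=32767):
    """Split a width x height area into (x, y, w, h) tiles no larger than max_dim.

    Tiles are listed row by row and together cover the area exactly once.
    """
    tiles = []
    for y in range(0, height, max_dim):
        for x in range(0, width, max_dim):
            tile_w = min(max_dim, width - x)
            tile_h = min(max_dim, height - y)
            tiles.append((x, y, tile_w, tile_h))
    return tiles


if __name__ == "__main__":
    # Assumed 2x output size; see the note above.
    for tile in tile_rects(5182, 38732):
        print(tile)
```

Each tile could then be rendered to its own surface, which keeps every surface within the limit at the cost of rendering the page more than once.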

Nevertheless, this patch already makes a huge improvement for me as a user.
Comment 9 Adrian Johnson 2012-11-15 11:19:43 UTC
Created attachment 70107
Downscale source image before creating a cairo image

Updated patch to remove commented out code.
Comment 10 Adrian Johnson 2012-11-15 12:11:55 UTC
Created attachment 70109
Downscale source image before creating a cairo image

Updated to move the printing test into getSourceImage and eliminate the downscale boolean.
Comment 11 Adrian Johnson 2012-11-17 10:58:52 UTC
Created attachment 70185
Downscale source image before creating a cairo image

Updated to fix failure in regression tests.
Comment 12 Adrian Johnson 2012-11-17 12:04:47 UTC
Created attachment 70186
Downscale source image before creating a cairo image

Fix filtering.
Comment 13 Adrian Johnson 2012-11-17 12:08:25 UTC
Created attachment 70187
Downscale source image before creating a cairo image

Oops. Correct fix for filtering.
Comment 14 Carlos Garcia Campos 2012-11-17 14:07:27 UTC
(In reply to comment #13)
> Created attachment 70187
> Downscale source image before creating a cairo image
> 
> Oops. Correct fix for filtering.

Total 292 tests
292 tests passed (100.00%)

Feel free to push it. Thanks!
Comment 15 Adrian Johnson 2012-11-18 12:15:52 UTC
(In reply to comment #14)
> (In reply to comment #13)
> > Created attachment 70187
> > Downscale source image before creating a cairo image
> > 
> > Oops. Correct fix for filtering.
> 
> Total 292 tests
> 292 tests passed (100.00%)
> 
> Feel free to push it. Thanks!

Pushed. I'll leave the bug open while I work on tiling for pdftocairo.
Comment 16 Munish Kumar 2018-04-16 05:11:39 UTC
(In reply to Adrian Johnson from comment #15)
> (In reply to comment #14)
> > (In reply to comment #13)
> > > Created attachment 70187
> > > Downscale source image before creating a cairo image
> > > 
> > > Oops. Correct fix for filtering.
> > 
> > Total 292 tests
> > 292 tests passed (100.00%)
> > 
> > Feel free to push it. Thanks!
> 
> Pushed. I'll leave the bug open while I work on tiling for pdftocairo.

I am still facing the same problem. My pdftocairo version is 0.56.0. I am using poppler that uses cairo internally.
Comment 17 GitLab Migration User 2018-08-21 11:05:27 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/511.
