Created attachment 77390
Sample document containing Image Mask causing poppler to get stuck in an infinite loop

We are working on an internal tool that uses poppler for PDF processing and have encountered a handful of documents that cause the poppler core to enter an infinite loop. I've looked at a couple of them and it looks to be something related to the parsing of image masks.

This is happening both under Linux and OS X, linked against poppler 0.22.2. I've confirmed the bug is in poppler and not our application as it is also seen with pdftohtml.

Enabling PrintCommands produces output that quickly shows the problem:

…
re 661.08 456.362 609.48 -104.88
f
cs /Cs6
scn 1 1 1
gs /GS1
gfx state dict: << /SA false /SM 0.02 /Type /ExtGState >>
re 0 1 1 -1
f
scn 0.8 0.8 0.8
q
cm 1 0 0 -1 0 1
Do /Im1
Q
cs /Cs6
scn 1 1 1
gs /GS1
gfx state dict: << /SA false /SM 0.02 /Type /ExtGState >>
re 0 1 1 -1
f
scn 0.8 0.8 0.8
q
cm 1 0 0 -1 0 1
Do /Im1
Q
cs /Cs6
scn 1 1 1
gs /GS1
gfx state dict: << /SA false /SM 0.02 /Type /ExtGState >>
…

If I had to guess, an offset is not getting applied, resulting in the same object getting returned. I realize there is a repeated graphic on the page but by the time I killed pdftohtml (< 30s from starting it), there were around 140k instances of the PNG written to disk and I'm pretty sure that can't be right :)

I've extracted a single page of one that shows the issue and have attached it. Please note that running it on the file will quickly create thousands of small 8x8 PNGs about 100 bytes in size.

There are two similar issues reported but they date back to 2010 and are marked resolved so I'm not confident it is the same problem:

https://bugs.freedesktop.org/show_bug.cgi?id=28784
https://bugs.freedesktop.org/show_bug.cgi?id=28172

In the meantime, I'm trying to trace through the code to get an understanding, but I'm very unfamiliar with the Parser/Lexer portion of the poppler core.
Hope you can help and let me know if there's any other way I can assist. Thank you.
Are you sure it's a core problem? I can see pdftohtml looping but all the other tools pdftops, pdftoppm, etc. work fine
(In reply to comment #1)
> Are you sure it's a core problem? I can see pdftohtml looping but all the
> other tools pdftops, pdftoppm, etc. work fine

You're right: all the other tools use output devices that implement tilingPatternFill, but HtmlOutputDev.cc / *.h doesn't implement it. And because the image mask is part of a pattern colorspace, it is output 2541 x 438 = 1112958 times, and that twice. So pdftohtml is not in an infinite loop; it is just writing the same tiny image more than two million times.

I'm not familiar enough with HTML, CSS, or pdftohtml to decide whether it is possible to render the image only once and then run a JavaScript loop or define a CSS element around it, like we do in PSOutputDev, or to render it as one image, like we do in SplashOutputDev. The only solution I see is that someone with more know-how implements tilingPatternFill (and makes useTilingPatternFill() return true) in HtmlOutputDev!
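To illustrate the difference described above, here is a minimal, self-contained sketch of the two code paths. It is a simplified model, not poppler's actual code: `SketchOutputDev`, `TilingAwareDev`, `fillWithTilingPattern`, and the two-argument `tilingPatternFill` signature are all hypothetical stand-ins, and only the names `useTilingPatternFill` and `tilingPatternFill` come from the comment above. The assumption is that a device which does not claim the pattern fill gets the tile drawn once per grid cell.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for an output device. This is NOT poppler's real
// OutputDev interface; the signatures are assumptions for illustration.
struct SketchOutputDev {
    virtual ~SketchOutputDev() = default;

    // True if the device wants to fill tiling patterns itself.
    virtual bool useTilingPatternFill() const { return false; }

    // Hypothetical simplified signature: returns true if the device
    // handled the whole fill on its own.
    virtual bool tilingPatternFill(int /*xSteps*/, int /*ySteps*/) { return false; }

    // Stands in for writing one image (e.g. one small PNG) to disk.
    virtual void drawImageMask() { ++imagesDrawn; }

    std::int64_t imagesDrawn = 0;
};

// A device that handles the pattern itself: the tile is emitted once and
// the repetition is left to the output format.
struct TilingAwareDev : SketchOutputDev {
    bool useTilingPatternFill() const override { return true; }

    bool tilingPatternFill(int xSteps, int ySteps) override {
        assert(xSteps >= 0 && ySteps >= 0);
        drawImageMask();  // emit the tile a single time
        return true;
    }
};

// Hypothetical caller: ask the device first, otherwise fall back to
// drawing the tile once per cell of the xSteps-by-ySteps grid.
void fillWithTilingPattern(SketchOutputDev &dev, int xSteps, int ySteps) {
    if (dev.useTilingPatternFill() && dev.tilingPatternFill(xSteps, ySteps)) {
        return;
    }
    for (int y = 0; y < ySteps; ++y) {
        for (int x = 0; x < xSteps; ++x) {
            dev.drawImageMask();
        }
    }
}
```

With the grid size quoted above (2541 x 438), the fallback path in this model calls `drawImageMask()` 1112958 times, while the tiling-aware device calls it once.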
(In reply to comment #1)
> Are you sure it's a core problem? I can see pdftohtml looping but all the
> other tools pdftops, pdftoppm, etc. work fine

Ugh. Figures I would only try it with a device that would also happen to not work. Just my luck.

Apologies about filing this bug. Next time I'll be sure to try it out with more than one output device. At least now you know there's a problem with the HtmlOutputDev. :/

Thank you.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/283.