Bug 16999 - pdfimages extracts bitmaps with color inversed, reloaded!
Summary: pdfimages extracts bitmaps with color inversed, reloaded!
Status: RESOLVED FIXED
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: poppler-bugs
URL: http://www.cs.umd.edu/~gaburici/a8bug...
Reported: 2008-08-05 12:17 UTC by Vasile Gaburici
Modified: 2008-08-30 03:48 UTC

Attachments
Correct handling of bitmap images (953 bytes, patch)
2008-08-05 14:56 UTC, Vasile Gaburici

Description Vasile Gaburici 2008-08-05 12:17:05 UTC
I see this has been "fixed" in bug 12121, but the fix doesn't work for me for the pdf linked in the URL field. More precisely, pdfimages from xpdf 3.02 works correctly, but the one from poppler 0.8.5 does not. The troublesome spot is in ImageOutputDev; in the diff below, the .300 file is the one from poppler (the .302 file is from xpdf 3.02):

diff -up poppler-0.8.5/poppler-0.8.5/utils/ImageOutputDev.cc.300 poppler-0.8.5/poppler-0.8.5/utils/ImageOutputDev.cc.302
--- poppler-0.8.5/poppler-0.8.5/utils/ImageOutputDev.cc.300     2008-03-26 21:38:52.000000000 +0200
+++ poppler-0.8.5/poppler-0.8.5/utils/ImageOutputDev.cc.302     2007-02-28 00:05:52.000000000 +0200
@@ -152,5 +150,5 @@ void ImageOutputDev::drawImage(GfxState
     // copy the stream
     size = height * ((width + 7) / 8);
     for (i = 0; i < size; ++i) {
-      fputc(str->getChar(), f);
+      fputc(str->getChar() ^ 0xff, f);
     }

BTW, there's also a difference in how jpegs are handled in ImageOutputDev between xpdf 3.02 and poppler 0.8.5, but I don't know whether it can cause any problems.
Comment 1 Vasile Gaburici 2008-08-05 12:47:12 UTC
BTW, the pdf from bug 12121 has a "/Decode [1 0]" for its image! So, that pdf stores an inverted image that the pdf viewer is instructed to invert again. My pdf does not. So the fix for 12121 was not done properly, because pdfimages should drop "^ 0xff" only if /Decode [1 0] is present for the image in the pdf.
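
A minimal sketch of the reasoning in this comment, assuming a 1-bit DeviceGray image written out as PBM. All three helper names are hypothetical and are not part of poppler or xpdf. The sketch relies on two facts: DeviceGray treats 0.0 as black, while PBM stores black as bit 1.

```cpp
#include <cassert>

// Hypothetical helpers for illustration only; not poppler or xpdf API.

// Map a raw 1-bit sample (0 or 1) through a /Decode pair [dmin dmax].
// For a 1-bit image the linear decode mapping just picks one end of the range.
double decodeSample(int sample, double dmin, double dmax) {
    assert(sample == 0 || sample == 1);
    return dmin + sample * (dmax - dmin);
}

// PBM stores 1 for black and 0 for white; DeviceGray uses 0.0 for black.
int pbmBitForGray(double gray) {
    return gray < 0.5 ? 1 : 0;
}

// XOR mask to apply to each raw stream byte so that it becomes PBM data.
unsigned char pbmXorMask(double dmin, double dmax) {
    // If raw sample 0 has to become PBM bit 1, every bit must be flipped.
    return pbmBitForGray(decodeSample(0, dmin, dmax)) == 1 ? 0xff : 0x00;
}
```

With the default decode [0 1] the mask is 0xff, which corresponds to the "^ 0xff" in the diff; with /Decode [1 0] the mask is 0x00 and the stream can be copied unchanged.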

Comment 2 Vasile Gaburici 2008-08-05 13:53:03 UTC
(In reply to comment #1)
> BTW, the pdf from bug 12121 has a "/Decode [1 0]" for its image! So, that pdf
> stores an inverted image that the pdf viewer is instructed to invert again. My
> pdf does not. So the fix for 12121 was not done properly, because pdfimages
> should drop "^ 0xff" only if /Decode [1 0] is present for the image in the pdf.
> 

Both the xpdf 3.02 and poppler 0.8.5 solutions are incorrect in general. Basically, pdfimages cannot just pass the monochrome stream through; instead, GfxImageColorMap::getGray (or some optimized variant thereof) needs to be called.
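
To illustrate what "calling getGray" for every pixel would amount to, here is a rough, self-contained sketch. It does not use poppler: the two-entry GrayLut is a hypothetical stand-in for the color map lookup, and convertRowPerPixel is an invented name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a per-sample gray lookup; in poppler this role
// would be played by GfxImageColorMap::getGray.
using GrayLut = std::array<double, 2>;  // gray value for raw sample 0 and 1

// Convert one packed 1-bit row to PBM bits by mapping every pixel.
std::vector<std::uint8_t> convertRowPerPixel(const std::vector<std::uint8_t>& row,
                                             int width, const GrayLut& lut) {
    assert(width > 0);
    assert(row.size() == static_cast<std::size_t>((width + 7) / 8));
    std::vector<std::uint8_t> out(row.size(), 0);
    for (int x = 0; x < width; ++x) {
        int shift = 7 - x % 8;                       // MSB is the leftmost pixel
        int sample = (row[x / 8] >> shift) & 1;      // raw bit from the stream
        int pbmBit = lut[sample] < 0.5 ? 1 : 0;      // PBM: 1 = black
        out[x / 8] |= static_cast<std::uint8_t>(pbmBit << shift);
    }
    return out;  // padding bits in the last byte stay 0
}
```

This is correct for any mapping of the two sample values, but it touches every pixel individually, which is the cost an optimized variant would try to avoid on large scans.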
Comment 3 Vasile Gaburici 2008-08-05 14:56:25 UTC
Created attachment 18144
Correct handling of bitmap images

I've attached a patch that works correctly for me whether or not the pdf image has a /Decode array. For the sake of efficiency on large scans, my fix only peeks at the colormap to decide what xor mask to use. The bit stream is then processed without involving the normal gray/color handling functions.
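
The "peek once, then xor the whole stream" idea can be sketched as follows. This is not the attached patch: GrayLut, xorMaskFromProbe and convertStream are hypothetical names, and the sketch assumes the two sample values map to opposite ends (one black, one white).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the color map: gray value for raw sample 0 and 1.
using GrayLut = std::array<double, 2>;

// Probe the map once: if raw sample 0 comes out black, the bits must be
// flipped, because PBM stores black as 1.
std::uint8_t xorMaskFromProbe(const GrayLut& lut) {
    return lut[0] < 0.5 ? 0xff : 0x00;
}

// XOR every byte of the packed stream with the mask chosen above.
std::vector<std::uint8_t> convertStream(const std::vector<std::uint8_t>& data,
                                        int width, int height, std::uint8_t mask) {
    // Same size computation as in the diff: rows are padded to whole bytes.
    std::size_t size = static_cast<std::size_t>(height) * ((width + 7) / 8);
    assert(data.size() == size);
    std::vector<std::uint8_t> out(size);
    for (std::size_t i = 0; i < size; ++i) {
        out[i] = data[i] ^ mask;
    }
    return out;
}
```

The row padding bits get flipped along with the pixels; that should be harmless, since the PBM format treats the unused bits at the end of each row as don't-care.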
Comment 4 Albert Astals Cid 2008-08-29 14:21:06 UTC
Patch seems good.
I need you to allow to license your patch under GPLv2 or later, do you agree?
Comment 5 Vasile Gaburici 2008-08-29 15:34:08 UTC
(In reply to comment #4)
> Patch seems good.
> I need you to allow to license your patch under GPLv2 or later, do you agree?
> 

Of course.
Comment 6 Albert Astals Cid 2008-08-30 03:48:11 UTC
The patch will be part of poppler 0.9.0. Thanks for sending it, and sorry for the late reply. Keep the patches coming!