I see this has been "fixed" in bug 12121, but the fix doesn't work for me for the PDF linked in the URL field. More precisely, pdfimages from xpdf 3.02 works correctly but the one from poppler 0.8.5 does not. The troublesome spot is in ImageOutputDev. In the diff below, the .300 version is from poppler (the .302 one is from xpdf 3.02):

diff -up poppler-0.8.5/poppler-0.8.5/utils/ImageOutputDev.cc.300 poppler-0.8.5/poppler-0.8.5/utils/ImageOutputDev.cc.302
--- poppler-0.8.5/poppler-0.8.5/utils/ImageOutputDev.cc.300  2008-03-26 21:38:52.000000000 +0200
+++ poppler-0.8.5/poppler-0.8.5/utils/ImageOutputDev.cc.302  2007-02-28 00:05:52.000000000 +0200
@@ -152,5 +150,5 @@ void ImageOutputDev::drawImage(GfxState
     // copy the stream
     size = height * ((width + 7) / 8);
     for (i = 0; i < size; ++i) {
-      fputc(str->getChar(), f);
+      fputc(str->getChar() ^ 0xff, f);
     }

BTW, there's also a difference in how JPEGs are handled in ImageOutputDev between xpdf 3.02 and poppler 0.8.5, but I don't know whether it can cause any problems.
BTW, the pdf from bug 12121 has a "/Decode [1 0]" for its image! So, that pdf stores an inverted image that the pdf viewer is instructed to invert again. My pdf does not. So the fix for 12121 was not done properly, because pdfimages should drop "^ 0xff" only if /Decode [1 0] is present for the image in the pdf.
(In reply to comment #1)
> BTW, the pdf from bug 12121 has a "/Decode [1 0]" for its image! So, that pdf
> stores an inverted image that the pdf viewer is instructed to invert again. My
> pdf does not. So the fix for 12121 was not done properly, because pdfimages
> should drop "^ 0xff" only if /Decode [1 0] is present for the image in the pdf.

Both the xpdf 3.02 and poppler 0.8.5 solutions are incorrect in general. pdfimages cannot simply pass the monochrome stream through; instead, GfxImageColorMap::getGray (or some optimized variant thereof) needs to be called.
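The general approach suggested here, decoding each sample to gray instead of copying bytes, might look roughly like the sketch below. It is a standalone illustration and does not call poppler: `sampleToGray` is a hypothetical stand-in for what a colour-map lookup such as GfxImageColorMap::getGray would provide, `rowToPbm` is likewise a made-up name, and the code assumes a 1-bit DeviceGray image with a two-entry /Decode array and raw PBM output (where a set bit means black).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a colour-map lookup: map a 1-bit sample to a
// gray value using the two entries of the image's /Decode array.
// The default /Decode [0 1] gives decodeMin = 0.0, decodeMax = 1.0.
inline double sampleToGray(int sample, double decodeMin, double decodeMax) {
    return sample ? decodeMax : decodeMin;
}

// Convert one packed row of 1-bit samples (most significant bit first) into
// a packed PBM row. PBM stores 1 for black, so a bit is set when the decoded
// gray value is dark. Padding bits at the end of the row are left at 0.
inline std::vector<std::uint8_t> rowToPbm(const std::vector<std::uint8_t>& row,
                                          int width,
                                          double decodeMin,
                                          double decodeMax) {
    std::vector<std::uint8_t> out((width + 7) / 8, 0);
    for (int x = 0; x < width; ++x) {
        std::size_t byteIndex = static_cast<std::size_t>(x / 8);
        if (byteIndex >= row.size()) {
            break;  // input row shorter than expected: stop early
        }
        int shift = 7 - (x % 8);
        int sample = (row[byteIndex] >> shift) & 1;
        if (sampleToGray(sample, decodeMin, decodeMax) < 0.5) {
            out[byteIndex] |= static_cast<std::uint8_t>(1u << shift);
        }
    }
    return out;
}
```

This is correct for any /Decode array but touches every pixel, which is the cost that a byte-wise shortcut tries to avoid on large scans.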
Created attachment 18144
Correct handling of bitmap images

I've attached a patch that works correctly for me whether or not the PDF image has a /Decode array. For the sake of efficiency on large scans, my fix only peeks at the colormap to decide which XOR mask to use. The bit stream is then processed without involving the normal gray/color handling functions.
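The idea of picking one XOR mask up front and then copying whole bytes can be sketched roughly as follows. This is a minimal standalone illustration, not the attached patch: `pbmXorMask` and `copyBitmap` are hypothetical names, and the /Decode entries are passed in as two plain doubles rather than read from poppler's colormap. It assumes a 1-bit DeviceGray image written as raw PBM, where a set bit means black.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: choose the XOR mask for a 1-bit gray image.
// decodeMin is the gray value that sample 0 maps to (0.0 for the default
// /Decode [0 1], 1.0 for /Decode [1 0]). PBM uses 1 = black, so the bits
// must be flipped exactly when sample 0 is the dark value.
inline std::uint8_t pbmXorMask(double decodeMin, double decodeMax) {
    return decodeMin < decodeMax ? 0xff : 0x00;
}

// Copy the packed bitmap byte by byte, applying the mask to each byte.
// Padding bits at the end of each row are flipped too; PBM readers
// ignore them.
inline std::vector<std::uint8_t> copyBitmap(const std::vector<std::uint8_t>& src,
                                            int width,
                                            int height,
                                            std::uint8_t mask) {
    std::size_t size = static_cast<std::size_t>(height) *
                       static_cast<std::size_t>((width + 7) / 8);
    size = std::min(size, src.size());  // never read past the input
    std::vector<std::uint8_t> out(size);
    for (std::size_t i = 0; i < size; ++i) {
        out[i] = static_cast<std::uint8_t>(src[i] ^ mask);
    }
    return out;
}
```

With the default decode the mask is 0xff, which matches the xpdf 3.02 behaviour; with /Decode [1 0] the mask is 0x00, which matches the poppler 0.8.5 behaviour.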
Patch seems good. I need your permission to license your patch under GPLv2 or later. Do you agree?
(In reply to comment #4)
> Patch seems good.
> I need you to allow to license your patch under GPLv2 or later, do you agree?

Of course.
The patch will be part of poppler 0.9.0. Thanks for sending it, and sorry for the late reply. Keep the patches coming!