Created attachment 59456 [details] garbled-color image produced by pdftohtml

pdftohtml version 0.18.4 from Kubuntu 12.04 beta amd64.

I downloaded the 14MB PDF at http://www.swanyretail.com/SwanySkiCatalog_final-LO.pdf and ran it through pdftohtml with no options. All the resulting JPEGs are the right size but have garbled colors. They look like "solarized" negatives: mostly black, little color.

To reproduce the problem:

    mkdir bugtemp
    cd bugtemp
    wget http://www.swanyretail.com/SwanySkiCatalog_final-LO.pdf
    pdftohtml -f 1 -l 3 SwanySkiCatalog_final-LO.pdf swany.html

This converts just the first three pages. Then look at the resulting swany.html and/or the individual JPEGs.

Here's the first bad image in the original PDF:

    4618 0 obj
    <</Intent/RelativeColorimetric/Subtype/Image/Length 88781/Filter/DCTDecode/Name/X/BitsPerComponent 8/ColorSpace/DeviceCMYK/Width 629/Height 814/Type/XObject>>stream
    ÿØÿî^@^NAdobe^@d<80>^@^@^@^BÿÛ^@<84>^@^L^H^H^H^H^H^L^H^H^L^P^K^K^K^P^T^N^M^M^N^T^X^R^S^S^S^R^X^T^R^T^T^T^T^R^T^T^[^^^^^^^[^T$''''$25552;;;;;;;;;;^A^M^

I'm guessing that perhaps pdftohtml doesn't handle ColorSpace DeviceCMYK? pdfimages extracts these as .ppm files that preview fine in Gwenview. pdfimages' -j option does nothing.

I'll attach the bad JPEG from pdftohtml, the good PPM from pdfimages, and pages 1-3 extracted with pdfseparate/pdfunite.
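As an illustration of why a DeviceCMYK image needs a conversion step before it can be written as an RGB format such as PPM, here is a minimal sketch of the textbook CMYK-to-RGB formula. The helper name `naiveCmykToRgb` is hypothetical, and this is not necessarily the conversion poppler itself performs; it only shows that four ink components have to be folded into three display components somewhere along the way.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper (not a poppler function): textbook DeviceCMYK -> RGB
// for 8-bit samples, where 0 means "no ink" and 255 means "full ink".
std::array<std::uint8_t, 3> naiveCmykToRgb(std::uint8_t c, std::uint8_t m,
                                           std::uint8_t y, std::uint8_t k) {
    // Each channel is (255 - ink) * (255 - k) / 255, rounded to nearest.
    auto mix = [k](std::uint8_t ink) -> std::uint8_t {
        const unsigned v = (255u - ink) * (255u - k);
        return static_cast<std::uint8_t>((v + 127u) / 255u);
    };
    return {mix(c), mix(m), mix(y)};
}
```

With no ink at all the result is white, and with full black ink it is black. A viewer that skips this step, or reads the samples with the opposite polarity, could plausibly show the kind of dark, negative-looking output described above.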
Created attachment 59457 [details] correct image produced by pdfimages
Created attachment 59458 [details] first three pages of problem PDF pages 1-3 of http://www.swanyretail.com/SwanySkiCatalog_final-LO.pdf in case it goes away.
*** Bug 35026 has been marked as a duplicate of this bug. ***
Created attachment 83039 [details] [review] Proposed simple patch I tried to fix the problem, and this patch resolves the issue for all the PDFs I tested. This is the first patch I have ever provided, so I hope it is okay to upload it here.
Attaching the patch here is fine. Can you explain why the extra code is there? I mean, why do we need to test that?
Thank you for your response, Albert. I can't really answer your question; I can only tell you where I got the code from. I took the extra code from pdfimages, since that utility was said to work. You can see the same code in ImageOutputDev.cc on lines 269 to 272. I don't really know what is behind GfxImageColorMap::getNumPixelComps(). I just found out that you provided the original fix for ImageOutputDev.cc (commit: 2df6d530) in January 2009.
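For readers following along, here is a rough sketch of what such a component-count test might look like. The types and names (`StreamKind`, `ImageInfo`, `canDumpRawJpeg`, `chooseOutput`) are hypothetical stand-ins, not poppler's real classes. It also assumes the guard only allows the embedded JPEG bytes to be copied out verbatim for 1-component (gray) or 3-component (RGB) images, so a 4-component DeviceCMYK image would take the decode-and-convert path instead.

```cpp
#include <string>

// Hypothetical stand-ins for illustration only; not poppler's real types.
enum class StreamKind { DCT, Other };

struct ImageInfo {
    StreamKind kind;    // filter of the embedded image stream
    int numPixelComps;  // assumed meaning: 1 = gray, 3 = RGB, 4 = CMYK
};

// Assumption: raw JPEG bytes are only safe to copy out as-is when the
// image has 1 or 3 components.
bool canDumpRawJpeg(const ImageInfo &img) {
    return img.kind == StreamKind::DCT &&
           (img.numPixelComps == 1 || img.numPixelComps == 3);
}

// Picks an output strategy; the labels are illustrative, not real options.
std::string chooseOutput(const ImageInfo &img) {
    return canDumpRawJpeg(img) ? "copy-jpeg" : "decode-and-reencode";
}
```

Under these assumptions, the image from the report (DCTDecode with DeviceCMYK, so four components) would no longer be copied out as a raw JPEG, which is consistent with pdfimages producing a correct PPM for it.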
*sigh* copied code everywhere :-/ OK, I've committed the change and it will be in 0.24.1. FWIW, I did not write the code; as the commit log says, I just brought it over from xpdf. Thanks for the investigation!