Bug 47186

Summary: pdftohtml: mask images are not extracted, unless they are JPEG
Product: poppler
Reporter: Ihar Filipau <thephilips>
Component: pdftohtml
Assignee: poppler-bugs <poppler-bugs>
Status: RESOLVED FIXED
QA Contact:
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
Attachments: the patch
the patch, v2
the patch, v3
extract monochrome images as PNGs, the patch, v4

Description Ihar Filipau 2012-03-10 07:59:56 UTC
Created attachment 58272
the patch

Mask images are not extracted, unless they are JPEG.

I have made a patch to extract the mask images as PNGs.

The patch is attached. It was generated with git diff on a freshly cloned poppler repository.

I have tested it on Debian Sid/AMD64 with several PDFs where I have observed the problem.

Please review, as I'm a total n00b in all things PDF. If possible, please commit.
Comment 1 Albert Astals Cid 2012-03-10 08:15:53 UTC
Bad bad, don't copy code.

Create a function with the code you copied.

Also you changed 
OutputDev::drawImageMask(state, ref, str, width, height, invert, interpolate, inlineImg);
to 
OutputDev::drawImage(state, ref, str, width, height, colorMap, interpolate,
			 maskColors, inlineImg);
in the #else branch
Comment 2 Ihar Filipau 2012-03-10 08:26:39 UTC
(In reply to comment #1)
> Bad bad, don't copy code.
>
> Create a function with the code you copied.

The core of the PNG code here is the inner loop that converts the bitmap into a row for PNGWriter (the one with the comment "convert bits into a bytes for PNG"), and it is unfortunately unique. If you have any ideas on how to merge the two PNG writers into a single function, I will do it.
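For readers unfamiliar with that kind of inner loop, here is a minimal sketch of expanding one row of packed 1-bit pixels into one byte per pixel. It is not the code from the patch: the function name `unpackMonoRow`, its parameters, and the 0/255 output values are assumptions made for illustration. The only fact relied on is that PDF image samples are packed most significant bit first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of poppler: expand one row of packed 1-bit
// pixels (most significant bit first) into one byte per pixel.
// `invert` flips each bit before it is stored. The 0/255 output values are
// an illustrative choice, not taken from the patch.
std::vector<std::uint8_t> unpackMonoRow(const std::vector<std::uint8_t> &packed,
                                        int width, bool invert) {
    assert(width >= 0);
    assert(packed.size() * 8 >= static_cast<std::size_t>(width));

    std::vector<std::uint8_t> row(static_cast<std::size_t>(width));
    for (int x = 0; x < width; ++x) {
        // Bit 7 of each source byte holds the leftmost of its eight pixels.
        int bit = (packed[x / 8] >> (7 - (x % 8))) & 1;
        if (invert) {
            bit ^= 1;
        }
        row[x] = bit ? 255 : 0;
    }
    return row;
}
```

A loop of this shape is specific to 1-bit data, which is why it does not fold naturally into a writer that reads whole bytes per component.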

At the moment I can see that the image file name generation code, and probably the file opening (to get the error handling out of the way), can be separated into a new method.

Suggestions are welcome.

> Also you changed 
> OutputDev::drawImageMask(state, ref, str, width, height, invert, interpolate,
> inlineImg);
> to 
> OutputDev::drawImage(state, ref, str, width, height, colorMap, interpolate,
>              maskColors, inlineImg);
> in the #else branch

My bad. Will fix.
Comment 3 Ihar Filipau 2012-03-10 09:27:25 UTC
Created attachment 58273
the patch, v2

Updated patch. Changes:

1. Fix the stupid copy-paste error OutputDev::drawImageMask()
   vs. OutputDev::drawImage()

2. Introduce a new PNG writing method, similar to drawJpegImage(), prototype:

void HtmlOutputDev::drawPngImage(GfxState *state, Stream *str, int width, int height,
                                 GfxImageColorMap *colorMap, GBool isMask)

which can write both RGB and MONOCHROME images. It takes more parameters than
its JPEG counterpart because it has to do the actual image creation.

3. Replace the PNG code in methods HtmlOutputDev::drawImage() and
   HtmlOutputDev::drawImageMask() with a call to HtmlOutputDev::drawPngImage() method.
Comment 4 Ihar Filipau 2012-03-10 09:34:26 UTC
Created attachment 58274
the patch, v3

Same as v2, but adds the forgotten `#ifdef ENABLE_LIBPNG` around the body of the new function.
Comment 5 Albert Astals Cid 2012-03-11 15:48:52 UTC
Please use ImageStream. There's no point in doing the bit-to-byte conversion yourself when ImageStream does it for you, and since the other branch of the if also uses an ImageStream, it will surely let you share more code.
Comment 6 Ihar Filipau 2012-03-12 18:30:15 UTC
Created attachment 58348
extract monochrome images as PNGs, the patch, v4

Applies to master.

Changes, compared to the previous version of the patch:
- use ImageStream to also read the monochrome rows
- add HtmlOutputDev::createImageFileName(const char *) method to create image file name.
- use the createImageFileName() method in both drawJpeg/drawPng methods to remove the old copy-paste.
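As a rough illustration of what a shared file-name helper of this kind can look like, here is a self-contained sketch. It is not the committed code: the real method is a member of HtmlOutputDev and takes only the extension, whereas the explicit `docName`, `pageNum` and `imgNum` parameters and the `<doc>-<page>_<image>.<ext>` naming pattern below are assumptions made so the example stands alone.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the helper named above. The document name, page
// number and image counter are passed explicitly here; in the real class
// they would presumably come from the output device's own state.
std::string createImageFileName(const std::string &docName, int pageNum,
                                int imgNum, const char *ext) {
    assert(ext != nullptr);
    // Assumed pattern: <doc>-<page>_<image>.<ext>
    return docName + "-" + std::to_string(pageNum) + "_" +
           std::to_string(imgNum) + "." + ext;
}
```

With the naming in one place, the JPEG and PNG paths differ only in the extension they pass, for example `createImageFileName("report", 3, 1, "png")` versus the same call with `"jpg"`.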
Comment 7 Albert Astals Cid 2012-03-15 14:57:03 UTC
Pushed
