Bug 29551

Summary: pdftohtml utility in complex mode creates background PNGs with insufficient resolution
Product: poppler Reporter: Chun <fuzzybr80>
Component: general Assignee: poppler-bugs <poppler-bugs>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium CC: mpsuzuki
Version: unspecified   
Hardware: x86 (IA32)   
OS: Linux (All)   
Whiteboard:
Attachments: Sample PDF
output (pdftohtml high resolution)
output (poppler-utils low res)
Patch to add "-r" option to pdftohtml.

Description Chun 2010-08-12 23:50:24 UTC
Created attachment 37832
Sample PDF

We invoke the pdftohtml utility in complex mode.

pdftohtml -c -noframes [input pdf] [output html]

which creates a background PNG for each PDF page that includes the graphics in that page. When using pdftohtml (from pdftohtml.sourceforge.net), the resolution of the PNG is 1785x2526 pixels.

When using poppler-utils, each background image (PNG) is 594x843 pixels.

This makes the resultant HTML's images look really pixellated.

Can the resolution be fixed, or set as an option somewhere?
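
For reference, the two image sizes above are consistent with the usual points-to-pixels relation (PDF user space is 72 points per inch, so pixels = points * dpi / 72). The sketch below is illustrative only: it assumes the sample page is roughly A4 (595 x 842 points), which is inferred from the reported sizes rather than stated in this report, and the function name page_pixels is hypothetical.

```python
# Sketch: pixel dimensions of a page rendered at a given resolution.
# PDF user space uses 72 points per inch, so pixels = points * dpi / 72.
POINTS_PER_INCH = 72.0


def page_pixels(width_pt, height_pt, dpi):
    """Return (width, height) in pixels for a page measured in points."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


# Assumption: the sample page is approximately A4 (595 x 842 points).
print(page_pixels(595, 842, 72))   # (595, 842), close to the 594x843 reported
print(page_pixels(595, 842, 216))  # (1785, 2526), the sourceforge output size
print(page_pixels(595, 842, 300))  # (2479, 3508)
```

Under that assumption, the larger image corresponds to about three times the 72 dpi baseline (roughly 216 dpi).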
Comment 1 Chun 2010-08-12 23:51:42 UTC
Created attachment 37833
output (pdftohtml high resolution)
Comment 2 Chun 2010-08-12 23:52:34 UTC
Created attachment 37834
output (poppler-utils low res)
Comment 3 Chun 2010-08-13 00:24:20 UTC
I've tried poppler versions 0.14.1, 0.11.3, 0.10.7, 0.5.91 and they all exhibit this issue.
Comment 4 Albert Astals Cid 2010-08-22 14:06:25 UTC
Major? Come on, how is this major in any way?
Comment 5 Chun 2010-08-22 21:37:00 UTC
(In reply to comment #4)
> Major? Come on, how is this major in any way?

Only insofar as forcing 72 dpi quality on the resultant HTML output's graphics makes this tool pretty much unusable for anyone wanting to convert a PDF with embedded graphics to HTML.

I appreciate that not many people use pdftohtml enough for poppler to care, even if you are pretty much the only still-maintained open source package that provides pdf-to-html functionality. Still, that would probably go under Importance rather than Severity.


Anyway, I have tried fixing it myself with some tests, but the magic constant 72 is spread throughout the code (core poppler code, not just pdftohtml), so I can see that this will be a non-trivial fix.

We've gone with pdftohtml.sourceforge.net for our deployment (and its not-so-complete Unicode rendering); it is quite a pity not to be able to use poppler.
Comment 6 suzuki toshiya 2010-08-25 08:44:35 UTC
Hi, I'm sorry for my late involvement in this discussion (again).
Since a patch for bug 19404, which uses SplashOutputDev to make
the background image, has been committed to pdftohtml, I will rework
my previous patch posted to the poppler mailing list.

I cannot comment on the evaluation of "major", but I think
it's reasonable to have a new option "-r" to specify the resolution
of the background image, as pdftoppm and pdftotext have.
Comment 7 suzuki toshiya 2010-08-25 10:16:15 UTC
Created attachment 38143
Patch to add "-r" option to pdftohtml.

Here it is. The patch adds a new option "-r" to specify the resolution.
"-r 300" will generate the background image at 300 dpi.
Chun, please check whether it fits your request.
Comment 8 Chun 2010-08-26 01:20:19 UTC
Tested with several documents so far, and it works! Thanks!

It would be really great to see this committed into the next release.
Comment 9 Albert Astals Cid 2010-08-26 11:41:23 UTC
Will be part of poppler >= 0.15.0
Comment 10 suzuki toshiya 2010-08-27 22:40:10 UTC
I see that the patch has just been committed to git HEAD.
Thank you very much!
Comment 11 Albert Astals Cid 2010-12-16 16:36:11 UTC
I'm reverting this patch since I realized it does not actually make sense; I'll be using the scale variable that is already there to let you define the zoom you want to use. Can you please test poppler git master and report any problems you find using the zoom argument?
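
For anyone translating a target resolution into a zoom value, a minimal sketch follows. It assumes that a zoom of 1.0 corresponds to the 72 dpi baseline discussed earlier in this bug; that mapping is not confirmed in this report and should be checked against the pdftohtml source. The function name zoom_for_dpi is hypothetical.

```python
# Sketch: convert a desired resolution into a zoom factor.
# Assumption (not confirmed in this report): zoom 1.0 renders at 72 dpi.
BASE_DPI = 72.0


def zoom_for_dpi(target_dpi, base_dpi=BASE_DPI):
    """Return the zoom factor that would give target_dpi under the assumption."""
    return target_dpi / base_dpi


print(zoom_for_dpi(216))            # 3.0
print(round(zoom_for_dpi(300), 2))  # 4.17
```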
