Bug 19404 - pdftohtml - Images don't have correct page orientation.
Summary: pdftohtml - Images don't have correct page orientation.
Status: RESOLVED FIXED
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: x86 (IA32) Linux (All)
Importance: medium normal
Assignee: poppler-bugs
Reported: 2009-01-05 08:39 UTC by Derek
Modified: 2010-08-22 14:12 UTC

Attachments
Background Image of Converted page (14.79 KB, image/png)
2009-01-05 08:55 UTC, Derek
HTML file for converted page (1.69 KB, text/html)
2009-01-05 08:55 UTC, Derek
Removing a line of code for landscape orientation (414 bytes, patch)
2010-06-03 13:59 UTC, Mike Slegeir
Rewriting the image generation for pdftohtml to use splash (7.07 KB, patch)
2010-06-07 11:08 UTC, Mike Slegeir
Modifying image generation for pdftohtml to use Splash unless -dev is specified (9.31 KB, patch)
2010-06-14 16:27 UTC, Mike Slegeir
Updates to the man page (596 bytes, patch)
2010-07-28 07:01 UTC, Mike Slegeir

Description Derek 2009-01-05 08:39:07 UTC
When running pdftohtml from poppler 0.10.2 with -c and -nodrm on a PDF whose pages are in landscape orientation, the HTML files are generated with the proper orientation, but the background PNG images are generated in portrait orientation.  See attached PDF for an example and an attached HTML and image file.
Comment 1 Derek 2009-01-05 08:55:37 UTC
Created attachment 21690
Background Image of Converted page
Comment 2 Derek 2009-01-05 08:55:54 UTC
Created attachment 21691
HTML file for converted page
Comment 3 Derek 2009-01-05 08:58:00 UTC
The pdf is too big to attach (~ 9MB), but it can be downloaded from here:  http://www.softwaresummit.com/2006/speakers/RaibleMigratingStruts.pdf
Comment 4 Mike Slegeir 2010-06-03 13:59:11 UTC
Created attachment 36041
Removing a line of code for landscape orientation

I know this is an old issue, but it hasn't yet been addressed as far as I can tell.

I removed a line of code from PSOutputDev.cc, which pdftohtml uses to generate the PostScript from which the per-page images are produced.  I don't really understand the point of this line of code, but it causes rotate to be set to 180 when the initial state->rotate == 90, rotate0 != 0, and height > width.

By removing this line, the PDF provided by the bug opener looks correct when run through pdftohtml.  If anyone can explain to me the purpose of setting rotate to 270 - rotate, I think we could probably come up with a solution for this bug.  If no one else knows, maybe we'd be better off without that line of code.

Any input on my change (or any other fixes for this bug), would be greatly appreciated!
Comment 5 Mike Slegeir 2010-06-03 14:59:13 UTC
After further inspection (actually looking at the generated PostScript, rather than just the final output of pdftohtml), it is clear that the rotation adjustment I removed is what makes the PostScript file display with the correct orientation.

It seems that the PNGs generated are just always rotated when the PS is in landscape mode.  Maybe something is not being passed to ghostscript which would force it to interpret the PS such that the PNGs generated aren't rotated.

Because my patch would break something like pdftops, I'd recommend against its inclusion, but hopefully someone has some insight in how to do this correctly.
Comment 6 Mike Slegeir 2010-06-07 11:08:57 UTC
Created attachment 36113
Rewriting the image generation for pdftohtml to use splash

In my opinion, the correct approach to this issue is to use Poppler itself to generate the images, instead of going through PostScript (PS) and Ghostscript (GS).  In my patch, I do just that.

I subclass SplashOutputDev and override its text-related methods so that no text is present in the image.  I believe that the fact that Splash did not have such an option was the reason PostScript was originally used, but it was simple enough to modify it to make it behave that way.

The performance of this method is much better, from what I've seen, and it resolves the issue with orientation.  The only downside I'm aware of is that our choice of image formats is down to JPEG and PNG due to the limited formats that Splash supports.  Currently, all images generated are RGB8, but an option could be added to allow for other color modes if desirable.

I'd love to see this included in Poppler, and I will work with whoever can help to get this upstream as I believe it is a substantial improvement over the previous method.
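
As a rough illustration of the approach described in this comment, the sketch below shows the general pattern of subclassing a renderer and turning its text-drawing hook into a no-op so that only non-text content reaches the page image. It is not the code from the attached patch and does not use Poppler's API: PageRenderer, NoTextRenderer, draw_image, draw_char, and render are hypothetical stand-ins chosen for this example.

```python
class PageRenderer:
    """Hypothetical stand-in for a full-page renderer (not a Poppler class)."""

    def __init__(self):
        # Record of everything that ended up on the page bitmap.
        self.painted = []

    def draw_image(self, name):
        self.painted.append(("image", name))

    def draw_char(self, char):
        self.painted.append(("char", char))


class NoTextRenderer(PageRenderer):
    """Same renderer, but glyphs are silently dropped."""

    def draw_char(self, char):
        # Overridden to do nothing, so text never reaches the bitmap.
        pass


def render(renderer, operations):
    """Feed a list of (kind, payload) drawing operations to a renderer."""
    for kind, payload in operations:
        if kind == "image":
            renderer.draw_image(payload)
        elif kind == "char":
            renderer.draw_char(payload)
    return renderer.painted


page = [("image", "background"), ("char", "H"), ("char", "i")]
print(render(NoTextRenderer(), page))  # [('image', 'background')]
```

The point of the pattern is that the interpreter driving the renderer does not need to change: it still issues every drawing operation, and the subclass decides which ones have any effect.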
Comment 7 Albert Astals Cid 2010-06-10 13:07:13 UTC
Thanks for the patch. I also think it makes more sense to use Splash for that, but to keep compatibility with people who already have scripts and might be using the "-dev" option, I would suggest leaving the old code in place, adding the new code, and making the new code the default unless the "-dev" option is given.

Also, I think you should override interpretType3Chars to return false.
Comment 8 Mike Slegeir 2010-06-14 16:27:57 UTC
Created attachment 36271
Modifying image generation for pdftohtml to use Splash unless -dev is specified

I've retained the Ghostscript method, and added the interpretType3Chars override returning false.  It seems to work well both ways for me.  I've also corrected an issue I discovered on Windows when compiling my code.  Let me know if there are any other issues.
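
The selection rule agreed on in this exchange (Splash by default, the older Ghostscript path only when -dev is given) can be sketched as below. This is only an illustration of the rule, not the patch's actual option handling: choose_image_backend is a hypothetical helper, and png16m and in.pdf are example values.

```python
def choose_image_backend(args):
    """Return which image path a run would take under the rule above.

    args is the list of command-line arguments after the program name.
    Hypothetical helper for illustration only.
    """
    # The presence of -dev selects the legacy Ghostscript path;
    # otherwise Splash is used.
    return "ghostscript" if "-dev" in args else "splash"


print(choose_image_backend(["-c", "-nodrm", "in.pdf"]))          # splash
print(choose_image_backend(["-c", "-dev", "png16m", "in.pdf"]))  # ghostscript
```

Keying the choice on an option that existing scripts already pass means those scripts keep their current behavior, while everyone else gets the new default.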
Comment 9 Albert Astals Cid 2010-07-25 08:38:17 UTC
Sorry for the delay.

The patch looks good. One last request before merging it: could you update the pdftohtml.1 file (the man page) with the new options?
Comment 10 Albert Astals Cid 2010-07-25 08:39:50 UTC
Mike, I've added you to the CC list. Can you please read comment #9?
Comment 11 Mike Slegeir 2010-07-28 07:01:40 UTC
Created attachment 37418
Updates to the man page

I've updated the man page here.  I was a little more verbose than the minimal comments in the --help screen, but I can change either to be more consistent with the other if you'd prefer it that way.  Also, I haven't worked with man pages before, but I think I figured out the basics; however, if I did something wrong or could have done it better, let me know.
Comment 12 Albert Astals Cid 2010-08-22 14:12:38 UTC
I've committed your patch. Thanks. It will appear in poppler >= 0.15.0.

