Summary: | pdftohtml - Images don't have correct page orientation. | ||
---|---|---|---|
Product: | poppler | Reporter: | Derek <djtyner1981> |
Component: | general | Assignee: | poppler-bugs <poppler-bugs> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | tehpola |
Version: | unspecified | ||
Hardware: | x86 (IA32) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
Attachments: |
Background Image of Converted page
HTML file for converted page
Removing a line of code for landscape orientation
Rewriting the image generation for pdftohtml to use splash
Modifying image generation for pdftohtml to use Splash unless -dev is specified
Updates to the man page |
Description
Derek
2009-01-05 08:39:07 UTC
Created attachment 21690 [details]
Background Image of Converted page
Created attachment 21691 [details]
HTML file for converted page
The PDF is too big to attach (~9 MB), but it can be downloaded from here: http://www.softwaresummit.com/2006/speakers/RaibleMigratingStruts.pdf

Created attachment 36041 [details] [review]
Removing a line of code for landscape orientation

I know this is an old issue, but as far as I can tell it hasn't been addressed yet. I removed a line of code from PSOutputDev.cc, which pdftohtml uses to generate the PostScript from which the image for each page is produced. I don't really understand the point of this line, but it causes rotate to be set to 180 when the initial state->rotate == 90, rotate0 != 0, and height > width. With the line removed, the PDF provided by the reporter looks correct when run through pdftohtml. If anyone can explain the purpose of setting rotate to 270 - rotate, I think we could probably come up with a solution for this bug. If no one knows, maybe we'd be better off without that line. Any input on my change (or any other fix for this bug) would be greatly appreciated!

After further inspection (actually looking at the generated PostScript, rather than just the final output of pdftohtml), it is clear that the rotation modification I removed is what makes the PostScript file display with the correct orientation. It seems that the generated PNGs are simply always rotated when the PS is in landscape mode. Maybe something is not being passed to Ghostscript that would force it to interpret the PS so that the generated PNGs aren't rotated. Because my patch would break something like pdftops, I'd recommend against including it, but hopefully someone has some insight into how to do this correctly.

Created attachment 36113 [details] [review]
Rewriting the image generation for pdftohtml to use splash

In my opinion, the correct approach to this issue is to use Poppler itself to generate the images instead of going through PS and GS. My patch does just that. I subclass SplashOutputDev and override its text-related methods so that no text is present in the image. I believe the lack of such an option in Splash was the reason PostScript was originally used, but it was simple enough to make it behave that way. From what I've seen, the performance of this method is much better, and it resolves the orientation issue. The only downside I'm aware of is that the choice of image formats is down to JPEG and PNG, because of the limited formats Splash supports. Currently, all generated images are RGB8, but an option could be added to allow other color modes if desired. I'd love to see this included in Poppler, and I will work with whoever can help to get this upstream, as I believe it is a substantial improvement over the previous method.

Thanks for the patch. I also think it makes more sense to use Splash for that, but to keep compatibility for people who already have scripts that might use the "-dev" option, I would suggest leaving the old code in place, adding the new code, and making it the default unless "-dev" is given. Also, I think you should override interpretType3Chars to return false.

Created attachment 36271 [details] [review]
Modifying image generation for pdftohtml to use Splash unless -dev is specified

I've retained the Ghostscript method and added the method returning false. It seems to work well both ways for me. I've also corrected an issue I discovered when compiling my code on Windows. Let me know if there are any other issues.

Sorry for the delay. The patch looks good. One last request before merging it: could you update the pdftohtml.1 file (the man page) with the new options?

Mike, I've added you to the CC list. Can you please read comment #9?

Created attachment 37418 [details] [review]
Updates to the man page

I've updated the man page here. I was a little more verbose than the minimal comments in the --help screen, but I can change either one to be consistent with the other if you'd prefer. Also, I haven't worked with man pages before, but I think I figured out the basics; if I did something wrong or could have done it better, let me know.

I've committed your patch. Thanks. It will appear in poppler >= 0.15.0. |
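For readers following the thread, the approach discussed above (a raster device whose text callbacks do nothing, with interpretType3Chars returning false, and Ghostscript kept behind -dev) can be sketched roughly as follows. This is not the committed patch and does not use Poppler's real classes: `RasterDevice`, `NoTextRasterDevice`, `Op`, `renderPage`, `renderBackground`, and `chooseBackend` are hypothetical stand-ins that record what would be painted instead of drawing pixels. The sketch also assumes that when a device declines to interpret Type 3 glyph procedures, those glyphs are routed through its ordinary character callback.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a raster output device such as SplashOutputDev.
// Instead of painting pixels it records what would have been painted.
class RasterDevice {
public:
    virtual ~RasterDevice() = default;

    // Whether the interpreter should run Type 3 glyph procedures, which
    // paint glyphs using ordinary drawing operators.
    virtual bool interpretType3Chars() const { return true; }

    virtual void drawChar(const std::string &glyph) { painted.push_back("char:" + glyph); }
    virtual void drawImage(const std::string &name) { painted.push_back("image:" + name); }
    virtual void fillPath(const std::string &name) { painted.push_back("path:" + name); }

    std::vector<std::string> painted;
};

// Text-free variant: the character callback is a no-op, and Type 3 glyph
// procedures are declined so they cannot paint text through path operators.
class NoTextRasterDevice : public RasterDevice {
public:
    bool interpretType3Chars() const override { return false; }
    void drawChar(const std::string &) override {}
};

enum class OpKind { Char, Type3Char, Image, Path };

struct Op {
    OpKind kind;
    std::string name;
};

// Simplified model of a content-stream interpreter driving a device.
void renderPage(const std::vector<Op> &ops, RasterDevice &dev) {
    for (const Op &op : ops) {
        switch (op.kind) {
        case OpKind::Char:
            dev.drawChar(op.name);
            break;
        case OpKind::Type3Char:
            if (dev.interpretType3Chars()) {
                dev.fillPath("type3:" + op.name);  // glyph procedure paints a path
            } else {
                dev.drawChar(op.name);             // falls back to the char callback
            }
            break;
        case OpKind::Image:
            dev.drawImage(op.name);
            break;
        case OpKind::Path:
            dev.fillPath(op.name);
            break;
        }
    }
}

// Renders a page background and checks that no text reached the output.
std::vector<std::string> renderBackground(const std::vector<Op> &ops) {
    NoTextRasterDevice dev;
    renderPage(ops, dev);
    for (const std::string &entry : dev.painted) {
        assert(entry.rfind("char:", 0) != 0);  // no entry starts with "char:"
    }
    return dev.painted;
}

// Backend selection as described in the thread: Splash by default,
// the older Ghostscript path only when -dev was given.
enum class Backend { Splash, Ghostscript };

Backend chooseBackend(bool devOptionGiven) {
    return devOptionGiven ? Backend::Ghostscript : Backend::Splash;
}
```

Calling `renderBackground` on a page containing an image, a regular glyph, a Type 3 glyph, and a path would keep only the image and the path, which is the property a page background needs when the text itself is emitted as HTML.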