Bug 55037

Summary: -xml does not render all images despite -c rendering correctly
Product: poppler Reporter: Jamie Carl <jazz>
Component: pdftohtml Assignee: poppler-bugs <poppler-bugs>
Status: RESOLVED MOVED QA Contact:
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   

Description Jamie Carl 2012-09-17 23:51:08 UTC
I've been trying to incorporate pdftohtml into my frontend renderer and have had success with some documents. Other, more complex documents are causing problems.

My test document is the Nikon D3s brochure:

wget http://imaging.nikon.com/products/imaging/lineup/digitalcamera/slr/d3s/pdf/d3s_16p.pdf

Rendering with the following produces a pretty accurate representation of the document:

pdftohtml -c d3s_16p.pdf

However, when I output to XML using -xml, some of the images that were rendered with -c are missing. They are neither extracted nor referenced in the XML output.

Also, the images that are extracted are listed with the wrong dimensions, so the resulting page layout is badly off.

All of the text is rendered correctly though.
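
A rough way to check which images made it into the XML, and whether their boxes are plausible, is to parse the -xml output and list every <image> element per page. The sketch below assumes the usual pdf2xml layout (a <page> element with number, width and height attributes, containing <image> elements with top, left, width, height and src). The sample XML and the image file names in it are made up for illustration and are not taken from the real d3s_16p.pdf output.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a pdftohtml -xml output file; the image
# names and coordinates are invented for this example.
SAMPLE = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1263" width="892">
<image top="0" left="0" width="892" height="400" src="d3s_16p-1_1.jpg"/>
<image top="900" left="100" width="1000" height="500" src="d3s_16p-1_2.jpg"/>
<text top="50" left="60" width="200" height="20" font="0">D3S</text>
</page>
</pdf2xml>"""


def check_images(xml_text):
    """List each <image> with its box and whether it fits inside its page."""
    root = ET.fromstring(xml_text)
    report = []
    for page in root.iter("page"):
        page_w = int(page.get("width", "0"))
        page_h = int(page.get("height", "0"))
        for img in page.iter("image"):
            left = int(img.get("left", "0"))
            top = int(img.get("top", "0"))
            width = int(img.get("width", "0"))
            height = int(img.get("height", "0"))
            report.append({
                "page": int(page.get("number", "0")),
                "src": img.get("src"),
                "box": (left, top, width, height),
                # An image whose box runs past the page edge is suspicious.
                "fits_page": left >= 0 and top >= 0
                and left + width <= page_w
                and top + height <= page_h,
            })
    return report


if __name__ == "__main__":
    for entry in check_images(SAMPLE):
        print(entry)
```

Comparing the number of entries per page against the image files written by the -c run would show which images are missing. The fits_page flag is only a crude hint that the reported dimensions are off.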

I tried the latest version from git, with the same results.
Comment 1 Albert Astals Cid 2012-09-18 07:04:51 UTC
Not critical
Comment 2 GitLab Migration User 2018-08-20 21:55:27 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe to and participate in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/127.
