Bug 42864 - pdftohtml background images also contain text
Summary: pdftohtml background images also contain text
Status: RESOLVED NOTABUG
Alias: None
Product: poppler
Classification: Unclassified
Component: pdftohtml
Version: unspecified
Hardware: All Windows (All)
Importance: medium normal
Assignee: poppler-bugs
Reported: 2011-11-12 16:29 UTC by craig
Modified: 2011-11-14 08:58 UTC


Attachments
2 directories containing pdfs with their output (880.82 KB, application/x-zip-compressed)
2011-11-12 16:29 UTC, craig

Description craig 2011-11-12 16:29:26 UTC
Created attachment 53470
2 directories containing pdfs with their output

With some documents, pdftohtml -c generates pages whose background image contains text.
The HTML itself also contains the same text, in almost exactly the same place.
This makes the preview illegible.

I have attached 2 pdfs that cause this behaviour and the output after running them through pdftohtml.
Comment 1 Albert Astals Cid 2011-11-13 07:05:09 UTC
Not major.
Comment 2 Albert Astals Cid 2011-11-13 07:14:14 UTC
Which pdftohtml version? Which exact commandline?
Comment 3 craig 2011-11-13 08:43:17 UTC
pdftohtml.exe -dev jpeg -c inputfile


C:\Previewer\pdftohtml>pdftohtml -v
pdftohtml version 0.16.6
Copyright 2005-2011 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1999-2003 Gueorgui Ovtcharov and Rainer Dorsch
Copyright 1996-2004 Glyph & Cog, LLC

If I remember correctly this is the version I got from the kde4win project
Comment 4 Albert Astals Cid 2011-11-14 04:05:50 UTC
Don't use -dev jpeg. By passing it you are using an "I know what I am doing" option, and it shows you do not :D So simply remove it and it will work. If it does not, update to 0.18.1 and it will work.
Comment 5 craig 2011-11-14 08:51:52 UTC
Without -dev jpeg, my previews result in images of 0 bytes.

I guess I will have to invest the time in getting 0.18.1 to compile on Windows.

Unfortunately your project is not very Windows friendly: there are no recent guides or builds for Windows.

Cheers,
Craig
Comment 6 craig 2011-11-14 08:53:55 UTC
Forgot to add the reason why they are 0 bytes:

Error: Support for this image type not compiled in
Error: Support for this image type not compiled in

(In reply to comment #5)
> Without -dev jpeg, my previews result in images with 0bytes.
> 
> I guess I will have to invest the time in getting 0.18.1 to compile in windows.
> 
> Unfortunately your project is not very windows friendly - there are no recent
> guides or builds in windows.
> 
> Cheers,
> Craig
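
The symptom described in comments 5 and 6 can be checked mechanically. The following is a minimal Python sketch, not part of poppler. It assumes the generated images sit in the output directory with .png, .jpg or .jpeg extensions; the function names empty_images and lacks_image_support are hypothetical.

```python
from pathlib import Path

# Error text exactly as quoted in comment 6.
MISSING_SUPPORT = "Error: Support for this image type not compiled in"


def empty_images(out_dir, suffixes=(".png", ".jpg", ".jpeg")):
    """Return the sorted names of zero-byte image files in out_dir.

    The suffix list is an assumption about how the images are named.
    """
    return sorted(
        p.name
        for p in Path(out_dir).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes and p.stat().st_size == 0
    )


def lacks_image_support(stderr_text):
    """Return True if the captured stderr contains the error from comment 6."""
    return MISSING_SUPPORT in stderr_text
```

If empty_images returns any names and lacks_image_support is true for the captured stderr, the build in use was most likely compiled without the image library it needs.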
Comment 7 Albert Astals Cid 2011-11-14 08:58:22 UTC
By your metric we are not Linux friendly either, since we do not have recent guides or builds for it.

It seems the build you are using was not compiled against libpng or libjpeg, and that is why you have problems. By using -dev jpeg, what you are effectively doing is transforming the PDF to PS and then asking gs to render it to a JPEG. This is kind of black magic and, as you can see, sometimes it does not work properly.
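
For anyone scripting the previews, the two invocations discussed in this thread can be sketched as follows. This is a hedged Python illustration that uses only the -c and -dev flags quoted above; the helper names pdftohtml_cmd and run_pdftohtml are hypothetical, and the executable name is assumed to be on the PATH.

```python
import subprocess


def pdftohtml_cmd(input_pdf, exe="pdftohtml", gs_device=None):
    """Build the argument list for pdftohtml in complex (-c) mode.

    gs_device=None omits -dev entirely, as recommended in comment 4.
    gs_device="jpeg" reproduces the command line from comment 3.
    """
    cmd = [exe]
    if gs_device is not None:
        cmd += ["-dev", gs_device]
    cmd += ["-c", input_pdf]
    return cmd


def run_pdftohtml(input_pdf, **kwargs):
    """Run pdftohtml and return the CompletedProcess with captured output."""
    return subprocess.run(
        pdftohtml_cmd(input_pdf, **kwargs), capture_output=True, text=True
    )
```

For example, pdftohtml_cmd("inputfile") yields the arguments for "pdftohtml -c inputfile", and the stderr of run_pdftohtml can be inspected for the error reported in comment 6.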

