Bug 65889

Summary: Bullets are converted to other characters when converting PDF to HTML
Product: poppler
Reporter: Nitesh G. <nitesh.golchha>
Component: pdftohtml
Assignee: poppler-bugs <poppler-bugs>
Status: RESOLVED MOVED
QA Contact:
Severity: major    
Priority: medium    
Version: unspecified   
Hardware: Other   
OS: Windows (All)   
Attachments: Input PDF

Description Nitesh G. 2013-06-18 10:30:49 UTC
Created attachment 80993
Input PDF

Hi,

I tried to convert the attached PDF to HTML using pdftohtml.exe and found that every bullet is converted to a different character: the letter 'n'.
The reference PDF is attached.

Thanks,
Nitesh
Comment 1 Dan Small 2013-08-07 17:31:39 UTC
I'm seeing something similar on Ubuntu where bullets are converted to ï·&#160;&#160; (ï(0082)&#160;&#160;) using the file http://www.cityplym.ac.uk/sites/default/files/docs/jobs/Safeguarding_changes_to_DBS.pdf
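One possible reading of the output above, offered as an assumption that nothing in this thread confirms: the bullet may be stored as U+F0B7, a private-use code point often used for Symbol-font bullets. Its UTF-8 bytes are EF 82 B7, and a viewer that interprets those bytes as Latin-1 would show "ï", the control character U+0082, and "·", which matches the reported sequence. A minimal Python sketch of that round trip:

```python
# Assumption (not confirmed by the report): the PDF's bullet is U+F0B7,
# a private-use code point commonly used for Symbol-font bullets.
bullet = "\uf0b7"

# Encode as UTF-8, then misread the same bytes as Latin-1.
utf8_bytes = bullet.encode("utf-8")
misread = utf8_bytes.decode("latin-1")

print(utf8_bytes.hex(" "))                   # ef 82 b7
print([f"U+{ord(c):04X}" for c in misread])  # ['U+00EF', 'U+0082', 'U+00B7']
```

If this is what happens, the HTML output itself may be valid UTF-8 that is merely being displayed with the wrong charset.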
Comment 2 Dan Small 2013-08-07 18:24:09 UTC
I forgot to mention the version: poppler-0.24.0.
Comment 3 Dan Small 2013-08-07 19:25:32 UTC
The problem goes away if I set the output encoding with -enc Windows-1255.
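For anyone scripting this workaround, here is a rough Python sketch of the invocation. The helper names `build_pdftohtml_cmd` and `convert` are hypothetical, as are the file names in the usage comment. It assumes `pdftohtml` is on the PATH and uses only the `-enc` option mentioned in this comment.

```python
import subprocess


def build_pdftohtml_cmd(pdf_path, out_base, encoding="Windows-1255"):
    """Build the argument list for pdftohtml with an explicit output encoding.

    The default encoding is the one reported to work in this thread;
    other documents may need a different value.
    """
    return ["pdftohtml", "-enc", encoding, pdf_path, out_base]


def convert(pdf_path, out_base, encoding="Windows-1255"):
    """Run pdftohtml (assumed to be installed) and raise if it fails."""
    subprocess.run(build_pdftohtml_cmd(pdf_path, out_base, encoding), check=True)


# Hypothetical usage, with placeholder file names:
# convert("input.pdf", "output")
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.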
Comment 4 GitLab Migration User 2018-08-20 21:43:29 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new bug via this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/45.
