Bug 102911

Summary: Newer versions of pdftotext don't extract bold & underlined text
Product: poppler Reporter: Osman <oamasood>
Component: utils Assignee: poppler-bugs <poppler-bugs>
Status: RESOLVED INVALID QA Contact:
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: All   
Whiteboard:
Attachments: Example PDF

Description Osman 2017-09-20 23:36:35 UTC
Created attachment 134391
Example PDF

The current version of pdftotext (0.59.0) doesn't extract the bold and underlined text from the attached PDF when -raw is used. For example, 'Equipment Group 202A' is missing from the pdftotext -raw output. We confirmed this behavior on Mac, Ubuntu 14, Ubuntu 16, and Alpine Linux.
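
A check like this can be scripted so that it is easy to repeat on each platform. The following is only a minimal sketch: it assumes pdftotext is on PATH, and the helper names (normalize, missing_phrases, extract_raw) are made up for illustration and are not part of poppler. Whitespace is collapsed before matching so that a phrase split across lines in the -raw output still counts as present.

```python
import re
import subprocess


def normalize(text):
    """Collapse every run of whitespace into one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def missing_phrases(text, phrases):
    """Return the phrases that do not occur in the extracted text."""
    haystack = normalize(text)
    return [p for p in phrases if normalize(p) not in haystack]


def extract_raw(pdf_path, binary="pdftotext"):
    """Run `<binary> -raw <pdf_path> -` and return what it writes to stdout."""
    result = subprocess.run(
        [binary, "-raw", pdf_path, "-"],
        check=True,
        stdout=subprocess.PIPE,
    )
    return result.stdout.decode("utf-8", errors="replace")
```

With the attachment saved locally (say as example.pdf, a hypothetical name), missing_phrases(extract_raw("example.pdf"), ["Equipment Group 202A"]) returns an empty list when the phrase was extracted and a list containing the phrase when it was not.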

On the other hand, we tried version 0.24 (or version 3.03, which doesn't show "The Poppler Developers" in the -v output; it only shows "Copyright 1996-2011 Glyph & Cog, LLC"), and those versions do include 'Equipment Group 202A' and generally produce better output.
Comment 1 Albert Astals Cid 2017-09-21 10:08:53 UTC
I went back to poppler 0.24 

pdftotext version 0.24.0
Copyright 2005-2013 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC

and the output of pdftotext -raw on that file is exactly the same as what we get with version 0.59:

09/14/2017
Fuel Economy and Environment
Smartphone
QR
Code
™
fueleconomy.gov
Calculate personalized estimates and compare vehicles
Annual fuel cost Fuel Economy & Greenhouse Gas Rating (tailpipe only) Smog Rating (tailpipe only)
You
over 5 years
compared to the
average new vehicle.

There's no such thing as version 3.03. Are you sure you tried version 0.24? Did you compile it yourself?
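
To pin down which binary produced which output, it can help to record each binary's -v banner next to a diff of the two -raw outputs. This is a rough sketch, not an official procedure: the function names are illustrative, and the banner is read from both stdout and stderr because it may be written to either.

```python
import difflib
import subprocess


def version_banner(binary):
    """Return whatever `<binary> -v` prints, from stdout and stderr combined."""
    proc = subprocess.run(
        [binary, "-v"], stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    return (proc.stdout + proc.stderr).decode("utf-8", errors="replace").strip()


def raw_text(binary, pdf_path):
    """Return the stdout of `<binary> -raw <pdf_path> -`."""
    proc = subprocess.run(
        [binary, "-raw", pdf_path, "-"], check=True, stdout=subprocess.PIPE
    )
    return proc.stdout.decode("utf-8", errors="replace")


def diff_outputs(old_text, new_text, old_label="old", new_label="new"):
    """Return unified-diff lines; an empty list means the texts are identical."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )
```

Given two installed binaries (the paths depend on the local setup), calling version_banner on each and then passing their raw_text results to diff_outputs shows whether the two builds really differ on the attached file.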
