Bug 102911 - Newer versions of pdftotext don't extract bold & underlined text
Summary: Newer versions of pdftotext don't extract bold & underlined text
Status: RESOLVED INVALID
Alias: None
Product: poppler
Classification: Unclassified
Component: utils
Version: unspecified
Hardware: x86-64 (AMD64) All
Importance: medium normal
Assignee: poppler-bugs
 
Reported: 2017-09-20 23:36 UTC by Osman
Modified: 2018-04-17 09:24 UTC


Attachments
Example PDF (1.81 MB, application/pdf)
2017-09-20 23:36 UTC, Osman

Description Osman 2017-09-20 23:36:35 UTC
Created attachment 134391
Example PDF

The current version of pdftotext (0.59.0) doesn't extract the bold and underlined text from the attached PDF when -raw is used. For example, 'Equipment Group 202A' is missing from the pdftotext -raw output. We confirmed this behavior on Mac, Ubuntu 14, Ubuntu 16, and Alpine Linux.

On the other hand, we tried version 0.24 (or version 3.03, whose -v output doesn't mention The Poppler Developers and shows only "Copyright 1996-2011 Glyph & Cog, LLC"), and those versions do include 'Equipment Group 202A' and generally produce better output.
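
For anyone who wants to repeat this check against several pdftotext builds, here is a minimal Python sketch. It assumes a pdftotext binary is on the PATH and relies only on the `-raw` flag and `-` (write to stdout) as the output file. The helper names `raw_text` and `contains_phrase`, and the file name in the usage note, are hypothetical and not part of poppler.

```python
import subprocess


def raw_text(pdf_path, binary="pdftotext"):
    """Run `<binary> -raw <pdf_path> -` and return what it writes to stdout."""
    result = subprocess.run(
        [binary, "-raw", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def contains_phrase(text, phrase):
    """Whitespace-insensitive substring test.

    Runs of spaces and newlines are collapsed first, so a phrase that
    the extractor splits across lines is still found.
    """
    def normalize(s):
        return " ".join(s.split())

    return normalize(phrase) in normalize(text)
```

Calling `contains_phrase(raw_text("example.pdf"), "Equipment Group 202A")` once per installed binary (via the `binary` argument) would show which builds actually emit the phrase.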
Comment 1 Albert Astals Cid 2017-09-21 10:08:53 UTC
I went back to poppler 0.24:

pdftotext version 0.24.0
Copyright 2005-2013 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC

and the output of pdftotext -raw on that file is exactly the same as what we get with version 0.59:

09/14/2017
Fuel Economy and Environment
Smartphone
QR
Code
™
fueleconomy.gov
Calculate personalized estimates and compare vehicles
Annual fuel cost Fuel Economy & Greenhouse Gas Rating (tailpipe only) Smog Rating (tailpipe only)
You
over 5 years
compared to the
average new vehicle.

There's no such thing as version 3.03. Are you sure you tried version 0.24? Did you compile it yourself?
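
One way to answer the version question is to inspect the banner printed by `pdftotext -v`. The following is only a sketch: it assumes the banner has the shape quoted in this report (a "pdftotext version X" line, plus a "The Poppler Developers" copyright line in Poppler builds), and it treats a missing Poppler line as a hint, not proof, that the binary is not a Poppler build. The function names `version_banner` and `describe_banner` are hypothetical.

```python
import re
import subprocess


def version_banner(binary="pdftotext"):
    """Return the text printed by `<binary> -v` (stderr first, then stdout)."""
    result = subprocess.run([binary, "-v"], capture_output=True)
    return (result.stderr or result.stdout).decode("utf-8", errors="replace")


def describe_banner(banner):
    """Return (version string or None, whether the Poppler copyright line is present)."""
    match = re.search(r"pdftotext version (\d+(?:\.\d+)*)", banner)
    version = match.group(1) if match else None
    is_poppler = "The Poppler Developers" in banner
    return version, is_poppler
```

Running `describe_banner(version_banner())` on each machine would make it clear whether the binary that produced the better output reports itself as Poppler at all.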
