Summary: | Newer versions of pdftotext don't extract bold & underlined text | | |
---|---|---|---|
Product: | poppler | Reporter: | Osman <oamasood> |
Component: | utils | Assignee: | poppler-bugs <poppler-bugs> |
Status: | RESOLVED INVALID | QA Contact: | |
Severity: | normal | | |
Priority: | medium | | |
Version: | unspecified | | |
Hardware: | x86-64 (AMD64) | | |
OS: | All | | |
Whiteboard: | | | |
i915 platform: | | i915 features: | |
Attachments: | Example PDF | | |
I went back to poppler 0.24:

    pdftotext version 0.24.0
    Copyright 2005-2013 The Poppler Developers - http://poppler.freedesktop.org
    Copyright 1996-2011 Glyph & Cog, LLC

and the output of pdftotext -raw over that file is exactly the same as what we get with version 0.59:

    09/14/2017 Fuel Economy and Environment Smartphone QR Code ™ fueleconomy.gov Calculate personalized estimates and compare vehicles Annual fuel cost Fuel Economy & Greenhouse Gas Rating (tailpipe only) Smog Rating (tailpipe only) You over 5 years compared to the average new vehicle.

There's no such thing as version 3.03. Are you sure you tried version 0.24? Did you compile it yourself?
Created attachment 134391: Example PDF

The current version of pdftotext (0.59.0) doesn't extract the bolded & underlined text from the attached PDF when -raw is used. For example, notice that 'Equipment Group 202A' is missing from the pdftotext -raw output. Confirmed behavior on Mac, Ubuntu 14, Ubuntu 16, and Alpine Linux.

On the other hand, we tried with version 0.24 (or version 3.03, which doesn't show "The Poppler Developers" in the -v output; it only shows "Copyright 1996-2011 Glyph & Cog, LLC"), and those versions do include 'Equipment Group 202A' and generally produce better output.
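
For anyone trying to reproduce the comparison, here is a minimal sketch of how the check could be scripted. It assumes a pdftotext binary is on the PATH and relies only on the -raw and -v options mentioned in this report, plus "-" as the output file to write to stdout. The helper names (`missing_phrases`, `extract_raw`, `tool_version`) and the file name `example.pdf` are hypothetical and only for illustration.

```python
import subprocess


def missing_phrases(extracted_text, phrases):
    """Return the phrases that do not occur in the extracted text."""
    # Collapse runs of whitespace so a phrase split across lines still matches.
    normalized = " ".join(extracted_text.split())
    return [p for p in phrases if " ".join(p.split()) not in normalized]


def extract_raw(pdf_path, binary="pdftotext"):
    """Run `<binary> -raw <pdf_path> -` and return the text written to stdout."""
    result = subprocess.run(
        [binary, "-raw", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def tool_version(binary="pdftotext"):
    """Return the banner printed by `<binary> -v` (checks stderr first, then stdout)."""
    result = subprocess.run([binary, "-v"], capture_output=True, text=True)
    return (result.stderr or result.stdout).strip()


# Example usage (hypothetical file name; point `binary` at each build under test):
#   print(tool_version())
#   print(missing_phrases(extract_raw("example.pdf"), ["Equipment Group 202A"]))
```

Running this against each build and recording the `tool_version` banner next to the result makes it clear which binary produced which output, which seems to be the open question in this report.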