Created attachment 117703
PDF containing only the string "foobar"

The text extraction in poppler-cpp silently drops characters from the PDF.

To reproduce, compile and link the attached test case and run it on the attached PDF:

    ./txtextr foobar.pdf

The output should be

    foobar

as that is the only text in the PDF, but it is

    fooba
Created attachment 117704
Minimal test case
Created attachment 117705
Proposed patch

Here is a patch. Unfortunately, the conversion code is so hairy that I'm not sure the patch is correct. It does pass all of pdfgrep's tests, though. Please review it.
Created attachment 117706
More correct minimal test case

The minimal test case now handles the trailing null byte correctly, but the bug still persists.
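For readers without access to the attachment, here is a minimal sketch of what handling a trailing null byte could look like when extracted text arrives as a byte buffer. It is an assumption about the general technique, not the code from the attached test case: the helper name strip_trailing_nul is hypothetical, and only the C++ standard library is used.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not taken from the attached test case):
// convert a byte buffer to std::string, dropping a single trailing
// '\0' if one is present, so that a terminator is never mistaken
// for (or printed as) part of the text.
std::string strip_trailing_nul(const std::vector<char>& bytes) {
    std::size_t len = bytes.size();
    if (len > 0 && bytes[len - 1] == '\0') {
        --len;  // exclude the terminator from the resulting string
    }
    return std::string(bytes.begin(), bytes.begin() + len);
}
```

With a helper along these lines, a buffer holding "foobar" comes out as "foobar" whether or not a terminator follows it, so a missing final character in the output cannot be blamed on the test program's own terminator handling.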
Created attachment 117707
Proposed patch

Remove debug output from the patch and fix indentation.
Pushed!