Using pdftotext on the attached file results in Turkish characters (ı, ş, ğ and such) becoming garbled. Using the Splash API results in the same problem, so I guess it's an internal Poppler issue.
Created attachment 27949: Sample PDF file extracted from a longer file
Adobe can't extract the text correctly either, so I'm leaning towards the file being faulty.
How do you extract with Adobe, btw? The file may well be faulty; is there any way to debug what might be wrong with it? Thanks!
File -> Save as Text ;-) That was easy. Probably the font mapping/encoding is not set correctly.
Yeah, looks like they didn't use CP1254 but some other Latin variant. Interesting bug (on the PDF creator side) :-)
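For anyone hitting the same symptom, here is a minimal Python sketch of what this kind of code page mismatch looks like. It assumes the text bytes are CP1254 (Turkish) but get interpreted as Latin-1; that pairing is only an illustration, since the encoding actually used in the attached file was not identified in this thread. The helper names `garble` and `repair` are made up for the example.

```python
def garble(text: str) -> str:
    """Encode as CP1254 (Turkish), then misread the bytes as Latin-1.

    Assumption for illustration only: the real file may use a different
    Latin variant than Latin-1.
    """
    return text.encode("cp1254").decode("latin-1")


def repair(garbled: str) -> str:
    """Undo the mistake: recover the raw bytes, then decode as CP1254."""
    return garbled.encode("latin-1").decode("cp1254")


sample = "ışğ İŞĞ"
broken = garble(sample)
print(broken)          # ýþð ÝÞÐ
print(repair(broken))  # ışğ İŞĞ
```

If extracted text shows ý, þ, ð where ı, ş, ğ are expected, a round trip like `repair` may recover it, but only when the wrong code page is known and no bytes were lost along the way.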