Summary: | pdftotext text extraction failure | | |
---|---|---|---|
Product: | poppler | Reporter: | Rico <risanecek> |
Component: | general | Assignee: | poppler-bugs <poppler-bugs> |
Status: | RESOLVED NOTABUG | QA Contact: | |
Severity: | major | | |
Priority: | medium | | |
Version: | unspecified | | |
Hardware: | x86-64 (AMD64) | | |
OS: | Linux (All) | | |
Whiteboard: | | | |
Description
Rico
2009-05-17 03:24:38 UTC
PDF examples are under: http://92.43.104.34/pdf/

Not a bug: the PDF doesn't contain the correct font -> text mappings. Note that Adobe Reader can't extract the text either.

I see. My belief (perhaps naive, given my ignorance of PDF internals) was that if acroread/xpdf/... can render the document (deterministically, every time ;-)) and select text, the information must be in that document. So is there no chance to reconstruct that broken doc? It would be nice if pdftotext had some --tryrealhard switch. Anyway, thanks for your consideration, keep up the good work.

No chance, unless you go the OCR route.
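
For readers who land here with the same problem: a viewer can draw the glyphs from the embedded font without knowing which characters they stand for, so rendering works even when text extraction does not. The OCR route mentioned above could look roughly like the sketch below. It is not part of poppler; it assumes `pdftotext`, `pdftoppm` and `tesseract` are installed and on the PATH. The function names and the `looks_like_text` heuristic (with its 0.6 threshold) are hypothetical. The heuristic only catches output dominated by control characters or symbols; extraction that yields wrong but plausible-looking letters would need a stronger check, such as a dictionary lookup.

```python
import subprocess
import tempfile
from pathlib import Path


def looks_like_text(text: str, min_ratio: float = 0.6) -> bool:
    """Hypothetical heuristic: usable if most non-whitespace chars are alphanumeric."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return False
    good = sum(c.isalnum() for c in chars)
    return good / len(chars) >= min_ratio


def extract_text(pdf_path: str) -> str:
    """Run pdftotext; an output file name of "-" sends the text to stdout."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    )
    return result.stdout


def ocr_text(pdf_path: str, dpi: int = 300) -> str:
    """Rasterize each page with pdftoppm, then OCR the images with tesseract."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = str(Path(tmp) / "page")
        subprocess.run(
            ["pdftoppm", "-r", str(dpi), "-png", pdf_path, prefix], check=True
        )
        pages = []
        # pdftoppm zero-pads page numbers, so a lexical sort keeps page order.
        for image in sorted(Path(tmp).glob("page-*.png")):
            out = subprocess.run(
                ["tesseract", str(image), "stdout"],
                capture_output=True, encoding="utf-8", errors="replace", check=True,
            )
            pages.append(out.stdout)
        return "\n".join(pages)


def extract_with_fallback(pdf_path: str) -> str:
    """Try the embedded text first; fall back to OCR if it looks unusable."""
    text = extract_text(pdf_path)
    return text if looks_like_text(text) else ocr_text(pdf_path)
```

Calling `extract_with_fallback("broken.pdf")` (the file name is a placeholder) would return the embedded text when it looks usable and the OCR result otherwise. OCR quality depends on the scan resolution and the installed tesseract language data, so the result is an approximation of the original text, not a reconstruction of the missing mappings.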