Summary: | pdftotext breaks a sentence in the middle when the text overflows the box, whereas pdftohtml captures the full sentence. | ||
---|---|---|---|
Product: | poppler | Reporter: | Gaurav Arora <gauravarora.daiict> |
Component: | utils | Assignee: | poppler-bugs <poppler-bugs> |
Status: | RESOLVED MOVED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | ||
Version: | unspecified | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
Attachments: |
sample pdf which is facing this issue
Image showing how the text looks like in pdf |
Description
Gaurav Arora
2017-02-15 12:19:08 UTC
Created attachment 129624
Image showing how the text looks like in pdf
I think this is the correct behavior for pdftotext. You have a document which has weird formatting, and pdftotext tries to respect that formatting. The fact that pdftohtml doesn't do this is more likely a bug in pdftohtml.

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/252.
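Because pdftotext deliberately keeps the document's line breaks, a consumer that needs whole sentences has to rejoin them after extraction. pdftotext's `-raw` option (keep text in content stream order) may also be worth trying, although whether it helps with this particular PDF is untested. The Python below is a hypothetical post-processing heuristic, not part of poppler: it joins consecutive non-empty lines unless the previous line ends with sentence-final punctuation, and it treats blank lines as paragraph breaks. The function name `rejoin_broken_lines` and the punctuation set are assumptions for illustration.

```python
import re

# Assumed heuristic: a line "ends a sentence" if its last non-closing character
# is one of . ! ? : ; (optionally followed by closing quotes or brackets).
_SENTENCE_END = re.compile(r"""[.!?:;]["')\]]*$""")


def rejoin_broken_lines(text: str) -> str:
    """Join lines that were split mid-sentence in plain-text extraction output.

    Hypothetical helper, not a poppler API. Blank lines are kept as single
    paragraph breaks and text is never joined across them.
    """
    out: list[str] = []
    joinable = False  # True if the last output line may continue on the next line
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # Paragraph break: record at most one blank line, never join across it.
            if out and out[-1] != "":
                out.append("")
            joinable = False
            continue
        if joinable:
            out[-1] = out[-1] + " " + line
        else:
            out.append(line)
        # The next line continues this one unless this one ends a sentence.
        joinable = not _SENTENCE_END.search(out[-1])
    return "\n".join(out).strip("\n")


if __name__ == "__main__":
    sample = "This sentence was broken\nin the middle.\nNext one.\n\nNew paragraph\ncontinues here."
    print(rejoin_broken_lines(sample))
```

The heuristic will also merge lines that were intentionally separate, such as headings or list items without trailing punctuation, so it suits running prose better than tables or forms.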