Bug 28076 - pdftohtml: RTL text generated backwards
Summary: pdftohtml: RTL text generated backwards
Status: NEW
Alias: None
Product: poppler
Classification: Unclassified
Component: cairo backend
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: poppler-bugs
QA Contact:
Depends on:
Reported: 2010-05-12 05:04 UTC by Nezmer
Modified: 2015-11-19 05:18 UTC
Description Nezmer 2010-05-12 05:04:58 UTC
"pdftohtml" seems to generate RTL text backwards: a string that should come out as (abc) is generated as (cba). The generated text can still be read left to right, but that's not convenient ;)

"pdftotext" is behaving correctly.
Comment 1 kirillkh 2010-12-17 09:09:11 UTC
I'm seeing the same issue with poppler-utils 0.12.4 (Ubuntu 10.04.1).

Workaround for Hebrew: convert with "-enc ISO-8859-8". However, that discards all non-Hebrew Unicode characters (such as those used in math).

Simply reversing the Hebrew words in the output doesn't help, since the order of words in a sentence is also backwards.
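
To illustrate the point about word order, here is a minimal post-processing sketch, not part of poppler. It assumes the reversed output contains plain Hebrew letters in the U+0590..U+05FF block separated by single spaces, and it reverses each whole Hebrew run (letters and word order together) instead of each word. The names `HEBREW_RUN`, `reverse_words_only` and `visual_to_logical` are hypothetical helpers introduced for this example. Text with vowel points, digits or punctuation inside the run would need a real bidi implementation.

```python
import re

# A maximal run of Hebrew words separated by spaces (assumption: U+0590..U+05FF only).
HEBREW_RUN = re.compile(r"[\u0590-\u05FF]+(?: +[\u0590-\u05FF]+)*")


def reverse_words_only(line: str) -> str:
    """Reverse the letters of each word but keep the word order.

    This is the approach described above as insufficient: the letters
    come out right, but the words stay in backwards order.
    """
    return " ".join(word[::-1] for word in line.split(" "))


def visual_to_logical(line: str) -> str:
    """Reverse every Hebrew run as a whole, fixing letters and word order.

    Non-Hebrew text (Latin letters, digits, math symbols) is left untouched.
    """
    return HEBREW_RUN.sub(lambda m: m.group(0)[::-1], line)


if __name__ == "__main__":
    # "shalom olam" in logical order, written with escapes to avoid display issues.
    logical = "\u05e9\u05dc\u05d5\u05dd \u05e2\u05d5\u05dc\u05dd"
    backwards = logical[::-1]  # stand-in for the reversed text in the HTML output

    print(reverse_words_only(backwards) == logical)  # words are still swapped
    print(visual_to_logical(backwards) == logical)   # whole run restored
```

Unlike the "-enc ISO-8859-8" workaround, this keeps non-Hebrew Unicode characters in place, but it is only a stopgap for simple lines.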
