Created attachment 54089 [details] The PDF file with Arabic text content. The attached file is an example of a PDF file with Arabic text content that pdftotext and pdftohtml are not able to convert into text at all. The output contains only parentheses and some integers. The following is a partial copy of the produced text file: (" " ) : / : : ( / ) ( ) 1: (" " ) : / : : ( / ) ( ) 1: (" "
Not critical
Not a pdftohtml-only bug.
Are you sure the file is not simply broken? Does this file open correctly in any PDF viewer? Adobe Reader 9.4.6 on Linux is not able to render it correctly either.
I'm sure that both Adobe Reader and the Document Viewer of Ubuntu 11.10 are able to open and read this file correctly. If you are able to download the attached file, you will see this.
I am using evince (which I guess is what you mean by "Document Viewer") in Ubuntu 11.10 and it does not work.
(In reply to comment #5) > I am using evince (with i guess is what you mean with "Document Viewer") in > Ubuntu 11.10 and it does not work. I installed some MS fonts some time ago; I think it was through ttf-mscorefonts-installer. That would explain why evince is able to open it here. By the way, my system is able to write Arabic, i.e. I have an Arabic keyboard layout.
Created attachment 54200 [details] Screenshot of the Acrobat Reader plugin rendering of the file. This PNG file is a screenshot of the attached PDF file as rendered by the Acrobat Reader plugin in the Google Chrome browser.
I would not consider this a bug in poppler, pdftohtml, or pdftotext. The document uses glyph IDs instead of a real character encoding and does not embed fonts. Since glyph IDs are only meaningful for one particular font, this means that this document can only be viewed correctly if you have the correct font installed (Microsoft's Arial font, in this case). And since it doesn't use a real character encoding, poppler can't get the text out of the document and pdftohtml and pdftotext will not work. Note that Adobe Reader and other PDF viewers can't get the text either.
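To illustrate the point about glyph IDs, here is a minimal Python sketch. It is not how poppler works internally, and the two glyph tables (`FONT_A`, `FONT_B`), the glyph numbers, and the `decode` helper are all made up for the example. It only shows that a sequence of glyph IDs has no fixed meaning: the result depends entirely on which font's glyph table is used, and without any table there is nothing to map to text.

```python
# Hypothetical glyph tables (glyph ID -> Unicode character).
# Real fonts assign glyph IDs in their own order; these numbers are invented.
FONT_A = {
    10: "\u0633",  # ARABIC LETTER SEEN
    11: "\u0644",  # ARABIC LETTER LAM
    12: "\u0627",  # ARABIC LETTER ALEF
    13: "\u0645",  # ARABIC LETTER MEEM
}
FONT_B = {10: ":", 11: "/", 12: ")", 13: "1"}

# What the document stores in this simplified model: glyph IDs only,
# with no character encoding and no embedded font.
CONTENT_GLYPH_IDS = [10, 11, 12, 13]


def decode(glyph_ids, glyph_table=None):
    """Map glyph IDs to text using a font's glyph table.

    Returns None when no table is available, because glyph IDs alone
    carry no character information.
    """
    if glyph_table is None:
        return None
    # Unknown glyph IDs become U+FFFD (the replacement character).
    return "".join(glyph_table.get(gid, "\ufffd") for gid in glyph_ids)


if __name__ == "__main__":
    print(decode(CONTENT_GLYPH_IDS, FONT_A))  # the intended Arabic word
    print(decode(CONTENT_GLYPH_IDS, FONT_B))  # unrelated punctuation and digits
    print(decode(CONTENT_GLYPH_IDS))          # None: no font, no text
```

With the "right" table the IDs give the intended word; with any other table they give unrelated characters, which loosely resembles the parentheses and integers in the reported output.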