Bug 2981 - RTL select, copy/paste and search support for Arabic and Hebrew scripts are missing
Status: RESOLVED MOVED
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: x86 (IA32) Linux (All)
Importance: medium major
Assignee: poppler-bugs
QA Contact:
URL:
Whiteboard:
Keywords: i18n
Depends on: 55977
Blocks: Persian
Reported: 2005-04-11 12:43 UTC by Bryan Clark
Modified: 2018-08-21 10:35 UTC
CC List: 15 users


Attachments
find bidirectional text (3.98 KB, patch)
2012-10-21 07:03 UTC, alex

Description Bryan Clark 2005-04-11 12:43:07 UTC
I know it's very complicated, but many times we need to search for just a word
in a pdf/ps and we can't.

As far as I know, no viewer supports RTL scripts yet.


------- From Behdad Esfahbod 2005-03-15 10:31 -------

Maybe a first step is to simply support searching/copy/paste of Unicode strings.
That should be quite possible given the encoding vector of PS (and PDF) fonts.  SVG
should have no problem, I guess.
Comment 1 Bryan Clark 2005-04-26 20:52:59 UTC
= Additional Comments from http://bugzilla.gnome.org/show_bug.cgi?id=300536 =

Reporter: johnny5@i12.com (Roee)

Please describe the problem:
When using the search function in a Hebrew document, the search text has to be
typed backwards in order to find anything in the document.

Steps to reproduce:
1. Open a document containing Hebrew text.
2. Try to search for a word

Actual results:
The only possible way to search is to type the text backwards

Expected results:
The search should find words when they are typed in their normal order

Does this happen every time?
yes

attachment: http://bugzilla.gnome.org/attachment.cgi?id=45225&action=view
Comment 2 Yaron Tausky 2005-08-16 06:28:47 UTC
GNOME bug http://bugzilla.gnome.org/show_bug.cgi?id=313230 is a duplicate of
this one.

I just wish to add that reversing the word when a RTL script is entered is not
enough! Poppler should implement the Unicode BiDi algorithm to support search
strings which contain both LTR and RTL scripts. There is an implementation named
fribidi you can use.
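
A minimal sketch of why plain reversal falls short, using only Python's standard `unicodedata` module. This is not poppler or fribidi code, and the helper name `strong_directions` is hypothetical. It reports which strong directions a search string contains; when both are present, reversing the whole string also reverses the LTR part.

```python
import unicodedata


def strong_directions(text: str) -> set:
    """Return the strong bidi directions ("LTR", "RTL") present in text."""
    found = set()
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            found.add("LTR")
        elif bidi_class in ("R", "AL"):  # Hebrew letters are "R", Arabic letters are "AL"
            found.add("RTL")
    return found


# A Latin word followed by a Hebrew word (written with escapes to avoid display issues).
query = "poppler \u05e9\u05dc\u05d5\u05dd"
print(sorted(strong_directions(query)))

# Naive whole-string reversal flips the Latin word as well.
print(query[::-1].endswith("relppop"))
```

A search string that mixes both directions cannot be fixed by one global reversal, which is the case the Unicode BiDi algorithm is designed to handle.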
Comment 3 Benjamin Geer 2008-05-06 05:17:48 UTC
> As far as I know, no viewer supports RTL scripts yet.

Adobe Acrobat Reader supports searching and copying Arabic text perfectly.

> There is an implementation named fribidi you can use.

FriBidi works very well, and it's a Freedesktop project:

http://fribidi.freedesktop.org/wiki/

Why not use it in Poppler?
Comment 4 Behdad Esfahbod 2008-05-08 08:25:11 UTC
Note that FriBidi converts from logical (input text) order to visual (glyph) order.  The problem in poppler is reverse-bidi.  That is, going back from the visual order as found in the PDF to the logical text order.  Poppler does an ok job at that.  It sure can be improved, but fribidi is no magic bullet here.  I wrote about this a bit here:

  http://lists.cairographics.org/archives/cairo/2007-September/011427.html

search for reverse-bidi.

An alternative would be to make poppler use fribidi to find the visual order for the search text, then match that against the visual order of the extracted text.  But that goes against the current code and does not yield much immediate benefit.

The only thing that I can think of that can improve poppler's behavior by using fribidi is mirroring characters like brackets when found around RTL text.  That's all for now.
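
To illustrate the bracket-mirroring point, here is a minimal sketch, assuming a single pure-RTL run stored in visual (left-to-right glyph) order. It is not poppler's text extraction and does not use fribidi; the function name and the tiny `MIRROR` table are hypothetical (a complete table would come from the Unicode mirroring data). Only `unicodedata.mirrored` is a real standard-library call.

```python
import unicodedata

# Hypothetical, deliberately small mirror table for illustration only.
MIRROR = {"(": ")", ")": "(", "[": "]", "]": "[", "{": "}", "}": "{", "<": ">", ">": "<"}


def visual_rtl_run_to_logical(visual: str) -> str:
    """Reverse one visually ordered RTL run and swap mirrored characters."""
    out = []
    for ch in reversed(visual):
        # unicodedata.mirrored() is nonzero for characters such as brackets
        # that are displayed mirrored in right-to-left text.
        if unicodedata.mirrored(ch):
            ch = MIRROR.get(ch, ch)
        out.append(ch)
    return "".join(out)


# Glyph order as it might appear on the page for a bracketed Hebrew word.
visual = "(\u05dd\u05d5\u05dc\u05e9)"
logical = visual_rtl_run_to_logical(visual)
print(logical == "(\u05e9\u05dc\u05d5\u05dd)")
```

Without the mirroring step, plain reversal would leave the brackets facing the wrong way in the logical text, so a search for the bracketed word as typed would not match.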
Comment 5 Benjamin Geer 2008-05-08 14:37:30 UTC
> The problem in poppler is reverse-bidi.  That is, going back from
> the visual order as found in the PDF to the logical text order.
> Poppler does an ok job at that.

If I want to search for the string
لقد
in a PDF using evince, I have to type it backwards in the search field:
دقل

This is completely broken behaviour.  How can it be considered doing "an OK job"?
Comment 6 Behdad Esfahbod 2008-05-09 01:04:17 UTC
Ah, I thought it had changed in the meantime.

Poppler hackers, how does the search work?  I thought the search word is matched against the text extracted using the text device?  If it's done that way it should work fairly ok.
Comment 7 Usama Akkad 2010-05-19 03:28:43 UTC
Is there any plan for a fix? It's really a big problem for RTL language users with Evince.

Please help with that. Someone is offering a solution here, I think:
http://www.mail-archive.com/evince-list@gnome.org/msg01819.html
Comment 8 alex 2012-10-15 07:21:25 UTC
hello friends,

i've just posted a patch to implement visual-to-logical text conversion, which might become a step towards solving this problem.

see https://bugs.freedesktop.org/show_bug.cgi?id=55977

best regards,
alex
Comment 9 alex 2012-10-21 07:03:51 UTC
Created attachment 68861
find bidirectional text


a small workaround for searching rtl text. it is limited for mixed-direction text.

say the logical text ABC 123 (capitals standing for rtl letters) is rendered by fribidi as 123 CBA.
before this patch, to search for this text in poppler you'd need to type literally 123 CBA.
with this patch, you search for ABC 123 as entered. nice.
but if you only search for ABC 12, nothing will be found. that's because this patch transforms the searched text from logical to visual order before the actual search in the visual text inside poppler, so ABC 12 is rendered as 12 CBA, which is not there.
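
the limitation can be reproduced with a toy model. this is a sketch under loud assumptions, not the patch and not fribidi: capitals stand for rtl letters, the paragraph direction is rtl, and digit runs keep their left-to-right order. the function names are hypothetical.

```python
import re


def toy_logical_to_visual(text: str) -> str:
    """Toy reordering: reverse the whole RTL line, then restore digit runs."""
    reversed_line = text[::-1]
    # Numbers still read left to right inside RTL text, so undo their reversal.
    return re.sub(r"\d+", lambda m: m.group(0)[::-1], reversed_line)


def toy_find(page_visual: str, query_logical: str) -> bool:
    """Convert the query to visual order, then do a plain substring match."""
    return toy_logical_to_visual(query_logical) in page_visual


page_visual = toy_logical_to_visual("ABC 123")
print(page_visual)                       # visual text of the page in this toy model
print(toy_find(page_visual, "ABC 123"))  # full phrase as entered
print(toy_find(page_visual, "ABC 12"))   # logical prefix of the phrase
```

in this model the query ABC 12 becomes 12 CBA, which is not a substring of 123 CBA, even though ABC 12 is a prefix of the logical text. matching in logical order avoids that mismatch.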

there's a better way to go, which i'll implement later. this would also help with bidi text select and copy.

this patch will only work if you first apply my last patch to bug 55977. you also need fribidi or preferably icu. 

please enjoy.
alex
Comment 10 Albert Astals Cid 2012-11-19 21:30:16 UTC
Adding the dependency for the current patch. The bug itself is not dependent on it, but the current solution by alex is.
Comment 11 alex 2012-11-28 08:32:40 UTC
the last fix for #55977 will be enough to fix this bug too.
again, it's a partial solution for mixed direction text.
Comment 12 Uri Shabtay 2014-11-30 11:05:47 UTC
seriously, people. we're in 2014. 

open this .pdf file in your browser - such as Firefox or Chrome - 

https://launchpadlibrarian.net/15351076/PDF%20for%20Bug%20240398.pdf

and behold - the search function works flawlessly.

why can't these be patched to Evince/Poppler?
Comment 13 Albert Astals Cid 2014-11-30 20:39:10 UTC
Because there's no patch for it. alex has proven he can't provide a valid patch.
Comment 14 Khaled Hosny 2015-11-23 10:06:24 UTC
I attached two patches to bug 55977 that should handle searching RTL text to a reasonable level. I wonder whether this bug is a more appropriate place for those two patches?
Comment 15 GitLab Migration User 2018-08-21 10:35:30 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/274.