Bug 55977 - handling of rtl text inversion is too naive
Summary: handling of rtl text inversion is too naive
Status: RESOLVED FIXED
Alias: None
Product: poppler
Classification: Unclassified
Component: general (show other bugs)
Version: unspecified
Hardware: All All
: medium normal
Assignee: poppler-bugs
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 2981
  Show dependency treegraph
 
Reported: 2012-10-14 18:18 UTC by alex
Modified: 2016-01-11 20:57 UTC (History)
7 users (show)

See Also:
i915 platform:
i915 features:


Attachments
the patch itself, made on poppler 0.20.3. please give it a try. (11.72 KB, text/plain)
2012-10-14 18:18 UTC, alex
Details
a small test file to illustrate the bug. solved in the patch (10.46 KB, application/pdf)
2012-10-14 18:22 UTC, alex
Details
the patch, slightly upgraded (18.41 KB, text/plain)
2012-10-16 22:47 UTC, alex
Details
a small upgrade, again (19.18 KB, patch)
2012-10-21 06:42 UTC, alex
Details | Splinter Review
PDF test case for RTL wrong handling (69.51 KB, application/octet-stream)
2012-11-04 22:54 UTC, Germán Poo-Caamaño
Details
reorder pdf text for rtl support. works with getText and partially with findText (17.55 KB, patch)
2012-11-28 08:27 UTC, alex
Details | Splinter Review
allow dumping and finding of rtl text in logical order, using icu only (12.50 KB, patch)
2013-01-28 11:29 UTC, alex
Details | Splinter Review
logical rtl text support through icu (50.44 KB, patch)
2013-07-15 08:13 UTC, alex
Details | Splinter Review
logical rtl text support through icu. Makefile.in's removed (37.82 KB, patch)
2013-11-06 18:01 UTC, alex
Details | Splinter Review
Alex patch rebased (31.90 KB, patch)
2013-11-08 04:11 UTC, Jose Aliste
Details | Splinter Review
test for rtl with numbers (10.46 KB, application/pdf)
2014-01-03 22:59 UTC, alex
Details
Patch to use ICU for RTL text reordering (36.43 KB, patch)
2015-07-14 15:06 UTC, Adam Reichold
Details | Splinter Review
Patch to use ICU for RTL text reordering (38.33 KB, patch)
2015-07-14 20:25 UTC, Adam Reichold
Details | Splinter Review
Patch to use ICU for RTL text reordering (38.25 KB, patch)
2015-07-15 17:30 UTC, Adam Reichold
Details | Splinter Review
Patch to use ICU for RTL text reordering (38.30 KB, patch)
2015-07-15 19:58 UTC, Adam Reichold
Details | Splinter Review
Handle right-to-left text in search (8.20 KB, patch)
2015-11-18 14:27 UTC, Khaled Hosny
Details | Splinter Review
Handle right-to-left text in search (7.18 KB, patch)
2015-11-18 15:01 UTC, Khaled Hosny
Details | Splinter Review
Fix finding Arabic Presentation Forms ligatures (3.90 KB, patch)
2015-11-23 10:01 UTC, Khaled Hosny
Details | Splinter Review

Description alex 2012-10-14 18:18:51 UTC
Created attachment 68555 [details]
the patch itself, made on poppler 0.20.3. please give it a try.

textpage::dumpfragment makes an attempt to reorder text in it's original directionality. i mean, reorder visual to logical order.

however, this is generally a very heavy issue to implement, handled (to my knowledge of foss) only in icu. 
fribidi does only reorder logical text to visual, which may also help here, but it's not the real thing.

here is a patch to implement the icu algorithm, with a fribidi fallback, in poppler.

i did it since the current attempt is not quite correct. a test file to test with pdftotext will be attached too.

please feel free to comment an enjoy.

best regards,
alex
Comment 1 alex 2012-10-14 18:22:00 UTC
Created attachment 68556 [details]
a small test file to illustrate the bug. solved in the patch
Comment 2 Albert Astals Cid 2012-10-14 20:46:47 UTC
Hi, thanks for the contribution!

Some weak points of the patch (have not really looked at the contents yet, just at the form):
 * You are missing support for the cmake build system
 * You are exposing the #if HAVE_ICU in TextOutputDev.h (for a variable i never see you assign or change)
 * If i have icu and fribidi it will say that both are used, when that's not true (please switch it to something like the cms check that says "with lcms" or "with lcms2" depending on the library used)
Comment 3 alex 2012-10-15 07:17:48 UTC
(In reply to comment #2)
> Hi, thanks for the contribution!

hello, thanks a lot for considering my patch. and more for your work on this great project.
> 
> Some weak points of the patch (have not really looked at the contents yet,
> just at the form):
>  * You are missing support for the cmake build system
i don't know cmake, but will happily add support for it. how and where?
>  * You are exposing the #if HAVE_ICU in TextOutputDev.h (for a variable i
> never see you assign or change)
the reorderingMode enum has values taken from an icu header. indeed the enum declaration itself should depend on have_icu. will fix.
>  * If i have icu and fribidi it will say that both are used, when that's not
> true (please switch it to something like the cms check that says "with lcms"
> or "with lcms2" depending on the library used)
indeed, this might need some thought in configure.ac. like trying fribidi only if icu is missing. this is the behaviour in the TextOutputDev anyway. what would you think?

best regards,
alex
Comment 4 Albert Astals Cid 2012-10-16 21:45:10 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > Hi, thanks for the contribution!
> 
> hello, thanks a lot for considering my patch. and more for your work on this
> great project.
> > 
> > Some weak points of the patch (have not really looked at the contents yet,
> > just at the form):
> >  * You are missing support for the cmake build system
> i don't know cmake, but will happily add support for it. how and where?

Have a look at CMakeLists.txt and the various find modules at cmake/

> >  * You are exposing the #if HAVE_ICU in TextOutputDev.h (for a variable i
> > never see you assign or change)
> the reorderingMode enum has values taken from an icu header. indeed the enum
> declaration itself should depend on have_icu. will fix.

But you don't use the enum anywhere, do you? I mean i only see the same value being passed everywhere

> >  * If i have icu and fribidi it will say that both are used, when that's not
> > true (please switch it to something like the cms check that says "with lcms"
> > or "with lcms2" depending on the library used)
> indeed, this might need some thought in configure.ac. like trying fribidi
> only if icu is missing. this is the behaviour in the TextOutputDev anyway.
> what would you think?

Yes, makes sense
Comment 5 alex 2012-10-16 22:41:38 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > (In reply to comment #2)
> > > Hi, thanks for the contribution!
> > 
> > hello, thanks a lot for considering my patch. and more for your work on this
> > great project.
> > > 
> > > Some weak points of the patch (have not really looked at the contents yet,
> > > just at the form):
> > >  * You are missing support for the cmake build system
> > i don't know cmake, but will happily add support for it. how and where?
> 
> Have a look at CMakeLists.txt and the various find modules at cmake/
> 
> > >  * You are exposing the #if HAVE_ICU in TextOutputDev.h (for a variable i
> > > never see you assign or change)
> > the reorderingMode enum has values taken from an icu header. indeed the enum
> > declaration itself should depend on have_icu. will fix.
> 
> But you don't use the enum anywhere, do you? I mean i only see the same
> value being passed everywhere
> 
> > >  * If i have icu and fribidi it will say that both are used, when that's not
> > > true (please switch it to something like the cms check that says "with lcms"
> > > or "with lcms2" depending on the library used)
> > indeed, this might need some thought in configure.ac. like trying fribidi
> > only if icu is missing. this is the behaviour in the TextOutputDev anyway.
> > what would you think?
> 
> Yes, makes sense

(In reply to comment #4)
> (In reply to comment #3)
> > (In reply to comment #2)
> > > Hi, thanks for the contribution!
> > 
> > hello, thanks a lot for considering my patch. and more for your work on this
> > great project.
> > > 
> > > Some weak points of the patch (have not really looked at the contents yet,
> > > just at the form):
> > >  * You are missing support for the cmake build system
> > i don't know cmake, but will happily add support for it. how and where?
> 
> Have a look at CMakeLists.txt and the various find modules at cmake/
i've looked there and managed to find modules for icu and fribidi.
but cmakelists.txt needs more expertise, so i've only gave it a dry try. please help here.
> 
> > >  * You are exposing the #if HAVE_ICU in TextOutputDev.h (for a variable i
> > > never see you assign or change)
> > the reorderingMode enum has values taken from an icu header. indeed the enum
> > declaration itself should depend on have_icu. will fix.
> 
> But you don't use the enum anywhere, do you? I mean i only see the same
> value being passed everywhere
oh, it's being used as a new optional arg for some methods. my new patch is better in this respect too.
> 
> > >  * If i have icu and fribidi it will say that both are used, when that's not
> > > true (please switch it to something like the cms check that says "with lcms"
> > > or "with lcms2" depending on the library used)
> > indeed, this might need some thought in configure.ac. like trying fribidi
> > only if icu is missing. this is the behaviour in the TextOutputDev anyway.
> > what would you think?
> 
> Yes, makes sense
did it too in this new patch.
Comment 6 alex 2012-10-16 22:47:40 UTC
Created attachment 68652 [details]
the patch, slightly upgraded


added an option to suppress bidirectional text processing.
added cmake modules for icu and fribidi, but didn't know how to update cmakelist.txt
please hear your opinion.
Comment 7 alex 2012-10-21 06:39:50 UTC
again, a small upgrade to the patch
Comment 8 alex 2012-10-21 06:42:52 UTC
Created attachment 68860 [details] [review]
a small upgrade, again

a small upgrade, again
Comment 9 Germán Poo-Caamaño 2012-11-04 22:54:27 UTC
Created attachment 69530 [details]
PDF test case for RTL wrong handling

This PDF test case was reported originally in launchpad
https://bugs.launchpad.net/ubuntu/+source/evince/+bug/240398

It is not possible to search text in RTL.  The only way to do it,
it is by writing in reverse order.  For instance, I want to search
for "word" I have to write "drow", still the highlighted text is 
wrong.
Comment 10 Albert Astals Cid 2012-11-14 13:02:06 UTC
I still don't understand why you need the ReorderingMode enum.

Can you explain what is its use case?
Comment 11 Albert Astals Cid 2012-11-19 21:09:36 UTC
alex, the feature freeze for poppler 0.22 is in 3 days (Nov 22) so it'd be cool if you could answer what we need ReorderingMode for so we can try to squeeze it in. Otherwise it'll have to wait for poppler 0.24
Comment 12 alex 2012-11-27 11:20:00 UTC
(In reply to comment #10)
> I still don't understand why you need the ReorderingMode enum.
> 
> Can you explain what is its use case?

sorry, i wasn't cc and have more than 80 open firefox tabs.
the reordering mode is an arg to icu as to nuances of the visual to logical conversion.
better stisfy your curiosity here ;). http://icu-project.org/apiref/icu4j/com/ibm/icu/text/Bidi.html#setReorderingMode%28int%29

ReorderingNotNeeded will skip this functionality alltogether.
Comment 13 alex 2012-11-27 11:26:14 UTC
(In reply to comment #9)
> Created attachment 69530 [details]
> PDF test case for RTL wrong handling
> 
> This PDF test case was reported originally in launchpad
> https://bugs.launchpad.net/ubuntu/+source/evince/+bug/240398
> 
> It is not possible to search text in RTL.  The only way to do it,
> it is by writing in reverse order.  For instance, I want to search
> for "word" I have to write "drow", still the highlighted text is 
> wrong.

i could search in both english and hebrew with evince, while libpoppler with #2981 was installed.

best regards,
alex
Comment 14 Albert Astals Cid 2012-11-27 11:33:58 UTC
You're still not convincing me as why we *need* an enum there.

Why would a user of this functions pass different values of the enum? In which situations? Are you expecting the code that uses the function to call it with different values of the enum? Why? When?
Comment 15 alex 2012-11-27 14:28:49 UTC
(In reply to comment #14)
> You're still not convincing me as why we *need* an enum there.
> 
> Why would a user of this functions pass different values of the enum? In
> which situations? Are you expecting the code that uses the function to call
> it with different values of the enum? Why? When?

i'll sincerely tell you that this might not be the best place to put this option.

maybe the reordering method should better be an attribute of the document, to be given or autodetected at document load, by analyzing info on the program that created it. this should need a table with various known programs etc.

however this pair of patches fix (almostly) 2 main aspects of rtl support shortcomings, it'll whet my taste to later inquire poppler more deeply in order to give a more complete solution, that would include rtl text selection too. 
that would work higher in the document load process and (probably) move the reordering method to the poppler doc creation, as an optional parameter, as i said.

best regards,
alex
Comment 16 Albert Astals Cid 2012-11-27 14:31:24 UTC
Sorry but no, our API is crap enough already to add stuff that you know you will remove later. If you don't need this enum, don't add it.

And if you need it tell me why, i still don't see this enum used anywhere.
Comment 17 alex 2012-11-27 16:40:13 UTC
(In reply to comment #16)
> Sorry but no, our API is crap enough already to add stuff that you know you
> will remove later. If you don't need this enum, don't add it.
> 
> And if you need it tell me why, i still don't see this enum used anywhere.

the value of the choice is being used in the icu invocation.
np to remove it at the level of findtext and dumpfragment. 
but i'll only ask for some guidance on how could i access the pdf doc in dumpfragment and findtext, so that i could put the method choice at the level of the doc loading.
again, this is a matter of nuance as to the flavour of rtl visualization methods.
later, a default choice could be intelligently guessed based on various doc meta data.
would this be ok?
Comment 18 alex 2012-11-28 08:22:03 UTC
(In reply to comment #16)
> Sorry but no, our API is crap enough already to add stuff that you know you
> will remove later. If you don't need this enum, don't add it.
> 
> And if you need it tell me why, i still don't see this enum used anywhere.

i'm moving the enum into the TextPage object, to avoid polution and lots of extra parameters.
the reordering mode parameter has a default value.
the patch will shrink greatly, and includes my fix for #2981 too.
it adds to the api nothing i'd remove in the future. only things i'd further build upon: the reordering_mode option to the TextPage constructor.
yet a complete solution would require moving the reordering process to the core of text loading.
here's the new patch.
Comment 19 alex 2012-11-28 08:27:48 UTC
Created attachment 70723 [details] [review]
reorder pdf text for rtl support. works with getText and partially with findText

albert you were right about too much extra api pollution.
hope this will be small enough.
Comment 20 Albert Astals Cid 2012-12-01 00:41:23 UTC
Do we really need to support both icu and freebidi? Looks like a maintaince nightmare.

What's the license of FindFriBiDi.cmake? Can you clear it up from wesnoth stuff?

Did you write FindICU.cmake yourself?
Comment 21 alex 2012-12-03 11:09:09 UTC
(In reply to comment #20)
> Do we really need to support both icu and freebidi? Looks like a maintaince
> nightmare.
> 
> What's the license of FindFriBiDi.cmake? Can you clear it up from wesnoth
> stuff?
> 
> Did you write FindICU.cmake yourself?

i have to apologize here. i have no idea of cmake stuff.

just searched the net to provide the files, but don't know how to use them.
here is the license for wesnoth(GPL):
http://wiki.wesnoth.org/Wesnoth:Copyrights

for visual to logical approximation, icu is far more suitable than fribidi, but fribidi is smaller and did good work the far i could test.
however, fribidi is meant for logical to visual reordering, while we mainly need the opposite here.
if icu would lack on some build system, i'd definitely settle for fribidi, even for visual to logical approximation.

alex
Comment 22 alex 2012-12-09 14:56:38 UTC
well, i see not many of us are interested in this issue.

it shouldn't hurt your great project.

i'll take the burden of patching my packages` copies for myself.

thanks anyway,
alex
Comment 23 Albert Astals Cid 2012-12-09 23:19:31 UTC
Why do you say so? I've spent hours trying to understand your patches, saying that we are not interested in getting this upstream is simply a plain lie.

We just need to get a proper understanding of what you are doing, a simple code drop is not enough, code needs to be good enough so that in 5 years when not you nor me are around someone can understand why this was done when looking at the git log.

So now i am asking, are you interested in getting this in poppler? Because we are.
Comment 24 alex 2013-01-10 16:14:46 UTC
hello albert,

i see i was wrong. shame on me.

> Why do you say so? I've spent hours trying to understand your patches,
> saying that we are not interested in getting this upstream is simply a plain
> lie.
i did'n blame you for failing to cooperate. on the contrary, it was my failure with the cmake and answering on the wrong place.
just my frustration of the failure to insert something i found useful.
> 
> We just need to get a proper understanding of what you are doing, a simple
> code drop is not enough, code needs to be good enough so that in 5 years
> when not you nor me are around someone can understand why this was done when
> looking at the git log.
of course it's important to control the source. no problem.
> 
> So now i am asking, are you interested in getting this in poppler? Because
> we are.
of course i'm interested. that would extend the user base of poppler, which is a great thing already.

feel free to ask me anything you wish. this page will remain open for me to poll your questions.

best regards and apologies,
alex
Comment 25 Albert Astals Cid 2013-01-22 19:39:20 UTC
Ok, let's go back here. I think we (that is you) should decide if ICU or Fribidi. There is no need to support two libraries that do the same, just decide on "the best" and let's go from there.

What do you think is the best of those two?
Comment 26 alex 2013-01-24 08:11:00 UTC
hey albert,

sorry to have been pushy. it just turned out to be harder to lobby my patches than to even make them.
but i promise to be more patient :).

(In reply to comment #25)
> Ok, let's go back here. I think we (that is you) should decide if ICU or
> Fribidi. There is no need to support two libraries that do the same, just
> decide on "the best" and let's go from there.
> 
> What do you think is the best of those two?

categorically, icu does provide the needed functionality. 
fribidi is just smaller, but not really meant for visual to logical text reordering.
so should i remove the fribidi part of my patches?
Comment 27 Albert Astals Cid 2013-01-24 18:02:43 UTC
If icu is better, let's go with icu. Yes please kill the fribidi parts.
Comment 28 alex 2013-01-25 10:20:18 UTC
(In reply to comment #27)
> If icu is better, let's go with icu. Yes please kill the fribidi parts.

on my way, np :)
Comment 29 alex 2013-01-28 11:29:36 UTC
Created attachment 73767 [details] [review]
allow dumping and finding of rtl text in logical order, using icu only


hey albert,

this patch does the same as my previous, but with icu only. fribidi has been removed.

cmake related stuff is there, but i have no idea on how to use it. my env is debian linux with gnome.

as for finding (a) better place(s) to store reordering_mode, i'd like to consult you.

thanks for considering my patch,
alex
Comment 30 Albert Astals Cid 2013-02-10 23:03:59 UTC
(In reply to comment #29)
> Created attachment 73767 [details] [review] [review]
> allow dumping and finding of rtl text in logical order, using icu only
> 
> 
> hey albert,
> 
> this patch does the same as my previous, but with icu only. fribidi has been
> removed.

You are removing lots of code, e.g. the "// Note: This code treats numeric characters (European and", why? Can't we use that when ICU is not there?

> cmake related stuff is there, but i have no idea on how to use it. my env is
> debian linux with gnome.

And what's the problem? cmake works perfectly in debian linux with gnome :-)

> as for finding (a) better place(s) to store reordering_mode, i'd like to
> consult you.

Sure, what's the question?

> 
> thanks for considering my patch,
> alex
Comment 31 alex 2013-02-14 12:01:35 UTC
(In reply to comment #30)
> (In reply to comment #29)
> You are removing lots of code, e.g. the "// Note: This code treats numeric
> characters (European and", why? Can't we use that when ICU is not there?

it's not working properly, at all. rather try fribidi.
> 
> > cmake related stuff is there, but i have no idea on how to use it. my env is
> > debian linux with gnome.
> 
> And what's the problem? cmake works perfectly in debian linux with gnome :-)
debian is building using gnu make and friends, hence i've never encountered cmake before. no other problem on my part. :-)
> 
> > as for finding (a) better place(s) to store reordering_mode, i'd like to
> > consult you.
> 
> Sure, what's the question?
> 
the reordering mode helps taking us back to the logical (typing order) text that would be visualized in the order shown by the doc. 
on windows systems, ReorderingNumbersSpecial would normally prevail, while on pure unicode systems: ReorderingLikeDirect.
this assumption should guide the text conversion, according to the client that is requesting it: either by the very system, in case of a stand alone or local process application, or by the remote client's mode in case of a server performing the conversion.

in our case, i'd define a reordering_mode at the TextOutputDev object level, with the local system as default, that would be overrideable in the getText and findText methods calls, especially when called from different systems.
should the reordering mode be selectable in pdftotext too?

should i then move the reordering mode enum to the TextOutputDev object?

what would you think?
alex
Comment 32 Albert Astals Cid 2013-02-18 23:43:42 UTC
(In reply to comment #31)
> (In reply to comment #30)
> > (In reply to comment #29)
> > You are removing lots of code, e.g. the "// Note: This code treats numeric
> > characters (European and", why? Can't we use that when ICU is not there?
> 
> it's not working properly, at all. rather try fribidi.

Trying fribidi is not the correct answer, we have already settled that ICU is better. What I am asking if we should have that code as better than nothing when the dependencies are not there. So in a world where i can't have ICU, does having that code you removed gives better results than no code at all?

> > 
> > > cmake related stuff is there, but i have no idea on how to use it. my env is
> > > debian linux with gnome.
> > 
> > And what's the problem? cmake works perfectly in debian linux with gnome :-)
> debian is building using gnu make and friends, hence i've never encountered
> cmake before. no other problem on my part. :-)

Ok, then i don't understand your complain/comment

> > 
> > > as for finding (a) better place(s) to store reordering_mode, i'd like to
> > > consult you.
> > 
> > Sure, what's the question?
> > 
> the reordering mode helps taking us back to the logical (typing order) text
> that would be visualized in the order shown by the doc. 
> on windows systems, ReorderingNumbersSpecial would normally prevail, while
> on pure unicode systems: ReorderingLikeDirect.
> this assumption should guide the text conversion, according to the client
> that is requesting it: either by the very system, in case of a stand alone
> or local process application, or by the remote client's mode in case of a
> server performing the conversion.
> 
> in our case, i'd define a reordering_mode at the TextOutputDev object level,
> with the local system as default, that would be overrideable in the getText
> and findText methods calls, especially when called from different systems.
> should the reordering mode be selectable in pdftotext too?
> 
> should i then move the reordering mode enum to the TextOutputDev object?

I still don't see why we really need this enum. If you create an enum, it means you'll probably end up with someone exposing that as an option in the UI that lets users choose this, and i don't see any of the programs i use asking me which RTL handling method I want to use. Can't we just do *the right thing*?

Cheers,
  Albert

> 
> what would you think?
> alex
Comment 33 alex 2013-02-24 08:36:38 UTC
(In reply to comment #32)
> (In reply to comment #31)
> > (In reply to comment #30)
> > > (In reply to comment #29)
> > > You are removing lots of code, e.g. the "// Note: This code treats numeric
> > > characters (European and", why? Can't we use that when ICU is not there?
> > 
> > it's not working properly, at all. rather try fribidi.
> 
> Trying fribidi is not the correct answer, we have already settled that ICU
> is better. What I am asking if we should have that code as better than
> nothing when the dependencies are not there. So in a world where i can't
> have ICU, does having that code you removed gives better results than no
> code at all?
sorry, it's not better than nothing. btw, the code before lastly patched could be treated as "better than nothing", but not this one.
> 
> > > 
> > > > cmake related stuff is there, but i have no idea on how to use it. my env is
> > > > debian linux with gnome.
> > > 
> > > And what's the problem? cmake works perfectly in debian linux with gnome :-)
> > debian is building using gnu make and friends, hence i've never encountered
> > cmake before. no other problem on my part. :-)
> 
> Ok, then i don't understand your complain/comment
no complaint on my part, just an excuse for not adding icu in the cmake build system on account of my ignorance.
> 
> > > 
> > > > as for finding (a) better place(s) to store reordering_mode, i'd like to
> > > > consult you.
> > > 
> > > Sure, what's the question?
> > > 
> > the reordering mode helps taking us back to the logical (typing order) text
> > that would be visualized in the order shown by the doc. 
> > on windows systems, ReorderingNumbersSpecial would normally prevail, while
> > on pure unicode systems: ReorderingLikeDirect.
> > this assumption should guide the text conversion, according to the client
> > that is requesting it: either by the very system, in case of a stand alone
> > or local process application, or by the remote client's mode in case of a
> > server performing the conversion.
> > 
> > in our case, i'd define a reordering_mode at the TextOutputDev object level,
> > with the local system as default, that would be overrideable in the getText
> > and findText methods calls, especially when called from different systems.
> > should the reordering mode be selectable in pdftotext too?
> > 
> > should i then move the reordering mode enum to the TextOutputDev object?
> 
> I still don't see why we really need this enum. If you create an enum, it
> means you'll probably end up with someone exposing that as an option in the
> UI that lets users choose this, and i don't see any of the programs i use
> asking me which RTL handling method I want to use. Can't we just do *the
> right thing*?
the right thing should be default, based on the os of the system the dumped text is requested on.
as i said, this will be determined at compilation time for stand alone, but determined at run time on systems serving clients on various platforms.
yet, maybe the os checking could be not enough, and the document could contain additional hints for choosing this default.

what i was asking you is your opinion on moving this enum higher in the objects hierarchy, in order to further use it as an optional parameter when creating the TextOutputDev object.
> 
> Cheers,
>   Albert
> 
thanks,
alex
Comment 34 Albert Astals Cid 2013-03-03 18:02:01 UTC
(In reply to comment #33)
> (In reply to comment #32)
> > (In reply to comment #31)
> > > (In reply to comment #30)
> > > > (In reply to comment #29)
> > > > You are removing lots of code, e.g. the "// Note: This code treats numeric
> > > > characters (European and", why? Can't we use that when ICU is not there?
> > > 
> > > it's not working properly, at all. rather try fribidi.
> > 
> > Trying fribidi is not the correct answer, we have already settled that ICU
> > is better. What I am asking if we should have that code as better than
> > nothing when the dependencies are not there. So in a world where i can't
> > have ICU, does having that code you removed gives better results than no
> > code at all?
> sorry, it's not better than nothing. btw, the code before lastly patched
> could be treated as "better than nothing", but not this one.

Hmmm, not sure i understand what you mean here, what is "the code before lastly patched"?

> > > > 
> > > > > as for finding (a) better place(s) to store reordering_mode, i'd like to
> > > > > consult you.
> > > > 
> > > > Sure, what's the question?
> > > > 
> > > the reordering mode helps taking us back to the logical (typing order) text
> > > that would be visualized in the order shown by the doc. 
> > > on windows systems, ReorderingNumbersSpecial would normally prevail, while
> > > on pure unicode systems: ReorderingLikeDirect.
> > > this assumption should guide the text conversion, according to the client
> > > that is requesting it: either by the very system, in case of a stand alone
> > > or local process application, or by the remote client's mode in case of a
> > > server performing the conversion.
> > > 
> > > in our case, i'd define a reordering_mode at the TextOutputDev object level,
> > > with the local system as default, that would be overrideable in the getText
> > > and findText methods calls, especially when called from different systems.
> > > should the reordering mode be selectable in pdftotext too?
> > > 
> > > should i then move the reordering mode enum to the TextOutputDev object?
> > 
> > I still don't see why we really need this enum. If you create an enum, it
> > means you'll probably end up with someone exposing that as an option in the
> > UI that lets users choose this, and i don't see any of the programs i use
> > asking me which RTL handling method I want to use. Can't we just do *the
> > right thing*?
> the right thing should be default, based on the os of the system the dumped
> text is requested on.
> as i said, this will be determined at compilation time for stand alone, but
> determined at run time on systems serving clients on various platforms.

What do you mean by "systems serving clients on various platforms"?

> yet, maybe the os checking could be not enough, and the document could
> contain additional hints for choosing this default.

With could do you mean "I know there is a PDF hint for that" or you mean "maybe there is a PDF hint for that"?

> what i was asking you is your opinion on moving this enum higher in the
> objects hierarchy, in order to further use it as an optional parameter when
> creating the TextOutputDev object.

Probably TextOutputDev makes more sense, I have not much experience in RTL (as it shows ;-)) but i don't think it would make sense to have a TextPage of a document using one method and the next TextPage of the same document using a different method, no?

> > 
> > Cheers,
> >   Albert
> > 
> thanks,
> alex
Comment 35 alex 2013-03-24 20:35:08 UTC
(In reply to comment #34)

hello albert,

> (In reply to comment #33)
> > (In reply to comment #32)
> > > (In reply to comment #31)
> > > > (In reply to comment #30)
> > > > > (In reply to comment #29)
> > > > > You are removing lots of code, e.g. the "// Note: This code treats numeric
> > > > > characters (European and", why? Can't we use that when ICU is not there?
> > > > 
> > > > it's not working properly, at all. rather try fribidi.
> > > 
> > > Trying fribidi is not the correct answer, we have already settled that ICU
> > > is better. What I am asking if we should have that code as better than
> > > nothing when the dependencies are not there. So in a world where i can't
> > > have ICU, does having that code you removed gives better results than no
> > > code at all?
> > sorry, it's not better than nothing. btw, the code before lastly patched
> > could be treated as "better than nothing", but not this one.
> 
> Hmmm, not sure i understand what you mean here, what is "the code before
> lastly patched"?
i've used the text output before, and it had a very rudimentary, but fairly decent 
rtl approximation. but this got broken by a patch more recently applied, that comes 
from xpdf. this situation made me work for a best of breed solution.
> 
> > > > > 
> > > > > > as for finding (a) better place(s) to store reordering_mode, i'd like to
> > > > > > consult you.
> > > > > 
> > > > > Sure, what's the question?
> > > > > 
> > > > the reordering mode helps taking us back to the logical (typing order) text
> > > > that would be visualized in the order shown by the doc. 
> > > > on windows systems, ReorderingNumbersSpecial would normally prevail, while
> > > > on pure unicode systems: ReorderingLikeDirect.
> > > > this assumption should guide the text conversion, according to the client
> > > > that is requesting it: either by the very system, in case of a stand alone
> > > > or local process application, or by the remote client's mode in case of a
> > > > server performing the conversion.
> > > > 
> > > > in our case, i'd define a reordering_mode at the TextOutputDev object level,
> > > > with the local system as default, that would be overrideable in the getText
> > > > and findText methods calls, especially when called from different systems.
> > > > should the reordering mode be selectable in pdftotext too?
> > > > 
> > > > should i then move the reordering mode enum to the TextOutputDev object?
> > > 
> > > I still don't see why we really need this enum. If you create an enum, it
> > > means you'll probably end up with someone exposing that as an option in the
> > > UI that lets users choose this, and i don't see any of the programs i use
> > > asking me which RTL handling method I want to use. Can't we just do *the
> > > right thing*?
> > the right thing should be default, based on the os of the system the dumped
> > text is requested on.
> > as i said, this will be determined at compilation time for stand alone, but
> > determined at run time on systems serving clients on various platforms.
> 
> What do you mean by "systems serving clients on various platforms"?
a server application may have clients that run on different os, with different 
rtl reordering algorithms.
> 
> > yet, maybe the os checking could be not enough, and the document could
> > contain additional hints for choosing this default.
> 
> With could do you mean "I know there is a PDF hint for that" or you mean
> "maybe there is a PDF hint for that"?
well, it's just maybe. the hints i'm talking about may come from some headers 
mentioning the software used to create the pdf etc...
> 
> > what i was asking you is your opinion on moving this enum higher in the
> > objects hierarchy, in order to further use it as an optional parameter when
> > creating the TextOutputDev object.
> 
> Probably TextOutputDev makes more sense, I have not much experience in RTL
> (as it shows ;-)) but i don't think it would make sense to have a TextPage
> of a document using one method and the next TextPage of the same document
> using a different method, no?
just right, and i'll perform the change :).

cheers,
alex
> 
> > > 
> > > Cheers,
> > >   Albert
> > > 
> > thanks,
> > alex
Comment 36 Albert Astals Cid 2013-03-25 21:11:51 UTC
(In reply to comment #35)
> (In reply to comment #34)
> 
> hello albert,
> 
> > (In reply to comment #33)
> > > (In reply to comment #32)
> > > > (In reply to comment #31)
> > > > > (In reply to comment #30)
> > > > > > (In reply to comment #29)
> > > > > > You are removing lots of code, e.g. the "// Note: This code treats numeric
> > > > > > characters (European and", why? Can't we use that when ICU is not there?
> > > > > 
> > > > > it's not working properly, at all. rather try fribidi.
> > > > 
> > > > Trying fribidi is not the correct answer, we have already settled that ICU
> > > > is better. What I am asking if we should have that code as better than
> > > > nothing when the dependencies are not there. So in a world where i can't
> > > > have ICU, does having that code you removed gives better results than no
> > > > code at all?
> > > sorry, it's not better than nothing. btw, the code before lastly patched
> > > could be treated as "better than nothing", but not this one.
> > 
> > Hmmm, not sure i understand what you mean here, what is "the code before
> > lastly patched"?
> i've used the text output before, and it had a very rudimentary, but fairly
> decent 
> rtl approximation. but this got broken by a patch more recently applied,
> that comes 
> from xpdf. this situation made me work for a best of breed solution.
> > 
> > > > > > 
> > > > > > > as for finding (a) better place(s) to store reordering_mode, i'd like to
> > > > > > > consult you.
> > > > > > 
> > > > > > Sure, what's the question?
> > > > > > 
> > > > > the reordering mode helps taking us back to the logical (typing order) text
> > > > > that would be visualized in the order shown by the doc. 
> > > > > on windows systems, ReorderingNumbersSpecial would normally prevail, while
> > > > > on pure unicode systems: ReorderingLikeDirect.

I never asked, but does this make sense? Does it mean that while reading a given text you get different RTL text on Windows than on Linux? Why would anyone want that? 


> > What do you mean by "systems serving clients on various platforms"?
> a server application may have clients that run on different os, with
> different 
> rtl reordering algorithms.

Sure, this is a nice theoretical scenario, do you really see poppler used in that scenario? I guess it could, I think my biggest brain problem at the moment is the one i made in the previous question, why would i want to see different text depending if i'm a windows user or not


> > > yet, maybe the os checking could be not enough, and the document could
> > > contain additional hints for choosing this default.
> > 
> > With could do you mean "I know there is a PDF hint for that" or you mean
> > "maybe there is a PDF hint for that"?
> well, it's just maybe. the hints i'm talking about may come from some
> headers 
> mentioning the software used to create the pdf etc...

I don't think this is an option, it means guessing and playing like the browser and the "web developers" do, it's prone to break.
Comment 37 alex 2013-07-15 08:09:27 UTC
hello albert,

(In reply to comment #36)
i didn't want to talk so much and give nothing.

here is a patch that also covers choosing the reordering algorithm in pdftotext.

> (In reply to comment #35)
> > (In reply to comment #34)
> > 
> > hello albert,
> > 
> > > (In reply to comment #33)
> > > > (In reply to comment #32)
> > > > > (In reply to comment #31)
> > > > > > (In reply to comment #30)
> > > > > > > (In reply to comment #29)
> > > > > > > You are removing lots of code, e.g. the "// Note: This code treats numeric
> > > > > > > characters (European and", why? Can't we use that when ICU is not there?
> > > > > > 
> > > > > > it's not working properly, at all. rather try fribidi.
> > > > > 
> > > > > Trying fribidi is not the correct answer, we have already settled that ICU
> > > > > is better. What I am asking if we should have that code as better than
> > > > > nothing when the dependencies are not there. So in a world where i can't
> > > > > have ICU, does having that code you removed gives better results than no
> > > > > code at all?
> > > > sorry, it's not better than nothing. btw, the code before lastly patched
> > > > could be treated as "better than nothing", but not this one.
> > > 
> > > Hmmm, not sure i understand what you mean here, what is "the code before
> > > lastly patched"?
> > i've used the text output before, and it had a very rudimentary, but fairly
> > decent 
> > rtl approximation. but this got broken by a patch more recently applied,
> > that comes 
> > from xpdf. this situation made me work for a best of breed solution.
> > > 
> > > > > > > 
> > > > > > > > as for finding (a) better place(s) to store reordering_mode, i'd like to
> > > > > > > > consult you.
> > > > > > > 
> > > > > > > Sure, what's the question?
> > > > > > > 
> > > > > > the reordering mode helps taking us back to the logical (typing order) text
> > > > > > that would be visualized in the order shown by the doc. 
> > > > > > on windows systems, ReorderingNumbersSpecial would normally prevail, while
> > > > > > on pure unicode systems: ReorderingLikeDirect.
> 
> I never asked, but does this make sense? Does it mean that while reading a
> given text you get different RTL text on Windows than on Linux? Why would
> anyone want that? 
> 
though i can't say the reason, there is a fact that windows rtl reordering 
algorithm is different from the unicode one. that means, that in corner cases, 
the same text would show differently on these platforms.
the user is usually not aware of this difference, since (s)he would usually 
enter the text by wysiwyg tools that handle saving the text and rendering it.

so, if i've created some text with ms-word, i'd like to render it to pdf as 
word would show it. then, to further edit the same text on a linux machine, 
to see exactly the same i'd need to have a slightly different text.

indeed, it seems too much but it's free, and fully supported in icu.
> 
> > > What do you mean by "systems serving clients on various platforms"?
> > a server application may have clients that run on different os, with
> > different 
> > rtl reordering algorithms.
> 
> Sure, this is a nice theoretical scenario, do you really see poppler used in
> that scenario? I guess it could, I think my biggest brain problem at the
> moment is the one i made in the previous question, why would i want to see
> different text depending if i'm a windows user or not
> 
the reason to choose the algorithm to use is the same reason to choose to use 
no rtl reordering, and show the text in a dumb terminal window.

my own case is a web server application, with clients based on various platforms.
> 
> > > > yet, maybe the os checking could be not enough, and the document could
> > > > contain additional hints for choosing this default.
> > > 
> > > With could do you mean "I know there is a PDF hint for that" or you mean
> > > "maybe there is a PDF hint for that"?
> > well, it's just maybe. the hints i'm talking about may come from some
> > headers 
> > mentioning the software used to create the pdf etc...
> 
> I don't think this is an option, it means guessing and playing like the
> browser and the "web developers" do, it's prone to break.

after rethinking, there's no obvious reason to check where the pdf was created, 
so i'd leave this to the explicit decision of the user/client.
Comment 38 alex 2013-07-15 08:13:00 UTC
Created attachment 82433 [details] [review]
logical rtl text support through icu
Comment 39 Albert Astals Cid 2013-08-25 19:43:22 UTC
Alex please accept my apologies for taking again so long into looking at this patch (note this may happen again since i'll be on holiday for the most of september).

Two things:
 * Your patch is patching stuff like Makefile.in which tells me you're not using git diff but developing about a packaged version. That makes no sense for a new feature like this you should develop against git master
 * As said, new features like this can only enter git master for the next version. If i try to apply this patch to current git master, the patch fails to apply. Can you please update the patch so that it does apply against git master?
Comment 40 alex 2013-09-04 13:34:05 UTC
(In reply to comment #39)

hello albert,

> Alex please accept my apologies for taking again so long into looking at
> this patch (note this may happen again since i'll be on holiday for the most
> of september).
no problem. enjoy your hollyday. you're already doing much for this project, while my contribution is quite a niche one.
> 
> Two things:
>  * Your patch is patching stuff like Makefile.in which tells me you're not
> using git diff but developing about a packaged version. That makes no sense
> for a new feature like this you should develop against git master
indeed i'm developing against old poppler 0.20.5, as it's installed on my debian system.
that's because my fear the compiled libraries will not be integrated in my running system otherwise.
makefile.in files are indeed in the patch, as generated by the debian package building utility. but i have naturally done no change to these files, and they may be safely removed from the patch. i'll do it if you wish.
>  * As said, new features like this can only enter git master for the next
> version. If i try to apply this patch to current git master, the patch fails
> to apply. Can you please update the patch so that it does apply against git
> master?
as said.

thanks and enjoy your hollyday,
alex
Comment 41 Albert Astals Cid 2013-09-25 19:30:32 UTC
What we need is a patch that we can apply against the current code in git master. So please develop against git master, otherwise it doesn't make any sense.
Comment 42 alex 2013-11-06 18:01:16 UTC
Created attachment 88773 [details] [review]
logical rtl text support through icu. Makefile.in's removed

Makefile.in's removed
Comment 43 alex 2013-11-06 18:13:52 UTC
(In reply to comment #41)
hello albert,

sorry for the long time it took me to come to this.
> What we need is a patch that we can apply against the current code in git
> master. So please develop against git master, otherwise it doesn't make any
> sense.
indeed you're right, but my environment, debian, having all the programs that are using libpoppler, usually uses released versions of all libraries.
so, to be able to test my patches, i needed to play by the system's rules.

please be understanding to this issue. in the case the patch is far from compatibility to master, leave the fixing issue to me.

best regards,
alex
Comment 44 Albert Astals Cid 2013-11-07 23:08:35 UTC
Your patch is not appliable to master, if it can not be applied it can't be compiled, so it can't be tested.

Please give us a patch that we can apply to master.
Comment 45 Jose Aliste 2013-11-08 04:11:55 UTC
Created attachment 88871 [details] [review]
Alex patch rebased

So This is just alex's patch rebased against git master. There were two or three conflicts that seemed "trivial" to me, but I haven't have time to test it throughly. Actually in a quick test, copying the text and pasting it to a text editor, the text in hebrew is inverted, which I don't think is the intended here? I made this so it could probably help alex work. 


@Alex, if you need a proper development for poppler, maybe you can try something like jhbuild that allows you to compile cairo and other poppler deps in a controlled way.
Comment 46 alex 2014-01-03 22:56:13 UTC
(In reply to comment #45)
> Created attachment 88871 [details] [review] [review]
> Alex patch rebased
> 
> So This is just alex's patch rebased against git master. There were two or
> three conflicts that seemed "trivial" to me, but I haven't have time to test
> it throughly. Actually in a quick test, copying the text and pasting it to a
> text editor, the text in hebrew is inverted, which I don't think is the
> intended here? I made this so it could probably help alex work. 
> 
> 
> @Alex, if you need a proper development for poppler, maybe you can try
> something like jhbuild that allows you to compile cairo and other poppler
> deps in a controlled way.

hello jose, 

a big thank for your trying to help.
i'm building poppler in the debian fashion, using the patches that come with debian poppler and my own ones, in order to improve poppler without breaking my debian/gnome environment.

i'm not sure how did you test the library. i'd rather use pdftotext like that:
pdftotext -fixed 9 -reorder default test.pdf output.txt
i'll upload test.pdf

as my environment is hebrew enabled, would you please send me the output file?

thanks in advance,
alex
Comment 47 alex 2014-01-03 22:59:56 UTC
Created attachment 91475 [details]
test for rtl with numbers

please try with pdftotext -fixed 9 -reorder default test.pdf output.txt
Comment 48 Albert Astals Cid 2014-02-18 23:07:18 UTC
The patch is not applying again. And the cmake build is broken. Also why is printReorderings in a separate file and has my name as copyright?
Comment 49 Adam Reichold 2015-07-14 15:06:52 UTC
Created attachment 117108 [details] [review]
Patch to use ICU for RTL text reordering

Hello,

I was made aware of this bug/patch by an RTL-using acquaintance and would like to try to help to get this into shape for inclusion into master. I took attachment 88871 [details] [review] and tried to clean it up manually, making sure that CMake and autotools build systems work properly, that the current code paths stay intact and that the new conditional compilation is easy to follow.

I did not yet do any functional testing beyond running the usual unit tests. I think extending those with a dedicated RTL test case seems in order? I will try to get help some from my acquaintance on that.

I did also remove the extensions to pdftotext and the reordering name parsing in GlobalParams to keep the patch as short as possible for now.

Best regards, Adam.
Comment 50 Albert Astals Cid 2015-07-14 19:49:53 UTC
Adding things to globalparams is bad, we should be removing stuff, not adding new stuff.
Comment 51 Adam Reichold 2015-07-14 20:25:47 UTC
Created attachment 117118 [details] [review]
Patch to use ICU for RTL text reordering

Hello again,

no disagreement here, just tried to stay close to the original. Attached is an updated patch that uses the TextOutputDev constructor to pass the parameter only which should yield the same results and should also be usable for tools like pdftotext. (It also removes the redundant storage of the rawOrder flag from the TextOutputDev.)

Best regards, Adam.
Comment 52 Albert Astals Cid 2015-07-15 08:07:01 UTC
This regresses

-A new variational method for the 𝑝(𝑥)-Laplacian equation
+A new variational method for the <U+D835><U+DC5D>(<U+D835><U+DC65>)-Laplacian equation

in http://www.maths.mq.edu.au/~ross/5019-e-cmap.pdf
Comment 53 Adam Reichold 2015-07-15 17:30:20 UTC
Created attachment 117145 [details] [review]
Patch to use ICU for RTL text reordering

Hello again,

I think this is due to an error in the original patch, i.e. feeding the UnicodeMap with the UChar code points instead of the (already reconverted) Unicode data which this patch version fixes.

Best regards, Adam.
Comment 54 Adam Reichold 2015-07-15 19:58:43 UTC
Created attachment 117148 [details] [review]
Patch to use ICU for RTL text reordering

Rebased patch on current master to resolve merge conflict in CMakeLists.txt...
Comment 55 Albert Astals Cid 2015-08-31 23:08:29 UTC
What is this patch supposed to improve?

By running pdftotext i see diffs that are

-<U+202B>ﺳﻠﻄﻨﺔ ﻋﻤﺎﻥ<U+202C>
-<U+202B>ﻭﺯﺍﺭﺓ ﺍﻻﻋــﻼﻡ<U+202C>
+ﺳﻠﻄﻨﺔ ﻋﻤﺎﻥ
+ﻭﺯﺍﺭﺓ ﺍﻻﻋــﻼﻡ


which don't seem a big/important change to be honest.
Comment 56 Khaled Hosny 2015-11-18 14:27:56 UTC
Created attachment 119906 [details] [review]
Handle right-to-left text in search

I think there are two different issues here; 1) Searching RTL text does not work 2) the “algorithm” used to fix RTL text in text dumping might be too simplistic.

I think 1) is the more pressing issue and it can be reasonably fixed without introducing any new dependencies at all, as we can simply reuse the existing text dumping code.

2) might benefit from using ICU, and I just noticed today mirrored characters (like brackets) are not reversed back. But that a far minor issue that no search working at all.

I’m attaching a patch that fixes just 1), based slightly on the latest attached patch, but without all the ICU stuff and with providing a reorderText() function similar to the non-ICU dumpReorderedText(). There is code duplication here that I don’t feel easy about, but trying to use one function for both cases results on lots of if conditions that makes the resulting code very unreadable, so I’m not sure which is better.

I think this patch can go on, while the merits of introducing a huge dependency like ICU for such a minor gain is assessed.
Comment 57 Adam Reichold 2015-11-18 14:46:19 UTC
(In reply to Khaled Hosny from comment #56)
> I’m attaching a patch that fixes just 1), based slightly on the latest
> attached patch, but without all the ICU stuff and with providing a
> reorderText() function similar to the non-ICU dumpReorderedText(). There is
> code duplication here that I don’t feel easy about, but trying to use one
> function for both cases results on lots of if conditions that makes the
> resulting code very unreadable, so I’m not sure which is better.
> 
> I think this patch can go on, while the merits of introducing a huge
> dependency like ICU for such a minor gain is assessed.

I do agree with Khaled's assessment of main issue being search not reordering at all. I also like the patch as it will also allow us to easily introduce other reordering algorithms in the style of the original patch by splitting it out into helper functions.

From a purely technical point of view, I would like to completely remove any hand-written reordering code as it only duplicates what libraries like ICU provide and incompletely at that. But then shoehorning this in by converting back and forth between Poppler's and ICU's representations in those helper functions also seems like a stopgap measure and a proper solution would probably involve aligning or even replacing Poppler's internal text representation with that of ICU (or whatever other library one chooses to use). Given the inertia of a code base like Poppler's, this seems rather improbable to happen any time so.

So my guess is that we should use Khaled's patch and leave ICU out for now so that the user's immediate issue is resolved.
Comment 58 Khaled Hosny 2015-11-18 15:01:55 UTC
Created attachment 119908 [details] [review]
Handle right-to-left text in search

Here is an updated patch that shares the reordering code, a bit ugly but I think better than the duplicated code.
Comment 59 Khaled Hosny 2015-11-23 10:01:58 UTC
Created attachment 120046 [details] [review]
Fix finding Arabic Presentation Forms ligatures

PDF text containing Arabic Presentation forms ligatures is still not
found after the previous commit.

This because the ligatures are decomposed in logical order after
normalisation, while the whole string is in visual order.  For example
the RTL text ABCD in visual order will be DCBA, and assuming B is a
ligature, it will be decomposed to B1B2 so the string after
normalization will be DCB1B2A while we are expecting it to be DCB2B1A.

This patch reverses the order of the decomposition of RTL characters to
work around this issue.

This is to be applied on top of the previous patch.
Comment 60 Khaled Hosny 2015-12-23 22:00:04 UTC
Any comments on the patches I posted, any chance to get them in the next release?
Comment 61 Albert Astals Cid 2016-01-03 12:00:39 UTC
I've pushed the two patches by Khaled, I'd sincerely appreciate if people agreed to close this bug and a new one with clearer scope was created, since this bug has always been a bit of a dump of ideas in my mind.
Comment 62 Adam Reichold 2016-01-03 12:12:52 UTC
(In reply to Albert Astals Cid from comment #61)
> I've pushed the two patches by Khaled, I'd sincerely appreciate if people
> agreed to close this bug and a new one with clearer scope was created, since
> this bug has always been a bit of a dump of ideas in my mind.

I agree that the scope is rather undefined and that closing this after committing Khaled's patch seem reasonable. I still think we should try to remove the internal text representation and reordering code using ICU at some point in the future, but that should definitely not keep this bug open.
Comment 63 Khaled Hosny 2016-01-03 20:56:11 UTC
I agree that this issue should be closed and a new issue for improving the reverse bidi algorithm can be opened. So far the current algorithm seems to be good enough for my limited tests.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.