Created attachment 75098
Screenshot of different line break handling in acroread and evince

As reported in https://bugzilla.gnome.org/show_bug.cgi?id=622160 and https://bugs.kde.org/show_bug.cgi?id=300992:

"[...] as shown in the attached screenshot, seems to be twofold:

1.) sentences spanning across line breaks are not recognized as continuous and aren't taken up by the inbuilt search (lower part of screenshot)
2.) single phrases spanning across line breaks aren't recognized as being continuous, either. There does not seem to be any difference between hyphenated and regular phrases in this. Searching for "main-tenance" in the example above doesn't return any results, either.

Neither of these problems exist in proprietary solutions such as Adobe Reader or Foxit. I think it can be argued that fixing this issue is quite important as it greatly diminishes the inbuilt search capabilities of evince."

I can reproduce the same behavior in poppler-glib-demo (poppler 0.22.1) with any document that contains hyphenated words or regular phrases broken across lines, as explained in the bug report.
FWIW Okular has its own text searching routines (so the same routines are used for pdf, dvi, etc) so the Okular bug is "unrelated" to this one.
*** This bug has been marked as a duplicate of bug 11381 ***
@Germán: I understand that this may seem very similar, but the current algorithm in poppler is (page and) line based. We will have to come up with two algorithmic changes in order to solve this bug as well as bug 11381, as the model of pages and searching in poppler doesn't really support returning matches that span more than one page. I intend to spend another hour trying to fix this bug and provide a patch. If I succeed, I assume it's OK to reopen this bug?
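To make the line-based limitation concrete, here is a rough Python sketch of one way to match across line breaks. It is not poppler code and not a proposed patch: `build_index` and `find_across_lines` are hypothetical names, the input is assumed to be a page's text lines already in reading order, and the match is case-sensitive. Lines are joined into a single string, a line break counts as one space, a trailing hyphen is assumed to be a soft hyphen and is dropped, and each character remembers which line and column it came from.

```python
def build_index(lines):
    """Join lines into one searchable string plus a map back to (line, column)."""
    chars = []
    origin = []  # origin[i] is the (line_no, col) that chars[i] came from
    for line_no, line in enumerate(lines):
        text = line.rstrip()
        last = line_no == len(lines) - 1
        # Assumption: a trailing hyphen marks a word split across lines.
        hyphenated = text.endswith("-") and not last
        if hyphenated:
            text = text[:-1]  # drop it so "main-" + "tenance" joins up
        for col, ch in enumerate(text):
            chars.append(ch)
            origin.append((line_no, col))
        if not hyphenated and not last:
            chars.append(" ")  # a line break counts as a single space
            origin.append((line_no, len(text)))
    return "".join(chars), origin


def find_across_lines(lines, query):
    """Return (start, end) pairs; each is a (line, column), end inclusive."""
    if not query:
        return []
    text, origin = build_index(lines)
    matches = []
    pos = text.find(query)
    while pos != -1:
        matches.append((origin[pos], origin[pos + len(query) - 1]))
        pos = text.find(query, pos + 1)
    return matches


lines = ["regular main-", "tenance of the", "search index"]
print(find_across_lines(lines, "maintenance"))
print(find_across_lines(lines, "the search"))
```

Both queries match here even though each spans a line break. The sketch has known gaps: a real compound split at its own hyphen (for example "line-" / "based") loses the hyphen, a literal query such as "main-tenance" still finds nothing, and matches spanning two pages (bug 11381) are out of scope.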
Gah, struggling with tools and environment here, and there weren't really any unit tests to continue to work off of. Perhaps we could team up at FOSDEM?! :-)