Bug 61104

Summary: Search across newlines
Product: poppler Reporter: Germán Poo-Caamaño <gpoo+bfdo>
Component: general Assignee: poppler-bugs <poppler-bugs>
Status: RESOLVED DUPLICATE QA Contact:
Severity: normal    
Priority: medium CC: glutanimate
Version: unspecified   
Hardware: Other   
OS: All   
URL: https://bugzilla.gnome.org/show_bug.cgi?id=622160
Whiteboard:
Attachments: Screenshot of different line break handling in acroread and evince

Description Germán Poo-Caamaño 2013-02-19 08:24:02 UTC
Created attachment 75098 [details]
Screenshot of different line break handling in acroread and evince

As reported in https://bugzilla.gnome.org/show_bug.cgi?id=622160 and https://bugs.kde.org/show_bug.cgi?id=300992:

"[...] as shown in the attached screenshot, seems to be twofold:

1.) sentences spanning across line breaks are not recognized as continuous and
aren't taken up by the inbuilt search (lower part of screenshot)

2.) single phrases spanning across line breaks aren't recognized as being
continuous, either. There does not seem to be any difference between hyphenated
and regular phrases in this. Searching for "main-tenance" in the example above
doesn't return any results, either.

Neither of these problems exist in proprietary solutions such as Adobe Reader
or Foxit. I think it can be argued that fixing this issue is quite important as
it greatly diminishes the inbuilt search capabilities of evince."

I can reproduce the same behavior in poppler-glib-demo (poppler 0.22.1) with any document that contains hyphenated words or regular phrases spanning a line break, as explained in the bug report.
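
To make the two cases above concrete, here is a minimal sketch of a search that is not confined to a single line. It is not poppler code and does not use any poppler API; `flatten` and `find_across_lines` are hypothetical names, and the input is assumed to be the page text already split into lines. The idea is to join the lines into one string while remembering where each character came from, once with trailing hyphens dropped (so "maintenance" matches) and once with them kept (so "main-tenance" matches).

```python
def flatten(lines, keep_hyphen):
    """Join lines into one lowercase string plus per-character (line, col) origins.

    Assumption: a line ending in "-" continues a word on the next line,
    so no space is inserted after it.
    """
    chars, origins = [], []
    for i, line in enumerate(lines):
        text = line.rstrip()
        hyphenated = text.endswith("-") and i + 1 < len(lines)
        if hyphenated and not keep_hyphen:
            text = text[:-1]  # drop the line-break hyphen
        for col, ch in enumerate(text):
            for low in ch.lower():  # lower() may yield more than one character
                chars.append(low)
                origins.append((i, col))
        if not hyphenated and i + 1 < len(lines):
            chars.append(" ")  # a plain line break acts like a space
            origins.append((i, len(text)))
    return "".join(chars), origins


def find_across_lines(lines, query):
    """Return sorted ((start_line, start_col), (end_line, end_col)) matches."""
    needle = " ".join(query.lower().split())
    if not needle:
        return []
    hits = set()
    for keep_hyphen in (False, True):
        haystack, origins = flatten(lines, keep_hyphen)
        pos = haystack.find(needle)
        while pos != -1:
            hits.add((origins[pos], origins[pos + len(needle) - 1]))
            pos = haystack.find(needle, pos + 1)
    return sorted(hits)


page = ["The regular main-", "tenance of the system is", "important."]
find_across_lines(page, "maintenance")          # [((0, 12), (1, 6))]
find_across_lines(page, "main-tenance")         # [((0, 12), (1, 6))]
find_across_lines(page, "system is important")  # [((1, 15), (2, 8))]
```

Because both variants are searched, a genuine compound broken at its hyphen (for example "well-" / "known") would match "wellknown" as well as "well-known". Note also that a match now has a start and an end on different lines, so a caller that highlights results would need one rectangle per line rather than a single one.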
Comment 1 Albert Astals Cid 2013-02-19 08:43:03 UTC
FWIW Okular has its own text searching routines (so the same routines are used for pdf, dvi, etc) so the Okular bug is "unrelated" to this one.
Comment 2 Germán Poo-Caamaño 2013-11-18 19:16:14 UTC

*** This bug has been marked as a duplicate of bug 11381 ***
Comment 3 Fredrik Wendt 2013-11-18 19:44:18 UTC
@Germán: I understand that this may seem very similar, but the current algorithm in poppler is (page and) line based.
We will have to come up with two algorithmic changes in order to solve this bug as well as bug 11381, as the model of pages and searching in poppler doesn't really support returning matches that span more than one page.

I intend to spend another hour trying to fix this bug and provide a patch. If I succeed, I assume it's OK to reopen this bug?
Comment 4 Fredrik Wendt 2013-11-18 21:56:09 UTC
Gah, struggling with tools and environment here, and there weren't really any unit tests to continue to work off of. Perhaps we could team up at FOSDEM?! :-)
