Bug 61104 - Search across newlines
Summary: Search across newlines
Status: RESOLVED DUPLICATE of bug 11381
Alias: None
Product: poppler
Classification: Unclassified
Component: general
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: poppler-bugs
QA Contact:
URL: https://bugzilla.gnome.org/show_bug.c...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-02-19 08:24 UTC by Germán Poo-Caamaño
Modified: 2013-11-18 21:56 UTC
CC List: 1 user

See Also:
i915 platform:
i915 features:


Attachments
Screenshot of different line break handling in acroread and evince (429.92 KB, image/jpeg)
2013-02-19 08:24 UTC, Germán Poo-Caamaño

Description Germán Poo-Caamaño 2013-02-19 08:24:02 UTC
Created attachment 75098
Screenshot of different line break handling in acroread and evince

As reported in https://bugzilla.gnome.org/show_bug.cgi?id=622160 and https://bugs.kde.org/show_bug.cgi?id=300992:

"[...] as shown in the attached screenshot, seems to be twofold:

1.) sentences spanning across line breaks are not recognized as continuous and
aren't taken up by the inbuilt search (lower part of screenshot)

2.) single phrases spanning across line breaks aren't recognized as being
continuous, either. There does not seem to be any difference between hyphenated
and regular phrases in this. Searching for "main-tenance" in the example above
doesn't return any results, either.

Neither of these problems exist in proprietary solutions such as Adobe Reader
or Foxit. I think it can be argued that fixing this issue is quite important as
it greatly diminishes the inbuilt search capabilities of evince."

I can reproduce the same behavior in poppler-glib-demo (poppler 0.22.1) with any document containing hyphenated words or regular phrases that break across lines, as described in the bug report.
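For illustration, a minimal sketch of the behavior the report asks for, written in Python for brevity. It is not poppler code and none of its names are poppler API: `join_lines` and `find_across_lines` are hypothetical helpers, and the sketch assumes a page's text is already available as a list of line strings. Line breaks become single spaces, and a line-final hyphen is assumed to be a word break and dropped, so "main-" followed by "tenance" is found by a search for "maintenance".

```python
import re


def join_lines(lines):
    """Join a page's text lines into one searchable string.

    Assumed heuristic: a line ending in '-' is a hyphenated word break,
    so the hyphen is dropped and the next line is appended directly.
    Any other line break becomes a single space.
    """
    text = ""
    for line in lines:
        line = line.strip()
        if not text:
            text = line
        elif text.endswith("-"):
            text = text[:-1] + line
        else:
            text = text + " " + line
    return text


def find_across_lines(lines, query, case_sensitive=False):
    """Return the start offset of every match of `query` in the normalized joined text."""
    # Collapse runs of whitespace so a query typed with one space still
    # matches words that the layout put on different lines.
    text = re.sub(r"\s+", " ", join_lines(lines))
    needle = re.sub(r"\s+", " ", query.strip())
    if not case_sensitive:
        text, needle = text.lower(), needle.lower()
    hits = []
    start = text.find(needle) if needle else -1
    while start != -1:
        hits.append(start)
        start = text.find(needle, start + 1)
    return hits
```

With `lines = ["The system requires regular main-", "tenance and careful", "monitoring."]`, both `find_across_lines(lines, "maintenance")` and `find_across_lines(lines, "careful monitoring")` return a hit, while the same searches run line by line find nothing. The hyphen rule is only a guess: a genuine compound such as "well-known" that happens to break at its hyphen would lose the hyphen, so a real fix might have to try both variants.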
Comment 1 Albert Astals Cid 2013-02-19 08:43:03 UTC
FWIW, Okular has its own text-search routines (the same ones are used for PDF, DVI, etc.), so the Okular bug is "unrelated" to this one.
Comment 2 Germán Poo-Caamaño 2013-11-18 19:16:14 UTC

*** This bug has been marked as a duplicate of bug 11381 ***
Comment 3 Fredrik Wendt 2013-11-18 19:44:18 UTC
@Germán: I understand that this may seem very similar, but the current search algorithm in poppler is page- and line-based.
Solving both this bug and bug 11381 will take two separate algorithmic changes, because poppler's model of pages and searching doesn't really support returning matches that span more than one page.

I intend to spend another hour trying to fix this bug and provide a patch. If I succeed, I assume it's OK to reopen this bug?
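Comment 3's point is that a match has to be reported somehow, and a result tied to a single line cannot describe one that wraps onto the next line or page. One possible approach, sketched here in Python and not taken from poppler, is to search a concatenation of all lines of all pages while keeping a per-character map back to (page, line, column), then report each hit as a list of per-line segments. `Segment`, `build_index` and `find_spanning` are hypothetical names; the search is case-sensitive, and dropping a line-final hyphen is an assumed heuristic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """The part of a match that lies on one line of one page."""
    page: int
    line: int
    start: int  # first column of the match on this line
    end: int    # one past the last matched column on this line


def build_index(pages):
    """Join every line of every page into one string.

    `pages` is a list of pages, each a list of line strings. Returns the
    joined text plus a parallel list where entry i is (page, line, column)
    for text[i], or None for a separator space inserted between lines.
    """
    chars, origin = [], []
    for p, lines in enumerate(pages):
        for ln, line in enumerate(lines):
            line = line.rstrip()  # trailing blanks would hide a final hyphen
            if chars and chars[-1] == "-":
                # Assumed heuristic: a line-final hyphen is a word break,
                # so drop it and glue the next line on directly.
                chars.pop()
                origin.pop()
            elif chars:
                chars.append(" ")
                origin.append(None)
            for col, ch in enumerate(line):
                chars.append(ch)
                origin.append((p, ln, col))
    return "".join(chars), origin


def find_spanning(pages, query):
    """Return every (case-sensitive) match of `query` as a list of Segments."""
    text, origin = build_index(pages)
    matches = []
    start = text.find(query) if query else -1
    while start != -1:
        segments = []
        for i in range(start, start + len(query)):
            if origin[i] is None:
                continue  # separator between two lines, nothing to highlight
            page, ln, col = origin[i]
            prev = segments[-1] if segments else None
            if prev is not None and (prev.page, prev.line) == (page, ln) and prev.end == col:
                # Still on the same line: extend the current segment.
                segments[-1] = Segment(page, ln, prev.start, col + 1)
            else:
                # Moved to another line or page: start a new segment.
                segments.append(Segment(page, ln, col, col + 1))
        matches.append(segments)
        start = text.find(query, start + 1)
    return matches
```

For `pages = [["regular main-"], ["tenance and", "careful monitoring"]]`, a search for "maintenance" yields one match made of two segments, one at the end of page 0 and one at the start of page 1. A viewer would still need to turn each segment's columns into glyph boxes to highlight it, and indexing every page up front costs memory proportional to the document, so joining only adjacent pages may be the more practical variant.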
Comment 4 Fredrik Wendt 2013-11-18 21:56:09 UTC
Gah, I'm struggling with the tools and environment here, and there weren't really any unit tests to build on. Perhaps we could team up at FOSDEM?! :-)


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.