Bug 11381

Summary: Search should optionally be whitespace insensitive
Product: poppler
Reporter: Peter Lyons <pete>
Component: general
Assignee: poppler-bugs <poppler-bugs>
Status: RESOLVED MOVED
Severity: enhancement
Priority: medium
CC: chpe, freedesktop, gpoo+bfdo, nshmyrev, phil.ganchev
Version: unspecified
Hardware: All
OS: All

Description Peter Lyons 2007-06-26 11:28:44 UTC
= Transferring this bug from GNOME Bugzilla: http://bugzilla.gnome.org/show_bug.cgi?id=408299 =


When searching a PDF document for a phrase containing spaces (as
opposed to a single keyword), search does not match occurrences of that phrase
that sit in a paragraph and have been laid out with a line break in the
middle of the phrase.  This behavior is sometimes desired. However, I would
like an option to search with all whitespace treated as equivalent.  This
would be useful for traditional documents made up of normal paragraphs of text.
Perhaps this mode should even be the default.

For example, if my document contains "around the\nbend", where \n is a line
break, and my search text is "around the bend", I would like search to match in
this case.  Basically, if each space in the search string were treated as a
multiline regex \s character class, that should do it.
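
A minimal sketch of the matching rule described above, written in Python for illustration only. It is not poppler code; the function name whitespace_insensitive_pattern and the sample text are made up for this example. Each run of whitespace in the query is turned into \s+, so a space in the query can match a space, a tab, or a line break in the page text.

```python
import re

def whitespace_insensitive_pattern(query: str) -> re.Pattern:
    """Build a regex in which any whitespace in the query matches any whitespace.

    Hypothetical helper for illustration; not part of poppler's API.
    """
    # Split on whitespace, escape each word so it is matched literally,
    # then join the words with \s+ (one or more whitespace characters).
    words = query.split()
    return re.compile(r"\s+".join(re.escape(word) for word in words))

# Sample text with a line break in the middle of the phrase.
page_text = "We walked around the\nbend and stopped."
match = whitespace_insensitive_pattern("around the bend").search(page_text)
print(repr(match.group()) if match else "no match")
```

A plain substring search for "around the bend" finds nothing in the same text, which is the behavior this report is about. Note that an empty query produces an empty pattern, which matches everywhere, so a caller would probably want to reject that case first.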


------------------------

 Comment #3 from Pablo Rodríguez  (points: 10)
2007-04-19 17:51 UTC

I guess that whitespace should be treated the same way as in regexp
searches, because otherwise it might be confusing and potentially misleading
for users. Regexp search should be an alternative to the standard search.

But this is my personal opinion.


Comment #4 from Philip Ganchev (points: 6)
2007-04-19 20:18 UTC

You mean standard search should ignore line breaks, but regexp search should
not.  This is what I think too.


Comment #5 from Peter Lyons (reporter, points: 2)
2007-04-19 20:32 UTC

That seems good to me.  Regular non-tech searchers get all whitespace treated
as equal, but if you want to be advanced and do a regex search, you know about
whitespace and want your regex to work just as it would in a programming
language.
Comment 1 Philip Ganchev 2007-12-23 03:25:34 UTC
Hyphens should also be treated as whitespace.

It should also be optionally diacritic insensitive: http://bugzilla.gnome.org/show_bug.cgi?id=418189

Perhaps this should be in the same option as whitespace insensitive?
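
One common way to get diacritic-insensitive matching is to compare both strings after Unicode decomposition with the combining marks removed. The sketch below assumes that approach and uses Python for illustration; strip_diacritics and diacritic_insensitive_contains are hypothetical names, not poppler functions.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    # NFD splits a letter such as "ü" into "u" plus a combining diaeresis;
    # unicodedata.combining() is non-zero for the combining mark only.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def diacritic_insensitive_contains(haystack: str, needle: str) -> bool:
    """Hypothetical helper: substring test that ignores diacritics."""
    return strip_diacritics(needle) in strip_diacritics(haystack)

print(diacritic_insensitive_contains("Falk Kühnel", "Kuhnel"))
```

Letters that have no canonical decomposition, such as "ø", are left unchanged by this method, so a real implementation would likely need an extra mapping for them.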
Comment 2 Falk Kühnel 2008-01-19 00:27:12 UTC
Actually, hyphens should be ignored when searching text, and whitespace after hyphens as well. Otherwise you will not find a match on hyphenated words.
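
A rough sketch of that rule in Python, for illustration only and not taken from poppler: between any two characters of a query word, the pattern tolerates an optional hyphen followed by optional whitespace, and words are separated by \s+. The names SOFT_BREAK and hyphenation_tolerant_pattern are invented for this example.

```python
import re

# Optional hyphen plus any whitespace after it, e.g. "-\n" at a line end.
SOFT_BREAK = r"(?:-\s*)?"

def hyphenation_tolerant_pattern(query: str) -> re.Pattern:
    """Hypothetical helper: match query words even when hyphenated in the text."""
    word_patterns = []
    for word in query.split():
        # Escape every character and allow a soft break between characters.
        word_patterns.append(SOFT_BREAK.join(re.escape(ch) for ch in word))
    # Any run of whitespace may separate two words.
    return re.compile(r"\s+".join(word_patterns))

text = "This is a hyphen-\nated word."
match = hyphenation_tolerant_pattern("hyphenated word").search(text)
print(repr(match.group()) if match else "no match")
```

One trade-off of this approach is that it also ignores genuine hyphens, so a query for "wellknown" would match "well-known". Whether that is acceptable is a design decision for the search option.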
Comment 3 Germán Poo-Caamaño 2013-11-18 19:16:14 UTC
*** Bug 61104 has been marked as a duplicate of this bug. ***
Comment 4 Fredrik Wendt 2013-11-18 20:10:01 UTC
I assume that https://bugzilla.gnome.org/show_bug.cgi?id=652909 is related too - "find" doesn't match when the text is "small caps".
Comment 5 Germán Poo-Caamaño 2018-06-25 17:18:32 UTC
*** Bug 9648 has been marked as a duplicate of this bug. ***
Comment 6 GitLab Migration User 2018-08-20 21:44:46 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/56.
