Bug 56524

Summary: PIM Manager: accent-insensitive and transliterated search
Product: SyncEvolution
Reporter: Patrick Ohly <patrick.ohly>
Component: SyncEvolution
Assignee: Patrick Ohly <patrick.ohly>
Status: RESOLVED FIXED
QA Contact:
Severity: enhancement
Priority: high
CC: murrayc, syncevolution-issues
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
Bug Depends on:    
Bug Blocks: 56141    

Description Patrick Ohly 2012-10-29 14:17:58 UTC
Currently, searching can be either case-insensitive or case-sensitive. The difference between characters with and without accents still matters, so "Muller" does not find "Müller".

It could be useful to also ignore accents. One way to implement this might be:
- decompose as NFKD
- throw away characters which are modifiers or accents (exact filter to be decided)

See:
http://unicode.org/reports/tr15/#Norm_Forms
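The two steps above can be sketched in Python with the standard `unicodedata` module. This is a minimal illustration, not the SyncEvolution or EDS code: the filter (dropping every character in Unicode category "Mn", nonspacing marks) is an assumption standing in for the "exact filter to be decided", and the helper names `strip_accents` and `contains_ignoring_accents` are hypothetical.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose with NFKD, then drop nonspacing marks (category 'Mn').

    Assumption: 'Mn' is used as the accent filter; the bug leaves the
    exact filter open.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def contains_ignoring_accents(value: str, term: str) -> bool:
    """Hypothetical helper: substring match after stripping accents on both sides."""
    return strip_accents(term) in strip_accents(value)


print(strip_accents("Müller"))                         # Muller
print(contains_ignoring_accents("Müller", "Muller"))  # True
```

Note that NFKD also applies compatibility mappings (for example the ligature "ﬁ" becomes "fi"), which is usually desirable for tolerant search. Because "ü" only loses its diaeresis, this approach does not match "Mueller" against "Müller".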
Comment 1 Patrick Ohly 2013-10-10 09:50:31 UTC
The short-term goal is to do the same as EDS, which has a utility function for stripping accents. For consistency with case-sensitivity, accent-sensitivity should be "off" by default.

Long term it may be worthwhile to investigate better ways of searching, for example a method which also matches "ü" against "ue". See http://userguide.icu-project.org/collation/icu-string-search-service
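Matching "ü" against "ue" needs a language-specific folding rather than plain accent removal. The sketch below deliberately swaps in a much simpler technique than the ICU string search service: a hard-coded, German-only folding table applied to both sides before a substring comparison. The table, `fold_german`, and `matches_german` are hypothetical and only illustrate the idea; expanding "ü" to "ue" would be wrong for many other languages.

```python
import unicodedata

# Hypothetical German-only folding table (not ICU's behavior).
GERMAN_FOLDS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def fold_german(text: str) -> str:
    """Compose to NFC (so 'u' + U+0308 becomes 'ü'), lowercase, expand umlauts."""
    text = unicodedata.normalize("NFC", text).lower()
    return "".join(GERMAN_FOLDS.get(ch, ch) for ch in text)


def matches_german(value: str, term: str) -> bool:
    """Substring match after German folding of both value and search term."""
    return fold_german(term) in fold_german(value)


print(matches_german("Müller", "Mueller"))  # True
print(fold_german("Straße"))                # strasse
```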
Comment 2 Patrick Ohly 2013-10-27 19:18:06 UTC
Another, related request was to match foreign and Latin ways of writing the same text, for example 江 = jiāng (= Jiang when ignoring case and accents).

I've decided to treat this the same way as accent-insensitive search, meaning that the default will be as relaxed as possible: case-insensitive, accent-insensitive, and transliterated. I've finished the implementation (including updated tests) and now only need to check for regressions, then merge into master.

Here's the updated README:

    For text values, the default search without explicit flags is
    very tolerant, meaning that it ignores quite a few differences
    between search term and value. The default search:
    - transliterates any foreign script in search term and values
      to Latin before comparison, thus finding 江 when searching
      for Jiang and vice-versa
    - is case-insensitive
    - is accent-insensitive

    Case and accent differences get removed after the optional
    transliteration. Spaces between words always matter.

    This behavior can be modified by giving additional,
    optional flags after the search value:
    'case-insensitive' - force case-insensitive search (available for the sake
    of consistency and just in case, should the default ever change)
    'case-sensitive' - force case-sensitive search
    'accent-insensitive', 'accent-sensitive' - same for accents
    'transliteration' - force transliteration, i.e. explicitly choose the
    current default
    'no-transliteration' - disable transliteration
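The README's processing order (optional transliteration first, then removal of case and accent differences) and its flags can be sketched as follows. This is a hedged illustration, not SyncEvolution's implementation: the names `SearchOptions`, `parse_flags`, `normalize`, and `matches` are hypothetical, substring matching is assumed as the comparison, and `to_latin` is a hypothetical hook standing in for a real transliterator (for example one backed by ICU's "Any-Latin" transform), which is not shown.

```python
import unicodedata
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class SearchOptions:
    # Defaults follow the README: as relaxed as possible.
    case_insensitive: bool = True
    accent_insensitive: bool = True
    transliterate: bool = True


# Flag name -> (option attribute, value), as listed in the README.
FLAGS = {
    "case-insensitive": ("case_insensitive", True),
    "case-sensitive": ("case_insensitive", False),
    "accent-insensitive": ("accent_insensitive", True),
    "accent-sensitive": ("accent_insensitive", False),
    "transliteration": ("transliterate", True),
    "no-transliteration": ("transliterate", False),
}


def parse_flags(flags: Iterable[str]) -> SearchOptions:
    """Start from the defaults and let each flag override one option."""
    opts = SearchOptions()
    for flag in flags:
        if flag not in FLAGS:
            raise ValueError(f"unknown search flag: {flag}")
        attr, value = FLAGS[flag]
        setattr(opts, attr, value)
    return opts


def normalize(text: str, opts: SearchOptions,
              to_latin: Optional[Callable[[str], str]]) -> str:
    """Transliterate first, then strip accents, then fold case."""
    if opts.transliterate and to_latin is not None:
        text = to_latin(text)
    if opts.accent_insensitive:
        decomposed = unicodedata.normalize("NFKD", text)
        text = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    if opts.case_insensitive:
        text = text.casefold()
    return text


def matches(value: str, term: str, flags: Iterable[str] = (),
            to_latin: Optional[Callable[[str], str]] = None) -> bool:
    """Assumed semantics: the normalized term is a substring of the normalized value."""
    opts = parse_flags(flags)
    return normalize(term, opts, to_latin) in normalize(value, opts, to_latin)
```

Usage with a toy stand-in transliterator that only knows one character: `matches("江", "Jiang", to_latin=lambda s: s.replace("江", "jiāng"))` is true, because "江" becomes "jiāng", then "jiang", while the term "Jiang" becomes "jiang". Passing `["no-transliteration"]` makes the same call false, and `matches("Müller", "Muller", ["accent-sensitive"])` is false because only case is folded.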
Comment 3 Patrick Ohly 2013-11-19 16:07:23 UTC
Implemented in "master" branch, included in 1.3.99.6.
