Currently, searching can be either case-insensitive or case-sensitive. Accents, however, are still significant in both modes, so searching for "Muller" does not find "Müller".
It could be useful to also ignore accents. One way to implement this might be:
- decompose as NFKD
- throw away characters which are modifiers or accents (exact filter to be decided)
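The two steps above can be sketched in Python as follows. This is only a minimal illustration, not the EDS utility: the function name `strip_accents` is made up here, and the filter (drop every code point in the Unicode category "Mn", nonspacing mark) is just one possible choice for the "exact filter to be decided".

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove accents by decomposing the text and dropping nonspacing marks."""
    # NFKD splits "ü" into "u" followed by U+0308 (combining diaeresis).
    decomposed = unicodedata.normalize("NFKD", text)
    # Assumption: only category "Mn" is treated as an accent. A real filter
    # may need to be wider or narrower than this.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(strip_accents("Müller"))
print(strip_accents("jiāng"))
```

Applying the same function to both the search term and the stored value before comparing them makes the comparison accent-insensitive. Note that NFKD also folds compatibility characters (for example, the ligature "ﬁ" becomes "fi"), which may or may not be desired.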
The short-term goal is to do the same as in EDS, which has a utility function for stripping accents. For the sake of consistency with case-sensitivity, accent-sensitivity should be "off" by default.
Long term it may be worthwhile to investigate better ways of searching, for example a method which also matches "ü" against "ue". See http://userguide.icu-project.org/collation/icu-string-search-service
Another, related request was to match foreign and Latin ways of writing the same text, for example 江 = jiāng (= Jiang when ignoring case and accents).
I've decided to treat this the same way as accent-insensitive search, meaning that the default will be as relaxed as possible: case-insensitive, accent-insensitive, and transliterated. I've finished the implementation (including updated tests) and now only need to check for regressions, then merge into master.
Here's the updated README:
For text values, the default search without explicit flags is
very tolerant, meaning that it ignores quite a few differences
between search term and value. The default search:
- transliterates any foreign script in search term and values
to Latin before comparison, thus finding 江 when searching
for Jiang and vice-versa
- is case-insensitive
- is accent-insensitive
Case and accent differences get removed after the optional
transliteration. Spaces between words always matter.
This behavior can be modified by giving additional,
optional flags after the search value:
'case-insensitive' - force case-insensitive search (available for the sake
of consistency and just in case, should the default ever change)
'case-sensitive' - force case-sensitive search
'accent-insensitive', 'accent-sensitive' - same for accents
'transliteration' - force transliteration, i.e. explicitly choose the default
'no-transliteration' - disable transliteration
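To illustrate how the flags in the README excerpt above could map onto the comparison, here is a rough Python sketch. It is not the actual implementation: `SearchOptions`, `parse_flags` and `normalize` are hypothetical names, the transliteration step is passed in as a plain function because the real transliterator is not described here, and letting a later flag override an earlier conflicting one is an assumption.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchOptions:
    # Defaults mirror the README: as relaxed as possible.
    case_sensitive: bool = False
    accent_sensitive: bool = False
    transliterate: bool = True


# Flag name -> (option field, value). Flag names are taken from the README.
FLAG_EFFECTS = {
    "case-insensitive": ("case_sensitive", False),
    "case-sensitive": ("case_sensitive", True),
    "accent-insensitive": ("accent_sensitive", False),
    "accent-sensitive": ("accent_sensitive", True),
    "transliteration": ("transliterate", True),
    "no-transliteration": ("transliterate", False),
}


def parse_flags(flags):
    """Turn a list of flag strings into SearchOptions."""
    settings = {}
    for flag in flags:
        if flag not in FLAG_EFFECTS:
            raise ValueError(f"unknown search flag: {flag!r}")
        name, value = FLAG_EFFECTS[flag]
        settings[name] = value  # assumption: a later flag wins over an earlier one
    return SearchOptions(**settings)


def normalize(text, options, transliterate=lambda s: s):
    """Bring a search term or a value into its comparable form."""
    # Transliteration comes first; case and accents are removed afterwards.
    if options.transliterate:
        text = transliterate(text)
    if not options.accent_sensitive:
        decomposed = unicodedata.normalize("NFKD", text)
        text = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    if not options.case_sensitive:
        text = text.casefold()
    return text


# Toy stand-in for a real transliterator, good for this one character only.
def toy_transliterate(s):
    return s.replace("江", "jiāng")


defaults = parse_flags([])
print(normalize("江", defaults, toy_transliterate) == normalize("Jiang", defaults, toy_transliterate))
```

Both the search term and the stored value go through `normalize` with the same options, so a match is then a plain string comparison on the normalized forms. Spaces are left untouched, in line with the README.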
Implemented in the "master" branch, included in 184.108.40.206.