From a bug report received via email:
I ran the following test (in synceval; I'm planning to do the same with EDS)
using the attached vCards (jp.vCards.tgz).
Basically these are the 3 names used in the vCards:
I used this as a reference:
In the txt files there is the dbus-monitor dump of the 3 search tests I made:
Here I used "" as filter and correctly got all 3 vCards
Here I used "[ 'any-contains' , '1' ]" as filter and correctly retrieved the first two vCards ( "1月" and "111" )
Here I used "[ 'any-contains' , '1月' ]" as filter and I retrieved the first two vCards ( "1月" and "111" )
I expected to see only the first one.
Here I used "[ 'any-contains' , '月' ]" as filter and I retrieved all the vCards
It seems that non-ASCII characters are not evaluated in the search options.
Created attachment 78766
Created attachment 78767
search pattern = match all
Created attachment 78768
search for "1"
Created attachment 78769
search for "月"
Created attachment 78770
search for "1月"
What is the locale that is set when doing this search?
There could be an interaction with boost::locale::fold_case() involved here, because the default mode for "any-contains" is case-insensitive.
The locale was:
The problem seems to be solved with:
(In reply to comment #7)
> The locale was:
> The problem seems to be solved with:
The POSIX locale probably didn't enable UTF-8 support, which broke the search for 1月 when each byte was lower-cased according to an ASCII character mapping. Any *.utf8 locale should be fine for searching. Picking the right one for the local country becomes more important for sorting.
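To illustrate the symptom, here is a small Python 3 sketch. It is a hypothetical model, not SyncEvolution's actual code: it assumes that a search term which cannot be handled as UTF-8 effectively loses its non-ASCII characters before matching. The helper names and the third contact name ("Bad", borrowed from the test data in this report) are assumptions made for the example. Under that assumption, "1月" degrades to "1" and "月" degrades to the empty string, which would give the results described in the report.

```python
# Hypothetical model of the symptom, NOT SyncEvolution's implementation.
# Assumption: the third name is "Bad"; the reporter's actual list is not shown.
NAMES = ["1月", "111", "Bad"]


def strip_non_ascii(text):
    """Drop every character that plain ASCII cannot represent."""
    return text.encode("ascii", errors="ignore").decode("ascii")


def any_contains(names, term, ascii_only=False):
    """Case-insensitive substring match over the names.

    With ascii_only=True the term is degraded to ASCII first, which models
    a search running without UTF-8 support (an assumption of this sketch).
    """
    if ascii_only:
        term = strip_non_ascii(term)
    needle = term.casefold()
    return [name for name in names if needle in name.casefold()]


# "月" is encoded as three bytes in UTF-8, none of them in the ASCII range.
utf8_bytes = "月".encode("utf-8")

# Degraded search: the term loses its non-ASCII part before matching.
broken_1gatsu = any_contains(NAMES, "1月", ascii_only=True)
broken_gatsu = any_contains(NAMES, "月", ascii_only=True)

# Unicode-aware search: the term is matched as a whole.
fixed_1gatsu = any_contains(NAMES, "1月")
fixed_gatsu = any_contains(NAMES, "月")

print(utf8_bytes)
print(broken_1gatsu, broken_gatsu)
print(fixed_1gatsu, fixed_gatsu)
```

The degraded search for "月" matches everything because an empty search term is contained in every name.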
In your Search.All.txt there was another bug: 111 must come before 1月 when using the Japanese collation. SyncEvolution was returning 1月 before 111 when using POSIX. Choosing ja_JP.UTF-8 as locale also fixes the sorting.
I've added a testpim.py test for Japanese. Adding more test cases for other regions will be simple, someone just needs to define test cases, searches and expected results:
@property("ENV", "LC_CTYPE=ja_JP.UTF-8 LC_ALL=ja_JP.UTF-8 LANG=ja_JP.UTF-8")
# All contacts.
('111', u'1月', 'Bad'),
# Query + expected results.
(([], ('111', u'1月', 'Bad')),
([['any-contains', '1']], ('111', u'1月')),
([['any-contains', u'1月']], (u'1月',)))
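For anyone who wants to draft such a table for another region before running it through testpim.py, the following Python 3 sketch checks a table of the same shape against a simple stand-in matcher. Everything here is an assumption made for illustration: run_query() and failures() are hypothetical helpers, not part of testpim.py, the empty list is assumed to mean "match all", and the outer list of a query is assumed to hold terms that must all match. The stand-in keeps the contacts in the order given and does no locale-aware sorting.

```python
# Stand-in harness for drafting test tables; NOT the real testpim.py code.

# All contacts, in the order the results are expected to come back.
CONTACTS = ('111', '1月', 'Bad')

# Query + expected results. An empty query is assumed to match everything.
CASES = (
    ([], ('111', '1月', 'Bad')),
    ([['any-contains', '1']], ('111', '1月')),
    ([['any-contains', '1月']], ('1月',)),
)


def run_query(contacts, query):
    """Return the contacts matching every term of the query, in input order."""
    def matches(name):
        for operation, value in query:
            if operation != 'any-contains':
                raise ValueError('unsupported operation: %s' % operation)
            # Case-insensitive, Unicode-aware substring test.
            if value.casefold() not in name.casefold():
                return False
        return True

    return tuple(name for name in contacts if matches(name))


def failures(contacts, cases):
    """Collect (query, expected, actual) for every case that does not match."""
    failed = []
    for query, expected in cases:
        actual = run_query(contacts, query)
        if actual != expected:
            failed.append((query, expected, actual))
    return failed


FAILED = failures(CONTACTS, CASES)
print('all cases pass' if not FAILED else FAILED)
```

A table that passes here still needs to be run against the real backend, because only that exercises the locale-dependent case folding and collation discussed above.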