|Summary:||PIM: unexpected search result|
|Product:||SyncEvolution||Reporter:||Patrick Ohly <patrick.ohly>|
|Component:||SyncEvolution||Assignee:||Patrick Ohly <patrick.ohly>|
|Status:||RESOLVED NOTABUG||QA Contact:|
search pattern  = match all
search for "1"
search for "月"
search for "1月"
Description Patrick Ohly 2013-05-02 06:52:29 UTC
From a bug report received via email:

I made the following test (in synceval, I'm planning to make the same with EDS) using the vCards Attached (jp.vCards.tgz);

Basically these are the 3 names used in the vCard:
- 1月
- 111
- Bad

I used as reference this: http://demo.icu-project.org/icu-bin/locexp?d_=en&x=col&_=ja

In the txt files there is the dbus-monitor dump of the 3 search tests I made:
- Search.All.txt
  Here I used "" as filter and I got correctly all the 3 vCards
- Search.1.txt
  Here I used "[ 'any-contains' , '1' ]" as filter and I retrieved correctly the first two vCards ( "1月" and "111" )
- Search.1月.txt
  Here I used "[ 'any-contains' , '1月' ]" as filter and I retrieved the first two vCards ( "1月" and "111" ). I expected to see only the first one.
- Search.月.txt
  Here I used "[ 'any-contains' , '月' ]" as filter and I retrieved all the vCards

It seems that the non pure ASCII characters are not evaluated in the search options.
Comment 2 Patrick Ohly 2013-05-02 06:54:02 UTC
Created attachment 78767
search pattern = match all
Comment 6 Patrick Ohly 2013-05-02 07:43:34 UTC
What is the locale that is set when doing this search? There could be an interaction with boost::locale::fold_case() involved here, because the default mode for "any-contains" is case-insensitive.
Comment 7 Eugenio Parodi 2013-05-02 08:39:40 UTC
The locale was:
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

It seems solved the problem with:
LANG=ja_JP.utf8
LANGUAGE=
LC_CTYPE="ja_JP.utf8"
LC_NUMERIC="ja_JP.utf8"
LC_TIME="ja_JP.utf8"
LC_COLLATE="ja_JP.utf8"
LC_MONETARY="ja_JP.utf8"
LC_MESSAGES="ja_JP.utf8"
LC_PAPER="ja_JP.utf8"
LC_NAME="ja_JP.utf8"
LC_ADDRESS="ja_JP.utf8"
LC_TELEPHONE="ja_JP.utf8"
LC_MEASUREMENT="ja_JP.utf8"
LC_IDENTIFICATION="ja_JP.utf8"
LC_ALL=ja_JP.utf8

Thanks.
Comment 8 Patrick Ohly 2013-05-03 06:34:48 UTC
(In reply to comment #7)
> The locale was:
> ...
> LC_COLLATE="POSIX"
> ...
> It seems solved the problem with:
> ...
> LC_COLLATE="ja_JP.utf8"

POSIX probably didn't enable UTF-8 support, thus breaking the 1月 when making each byte lower-case according to an ASCII character mapping.

Any *.utf8 locale should be fine for searching. Picking the right one for the local country becomes more important for sorting.
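To make the symptom easier to reason about, here is a minimal Python sketch. It is not SyncEvolution or Boost.Locale code, and the helper names (`fold_ascii_only`, `fold_unicode`, `any_contains`) are hypothetical. It assumes, as a simplification of the byte-wise lower-casing described above, that the non-ASCII part of a string is lost before the comparison. Under that assumption it reproduces the results from the original report.

```python
def fold_ascii_only(text):
    # Assumption for illustration: anything outside ASCII is dropped
    # before case folding, as a stand-in for a normalization step that
    # cannot handle UTF-8 input.
    return text.encode("ascii", errors="ignore").decode("ascii").lower()


def fold_unicode(text):
    # Unicode-aware case folding keeps non-ASCII characters intact.
    return text.casefold()


def any_contains(names, term, fold):
    # Case-insensitive substring search using the given folding function.
    needle = fold(term)
    return [name for name in names if needle in fold(name)]


names = ["1月", "111", "Bad"]

# With the lossy folding, "月" degenerates to the empty string, which is
# contained in every name, and "1月" degenerates to "1".
broken_kanji = any_contains(names, "月", fold_ascii_only)
broken_mixed = any_contains(names, "1月", fold_ascii_only)

# With Unicode-aware folding only the contact containing "月" matches.
fixed_kanji = any_contains(names, "月", fold_unicode)
fixed_mixed = any_contains(names, "1月", fold_unicode)
```

With the lossy folding, searching for "月" returns all three names and searching for "1月" returns "1月" and "111", which matches what the reporter saw under the POSIX locale.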
Comment 9 Patrick Ohly 2013-05-06 10:16:21 UTC
Eugenio, FYI...

In your Search.All.txt there was another bug: 111 must come before 1月 when using the Japanese collation. SyncEvolution was returning 1月 before 111 when using POSIX. Choosing ja_JP.UTF-8 as locale also fixes the sorting.

I've added a testpim.py test for Japanese. Adding more test cases for other regions will be simple; someone just needs to define test cases, searches and expected results:

    @timeout(60)
    @property("ENV", "LC_TYPE=ja_JP.UTF-8 LC_ALL=ja_JP.UTF-8 LANG=ja_JP.UTF-8")
    def testFilterJapanese(self):
        self.doFilter([u'''BEGIN:VCARD
VERSION:3.0
FN:1月
N:1月;;;04;
END:VCARD
''',
u'''BEGIN:VCARD
VERSION:3.0
FN:111
N:111;;;54;
END:VCARD
''',
u'''BEGIN:VCARD
VERSION:3.0
FN:Bad
N:Bad;;;08;
END:VCARD
'''
                       ],
                      # All contacts.
                      ('111', u'1月', 'Bad'),
                      # Query + expected results.
                      (([], ('111', u'1月', 'Bad')),
                       ([['any-contains', '1']], ('111', u'1月')),
                       ([['any-contains', u'1月']], (u'1月',)))
                      )
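For anyone who wants to compare collation orders outside of SyncEvolution, the following is a rough Python sketch using the standard `locale` module. The helper name `sort_with_collation` is hypothetical, and whether `ja_JP.UTF-8` is installed on a given system is an assumption, so the helper returns `None` when the locale cannot be set. The resulting order depends on the system's locale data and may differ from what SyncEvolution produces.

```python
import locale


def sort_with_collation(names, locale_name):
    # Remember the current collation setting so it can be restored.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        # The requested locale is not available on this system.
        return None
    try:
        # strxfrm() turns each string into a key that sorts according
        # to the active LC_COLLATE rules.
        return sorted(names, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


names = ["1月", "111", "Bad"]
c_order = sort_with_collation(names, "C")
ja_order = sort_with_collation(names, "ja_JP.UTF-8")  # None if not installed
```

Printing `c_order` and `ja_order` side by side shows whether the two locales disagree on the position of "1月" on a particular machine.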