64140 – PIM: unexpected search result

Status:	RESOLVED NOTABUG

Alias:	None

Product:	SyncEvolution
Classification:	Unclassified
Component:	SyncEvolution
Version:	1.3.99.3
Hardware:	Other All

Importance:	highest blocker
Assignee:	Patrick Ohly
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	55916

Reported:	2013-05-02 06:52 UTC by Patrick Ohly
Modified:	2013-05-06 10:16 UTC
CC List:	1 user

See Also:

Attachments
test vcards (290 bytes, text/plain) 2013-05-02 06:53 UTC, Patrick Ohly
search pattern [] = match all (5.08 KB, text/plain) 2013-05-02 06:54 UTC, Patrick Ohly
search for "1" (3.70 KB, text/plain) 2013-05-02 06:54 UTC, Patrick Ohly
search for "月" (4.95 KB, text/plain) 2013-05-02 06:55 UTC, Patrick Ohly
search for "1月" (3.91 KB, text/plain) 2013-05-02 06:55 UTC, Patrick Ohly

Description Patrick Ohly 2013-05-02 06:52:29 UTC

From a bug report received via email:

I ran the following test (in synceval; I'm planning to do the same with EDS)
using the attached vCards (jp.vCards.tgz).
These are the 3 names used in the vCards:
 -   1月
 -   111
 -   Bad
I used this as a reference:
http://demo.icu-project.org/icu-bin/locexp?d_=en&x=col&_=ja

The txt files contain the dbus-monitor dumps of the search tests I ran:

 - Search.All.txt
 Here I used "[]" as the filter and correctly got all 3 vCards.

 - Search.1.txt
 Here I used "[ 'any-contains' , '1' ]" as the filter and correctly retrieved the first two vCards ("1月" and "111").

 - Search.1月.txt
 Here I used "[ 'any-contains' , '1月' ]" as the filter and retrieved the first two vCards ("1月" and "111").
 I expected to see only the first one.

 - Search.月.txt
 Here I used "[ 'any-contains' , '月' ]" as the filter and retrieved all the vCards.

It seems that non-ASCII characters are not evaluated in the search filter.

Comment 1 Patrick Ohly 2013-05-02 06:53:33 UTC

Created attachment 78766
test vcards

Comment 2 Patrick Ohly 2013-05-02 06:54:02 UTC

Created attachment 78767
search pattern [] = match all

Comment 3 Patrick Ohly 2013-05-02 06:54:21 UTC

Created attachment 78768
search for "1"

Comment 4 Patrick Ohly 2013-05-02 06:55:08 UTC

Created attachment 78769
search for "月"

Comment 5 Patrick Ohly 2013-05-02 06:55:27 UTC

Created attachment 78770
search for "1月"

Comment 6 Patrick Ohly 2013-05-02 07:43:34 UTC

What is the locale that is set when doing this search?

There could be an interaction with boost::locale::fold_case() involved here, because the default mode for "any-contains" is case-insensitive.
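
To make the expected behaviour concrete, here is a minimal Python sketch of a case-insensitive "contains" match that folds case per Unicode character. It is only a model of the results the reporter expected, not SyncEvolution's implementation: the helper name any_contains and the NAMES list are invented for illustration, and Python's str.casefold() stands in for boost::locale::fold_case().

```python
# Hypothetical model of the expected 'any-contains' results; not SyncEvolution code.
NAMES = ["1月", "111", "Bad"]


def any_contains(names, needle):
    """Return the names that contain needle, ignoring case.

    str.casefold() works on Unicode characters, so non-ASCII text such as
    "月" is preserved and still takes part in the comparison.
    """
    folded = needle.casefold()
    return [name for name in names if folded in name.casefold()]


if __name__ == "__main__":
    for pattern in ("1", "1月", "月", "bad"):
        print(pattern, any_contains(NAMES, pattern))
```

With character-aware folding, searching for "1月" or "月" returns only the "1月" contact, which is the result the reporter expected.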

Comment 7 Eugenio Parodi 2013-05-02 08:39:40 UTC

The locale was:
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

The problem seems to be solved with:
LANG=ja_JP.utf8
LANGUAGE=
LC_CTYPE="ja_JP.utf8"
LC_NUMERIC="ja_JP.utf8"
LC_TIME="ja_JP.utf8"
LC_COLLATE="ja_JP.utf8"
LC_MONETARY="ja_JP.utf8"
LC_MESSAGES="ja_JP.utf8"
LC_PAPER="ja_JP.utf8"
LC_NAME="ja_JP.utf8"
LC_ADDRESS="ja_JP.utf8"
LC_TELEPHONE="ja_JP.utf8"
LC_MEASUREMENT="ja_JP.utf8"
LC_IDENTIFICATION="ja_JP.utf8"
LC_ALL=ja_JP.utf8

Thanks.

Comment 8 Patrick Ohly 2013-05-03 06:34:48 UTC

(In reply to comment #7)
> The locale was:
...
> LC_COLLATE="POSIX"
...
> The problem seems to be solved with:
...
> LC_COLLATE="ja_JP.utf8"

The POSIX locale probably didn't enable UTF-8 support, thus breaking "1月" when each byte is lower-cased according to an ASCII character mapping. Any *.utf8 locale should be fine for searching. Picking the right one for the local country becomes more important for sorting.
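
As an illustration of how an ASCII-only text path can produce exactly the reported results, the following Python sketch drops every character that ASCII cannot represent before comparing. This is an assumption made for demonstration: it is not taken from SyncEvolution or Boost.Locale, the helper names are invented, and the mechanism (discarding non-ASCII characters) is a stand-in that differs from the byte-wise lower-casing suggested above.

```python
# Hypothetical illustration only; helper names and mechanism are assumptions.
NAMES = ["1月", "111", "Bad"]


def to_ascii_lossy(text):
    """Drop all characters that plain ASCII cannot represent."""
    return text.encode("ascii", errors="ignore").decode("ascii")


def any_contains_ascii_only(names, needle):
    """Case-insensitive 'contains' match on the ASCII-only remainder.

    "1月" degrades to "1" and "月" degrades to the empty string, which is
    a substring of every name.
    """
    folded = to_ascii_lossy(needle).lower()
    return [name for name in names if folded in to_ascii_lossy(name).lower()]


if __name__ == "__main__":
    for pattern in ("1", "1月", "月"):
        print(pattern, any_contains_ascii_only(NAMES, pattern))
```

Under this assumption, "1月" matches both "1月" and "111", and "月" matches all three contacts, the same pattern as in the attached dbus-monitor dumps.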

Comment 9 Patrick Ohly 2013-05-06 10:16:21 UTC

Eugenio, FYI...

In your Search.All.txt there was another bug: 111 must come before 1月 when using the Japanese collation. SyncEvolution was returning 1月 before 111 when using POSIX. Choosing ja_JP.UTF-8 as locale also fixes the sorting.

I've added a testpim.py test for Japanese. Adding more test cases for other regions will be simple; someone just needs to define test cases, searches and expected results:

    @timeout(60)
    @property("ENV", "LC_CTYPE=ja_JP.UTF-8 LC_ALL=ja_JP.UTF-8 LANG=ja_JP.UTF-8")
    def testFilterJapanese(self):
         self.doFilter([u'''BEGIN:VCARD
VERSION:3.0
FN:1月
N:1月;;;04;
END:VCARD
''',

u'''BEGIN:VCARD
VERSION:3.0
FN:111
N:111;;;54;
END:VCARD
''',

u'''BEGIN:VCARD
VERSION:3.0
FN:Bad
N:Bad;;;08;
END:VCARD
'''
],
                       # All contacts.
                       ('111', u'1月', 'Bad'),
                       # Query + expected results.
                       (([], ('111', u'1月', 'Bad')),
                        ([['any-contains', '1']], ('111', u'1月')),
                        ([['any-contains', u'1月']], (u'1月',)))
                       )
