Bug 64177

Summary:	PIM: more complex searching (done) and sorting (open)
Product:	SyncEvolution	Reporter:	Patrick Ohly <patrick.ohly>
Component:	SyncEvolution	Assignee:	SyncEvolution Community <syncevolution-issues>
Status:	RESOLVED MOVED	QA Contact:
Severity:	enhancement
Priority:	medium	CC:	syncevolution-issues
Version:	1.3.99.3
Hardware:	Other
OS:	All
Whiteboard:
Bug Depends on:
Bug Blocks:	56141

Description Patrick Ohly 2013-05-03 07:23:46 UTC

Searching could be extended with and/or operations and more complex match terms, like "field X contains/starts-with/ends-with Y".

Sorting could be extended to sort ascending and descending and with a list of fields to use for comparison.

Need full specification for the desired feature.

Comment 1 Patrick Ohly 2013-05-27 17:02:26 UTC

(In reply to comment #0)
> Searching could be extended with and/or operations and more complex match
> terms, like "field X contains/starts-with/ends-with Y".
[...]
> Need full specification for the desired feature.

Here's a proposal for extended searching. This supersedes the current specification in a way which keeps all old searches working:

--------------------------------------------------
Searching
=========

Supported searches:

    [ ] - An empty list matches all contacts.

    [ [ <search> ], [ 'limit', <number> ] ] - a 'limit' search
    term with a number as parameter (formatted as string) can be added
    at the top level term to truncate the search results
    after a certain number of contacts. Example: Search([['any-contains',
    'Joe'], ['limit', '10']]) => return the first 10 Joes.

    As with any other search, the resulting view will be updated if
    contact data changes.

    The limit must not be changed in a RefineSearch(). A 'limit' term
    may (but doesn't have to) be given. If it is given, its value must
    match the value set when creating the search. This limitation
    simplifies the implementation and its testing. The limitation
    could be removed if there is sufficient demand.
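A minimal Python sketch of the 'limit' rules above, assuming the filter arrives as nested Python lists. The helpers `split_limit` and `check_refine_limit` are hypothetical names for illustration, not part of the PIM Manager API.

```python
def split_limit(search_filter):
    """Separate an optional top-level ['limit', '<number>'] term.

    Returns (remaining_terms, limit), where limit is an int or None.
    """
    # A filter whose first element is a string (e.g. ['or', ...]) is a
    # single search term; 'limit' is only recognized among top-level terms.
    if search_filter and isinstance(search_filter[0], str):
        return search_filter, None
    limit = None
    terms = []
    for term in search_filter:
        if term and term[0] == 'limit':
            limit = int(term[1])  # the number is sent formatted as a string
        else:
            terms.append(term)
    return terms, limit


def check_refine_limit(original_limit, refine_filter):
    """Reject a RefineSearch() filter that tries to change the limit."""
    _, limit = split_limit(refine_filter)
    # Omitting 'limit' is fine; giving a different value is not.
    if limit is not None and limit != original_limit:
        raise ValueError(
            f'limit must stay {original_limit}, got {limit} in RefineSearch()')
```

For example, `split_limit([['any-contains', 'Joe'], ['limit', '10']])` yields `([['any-contains', 'Joe']], 10)`.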

    [ [ <search> ] ] - the same as [ <search> ]

    [ 'or', <search1>, <search2>, ... ] - combines 0 to n other
    searches and results in a match if any of the sub-searches
    matches. ['or'] without any sub-search does not match.

    [ 'and', <search1>, <search2>, ... ] - like 'or', but matches if
    and only if all of the sub-searches match.
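The 'or'/'and' semantics can be sketched as a small recursive evaluator. This is an illustration only: it assumes a hypothetical contact model (a dict from field path to a list of strings), supports just 'contains' as a leaf term, and assumes that an empty ['and'] matches, which the text above does not specify.

```python
def matches(contact, search):
    """Evaluate one search term against a contact (hypothetical model).

    The contact is assumed to be a dict mapping a field path such as
    'emails/value' to the list of strings stored for that field.
    """
    operation = search[0]
    if operation == 'or':
        # ['or'] without sub-searches: any() over nothing is False, as specified.
        return any(matches(contact, sub) for sub in search[1:])
    if operation == 'and':
        # Assumption: ['and'] without sub-searches matches (all() over nothing).
        return all(matches(contact, sub) for sub in search[1:])
    if operation == 'contains':
        field, text = search[1], search[2]
        # Case-insensitive by default, like the text searches described below.
        return any(text.casefold() in value.casefold()
                   for value in contact.get(field, []))
    raise ValueError(f'unsupported search term: {operation!r}')
```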

    [ 'phone', '<number>' ] - Look up a valid phone number (= "caller ID").
    The country code for the current locale is added if no country
    code was given in the number. Phone numbers in the unified address
    book must start with the resulting full number, after being normalized
    the same way.

    In other words:
    - Formatting does not matter.
    - Alpha characters are aliases for numbers on the keypad and match
      their corresponding number.
    - Additional digits in the address book are ignored, only
      the prefix must match (extensions may or may not be included in
      <number>).
    - Phone numbers in the address book which cannot be normalized
      cannot be matched.
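A rough Python sketch of the 'phone' lookup, assuming a hypothetical `LOCALE_COUNTRY_CODE` for the current locale and a naive normalization (a leading '+' means the country code is present, otherwise one trunk prefix '0' is dropped before adding the country code). Real code would rely on a phone number library such as libphonenumber; `normalize` and `phone_lookup` are illustrative names only.

```python
# Keypad letter groups as printed on phone keypads (ITU-T E.161).
KEYPAD = {letter: digit
          for digit, letters in {'2': 'abc', '3': 'def', '4': 'ghi',
                                 '5': 'jkl', '6': 'mno', '7': 'pqrs',
                                 '8': 'tuv', '9': 'wxyz'}.items()
          for letter in letters}
FORMATTING = set(' -()./')
LOCALE_COUNTRY_CODE = '49'  # hypothetical: country code of the current locale


def normalize(number, country_code=LOCALE_COUNTRY_CODE):
    """Return '<country code><digits>', or None if it cannot be normalized."""
    text = number.strip()
    has_country_code = text.startswith('+')
    body = text[1:] if has_country_code else text
    digits = []
    for char in body:
        if char in '0123456789':
            digits.append(char)
        elif char.lower() in KEYPAD:
            digits.append(KEYPAD[char.lower()])  # alpha alias for a key
        elif char not in FORMATTING:
            return None  # unknown character: this number can never match
    if not digits:
        return None
    national = ''.join(digits)
    if has_country_code:
        return national
    # Simplification: drop a single trunk prefix '0' (country-specific in reality).
    if national.startswith('0'):
        national = national[1:]
    return country_code + national


def phone_lookup(caller_id, stored_numbers):
    """True if a normalized stored number starts with the normalized caller ID."""
    wanted = normalize(caller_id)
    if wanted is None:
        return False
    for stored in stored_numbers:
        candidate = normalize(stored)
        # Extra trailing digits in the address book (e.g. an extension) are ignored.
        if candidate is not None and candidate.startswith(wanted):
            return True
    return False
```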

    [ 'is|contains|begins_with|ends_with', '<field>', '<text>',
    '<flags>' ] - compares a specified field against the search
    text. For the 'is' operation, the entire field must match, for
    'contains' anywhere inside the value, for 'begins_with' at the
    beginning and for 'ends_with' at the end.
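The four field operations map directly onto Python string methods. A minimal sketch of the default case-insensitive, accent-sensitive comparison and the 'case-sensitive' flag described below, using the operation spelling from this proposal; `compare` is a hypothetical helper.

```python
OPERATIONS = {
    'is': lambda value, text: value == text,
    'contains': lambda value, text: text in value,
    'begins_with': lambda value, text: value.startswith(text),
    'ends_with': lambda value, text: value.endswith(text),
}


def compare(operation, value, text, *flags):
    """Apply one field operation to a single text value.

    Case-insensitive unless 'case-sensitive' is given; accents always matter.
    """
    if 'case-sensitive' not in flags:
        # casefold() ignores case but keeps accents: 'Ü' becomes 'ü', not 'u'.
        value, text = value.casefold(), text.casefold()
    return OPERATIONS[operation](value, text)
```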

    Fields are referenced as in the contact dictionary (see below), using
    multiple path components if necessary. Supported for matching are:
    'full-name' - string
    'nickname' - string
    'structured-name/family' - string
    'structured-name/given' - string
    'structured-name/additional' - string
    'phones/value' - telephone number
    'emails/value' - string
    'addresses/po-box' - string
    'addresses/extension' - string
    'addresses/street' - string
    'addresses/locality' - string
    'addresses/region' - string
    'addresses/postal-code' - string
    'addresses/country' - string

    The fields referencing value lists ('phones', 'emails', 'addresses')
    check against any of the entries in these lists.
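One way to picture how a path such as 'emails/value' fans out over a value list. The sketch assumes a hypothetical contact dictionary in which 'phones', 'emails' and 'addresses' are lists of dicts; `field_values` and `field_contains` are illustrative names.

```python
def field_values(contact, path):
    """Collect every string reachable via a path such as 'emails/value'."""
    nodes = [contact]
    for key in path.split('/'):
        found = []
        for node in nodes:
            # A value list ('phones', 'emails', 'addresses') contributes
            # each of its entries; a plain dict contributes itself.
            entries = node if isinstance(node, list) else [node]
            found.extend(entry[key] for entry in entries
                         if isinstance(entry, dict) and key in entry)
        nodes = found
    return [value for value in nodes if isinstance(value, str)]


def field_contains(contact, path, text):
    """A field term matches if *any* of the collected values matches."""
    return any(text.casefold() in value.casefold()
               for value in field_values(contact, path))
```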

    Except for 'phones/value', all values are treated as text values.
    For text values, the default search without explicit flags is
    case-insensitive and accent-sensitive. Spaces between words
    matter. This behavior can be modified by giving additional,
    optional flags after the search value:
    'case-insensitive' - force case-insensitive search (available for the sake
    of consistency and just in case, should the default ever change)
    'case-sensitive' - force case-sensitive search

    For telephone numbers, only digits are compared. Latin alphabetic
    characters are treated as aliases for digits as they typically
    occur on a keypad or old rotary dial phones ('A', 'b', 'c' map to
    '2', etc.).
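For phone fields, the digits-only comparison with keypad aliases could look like this minimal sketch (standard E.161 keypad letter groups; `phone_digits` and `phone_contains` are hypothetical helper names).

```python
import string

# Keypad letter groups (ITU-T E.161); together they cover all 26 Latin letters.
KEYPAD = {letter: digit
          for digit, letters in {'2': 'abc', '3': 'def', '4': 'ghi',
                                 '5': 'jkl', '6': 'mno', '7': 'pqrs',
                                 '8': 'tuv', '9': 'wxyz'}.items()
          for letter in letters}


def phone_digits(text):
    """Keep only digits, mapping Latin letters to their keypad digit."""
    digits = []
    for char in text:
        if char in string.digits:
            digits.append(char)
        elif char in string.ascii_letters:
            digits.append(KEYPAD[char.lower()])
        # Everything else (spaces, '+', '-', parentheses, ...) is formatting.
    return ''.join(digits)


def phone_contains(stored_number, user_input):
    """'contains' on a phone field: compare digit strings only."""
    return phone_digits(user_input) in phone_digits(stored_number)
```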

    If the full name was not set explicitly for a contact, the
    concatenation of the given, middle and family names with a space as
    separator is used instead when matching against the 'full-name'
    field.
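A sketch of the 'full-name' fallback, assuming the hypothetical contact layout {'full-name': ..., 'structured-name': {'given': ..., 'additional': ..., 'family': ...}} and assuming that missing name parts are skipped rather than producing double spaces.

```python
def effective_full_name(contact):
    """Full name used for matching 'full-name' (hypothetical contact layout)."""
    full_name = contact.get('full-name')
    if full_name:
        return full_name
    name = contact.get('structured-name', {})
    # Given, middle ('additional') and family name, separated by one space.
    parts = [name.get(key) for key in ('given', 'additional', 'family')]
    return ' '.join(part for part in parts if part)
```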

    Using the current syntax it is not possible to define searches
    where the *same* value must meet different criteria ("cell phone
    number containing the digits 1234"). Something like that could be
    added as a future extensions, for example by allowing search
    values to have more complex types than the simple '<text>'.  term
    with a more complex type.

    [ 'any-contains', '<text>', <flags> ] - Sub-string search for
    <text> in the following contact values: first, middle or last
    name, formatted name, nick name, phone number, or email
    address. Optional flags include: 'case-insensitive' (the default),
    'case-sensitive'.

    This search is equivalent to:
    [ 'or',
      [ 'contains', 'structured-name/given', '<text>', <flags> ],
      [ 'contains', 'structured-name/additional', '<text>', <flags> ],
      [ 'contains', 'structured-name/family', '<text>', <flags> ],
      [ 'contains', 'full-name', '<text>', <flags> ],
      [ 'contains', 'emails/value', '<text>', <flags> ],
      [ 'contains', 'phones/value', '<text>']
    ]
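The equivalence can also be produced mechanically. This hypothetical `expand_any_contains` helper rewrites an 'any-contains' term into the 'or' term listed above, passing the flags to the text fields but not to 'phones/value'.

```python
ANY_CONTAINS_TEXT_FIELDS = (
    'structured-name/given',
    'structured-name/additional',
    'structured-name/family',
    'full-name',
    'emails/value',
)


def expand_any_contains(term):
    """Rewrite ['any-contains', text, *flags] into the equivalent 'or' term."""
    _, text, *flags = term
    expansion = ['or']
    for field in ANY_CONTAINS_TEXT_FIELDS:
        expansion.append(['contains', field, text, *flags])
    # Flags only apply to text values; phone numbers compare digits only.
    expansion.append(['contains', 'phones/value', text])
    return expansion
```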

Note that lookup and search are different: the former is based on a
valid number, the latter on user input.

A 'phone' lookup can compare normalized numbers including the country
code, to ensure that the lookup is exact and does not mismatch numbers
from different countries. Heuristics like suffix matching do not do
this correctly in all cases.

An 'any-contains' search is based on user input, which might contain
just some digits in the middle of the phone number. The search ignores
formatting in both input and address book.

Compound searches with 'and' and 'or' are evaluated lazily, from the
first to the last sub-search. Therefore it makes sense to list
sub-searches that are more likely to match first.
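Lazy evaluation here means short-circuiting, as with Python's `any()` and `all()` over a generator. The sketch below uses hypothetical sub-searches that record when they run, to show that ordering changes the cost but not the result.

```python
evaluated = []


def sub_search(name, result):
    """Hypothetical sub-search that records when it actually runs."""
    def run():
        evaluated.append(name)
        return result
    return run


def evaluate(operation, sub_searches):
    """Evaluate 'or' (anything else is treated as 'and') from first to last."""
    # The generator lets any()/all() stop as soon as the outcome is known.
    runs = (sub() for sub in sub_searches)
    return any(runs) if operation == 'or' else all(runs)
```

Listing a likely match first under 'or' skips the remaining sub-searches; the same holds for a likely mismatch first under 'and'.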
--------------------------------------------

Comment 2 Eugenio Parodi 2013-05-28 13:09:39 UTC

These new extended search criteria are OK.
Just one note regarding:
      -   "is not possible to define searches where
           the *same* value must meet different criteria"
Does this mean that a query like this is not allowed?

  [ 'or',
    [ 'and',
      [ 'begins_with', 'structured-name/given' , 'a' ],
      [ 'begins_with', 'structured-name/family', 'b' ]
    ],
    [ 'and',
      [ 'begins_with', 'structured-name/given' , 'b' ],
      [ 'begins_with', 'structured-name/family', 'a' ]
    ]
  ]

If not, can it be accepted in the future?
Thanks.

Comment 3 Patrick Ohly 2013-05-28 14:17:34 UTC

(In reply to comment #2)
> These new extended search criteria are OK.
> Just one note regarding:
>       -   "is not possible to define searches where
>            the *same* value must meet different criteria"
> Does this mean that a query like this is not allowed?
> 
>   [ 'or',
>     [ 'and',
>       [ 'begins_with', 'structured-name/given' , 'a' ],
>       [ 'begins_with', 'structured-name/family', 'b' ]
>     ],
>     [ 'and',
>       [ 'begins_with', 'structured-name/given' , 'b' ],
>       [ 'begins_with', 'structured-name/family', 'a' ]
>     ]
>   ]
>

That query itself is fine and will be supported. It works as intended because there is only one structured-name.

The comment is about value lists, like telephone numbers. For example, suppose you have two telephone numbers:
TEL:1234
TEL:5678

Now you search
['and',
  ['contains', 'phones/value', '1'],

> If not, can it be accepted in the future?
> Thanks.

Comment 4 Patrick Ohly 2013-05-28 14:25:43 UTC

[Please ignore the previous comment.]

(In reply to comment #2)
> These new extended search criteria are OK.
> Just one note regarding:
>       -   "is not possible to define searches where
>            the *same* value must meet different criteria"
> Does this mean that a query like this is not allowed?
> 
>   [ 'or',
>     [ 'and',
>       [ 'begins_with', 'structured-name/given' , 'a' ],
>       [ 'begins_with', 'structured-name/family', 'b' ]
>     ],
>     [ 'and',
>       [ 'begins_with', 'structured-name/given' , 'b' ],
>       [ 'begins_with', 'structured-name/family', 'a' ]
>     ]
>   ]
>

That query itself is fine and will be supported. It works as intended because there is only one structured-name.

The comment was about value lists, like telephone numbers. For example, suppose you have two telephone numbers in the same contact:
TEL:1234
TEL:5678

Now you search
['and',
  ['contains', 'phones/value', '1'],
  ['contains', 'phones/value', '5']
]

This will match the contact, because each of the terms combined with 'and' has a match in the contact: the first term will match '1234' and the second '5678'. This follows from the strict mathematical definition of the operations, but is admittedly not immediately obvious.
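The corner case can be reproduced in a few lines of Python. This is a sketch under a simplified, hypothetical contact model (a plain list of phone number strings), not the actual PIM Manager code.

```python
contact = {'phones': ['1234', '5678']}


def contains_phone(contact, digits):
    """One 'contains' term: true if any of the numbers contains the digits."""
    return any(digits in number for number in contact['phones'])


# ['and', ['contains', 'phones/value', '1'], ['contains', 'phones/value', '5']]
# evaluates each term separately, so each may be satisfied by a different number.
and_result = contains_phone(contact, '1') and contains_phone(contact, '5')

# Requiring one and the same number to meet both criteria is a different
# question, which the current filter syntax cannot express.
same_number = any('1' in n and '5' in n for n in contact['phones'])
```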

I think it is a corner case. It occurred to me when documenting the semantics and I wanted to write it down. Obviously, better documentation would not have triggered a question ;-} Please suggest a better wording.

Short status update: recursive queries and "or"/"and" are working.

Comment 5 Patrick Ohly 2013-05-28 19:34:07 UTC

This paragraph clearly needs more work:

    Using the current syntax it is not possible to define searches
    where the *same* value must meet different criteria ("cell phone
    number containing the digits 1234"). Something like that could be
    added as a future extensions, for example by allowing search
    values to have more complex types than the simple '<text>'.  term
    with a more complex type.

What I thought I had typed is:
    Using the current syntax it is not possible to define searches
    where the *same* value in a *value list* ...
                           ^^^^^^^^^^^^^^^^^

"term with a more complex type." at the end needs to be removed.

Comment 6 Patrick Ohly 2013-05-28 20:49:06 UTC

I noticed an inconsistency: the operations should use hyphens instead of underscores, like the other string constants. I noticed because I mistyped them in the tests at first.

I'll make it consistent, so now the operations are:

-    [ 'is|contains|begins_with|ends_with', '<field>', '<text>',
+    [ 'is|contains|begins-with|ends-with', '<field>', '<text>',

Comment 7 Patrick Ohly 2013-05-28 21:13:34 UTC

Field filters are also implemented and pass my tests, so I'll kick off a more complete test run and start packaging a snapshot tomorrow.

Comment 8 Patrick Ohly 2013-05-29 07:21:48 UTC

Enhanced searching is in master, see:

commit 236a89fd86dec0bf77f1cd00ca767695f31b8889
Author: Patrick Ohly <patrick.ohly@intel.com>
Date:   Tue May 28 16:25:19 2013 +0200

    PIM: document enhanced searching (search part of FDO #64177)
    
    Documents 'or', 'and' and new per-field
    'is|contains|begins_with|ends_with' operations.

commit 34241881ae10b0bc24038ddbb7b53979e8825b28
Author: Patrick Ohly <patrick.ohly@intel.com>
Date:   Tue May 28 22:55:26 2013 +0200

    PIM testing: test field tests
    
    doFilter() gets extended to take (<full name>, <vcard>) tuples in
    addition to the full name alone. Then this is used to create one large
    vCard that is suitable for testing (field content unique, all fields
    set, etc.) with various filters. All field tests are covered with at
    least one positive and one negative case.

commit c922aed0f2ccb0ddcec969f4384e878ca3859385
Author: Patrick Ohly <patrick.ohly@intel.com>
Date:   Tue May 28 22:51:48 2013 +0200

    PIM: implement 'is/contains/begins-with/ends-with'
    
    The operation is a runtime parameter of different classes, whereas
    extracting the right values to compare via the operation is hard-coded
    at compile time. This is a rather arbitrary compromise between code
    duplication, simplicity and performance (which, in fact, was not
    measured at all).
    
    The code for selecting case-sensitivity and the normalization before
    the string operations is shared with the older 'any-contains'
    operation.


commit 3796f3cdde544e54fa2190e8029889c3b9c4ffc4
Author: Patrick Ohly <patrick.ohly@intel.com>
Date:   Tue May 28 16:27:37 2013 +0200

    PIM testing: test case for 'and' and 'or'
    
    The new TestContacts.testFilterLogic uses the same infrastructure as
    the language tests and covers some combinations of 'and' and 'or'.
    
    In some cases, Python lists had to be avoided in favor of tuples,
    because Python cannot automatically map the lists to an 'av' array of
    variants, while a tuple, which enumerates its types, can be sent and
    gets accepted by PIM Manager.
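
A hypothetical helper in the spirit of this commit: it keeps the outer 'av' array as a list and converts every nested search term into tuples before the filter is handed to dbus-python. The helper names are illustrative, and the D-Bus call itself is not shown.

```python
def as_tuple(term):
    """Recursively convert a nested list term into nested tuples."""
    if isinstance(term, (list, tuple)):
        return tuple(as_tuple(item) for item in term)
    return term


def to_dbus_filter(search):
    """Keep the outer 'av' array as a list; turn each term into a tuple."""
    return [as_tuple(term) for term in search]
```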

commit b42e093a608c367b118ecc5844a0018a978516fb
Author: Patrick Ohly <patrick.ohly@intel.com>
Date:   Tue May 28 16:26:45 2013 +0200

    PIM: implement 'and' and 'or'
    
    Implementation falls naturally into the new framework, with
    special logic filters combining the results of sub-filters.

commit 9942cecd8435a2b1e9602aa1b4d21be19240eef9
Author: Patrick Ohly <patrick.ohly@intel.com>
Date:   Tue May 28 15:22:58 2013 +0200

    PIM: support recursive search filter
    
    This changes the signature of the filter parameter in Search(),
    RefineSearch() and ReplaceSearch() from 'as' (array of strings) to
    'av' (array of variants). Allowed entries in the variant are arrays
    containing strings and/or other such arrays (recursive!).
    
    A single string as value is not supported, which is the reason why
    'av' instead of just plain 'v' makes sense (mismatches can already be
    found in the sender, potentially at compile time). It also helps
    Python choose the right type when asked to send an empty list. When
    just using 'v', Python cannot decide automatically.
    
    Error messages include a backtrace of all terms that the current,
    faulty term was included in. That helps to locate the error in a
    potentially larger filter.
    
    The scope of supported searches has not changed (yet).
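
A Python sketch of the validation described in this commit, including a backtrace of enclosing terms in the error message. It only mirrors the described behavior; `FilterError`, `check_term` and `check_filter` are hypothetical names, not the actual C++ implementation.

```python
class FilterError(ValueError):
    """Invalid search filter; the message lists the enclosing terms."""


def check_term(term, enclosing=()):
    """A term must be an array whose items are strings or such arrays."""
    if not isinstance(term, (list, tuple)):
        backtrace = ''.join(f'\n  in {outer!r}' for outer in enclosing)
        raise FilterError(f'expected an array, got {term!r}{backtrace}')
    for item in term:
        if not isinstance(item, str):
            check_term(item, enclosing + (term,))


def check_filter(entries):
    """Top level ('av'): every entry must itself be an array, not a string."""
    for entry in entries:
        check_term(entry)
```

A filter such as `[['or', ['any-contains', 'Joe'], ['limit', 10]]]` is rejected because `10` is neither a string nor an array, and the message names both enclosing terms.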

commit 4a617c1cc98e7d85f020d35860f09647443e147c
Author: Patrick Ohly <patrick.ohly@intel.com>
Date:   Tue May 28 15:13:00 2013 +0200

    GDBus GIO: support recursive variant with one type
    
    The signature of the new type is just a 'v' for variant. Recursion is
    only supported in combination with std::vector. This is done to keep
    implementation simpler. std::vector was chosen over other collections
    because it is easier to work with in method calls, the main purpose
    of the new type.
    
    The conversion from D-Bus messages accepts anything which can be mapped
    into the variant: arrays, tuples, plain types, and of course variants
    containing those. This helps with backward compatibility (one can
    turn an interface which took a fixed type like std::vector<std::string>
    into something with a recursive variant) and with Python, because
    Python sends arrays and tuples without converting into variants
    when the app uses those simpler types.
    
    One caveat exists with sending an empty list in Python: when using 'v'
    as signature in the interface, Python doesn't know how to send the
    empty list without assistance by the programmer (as in dbus.Array([],
    signature='s')). If possible, avoid the plain 'v' in an interface in
    favor of something like 'av' (= std::vector<boost::variant>).


This should resolve enhanced searching. Enhanced sorting is still open, but less important and still undefined -> lowering priority and deferring it.

Comment 9 GitLab Migration User 2018-10-13 12:39:32 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/SyncEvolution/syncevolution/issues/26.
