Summary: | PIM: more complex searching (done) and sorting (open) | ||
---|---|---|---|
Product: | SyncEvolution | Reporter: | Patrick Ohly <patrick.ohly> |
Component: | SyncEvolution | Assignee: | SyncEvolution Community <syncevolution-issues> |
Status: | RESOLVED MOVED | QA Contact: | |
Severity: | enhancement | ||
Priority: | medium | CC: | syncevolution-issues |
Version: | 1.3.99.3 | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
Bug Depends on: | |||
Bug Blocks: | 56141 |
Description
Patrick Ohly
2013-05-03 07:23:46 UTC
(In reply to comment #0)
> Searching could be extended with and/or operations and more complex match
> terms, like "field X contains/starts-with/ends-with Y".
[...]
> Need full specification for the desired feature.

Here's a proposal for extended searching. This supersedes the current
specification in a way which keeps all old searches working:

--------------------------------------------------

Searching
=========

Supported searches:

 [ ]
   - An empty list matches all contacts.

 [ [ <search> ], [ 'limit', <number> ] ]
   - a 'limit' search term with a number as parameter (formatted as
     string) can be added to the top-level term to truncate the search
     results after a certain number of contacts. Example:
       Search([['any-contains', 'Joe'], ['limit', '10']])
       => return the first 10 Joes.
     As with any other search, the resulting view will be updated if
     contact data changes.

     The limit must not be changed in a RefineSearch(). A 'limit' term
     may (but doesn't have to) be given. If it is given, its value must
     match the value set when creating the search. This limitation
     simplifies the implementation and its testing. The limitation
     could be removed if there is sufficient demand.

 [ [ <search> ] ]
   - the same as [ <search> ]

 [ 'or', <search1>, <search2>, ... ]
   - combines 0 to n other searches and results in a match if any of
     the sub-searches matches. ['or'] without any sub-search does not
     match.

 [ 'and', <search1>, <search2>, ... ]
   - like 'or', but matches if and only if all of the sub-searches
     match.

 [ 'phone', '<number>' ]
   - Look up a valid phone number (= "caller ID"). The country code for
     the current locale is added if no country code was given in the
     number. Phone numbers in the unified address book must start with
     the resulting full number, after being normalized the same way.
     In other words:
     - Formatting does not matter.
     - Alpha characters are aliases for numbers on the keypad and match
       their corresponding number.
     - Additional digits in the address book are ignored, only the
       prefix must match (extensions may or may not be included in
       <number>).
     - Phone numbers in the address book which cannot be normalized
       cannot be matched.

 [ 'is|contains|begins_with|ends_with', '<field>', '<text>', '<flags>' ]
   - compares a specified field against the search text. For the 'is'
     operation, the entire field must match; for 'contains', the text
     may occur anywhere inside the value; for 'begins_with', it must
     occur at the beginning and for 'ends_with' at the end.

     Fields are referenced as in the contact dictionary (see below),
     using multiple path components if necessary. Supported for
     matching are:
       'full-name'                  - string
       'nickname'                   - string
       'structured-name/family'     - string
       'structured-name/given'      - string
       'structured-name/additional' - string
       'phones/value'               - telephone number
       'emails/value'               - string
       'addresses/po-box'           - string
       'addresses/extension'        - string
       'addresses/street'           - string
       'addresses/locality'         - string
       'addresses/region'           - string
       'addresses/postal-code'      - string
       'addresses/country'          - string

     The fields referencing value lists ('phones', 'emails',
     'addresses') check against any of the entries in these lists.

     Except for 'phones/value', all values are treated as text values.

     For text values, the default search without explicit flags is
     case-insensitive and accent-sensitive. Spaces between words
     matter. This behavior can be modified by giving additional,
     optional flags after the search value:
       'case-insensitive' - force case-insensitive search (available
                            for the sake of consistency and just in
                            case, should the default ever change)
       'case-sensitive'   - force case-sensitive search

     For telephone numbers, only digits are compared. Latin alphabetic
     characters are treated as aliases for digits as they typically
     occur on a keypad or old rotary dial phones ('A', 'b', 'c' map to
     '2', etc.).

     If the full name was not set explicitly for a contact, the
     concatenation of the given, middle and family name with a space
     as separator is used instead when matching against the
     'full-name' field.

     Using the current syntax it is not possible to define searches
     where the *same* value must meet different criteria ("cell phone
     number containing the digits 1234"). Something like that could be
     added as a future extension, for example by allowing search
     values to have more complex types than the simple '<text>'.
     term with a more complex type.

 [ 'any-contains', '<text>', <flags> ]
   - Sub-string search for <text> in the following contact values:
     first, middle or last name, formatted name, nick name, phone
     number, or email address. Optional flags include:
     'case-insensitive' (the default), 'case-sensitive'.

     This search is equivalent to:
       [ 'or',
         [ 'contains', 'structured-name/given', '<text>', <flags> ],
         [ 'contains', 'structured-name/additional', '<text>', <flags> ],
         [ 'contains', 'structured-name/family', '<text>', <flags> ],
         [ 'contains', 'full-name', '<text>', <flags> ],
         [ 'contains', 'emails/value', '<text>', <flags> ],
         [ 'contains', 'phones/value', '<text>'] ]

Note that lookup and search are different: the former is based on a
valid number, the latter on user input. A 'phone' lookup can compare
normalized numbers including the country code, to ensure that the
lookup is exact and does not mismatch numbers from different
countries. Heuristics like suffix matching do not do this correctly in
all cases. An 'any-contains' search is based on user input, which
might contain just some digits in the middle of the phone number. The
search ignores formatting in both input and address book.

Compound searches with 'and' and 'or' are evaluated lazily, from the
first to the last sub-search. Therefore it makes sense to list
sub-searches that are more likely to match first.

--------------------------------------------

These new extended searching criteria are ok.
Just one note regarding:
 - "is not possible to define searches where
    the *same* value must meet different criteria"
This means that a query like this is not allowed?

[ 'or',
   [ 'and',
      [ 'begins_with', 'structured-name/given' , 'a' ],
      [ 'begins_with', 'structured-name/family', 'b' ]
   ],
   [ 'and',
      [ 'begins_with', 'structured-name/given' , 'b' ],
      [ 'begins_with', 'structured-name/family', 'a' ]
   ]
]

If not, can it be accepted in the future?
Thanks.

(In reply to comment #2)
> These new extended searching criteria are ok.
> Just one note regarding:
>  - "is not possible to define searches where
>     the *same* value must meet different criteria"
> This means that a query like this is not allowed?
>
> [ 'or',
>    [ 'and',
>       [ 'begins_with', 'structured-name/given' , 'a' ],
>       [ 'begins_with', 'structured-name/family', 'b' ]
>    ],
>    [ 'and',
>       [ 'begins_with', 'structured-name/given' , 'b' ],
>       [ 'begins_with', 'structured-name/family', 'a' ]
>    ]
> ]
>

That query itself is fine and will be supported. It works as intended
because there is only one structured-name.

The comment is about value lists, like telephone numbers. For example,
suppose you have two telephone numbers:
TEL:1234
TEL:5678

Now you search ['and', ['contains', 'phones/value', '1'],

> If not, can it be accepted in the future?
> Thanks.

[Please ignore the previous comment.]

(In reply to comment #2)
> These new extended searching criteria are ok.
> Just one note regarding:
>  - "is not possible to define searches where
>     the *same* value must meet different criteria"
> This means that a query like this is not allowed?
>
> [ 'or',
>    [ 'and',
>       [ 'begins_with', 'structured-name/given' , 'a' ],
>       [ 'begins_with', 'structured-name/family', 'b' ]
>    ],
>    [ 'and',
>       [ 'begins_with', 'structured-name/given' , 'b' ],
>       [ 'begins_with', 'structured-name/family', 'a' ]
>    ]
> ]
>

That query itself is fine and will be supported. It works as intended
because there is only one structured-name.

The comment was about value lists, like telephone numbers. For
example, suppose you have two telephone numbers in the same contact:
TEL:1234
TEL:5678

Now you search
['and', ['contains', 'phones/value', '1'],
        ['contains', 'phones/value', '5'] ]

This will match the contact, because each of the terms combined with
'and' has a match in the contact: the first term will match '1234' and
the second '5678'. This follows from the strict mathematical
definition of the operations, but is admittedly not immediately
obvious.

I think it is a corner case. It occurred to me when documenting the
semantics and I wanted to write it down. Obviously, better
documentation would not have triggered a question ;-} Please suggest a
better wording.

Short status update - recursive queries and "or"/"and" are working.

This paragraph clearly needs more work:

   Using the current syntax it is not possible to define searches where
   the *same* value must meet different criteria ("cell phone number
   containing the digits 1234"). Something like that could be added as
   a future extension, for example by allowing search values to have
   more complex types than the simple '<text>'.
   term with a more complex type.

What I thought I had typed is:

   Using the current syntax it is not possible to define searches where
   the *same* value in a *value list* ...
                    ^^^^^^^^^^^^^^^^^

"term with a more complex type." at the end needs to be removed.

I noticed an inconsistency: the operations should use hyphen instead
of underscore, like the other string constants. I noticed because I
mistyped them in the tests at first. I'll make it consistent, so now
the operations are:

- [ 'is|contains|begins_with|ends_with', '<field>', '<text>',
+ [ 'is|contains|begins-with|ends-with', '<field>', '<text>',

Field filters are also implemented and pass my tests, so I'll kick off
a more complete test run and start packaging a snapshot tomorrow.
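
The value-list corner case discussed above (an 'and' over a value list can be satisfied by different entries of that list) may be easier to see in code. The following is a minimal Python sketch of the matching rules as they are described in this thread, not the SyncEvolution implementation. The names `field_values` and `matches` and the simplified contact layout (value lists as lists of dictionaries with a 'value' key) are assumptions made for illustration. Phone number normalization and the 'limit', 'phone' and 'any-contains' terms are left out, and the hyphenated operation names are used.

```python
# Comparison operations for text values, keyed by the operation name.
OPS = {
    'is': lambda value, text: value == text,
    'contains': lambda value, text: text in value,
    'begins-with': lambda value, text: value.startswith(text),
    'ends-with': lambda value, text: value.endswith(text),
}


def field_values(contact, path):
    """Collect all strings found at a path like 'phones/value'.

    Lists fan out into their entries, so a value list contributes
    one candidate string per entry.
    """
    nodes = [contact]
    for key in path.split('/'):
        found = []
        for node in nodes:
            value = node.get(key) if isinstance(node, dict) else None
            if isinstance(value, list):
                found.extend(value)
            elif value is not None:
                found.append(value)
        nodes = found
    return [node for node in nodes if isinstance(node, str)]


def matches(term, contact):
    """Evaluate one search term against a (hypothetical) contact dict."""
    op, args = term[0], term[1:]
    if op == 'or':
        # any() stops at the first matching sub-search; ['or'] is False.
        return any(matches(sub, contact) for sub in args)
    if op == 'and':
        # all() stops at the first failing sub-search. An empty 'and'
        # yields True here; the thread does not specify that case.
        return all(matches(sub, contact) for sub in args)
    field, text, flags = args[0], args[1], args[2:]
    ignore_case = 'case-sensitive' not in flags  # default: case-insensitive
    if ignore_case:
        text = text.casefold()
    for value in field_values(contact, field):
        if ignore_case:
            value = value.casefold()
        # A field term matches if ANY entry of a value list matches.
        if OPS[op](value, text):
            return True
    return False


if __name__ == '__main__':
    contact = {'phones': [{'value': '1234'}, {'value': '5678'}]}
    both = ['and', ['contains', 'phones/value', '1'],
                   ['contains', 'phones/value', '5']]
    print(matches(both, contact))                                 # True
    print(matches(['contains', 'phones/value', '15'], contact))   # False
```

Each 'contains' term is checked against every phone number independently, so the 'and' succeeds although no single number contains both digits. Using `any()` and `all()` over generator expressions also gives the lazy, first-to-last evaluation that the proposal describes for compound searches.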

Enhanced searching is in master, see:

commit 236a89fd86dec0bf77f1cd00ca767695f31b8889
Author: Patrick Ohly <patrick.ohly@intel.com>
Date:   Tue May 28 16:25:19 2013 +0200

    PIM: document enhanced searching (search part of FDO #64177)

    Documents 'or', 'and' and new per-field 'is|contains|begins_with|ends_with'
    operations.

commit 34241881ae10b0bc24038ddbb7b53979e8825b28
Author: Patrick Ohly <patrick.ohly@intel.com>
Date:   Tue May 28 22:55:26 2013 +0200

    PIM testing: test field tests

    doFilter() gets extended to take (<full name>, <vcard>) tuples in
    addition to the full name alone. Then this is used to create one large
    vCard that is suitable for testing (field content unique, all fields
    set, etc.) with various filters. All field tests are covered with at
    least one positive and one negative case.

commit c922aed0f2ccb0ddcec969f4384e878ca3859385
Author: Patrick Ohly <patrick.ohly@intel.com>
Date:   Tue May 28 22:51:48 2013 +0200

    PIM: implement 'is/contains/begins-with/ends-with'

    The operation is a runtime parameter of different classes, whereas
    extracting the right values to compare via the operation is hard-coded
    at compile time. This is a rather arbitrary compromise between code
    duplication, simplicity and performance (which, in fact, was not
    measured at all).

    The code for selecting case-sensitivity and the normalization before
    the string operations is shared with the older 'any-contains'
    operation.

commit 3796f3cdde544e54fa2190e8029889c3b9c4ffc4
Author: Patrick Ohly <patrick.ohly@intel.com>
Date:   Tue May 28 16:27:37 2013 +0200

    PIM testing: test case for 'and' and 'or'

    The new TestContacts.testFilterLogic uses the same infrastructure as
    the language tests and covers some combinations of 'and' and 'or'.

    In some cases, Python lists had to be avoided in favor of tuples,
    because Python cannot automatically map the lists to a 'av' array of
    variants, while sending a tuple enumerating its types can be sent and
    gets accepted by PIM Manager.
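
The last commit message above mentions that nested Python lists sometimes had to be replaced by tuples before a filter could be sent over D-Bus. As a rough sketch of that idea, the helper below rewrites every nested term of a filter as a tuple while keeping the outermost container a list. `to_tuples` and `prepare_filter` are hypothetical names, no D-Bus call is made, and converting all nested terms (rather than only the "some cases" the commit refers to) is an assumption; the actual test suite may handle this differently.

```python
def to_tuples(term):
    """Recursively turn a nested list/tuple of strings into nested tuples."""
    if isinstance(term, (list, tuple)):
        return tuple(to_tuples(item) for item in term)
    return term  # plain strings are passed through unchanged


def prepare_filter(search):
    """Keep the top level a list, convert each contained term to a tuple.

    Assumption: the top-level list corresponds to the array in the 'av'
    signature, and each term inside it is what ends up in a variant.
    """
    return [to_tuples(term) for term in search]


if __name__ == '__main__':
    search = [['or',
               ['contains', 'full-name', 'Joe'],
               ['begins-with', 'nickname', 'J']],
              ['limit', '10']]
    print(prepare_filter(search))
```

A tuple carries one type per element, which is why a term mixing a string with further nested terms can be described, whereas a list is expected to hold elements of a single type.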

commit b42e093a608c367b118ecc5844a0018a978516fb
Author: Patrick Ohly <patrick.ohly@intel.com>
Date:   Tue May 28 16:26:45 2013 +0200

    PIM: implement 'and' and 'or'

    Implementation falls naturally into the new framework, with special
    logic filters combining the results of sub-filters.

commit 9942cecd8435a2b1e9602aa1b4d21be19240eef9
Author: Patrick Ohly <patrick.ohly@intel.com>
Date:   Tue May 28 15:22:58 2013 +0200

    PIM: support recursive search filter

    This changes the signature of the filter parameter in Search(),
    RefineSearch() and ReplaceSearch() from 'as' (array of strings) to
    'av' (array of variants). Allowed entries in the variant are arrays
    containing strings and/or other such arrays (recursive!).

    A single string as value is not supported, which is the reason why
    'av' instead of just plain 'v' makes sense (mismatches can already be
    found in the sender, potentially at compile time). It also helps
    Python choose the right type when asked to send an empty list. When
    just using 'v', Python cannot decide automatically.

    Error messages include a backtrace of all terms that the current,
    faulty term was included in. That helps to locate the error in a
    potentially larger filter.

    The scope of supported searches has not changed (yet).

commit 4a617c1cc98e7d85f020d35860f09647443e147c
Author: Patrick Ohly <patrick.ohly@intel.com>
Date:   Tue May 28 15:13:00 2013 +0200

    GDBus GIO: support recursive variant with one type

    The signature of the new type is just a 'v' for variant. Recursion is
    only supported in combination with std::vector. This is done to keep
    implementation simpler. std::vector was chosen over other collections
    because it is easier to work with in method calls, the main purpose
    of the new type.

    The conversion from D-Bus messages accepts anything which can be
    mapped into the variant: arrays, tuples, plain types, and of course
    variants containing those. This helps with backward compatibility
    (one can turn an interface which took a fixed type like
    std::vector<std::string> into something with a recursive variant) and
    with Python, because Python sends arrays and tuples without
    converting into variants when the app uses those simpler types.

    One caveat exists with sending an empty list in Python: when using
    'v' as signature in the interface, Python doesn't know how to send
    the empty list without assistance by the programmer (as in
    dbus.Array([], signature='s')). If possible, avoid the plain 'v' in
    an interface in favor of something like 'av' (=
    std::vector<boost::variant>).

This should resolve enhanced searching. Enhanced sorting is still
open, but less important and still undefined -> lowering priority and
deferring it.

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and
has been closed from further activity.

You can subscribe and participate further through the new bug through
this link to our GitLab instance:
https://gitlab.freedesktop.org/SyncEvolution/syncevolution/issues/26.