Bug 64838

Summary: PBAP: enhance performance
Product: SyncEvolution
Reporter: Patrick Ohly <patrick.ohly>
Component: SyncEvolution
Assignee: Patrick Ohly <patrick.ohly>
Status: RESOLVED FIXED
QA Contact:
Severity: enhancement
Priority: high
CC: syncevolution-issues
Version: 1.3.99.3
Hardware: Other
OS: All
See Also: https://bugzilla.gnome.org/show_bug.cgi?id=697278
Whiteboard:
Bug Depends on: 52669
Bug Blocks: 56141
Attachments: 5000 Contacts log using syncevolution D-Bus APIs.
5000 Contacts log using syncevolution D-Bus APIs.
5000 Contacts testing vCards
1000.vcf with the same PHOTO for each contact

Description Patrick Ohly 2013-05-21 18:56:04 UTC
I did some benchmarking on the NDIS 166 Tizen IVI reference hardware, using btrfs as the file system storing the data. On that system, a simple "import contacts" takes:
time syncevolution --daemon=no --import test/testcases/10000.vcf database=pim-manager-test-dbus-foo backend=evolution-contacts
...
real	2m20.431s

top shows that the system is partly idle, with evolution-addressbook-factory taking most of the time. The limiting factor is the latency of adding contacts one by one via D-Bus.

The test involved importing, matching and deleting contacts with the "peer" being a local directory. In other words, this focused on EDS and sync engine performance. To run the test, use the modified testpim.py like this:

TESTPIM_TEST_SYNC_TESTCASES=test/testcases/10000.vcf testpim.py -v TestContacts.testSync

This can be combined with TEST_DBUS_PBAP_PHONE to also test PBAP timing and make the test more realistic. However, the phone must support SyncML for automatic contact push/delete.

The test measures the total duration of the SyncPeer() D-Bus method call. Test data is attached. The data for 1000 and 10000 contacts is fairly realistic, while the data for 5000 just uses F0000 through F5000 as names. In theory, this could have an effect on hash data structures, but that did not seem to happen in practice.
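As a side note on methodology, wrapping the call with a monotonic clock is enough to get such durations from a Python test script. The helper below is a generic sketch, not testpim.py's actual instrumentation, and the `manager` proxy in the usage note is an assumption:

```python
import time

def timed_call(method, *args, **kwargs):
    """Call a function (e.g. a D-Bus method proxy) and return (result, seconds)."""
    start = time.monotonic()          # monotonic: immune to wall-clock jumps
    result = method(*args, **kwargs)
    return result, time.monotonic() - start
```

Usage would look like `_, duration = timed_call(manager.SyncPeer, "foo")`, where `manager` is assumed to be a proxy for the PIM Manager D-Bus object.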

1000:

import: 32.273505s
match: 15.500893s
remove: 23.965385s

5000:

import: 173.671093s
match: 135.620294s
remove: 111.923021s

10000:

import: 395.876979s
match: 563.556571s
remove: 227.145546s

Import and remove are roughly linear in the number of items. Matching is a non-linear operation, essentially an O(n^2) algorithm (brute-force comparison of old against new data). So the numbers above are as expected.
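A minimal sketch of that brute-force pairing, with plain equality standing in for the engine's field-by-field comparison (none of this is SyncEvolution code):

```python
# Worst case: every old item is compared against every new item,
# so the comparison count grows with len(old) * len(new).

def brute_force_match(old_items, new_items):
    """Pair up equal items; O(n^2) comparisons when little matches."""
    matches = []
    unmatched = list(new_items)
    for o in old_items:
        for i, n in enumerate(unmatched):
            if o == n:                 # real engine: field-by-field compare
                matches.append((o, n))
                del unmatched[i]       # each new item matches at most once
                break
    return matches
```

With 10x the contacts the comparison count grows up to 100x, which is why matching 10000 contacts costs disproportionately more than matching 1000.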

Watching with top during the import phase:

%Cpu(s): 45.0 us, 16.1 sy,  0.0 ni, 30.8 id,  3.9 wa,  1.9 hi,  2.4 si,  0.0 st
KiB Mem:   3932184 total,  3276808 used,   655376 free,     9072 buffers
KiB Swap:        0 total,        0 used,        0 free,  2646056 cached

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                         
 3478 pohly     20   0  663248  46416  14404 S 48.54 1.180   0:52.47 syncevo-dbus-he                                 
 3448 pohly     20   0  955684  23728  10852 S 28.93 0.603   0:34.14 evolution-addre                                 
 3428 pohly     20   0  774276  24448  17464 S 12.30 0.622   0:17.75 syncevo-dbus-se        

syncevo-dbus-helper does most of the heavy lifting here, parsing and preparing the data for EDS. The ratio between helper and EDS gets worse (= more work in the helper) towards the end of the phase. There might be some data structures which do not scale well with the number of items.

During delete:

%Cpu(s): 37.5 us, 24.3 sy,  0.0 ni, 27.8 id,  3.1 wa,  3.1 hi,  4.2 si,  0.0 st
KiB Mem:   3932184 total,  3668620 used,   263564 free,     9008 buffers
KiB Swap:        0 total,        0 used,        0 free,  2937936 cached

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                         
 3753 pohly     20   0  947488  27980  10856 S 58.71 0.712   0:41.08 evolution-addre                                 
 3785 pohly     20   0  875868 172940  14320 S 18.57 4.398   0:22.55 syncevo-dbus-he                                 
 2261 pohly     20   0   25768   3188   1080 S 12.27 0.081   1:53.90 dbus-daemon                                   

So here EDS does most of the work. Batched deletes should speed up the phase considerably (see bug #52669). However, deleting is rare. It would be more important to optimize importing and matching.

Importing is limited partly by write performance, partly by internal data structures. Batched writing should be implemented first.
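The batching idea can be sketched like this. `submit_batch` is a hypothetical stand-in for a bulk-add call (EDS offers e_book_client_add_contacts() for this purpose), and the batch size is an arbitrary example value:

```python
# Instead of one D-Bus round trip per contact, group contacts and
# submit each group in a single call, so per-call latency stops dominating.

def import_batched(contacts, submit_batch, batch_size=100):
    """Submit contacts in fixed-size batches; returns the number of calls made."""
    calls = 0
    for start in range(0, len(contacts), batch_size):
        submit_batch(contacts[start:start + batch_size])
        calls += 1
    return calls
```

With 10000 contacts and a batch size of 100 this makes 100 calls instead of 10000, which is where the latency savings come from.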

Matching is a different beast. It remains to be seen where the bottlenecks are and how that can be addressed without breaking the sync engine's architecture.
Comment 1 Eugenio Parodi 2013-05-22 09:42:22 UTC
Created attachment 79641 [details]
5000 Contacts log using syncevolution D-Bus APIs.
Comment 2 Eugenio Parodi 2013-05-22 09:42:50 UTC
Created attachment 79642 [details]
5000 Contacts log using syncevolution D-Bus APIs.
Comment 3 Eugenio Parodi 2013-05-22 09:43:33 UTC
Created attachment 79643 [details]
5000 Contacts testing vCards
Comment 4 Eugenio Parodi 2013-05-22 10:53:56 UTC
Hi,

I made the following test over an empty DB:

time syncevolution --daemon=no  database=pim-manager-test backend=evolution-contacts --import ./5000vCard.txt
...
real	7m6.102s
user	0m55.495s
sys	0m7.416s

I noticed that during the import the printout stalled at regular intervals.
Since our HW does not allow fast r/w operations, I'm just wondering if during the import/sync a r/w operation is performed for each contact.
Comment 5 Patrick Ohly 2013-05-22 12:10:04 UTC
(In reply to comment #4)
> I made the following test over an empty DB:
> 
> time syncevolution --daemon=no  database=pim-manager-test
> backend=evolution-contacts --import ./5000vCard.txt
> ...
> real	7m6.102s
> user	0m55.495s
> sys	0m7.416s
> 
> I noticed that during the import the printout stalled at regular intervals.
> Since our HW does not allow fast r/w operations, I'm just wondering if during
> the import/sync a r/w operation is performed for each contact.

For every contact, there are potentially multiple writes. I'm less sure about sync() calls, those might get batched together.

Benchmarks done by Openismus showed that submitting contacts in batches is orders of magnitude faster than writing them one at a time. So addressing bug #52669 should help a lot, as writing seems to be the primary bottleneck for initial imports.
Comment 6 Patrick Ohly 2013-05-24 07:48:18 UTC
I also have a bug open upstream about sync() handling in EDS which needs more work - see https://bugzilla.gnome.org/show_bug.cgi?id=697278
Comment 7 Patrick Ohly 2013-06-12 12:59:30 UTC
Created attachment 80727 [details]
1000.vcf with the same PHOTO for each contact

I've added batch inserts (bug #52669) and removed one redundant EDS operation per insert. Importing became 4 times faster.

I also noticed and fixed that matching involved unnecessary DB writes:

commit c88ebbb9164281c012078761ec577607e7e75f30
Author: Patrick Ohly <patrick.ohly@intel.com>
Date:   Wed Jun 12 08:25:48 2013 +0200

    EDS contacts: avoid unnecessary DB writes during slow sync
    
    Traditionally, contacts were modified shortly before writing into EDS
    to match Evolution's expectations (must have N, only one CELL TEL,
    VOICE flag must be set). During a slow sync, the engine compared the
    modified contacts with the unmodified, incoming ones. This led to
    mismatches and/or merge operations which ended up not changing anything
    in the DB because the only difference would be removed again before
    writing.
    
    To avoid this, define datatypes which include the EDS modifications in
    its <incomingscript>. Then the engine works with an item as it would
    be written to EDS and correctly determines that the incoming and DB
    items are identical.
    
    Found in testpim.py's testSync when applied to the test cases
    generated by the Openismus test case generator.

Now performance is like this (still not involving PBAP!):

(In reply to comment #0)
> 1000:
> 
> import: 32.273505s
> match: 15.500893s
> remove: 23.965385s
> 
> 5000:
> 
> import: 173.671093s
> match: 135.620294s
> remove: 111.923021s
> 
> 10000:
> 
> import: 395.876979s
> match: 563.556571s
> remove: 227.145546s

1000:
import: 8.6s
match: 9.8s
remove: 19.0s

5000:
import: 56.5s
match: 96.7s
remove: 88.3s

10000:
import: 171s
match: 353s
remove: 179s

I also measured the PBAP transfer from a Samsung Galaxy SIII. obexd took 150s to download 9994 contacts. I had sent the 10000.vcf file via Bluetooth, but the phone dropped some contacts when importing.

Note that this is without photo data, because the Openismus test case generator has no option to generate photo data.

My personal address book has 845 contacts. 41 of those have a PHOTO, thanks to Google Contacts, which inserts photos of my contacts after getting them from somewhere. In terms of bytes, 35% of the vCard data is in PHOTO. On average, a photo is 4000 bytes large when uuencoded.
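For reference, a figure like that 35% can be recomputed from a .vcf file with a small script. This sketch assumes base64 PHOTO lines folded with leading whitespace per the vCard folding rules; it is not part of the test suite:

```python
def photo_byte_share(vcf_text):
    """Return the fraction of bytes belonging to PHOTO properties,
    including their folded continuation lines."""
    total = photo = 0
    in_photo = False
    for line in vcf_text.splitlines(keepends=True):
        if line.startswith("PHOTO"):
            in_photo = True            # start of a PHOTO property
        elif not line[:1].isspace():
            in_photo = False           # unfolded line: new property begins
        total += len(line)
        if in_photo:
            photo += len(line)
    return photo / total if total else 0.0
```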

I ran tests earlier: excluding that photo data from the transfer had very little impact on PBAP transfer time. That's why bug #59551, the incremental PHOTO download, is not that important.

For the sake of worst case analysis, let's assume that every contact has a photo of roughly 4000 bytes. I've used my own photo, because that is what I can publish here, and inserted that photo into each of the 1000 contacts. See the attached 1000-with-photos.vcf.
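A file like the attached one can be generated with a few lines. This sketch assumes LF line endings and vCard 2.1/3.0 PHOTO syntax; it is not the script actually used to build the attachment:

```python
import base64

def add_photo(vcf_text, jpeg_bytes):
    """Insert the same base64-encoded PHOTO property into every vCard."""
    photo = "PHOTO;ENCODING=B;TYPE=JPEG:" + base64.b64encode(jpeg_bytes).decode()
    # Place the property directly before each END:VCARD line.
    return vcf_text.replace("END:VCARD", photo + "\nEND:VCARD")
```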

Here are some more PBAP transfer stats:
- 1000.vcf (240KB): 25s
- 1000-with-photos.vcf (3.65MB): 109s
- 1000-with-photos.vcf excluding the PHOTO data: 16s

It's unintuitive that the last duration is smaller than the first one. Perhaps there were some caching effects in the phone.
Comment 8 Patrick Ohly 2013-09-30 11:25:32 UTC
I consider this issue resolved and plan no further work.
