Bug 59551 - incremental PHOTO download, avoid sync if no change
Summary: incremental PHOTO download, avoid sync if no change
Status: RESOLVED FIXED
Alias: None
Product: SyncEvolution
Classification: Unclassified
Component: SyncEvolution (show other bugs)
Version: unspecified
Hardware: Other All
: high normal
Assignee: Patrick Ohly
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 55918
  Show dependency treegraph
 
Reported: 2013-01-18 10:14 UTC by Patrick Ohly
Modified: 2013-09-03 12:34 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments

Description Patrick Ohly 2013-01-18 10:14:09 UTC
Tracked as one issue, because it affects the same code.

The first part, incremental PHOTO download, has the goal of making the text-only properties available sooner. This can be done by doing the initial sync with a filter that excludes PHOTO data, then do the next sync with PHOTO data.

The second part is meant to avoid expensive work (syncing) and thus save CPU cycles: by checksumming the downloaded data, it becomes possible to detect the case where nothing changed and avoid the syncing.

Some complications:
- When switching between downloads with and without PHOTO,
  then two checksums are needed.
- If the checksum without PHOTO
  data matches, then we still need to download with PHOTO data,
  because the PHOTOs might have changed.
- At the moment, doing a sync without PHOTO after a sync with PHOTO
  will remove the PHOTO data from the local cache. The engine
  would have to be configured so that it knows whether PHOTOs
  are part of the sync or not (relevant also for other issues,
  see feature request/bug #54998).

A simpler solution would be the following:
- If a phone has never been synced before, do a sync without PHOTO
  data. This will add the text-only properties quickly.
- Immediately do a sync with PHOTOs, to add that data.
  Keep a checksum of that sync.
- In all future syncs, download with PHOTO data and compare the
  checksum to avoid syncing when nothing changed.

This logic can be implemented entirely in the PBAP backend.
Comment 1 Patrick Ohly 2013-05-06 10:41:53 UTC
We need to implement the incremental *updating* of the phone book, too. Otherwise, if the complete download always takes longer than the car is in use, then future updates even to text-only contacts will never get mirrored in the car.
Comment 2 Patrick Ohly 2013-05-13 09:39:16 UTC
Tom Counihan raised another issue with the current PBAP syncing in private email: if the phonebook download always takes too long, we never process the data and thus the cache forever remains out of sync.

Tom proposed to download in smaller batches and import each batch separately. The problem with that approach is that we still need to get the entire address book from the phone before we can reliably determine deleted contacts.

Let me offer a refinement of the original idea: instead of only
processing a subset of the phone's contacts per sync, let's split the
sync itself into more phases:
      * Start at offset x into the set of contacts.
      * Download n contacts and update the local database with them.
      * Update x persistently.
      * Repeat and role over until we have dealt with all contacts.
      * Then delete all local contacts which had no match on the phone
        anymore.

In other words, we attempt to do a full sync each time we get triggered
to do so, but it won't matter as much anymore when we get interrupted in
the middle, because we will import new or updated contacts eventually.

Deleting contacts will still depend on finishing the full download and
processing.

This batched download can be combined with checksuming by calculating a checksum over each batch. Finding a checksum depends on a few lookup keys:
* start offset
* batch size
* with and without photo data
We need to remove obsolete checksums to avoid growing their number infinitely.

The batched download can be combined with incremental photo download by first downloading batches without photos, then with photos.
Comment 3 Patrick Ohly 2013-07-05 11:46:16 UTC
(In reply to comment #1)
> We need to implement the incremental *updating* of the phone book, too.
> Otherwise, if the complete download always takes longer than the car is in
> use, then future updates even to text-only contacts will never get mirrored
> in the car.

I implemented incremental photo download for *all* syncs (not just the first one, as originally proposed).

Details are in these commit messages (actual commits not in master yet):

commit 7f47fa336570467a4dd60ada5da41ad302335c36
Author: Patrick Ohly <patrick.ohly@intel.com>
Date:   Fri Jul 5 10:39:21 2013 +0200

    PBAP: incremental sync (FDO #59551)
    
    Depending on the SYNCEVOLUTION_PBAP_SYNC env variable, syncing reads
    all properties as configured ("all"), excludes photos ("text") or
    first text, then all ("incremental").
    
    When excluding photos, only known properties get requested. This
    avoids issues with phones which reject the request when enabling
    properties via the bit flags. This also helps with
    "databaseFormat=^PHOTO".
    
    When excluding photos, the vcard merge script as used by EDS ensures
    that existing photo data is preserved. This only works during a slow
    sync (merge script not called otherwise, okay for PBAP because it
    always syncs in slow sync) and EDS (other backends do not use the
    merge script, okay at the moment because PIM Manager is hard-coded to
    use EDS).
    
    The PBAP backend must be aware of the PBAP sync mode and request a
    second cycle, which again must be a slow sync. This only works because
    the sync engine is aware of the special mode and sets a new session
    variable "keepPhotoData". It would be better to have the PBAP backend
    send CTCap with PHOTO marked as not supported for text-only syncs and
    enabled when sending PHOTO data, but that is considerably harder to
    implement (CTCap cannot be adjusted at runtime).
    
    The original goal of overlapping syncing with download has not been
    achieved yet. It turned out that all item IDs get requested before
    syncing starts, which thus depends on downloading all items in the current
    implementation. Can be fixed by making up IDs based on the number of
    existing items (see GetSize() in PBAP) and then downloading later when
    the data is needed.

commit e9cf7c22fc5c680ee5f1f1d9d0b970933c7098e3
Author: Patrick Ohly <patrick.ohly@intel.com>
Date:   Fri Jul 5 13:42:12 2013 +0200

    PIM: use incremental sync for PBAP by default (FDO #59551)
    
    When doing a PBAP sync, PIM manager asks the D-Bus sync helper to set
    its SYNCEVOLUTION_PBAP_SYNC to "incremental". If the env variable
    is already set, it does not get overwritten, which allows overriding
    this default.
Comment 4 Patrick Ohly 2013-07-05 11:49:53 UTC
(In reply to comment #0)
> The second part is meant to avoid expensive work (syncing) and thus save CPU
> cycles: by checksumming the downloaded data, it becomes possible to detect
> the case where nothing changed and avoid the syncing.

This has one major downside: the check can only be done after having downloaded all data. In the cases where something changed, the total time then is "download time" + "sync time".

Therefore I intend to not implement checksuming. Instead syncing will run in parallel to downloading and total time will be roughly max("dowload time", "sync time"). If we can make syncing faster than downloading (which is the goal), then we will be done shortly after finishing the download.
Comment 5 Patrick Ohly 2013-07-05 11:55:25 UTC
(In reply to comment #2)
> Let me offer a refinement of the original idea: instead of only
> processing a subset of the phone's contacts per sync, let's split the
> sync itself into more phases:
>       * Start at offset x into the set of contacts.
>       * Download n contacts and update the local database with them.
>       * Update x persistently.
>       * Repeat and role over until we have dealt with all contacts.
>       * Then delete all local contacts which had no match on the phone
>         anymore.

It could be even simpler:
 * Start at offset x.
 * Download, process in parallel.
 * Download more chunks until done.

However, after discussing the "Offset" value with Luiz van Dentz, our conclusion was that the value is poorly defined: it is not clear whether it is based on the original numbering of contacts (in which case one could ask for contacts #10-#19 and not get any because all of those were deleted) or counts all current non-deleted contacts.

Depending on a feature that is poorly defined, probably not used much and has corner cases which are even rarer and hard to test (contacts deleted while syncing) is asking for trouble. Download times are small enough that the theoretic advantage of downloading in chunks may never matter in practice.

Taking all of that into account, I don't intend to implement downloading in chunks anymore.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.