Summary: | PBAP: download in chunks to make progress after interrupts | ||
---|---|---|---|
Product: | SyncEvolution | Reporter: | Patrick Ohly <patrick.ohly> |
Component: | PBAP | Assignee: | Patrick Ohly <patrick.ohly> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | normal | ||
Priority: | highest | CC: | nairb1958, syncevolution-issues |
Version: | unspecified | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
Description
Patrick Ohly
2014-04-10 16:01:48 UTC
*** Bug 78528 has been marked as a duplicate of this bug. ***

Here's a potential (and not 100% correct!) algorithm for transferring a complete address book:

    uint16 used = GetSize() # not the same as maximum offset!
    uint16 start = choose_start()
    uint16 chunksize = choose_chunk_size()

    uint16 i
    for (i = start; i < used; i += chunksize) {
       PullAll( Offset = i, MaxCount = chunksize)
    }
    for (i = 0; i < start; i += chunksize) {
       PullAll( Offset = i, MaxCount = min(chunksize, start - i))
    }

Note that GetSize() is specified as returning the number of entries in the selected phonebook object that are actually used (i.e. indexes that correspond to non-NULL entries). This is relevant if contacts get deleted after starting the session. In that case, the algorithm above will not necessarily read all contacts. Here's an example:

    offsets #0 till #99, with contacts #10 till #19 deleted
    chunksize = 10
    GetSize() = 90
    => this will request offsets #0 till #89, missing contacts #90 till #99

I think this can be fixed with an additional PullAll, leading to:

    for (i = start; i < used; i += chunksize) {
       PullAll( Offset = i, MaxCount = chunksize)
    }
    PullAll(Offset = i) # no MaxCount!
    for (i = 0; i < start; i += chunksize) {
       PullAll( Offset = i, MaxCount = min(chunksize, start - i))
    }

The additional PullAll() is meant to read all contacts at the end which would not be covered otherwise.

Now the other problem: MaxCount means "read chunksize contacts starting at #i". Therefore the algorithm above will occasionally end up reading contacts multiple times. Example:

    offsets #0 till #99, with contact #0 deleted
    chunksize = 10
    GetSize() = 99
    PullAll(Offset = 0, MaxCount = 10) => returns 10 contacts #1 till #10 (inclusive)
    PullAll(Offset = 10, MaxCount = 10) => returns 10 contacts #10 till #19
    => contact #10 appears twice in the result

The duplicate cannot be filtered out easily because the UID is not reliable.
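The traversal order can be tried out against a simulated phone book. What follows is a minimal Python sketch, not SyncEvolution code: `pull_all`, `get_size` and `transfer_in_chunks` are hypothetical helpers, and the simulated PullAll assumes the semantics described above (Offset selects a slot, MaxCount limits the number of contacts actually returned). It includes the extra unbounded PullAll for the tail and assumes that the wrap-around loop is meant to stop at `start`, hence the `start - i` limit.

```python
def pull_all(slots, offset, max_count=None):
    """Simulated PullAll: return the slot numbers of up to max_count
    used entries, scanning from slot `offset` (assumed semantics)."""
    result = []
    for index in range(offset, len(slots)):
        if slots[index] is None:
            continue  # deleted entry, does not count towards max_count
        if max_count is not None and len(result) >= max_count:
            break
        result.append(index)
    return result


def get_size(slots):
    """Simulated GetSize: number of used entries, not the highest offset."""
    return sum(1 for entry in slots if entry is not None)


def transfer_in_chunks(slots, start, chunk_size):
    """Pull everything in chunks, starting at `start` and wrapping around."""
    used = get_size(slots)
    pulled = []
    i = start
    while i < used:
        pulled += pull_all(slots, i, chunk_size)
        i += chunk_size
    # Extra PullAll without MaxCount: picks up entries beyond `used`.
    pulled += pull_all(slots, i)
    i = 0
    while i < start:
        pulled += pull_all(slots, i, min(chunk_size, start - i))
        i += chunk_size
    return pulled


# Second example from above: offsets #0 till #99, contact #0 deleted.
slots = [None] + [f"contact {n}" for n in range(1, 100)]
pulled = transfer_in_chunks(slots, start=0, chunk_size=10)
duplicates = sorted({n for n in pulled if pulled.count(n) > 1})
print(len(set(pulled)), duplicates)  # prints: 99 [10]
```

In this simulation every used entry is retrieved at least once for any start offset, at the cost of occasional duplicates such as contact #10 here.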
Duplicates could be filtered out by keeping a hash of each contact and discarding those that are exact matches for already seen contacts. It's easier to accept the duplicate and remove it during the next sync.

There are two more aspects that I chose to ignore above: how to implement the choice of start offset and chunk size.

Start offset could be random (no persistent state needed) or could continue where the last sync left off. The latter will require a write after each PullAll() (in case of unexpected shutdowns), even if nothing ever changes. Is that acceptable? Probably not. I prefer choosing randomly.

The chunk size depends on the size of the average contact. Make it too small, and we end up generating lots of individual transfers. Make it too large (say 1000), and we still have chunks that never transfer completely. We could tune the chunk size so that on average, each transfer has a certain size in bytes. TODO: how large?

Once we have such a target size in bytes, perhaps we can let the algorithm adjust the chunk size dynamically: start small (100?), then increase or decrease depending on the observed size of the returned contacts.

Implemented, included in master. It is turned off by default.

commit 527b47c80ef105e7cbdfb170541615fd3e906906
Author: Patrick Ohly <patrick.ohly@intel.com>
Date:   Wed Jul 2 17:33:13 2014 +0200

    PBAP: transfer in chunks (FDO #77272)

    If enabled via env variables, PullAll transfers will be limited to
    a certain number of contacts at different offsets until all data
    got pulled. See README for details.

    When transferring in chunks, the enumeration of contacts for the
    engine no longer matches the PBAP enumeration. Debug output uses
    "offset #x" for PBAP and "ID y" for the engine.

From the PBAP README:

Transferring in chunks
======================

The default is to pull all contacts in one transfer. This can be changed to transfer in chunks. Optionally the size of the chunks can be adjusted dynamically at runtime to achieve a certain time per transfer.
The purpose of transferring in chunks is twofold:
1. It avoids having to pull the entire address book into a file which
   then has to be kept around until syncing is complete.
2. By randomly starting at different offsets, eventually all data gets
   added to the local cache even if no sync ever completes.

This gets configured with environment variables:

SYNCEVOLUTION_PBAP_CHUNK_MAX_COUNT_PHOTO=<number of contacts>
   A value larger than 0 enables chunking when transferring contacts
   with photo data.

SYNCEVOLUTION_PBAP_CHUNK_MAX_COUNT_NO_PHOTO=<number of contacts>
   A value larger than 0 enables chunking when transferring contacts
   without photo data.

SYNCEVOLUTION_PBAP_CHUNK_TRANSFER_TIME=<seconds>
   The desired duration of each transfer. Indirectly also controls
   the amount of data which has to be buffered. Defaults to 30 seconds,
   turned off with any value <= 0 seconds.

SYNCEVOLUTION_PBAP_CHUNK_TIME_LAMBDA=<0 to 1>
   Controls how quickly new measurements adapt the chunk size. 0 is
   fastest (= next transfer uses exactly the calculated number of
   contacts), 1 is not at all (= all transfers use the initial
   number). Defaults to 0.1.

SYNCEVOLUTION_PBAP_CHUNK_OFFSET=<0 to number of contacts in phone>
   Overrides the random selection of the start offset. Useful for
   debugging. Offsets which are out of range get mapped into a valid
   offset.

For example, consider a Samsung Galaxy S3, Android 4.3, average contact size 6KB with photo data and 235B without. The transfer rate is 40KB/s with photo data, 17KB/s without. To achieve 30s per chunk, one needs to choose 243 contacts per chunk with photo data, or 2500 without.

A transfer of 1000 contacts without photos completes in under 17 seconds, with photos under 2:05 minutes. In this case, downloading in chunks was almost as fast as transferring all at once.
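How the transfer time target and the lambda value might interact can be sketched as follows. This is an illustration under assumptions, not the SyncEvolution implementation: the helper names `read_chunk_settings` and `next_chunk_size` are hypothetical, the weighted-average formula is only one formula that matches the documented end points (0 = use the calculated number, 1 = keep the initial number), a default of 0 for the two max-count variables is assumed to mean "chunking off", and the upper bound of 65535 is taken from the uint16 used in the pseudocode of the description.

```python
import os

PREFIX = "SYNCEVOLUTION_PBAP_CHUNK_"


def read_chunk_settings(environ=os.environ):
    """Read the chunking variables named in the README excerpt."""
    return {
        # Assumption: unset or 0 means chunking is disabled.
        "max_count_photo": int(environ.get(PREFIX + "MAX_COUNT_PHOTO", "0")),
        "max_count_no_photo": int(environ.get(PREFIX + "MAX_COUNT_NO_PHOTO", "0")),
        # Documented defaults: 30 seconds and 0.1.
        "transfer_time": float(environ.get(PREFIX + "TRANSFER_TIME", "30")),
        "time_lambda": float(environ.get(PREFIX + "TIME_LAMBDA", "0.1")),
    }


def next_chunk_size(current, contacts_received, seconds_taken,
                    target_seconds, time_lambda):
    """Blend the current chunk size with the size that would have hit
    the target duration (assumed formula, see the text above)."""
    if target_seconds <= 0 or seconds_taken <= 0 or contacts_received <= 0:
        return current  # adaptation turned off or nothing to measure
    calculated = contacts_received * target_seconds / seconds_taken
    blended = time_lambda * current + (1.0 - time_lambda) * calculated
    return max(1, min(65535, round(blended)))


# A chunk of 100 contacts arrived in 15 seconds, the target is 30 seconds.
settings = read_chunk_settings({})  # empty mapping: defaults only
print(next_chunk_size(100, 100, 15.0, settings["transfer_time"],
                      settings["time_lambda"]))  # prints: 190
```

With a lambda of 0 the same measurement would yield 200 contacts for the next transfer, with a lambda of 1 the size would stay at 100.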
To debug transferring in chunks, run:

SYNCEVOLUTION_DEBUG=1 syncevolution --daemon=no --export - \
    backend=pbap loglevel=4 \
    database=obex-bt://64:B3:10:C0:8C:2E 2>&1 |
    grep -e transferred -e "pullall" -e "max count"