Bug 86335

Summary: abnormally high network traffic when synchronizing calendar in CalDAV
Product: SyncEvolution Reporter: Sébastien Villemot <sebastien>
Component: CalDAV/CardDAVAssignee: Patrick Ohly <patrick.ohly>
Status: RESOLVED FIXED QA Contact:
Severity: major    
Priority: high CC: syncevolution-issues
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:

Description Sébastien Villemot 2014-11-16 10:27:51 UTC
Since I upgraded to SyncEvolution 1.5, synchronizing my calendar (in two-way mode) with my CalDAV server (Radicale) generates an abnormally high amount of network traffic.

This is the corresponding entry in my apache log:

2001:7a8:3d80:0:a00:27ff:fe70:ffdd - - [16/Nov/2014:00:23:09 +0100] "REPORT /sebastien/calendar.ics/ HTTP/1.1" 207 26684196 "-" "SyncEvolution"

As you can see, this simple two-way synchronization consumes 26Mb, while the ics file of the calendar is 300kb.

This is of course highly inefficient, and also makes SyncEvolution almost unusable to me, because synchronization now takes 6 minutes (due to low upload bandwith of my server).

SyncEvolution 1.4 was much more efficient in this respect, and was doing the synchronization in a couple of seconds.

Any idea how to fix this? Is there a simple workaround that I could apply?

Thanks for creating SyncEvolution.
Comment 1 Patrick Ohly 2014-11-16 16:34:14 UTC
I've no idea why 1.5 should be worse than 1.4. Please help me investigate that further.

For that, increase the loglevel to 4 in the target-config of the CalDAV sync and run a sync.

Was it reported as a normal, two-way sync with no changes on either side?

Look at the syncevolution-log.html of the target-config side for that sync session. What is this REPORT which produces the excessive amount of data?

It would be useful if I could look at the log leading up to that REPORT. Perhaps you can send it to me at patrick.ohly at gmx.de? Feel free to strip out personal data, although I also of course promise to keep it confidential.

It may also be useful to downgrade to 1.4; the archives can be downloaded here: http://downloads.syncevolution.org/syncevolution/archive/

Don't bother if you don't have the time to try that. Perhaps the additional information about 1.5 already gives me enough clues.
Comment 2 Sébastien Villemot 2014-11-16 17:10:40 UTC
The sync was indeed reported as normal, two-way sync.

I sent you detailed log by private email.
Comment 3 Patrick Ohly 2014-11-17 11:48:42 UTC
The unexpectedly high traffic is caused by a "is the collection empty" check. This is relevant for slow sync prevention, because a slow sync is allowed to run when there is no data.

The 1.5 release does the check whereas 1.4 (unintentionally) skipped it. This was fixed for 1.4.99.4 in:

commit dec741ea501add6d9e173821b99af85ee0262034
Author: Patrick Ohly <patrick.ohly@intel.com>
Date:   Wed Aug 20 16:23:15 2014 +0200

    VirtualSyncSource, MapSyncSource: implement m_isEmpty
    
    This is relevant for isUsable() and not so much (not at all?) for
    slow sync prevention.

The comment explain why the change was made. It *is* relevant for slow sync prevention, it just wasn't the primary reason for changing the code.

The underlying problem is that the "is empty" check is currently done using code which downloads the entire calendar. This is necessary to work around bugs in (ironically) servers like Radical 0.7 which did (do?) not implement a less costly operation reliably: when asked to report only about VEVENTs, VTODO or VJOURNAL are also included in the response and vice versa.

The underlying problem is that a collection may contain a mixture of those, but SyncEvolution only syncs one kind and thus needs to determine the content type of each item. There is no good support for that in CalDAV.

There are three possible solutions:
1. restore the 1.4 behavior of always assuming that there is data (basically disabling the more intelligent slow-sync prevention)
2. implement a simpler "is empty" check with potentially false "is not empty" results when there are some items of unknown content (for example, we sync VEVENTs and the collection contains only VJOURNAL, but we don't know that because we avoid downloading the data and thus incorrectly assume that there might be VEVENTs)
3. implement logic which determines whether a collection really can contain a mixture of items and only do the more costly check in that case

Option 3 can be combined with 2.

I think I'll start with 2.
Comment 4 Sébastien Villemot 2014-11-17 12:16:42 UTC
Thanks for making progress on the issue.

What I still don't get however is why a transfer of 26Mb is needed for a 300kb ICS file. It should theoretically be possible to keep your existing check (i.e. downloading the whole calendar) but using a more efficient download method. Is it a limitation of the CalDAV protocol? A bug in radicale? I don't know the underlying technical details, so I may miss something though…
Comment 5 Patrick Ohly 2014-11-17 12:53:45 UTC
(In reply to Sébastien Villemot from comment #4)
> What I still don't get however is why a transfer of 26Mb is needed for a
> 300kb ICS file.

I forgot to comment on that. That happens on the server side. There was a recent change in libical that causes it to include the entire history of a time zone in each item. For example, most data logged for the first item in your log file is just the VTIMEZONE:

<getetag>"c6dd224d03da4ef8a74750b9dc73d40c"</getetag>
<C:calendar-data>BEGIN:VCALENDAR
PRODID:-//Ximian//NONSGML Evolution Calendar//EN
VERSION:2.0
BEGIN:VTIMEZONE
TZID:/freeassociation.sourceforge.net/Tzfile/America/Montreal
X-LIC-LOCATION:America/Montreal
BEGIN:STANDARD
TZNAME:EST
DTSTART:19691026T020000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
X-RADICALE-NAME:/freeassociation.sourceforge.net/Tzfile/America/Montreal
END:STANDARD
BEGIN:DAYLIGHT
TZNAME:EDT
DTSTART:19700426T020000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
X-RADICALE-NAME:/freeassociation.sourceforge.net/Tzfile/America/Montreal
END:DAYLIGHT 
...

Worse, the *same* item also contains other VTIMEZONEs. I don't know why that happens. Radicale should only send the VTIMEZONE definitions actually referenced in an item. Check your .ics file - perhaps a client stored the item as it later gets send out by the server? But then the .ics file itself should already be much larger.

For libical, see 
http://sourceforge.net/p/freeassociation/bugs/95/
http://lists.infradead.org/pipermail/libical-devel/2014-October/000627.html
Comment 6 Sébastien Villemot 2014-11-18 20:53:33 UTC
Actually radicale does not use libical, so I'm not sure to understand the connection to the libical change that you mentioned.

However it is clear that the problem is linked to timezones. I cleant my calendar.ics file by removing all the custom timezones (those with "freeassociation.sourceforge.net"), and now the REPORT request transfers 1.1Mb (instead of 26Mb previously). This is still 3 times as big as the whole calendar.ics, but it is nevertheless much better.

Looking at the log, it still looks that the spurious traffic comes from timezone information that is repeated for every VEVENT entry.
Comment 7 Patrick Ohly 2014-11-19 09:26:18 UTC
(In reply to Sébastien Villemot from comment #6)
> Actually radicale does not use libical,

You are right. I forgot that radicale is just fudging proper CalDAV semantics by doing text manipulations (radicale/ical.py).

> so I'm not sure to understand the
> connection to the libical change that you mentioned.

The large VTIMEZONE definitions in the log look exactly like the ones generated by the modified libical. If Radicale does not generate them, then it must have received them from a client (perhaps even SyncEvolution).

Current SyncEvolution should send the smaller VTIMEZONEs with just one entry for summer time and one for winter time.

> However it is clear that the problem is linked to timezones. I cleant my
> calendar.ics file by removing all the custom timezones (those with
> "freeassociation.sourceforge.net"), and now the REPORT request transfers
> 1.1Mb (instead of 26Mb previously). This is still 3 times as big as the
> whole calendar.ics, but it is nevertheless much better.

Removing them entirely might lead to missing time zone information in the items sent to clients. It might not matter in practice because clients (including SyncEvolution) often look at the TZID first to match it with an existing local definition.

> Looking at the log, it still looks that the spurious traffic comes from
> timezone information that is repeated for every VEVENT entry.

That's worth discussing with Radicale devs. I don't know how they currently decide which VTIMEZONEs need to be inserted into the individual items. If they include *all* of them, then it's not surprising that there is significant overhead once a calendar contains several different time zone definitions.
Comment 8 Sébastien Villemot 2014-11-19 09:58:19 UTC
(In reply to Patrick Ohly from comment #7)
> (In reply to Sébastien Villemot from comment #6)
> > Looking at the log, it still looks that the spurious traffic comes from
> > timezone information that is repeated for every VEVENT entry.
> 
> That's worth discussing with Radicale devs. I don't know how they currently
> decide which VTIMEZONEs need to be inserted into the individual items. If
> they include *all* of them, then it's not surprising that there is
> significant overhead once a calendar contains several different time zone
> definitions.

My understanding is indeed that *all* VTIMEZONEs are inserted into all individual items in the REPORT response.

So in the end I am not sure that the issue under discussion is actually a problem in SyncEvolution. I leave it up to you to decide if you want to implement a workaround for what now looks to me as a bug in Radicale.

Thanks for your feedback on this issue, it is much appreciated.
Comment 9 Patrick Ohly 2014-11-19 10:21:27 UTC
Yes, I want to avoid the costly REPORT including the actual calendar data when possible. The network traffic might be particularly bad with Radicale, but it's still something that SyncEvolution shouldn't be doing. A normal sync with no changes should cause very little network traffic.
Comment 10 Sébastien Villemot 2014-12-17 20:26:33 UTC
Just for the record, I have reported the Radicale issue: https://github.com/Kozea/Radicale/issues/243
Comment 11 Patrick Ohly 2015-03-03 10:03:58 UTC
The current SyncEvolution master is using a more efficient approach for the "is empty" check (point 2 from comment 3).

I don't intend to implement point 3, because the current solution for CalDAV (a REPORT without item content) should already be comparable in performance to the simpler PROPFIND and adding a slower, more accurate approach is of little value (isEmpty() is allowed to err on the side of caution and report "not empty" for cases where the server has mixed content and only items which would get ignored during the sync).

commit 75115022c525d21341be4113d789eccd6e23cca8
Author: Patrick Ohly <patrick.ohly@intel.com>
Date:   Thu Feb 12 13:36:42 2015 +0100

    CalDAV: more efficient "is empty" check (FDO #86335)
    
    Since 1.4.99.4, syncing WebDAV collections always checks first
    whether there are items in the collections. This was partly done for
    slow sync prevention (which is not necessary for empty collections),
    partly for the "is the datastore usable" check.
    
    However, this did not take into account that for CalDAV collections,
    the entire content gets downloaded for this check. That is because
    filtering by item type (VEVENT vs. VJOURNAL) is not implemented
    correctly by all servers. So now all CalDAV syncs, whether incremental
    or slow, always transfered all items, which is not the
    intention (incremental syncs should be fast and efficient).
    
    This commit adds a more efficient isEmpty() check: for simple CardDAV
    collections, only luid and etag get transferred, as in
    listAllItems(). This is the behavior from 1.5.
    
    For CalDAV, a report with a filter for the content type is used and
    the transfer gets aborted after the first item, without actually
    double-checking the content of the item. This is different from
    listAllItems(), which really transfers the content. This extra content
    check would only be needed for some old servers (Radical 0.7) and is
    not essential, because reporting "not empty" even when empty is safe.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.