During the sync, syncevolution try to log some information about the event and crash because the function 'g_variant_new_string' return a null pointer.
The event title contains '"@default/calendar_uoa_5: updating \"Lunch Thursday - Steph & Joe\377Lunch Thursday w/Steph\""' which is not a valid utf-8 string. due the '\377'. I am not sure if the char '\377' is part of the event title or it was generated by a previous conversion.
This cause this function to crash: (/src/gdbusxx/gdbus-cxx-bridge.h:1510)
because the g_variant_new_string returns a null pointer.
Created attachment 115231 [details]
full crash backtrace
Created attachment 115232 [details] [review]
Avoid crash during the log
check for g_variant_new_string result before call g_variant_builder_add_value
Comment on attachment 115232 [details] [review]
Avoid crash during the log
Review of attachment 115232 [details] [review]:
The commit message needs to include information about the root cause for the failure. As mentioned in the original bug report, this occurs when the "char *" string isn't really UTF-8 encoded.
I am not convinced that silently ignoring the issue by replacing the actual string with an empty one is the right thing to do. It's okay for a hotfix, but not for inclusion upstream.
If it is a string created by SyncEvolution, then throwing an error is the right thing to do. If it is a string coming from an uncertain source (like the calendar event), then it would be better to sanitize the data before sending it via D-Bus. For example, g_utf8_validate() could be used to copy just the valid characters and replace the rest with a special character like ?
Unfortunately catching all places where invalid strings might be passed into the gdbus layer is a lot of work, so pampering over the problem inside the layer by sanitizing all "char *" strings might be the only viable approach. Perhaps gdbus should not accept the ambiguous "char *" at all and instead rely on the caller to convert to a "UTF8Char *" first, either by casting (string known to be valid) or converting (untrusted string).
I'll have a look at fixing this properly.
Created attachment 116196 [details] [review]
incomplete attempt to enforce sane UTF-8 string in the D-Bus bindings
(In reply to Patrick Ohly from comment #4)
> I'll have a look at fixing this properly.
My attempt to introduce an explicit "this is an UTF-8 string" type ended up causing much more changes than hoped for, and still isn't complete (attached). I stopped when appending a std::pair<std::string, ...> as return value would not compile because it is treated like appending a parameter - adding a special exception as for normal return values would break the key point of the exercise for D-Bus method calls (forcing the caller to think about string content).
So I'll focus instead on silently removing invalid bytes.
Included the following patch in the upcoming 1.5.1:
Author: Patrick Ohly <firstname.lastname@example.org>
Date: Mon Jun 1 15:05:16 2015 +0200
syncing: avoid segfault for invalid text inside items (FDO #90118)
As reported by Canonical, syncing fails if data items contain
text which is not correct UTF-8 in one of the fields that
SyncEvolution logs in the command line output (like SUMMARY of
a calendar event).
That is because the byte string coming from the item is passed
unchecked to the D-Bus implementation for transmission via D-Bus. But
D-Bus strings must be correct UTF-8, so depending on the D-Bus library
in use, one gets a segfault (GIO D-Bus, due to an unchecked NULL
pointer access) or an "out of memory" error (libdbus, which checks for
What the D-Bus bindings now do is checking the string in advance (to
avoid error messages inside the D-Bus implementations) and then
replacing all invalid bytes with question marks. The rest of the
string is preserved.
Handling this inside the D-Bus binding layer is not the "correct"
solution (which would be to check for UTF-8 in the higher layers were
such bad data might get into SyncEvolution), but it is the less
intrusive and more complete one. Changing the bindings such that all
strings must be declared explicitly as "UTF-8 string" would have been
a way to find all places where such checks are missing, but it turned
out to be too complex and requiring too many changes.