Bug 42462

Summary: Crash when getting a non-utf8 presence status
Product: Telepathy
Reporter: Sjoerd Simons <sjoerd>
Component: gabble
Assignee: Telepathy bugs list <telepathy-bugs>
Status: RESOLVED FIXED
QA Contact: Telepathy bugs list <telepathy-bugs>
Severity: blocker
Priority: medium
Keywords: patch
Version: git master
Hardware: Other
OS: All
Attachments: straw-men patch

Description Sjoerd Simons 2011-11-01 02:31:31 UTC
It seems someone in the prosody MUC has the following byte sequence as their presence status: {0xef, 0xb7, 0xaf}. Apparently libxml2 either doesn't validate it or chokes on it, causing a nice crash when the status message is emitted over D-Bus :/
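
For reference, a minimal sketch of what those three bytes decode to. The helper name `decode_utf8_3` is made up for illustration and is not part of gabble, wocky or libxml2; it only handles the 3-byte UTF-8 shape and does not reject overlong forms or surrogates.

```c
#include <stdint.h>

/* Hypothetical helper: decode one 3-byte UTF-8 sequence (lead byte
 * 0xE0..0xEF followed by two continuation bytes).
 * Returns the code point, or 0xFFFFFFFF if the bytes do not have the
 * 3-byte shape. Overlong forms and surrogates are NOT rejected here. */
static uint32_t decode_utf8_3(const unsigned char *s)
{
    if ((s[0] & 0xF0) != 0xE0 ||
        (s[1] & 0xC0) != 0x80 ||
        (s[2] & 0xC0) != 0x80)
        return 0xFFFFFFFFu;

    return ((uint32_t)(s[0] & 0x0F) << 12) |
           ((uint32_t)(s[1] & 0x3F) << 6)  |
            (uint32_t)(s[2] & 0x3F);
}
```

Fed {0xef, 0xb7, 0xaf}, this yields U+FDEF, which sits in the U+FDD0..U+FDEF block that Unicode reserves as noncharacters. So the bytes are structurally well-formed UTF-8, just not an encoding of an assigned character.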
Comment 1 Sjoerd Simons 2011-11-01 02:34:55 UTC
Created attachment 52982
straw-men patch

Silly patch that works around the issue. It needs more checking of all the places where we could get non-UTF-8 out of libxml, to make sure we verify them all, plus tests to make sure things are happy.

I'd really like to have some fuzzing tests at some point :)
Comment 2 Sjoerd Simons 2011-11-01 13:54:17 UTC
So it seems the issue stems from the fact that prosody, and probably other XMPP servers, pass through all valid Unicode code points, even though some of those code points are specified as noncharacters, which should only be used internally.

D-Bus and GLib, on the other hand, only consider Unicode characters to be *valid*, not all Unicode code points.

Great fun!
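
A rough sketch of the distinction, assuming the Unicode definition of noncharacters (U+FDD0..U+FDEF plus the last two code points of each plane). The function names are hypothetical and this is not the actual GLib or D-Bus validation code, just an illustration of a "characters only" check versus an "any code point" check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unicode noncharacters: U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF
 * in every plane (n = 0x00..0x10). */
static bool is_noncharacter(uint32_t cp)
{
    if (cp >= 0xFDD0 && cp <= 0xFDEF)
        return true;
    return cp <= 0x10FFFF && (cp & 0xFFFE) == 0xFFFE;
}

/* Stricter check (illustrative only): in range, not a surrogate,
 * and not a noncharacter. A permissive server would skip the last
 * test and pass the code point through. */
static bool is_strict_character(uint32_t cp)
{
    if (cp > 0x10FFFF)
        return false;
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return false;
    return !is_noncharacter(cp);
}
```

With this split, U+FDEF is a valid code point but fails `is_strict_character`, which matches the mismatch described here between what the server forwards and what the receiving side accepts.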
Comment 3 Sjoerd Simons 2011-11-28 02:19:05 UTC
Fixed in my branch:
  http://cgit.collabora.com/git/user/sjoerd/wocky.git/log/?h=invalid-character-test
Comment 4 Simon McVittie 2011-11-28 02:33:38 UTC
This would benefit from https://bugzilla.gnome.org/show_bug.cgi?id=610969 being fixed. Maybe a GLib reviewer will notice that bug one day, or maybe based on your experience of writing this patch you can give feedback on which of the proposed features on that bug you would/wouldn't find useful...

+ g_string_append (result, "�");

I'm not sure how portable it is to have UTF-8 in our string constants: the version in GLib is "\357\277\275" with a comment explaining that it's U+FFFD REPLACEMENT CHARACTER.

Otherwise this looks fine.
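
To illustrate the escaped spelling, here is a small sketch that keeps the source file pure ASCII. The macro and function names are invented for this example and are not taken from the patch or from GLib; the plain-buffer helper stands in for the `g_string_append` call only to keep the example free of dependencies.

```c
#include <stddef.h>
#include <string.h>

/* U+FFFD REPLACEMENT CHARACTER as UTF-8 (0xEF 0xBF 0xBD), written
 * with octal escapes so no non-ASCII byte appears in the source. */
#define REPLACEMENT_CHARACTER "\357\277\275"

/* Hypothetical helper: append the replacement character to a
 * NUL-terminated buffer of capacity cap.
 * Returns 1 on success, 0 if there is not enough room. */
static int append_replacement(char *buf, size_t cap)
{
    size_t len = strlen(buf);
    size_t add = sizeof(REPLACEMENT_CHARACTER) - 1; /* 3 bytes */

    if (len + add + 1 > cap)
        return 0;
    memcpy(buf + len, REPLACEMENT_CHARACTER, add + 1); /* copies the NUL too */
    return 1;
}
```

The escaped form produces the same three bytes as the literal character, so behaviour is unchanged; only the dependence on the compiler's source encoding goes away.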
Comment 5 Sjoerd Simons 2011-11-29 04:00:53 UTC
Fixed in git
