Bug 23684

Summary: Gabble advertizes an avatar's sha1 in its presence stanza without following XEP-0153
Product: Telepathy Reporter: Alban Crequy <alban.crequy>
Component: gabbleAssignee: Alban Crequy <alban.crequy>
Status: RESOLVED FIXED QA Contact: Telepathy bugs list <telepathy-bugs>
Severity: normal    
Priority: medium Keywords: patch
Version: unspecified   
Hardware: Other   
OS: All   
URL: http://git.collabora.co.uk/?p=user/alban/telepathy-gabble.git;a=shortlog;h=refs/heads/avatar-bug23684-0.8
Whiteboard:
i915 platform: i915 features:

Description Alban Crequy 2009-09-03 04:56:08 UTC
telepathy-gabble 0.8.0

XEP-0153 4.3 2. 3rd bullet:
"If the hashes are different, then the client MUST NOT attempt to resolve the conflict by uploading its avatar image again. Instead, it MUST defer to the content of the retrieved vCard by resetting its image hash (see below) and providing that hash in future presence broadcasts."

Gabble does not implement the Resetting the Image Hash described in section 4.4.

When Gabble receives a presence stanza from another Jabber client on the same account (but different resource), connection_avatar_update_cb() is called and calls update_own_avatar_sha1(). It just replaces the avatar's hash and publish it immediately:

  conn->self_presence->avatar_sha1 = g_strdup (sha1);
  if (!_gabble_connection_signal_own_presence (conn, &error))

I fear there can be somehow infinite ping-pong with different Jabber client implementations which do not follow the XEP-0153...
Comment 1 Simon McVittie 2009-09-09 10:23:10 UTC
The actual code looks good to me for 0.8 and 0.9, with some stylistic complaints. The regression test, not so much.

> +        }
> +      else
> +        presence->avatar_sha1 = g_strdup (sha1);

Put the else clause in {} to be consistent with the if clause, please.

> priv->conn->parent.self_handle

I'd prefer to have a local, "TpBaseConnection *base_conn = (TpBaseConnection *) priv->conn;", and use base_conn->self_handle. I've been convinced that this is substantially better style.

> +    event = q.expect('stream-presence')
> +    event = q.expect('dbus-signal', signal='AvatarUpdated')
> +    assert event.args[0] == 1, event.args
> +    assert event.args[1] == "", event.args
> +    event = q.expect('stream-iq', to=None, query_ns='vcard-temp',
> +            query_name='vCard')

This is a race - the tests read from the D-Bus socket and the stream socket in a non-deterministic order. You'll have to use:

stream_presence, avatar_update, stream_iq = q.expect_many(EventPattern(...
Comment 2 Alban Crequy 2009-09-09 11:56:28 UTC
(In reply to comment #1)
> The actual code looks good to me for 0.8 and 0.9, with some stylistic
> complaints. The regression test, not so much.

Do you mean the regression test is not good for 0.8 and 0.9, or that your complaints on the regression test are not stylistic?

> > +        }
> > +      else
> > +        presence->avatar_sha1 = g_strdup (sha1);
> 
> Put the else clause in {} to be consistent with the if clause, please.

Fixed.

> > priv->conn->parent.self_handle
> 
> I'd prefer to have a local, "TpBaseConnection *base_conn = (TpBaseConnection *)
> priv->conn;", and use base_conn->self_handle. I've been convinced that this is
> substantially better style.

Fixed.

> > +    event = q.expect('stream-presence')
> > +    event = q.expect('dbus-signal', signal='AvatarUpdated')
> > +    assert event.args[0] == 1, event.args
> > +    assert event.args[1] == "", event.args
> > +    event = q.expect('stream-iq', to=None, query_ns='vcard-temp',
> > +            query_name='vCard')
> 
> This is a race - the tests read from the D-Bus socket and the stream socket in
> a non-deterministic order. You'll have to use:
> 
> stream_presence, avatar_update, stream_iq = q.expect_many(EventPattern(...

Indeed. However, I wanted to check that stream-presence arrives before stream-iq as it is required by the XEP-0153 section 4.4 bullet 1 and 2.

I don't know whether it is possible to do with the test framework, so I just fixed the race, and thus remove that check.

Branch pushed again.
Comment 3 Alban Crequy 2009-09-22 07:03:19 UTC
Xavier had this problem. In his case, when he downloads the vCard from the server, he got the reply:

<iq type="error" ...
<error code="500" type="wait"><resource-constraint xmlns="urn:ietf:params:xml:ns:xmpp-stanzas"/></error></iq>

Maybe the loop was caused by the quota limit on stanza in Google Jabber servers?
   o  <resource-constraint/> -- the server or recipient lacks the system
      resources necessary to service the request; the associated error
      type SHOULD be "wait".

Gabble should "wait" a timer and ask again latter. But do not ask the vCard again if the timeout is not reached. This is not implemented in Gabble. So I remove "[fixed in 0.8/master git]" from the subject of this bug.
Comment 4 Alban Crequy 2009-09-24 10:35:24 UTC
It should be fixed in my branch "avatar-bug23684-0.8". This branch is for telepathy-gabble-0.8.

Review welcome
Comment 5 Alban Crequy 2009-09-24 11:55:04 UTC
Add patch keyword
Comment 6 Will Thompson 2009-09-28 08:07:47 UTC
> It will fixed the timers override (problem found during Daf's review)

For future reference: I don't know what this means, since I wasn't reading IRC at the time of the review. It would be better to say what the actual problem is. Ditto a couple of other commit messages.

I made a couple of clarifications rather than writing review comments: <http://git.collabora.co.uk/?p=user/wjt/telepathy-gabble-wjt.git;a=shortlog;h=refs/heads/avatar-bug23684-0.8>. Given those, I think this is good to go.
Comment 7 Alban Crequy 2009-09-28 09:10:02 UTC
Your patches look fine.

I took them, rebased my branch on telepathy-gabble-0.8, and pushed it on the server (see URL), run "make check" and smoke-tested it.

Last commit is 09042cf7712cb0a4c0943078bf48912acb2cfc34.

Waiting for Xavier's test.
Comment 8 Will Thompson 2009-10-02 12:55:34 UTC
Current status: what we believed was a fix was released in 0.8.4, but it turned out to loop in certain situations, and to contain a crasher. A better fix will be in 0.8.5...
Comment 9 Will Thompson 2009-10-02 16:54:51 UTC
Should be fixed in 0.8.5, yay.
Comment 10 Alban Crequy 2009-10-08 03:15:32 UTC
0.8.5 is not correct. Reopen.

- In case there is no avatar at all, there is still a loop:
https://bugs.launchpad.net/ubuntu/+source/telepathy-gabble/+bug/445474

- an assertion happens, probably on disconnection while there is several pending vCard requests:
https://bugs.edge.launchpad.net/ubuntu/+source/telepathy-gabble/+bug/445847
Comment 11 Alban Crequy 2009-10-08 07:58:21 UTC
(In reply to comment #10)
> 0.8.5 is not correct. Reopen.
> 
> - In case there is no avatar at all, there is still a loop:
> https://bugs.launchpad.net/ubuntu/+source/telepathy-gabble/+bug/445474
> 
> - an assertion happens, probably on disconnection while there is several
> pending vCard requests:
> https://bugs.edge.launchpad.net/ubuntu/+source/telepathy-gabble/+bug/445847

Both are fixed in my branch and the unit tests are updated to avoid regressions.

Review welcome :-)

Comment 12 Will Thompson 2009-10-08 08:30:55 UTC
+    sync_stream(q, stream) # Twice because the vCard request is done in
+    sync_stream(q, stream) # g_idle_add

Does this actually reliably give the idle a chance to run?

6f705c81fc7cec means that we leak presence->avatar_sha1 when doing
conflict resolution: self_avatar_resolve_conflict() sets it to NULL
without freeing it. Either it should free it, or not set it to NULL.

+          DEBUG ("Avatar conflict! Received hash '%s' and our cache is '%s'",
+            sha1, presence->avatar_sha1 == NULL ? "<NULL>" : presence->avatar_sha1);

Style: The second line is too long

+    # if the server don't reply after the timeout and there is pending
+    # requests, Gabble must handle that correctly and not crash.
+    contacts = ['random_user_%s@bigserver.com' % i for i in range(1, 100) ]
+    handles = [conn.RequestHandles(1, [contact])[0] for contact in contacts]
+    conn.Avatars.RequestAvatars(handles)
+    try:
+        conn.Avatars.RequestAvatar(handles[-1])
+    except dbus.DBusException, e:
+        assertEquals(cs.NOT_AVAILABLE, e.get_dbus_name())
+    else:
+        assert False

I know how this works, but we should document what this is actually
doing. Something like:

  # First, ensure the pipeline is full...
  ...
  # Then, request yet another avatar. The request will time out before
  # the IQ is sent, which used to trigger a crash in Gabble
  # (LP#445847). So, we assert that the error is NotAvailable (rather
  # than the error returned when the service crashes).

Comment 13 Alban Crequy 2009-10-08 08:46:57 UTC
(In reply to comment #12)
> +    sync_stream(q, stream) # Twice because the vCard request is done in
> +    sync_stream(q, stream) # g_idle_add
> 
> Does this actually reliably give the idle a chance to run?

In practice, yes. However I am not sure about the theory.

> 6f705c81fc7cec means that we leak presence->avatar_sha1 when doing
> conflict resolution: self_avatar_resolve_conflict() sets it to NULL
> without freeing it. Either it should free it, or not set it to NULL.

Sorry, my mistake. I wanted to do the free() near the place where it is set to NULL. Fixed.

Other comments fixed too.
Comment 14 Will Thompson 2009-10-08 09:45:46 UTC
Okay, looks good to me!
Comment 15 Will Thompson 2009-10-09 03:02:50 UTC
Merged, will be fixed in 0.8.6 and 0.9.2.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.