Bug 37078

Summary: crashes if connecting fails just before it succeeds
Product: Telepathy Reporter: Simon McVittie <smcv>
Component: gabbleAssignee: Simon McVittie <smcv>
Status: RESOLVED FIXED QA Contact: Telepathy bugs list <telepathy-bugs>
Severity: normal    
Priority: medium CC: marco.barisione
Version: 0.12Keywords: patch
Hardware: Other   
OS: All   
URL: http://cgit.collabora.co.uk/git/user/smcv/telepathy-gabble-smcv.git/commit/?h=012-schrodinger-connected
Whiteboard: NB#254027, review+
i915 platform: i915 features:
Attachments: set_status_to_connected: do nothing if we already disconnected

Description Simon McVittie 2011-05-10 10:51:28 UTC
We've seen some Gabble crash-dumps with these criticals logged (in the configuration where they were reported, they're non-fatal):

May  5 13:04:07 (none) telepathy-gabble[23476]: GLIB CRITICAL ** tp-glib -
tp_base_connection_add_interfaces: assertion `self->status !=
TP_CONNECTION_STATUS_DISCONNECTED' failed
May  5 13:04:07 (none) telepathy-gabble[23476]: GLIB CRITICAL ** tp-glib -
tp_base_connection_change_status: assertion `prev_status ==
TP_CONNECTION_STATUS_CONNECTING' failed

and then an abort caused by this assertion failure:

"(base->status == TP_CONNECTION_STATUS_DISCONNECTED) ||
(base->status == TP_INTERNAL_CONNECTION_STATUS_NEW)"

The reason that assertion should be OK is that GabbleConnectionManager holds a
ref to every GabbleConnection, from before the time it moves to CONNECTED until
after the time it moves to DISCONNECTED. The life-cycle is NEW -> CONNECTING ->
CONNECTED -> DISCONNECTED, except that you can also become DISCONNECTED from
any earlier state.

It looks as though the two criticals I quoted can only happen if:

* set_status_to_connected is called when we already DISCONNECTED
* which requires decrement_waiting_connected when we already DISCONNECTED

s_s_t_c should bail out early if we already DISCONNECTED; that's a Gabble bug.

Meanwhile, telepathy-glib's tp_base_connection_change_status() can critical()
but already have had its side-effect of changing the status to CONNECTED:
that's a telepathy-glib bug.

The backtrace contains GSimpleAsyncResult machinery, so I think what's happened here is:

* bare_jid_disco_cb succeeds, we've disco'ed ourselves
* conn_presence_set_initial_presence_async succeeds, and schedules a call
  to its callback in an idle; we are now ready to be CONNECTED, but because
  of GAsyncResult calling conventions, we won't call the callback right now
* someone calls Disconnect(), and we do so, synchronously; we are now
  DISCONNECTED
* the idle goes off and connection_initial_presence_cb is called
Comment 1 Simon McVittie 2011-05-10 11:14:21 UTC
Created attachment 46559 [details] [review]
set_status_to_connected: do nothing if we already  disconnected

Now that we're making more use of GAsyncResult, we can easily get into
this situation:

* all but one of the preconditions for being CONNECTED have happened;
  the remaining one uses GAsyncResult (currently that can only be
  conn_presence_set_initial_presence_async)

* conn_presence_set_initial_presence succeeds, and schedules a call
  to its callback in an idle; we are now ready to be CONNECTED, but because
  of GAsyncResult calling conventions, we won't call the callback right now

* someone calls Disconnect(), and we do so, synchronously; we are now
  DISCONNECTED

* the idle goes off and connection_initial_presence_cb is called,
  with success, while DISCONNECTED!
Comment 2 Vivek Dasmohapatra 2011-05-13 07:51:48 UTC
ship it.
Comment 3 Simon McVittie 2011-05-16 02:27:13 UTC
Fixed in git for 0.12.1, 0.13.0

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.