Bug 24625

Summary: porter_force_close might never call its callback; porter might tell connection to die repeatedly
Product: Wocky Reporter: Will Thompson <will>
Component: GeneralAssignee: Vivek Dasmohapatra <vivek>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium CC: vivek
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Bug Depends on:    
Bug Blocks: 24626    

Description Will Thompson 2009-10-19 10:57:12 UTC
A whack-a-mole session revealed a couple of bugs in Wocky:

• The porter's internal flag to remind it it's already closing up was not set until after stream errors were reported, which lead to inconsistencies if the signals' callbacks called back into the porter;
• The checks to avoid the porter repeatedly telling the connection to forcibly disconnect were inverted.

I have a WIP branch at <http://git.collabora.co.uk/?p=user/wjt/wocky.git;a=shortlog;h=refs/heads/disconnection-woes>.
Comment 1 Vivek Dasmohapatra 2009-10-20 03:41:05 UTC
Further invetigation indicates that the tests aren't quite as inverted as 
we thought they were originally. They're _sometimes_ inverted. So it looks 
like we need to track our state a little more carefully in wocky, and we 
may be trying to cram a tri-state's worth of information into two bi-states and getting it wrong.

The flag-being-set-after-signals-fired was definitely a bug though.

Comment 2 Vivek Dasmohapatra 2009-10-22 06:14:21 UTC
Ok, so, this is the root cause of the undead-dbus-names bug:

• The flag in the wocky porter saying it was in shutdown was set _after_ the remote-* signals were fired.
• Ths meant a callback inside wocky did not get called, so the name was never cleared from the bus (even though the object was correctly unref()d).
• The tests for the corce close result were not inverted, their purpose was not to prevent a second force close, but to prevent a callback being set up when there was no result to hold its response (ie to avoid an _unexpected_ force close (which shouldn't happen anyway)).

However:

• The porter can only support one forced close shutdown operation in flight at a time
• It did not check for already-in-shutdown before attempting a forced shutdown
• When the async stanza_received_cb was triggered and failed to receive a stanza it tried to start a second force close shutdown
• This caused the XMPP connection to report an error in idle with the porter as its user_data

This caused two problems:
• The porter had already been unref()d when the idle error was reported, instant death when we tried to cast the user data back to a porter
• The async result had already been used and cleared by the first force close shutdown op, so the callback to the second force close op had no result to use to report back to us.

So, the fixes are:

• flip the closing flag early (before the remote-* signals)
• add a flag to indicate when a forced close has been started, cope gracefully when a force-close op is already in flight and we request a second one
• ref the porter passed as user data to the force-close callback
• unref the porter in the force close callback
• report errors (in idle if appropriate) when a second force-close is attempted

http://git.collabora.co.uk/?p=user/vivek/wocky.git;a=shortlog;h=refs/heads/porter-disconnect-bugs

 
Comment 3 Vivek Dasmohapatra 2009-10-23 08:16:31 UTC
Merged to master, wocky submodule synced, all tests (including new one for this problem) pass.

git+ssh://git.collabora.co.uk/git/telepathy-gabble.git
commit fb462e4687c35e0aaa88da064ca67e081b9af006

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.