Bug 40283 - Crash when a client needs recovery
Summary: Crash when a client needs recovery
Status: RESOLVED FIXED
Alias: None
Product: Telepathy
Classification: Unclassified
Component: mission-control (show other bugs)
Version: unspecified
Hardware: Other All
: medium normal
Assignee: Xavier Claessens
QA Contact: Telepathy bugs list
URL: http://cgit.collabora.com/git/user/xc...
Whiteboard: r+
Keywords: patch
: 39766 40659 45683 (view as bug list)
Depends on:
Blocks: 30043
  Show dependency treegraph
 
Reported: 2011-08-22 03:20 UTC by Xavier Claessens
Modified: 2012-02-06 05:05 UTC (History)
3 users (show)

See Also:
i915 platform:
i915 features:


Attachments

Description Xavier Claessens 2011-08-22 03:20:49 UTC
backtrace:

(process:13287): mcd-DEBUG: mcd_dispatcher_client_needs_recovery_cb: called

Program received signal SIGSEGV, Segmentation fault.
g_str_hash (v=0x0) at /build/buildd/glib2.0-2.29.16/./glib/gstring.c:142
142	/build/buildd/glib2.0-2.29.16/./glib/gstring.c: Aucun fichier ou dossier de ce type.
	in /build/buildd/glib2.0-2.29.16/./glib/gstring.c
(gdb) bt
#0  g_str_hash (v=0x0) at /build/buildd/glib2.0-2.29.16/./glib/gstring.c:142
#1  0x00007fa9b5cf0403 in g_hash_table_lookup_node (hash_return=<synthetic pointer>, key=0x0, hash_table=0x1a21240) at /build/buildd/glib2.0-2.29.16/./glib/ghash.c:360
#2  g_hash_table_lookup (hash_table=0x1a21240, key=0x0) at /build/buildd/glib2.0-2.29.16/./glib/ghash.c:1022
#3  0x0000000000429392 in mcd_dispatcher_client_needs_recovery_cb (client=0x1abf530, self=0x1a2e810) at mcd-dispatcher.c:683
#4  0x00007fa9b61e3ed4 in g_closure_invoke (closure=0x1aed680, return_value=0x0, n_param_values=1, param_values=0x1a4dc60, invocation_hint=<optimized out>)
    at /build/buildd/glib2.0-2.29.16/./gobject/gclosure.c:773
#5  0x00007fa9b61f717b in signal_emit_unlocked_R (node=<optimized out>, detail=0, instance=0x1abf530, emission_return=0x0, instance_and_params=0x1a4dc60)
    at /build/buildd/glib2.0-2.29.16/./gobject/gsignal.c:3271
#6  0x00007fa9b6200797 in g_signal_emit_valist (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>, var_args=0x7fffb1f02588) at /build/buildd/glib2.0-2.29.16/./gobject/gsignal.c:3002
#7  0x00007fa9b6200962 in g_signal_emit (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>) at /build/buildd/glib2.0-2.29.16/./gobject/gsignal.c:3059
#8  0x00007fa9b6bb1ba8 in _tp_cli_dbus_properties_invoke_callback_get_all (self=0x1abf530, error=0x0, args=0x1b05000, generic_callback=0x43f960 <_mcd_client_proxy_observer_get_all_cb>, 
    user_data=<optimized out>, weak_object=<optimized out>) at _gen/tp-cli-generic-body.h:1190
#9  0x00007fa9b6bb8a80 in tp_proxy_pending_call_idle_invoke (p=0x1a9f860) at proxy-methods.c:153
#10 0x00007fa9b5d01efd in g_main_dispatch (context=0x1a067d0) at /build/buildd/glib2.0-2.29.16/./glib/gmain.c:2439
#11 g_main_context_dispatch (context=0x1a067d0) at /build/buildd/glib2.0-2.29.16/./glib/gmain.c:3008
#12 0x00007fa9b5d026f8 in g_main_context_iterate (context=0x1a067d0, block=<optimized out>, dispatch=1, self=<optimized out>) at /build/buildd/glib2.0-2.29.16/./glib/gmain.c:3086
#13 0x00007fa9b5d02c32 in g_main_loop_run (loop=0x1a01b90) at /build/buildd/glib2.0-2.29.16/./glib/gmain.c:3294
#14 0x000000000040e9e5 in main (argc=<optimized out>, argv=<optimized out>) at mc-server.c:78



Steps leading to that crash:

1) a text channel arrives, MC creates a DO and give it to all approvers.
2) gnome-shell is an approver and also an handler, so it claims the DO.
3) MC creates an Appoval with type APPROVAL_TYPE_CLAIM and client_bus_name to NULL.It goes to _mcd_dispatch_operation_check_client_locks() where is calls mcd_dispatch_operation_set_channel_handled_by() with NULL well-known-name for the handler. which makes McdHandlerMap map the channel to NULL client name.
4) any observer with recovery=true restart (in my case it is empathy-chatroom something because of a crash probably). This makes MC go to mcd_dispatcher_client_needs_recovery_cb() which goes through all channels to get their handler and gets a NULL hname and then the call to _mcd_client_registry_lookup() will lead to crash.


There are 2 issues here:
1) The big one: when an approver claims a channel, MC should insert a valid well-known-name for it in the table.
2) in recovery, if a well-known-name of an handler is NULL, it should not crash, because that could happen anyway according to the comment in _mcd_handler_map_set_path_handled():
    /* In case we want to re-invoke the same client later, remember its
     * well-known name, if we know it. (In edge cases where we're recovering
     * from an MC crash, we can only guess, so we get NULL.) */
Comment 1 Xavier Claessens 2011-08-22 04:31:58 UTC
Here is the branch fixing the crash: In claim() method implementation we need to resolve the sender's unique-name to a well-known-name. I'm not familiar with MC so if you there is a nicer/easier way of doing this, please tell :)
Comment 2 Xavier Claessens 2011-08-22 04:41:26 UTC
*** Bug 39766 has been marked as a duplicate of this bug. ***
Comment 3 Simon McVittie 2011-08-23 02:39:03 UTC
(In reply to comment #0)
> 1) The big one: when an approver claims a channel, MC should insert a valid
> well-known-name for it in the table.

Why? What is this well-known name useful for?

If the channel is "brought to the foreground" (I think this is called "reinvoking" inside MC?) with EnsureChannels, we'll need some sort of well-known name for the approver/handler process, but if we're picking arbitrarily among its well-known names, we can do that equally well later, when the EnsureChannels call actually happens. I believe the EnsureChannels code already knows how to do this, in fact.

Also, we should pick a name that is a Handler - if a process owns o.fd.T.Client.GnomeShellHandler (a Handler) and o.fd.T.Client.GnomeShellApprover (an Approver), we want to choose the Handler, not the Approver (and if none of the names are Handlers, we can't call HandleChannels). I believe the EnsureChannels code already does this, too.

Note that processes with no well-known names at all can theoretically Claim() a channel - although this is only valid if they're about to Close() it, because without being a Handler, the client won't be able to list the channel in its HandledChannels.

(Also: the indentation of the first patch seems to have got lost or something.)

> 2) in recovery, if a well-known-name of an handler is NULL, it should not
> crash

This is definitely true, though.

Could you/someone make a regression test for this, please? I agree this needs fixing, but I don't think this branch is the right fix.

I think your second patch (on its own) might well be the right fix for this.
Comment 4 Xavier Claessens 2011-08-23 03:26:00 UTC
MC maps each channel to its handler using the well-known-name of the handler process. In this case, MC needs to know who's handling each channel to check if it bypass observers so it knows when an observer recovers for each channel if it should be given to it or if the handler wants to bypass them.

In the case the handler got the channel as an approver and claimed it, MC does not keep in its mapping the handler for that channel.

I don't know if there are other cases where that mapping is used. But I think either that mapping is useless and should be removed, or we should make our best to keep it complete and correct. We should not inserting NULL as the handler of a channel if we can know who's handling it.
Comment 5 Simon McVittie 2011-08-23 05:00:42 UTC
(In reply to comment #4)
> MC maps each channel to its handler using the well-known-name of the handler
> process. In this case, MC needs to know who's handling each channel to check if
> it bypass observers so it knows when an observer recovers for each channel if
> it should be given to it or if the handler wants to bypass them.

Fair enough... but in that case, it should only be looking for a Handler head on that process, and preferably one whose filter matches the channel. _mcd_dispatcher_reinvoke_handler has similar logic for EnsureChannels (although the behaviour in corner cases is different - if EnsureChannels discovers that there is no matching handler, it just gives up).

I'm not sure how much BypassObservers=True makes sense on a client that will Claim() channels: for new channels, the observer-bypassing bit is only relevant if the client has a filter that is known in advance.

Perhaps the check we actually want is something like this:

* if we know the Handler well-known-name &&
  the corresponding Handler still exists

  * return its BypassObservers property

* else

  * foreach Handler with the given well-known name

    * if it has BypassObservers = TRUE, return TRUE

  * else, return FALSE

Or even just:

* get the possible handlers for the channel, as if it was a new channel
  (_mcd_client_registry_list_possible_handlers) - perhaps matching the
  unique name of the Approver that claimed it, or perhaps just all the
  handlers

* if *any of them* has BypassObservers = TRUE, return TRUE

* else return FALSE

It seems BypassObservers is still a draft property (Handler.FUTURE), and the rationale given in telepathy-spec talks about "use-cases where the handler doesn't want anyone observing the channel - for example, because channels it handles shouldn't be logged", so we probably want to err on the side of "if *anyone* says the channel bypasses observers, then it bypasses observers".

I noticed in passing that _mcd_dispatch_operation_handlers_can_bypass_observers() is implemented wrong; I'll open a bug for that.

> I don't know if there are other cases where that mapping is used.

_mcd_dispatcher_reinvoke_handler was the motivation for it. git grep would tell you whether there are others.

> But I think
> either that mapping is useless and should be removed, or we should make our
> best to keep it complete and correct. We should not inserting NULL as the
> handler of a channel if we can know who's handling it.

At the same time, we perhaps shouldn't be inserting a non-NULL handler for a channel if it's just a guess anyway - we can guess at least as well, and maybe even better, at time-of-use (as _mcd_dispatcher_reinvoke_handler does).

(Analogously: git performs rename-detection when you do git log, not when you do git commit.)
Comment 6 Simon McVittie 2011-08-23 05:14:46 UTC
(In reply to comment #5)
> It seems BypassObservers is still a draft property (Handler.FUTURE), and the
> rationale given in telepathy-spec talks about "use-cases where the handler
> doesn't want anyone observing the channel - for example, because channels it
> handles shouldn't be logged", so we probably want to err on the side of "if
> *anyone* says the channel bypasses observers, then it bypasses observers".

See also Bug #30043...

> I noticed in passing that
> _mcd_dispatch_operation_handlers_can_bypass_observers() is implemented wrong;
> I'll open a bug for that.

... and Bug #40305, which is this one.
Comment 7 Xavier Claessens 2011-09-20 00:22:15 UTC
*** Bug 40659 has been marked as a duplicate of this bug. ***
Comment 8 Xavier Claessens 2011-09-20 02:07:50 UTC
Ok, changed my branch to reuse the code from _mcd_dispatcher_reinvoke_handler(). By reading the code, it could also fail at presenting the channel if the handler claimed it from observer/approver. That issue is fixed as well by the branch (I hope).

I'll try to bring this under unit tests, but they all fail miserably with master here...
Comment 9 Simon McVittie 2011-09-20 03:44:07 UTC
> By reading the code, it could also fail at
> presenting the channel if the handler claimed it from observer/approver

Regression test or it didn't happen :-)

> +static McdClientProxy *
> +_mcd_dispatcher_lookup_handler (McdDispatcher *self,
> +                                TpChannel *channel,
> +                                McdRequest *request)

What this function does is subtle enough that it deserves a doc-comment, something like:

    /*
     * @channel: a non-null Channel which has already been dispatched
     * @request: (allow-none): if not NULL, a request that resulted in
     *     @channel
     *
     * Return a Handler to which @channel could be re-dispatched,
     * for instance as a result of a re-request or a PresentChannel call.
     *
     * If @channel was dispatched to a Handler, return that Handler.
     * Otherwise, if it was Claimed by a process, and that process
     * has a Handler to which the channel could have been dispatched,
     * return that Handler. Otherwise return NULL.
     */

I think it'd be clearer what this function does if you inlined mcd_dispatcher_dup_possible_handlers into it, at which point you'd discover that going from a list of Handlers to a list of well-known names and back to the first Handler was pointless, and you could just use the first element of the list returned by _mcd_client_registry_list_possible_handlers.

I think the doc-comment I suggested makes it clear that this is the right function to call from dispatcher_present_channel(), and the right function to call from _mcd_dispatcher_reinvoke_handler(). So making it into its own function is a good bit of refactoring.

I'm still not at all sure that it's the right solution to mcd_dispatcher_client_needs_recovery_cb() - see Comment #5 and Bug #40305 - but that's not a regression. Please at least add a FIXME comment pointing to Bug #40305 (and perhaps also Bug #30043 to get the desired semantics pinned down).

> + /* Failing that, maybe the Handler it was dispatched to was temporary;

Failing what? For it to make sense, you need another comment for this to continue from, something like the one you deleted from reinvoke_handler:

> if (well_known_name != NULL)
>   {
>     /* We know which Handler well-known name was responsible: if it
>      * still exists, we want to call HandleChannels on it */
>     handler = _mcd_client_registry_lookup (self->priv->clients,
>                                            well_known_name);
>   }

I think it might deserve a DEBUG(), either here or on successful exit from the function - I think it'd be good if each route through the function was guaranteed to DEBUG() at least once. Be careful that it can't printf ("%s", NULL), which crashes on some platforms.

I don't like the "goto out" where out is within a block; perhaps you could refactor that so "out" is either unnecessary, or at the top level?

In _mcd_dispatcher_reinvoke_handler() you've deleted calls to mcd_dispatcher_finish_reinvocation(). Why? If I understand the code correctly, you should mcd_dispatcher_finish_reinvocation() before you goto finally. Is this case covered by the tests?

In dispatcher_present_channel, because you use _mcd_channel_get_request(), what you're doing is particularly subtle: you're basing the search for a suitable Handler on the handler that was preferred by the request that initially created the (Tp)Channel, if any[1]. I think this is right (or at least, better than not looking at the request at all), but deserves a comment!

[1] Actually you're not, but only because _mcd_client_registry_list_possible_handlers() is misleading, for which I'll clone a bug.
Comment 10 Simon McVittie 2011-09-20 03:57:05 UTC
(In reply to comment #9)
> [1] Actually you're not, but only because
> _mcd_client_registry_list_possible_handlers() is misleading, for which I'll
> clone a bug.

Bug #41031
Comment 11 Xavier Claessens 2011-09-20 05:01:56 UTC
Ok, branch updated with comments fixed, and unit tests :)
Comment 12 Simon McVittie 2011-09-20 05:41:27 UTC
(In reply to comment #11)
> Ok, branch updated with comments fixed, and unit tests :)

Looks good now. From the department of reducing our technical debt, any chance you could look at Bug #41031 and Bug #40305 too?
Comment 13 Xavier Claessens 2011-09-20 06:44:58 UTC
Branch merged
Comment 14 Simon McVittie 2012-02-06 05:05:53 UTC
*** Bug 45683 has been marked as a duplicate of this bug. ***


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.