Bug 37844

Summary: Don't try to reconnect an account over and over for certain errors
Product: Telepathy Reporter: Emilio Pozuelo Monfort <pochu27>
Component: mission-controlAssignee: Simon McVittie <smcv>
Status: RESOLVED MOVED QA Contact: Telepathy bugs list <telepathy-bugs>
Severity: normal    
Priority: medium CC: danielle, pochu27
Version: git master   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: Check the specific error code when deciding whether to reconnect or not
Check for the specific error codes when reconnecting
Check for the specific error codes when reconnecting
Check for the specific error codes when reconnecting

Description Emilio Pozuelo Monfort 2011-06-02 02:17:26 UTC
Some errors are not temporary, so trying to reconnect over and over again when they happen doesn't make much sense. MC should have a list of those errors so it doesn't try to reconnect when an account fails to connect with one of them.

One of these errors is TP_ERROR_SOFTWARE_UPGRADE_REQUIRED
Comment 1 Simon McVittie 2011-06-02 02:38:50 UTC
Does MC actually reconnect on that error? Last time I looked, it had a whitelist of errors that might potentially be addressed by reconnecting, and would not reconnect on other errors.
Comment 2 Emilio Pozuelo Monfort 2011-06-15 00:13:36 UTC
(In reply to comment #1)
> Does MC actually reconnect on that error? Last time I looked, it had a
> whitelist of errors that might potentially be addressed by reconnecting, and
> would not reconnect on other errors.

You are right. That is in mcd-connection.c:mcd_connection_invalidated_cb()

    if ((priv->abort_reason == TP_CONNECTION_STATUS_REASON_NONE_SPECIFIED ||
         priv->abort_reason == TP_CONNECTION_STATUS_REASON_NETWORK_ERROR) &&
        priv->probation_drop_count <= PROBATION_MAX_DROPPED)
    {
        /* we were disconnected by a network error or by a connection manager
         * crash (in the latter case, we get NoneSpecified as a reason): don't
         * abort the connection but try to reconnect later */

The problem is that there doesn't seem to be a TP_CONNECTION_STATUS_REASON_UPGRADE_SOFTWARE_REQUIRED, and so CMs may use _NETWORK_ERROR (for the lack of a more appropriate/specific reason), and so MC tries to reconnect. So the question is, should there be more TP_CONNECTION_STATUS_REASON_*, should CMs use another one for this error, or anything else, like e.g. MC should check dbus_reason too?
Comment 3 Simon McVittie 2011-06-15 02:17:57 UTC
(In reply to comment #2)
> The problem is that there doesn't seem to be a
> TP_CONNECTION_STATUS_REASON_UPGRADE_SOFTWARE_REQUIRED, and so CMs may use
> _NETWORK_ERROR (for the lack of a more appropriate/specific reason)

Indeed, errors.xml says Network_Error is the generic form of SoftwareUpdateRequired. (I'm not convinced that's right - if there isn't a better enum member, it should be Not_Specified - but that wouldn't help us here because MC treats those two the same.)

> tries to reconnect. So the question is, should there be more
> TP_CONNECTION_STATUS_REASON_*, should CMs use another one for this error, or
> anything else, like e.g. MC should check dbus_reason too?

Ideally, MC should base its behaviour on the GError in TpConnection::invalidated (which is derived from the dbus_reason, or from the enum member if we don't understand the dbus_reason), falling back to tp_connection_get_status() if it doesn't understand the GError.

This would give it a reliable way to detect CM crashes (TP_DBUS_ERROR_NAME_OWNER_LOST, I believe?).
Comment 4 Emilio Pozuelo Monfort 2011-06-15 04:34:01 UTC
Created attachment 47990 [details] [review]
Check the specific error code when deciding whether to reconnect or not

How about something like this?
Comment 5 Simon McVittie 2011-06-15 05:07:38 UTC
Review of attachment 47990 [details] [review]:

::: src/mcd-connection.c
@@ +1222,2 @@
     if ((priv->abort_reason == TP_CONNECTION_STATUS_REASON_NONE_SPECIFIED ||
+         code == TP_ERROR_NETWORK_ERROR) &&

ConnectionLost (for instance) is also a NETWORK_ERROR, so this change would make MC not reconnect when a connection was lost (if CMs used ConnectionLost correctly, which in practice they probably don't).

If you're going to switch away from using abort_reason (which I would encourage!), I'd suggest not using abort_reason in this conditional at all.

Perhaps factor out a function:

gboolean
connection_should_reconnect (TpConnection *tp_conn,
    guint domain,
    gint code)
{
  TpConnectionStatusReason reason;

  if (domain == TP_ERROR)
    {
      switch (code)
        {
        case TP_ERROR_CONNECTION_FAILED:
        case TP_ERROR_CONNECTION_LOST:
        case TP_ERROR_DISCONNECTED:
        case TP_ERROR_NETWORK_ERROR:
          return TRUE;

        case TP_ERROR_SOFTWARE_UPGRADE_REQUIRED:
        case TP_ERROR_SERVICE_BUSY:
        case TP_ERROR_CONNECTION_REPLACED:
        case TP_ERROR_ALREADY_CONNECTED:
        case TP_ERROR_CONNECTION_REFUSED:
          /* possibly a few others (notably, all the encryption
           * things), but the others are hopefully not
           * generically mapped to NETWORK_ERROR anyway */
          return FALSE;
        }
    }
  else if (domain == TP_DBUS_ERROR)
    {
      switch (code)
        {
          TP_DBUS_ERROR_NAME_OWNER_LOST:
            /* CM crashed */
            return TRUE;
        }
    }

  /* not sure what the GError meant, so check the generic
   * status code */
  tp_connection_get_status (tp_conn, &reason);

  switch (reason)
    {
      case TP_CONNECTION_STATUS_REASON_NETWORK_ERROR:
        return TRUE;
    }

  return FALSE;
}

Bonus points if you add some vaguely useful DEBUG().
Comment 6 Emilio Pozuelo Monfort 2011-06-16 02:53:03 UTC
Created attachment 48028 [details] [review]
Check for the specific error codes when reconnecting

> If you're going to switch away from using abort_reason (which I would
> encourage!), I'd suggest not using abort_reason in this conditional at all.
> 
> Perhaps factor out a function:

Sounds good, I've done that, adding TP_CONNECTION_STATUS_REASON_NONE_SPECIFIED too as the old check had.

I've tested a couple of scenarios and it works fine:

(process:19287): mcd-DEBUG: connection_should_reconnect: error code org.freedesktop.Telepathy.Error.SoftwareUpgradeRequired, not reconnecting

(process:19287): mcd-DEBUG: connection_should_reconnect: dbus error code: OWNER_LOST, reconnecting
Comment 7 Simon McVittie 2011-06-16 03:11:14 UTC
Review of attachment 48028 [details] [review]:

::: src/mcd-connection.c
@@ +1217,3 @@
+            /* possibly a few others (notably, all the encryption
+             * things), but the others are hopefully not
+             * generically mapped to NETWORK_ERROR anyway */

Er, that comment was meant to be a note to you, not something you should commit :-)

Either replace it with more error codes (look at the list in devhelp or something and think about whether each one would be helped by reconnection or not), or check that for all our other error codes, believing the status reason would be OK.

@@ +1241,3 @@
+    {
+    case TP_CONNECTION_STATUS_REASON_NETWORK_ERROR:
+    case TP_CONNECTION_STATUS_REASON_NONE_SPECIFIED:

I'm not so sure that NONE_SPECIFIED should trigger reconnection: if the string error name was not understood and the status reason is NONE_SPECIFIED, that tells us nothing.

The comment suggests that NONE_SPECIFIED was only there because it's what a crashing connection manager would produce, but TP_DBUS_ERROR_NAME_OWNER_LOST covers that case now, so maybe we don't need it?

@@ +1245,3 @@
+        return TRUE;
+    default:
+        break;

Useless default case that does nothing, can hopefully be removed?
Comment 8 Emilio Pozuelo Monfort 2011-06-16 04:40:35 UTC
Created attachment 48037 [details] [review]
Check for the specific error codes when reconnecting

(In reply to comment #7)
> Review of attachment 48028 [details] [review]:
> 
> ::: src/mcd-connection.c
> @@ +1217,3 @@
> +            /* possibly a few others (notably, all the encryption
> +             * things), but the others are hopefully not
> +             * generically mapped to NETWORK_ERROR anyway */
> 
> Er, that comment was meant to be a note to you, not something you should commit
> :-)

Oh :-)

> Either replace it with more error codes (look at the list in devhelp or
> something and think about whether each one would be helped by reconnection or
> not), or check that for all our other error codes, believing the status reason
> would be OK.

I have added quite a few. There are a bunch that I don't think would ever happen when connecting (e.g. errors that are channel-specific), so I have left those out.

I have added another DEBUG in case we don't handle a TpError.

> @@ +1241,3 @@
> +    {
> +    case TP_CONNECTION_STATUS_REASON_NETWORK_ERROR:
> +    case TP_CONNECTION_STATUS_REASON_NONE_SPECIFIED:
> 
> I'm not so sure that NONE_SPECIFIED should trigger reconnection: if the string
> error name was not understood and the status reason is NONE_SPECIFIED, that
> tells us nothing.
> 
> The comment suggests that NONE_SPECIFIED was only there because it's what a
> crashing connection manager would produce, but TP_DBUS_ERROR_NAME_OWNER_LOST
> covers that case now, so maybe we don't need it?

Agreed, I've removed it. If we find out that there are other cases that it was covering, we should add the specific TpErrors if possible.

> @@ +1245,3 @@
> +        return TRUE;
> +    default:
> +        break;
> 
> Useless default case that does nothing, can hopefully be removed?

I added it because gcc was complaining.
Comment 9 Emilio Pozuelo Monfort 2011-06-16 04:42:05 UTC
Created attachment 48038 [details] [review]
Check for the specific error codes when reconnecting

Err, I forgot to amend the last change.
Comment 10 Danielle Madeley 2011-07-03 23:28:34 UTC
Orthogonal to this, we could make SoftwareUpdateRequired not be a NETWORK_ERROR.

We could specify in the spec that MC will attempt reconnect on certain error enums.
Comment 11 Simon McVittie 2012-10-09 14:44:06 UTC
(In reply to comment #9)
> Check for the specific error codes when reconnecting

Patch applied, will be in 5.15 (sorry for the delay).

I'm leaving this bug open while I think about whether all the cases are right, but it's clearly better than what we have right now...
Comment 12 Simon McVittie 2012-10-09 14:55:08 UTC
Comment on attachment 48038 [details] [review]
Check for the specific error codes when reconnecting

Review of attachment 48038 [details] [review]:
-----------------------------------------------------------------

::: src/mcd-connection.c
@@ +1236,5 @@
> +            return FALSE;
> +
> +        default:
> +            DEBUG ("TpError code %s not handled",
> +                tp_error_get_dbus_name (code));

REGISTRATION_EXISTS, WOULD_BREAK_ANONYMITY, CAPTCHA_NOT_SUPPORTED should also specifically not reconnect (although as implemented here, they probably won't anyway).

RESOURCE_UNAVAILABLE I'm not sure about. Any opinions?

The remaining unhandled cases are

NOT_IMPLEMENTED
NOT_AVAILABLE
CONFUSED
SERVICE_CONFUSED

which are too vague to make any particular decision, and

PERMISSION_DENIED
CHANNEL_*
NOT_YOURS
NOT_CAPABLE
OFFLINE
BUSY
NO_ANSWER
DOES_NOT_EXIST
TERMINATED
NOT_YET
REJECTED
PICKED_UP_ELSEWHERE
EMERGENCY_CALLS_NOT_SUPPORTED
INSUFFICIENT_BALANCE
MEDIA_*

which don't make sense in this context.
Comment 13 GitLab Migration User 2019-12-03 19:35:22 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/telepathy/telepathy-mission-control/issues/40.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.