Summary: | Don't try to reconnect an account over and over for certain errors | ||
---|---|---|---|
Product: | Telepathy | Reporter: | Emilio Pozuelo Monfort <pochu27> |
Component: | mission-control | Assignee: | Simon McVittie <smcv> |
Status: | RESOLVED MOVED | QA Contact: | Telepathy bugs list <telepathy-bugs> |
Severity: | normal | ||
Priority: | medium | CC: | danielle, pochu27 |
Version: | git master | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Check the specific error code when deciding whether to reconnect or not
Check for the specific error codes when reconnecting Check for the specific error codes when reconnecting Check for the specific error codes when reconnecting |
Description
Emilio Pozuelo Monfort
2011-06-02 02:17:26 UTC
Does MC actually reconnect on that error? Last time I looked, it had a whitelist of errors that might potentially be addressed by reconnecting, and would not reconnect on other errors. (In reply to comment #1) > Does MC actually reconnect on that error? Last time I looked, it had a > whitelist of errors that might potentially be addressed by reconnecting, and > would not reconnect on other errors. You are right. That is in mcd-connection.c:mcd_connection_invalidated_cb() if ((priv->abort_reason == TP_CONNECTION_STATUS_REASON_NONE_SPECIFIED || priv->abort_reason == TP_CONNECTION_STATUS_REASON_NETWORK_ERROR) && priv->probation_drop_count <= PROBATION_MAX_DROPPED) { /* we were disconnected by a network error or by a connection manager * crash (in the latter case, we get NoneSpecified as a reason): don't * abort the connection but try to reconnect later */ The problem is that there doesn't seem to be a TP_CONNECTION_STATUS_REASON_UPGRADE_SOFTWARE_REQUIRED, and so CMs may use _NETWORK_ERROR (for the lack of a more appropriate/specific reason), and so MC tries to reconnect. So the question is, should there be more TP_CONNECTION_STATUS_REASON_*, should CMs use another one for this error, or anything else, like e.g. MC should check dbus_reason too? (In reply to comment #2) > The problem is that there doesn't seem to be a > TP_CONNECTION_STATUS_REASON_UPGRADE_SOFTWARE_REQUIRED, and so CMs may use > _NETWORK_ERROR (for the lack of a more appropriate/specific reason) Indeed, errors.xml says Network_Error is the generic form of SoftwareUpdateRequired. (I'm not convinced that's right - if there isn't a better enum member, it should be Not_Specified - but that wouldn't help us here because MC treats those two the same.) > tries to reconnect. So the question is, should there be more > TP_CONNECTION_STATUS_REASON_*, should CMs use another one for this error, or > anything else, like e.g. MC should check dbus_reason too? Ideally, MC should base its behaviour on the GError in TpConnection::invalidated (which is derived from the dbus_reason, or from the enum member if we don't understand the dbus_reason), falling back to tp_connection_get_status() if it doesn't understand the GError. This would give it a reliable way to detect CM crashes (TP_DBUS_ERROR_NAME_OWNER_LOST, I believe?). Created attachment 47990 [details] [review] Check the specific error code when deciding whether to reconnect or not How about something like this? Review of attachment 47990 [details] [review]: ::: src/mcd-connection.c @@ +1222,2 @@ if ((priv->abort_reason == TP_CONNECTION_STATUS_REASON_NONE_SPECIFIED || + code == TP_ERROR_NETWORK_ERROR) && ConnectionLost (for instance) is also a NETWORK_ERROR, so this change would make MC not reconnect when a connection was lost (if CMs used ConnectionLost correctly, which in practice they probably don't). If you're going to switch away from using abort_reason (which I would encourage!), I'd suggest not using abort_reason in this conditional at all. Perhaps factor out a function: gboolean connection_should_reconnect (TpConnection *tp_conn, guint domain, gint code) { TpConnectionStatusReason reason; if (domain == TP_ERROR) { switch (code) { case TP_ERROR_CONNECTION_FAILED: case TP_ERROR_CONNECTION_LOST: case TP_ERROR_DISCONNECTED: case TP_ERROR_NETWORK_ERROR: return TRUE; case TP_ERROR_SOFTWARE_UPGRADE_REQUIRED: case TP_ERROR_SERVICE_BUSY: case TP_ERROR_CONNECTION_REPLACED: case TP_ERROR_ALREADY_CONNECTED: case TP_ERROR_CONNECTION_REFUSED: /* possibly a few others (notably, all the encryption * things), but the others are hopefully not * generically mapped to NETWORK_ERROR anyway */ return FALSE; } } else if (domain == TP_DBUS_ERROR) { switch (code) { TP_DBUS_ERROR_NAME_OWNER_LOST: /* CM crashed */ return TRUE; } } /* not sure what the GError meant, so check the generic * status code */ tp_connection_get_status (tp_conn, &reason); switch (reason) { case TP_CONNECTION_STATUS_REASON_NETWORK_ERROR: return TRUE; } return FALSE; } Bonus points if you add some vaguely useful DEBUG(). Created attachment 48028 [details] [review] Check for the specific error codes when reconnecting > If you're going to switch away from using abort_reason (which I would > encourage!), I'd suggest not using abort_reason in this conditional at all. > > Perhaps factor out a function: Sounds good, I've done that, adding TP_CONNECTION_STATUS_REASON_NONE_SPECIFIED too as the old check had. I've tested a couple of scenarios and it works fine: (process:19287): mcd-DEBUG: connection_should_reconnect: error code org.freedesktop.Telepathy.Error.SoftwareUpgradeRequired, not reconnecting (process:19287): mcd-DEBUG: connection_should_reconnect: dbus error code: OWNER_LOST, reconnecting Review of attachment 48028 [details] [review]: ::: src/mcd-connection.c @@ +1217,3 @@ + /* possibly a few others (notably, all the encryption + * things), but the others are hopefully not + * generically mapped to NETWORK_ERROR anyway */ Er, that comment was meant to be a note to you, not something you should commit :-) Either replace it with more error codes (look at the list in devhelp or something and think about whether each one would be helped by reconnection or not), or check that for all our other error codes, believing the status reason would be OK. @@ +1241,3 @@ + { + case TP_CONNECTION_STATUS_REASON_NETWORK_ERROR: + case TP_CONNECTION_STATUS_REASON_NONE_SPECIFIED: I'm not so sure that NONE_SPECIFIED should trigger reconnection: if the string error name was not understood and the status reason is NONE_SPECIFIED, that tells us nothing. The comment suggests that NONE_SPECIFIED was only there because it's what a crashing connection manager would produce, but TP_DBUS_ERROR_NAME_OWNER_LOST covers that case now, so maybe we don't need it? @@ +1245,3 @@ + return TRUE; + default: + break; Useless default case that does nothing, can hopefully be removed? Created attachment 48037 [details] [review] Check for the specific error codes when reconnecting (In reply to comment #7) > Review of attachment 48028 [details] [review]: > > ::: src/mcd-connection.c > @@ +1217,3 @@ > + /* possibly a few others (notably, all the encryption > + * things), but the others are hopefully not > + * generically mapped to NETWORK_ERROR anyway */ > > Er, that comment was meant to be a note to you, not something you should commit > :-) Oh :-) > Either replace it with more error codes (look at the list in devhelp or > something and think about whether each one would be helped by reconnection or > not), or check that for all our other error codes, believing the status reason > would be OK. I have added quite a few. There are a bunch that I don't think would ever happen when connecting (e.g. errors that are channel-specific), so I have left those out. I have added another DEBUG in case we don't handle a TpError. > @@ +1241,3 @@ > + { > + case TP_CONNECTION_STATUS_REASON_NETWORK_ERROR: > + case TP_CONNECTION_STATUS_REASON_NONE_SPECIFIED: > > I'm not so sure that NONE_SPECIFIED should trigger reconnection: if the string > error name was not understood and the status reason is NONE_SPECIFIED, that > tells us nothing. > > The comment suggests that NONE_SPECIFIED was only there because it's what a > crashing connection manager would produce, but TP_DBUS_ERROR_NAME_OWNER_LOST > covers that case now, so maybe we don't need it? Agreed, I've removed it. If we find out that there are other cases that it was covering, we should add the specific TpErrors if possible. > @@ +1245,3 @@ > + return TRUE; > + default: > + break; > > Useless default case that does nothing, can hopefully be removed? I added it because gcc was complaining. Created attachment 48038 [details] [review] Check for the specific error codes when reconnecting Err, I forgot to amend the last change. Orthogonal to this, we could make SoftwareUpdateRequired not be a NETWORK_ERROR. We could specify in the spec that MC will attempt reconnect on certain error enums. (In reply to comment #9) > Check for the specific error codes when reconnecting Patch applied, will be in 5.15 (sorry for the delay). I'm leaving this bug open while I think about whether all the cases are right, but it's clearly better than what we have right now... Comment on attachment 48038 [details] [review] Check for the specific error codes when reconnecting Review of attachment 48038 [details] [review]: ----------------------------------------------------------------- ::: src/mcd-connection.c @@ +1236,5 @@ > + return FALSE; > + > + default: > + DEBUG ("TpError code %s not handled", > + tp_error_get_dbus_name (code)); REGISTRATION_EXISTS, WOULD_BREAK_ANONYMITY, CAPTCHA_NOT_SUPPORTED should also specifically not reconnect (although as implemented here, they probably won't anyway). RESOURCE_UNAVAILABLE I'm not sure about. Any opinions? The remaining unhandled cases are NOT_IMPLEMENTED NOT_AVAILABLE CONFUSED SERVICE_CONFUSED which are too vague to make any particular decision, and PERMISSION_DENIED CHANNEL_* NOT_YOURS NOT_CAPABLE OFFLINE BUSY NO_ANSWER DOES_NOT_EXIST TERMINATED NOT_YET REJECTED PICKED_UP_ELSEWHERE EMERGENCY_CALLS_NOT_SUPPORTED INSUFFICIENT_BALANCE MEDIA_* which don't make sense in this context. -- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/telepathy/telepathy-mission-control/issues/40. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.