Bug 23155

Summary: Connection: make handles survive until disconnection
Product: Telepathy Reporter: Simon McVittie <smcv>
Component: tp-specAssignee: Simon McVittie <smcv>
Status: RESOLVED FIXED QA Contact: Telepathy bugs list <telepathy-bugs>
Severity: normal    
Priority: medium CC: guillaume.desmottes, mikhail.zabaluev, sjoerd, will
Version: unspecifiedKeywords: patch
Hardware: Other   
OS: All   
URL: http://git.collabora.co.uk/?p=user/smcv/telepathy-spec-smcv.git;a=shortlog;h=refs/heads/immortality
Whiteboard:
i915 platform: i915 features:
Bug Depends on: 16170    
Bug Blocks: 23148, 29459    

Description Simon McVittie 2009-08-05 08:07:36 UTC
Connection should perform distributed reference-counting for (at least Contact) handles.

Proposed migration path:

* add client_unref_handle() method to TpHandleRepoIface

* change the semantics of HoldHandles to "always add one reference", and ReleaseHandles to "remove all references owned by this client" (this is compatible)

* change the semantics of GetContactAttributes to ref the handles rather than holding them (this is compatible)

* add ReferenceHandles, UnreferenceHandles to Connection (only working on Contact handles, perhaps); ReferenceHandles is not strictly necessary but is more consistent

* augment TpBaseConnection and the Python equivalent to implement both sets of D-Bus API

* when those changes are old enough to rely on, change clients to ref handles instead of holding them

* in Telepathy 1.0 (Bug #23148), delete HoldHandles and ReleaseHandles

* when telepathy-glib API is broken, remove the supporting C API for ReleaseHandles, and rename client_hold_handle() to client_ref_handle() to reflect its new semantics
Comment 1 Simon McVittie 2010-04-05 11:29:36 UTC
The current handle model causes a number of unfortunate corner cases. Many
can be solved with something equivalent to MembersChangedDetailed's
contact-ids hint; MessageReceived needs to gain one of these (I'll clone
this bug).

However, we also have the problem that telepathy-glib's TpContact and
telepathy-qt4's Tp::Contact guarantee that they are holding a handle. This
can't be done without a D-Bus round trip (in at least some cases),
leading to problems like:

* Channel receives a message with sender #42, "sjoerd"
* Channel wants to emit a Message signal with a Contact object for sjoerd
* A Contact object for sjoerd is not in the cache
* HoldHandles(CONTACT, [42]) sent
* Channel receives a message with sender #23, "smcv"
* A Contact object for smcv *is* in the cache
* Channel emits message-received for the message from smcv
* Time passes
* CM responds to HoldHandles
* Channel emits message-received for the message from sjoerd
* The messages from sjoerd and smcv have been re-ordered!

telepathy-qt4 goes to heroic lengths to avoid this for Messages and for
Group changes (it has an internal queue, and delays the Qt signals until
it can form a Contact object), but Messages and Group events can still
get re-ordered.

My goal for solving this problem is that there's a (relatively) simple
set of extra things that CM authors (or, in practice, the libraries they
use) can do. If they do, client code interacting with their CM will use
a fast path that always creates contact objects synchronously and never
re-orders; if they fail to do the extra things, their CM is considered
to be deficient, and the client library recovers by using a slow-path
that may re-order messages.

Sjoerd proposes that we just stop ref-counting handles, and have them
all persist until the connection disconnects. Rob and I consider this
to be a good proposal.

Memory cost
===========

A back-of-an-envelope estimate: imagine that I'm in a chatroom as busy
as #ubuntu (about 1500 users), but the chatroom is on an XMPP server
with a long name. hypothetical-room@conference.pseudorandom.co.uk is
47 characters; suppose pessimistically that appending users' nicknames
leads to 100-character JIDs. This would lead to 150,000 bytes of string
data for the string pool, which is significant, but not ridiculous.

(Note that UI and CM will both need to store all of these strings for
the duration of the user's presence in the chatroom in any case - the
only change here is that when the user leaves the chatroom, the strings
aren't freed until the connection disconnects.)

A more problematic point is that TpIntSet is currently optimized for
relatively dense sets of small integers; it's an expanding bitfield with
one bit per handle, up to the largest handle it contains. The silliest
case for someone in #ubuntu is a TpIntSet with one member, handle #1500,
which would be 188 bytes of 0, with one 1 near the end.

Implementation in the spec
==========================

property Connection.HasImmortalHandles: b
    TRUE if handles last for the whole lifetime of the connection. This
    SHOULD be the case in all connection managers, but clients MUST
    interoperate with connection managers that reference-count handles.

method Connection.HoldHandles (u, au) -> nothing
    If Connection.HasImmortalHandles is true, this does nothing and returns
    successfully.

    Older connection managers give it the following semantics:
    ...

method Connection.ReleaseHandles (u, au) -> nothing
    If Connection.HasImmortalHandles is true, this does nothing and returns
    successfully.

    Older connection managers give it the following semantics:
    ...

method Connection.Interface.Contacts.GetContactAttributes(...) -> ...
    ...
    Hold: b
        If Connection.HasImmortalHandles is true, this parameter has no
        effect. On older connection managers, ...

Implementation in telepathy-glib (for CMs)
==========================================

TpIntSet should be reimplemented to deal with sparse sets better, ideally
O(number of members) rather than O(largest member).

TpDynamicHandleRepo could be preserved mostly as it is, with the handle
refcounting, handle-leak tracking, etc. stubbed out, but it would be more
efficient to reimplement it as a simple growing GPtrArray of strings, indexed
by the handle.

Implementation in client libraries
==================================

(I'm talking about telepathy-glib here, but add some :: and it's equally
true for telepathy-qt4...)

TpConnection should get HasImmortalHandles via its GetAll fast-path
(Bug #27459) before becoming prepared (Bug #21097).

TpChannel should wait for the TpConnection to be prepared (which does not
necessarily have to imply connected, any more) before it becomes prepared
itself. On receiving an event (group members change, message, ...),
if there is a TpContact already cached, it's used to signal the event.
Otherwise, if the connection has immortal handles, the TpChannel insta-creates
a TpContact via new internal API, and uses it to signal the event.

As a fallback, the TpChannel makes an async round-trip with the current
public API, and signals the event using the TpContact thus obtained - this
results in the event being delayed (re-ordered), but that's now considered to
be a CM bug (the CM wasn't helpful enough).

We could simplify the fallback code paths by having the client code just leak
all of its handle references, bringing old CMs closer to the new semantics :-)
Comment 2 Simon McVittie 2010-04-05 11:31:06 UTC
(In reply to comment #1)
> telepathy-qt4 goes to heroic lengths to avoid this for Messages and for
> Group changes (it has an internal queue, and delays the Qt signals until
> it can form a Contact object), but Messages and Group events can still
> get re-ordered.

That doesn't make much sense as written; it should say "re-ordered relative to each other" (the Messages events and the Group events are both queued, but on two independent queues).
Comment 3 Simon McVittie 2010-04-09 04:07:34 UTC
(In reply to comment #1)
> TpIntSet should be reimplemented to deal with sparse sets better, ideally
> O(number of members) rather than O(largest member).

This is Bug #16170. I used a hash table, with the following properties:

* O(n of members) size
* smaller size if handles are clustered somewhat (in practice they will be)
* O(largest member) in-order iteration using existing API
* O(n of members) out-of-order iteration using new API
* O(1) (? same as GHashTable, anyway) lookups, additions, removals

I also have a proof-of-concept branch that removes the handle refcounting:

http://git.collabora.co.uk/?p=user/smcv/telepathy-glib-smcv.git;a=shortlog;h=refs/heads/leak-handles
Comment 4 Mikhail Zabaluev 2010-04-09 06:30:46 UTC
I'm not comfortable with the idea of tying resource management functionality (handle lifetime management) with user-visible behavior (connection getting disconnected). True, it makes everything a lot simpler and in typical usage accumulation of unused handles is not expected to be a big problem. But still, on long-lived connections immortal handles basically constitute a specified memory leak, potentially bringing heap fragmentation and whatnot. In Maemo, we'll spend days warding off our endurance testers :)

If letting the clients control the lifetime of handles is too painful, maybe the right to garbage-collect handles, possibly after a suitably long time period, should be given to the connection managers?

Caution, brainstorm-in-progress follows: in a future backwards-incompatible Telepathy version, we could have a signal telling the clients that such and such handles were invalidated. Any client binding that wants to retain handles which it does not hold already should asynchronously reference or hold them, and handle or expose the case of concurrent invalidation, which should never happen in practice if CMs invalidate handles long time after last usage.
Comment 5 Simon McVittie 2010-11-09 09:59:10 UTC
(In reply to comment #1)
> TpIntSet should be reimplemented to deal with sparse sets better, ideally
> O(number of members) rather than O(largest member).

I did that.

(In reply to comment #4)
> I'm not comfortable with the idea of tying resource management functionality
> (handle lifetime management) with user-visible behavior (connection getting
> disconnected). True, it makes everything a lot simpler and in typical usage
> accumulation of unused handles is not expected to be a big problem. But still,
> on long-lived connections immortal handles basically constitute a specified
> memory leak, potentially bringing heap fragmentation and whatnot.

This is becoming a pressing concern, because Guillaume is working on Bug #29531, and I don't want to have the same madness in telepathy-glib as in the corresponding bit of telepathy-qt4.

I discussed this with Sjoerd this morning and we're still of the opinion that "leaking" handles is only a theoretical concern. In practice, you communicate with a finite number of contacts; even the pathological example of an XMPP version of #ubuntu is "only" 150K or so, and while you're in the room, the CM needs to hold all of those in memory *anyway* (most likely twice - once for itself, and once in its underlying library).

One implementation point we need to keep in mind is that APIs resulting in an unbounded number of handles should be avoided. We preemptively made ContactSearch operate in terms of identifiers for this reason, so that's OK.

> If letting the clients control the lifetime of handles is too painful, maybe
> the right to garbage-collect handles, possibly after a suitably long time
> period, should be given to the connection managers?

No, I think specifying a "leak" is less bad than specifying behaviour that, in practice, clients can never recover from.

> Caution, brainstorm-in-progress follows: in a future backwards-incompatible
> Telepathy version, we could have a signal telling the clients that such and
> such handles were invalidated. Any client binding that wants to retain handles
> which it does not hold already should asynchronously reference or hold them,
> and handle or expose the case of concurrent invalidation, which should never
> happen in practice if CMs invalidate handles long time after last usage.

I don't see how the binding or UI could deal with concurrent invalidation gracefully, and I suspect the machinery to track last-use is "heavier" than just keeping a grow-only string pool.

To get a sane client API without re-ordering, I think we need to be able to make a TpContact or Tp::Contact from a (handle, identifier) pair, synchronously (if it's not synchronous, we re-order events). Given that, if we allow deferred handle-referencing, we get this situation:

* for maximum badness, assume the client is an Observer, so you have no control over when messages are acknowledged
* receive a Messages message from Sjoerd, handle 42, identifier sjoerd@night
* client calls _tp_contact_new_sync (conn, 42, "sjoerd@night") -> 0xf00baa, attaches it to the TpSignalledMessage object and gets on with its life [A]
* client calls either HoldHandles([42]) or RequestHandles(["sjoerd@night"]) in the background [B]
* (time passes, your dbus-daemon is stalled for some reason; perhaps someone's using Tracker ;-)
* handle 42 is deallocated [C]
* handle 42 is reallocated to point to "smcv@reptile"
* the CM replies to B
* the client gets the reply [D]

With your proposal, the client gets a signal either at C or after D; regardless, the client gets a reply at D. In either case, it says that the state of the object 0xf00baa is inconsistent with reality. What's the client going to do about it? Between A and either C or D, it's been taking actions based on the information it thought it had; it can't necessarily undo those actions.

Having the CM do delayed-deallocation just increases the length of time for which you need to have been not scheduled to get this problem; it doesn't make it any more possible to recover.
Comment 6 Simon McVittie 2010-11-10 05:57:43 UTC
(In reply to comment #3)
> I also have a proof-of-concept branch that removes the handle refcounting:
> 
> http://git.collabora.co.uk/?p=user/smcv/telepathy-glib-smcv.git;a=shortlog;h=refs/heads/leak-handles

... which I've rebased onto a more current master.
Comment 8 Will Thompson 2010-11-21 06:52:26 UTC
Brainstorming this with Rob, Pekka and Aki at the MeeGo Conference led to the following rough proposal for 1.0:

  • In the D-Bus API, replace handles with (ss) tuples, where the first field is the vCard field, and the second field is the contact ID.
  • Cellular conference call-style meaningless channel-specific-handles could have a dummy namespace and look something like this: ("meaningless", "1234:really:tel:+441234567890").
  • General channel-specific handles could have some similar convention. Or maybe we could use (sss) throughout.

Using tuples rather than just strings makes Addressing more native than it will be in 0.x; it also makes it more obvious to applications that they should only store things that the CM has normalized. The argument that applications will confuse normalized and non-normalized strings is fair, but applications not using bindings mess up handles, too, and in applications using bindings we can use the host language's type system to distinguish the two.

Using strings throughout would make dbus-monitor traffic easier to read, would remove a whole section of Telepathy that currently needs explaining, and would also remove the memory-leak objection to immortal handles: CMs could forget contact ID tuples when they're done with them, safe in the knowledge that they can re-create them if a client uses one.

(Internally, telepathy-glib's API could replace TpHandle with GObjects that get passed around implementing a particular interface... TpSvcContact, say. This preserves fast comparisons and so on.
Comment 9 Olli Salli 2010-11-22 07:45:13 UTC
We currently tie the contact attribute lifetime to the contact's handle's lifetime. If the notion of a handle being freed goes away, how do we handle resource management for the contact attributes? They can't necessarily just be "recreated when needed" - that'd potentially need queries to the server.
Comment 10 Simon McVittie 2010-11-24 08:56:37 UTC
(In reply to comment #9)
> We currently tie the contact attribute lifetime to the contact's handle's
> lifetime.

Not in any CM I'm familiar with... TpDynamicHandleRepo has qdata on handles, but I don't think using it is necessarily wise.

In practice, I think CMs should only cache contact attributes for those contacts where we can expect to get change notification: that's some combination of the contacts on the contact list (if any) and the the contacts we're in a channel with (if any).

For any other contacts, Get()-style methods returning information would be pretty misleading anyway - we have no idea how stale it is. Request()-style methods are still useful, but it's OK for those to re-query the server.

I think Salut is on the right track here: it has SalutContact objects which encapsulate a contact and are refcounted. The telepathy-glib service-side could optionally grow a TpSvcContact or TpBaseContact like Will suggested, but I'm not sure whether that ought to be a base class or a GInterface - in CMs backed by a good enough library, we'd probably want the Telepathy contact object to be a subclass of WockyContact or whatever?
Comment 11 Sjoerd Simons 2010-11-29 07:36:18 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > We currently tie the contact attribute lifetime to the contact's handle's
> > lifetime.
>
> Not in any CM I'm familiar with... TpDynamicHandleRepo has qdata on handles,
> but I don't think using it is necessarily wise.

I agree with smcv here, more-over trusting clients to indicate a useful time to cache attributes in the CM seems bad. E.g. a different client might have requested several attributes and gone away. In that case the ``main'' client holding the handle doesn't indicate anything really..

A big part of a CM is smartly caching stuff indeed. Most meta-data about a contact is reasonably small, but for example avatar data is already not cached in the CM for a very long time, regardless about whether the handles are held or not. In the end holding a handle just guarantees stability of your string to handle normalisation, nothing else.
Comment 12 Mikhail Zabaluev 2010-11-29 08:27:12 UTC
Between speccing handles out of the final version of ContactSearch, and the new way to represent contacts in Telepathy 1.0 as outlined in comment #8, I have no objections left to letting handles live for the lifetime of the connection.
Comment 13 Simon McVittie 2010-12-01 05:25:23 UTC
OK, so conceptually everyone's fine with this. Could someone please review the specific changes in comment #7 so we can land this soon?
Comment 14 Will Thompson 2010-12-01 07:36:48 UTC
Spec:

+        <p>If Connection.HasImmortalHandles is true, which SHOULD always
+          be the case, this method does nothing and returns successfully.</p>

Linkify the property name, plz. Ditto other places. Also, please be consistent as to whether you call it "Connection.HII" or just "HII".

Otherwise fine.

tp-glib:

Functions like tp_handle_Ref() which are now documented thusly:

+ * Do nothing. This previously provided refcounting for handles, but handles
+ * now last as long as the connection.

should note which version this changed in.

Otherwise the tp-glib branch looks fine. I like branches that delete code.
Comment 15 Simon McVittie 2010-12-01 08:40:16 UTC
telepathy-glib branch smcv/leak-handles updated.

I've also added a branch at http://git.collabora.co.uk/?p=user/smcv/telepathy-glib-smcv.git;a=shortlog;h=refs/heads/immortality which discards most of the handle refcounting machinery on the client side (by the simple expedient of never releasing handles).
Comment 16 Simon McVittie 2010-12-01 08:54:06 UTC
(In reply to comment #14)
> Linkify the property name, plz. Ditto other places. Also, please be consistent
> as to whether you call it "Connection.HII" or just "HII".

Done (consistently using the bare name), + wording improvements.
Comment 17 Will Thompson 2010-12-01 09:25:02 UTC
(In reply to comment #15)
> telepathy-glib branch smcv/leak-handles updated.
> 
> I've also added a branch at
> http://git.collabora.co.uk/?p=user/smcv/telepathy-glib-smcv.git;a=shortlog;h=refs/heads/immortality
> which discards most of the handle refcounting machinery on the client side (by
> the simple expedient of never releasing handles).

I'd say ship it all. I have to say I haven't reviewed the deletion of the client-side particularly closely, but the key parts (calls to the CM) looked okay.

(In reply to comment #16)
> (In reply to comment #14)
> > Linkify the property name, plz. Ditto other places. Also, please be consistent
> > as to whether you call it "Connection.HII" or just "HII".
> 
> Done (consistently using the bare name), + wording improvements.

Also looks fine.
Comment 18 Simon McVittie 2010-12-01 10:22:33 UTC
Fixed in git for spec 0.21.6 and telepathy-glib 0.13.8.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.