Bug 62378

Summary: Offline contact caching
Product: Telepathy Reporter: Xavier Claessens <xclaesse>
Component: tp-glibAssignee: Telepathy bugs list <telepathy-bugs>
Status: RESOLVED MOVED QA Contact: Telepathy bugs list <telepathy-bugs>
Severity: normal    
Priority: medium CC: i.gnatenko.brain, nekohayo
Version: unspecifiedKeywords: patch
Hardware: Other   
OS: All   
URL: http://cgit.collabora.com/git/user/xclaesse/telepathy-glib.git/log/?h=offline-cache
Whiteboard:
i915 platform: i915 features:

Description Xavier Claessens 2013-03-15 16:20:25 UTC
Following my email[1], I think we should add helpers in telepathy-glib to handle the offline contact cache.

Goals:
1) Provide last known Nickname for TpNamesMixin (bug #14540) without requesting VCard on XMPP.
2) Provide last known ContactInfo without requesting VCard on XMPP.
3) Provide offline roster
4) Implement local roster for tp-sofiasip

Out of scope:
1) incremental roster download. This is probably too protocol-specific.

Proposed solution:

1) storing:

When CM is going offline, of course before freeing its data structures needed to reply to tp_contacts_mixin_get_contact_attributes(), it calls:

tp_base_contact_list_write_cache()
{
  contacts = tp_base_contact_list_dup_contacts (self);
  result = tp_contacts_mixin_get_contact_attributes (conn, all_interfaces_except_presence);
  for attributes in result:
    {
      id = tp_asv_get_string (attributes, TP_TOKEN_CONNECTION_CONTACT_ID);
      variant = asv_to_gvariant(attributes);
      filename = g_strdup_printf ("~/.cache/telepathy/rosters/%s/%s",
          something_relevant_to_identify_the_account(self),
          id);
      write_file (filename, g_variant_to_blob(variant));
    }
}

2) reading:

When CM receive the roster, or directly at startup for SIP:

GHashTable<contact-id, asv>
tp_base_contact_list_read_cache(Set<contact-id> roster_ids)
{
  return _read_roster_cache (something_relevant_to_identify_the_account(self), 
      roster_ids);
}

_read_roster_cache(something_relevant_to_identify_the_account, roster_ids)
{
  dir = g_strdup_printf ("~/.cache/telepathy/rosters/%s",
      something_relevant_to_identify_the_account);
  for file in dir:
    {
      id = g_strrstr(file, "/")+1;

      /* roster_ids is NULL when we just want to read them all */
      if (roster_ids != NULL && id not in roster_ids)
        {
          rmfile(file);
          continue;
        }

      reply[id] = g_variant_from_blob (file);
    }

  return reply;
}

The CM populates his internal data structures from that. Or maybe we want to avoid reading everything, and we should wait for a GetContactAttributes call, and read cache only if something is missing (like ContactInfo, or Nickname) and just for that contact. Probably even better like that.

3) Get an offline roster:

GPtrArray<TpOfflineContact>
tp_account_get_offline_roster()
{
  map = _read_roster_cache(something_relevant_to_identify_the_account(self),
      NULL);
  for attributes in map:
    {
      reply.add(tp_offline_contact_new(attributes));
    }
  return reply;
}

TpOfflineContact is a trivial object that keeps a ref on attributes asv, and return the asv's value for each getter function. API is similar to TpContact but immutable.


The only thing not really clear to me, is how we define something_relevant_to_identify_the_account(). I would use tp_account_get_path_suffix() but how can I construct that from the TpBaseConnection side?


Comments more than welcome!

[1] http://lists.freedesktop.org/archives/telepathy/2013-March/006384.html
Comment 1 Xavier Claessens 2013-03-16 15:44:13 UTC
Bah, actually implemented it:
http://cgit.collabora.com/git/user/xclaesse/telepathy-glib.git/log/?h=offline-cache

A few things that still need work:

1) cache filename is build with (cm_name, protocol_name, contact-id) tuple. So if I have 2 jabber accounts xclaesse1@example.com and xclaesse2@example.com and both have smcv@example.com, they will both write into the same cache file. And if Simon accepted only xclaesse1's subscription request, xclaesse2 could have less info. If it is xclaesse2's account would write into cache last then we have less info in the cache that we could. Suggestions to improve this?

2) file read/write is done syncronously with g_file_set/get_contents(). Maybe everything needs to be made async with GIO.

3) Data is serialized into the file using g_variant_get_data(). Is that format documented? Does Qt world have a parser for that? Would be preferable if tp-qt can use the same disk cache.
Comment 2 Xavier Claessens 2013-03-16 17:15:44 UTC
(In reply to comment #1)
> 1) cache filename is build with (cm_name, protocol_name, contact-id) tuple.
> So if I have 2 jabber accounts xclaesse1@example.com and
> xclaesse2@example.com and both have smcv@example.com, they will both write
> into the same cache file. And if Simon accepted only xclaesse1's
> subscription request, xclaesse2 could have less info. If it is xclaesse2's
> account would write into cache last then we have less info in the cache that
> we could. Suggestions to improve this?

I'll add the self contact's id into the path then. It can be retrieved using tp_account_get_normalized_name() from client-side and TpBaseConnection can of course easily inspect self handle. AFAIK that's enough to make it unambiguous, right?
Comment 3 Xavier Claessens 2013-03-16 22:01:53 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > 1) cache filename is build with (cm_name, protocol_name, contact-id) tuple.
> > So if I have 2 jabber accounts xclaesse1@example.com and
> > xclaesse2@example.com and both have smcv@example.com, they will both write
> > into the same cache file. And if Simon accepted only xclaesse1's
> > subscription request, xclaesse2 could have less info. If it is xclaesse2's
> > account would write into cache last then we have less info in the cache that
> > we could. Suggestions to improve this?
> 
> I'll add the self contact's id into the path then. It can be retrieved using
> tp_account_get_normalized_name() from client-side and TpBaseConnection can
> of course easily inspect self handle. AFAIK that's enough to make it
> unambiguous, right?

Did that in my branch now.
Comment 4 Simon McVittie 2013-03-18 11:20:36 UTC
(In reply to comment #2)
> I'll add the self contact's id into the path then. It can be retrieved using
> tp_account_get_normalized_name() from client-side and TpBaseConnection can
> of course easily inspect self handle. AFAIK that's enough to make it
> unambiguous, right?

It is for XMPP, unless people are doing particularly stupid things with non-federated XMPP servers.

It's ambiguous for IRC, though: idle/irc/alice/bob is not enough to know which chatnet Alice and Bob are on (and Bob on OFTC is not necessarily the same person as Bob on Freenode).

What is the intended scope of this cache? Protocols with persistent server-side contact lists?

One way to defeat this ambiguity would be to include the Account object path "tail" as a (special, reserved, built-in to MC) CM parameter, use that as part of the directory, and disable this interface if that information is not supplied.

Alternatively, a potential counter-proposal would be to say something like this:

Telepathy is an abstraction for IM servers/networks/protocols. Anything not present in the protocol, including faking a contact list for protocols that don't maintain one, should not be part of Telepathy, but part of something higher-level like Folks (which can also do contact linking and other useful stuff, so you probably want it anyway).

(In reply to comment #1)
> 3) Data is serialized into the file using g_variant_get_data(). Is that
> format documented? Does Qt world have a parser for that? Would be preferable
> if tp-qt can use the same disk cache.

GVariant is not a standardized or documented format. The documentation is the implementation. Also, a "bare" GVariant is native-endian for the architecture on which it was written, and will import differently on x86 vs. PowerPC: mis-importing is not detectable unless you deliberately include an endianness flag.

Using the D-Bus message encoding (GDBusMessage) would be one way to make it portable and endian-neutral, but to be honest I'm not particularly keen on mandating particular non-obvious on-disk encodings as part of the Telepathy spec, particularly if they're not just a cache/optimization.

(I consider the Avatars1 proposal - one file per avatar and one per MIME type - to be an "obvious" on-disk encoding, so it gets away with it.)
Comment 5 Xavier Claessens 2013-03-18 12:05:00 UTC
(In reply to comment #4)
> (In reply to comment #2)
> > I'll add the self contact's id into the path then. It can be retrieved using
> > tp_account_get_normalized_name() from client-side and TpBaseConnection can
> > of course easily inspect self handle. AFAIK that's enough to make it
> > unambiguous, right?
> 
> It is for XMPP, unless people are doing particularly stupid things with
> non-federated XMPP servers.

Ok.
 
> It's ambiguous for IRC, though: idle/irc/alice/bob is not enough to know
> which chatnet Alice and Bob are on (and Bob on OFTC is not necessarily the
> same person as Bob on Freenode).

IRC does not need offline caching, it makes no sense to have a roster there, AFAIK.

> What is the intended scope of this cache? Protocols with persistent
> server-side contact lists?

Goals are in the initial report.

> One way to defeat this ambiguity would be to include the Account object path
> "tail" as a (special, reserved, built-in to MC) CM parameter, use that as
> part of the directory, and disable this interface if that information is not
> supplied.

This could be a solution indeed.

> Alternatively, a potential counter-proposal would be to say something like
> this:
> 
> Telepathy is an abstraction for IM servers/networks/protocols. Anything not
> present in the protocol, including faking a contact list for protocols that
> don't maintain one, should not be part of Telepathy, but part of something
> higher-level like Folks (which can also do contact linking and other useful
> stuff, so you probably want it anyway).

Folks is a library, it is not a place to do such thing. Unless you want all processes to know everything about all contacts and write on disk at the same time. I believe folks need significant changes to be useful. Its performance make it simply not a solution to any problem.

I think the CM is the best place to do offline cache, because it's the only process that should know everything about all contacts.

> (In reply to comment #1)
> > 3) Data is serialized into the file using g_variant_get_data(). Is that
> > format documented? Does Qt world have a parser for that? Would be preferable
> > if tp-qt can use the same disk cache.
> 
> GVariant is not a standardized or documented format. The documentation is
> the implementation. Also, a "bare" GVariant is native-endian for the
> architecture on which it was written, and will import differently on x86 vs.
> PowerPC: mis-importing is not detectable unless you deliberately include an
> endianness flag.

Right, so it is good enough for tp-glib, and not if we want to interop with tp-qt.

otoh, one thing I like with using GVariant file format, is all processes needing to know about offline contacts can use a GMappedFile and GVariant won't do any copy of the data.
Comment 6 Xavier Claessens 2013-03-18 15:21:55 UTC
Callgrinded a modified version of the unit test that creates 500 contacts and store/load them from a single file.

1) most of the time is spent creating the 500 contacts, below I count only the part where it store and load back the cache, excluding the setup part.
2) _tp_asv_to_vardict() takes 60% of the time
3) tp_account_dup_offline_accounts() is only 10% the time, and 60% of it is GObject creation.
Comment 7 Xavier Claessens 2013-11-27 01:09:12 UTC
Worked on Gabble implementation, and I changed my mind on how to implement this. I create one file per contact now, and the CM is responsible to update the cache when information change (e.g. vcard fetched). This is because gabble does not keep vcard in memory forever so we cannot wait too long before writing it on disk, and we can't wait for TpBaseConnection to be disconnected to update the cache, because at that point it's already too late to request contact attributes.

I removed client-side TpOfflineContact API for now.

I'm still not entirely happy with the API, it really gives all the responsability to the CM and I hope to find a way to make it transparent. Also the code lives in TpBaseContactList but does not really use it... Better ideas are welcome.

It also doesn't remove contacts from the cache when they are not part of the roster anymore.

And finally, it does sync disk read/write which is badâ„¢. Writing the cache async is trivial change, but async read is more problematic because we have to fill contact attributes syncronously... not sure how to fix that.

Here is new API in tp-glib:
http://cgit.collabora.com/git/user/xclaesse/telepathy-glib.git/log/?h=offline-cache

And implementation in gabble:
http://cgit.collabora.com/git/user/xclaesse/telepathy-gabble.git/log/?h=offline-cache

With those 2 branch we fix an old bug that always annoyed me: offline xmpp contacts now have an avatar in empathy !
Comment 8 Xavier Claessens 2013-12-02 20:18:17 UTC
I changed my mind on the implementation (again).

CM-side cache is write-only. Added TpBaseContactListClass::cache_miss() virtual method called when a roster contact is missing from the offline cache, and the CM simply have to reply with tp_base_contact_list_cache_update() once the vcard has been fetched. TpBaseContactList also transparently takes care of removing cache of contacts not in the roster anymore.

Client-side cache is read-only. So I added an extra step after GetContact(list)Attributes to read the offline cache if CM omitted some info (avatar token and ContactInfo). That way TpContact will always have last-known value. This is completely transparent change to clients. Or maybe it could be enabled with a contact feature?

File operations are now async, so I think it's pretty much ready to get reviewed and merged.

The missing part is loading the cache when the TpAccount is offline. That will allow folks to have consistent contact list regardless of onliness.
Comment 9 GitLab Migration User 2019-12-03 20:41:23 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/telepathy/telepathy-glib/issues/110.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.