19605 – Gabble should cache client capabilities

Bug 19605 - Gabble should cache client capabilities

Summary: Gabble should cache client capabilities

Status:	RESOLVED FIXED

Alias:	None

Product:	Telepathy
Classification:	Unclassified
Component:	gabble (show other bugs)
Version:	unspecified
Hardware:	Other All

Importance:	medium normal
Assignee:	Will Thompson
QA Contact:	Telepathy bugs list

URL:	http://git.collabora.co.uk/?p=user/wj...
Whiteboard:	review+
Keywords:	patch

Depends on:
Blocks:

Reported:	2009-01-16 08:12 UTC by Simon McVittie
Modified:	2010-05-21 04:31 UTC (History)
CC List:	1 user (show)

See Also:
i915 platform:
i915 features:

Attachments

Description Simon McVittie 2009-01-16 08:12:29 UTC

At the moment signing in with Gabble generates unnecessary network traffic. Now that we have v1.5 hashed capabilities, we should be able to cache known disco responses in $XDG_CACHE_DIR.

Comment 1 Dafydd Harries 2010-01-07 18:24:09 UTC

An implementation for 0.8 is here:

http://git.collabora.co.uk/?p=user/daf/telepathy-gabble;a=shortlog;h=refs/heads/caps-cache-0.8

I'll port to 0.9 once the 0.8 code has some review.

Comment 2 Will Thompson 2010-01-11 09:38:47 UTC

91ced97: maybe use assertLength? Not a merge blocker.

f351897: does everything cope with singleton being NULL? (for debug messages after we leave the mainloop)

So... I'm a little uneasy that this adds a second capabilities cache on top of the per-connection one. I guess a certain amount of per-connection state is needed, and doing it this way (hooking in to the existing cache in two places, rather than tearing the existing cache apart) minimizes the changes on a stable branch, which is good.

+      /* XXX: Should we cache non-hashed caps? */

Yes. Google Talk and other popular clients still use non-hashed caps. AFAICT caching non-hashed caps is ~no more code?

+      /* XXX: issue warning here? */

Probably if they're different? But I guess we'd never know it was happening in practice.

If you passed a gchar ** to/from gabble_caps_cache_insert/_lookup you could use g_strjoinv ("\n", caps); and g_strsplit (value, "\n", 0); rather than writing your own split and join functions. It's easy enough to build gchar **s using GPtrArray, too.

+      GabblePresenceCapabilities caps =
+          caps_in_cache ? cached_caps : info->caps;
...
+              caps, info->per_channel_manager_caps, serial);

info->per_channel_manager_caps can't possibly contain a meaningful value here. I think this means that tube caps etc. won't be correctly restored from the cache. I think you need to add some code that pokes all the channel managers to this loop:

+      for (i = uris; i; i = i->next)
+        {
+          cached_caps |= capabilities_from_ns ((const gchar *) i->data);
+        }

So I think caps-persistent-cache.py should test some tubes caps.

Should we seed the per-connection cache when we get a hit from the disk cache? This would avoid repeatedly re-parsing the set of URLs.

+caps_cache_prepare (
...
+  gint ret = sqlite3_prepare_v2 (self->priv->db, sql, -1, stmt, NULL);
...
+  g_assert (stmt != NULL);

Seems a bit late to be asserting that. I think it should be g_assert (*stmt != NULL).

+  /* This emulates DELETE ... ORDER ... LIMIT because some Sqlites (e.g.
+   * Debian) ship without SQLITE_ENABLE_UPDATE_DELETE_LIMIT unabled.
+   */
+
+  if (!caps_cache_prepare (self,
+        "DELETE FROM capabilities WHERE oid IN ("
+        "  SELECT oid FROM capabilities"
+        "    ORDER BY timestamp ASC, oid ASC"
+        "    LIMIT ?)", &stmt))
+    return;

Looks a lot like you're using what you claim to be emulating.

Comment 3 Dafydd Harries 2010-01-12 17:33:48 UTC

(In reply to comment #2)
> 91ced97: maybe use assertLength? Not a merge blocker.

Patched it to throw a more meaningful exception.

> f351897: does everything cope with singleton being NULL? (for debug messages
> after we leave the mainloop)

Yeah, if the singleton is NULL, a new one will get created automatically.

> So... I'm a little uneasy that this adds a second capabilities cache on top of
> the per-connection one. I guess a certain amount of per-connection state is
> needed, and doing it this way (hooking in to the existing cache in two places,
> rather than tearing the existing cache apart) minimizes the changes on a stable
> branch, which is good.

Right. As we discussed on IRC, though, it's much more practical to rationalise this in 0.9 where the capabilities code is much simpler.

> +      /* XXX: Should we cache non-hashed caps? */
> 
> Yes. Google Talk and other popular clients still use non-hashed caps. AFAICT
> caching non-hashed caps is ~no more code?

Removed this comment.

> +      /* XXX: issue warning here? */
> 
> Probably if they're different? But I guess we'd never know it was happening in
> practice.

This comment was already removed in a later patch. I can add in a warning if you think it's worth it.

> If you passed a gchar ** to/from gabble_caps_cache_insert/_lookup you could use
> g_strjoinv ("\n", caps); and g_strsplit (value, "\n", 0); rather than writing
> your own split and join functions. It's easy enough to build gchar **s using
> GPtrArray, too.

This turned out to be a bit complicated, but I think the code is simpler for it.

> +      GabblePresenceCapabilities caps =
> +          caps_in_cache ? cached_caps : info->caps;
> ...
> +              caps, info->per_channel_manager_caps, serial);
> 
> info->per_channel_manager_caps can't possibly contain a meaningful value here.
> I think this means that tube caps etc. won't be correctly restored from the
> cache. I think you need to add some code that pokes all the channel managers to
> this loop:
> 
> +      for (i = uris; i; i = i->next)
> +        {
> +          cached_caps |= capabilities_from_ns ((const gchar *) i->data);
> +        }

Good catch. Fixing this presented opportunities for refactoring, meaning we scan the message once for capabilities instead of twice.

> So I think caps-persistent-cache.py should test some tubes caps.

Yeah, I added a tube cap to the test. However, I found some inconsistency about when ClientCapabilitiesChanged is emitted which I'd like to investigate, so I haven't pushed this patch yet.

> Should we seed the per-connection cache when we get a hit from the disk cache?
> This would avoid repeatedly re-parsing the set of URLs.

I don't think this is worth it. It might waste memory storing caps we'll never look at.

> +caps_cache_prepare (
> ...
> +  gint ret = sqlite3_prepare_v2 (self->priv->db, sql, -1, stmt, NULL);
> ...
> +  g_assert (stmt != NULL);
> 
> Seems a bit late to be asserting that. I think it should be g_assert (*stmt !=
> NULL).

Right.

> +  /* This emulates DELETE ... ORDER ... LIMIT because some Sqlites (e.g.
> +   * Debian) ship without SQLITE_ENABLE_UPDATE_DELETE_LIMIT unabled.
> +   */
> +
> +  if (!caps_cache_prepare (self,
> +        "DELETE FROM capabilities WHERE oid IN ("
> +        "  SELECT oid FROM capabilities"
> +        "    ORDER BY timestamp ASC, oid ASC"
> +        "    LIMIT ?)", &stmt))
> +    return;
> 
> Looks a lot like you're using what you claim to be emulating.

The ORDER and LIMIT clauses are on the SELECT subquery, not on the DELETE directly, which isn't allowed unless Sqlite is compiled with the aforementioned #define.

Comment 4 Dafydd Harries 2010-01-13 10:51:57 UTC

> Up to 53c6b3e:
>.
> len is unused in gabble_caps_cache_lookup

Ok.

> in caps_cache_insert: g_strjoinv?

Right.

> Comment about UPDATE_DELETE_LIMIT is still wrong.

How so?

> I think we should poke caps from the on-disk cache into the PresenceCache with capability_info_recvd() because lookups from the latter are much cheaper.

Maybe. I think the cache is going to grow to a few hundred k with the current
limit of 1000 entries. I don't think getting it from memory is that much 
faster than hitting the disk cache. Is the in-memory cache that much more 
likely to get swapped out than the disk cache is to get flushed?

At any rate, can we do this after merge?

> I also really think caching all caps uris is needed, because google. If we're really worried about them changing illegally, we could keep a cache hit count and expire them after n hits or something? But given that the xep has always said you should cache them (i believe) we can probably just cache them. Could ask jdev?

I agree, I just assumed the code did that. On closer inspection, we do cache
all URIs, it's just that we don't look up ones that don't have a hash. I'll 
fix this.

Comment 5 Will Thompson 2010-01-19 10:59:04 UTC

Looks good in its current state!

Comment 6 Will Thompson 2010-02-18 11:08:21 UTC

Review of the 0.9 port:

+  self->priv->cache = g_hash_table_new_full (
+      g_str_hash, g_str_equal, g_free, (GDestroyNotify) g_strfreev);

but

+  g_hash_table_insert (
+      self->priv->cache, g_strdup (node), gabble_capability_set_copy (caps));

So the free function for GabbleCapsCache::cache is wrong.

I think http://git.collabora.co.uk/?p=telepathy-gabble.git;a=commit;h=099bb4534 needs cherry-picking.

Comment 7 Simon McVittie 2010-02-24 08:20:45 UTC

(In reply to comment #6)
> So the free function for GabbleCapsCache::cache is wrong.

The offending code was removed by a subsequent patch, so it doesn't actually matter.

> I think http://git.collabora.co.uk/?p=telepathy-gabble.git;a=commit;h=099bb4534
> needs cherry-picking.

Done.

I also cherry-picked the regression test (but not the code changes) from 2ce3ea29b7 "restore per-channel-manager caps from the cache", to verify that those code changes are not needed in 0.9 (the test passes, so apparently they're not).

Comment 8 Senko Rasic 2010-03-09 06:26:30 UTC

Review up to 1c2bde:

In fd22d6:

* gabble_caps_cache_dup_share() dups the object if not NULL, but
  there's no weak pointer to set the singleton variable back to NULL
  after last unref.
  Elsewhere in the code you use "dup .. use .. unref" pattern, which means
  that unless ref was leaked before, you'll have singleton variable
  pointing to a destroyed object. That is, uless you explicitly
  take care of that somewhere in the initialisation, but I didn't
  see it anywhere.

A few SQLite related comments for 11713f:

* Since we don't care if the cache becomes corrupt due to power/system failure,
  we can safely turn off syncing and on-disk journal. This will lessen the disk
  usage while connecting (when presumably we have the highest number of writes
  to the database - since even a cache hit is a disk write for touching the
  entry). This can be done with PRAGMAs immediately after openning the db.
  
* In caps_cache_insert(), instead of ignoring the error of INSERT, we can
  use INSERT OR UPDATE ... which will, as a side-effect, save up-to-date
  caps for the node, and also touch the timestamp so it won't be reaped
  before its time.

Comment 9 Will Thompson 2010-03-26 09:16:46 UTC

Taking this bug to fix the review feedback.

Comment 10 Will Thompson 2010-03-26 11:16:00 UTC

(In reply to comment #8)
> Review up to 1c2bde:
> 
> In fd22d6:
> 
> * gabble_caps_cache_dup_share() dups the object if not NULL, but
>   there's no weak pointer to set the singleton variable back to NULL
>   after last unref.
>   Elsewhere in the code you use "dup .. use .. unref" pattern, which means
>   that unless ref was leaked before, you'll have singleton variable
>   pointing to a destroyed object. That is, uless you explicitly
>   take care of that somewhere in the initialisation, but I didn't
>   see it anywhere.

shared_cache owns a hard ref, not a weak ref. gabble_caps_cache_free_shared() drops that ref, and sets shared_cache to NULL; it's only called during CM finalization to make it possible to check that no other refs are leaked.

So, I don't believe this is a bug.

> A few SQLite related comments for 11713f:
> 
> * Since we don't care if the cache becomes corrupt due to power/system failure,
>   we can safely turn off syncing and on-disk journal. This will lessen the disk
>   usage while connecting (when presumably we have the highest number of writes
>   to the database - since even a cache hit is a disk write for touching the
>   entry). This can be done with PRAGMAs immediately after openning the db.

Okay, I've added a couple of pragmata that look right.

> * In caps_cache_insert(), instead of ignoring the error of INSERT, we can
>   use INSERT OR UPDATE ... which will, as a side-effect, save up-to-date
>   caps for the node, and also touch the timestamp so it won't be reaped
>   before its time.

We shouldn't get that far if the entry is already in the hash; that code path only occurs when we didn't get a hit from gabble_caps_cache_lookup(), and thus discoed the peer. gabble_caps_cache_lookup() does touch the row (UPDATE-ing it, setting timestamp = now) when it gets a hit; maybe there's a way to combine that with the SELECT?

I've also added a couple more patches: one fixes the part of the test which tries to get a particular entry GCed to actually guarantee that that happens (which did involve a whiteboard!), and another removes a useless call to sqlite3_column_bytes(). Re-review anyone?

Comment 11 Simon McVittie 2010-04-05 08:37:11 UTC

(In reply to comment #10)
> So, I don't believe this is a bug.

Me neither.

> > A few SQLite related comments for 11713f:
> 
> Okay, I've added a couple of pragmata that look right.

Looks good to me.

> > * In caps_cache_insert(), instead of ignoring the error of INSERT, we can
> >   use INSERT OR UPDATE ... which will, as a side-effect, save up-to-date
> >   caps for the node, and also touch the timestamp so it won't be reaped
> >   before its time.
> 
> We shouldn't get that far if the entry is already in the hash; that code path
> only occurs when we didn't get a hit from gabble_caps_cache_lookup(), and thus
> discoed the peer. gabble_caps_cache_lookup() does touch the row (UPDATE-ing it,
> setting timestamp = now) when it gets a hit; maybe there's a way to combine
> that with the SELECT?

I have no opinion on this, but it doesn't sound like something that's incorrect. Senko, do you want to veto this or is it OK to merge as-is?

> I've also added a couple more patches: one fixes the part of the test which
> tries to get a particular entry GCed to actually guarantee that that happens
> (which did involve a whiteboard!), and another removes a useless call to
> sqlite3_column_bytes(). Re-review anyone?

These patches look good to me.

Comment 12 Simon McVittie 2010-04-12 08:00:08 UTC

Since Senko hasn't responded for a week, I'll call this review+ from me.

Comment 13 Senko Rasic 2010-04-12 08:19:47 UTC

(In reply to comment #10)
> Okay, I've added a couple of pragmata that look right.

Looks fine to me.

> We shouldn't get that far if the entry is already in the hash; that code path
> only occurs when we didn't get a hit from gabble_caps_cache_lookup(), and thus
> discoed the peer. gabble_caps_cache_lookup() does touch the row (UPDATE-ing it,
> setting timestamp = now) when it gets a hit; maybe there's a way to combine
> that with the SELECT?

Okay, makes sense. AFAIK there's no way of doing select+touch. It'd be possible to combine the two in a transaction, but that'd duplicate the functionality of _touch(), and the speedup (if any) would be marginal, I don't think it's worth the effort.

> I've also added a couple more patches: one fixes the part of the test which
> tries to get a particular entry GCed to actually guarantee that that happens
> (which did involve a whiteboard!), and another removes a useless call to
> sqlite3_column_bytes(). Re-review anyone?

Looks good to me, merge away.

Comment 14 Will Thompson 2010-05-14 12:11:32 UTC

I have a branch merging this which is blocking on a review of http://git.collabora.co.uk/?p=user/wjt/telepathy-gabble-wjt.git;a=shortlog;h=refs/heads/presence-cache-unleak

Interestingly this branch conflicts with master, but not with presence-cache-unleak. :D

Comment 15 Simon McVittie 2010-05-17 07:57:19 UTC

presence-cache-unleak++ also. Please, finally merge this :-)

Comment 16 Will Thompson 2010-05-21 04:31:44 UTC

how capable!

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.