Bug 20035 - Avatar cache reference implementation
Summary: Avatar cache reference implementation
Status: RESOLVED FIXED
Alias: None
Product: Telepathy
Classification: Unclassified
Component: tp-glib (show other bugs)
Version: unspecified
Hardware: Other All
: low enhancement
Assignee: Telepathy bugs list
QA Contact: Telepathy bugs list
URL: http://git.collabora.co.uk/?p=user/xc...
Whiteboard:
Keywords: patch
Depends on:
Blocks: 25512
  Show dependency treegraph
 
Reported: 2009-02-10 04:10 UTC by Simon McVittie
Modified: 2010-05-19 03:40 UTC (History)
3 users (show)

See Also:
i915 platform:
i915 features:


Attachments

Description Simon McVittie 2009-02-10 04:10:34 UTC
+++ This bug was initially created as a clone of Bug #20034 +++

telepathy-qt4 and telepathy-glib should have a reference implementation of an avatar cache. For maximum synergy, they should share an on-disk format.
Comment 1 Xavier Claessens 2009-02-16 04:37:47 UTC
When I first discussed that with smcv, he said the avatar cache should be done in a dedicated object so it is optional, because some clients could want his own implementation.

I suggest to add an AVATAR_CACHE feature on TpContact, so clients that want another implementation can still not request that feature on contact objects. I makes API more consistent IMO. So TpContact could have a "avatar-filename" property with the path to the on-disk cache.
Comment 2 Xavier Claessens 2009-10-08 07:36:48 UTC
Here is a first draft for a patch. I didn't test it yet. I'm interested in feedback for the proposed API.

http://git.collabora.co.uk/?p=user/xclaesse/telepathy-glib.git;a=shortlog;h=refs/heads/avatar-data
Comment 3 Xavier Claessens 2009-10-09 04:16:14 UTC
I started porting empathy to the new proposed API. Here is the branch:
http://git.collabora.co.uk/?p=user/xclaesse/empathy.git;a=shortlog;h=refs/heads/avatar

It is still not complete, but it's a proof that tp-glib code works :)
Comment 4 Danielle Madeley 2009-11-26 16:15:35 UTC
So, just to throw my thoughts in:

I think the API works, except for the TpContact::avatar-retrieved signal. This returns the avatar data in a highly unsuitable way, requiring the use of a GdkPixbufLoader and about 5 lines of code to load a GdkPixbuf from the contents.

The signal should instead pass a GFile for the file on disk, or a GInputStream so that gdk_pixbuf_new_from_file, or gdk_pixbuf_new_from_stream can be used. This will make the data a lot easier to consume.

Arguably, since things like Adium themes in Empathy require the avatar as a filename, we should expose it as a filename here.
Comment 5 Guillaume Desmottes 2009-12-08 04:00:53 UTC
This blocks this Empathy bug: https://bugzilla.gnome.org/show_bug.cgi?id=579812
Comment 6 Xavier Claessens 2010-04-19 09:23:28 UTC
I agree with Danielle, but here are my concern:

In the case we retrieved the avatar from CM, we have already the data in memory, so I don't want to return only a filename, otherwise it will read from disk again. I could then create a GMemoryInputStream but that means I have a dup the data in memory because the GArray is feed when returning from AvatarRetrieved.

So there are various solutions:
1) The solution I implemented, return a pointer to data given in AvatarRetrieved callback directly, so no need to dup memory or re-read from disk.

2) Return a GInputStream, that would be GFileInputStream if coming from cache, or GMemoryInputStream if coming from AvatarRetrieved but then the memory will be dupped, but won't be re-read from disk at least.

3) Return a GFile in all cases, so if coming from AvatarRetrieved the data is first written on disk, then a GFile to the cache is returned so it will be re-read from disk.

4) Combinaison from solutions above, like returning a data pointer and GFile/GInputMemory.

What are your opinions?

Note: I updated my path for tp-glib master: http://git.collabora.co.uk/?p=user/xclaesse/telepathy-glib.git;a=shortlog;h=refs/heads/avatar-data
Comment 7 Xavier Claessens 2010-04-20 02:13:16 UTC
Extra note for solution 3: If I return the avatar as a GFile in the "avatar-retrieved" signal, that means that if there are many listeners to that signal, they will all re-read the data from file...
Comment 8 Xavier Claessens 2010-04-20 04:29:15 UTC
Extra note for solution 2: I don't think we can call gdk_pixbuf_new_from_stream() many times on the same GInputStream. I guess the data in the input stream are consumed the first time it's read.

Extra note for solution 1: For adium theme all we care is the filename and webkit will load the image itself. So in that case, the read from cache would be useless, and it would be more efficient to only return filename without reading the data.

So maybe we could have both a signal "avatar-retrieved" and a property "avatar-filename". tp_connection_request_avatars() would do:
1) check if the current token is found in cache

1a) found in cache: "notify::avatar-filename" is emitted, if there are listeners for "avatar-retrieved" signal (using g_signal_has_handler_pending) then as extra step the file is read and "avatar-retrieved" signal is emitted once data is in memory.

1b) not found in cache: avatar is requested from CM. When CM emits "AvatarRetrieved" then TpContact::avatar-retrieved is emitted with the data in memory (no need of using g_signal_has_handler_pending since it is cheap to just pass the data pointer) and data is stored in cache. Once the write in cache is done, "notify::avatar-filename" is emitted.

2) The avatar filename could be kept inside TpContact, and could be returned any time using tp_contact_get_avatar_filename() but is reset to NULL each time the token is modified. If avatar-filename is NULL then the user knows he has to first call tp_connection_request_avatars(). If it is known to have no avatar then filename is "" (to be consistent with "avatar-token").

Does that looks good now?

Also I'm wondering if we should change tp_connection_request_avatars() to tp_contact_request_avatar() since requests are aggregated anyway internally. Or we could have both because the former is convenient together with tp_connection_get_contacts_by_id/handle() when fetching the set of contacts the first time, and the second is convenient when getting "notify::avatar-token" for a single contact.
Comment 9 Simon McVittie 2010-04-20 06:46:54 UTC
(In reply to comment #8)
> 1b) not found in cache: avatar is requested from CM. When CM emits
> "AvatarRetrieved" then TpContact::avatar-retrieved is emitted with the data in
> memory (no need of using g_signal_has_handler_pending since it is cheap to just
> pass the data pointer)

Don't signals copy the data anyway, if you're using boxed types correctly?

> 2) The avatar filename could be kept inside TpContact, and could be returned
> any time using tp_contact_get_avatar_filename() but is reset to NULL each time
> the token is modified. If avatar-filename is NULL then the user knows he has to
> first call tp_connection_request_avatars(). If it is known to have no avatar
> then filename is "" (to be consistent with "avatar-token").

The cache should also store the MIME type from the CM, and this accessor should return it. CMs give you the information, please don't just throw it away :-)

> In the case we retrieved the avatar from CM, we have already the data in
> memory, so I don't want to return only a filename, otherwise it will read from
> disk again.

On a competent OS (e.g. Linux), is reading from a file you just wrote actually much more expensive than memcpy?

> I could then create a GMemoryInputStream but that means I have a
> dup the data in memory because the GArray is feed when returning from
> AvatarRetrieved.

Don't you effectively have to duplicate it (at least once) regardless, either by copying the GArray or by saving/loading a file?
Comment 10 Xavier Claessens 2010-04-20 07:09:49 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > 1b) not found in cache: avatar is requested from CM. When CM emits
> > "AvatarRetrieved" then TpContact::avatar-retrieved is emitted with the data in
> > memory (no need of using g_signal_has_handler_pending since it is cheap to just
> > pass the data pointer)
> 
> Don't signals copy the data anyway, if you're using boxed types correctly?

Not if you give a G_TYPE_POINTER

> > 2) The avatar filename could be kept inside TpContact, and could be returned
> > any time using tp_contact_get_avatar_filename() but is reset to NULL each time
> > the token is modified. If avatar-filename is NULL then the user knows he has to
> > first call tp_connection_request_avatars(). If it is known to have no avatar
> > then filename is "" (to be consistent with "avatar-token").
> 
> The cache should also store the MIME type from the CM, and this accessor should
> return it. CMs give you the information, please don't just throw it away :-)

Yep, my implementation also cache mime and return it. That's already done ;)

> > In the case we retrieved the avatar from CM, we have already the data in
> > memory, so I don't want to return only a filename, otherwise it will read from
> > disk again.
> 
> On a competent OS (e.g. Linux), is reading from a file you just wrote actually
> much more expensive than memcpy?

Maybe I'm just trying to do exessive optimisations... As avatar data is something that could be big and we can have a lot of contacts, I'm trying as much as possible to just pass the data pointer instead of alloc new memory and copy. Feel free to object and tell it's fine to only give a filename and listeners will just read again from file.

> > I could then create a GMemoryInputStream but that means I have a
> > dup the data in memory because the GArray is feed when returning from
> > AvatarRetrieved.
> 
> Don't you effectively have to duplicate it (at least once) regardless, either
> by copying the GArray or by saving/loading a file?

g_memory_input_stream_new_from_data() takes ownership of the data, it does not dup it. So if we use it with sync calls inside AvatarRetrieved callback then we don't have to copy the data... but that's insane.
Comment 11 Xavier Claessens 2010-05-06 10:03:37 UTC
Ok, so here is another way for doing this:
http://git.collabora.co.uk/?p=user/xclaesse/telepathy-glib.git;a=shortlog;h=refs/heads/avatar-data2

it adds an AVATAR_DATA feature on TpContact. When that feature is set, "avatar-filename" and "avatar-mimetype" properties will be available. When the token change, either the avatar is already in cache and "notify::avatar-filename/mimetype" are directly emitted, or it's not yet in cache and then it is fetched from CM. Once CM gives the data, it's written in cache and properties are updated.

Like that a client can create contacts with the AVATAR_DATA feature, and just wait for "notify::avatar-filename" signal to be emitted to update its UI.

We could add more helper functions, to get the data as input stream, or GArray, or wathever, but at least this part should be fine, since it's purely the cache implementation with minimal API to access it.
Comment 12 Simon McVittie 2010-05-06 10:33:52 UTC
If it's just the one patch, this looks pretty good. I haven't done a thorough review of the refcounting and stuff.

Would a GFile be a better representation than a string? (I don't know... comments welcome.)

> +    /* AvatarRetrieved signal is refcounted because we want to keep it
> +     * connected only when there are contacts waiting for their avatar data */

This seems like overkill to me. The expensive thing is getting the avatar data from the server to the CM; once we've received it, we might as well do something with it.

Note that we can sometimes retrieve avatars "spontaneously" - whenever an XMPP contact's ContactInfo or nickname is downloaded, we get the avatar too - so perhaps it would be best to connect to AvatarRetrieved in any client that cares about avatar tokens?

I'm a little alarmed that there's no expiry on the avatar cache; avatars stay forever. However, sensible expiry would probably have to be based on keeping a record of last access times (or letting the filesystem do it for us, but that doesn't work with noatime).

> +      "Avatar MimeType",

"Avatar MIME type"
Comment 13 Xavier Claessens 2010-05-06 13:32:04 UTC
(In reply to comment #12)
> If it's just the one patch, this looks pretty good. I haven't done a thorough
> review of the refcounting and stuff.
> 
> Would a GFile be a better representation than a string? (I don't know...
> comments welcome.)

I think a GFile is better, changed my branch to replace the filename with a GFile.

> > +    /* AvatarRetrieved signal is refcounted because we want to keep it
> > +     * connected only when there are contacts waiting for their avatar data */
> 
> This seems like overkill to me. The expensive thing is getting the avatar data
> from the server to the CM; once we've received it, we might as well do
> something with it.

Right, I removed that, making the code simpler.

> Note that we can sometimes retrieve avatars "spontaneously" - whenever an XMPP
> contact's ContactInfo or nickname is downloaded, we get the avatar too - so
> perhaps it would be best to connect to AvatarRetrieved in any client that cares
> about avatar tokens?

The code now support opportunistic cache write, even if we don't have a TpContact for it. Just have to add a call to contacts_bind_to_avatar_retrieved() if wanted. I'm not sure we want this though, in real use cases, empathy will request avatar data anyway, and tp-logger wants avatar-token to write in logs, but doesn't need avatar data. So probably better not have too much processes writing the same avatar in cache at the same time (that's why we are using atomic g_file_set_contents).

> I'm a little alarmed that there's no expiry on the avatar cache; avatars stay
> forever. However, sensible expiry would probably have to be based on keeping a
> record of last access times (or letting the filesystem do it for us, but that
> doesn't work with noatime).

We are writing inside ~/.cache. I think any device with disk contraints should have a way to purge the whole ~/.cache, or an UI to ask used which cache to drop, etc... On PC I don't think it will ever take a significate % of you /home, and it's funny to be able to see very old avatar of some contacts, sometimes...

> > +      "Avatar MimeType",
> 
> "Avatar MIME type"

fixed.
Comment 14 Xavier Claessens 2010-05-07 01:17:08 UTC
I made an empathy branch using this new TpContact API, and updated the tp-glib branch as well, fixing bugs found when testing. All seems to work pretty well. Time for a serious review I think!


Empathy branch (last patch): http://git.collabora.co.uk/?p=user/xclaesse/empathy.git;a=shortlog;h=refs/heads/avatar


tp-glib branch (1 patch): http://git.collabora.co.uk/?p=user/xclaesse/telepathy-glib.git;a=shortlog;h=refs/heads/avatar-data2
Comment 15 Guillaume Desmottes 2010-05-07 06:13:01 UTC
Does your branch pass "make check" with --enable-gtk-doc ?

 * Returns: the same file as the #TpContact:avatar-file property
"the same #GFile"

You should use the same phrasing when documenting the lifetime of the returned file ("This remains valid until the main loop...").

  dir = g_build_filename (g_get_user_cache_dir (),
      "telepathy", "avatars", cm_name, protocol, NULL);
Wouldn't it be better to use the account rather than the connection to build this directory?

Log an error if mkdir failed?

_build_filename would be a better name IMHO as it's doesn't get an existing filename.

  DEBUG ("Contact#%u avatar stored in cache: %s, %s", handle, filename, mime_type);
This line is too long.

You should check the return value of g_file_set_contents and log an error if it fails.

contact_maybe_request_avatar_data
this check looks weird: self->priv->avatar_token[0] == '\0'
Why isn't NULL in that case?

+          g_warning ("Unknown feature %d", features[i]);
Use WARNING
Also, you changed the semantic as those function doesn't return any more in such case.
I think it would be better to add a function after the old code adding more features if needed.


No having tests makes me cry, especially as TpContact is pretty well tested so far. I let smcv decide if that's a merge blocker or not.
Comment 16 Xavier Claessens 2010-05-07 07:05:24 UTC
(In reply to comment #15)
> Does your branch pass "make check" with --enable-gtk-doc ?

Yes.

>  * Returns: the same file as the #TpContact:avatar-file property
> "the same #GFile"

Fixed.

> You should use the same phrasing when documenting the lifetime of the returned
> file ("This remains valid until the main loop...").

Fixed.

>   dir = g_build_filename (g_get_user_cache_dir (),
>       "telepathy", "avatars", cm_name, protocol, NULL);
> Wouldn't it be better to use the account rather than the connection to build
> this directory?

TpContact doesn't know about account, it only has a TpConnection.

> Log an error if mkdir failed?

Fixed

> _build_filename would be a better name IMHO as it's doesn't get an existing
> filename.

Fixed

>   DEBUG ("Contact#%u avatar stored in cache: %s, %s", handle, filename,
> mime_type);
> This line is too long.

Fixed

> You should check the return value of g_file_set_contents and log an error if it
> fails.

Fixed

> contact_maybe_request_avatar_data
> this check looks weird: self->priv->avatar_token[0] == '\0'
> Why isn't NULL in that case?

For the avatar token, NULL means the CM doesn't know the token yet, waiting for the server. "" means the CM knows that there is no avatar.

> +          g_warning ("Unknown feature %d", features[i]);
> Use WARNING

Fixed.

> Also, you changed the semantic as those function doesn't return any more in
> such case.

I don't think it's bad to continue with the valid features requested, should be fine.

> I think it would be better to add a function after the old code adding more
> features if needed.

What do you mean?

> No having tests makes me cry, especially as TpContact is pretty well tested so
> far. I let smcv decide if that's a merge blocker or not.

I'll see if it's easy to write some.
Comment 17 Simon McVittie 2010-05-10 02:48:31 UTC
(In reply to comment #16)
> >   dir = g_build_filename (g_get_user_cache_dir (),
> >       "telepathy", "avatars", cm_name, protocol, NULL);
> > Wouldn't it be better to use the account rather than the connection to build
> > this directory?
> 
> TpContact doesn't know about account, it only has a TpConnection.

Also, telepathy-spec guarantees that the tuple (CM name, protocol, token) is unique (this can actually be useful sometimes - I have, byte-for-byte, the same avatar on several XMPP accounts, and using this tuple means my friends only cache it once).
Comment 18 Guillaume Desmottes 2010-05-10 07:47:31 UTC
(In reply to comment #16)
> >   dir = g_build_filename (g_get_user_cache_dir (),
> >       "telepathy", "avatars", cm_name, protocol, NULL);
> > Wouldn't it be better to use the account rather than the connection to build
> > this directory?
> 
> TpContact doesn't know about account, it only has a TpConnection.

Right.
Smcv: what would you suggest here?

> > Also, you changed the semantic as those function doesn't return any more in
> > such case.
> 
> I don't think it's bad to continue with the valid features requested, should be
> fine.

Not sure what's our policy there.

> > I think it would be better to add a function after the old code adding more
> > features if needed.
> 
> What do you mean?

keeping the current code checking feature and return_if_fail if needed and add the extra features after.

> > No having tests makes me cry, especially as TpContact is pretty well tested so
> > far. I let smcv decide if that's a merge blocker or not.
> 
> I'll see if it's easy to write some.

Shouldn't be too hard we already have fake CM and stuffs. Just mimic what's done in the current contact tests.
Comment 19 Xavier Claessens 2010-05-11 12:05:00 UTC
I added unit tests to my branch.
Comment 20 Guillaume Desmottes 2010-05-17 06:20:04 UTC
prepare_avatar_requirements_cb

Shouldn't you use g_assert_no_error rather than ignoring errors?

We stopped using MYASSERT in new tests. Jut use g_assert, g_assert_cmpuint, etc.


create_contact_with_fake_avatar (ContactsConnection *service_conn,
                                 TpConnection *client_conn,
                                 const gchar *id)
args should be 4 spaces aligned


  MYASSERT_SAME_STRING (tp_contact_get_avatar_mimetype (contact), avatar_mimetype);
this line is too long.

  MYASSERT (avatar_retrieved_called == TRUE, "");
juse use "g_assert (avatar_retrieved_called)"

  g_setenv ("XDG_CACHE_HOME", "/tmp", TRUE);
use g_get_tmp_dir()
Shouldn't we create a subdirectory in /tmp to be sure to not have name clashes if files already exist?

+avatar_data_new (GArray *data, const gchar *mimetype, const gchar *token)

+void contacts_connection_change_avatar_data (ContactsConnection *self,

+my_request_avatars (TpSvcConnectionInterfaceAvatars *avatars,
+                    const GArray *contacts,
style is wrong

Did you valgrind the test?
Comment 21 Xavier Claessens 2010-05-17 09:20:38 UTC
(In reply to comment #20)
> prepare_avatar_requirements_cb
> 
> Shouldn't you use g_assert_no_error rather than ignoring errors?

It is done when the main_loop returns, that's how it's done in other tests too.

> We stopped using MYASSERT in new tests. Jut use g_assert, g_assert_cmpuint,
> etc.

Ok, fixed.

> create_contact_with_fake_avatar (ContactsConnection *service_conn,
>                                  TpConnection *client_conn,
>                                  const gchar *id)
> args should be 4 spaces aligned

fixed

>   MYASSERT_SAME_STRING (tp_contact_get_avatar_mimetype (contact),
> avatar_mimetype);
> this line is too long.

fixed

>   MYASSERT (avatar_retrieved_called == TRUE, "");
> juse use "g_assert (avatar_retrieved_called)"

fixed

>   g_setenv ("XDG_CACHE_HOME", "/tmp", TRUE);
> use g_get_tmp_dir()
> Shouldn't we create a subdirectory in /tmp to be sure to not have name clashes
> if files already exist?

fixed

> +avatar_data_new (GArray *data, const gchar *mimetype, const gchar *token)
> 
> +void contacts_connection_change_avatar_data (ContactsConnection *self,
> 
> +my_request_avatars (TpSvcConnectionInterfaceAvatars *avatars,
> +                    const GArray *contacts,
> style is wrong

fixed

> Did you valgrind the test?

I have no idea how you can make any sence out of the spammed logs... I don't see anything really bad, but we are missing so much suppressions...
Comment 22 Guillaume Desmottes 2010-05-18 03:26:09 UTC
+/* Stolen from telepathy-haze */
You should add copyright/licence info regarding this code.

You can use "make -C tests/dbus check-valgrind TESTS=test-contacts" to run a single test with valgrind. Some tests have all the needed false-positive suppressions; don't know if that's the case for this one.

You should check with smcv for his advice regarding controversial bits (Comment #18 basically).
Comment 23 Xavier Claessens 2010-05-18 03:40:01 UTC
(In reply to comment #22)
> +/* Stolen from telepathy-haze */
> You should add copyright/licence info regarding this code.

Amended the last commit to include that.

> You can use "make -C tests/dbus check-valgrind TESTS=test-contacts" to run a
> single test with valgrind. Some tests have all the needed false-positive
> suppressions; don't know if that's the case for this one.
> 
> You should check with smcv for his advice regarding controversial bits (Comment
> #18 basically).

That's what I did, it is missing tones of suppression rules... but didn't see anything leaking.
Comment 24 Simon McVittie 2010-05-18 06:25:46 UTC
(In reply to comment #18)
> (In reply to comment #16)
> > >   dir = g_build_filename (g_get_user_cache_dir (),
> > >       "telepathy", "avatars", cm_name, protocol, NULL);
> > > Wouldn't it be better to use the account rather than the connection to build
> > > this directory?
> > 
> > TpContact doesn't know about account, it only has a TpConnection.
> 
> Right.
> Smcv: what would you suggest here?

I think Zdra is right - the CM name, protocol and token make a suitable tuple to key the cache with.
Comment 25 Simon McVittie 2010-05-18 06:52:46 UTC
I do think commit 9d44a19e51eae78b52117ee8538a91a23565b49c ("Use g_return_if_fail() in case a feature is unknown") should be included when this branch is merged.

I happened to notice that the regression test uses a directory in g_get_tmpdir() (i.e. in practice, /tmp) without checking whether it already exists. Please do handle this, even if the handling is as simple as trying g_mkdir (that_directory, 0700) and aborting if it fails (with EEXIST or otherwise).

I haven't reviewed the implementation in detail, just looked at the API and docs. I'd like one (cosmetic) API change and some documentation tweaks before it's merged, but there's nothing else I want to veto.

> +tp_contact_get_avatar_mimetype
> +    gchar *avatar_mimetype;
> +tp_contact_get_avatar_mimetype (TpContact *self)
> +    case PROP_AVATAR_MIMETYPE:
[etc.]

I'd prefer with a "space" (i.e. "_" or "-"), at least where it appears in API - avatar_mime_type, etc.

> + * Returns: the same #Gfile as the #TpContact:avatar-file property

...the same #GFile as... (and please grep for any other occurrences)

> + * Returns: the same token as the #TpContact:avatar-mimetype property

...the same MIME type as the...

> +   * When #TpContact:avatar-token change, this property is not updated directly,

From the not-en_BE department: s/change/changes/ and possibly s/directly/immediately/

> +   * but will be once the new avatar data is retrieved and stored in cache.

I'd phrase this as "but will be updated when the new...", which I think is a bit clearer.

> +   * implicitely set %TP_CONTACT_FEATURE_AVATAR_TOKEN.

implicitly (and please grep for any other occurrences)
Comment 26 Xavier Claessens 2010-05-18 07:42:12 UTC
All fixed in the branch.
Comment 27 Xavier Claessens 2010-05-19 03:40:11 UTC
merged in telepaht-glib master.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.