Summary: | Do on-disk avatar cache in CM | ||
---|---|---|---|
Product: | Telepathy | Reporter: | Xavier Claessens <xclaesse> |
Component: | tp-spec | Assignee: | Telepathy bugs list <telepathy-bugs> |
Status: | RESOLVED FIXED | QA Contact: | Telepathy bugs list <telepathy-bugs> |
Severity: | normal | ||
Priority: | medium | Keywords: | patch |
Version: | unspecified | ||
Hardware: | Other | ||
OS: | All | ||
URL: | http://cgit.collabora.com/git/user/xclaesse/telepathy-spec.git/log/?h=next-avatar-uri | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Bug Depends on: | |||
Bug Blocks: | 31668, 50082 |
Description
Xavier Claessens
2011-01-24 07:13:34 UTC
As motivation, it turns out tp-qt4's avatar cache implementation does not use atomic-write at all, and qt4 does not have any build-in method for this (like g_file_set_contents). So correct implementation in tpqt4 gets much harder, and probably doing this in CM is the way to go. (In reply to comment #1) > As motivation, it turns out tp-qt4's avatar cache implementation does not use > atomic-write at all, and qt4 does not have any build-in method for this (like > g_file_set_contents). > > So correct implementation in tpqt4 gets much harder, and probably doing this in > CM is the way to go. Would be nice to keep the ability to read avatars from cache without having to launch the CMs (from logger point of view). Have spoken about this on IRC, the plan is: 1) Redefine the Avatar iface in 'next' branch because keeping backward compat is difficult here. 2) Avatar token does not need to be exposed directly to clients, since it's only useful for caching. But CM should make sure string comparison of the URI is enough to know the avatar image has change. I suggest just reuse the existing cache from tp-glib format. 3) URI is preferred over filename, even if we are using only file:// because path are "ay" on dbus and encoding is plateform specific, etc... g_file_new_for_path() and g_file_get_uri() does the right thing™ for us. 4) We can use the client-interest mechanism to tell CM we are interested by avatars, and the CM should request the data if not found in its cache. Like that clients does not have to request them manually. 5) Still a RequestAvatars method is needed to force the fetch; for example in the case of offline XMPP contacts where we don't have the token, some clients may want to force the fetch of avatar the first time the contact is seen and then keep that in its database (N9(00) use case). I updated my spec branch I had for a while: http://cgit.collabora.com/git/user/xclaesse/telepathy-spec.git/log/?h=next-avatar-uri One problem with that spec branch is that CM can't notify when it knows the avatar changed but does not know the data. I think I should add a AvatarNeedsRequest signal to tell clients that are interested to call RequestAvatars. Also missing in that branch is the client-interest part, that should bypass the AvatarNeedsRequest step and make the request immediately. + <tp:contact-attribute name="AvatarURI" type="s" + tp:name-for-bindings="Avatar_URI"> Should be called avatar-uri. The documentation should say that: * it must be a file URI starting with file:/// as defined by RFC 1738 * it points into a cache maintained by the connection manager * whenever the content of the avatar changes, the URI must change It should have rationale and probably also examples. Rationale for file:///: if we allowed other URI schemes (such as HTTP) we'd have to specify which schemes clients were expected to support, and for decent performance those clients would have to do their own caching anyway. Rationale for not just using filenames: on Unix systems, local filenames are arbitrary byte sequences which are not necessarily UTF-8, or even any consistent encoding at all. Example 1: on XMPP with XEP-0084 (PEP avatars) or XEP-0153 (vCard avatars), avatars are announced by their SHA-1, so CM implementations could base their filenames in the cache on that SHA-1. Similarly, in AIM/ICQ, avatars are announced by their MD5. Example 2: in protocols where a change to a contact's avatar is announced by a "last updated" timestamp, CM implementations could base their filenames in the cache on the pair (contact ID, timestamp). (I believe at least one protocol is like Example 2, but I can't remember which one...) Eliminating avatar tokens might break MC's ability to set avatars nicely. We need to be able to distinguish between these situations: * On XMPP or something else with server-stored avatars * Telepathy online, set avatar ":-)" * Telepathy offline * Pidgin online, set avatar ":-(" * Pidgin offline * Telepathy online * Telepathy should leave the avatar set to ":-(", and store it in the MC account for the future, overwriting ":-)" and * On XMPP or something else with server-stored avatars * Telepathy online * Telepathy offline * Pidgin online, set avatar ":-(" * Pidgin offline * Telepathy user sets avatar ":-)" * Telepathy online * Telepathy should set the avatar to ":-)", overwriting what Pidgin set and * On local-xmpp or something else without server-stored avatars * Telepathy user sets avatar ":-)" * Telepathy online * Telepathy should set the avatar to ":-)" * other clients don't matter since the avatar doesn't persist anyway (In reply to comment #6) > Eliminating avatar tokens might break MC's ability to set avatars nicely. It doesn't, if the CM is required to provide a boolean indicator of whether avatars persist, and MC stores a boolean "avatar-changed-locally" in its Account data. Algorithm: * when local user sets avatar, store it, store avatar-changed-locally = TRUE, and (if online) call SetAvatar on the Connection * when connecting, if avatar-changed-locally || !avatars-persist, upload the avatar to the CM with SetAvatar * when SetAvatar succeeds or fails (for either of those reasons), store avatar-changed-locally = FALSE * when connecting, if !avatar-changed-locally && avatars-persist, download our own avatar from the CM. Store it, and store avatar-changed-locally = FALSE. * when the CM signals that our own avatar has changed, also download it, store it, and store avatar-changed-locally = FALSE. I think that works? (In reply to comment #7) > (In reply to comment #6) > > Eliminating avatar tokens might break MC's ability to set avatars nicely. > > It doesn't, if the CM is required to provide a boolean indicator of whether > avatars persist, and MC stores a boolean "avatar-changed-locally" in its > Account data. Algorithm: A "avatars-persist" property on Avatar iface makes perfect sense to me, good idea. > * when local user sets avatar, store it, store avatar-changed-locally = TRUE, > and (if online) call SetAvatar on the Connection > * when connecting, if avatar-changed-locally || !avatars-persist, upload the > avatar to the CM with SetAvatar > * when SetAvatar succeeds or fails (for either of those reasons), store > avatar-changed-locally = FALSE > * when connecting, if !avatar-changed-locally && avatars-persist, download our > own avatar from the CM. Store it, and store avatar-changed-locally = FALSE. > * when the CM signals that our own avatar has changed, also download it, > store it, and store avatar-changed-locally = FALSE. > > I think that works? That seems correct to me (In reply to comment #6) > + <tp:contact-attribute name="AvatarURI" type="s" > + tp:name-for-bindings="Avatar_URI"> > > Should be called avatar-uri. fixed > The documentation should say that: > > * it must be a file URI starting with file:/// as defined by RFC 1738 added > * it points into a cache maintained by the connection manager added > * whenever the content of the avatar changes, the URI must change I already tell that: <p>Connection Managers MUST choose token in a way that an avatar update always result in a different URI. For example using the image sha1 is a good idea, but not using the contact's ID.</p> It's not enough? > It should have rationale and probably also examples. > > Rationale for file:///: if we allowed other URI schemes (such as HTTP) we'd > have to specify which schemes clients were expected to support, and for decent > performance those clients would have to do their own caching anyway. > > Rationale for not just using filenames: on Unix systems, local filenames are > arbitrary byte sequences which are not necessarily UTF-8, or even any > consistent encoding at all. added > Example 1: on XMPP with XEP-0084 (PEP avatars) or XEP-0153 (vCard avatars), > avatars are announced by their SHA-1, so CM implementations could base their > filenames in the cache on that SHA-1. Similarly, in AIM/ICQ, avatars are > announced by their MD5. > > Example 2: in protocols where a change to a contact's avatar is announced by a > "last updated" timestamp, CM implementations could base their filenames in the > cache on the pair (contact ID, timestamp). added Here is tp-glib branch to implement the new spec in a TpAvatarsMixin and TpContact: http://cgit.collabora.com/git/user/xclaesse/telepathy-glib.git/log/?h=next-avatar-uri test-self-presence crash in that branch, I did not understand why... tp-glib branch updated, the failing unit test is fixed. I simplified a bit the code of the mixin too. tp-spec and tp-glib branches updated. The problems I'm worried are: Offline XMPP contacts --------------------- They don't have the token. Clients should not request those avatars except if it does persistent storage so it could request avatar only once (e.g. contactsd on N9). When the client gets the contact attributes, if avatar-uri is omitted it can't know if CM has the token or not. So it doesn't know if he should request the avatar. I've solved that by telling in avatar-uri attribute spec that if CM has the token but image is not in cache, it should omit from reply but start the request. Like that client-side we know that we won't have to call RequestAvatars. Is that a good idea? See bug #50082 comment #5 about implicit requests in contact attributes getters in general. Another solution I though about, is to add an avatar-needs-request contact attribute. Which makes sense considering the AvatarsNeedRequest signal... Initial population of avatars-mixin ----------------------------------- I guess when gabble receives the roster, it will call tp_avatars_mixin_avatar_changed() for each contact that has an avatar token, which emits AvatarUpdate for each... Shouldn't we avoid those initial signals? Maybe I should add a boolean arg in that method to tell not emit signals? So I would appreciate if people could take a look at current spec/implementation and give their opinions :) I have just been looking through your new avatar spec: * there are a few grammatical problems here and there, shall I just branch your spec branch and make my changes? * why is there a AvatarsNeedRequest signal? (that name sounds bad too, btw) Why doesn't the CM just request the avatar and save it in the cache immediately? * probably s/Mission Control/the account manager/ but feel free to say 'no'. :-) * you should make loads more links, to other specs and XEPs, etc. * I wonder if we should make AvatarUpdated plural? Other than these, the idea seems reasonable. I'll look at your tp-glib branch now. Here are some comments about your tp-glib branch: * the docs at the top of TpAvatarMixin are confusing. You start off with "To use the avatars mixin" then a section "Implementing Avatars". Try and consolidate the two. + /* TpHandle -> AvatarData Owned AvatarData? Please make it explicit. + * <informalexample><programlisting> + * tp_avatars_mixin_init ((GObject *) self, + * G_STRUCT_OFFSET (SomeObject, avatars_mixin)); + * </programlisting></informalexample> tp_avatars_mixin_init has loads more parameters than that. + static GQuark q[NUM_MIXIN_DBUS_PROPERTIES] = { 0, }; + + if (G_UNLIKELY (q[0] == 0)) + { I don't like this. Why don't you just use the getter data in TpDBusPropertiesMixinPropImpl? So something like this: static TpDBusPropertiesMixinPropImpl known_avatars_props[] = { { "AvatarPersists", GUINT_TO_POINTER (MIXIN_DP_AVATAR_PERSISTS), NULL }, ... then you can just check the GPOINTER_TO_UINT-ed value in the getter function? Include the standard TP_BASE_CONNECTION_ERROR_IF_NOT_CONNECTED in the RequestAvatars implementation. g_file_get_contents is always bad. avatar_cache_lookup is only called by tp_avatars_mixin_avatar_changed so we could make it use the async version, no? Same for _set_contents? (I didn't look at your other patches, updating TpContact and the unit tests.) (In reply to comment #13) > * why is there a AvatarsNeedRequest signal? (that name sounds bad too, btw) > Why doesn't the CM just request the avatar and save it in the cache > immediately? Actually I can kind of see how if a client doesn't care about avatars it won't want this unnecessary network traffic. I propose a rw boolean property instead (which is also a D-Bus connection parameter) which determines whether avatars will be automatically requested or not. Y/N (In reply to comment #13) > I have just been looking through your new avatar spec: > > * there are a few grammatical problems here and there, shall I just branch > your > spec branch and make my changes? Yes please. > * why is there a AvatarsNeedRequest signal? (that name sounds bad too, btw) > Why > doesn't the CM just request the avatar and save it in the cache immediately? The CM request the avatar only if a client is interested by it, that's why AvatarsNeedRequest is emitted so clients interested will call RequestAvatars in return. tbh I wouldn't mind letting the CM just request all avatars for which it has a token but not found in cache, that would simplify lots of things. In practice I think there is always a client that will ask to request the avatars anyway. CM could be smart a delay the request when in power-saving mode though, but I think gabble already put the vcard request on hold until we leave power saving mode, right? But I think Simon won't agree... > * probably s/Mission Control/the account manager/ but feel free to say 'no'. > :-) Makes sense, yes :) > * you should make loads more links, to other specs and XEPs, etc. Right > * I wonder if we should make AvatarUpdated plural? > > Other than these, the idea seems reasonable. I'll look at your tp-glib branch > now. Right (In reply to comment #14) > Actually I can kind of see how if a client doesn't care about avatars it won't > want this unnecessary network traffic. I propose a rw boolean property instead > (which is also a D-Bus connection parameter) which determines whether avatars > will be automatically requested or not. Y/N That's called a client-interest actually, we already have mechanism for that. I though about using that, but IIRC Simon wasn't all happy. If we go that way, I think it could make sense to separate interest in self avatar and in other contacts avatar, because MC will always (actually depending on the AvatarPersists property I guess) be interesting if self avatar to store it and re-set on reconnect. (In reply to comment #16) > That's called a client-interest actually, we already have mechanism for that. I > though about using that, but IIRC Simon wasn't all happy. 17:19 < smcv> xclaesse: "always request avatars" - my question is, whose? 17:19 < smcv> people with subscribe > No? people with publish > No? random people with no presence relationship who IM you out of the blue? Another possible set of people (which might in fact be what you meant) would be "people from whom I receive presence, whether it's broadcast or directed". In XMPP (but not necessarily other protocols), that's the same as the set of people whose avatar tokens you are aware of. One answer would be for the client interests to be self-describing things like /subscribe-avatars, /presence-received-avatars and /anyone-interacting-avatars. I mainly object to the idea of having a client interest whose semantics are "You automatically download the avatars of some arbitrary set of contacts, but I'm not going to tell you which ones" :-) > If we go that way, I > think it could make sense to separate interest in self avatar and in other > contacts avatar, because MC will always (actually depending on the > AvatarPersists property I guess) be interesting if self avatar to store it and > re-set on reconnect. Yes: MC only cares about our own avatar, and only cares about *that* on protocols where AvatarsPersist. Or, you could even just remove the /self-avatar client interest and assume that it's always-on? (In reply to comment #17) > One answer would be for the client interests to be self-describing things like > /subscribe-avatars, /presence-received-avatars and /anyone-interacting-avatars. My idea was to define the set as "anyone for who the CM has a useful way to lookup on disk cache" which is for gabble the same as "/presence-received-avatars". " /subscribe-avatars" must be avoided because on XMPP we don't get token for offline contacts, even when subscribed. That basically mean for the client: if you don't have an URI, then you just can't have an avatar, no need to request one. Which is all the client needs to know. With the exception in the case the client is doing persistent storage of contacts, like N9(00), then client could still call RequestAvatars on a first-seen contact (to get avatar of offline XMPP contacts but won't update until the contact is seen online). > Or, you could even just remove the /self-avatar client interest and assume that > it's always-on? I agree self avatar should be always-on. (In reply to comment #18) > My idea was to define the set as "anyone for who the CM has a useful way to > lookup on disk cache" which is for gabble the same as > "/presence-received-avatars". OK, you've convinced me that this is useful. > With the exception in the case the client is doing persistent storage of > contacts, like N9(00), then client could still call RequestAvatars on a > first-seen contact (to get avatar of offline XMPP contacts but won't update > until the contact is seen online). This does lead to this weird situation: * first contact with bob@example.com - address book explicitly requests avatar - Bob temporarily has an avatar token (URI) * sign out and back in * ask about Bob's avatar token - it's omitted from the reply, and presumably NULL in C code (we don't know that Bob doesn't have an avatar, which would be "", but neither do we know that he does, which would be a nonempty string) Should the UI remember the old avatar URI for this situation (and use it as a "best guess" at Bob's avatar until told otherwise), or should it copy the bytes of the avatar into permanent storage like the N9 does, and display that copy? I suppose either of those would be fine, really... (In reply to comment #19) > (In reply to comment #18) > > With the exception in the case the client is doing persistent storage of > > contacts, like N9(00), then client could still call RequestAvatars on a > > first-seen contact (to get avatar of offline XMPP contacts but won't update > > until the contact is seen online). > > This does lead to this weird situation: > > * first contact with bob@example.com > - address book explicitly requests avatar > - Bob temporarily has an avatar token (URI) > > * sign out and back in > > * ask about Bob's avatar token > - it's omitted from the reply, and presumably NULL in C code > (we don't know that Bob doesn't have an avatar, which would be "", > but neither do we know that he does, which would be a nonempty string) > > Should the UI remember the old avatar URI for this situation (and use it as a > "best guess" at Bob's avatar until told otherwise), or should it copy the bytes > of the avatar into permanent storage like the N9 does, and display that copy? > > I suppose either of those would be fine, really... I think this is client's choice. There are 3 possibilities: 1) stateless clients (empathy) will just not display avatar for those contacts 2) opportunistically statefull clients (tp-logger) will remember the cached URI and fallback to no avatar if the CM decided to cleanup its cache. 3) statefull clients (N9) will copy the file into its own storage and will use that until the CM gives a new URI. (In reply to comment #20) > 3) statefull clients (N9) will copy the file into its own storage and will use > that until the CM gives a new URI. Actually N900 does that, but N9 just put Tp::Contact's cache URI into tracker because this is a controlled environment where we know tp-qt4 won't cleanup its cache ever. Spec branch updated with a client-interest meaning "cm always do its best to update all avatars" idea. I will update tp-glib/gabble branch too. tp-glib branch updated as well now. Still could have some doc love though. Testing this with gabble is problematic since we didn't start porting it to tp-glib-next. (Only commenting on the API right now, not the English.) + <li>Image: <em>$XDG_CACHE_HOME</em>/telepathy/avatars/<em>cm</em>/<em>protocol</em>/<em>escaped-token</em></li> + <li>MIME Type: <em>$XDG_CACHE_HOME</em>/telepathy/avatars/<em>cm</em>/<em>protocol</em>/<em>escaped-token</em>.mime</li> + </ul> where <em>escaped-token</em> is the avatar <em>token</em> escaped + using telepathy-glib's tp_escape_as_identifier reference implementation.</p> I think we should just say the CM SHOULD cache avatars somewhere in XDG_CACHE_HOME/telepathy/avatars/*cm*, and that for each avatar for which the MIME type is known, it MUST store that in the file with the same name + ".mime" (or equivalently, the same file: URI + ".mime"). The precise details are up to the CM: for instance, if we were to merge Gabble and Wocky into one CM, they could safely mix local-xmpp and jabber avatars into a single directory, because they can't collide anyway, assuming no SHA-1 collisions (they both use SHA-1). If we do (eventually) care about SHA-1 collisions, the best format would be something like escaped JID + SHA-1 (and that would also make it easier to do logic like "purge all but the last 2 avatars for each JID"). Specifying tp_escape_as_identifier in the spec is unnecessarily strict - there's no requirement that these strings are valid C/C++/D-Bus identifiers, and no interop requirement that CMs use the same escaping algorithm. The CM could equally well use the hex SHA-1 directly (practical effect: no need to escape a leading digit), or even use (a modified, URI-safe form of) Base64 to get shorter names. + <tp:struct name="Contact_Avatar"> Do we need the MIME type on D-Bus at all? I don't think we do. If the CM is required to store the MIME type (where known) alongside the image, then we can just have plain avatar URIs in the D-Bus API, and any client that consumes them will read the image from that URI, and the MIME type from that URI + ".mime", successively. I don't like the name "contact avatar" for something that is only a reference to the avatar (specifically, the file URI). I'd prefer Avatar_URI or even Cache_URI (and without the MIME type, it'd become a tp:simple-type). The point of being strict on the cache path is to reuse what tp-glib/tp-qt already uses. But I guess that can be left out of spec and be an implementation detail? + <p>Request avatars for a number of contacts.</p> + + <p><tp:member-ref>AvatarsUpdated</tp:member-ref> signal is emitted when + avatars are retrieved. It is not mandatory to wait for all avatars to + be retrieved before emitting this signal, avatars could be signaled in + multiple batches, or even for each contact separately.</p> Those aren't the semantics we give to "request" everywhere else in tp-spec (e.g. in RequestContactInfo). ContactInfo calls this operation "refresh" (RefreshContactInfo), please do the same here. Please don't imply that AvatarsUpdated is guaranteed to be emitted - for contacts whose avatar we already knew, it shouldn't be emitted if "we were already right about it". I think we should have a special case wherever cache URIs appear: "" means "no avatar that this protocol allows us to see" (as distinct from "I don't know"). Or maybe "file:///dev/null" if you prefer :-) Do we want a RequestAvatar(u: Handle) -> s: Cache_URI, suitable for Contact Information dialogs, which does wait for the round-trip? (Analogous to RequestContactInfo). (In reply to comment #25) > The point of being strict on the cache path is to reuse what tp-glib/tp-qt > already uses. But I guess that can be left out of spec and be an implementation > detail? Yes, widely-used Telepathy 1.0 CMs should initially use the Telepathy 0.x tp-glib/tp-qt cache structure, but that's an implementation detail and doesn't need to be spec'd. New Telepathy 1.0 CMs probably shouldn't use tp_escape_as_identifier - with knowledge of the input that they use, they can almost certainly come up with their own escaping algorithm that is more efficient. (For instance, tp_escape_as_identifier escapes leading digits, which is pointless in a filename - particularly if what you're escaping is a numeric ICQ or QQ UIN). (In reply to comment #24) > (Only commenting on the API right now, not the English.) > I think we should just say the CM SHOULD cache avatars somewhere in > XDG_CACHE_HOME/telepathy/avatars/*cm*, and that for each avatar for which the > MIME type is known, it MUST store that in the file with the same name + ".mime" > (or equivalently, the same file: URI + ".mime"). Good idea, changed the spec like that. > + <tp:struct name="Contact_Avatar"> > > Do we need the MIME type on D-Bus at all? I don't think we do. If the CM is > required to store the MIME type (where known) alongside the image, then we can > just have plain avatar URIs in the D-Bus API, and any client that consumes them > will read the image from that URI, and the MIME type from that URI + ".mime", > successively. Tbh, the MIME is ~useless. So I'm happy to remove it from DBus with that URI suffix trick. Done. > I don't like the name "contact avatar" for something that is only a reference > to the avatar (specifically, the file URI). I'd prefer Avatar_URI or even > Cache_URI (and without the MIME type, it'd become a tp:simple-type). It's now a simple-type "s" and renamed to Avatar_URI (In reply to comment #26) > + <p>Request avatars for a number of contacts.</p> > + > + <p><tp:member-ref>AvatarsUpdated</tp:member-ref> signal is emitted when > + avatars are retrieved. It is not mandatory to wait for all avatars to > + be retrieved before emitting this signal, avatars could be signaled in > + multiple batches, or even for each contact separately.</p> > > Those aren't the semantics we give to "request" everywhere else in tp-spec > (e.g. in RequestContactInfo). ContactInfo calls this operation "refresh" > (RefreshContactInfo), please do the same here. Good point. Renamed. > Please don't imply that AvatarsUpdated is guaranteed to be emitted - for > contacts whose avatar we already knew, it shouldn't be emitted if "we were > already right about it". Done. > I think we should have a special case wherever cache URIs appear: "" means "no > avatar that this protocol allows us to see" (as distinct from "I don't know"). > Or maybe "file:///dev/null" if you prefer :-) "" URI means the contact has no avatar. "I don't know" is represented by contact attribute missing from the GetContactAttributes call. That's already said in the spec. Or do you mean something else? > Do we want a RequestAvatar(u: Handle) -> s: Cache_URI, suitable for Contact > Information dialogs, which does wait for the round-trip? (Analogous to > RequestContactInfo). We could. It can also easily be added later if we actually need it. I think this is good enough. One thing I would like fixed before merging is updating the documentation to make it very very clear when methods are expected to be returning, including the vfuncs in TpAvatarMixin to be implemented. Have you implemented this in gabble? Merged the spec. Branch merged. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.