Bug 26590 - GetFrequentContacts
Summary: GetFrequentContacts
Status: RESOLVED MOVED
Alias: None
Product: Telepathy
Classification: Unclassified
Component: logger (show other bugs)
Version: git master
Hardware: Other All
: low enhancement
Assignee: Telepathy bugs list
QA Contact: Telepathy bugs list
URL: http://git.collabora.co.uk/?p=user/da...
Whiteboard: Need more design
Keywords:
: 26585 (view as bug list)
Depends on:
Blocks:
 
Reported: 2010-02-16 04:09 UTC by Danielle Madeley
Modified: 2019-12-03 19:25 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments
rebased and updated patch (28.27 KB, patch)
2010-10-06 21:46 UTC, Danielle Madeley
Details | Splinter Review

Description Danielle Madeley 2010-02-16 04:09:22 UTC
Return a weighted list of the most frequent contacts.

This is still currently a prototype implementation that reparses the entire log whenever you make a request (not optimal). Need to think about how to cache results for this.

It also requires a signal to announce changes to the most frequent contacts.
Comment 1 Danielle Madeley 2010-02-22 19:54:24 UTC
Ok. I have reimplemented this and GetRecentContacts (#25585) using a new TplLogStore called TplLogStoreCounter which uses sqlite3 to keep a count of the number of messages per day.

http://git.collabora.co.uk/?p=user/danni/telepathy-logger.git;a=shortlog;h=refs/heads/get-frequent-contacts

This branch also includes the patches for #26585 and some work for #26621.
Comment 2 Cosimo Alfarano 2010-04-23 10:34:27 UTC
Danielle can you rebase the branch to the current master's HEAD, please?

I see that there are still some references to log-store-counter, which are quite out of date.

Also several changes have been merged since then, and it might need some manual work when rebasing.

If you don't have time to do it, I might do it next week, just drop me a line.

thanks.
Comment 3 Cosimo Alfarano 2010-04-26 05:27:16 UTC
I rebased it, it was easier that what I thought and not really out of date, I saw a reference to "log-store-counter.h" but it was alone.
Comment 4 Cosimo Alfarano 2010-04-26 09:57:31 UTC
my rebased tree, on which I did my review, is at http://git.collabora.co.uk/?p=user/kalfa/telepathy-logger.git;a=shortlog;h=refs/heads/danni-get-frequent-contacts




970196c97d84814e0f8d269dc79c141a67d17148:

extentions/Logger.xml

+      <arg direction="out" name="Frequent_Contacts" type="a(sd)"
+           tp:type="Frequent_Contact[]">

Simple observation in case it wasn't meant: I usually see Frequent_Contact_List to indicate an array of Frequent_Contact.

in tpl_dbus_service_get_recent_contacts()

+  g_list_foreach (chats, (GFunc) tpl_log_manager_search_hit_free, NULL);

@chats is obtained by tpl_log_manager_get_chats() which might return a quite long list of entries.
Probably it would be better to use
g_list_delete_link() against the list's head, instead.

Maybe it's just me, but I usually prefer that way, even if it doesn't change from being O(n).


e78fa71ad34a34e0db2884ec750a09451a097df3:

+      <arg direction="out" name="Frequent_Contacts" type="a(sd)"
+           tp:type="Frequent_Contact[]">

Same above idea.


in tpl_dbus_service_get_frequent_contacts()


+  g_list_foreach (chats, (GFunc) tpl_log_manager_search_hit_free, NULL);

same comment about using g_list_delete_link(), if you liked the idea.


+      g_value_set_double (value, g_value_get_double (value) / maxfreq);

I'd also write 

 g_value_set_double (value, (g_value_get_double (value) / maxfreq));

which IMO is more readable.

3ea381791c984a3997bd4c3a0169b2979c6c9943:

in tpl_dbus_service_get_recent_contacts() and tpl_dbus_service_get_frequent_contacts():

-      if (hit->is_chatroom && !include_chatrooms)
+      if (!((filter_flags & TPL_IDENTIFIER_FILTER_ROOM && hit->is_chatroom) ||
+            (filter_flags & TPL_IDENTIFIER_FILTER_CONTACT && !hit->is_chatroom)
+           ))
           continue;

Even though it's pretty intuitive what you're doing, I'd use more parenthesis to help the eye to identify single expressions and probably split it into a "if ... else if ..." which IMO is more readable.



f600080726fd7a7d3e4c0948a70c852ada41822b:

telepathy-logger/Makefile.am:


+		$(BUILT_SOURCES)
 		$(NULL)

would probably be

+		$(BUILT_SOURCES) \
 		$(NULL)


27ef375add699f94cec8debc860cb4cd9bbb17c7:

@@ -734,25 +700,17 @@ tpl_dbus_service_get_frequent_contacts (TplSvcLogger *self,
   /* copy from the GList to the GPtrArray */
   for (ptr = contacts; ptr != NULL; ptr = ptr->next)
     {
-      GValueArray *varray = ptr->data;
-      GValue *value = g_value_array_get_nth (varray, 1);
-
-      /* normalise the value on [0, 1] */
-      g_value_set_double (value, g_value_get_double (value) / maxfreq);
-      g_ptr_array_add (array, varray);
+      g_ptr_array_add (array, ptr->data);
     }
 
   g_list_free (contacts);
 
-  tpl_svc_logger_return_from_get_recent_contacts (context, array);
+  tpl_svc_logger_return_from_get_frequent_contacts (context, array);
 

This should be in 27ef375add699f94cec8debc860cb4cd9bbb17c7 and not in a tpl_get_account() related commit.



cb7a6556b854e49645fd915c879fd330b2de5411:


+      <arg name="Frequent_Contacts" type="a(sd)" tp:type="Frequent_Contact[]">
+        <tp:docstring>
+          A list of the contacts whos frequency has changed and their new
+          frequency.
+        </tp:docstring>
+      </arg>

typo, s/whos/whose/


@@ -552,6 +570,35 @@ tpl_log_store_sqlite_add_message_counter (TplLogStore *self,
       goto out;
     }
 
+    {
+      const char *path;
+      TpAccount *tpaccount;
+      double frequency;
+      GError *lerror = NULL;
+
+      path = tpl_log_entry_get_account_path (message);
+      tpaccount = tpl_get_tp_account (path, &lerror);
+
+      if (lerror != NULL)
+        {
+          DEBUG ("Error getting TpAccount: %s", lerror->message);
+          g_error_free (lerror);
+          lerror = NULL;
+        }
+      else
+        {
+          // FIXME: keep a cache of frequencies, the new frequency is just
+          // the old frequency +1 (unless first message after midnight)
+          frequency = tpl_log_store_sqlite_get_frequency (
+              TPL_LOG_STORE_SQLITE (self), tpaccount, identifier);
+
+          g_signal_emit (self, _signals[FREQUENCY_CHANGED], 0,
+              path, identifier, frequency);
+
+          g_object_unref (tpaccount);
+        }
+    }
+
   retval = TRUE;

Is this isolated block intended or just a temporary thing?

You might put it in an "else" branch or create a separate function for it since it's big enough and does exactly one thing.


4d1c28d70f287005574cc0c28a4dccd0ac2a5174:


@@ -693,47 +693,117 @@ tpl_dbus_service_get_frequent_contacts (TplSvcLogger *self,
   GError *error = NULL;
   double maxfreq = 0;
 
-  account = tpl_get_tp_account (account_path, &error);
-  if (account == NULL)
+  if (!tp_strdiff (account_path, "/"))
...
+  else
+    {
+      DEBUG ("Looking up frequent contacts for account %s", account_path);


Isn't it possible to call from the first "if" branch tpl_dbus_service_get_frequent_contacts(account_path) for each valid account?

It would remove a lot of duplicated code.

+          /* create a GValueArray for this contact, and store it in a GList */
+          contacts = g_list_insert_sorted (contacts,
+              tp_value_array_build (3,
+                DBUS_TYPE_G_OBJECT_PATH, tp_proxy_get_object_path (account),
+                G_TYPE_STRING, hit->chat_id,
+                G_TYPE_DOUBLE, freq,
+                G_TYPE_INVALID),
+              cmp_frequent_contacts);


Why not passing directly account_path that you already have (when in "else" branch above)


The rest seems OK, you just need to squash commits after everything is OK.

If you use my rebased branch, be aware that in e78fa71ad34a34e0db2884ec750a09451a097df3

+  TplLogStore *store = tpl_log_store_counter_dup ();

should be changed in tpl_log_store_sqlite_dup().

There are also other occurrences in the module using _counter_ instead of _sqlite_.

Squashing c1232feacceb10a0f9197aed17130076887f6736 into e78fa71ad34a34e0db2884ec750a09451a097df3 would fix it.


970196c97d84814e0f8d269dc79c141a67d17148 also had some references to _counter_ symbols, but rebasing there was a conflict so I fixed it to make it compile.
Comment 5 Cosimo Alfarano 2010-04-28 06:44:22 UTC
Thinking of what FrequentContact does:

GetFrequentContact won't show the frequent ones, but the frequency for all contacts that TPL knows about.

It means two things:

1) if what is wanted is a list of frequent contacts, would be good to add one more parameter to the method: the number of contacts you want (n=10 would return ten contacts, meaning the top-10 among the most frequent contacts).
if this is not the semantic for the methods, probably a rename of the method would be a good idea, something like GetContactsFrequency.

2) the method returns all the contacts that TPL knows about, meaning all the logged contacts. Among them there might be some that are actually not a contact anymore for the user (ie removed contact or removed account).
It might be a good idea either to add a flag "show non-existing contacts" and filter the result according to it, or document somewhere that the client need to filter the list against non-existent contacts, if any.

The same for RecentContacts.

My two cents:

In a scenario with Folks, I like the idea to keep GetContactsFrequency-like API in TPL and let Folks publish information per-individual instead of per-identifier.
Raw information by TPL, refined by Folks.
Comment 6 Travis Reitter 2010-04-28 09:21:57 UTC
(In reply to comment #5)
> Thinking of what FrequentContact does:
> 
> GetFrequentContact won't show the frequent ones, but the frequency for all
> contacts that TPL knows about.
> 
> It means two things:
> 
> 1) if what is wanted is a list of frequent contacts, would be good to add one
> more parameter to the method: the number of contacts you want (n=10 would
> return ten contacts, meaning the top-10 among the most frequent contacts).
> if this is not the semantic for the methods, probably a rename of the method
> would be a good idea, something like GetContactsFrequency.
> 
> 2) the method returns all the contacts that TPL knows about, meaning all the
> logged contacts. Among them there might be some that are actually not a contact
> anymore for the user (ie removed contact or removed account).
> It might be a good idea either to add a flag "show non-existing contacts" and
> filter the result according to it, or document somewhere that the client need
> to filter the list against non-existent contacts, if any.
> 
> The same for RecentContacts.
> 
> My two cents:
> 
> In a scenario with Folks, I like the idea to keep GetContactsFrequency-like API
> in TPL and let Folks publish information per-individual instead of
> per-identifier.
> Raw information by TPL, refined by Folks.

Right, I think the best option here is to have Folks read out the stats from TPL and thread them through the Individuals (which have the sum stats of all their subcontacts). So we'll eventually add API to Folks for looking up contacts by frequency, etc.
Comment 7 Danielle Madeley 2010-10-06 21:46:38 UTC
Created attachment 39247 [details] [review]
rebased and updated patch

This branch has been rebased and updated.

It would be good to get a new review so we can get it merged.

There's also a further WIP to simplify recent conversations (and stop it returning "FIXME" as the most recent message, once this branch is merged.
Comment 8 Guillaume Desmottes 2010-10-07 05:54:58 UTC
Does this code assume that we are using the sqlite store to store messages? Afaik, we are still using the XML one atm.

Including chatrooms in the result of a function called Get_Recent_*Contacts* seems bong to me. What's the use case for this?
Same for GetFrequentContacts

      /* we assume that the AM is ready */
Ideally,  TplDBusService should have a priv->account_mgr and not be registered until it has been prepared so we don't have to check if features are prepared and just assert they are.

_tpl_get_tp_account()
why not use tp_account_manager_ensure_account() so we reuse existing TpAccount?

+  if (priv->sqlite_store)
+    {
+      g_object_unref (priv->sqlite_store);
+      priv->sqlite_store = NULL;
+    }
tp_clear_object() ftw!

You could make use of tp_value_array_unpack()
Comment 9 Danielle Madeley 2010-10-07 06:00:11 UTC
(In reply to comment #8)
> Does this code assume that we are using the sqlite store to store messages?
> Afaik, we are still using the XML one atm.

No, log-store-sqlite is keeping a cache for us. This is actually what I originally wrote it for, and the code to do it got merged some time ago.

> Including chatrooms in the result of a function called Get_Recent_*Contacts*
> seems bong to me. What's the use case for this?
> Same for GetFrequentContacts

Suggestions for new names?

>       /* we assume that the AM is ready */
> Ideally,  TplDBusService should have a priv->account_mgr and not be registered
> until it has been prepared so we don't have to check if features are prepared
> and just assert they are.

Should I take out the check then?

> _tpl_get_tp_account()
> why not use tp_account_manager_ensure_account() so we reuse existing TpAccount?

If someone passes a made-up account path it will create that object on the bus? I want it to error instead.
Comment 10 Guillaume Desmottes 2010-10-07 06:05:44 UTC
(In reply to comment #9)
> > Including chatrooms in the result of a function called Get_Recent_*Contacts*
> > seems bong to me. What's the use case for this?
> > Same for GetFrequentContacts
> 
> Suggestions for new names?

Do we really need chat rooms tbh? If we do I'd be tempted we should either have a GetFrequentChatrooms() or express the API in term of TpHandleType.

> >       /* we assume that the AM is ready */
> > Ideally,  TplDBusService should have a priv->account_mgr and not be registered
> > until it has been prepared so we don't have to check if features are prepared
> > and just assert they are.
> 
> Should I take out the check then?

Well, if you do the suggested refactoring, yes :)

> > _tpl_get_tp_account()
> > why not use tp_account_manager_ensure_account() so we reuse existing TpAccount?
> 
> If someone passes a made-up account path it will create that object on the bus?
> I want it to error instead.

Humm, maybe we should have a tp_account_manager_get_account() ?
Comment 11 Danielle Madeley 2010-10-07 14:58:05 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > > Including chatrooms in the result of a function called Get_Recent_*Contacts*
> > > seems bong to me. What's the use case for this?
> > > Same for GetFrequentContacts
> > 
> > Suggestions for new names?
> 
> Do we really need chat rooms tbh? If we do I'd be tempted we should either have
> a GetFrequentChatrooms() or express the API in term of TpHandleType.

I don't see why you wouldn't include them in the same API. TpHandleTypes can't be used as a bitwise mask.
Comment 12 Danielle Madeley 2010-10-07 19:35:35 UTC
(In reply to comment #8)

> +  if (priv->sqlite_store)
> +    {
> +      g_object_unref (priv->sqlite_store);
> +      priv->sqlite_store = NULL;
> +    }
> tp_clear_object() ftw!

Fixed.

> You could make use of tp_value_array_unpack()

Where? The only new calls to g_value_array_get_nth() are specifically retrieving the nth value of multiple structures.
Comment 13 Guillaume Desmottes 2010-10-11 01:46:14 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > (In reply to comment #9)
> > > > Including chatrooms in the result of a function called Get_Recent_*Contacts*
> > > > seems bong to me. What's the use case for this?
> > > > Same for GetFrequentContacts
> > > 
> > > Suggestions for new names?
> > 
> > Do we really need chat rooms tbh? If we do I'd be tempted we should either have
> > a GetFrequentChatrooms() or express the API in term of TpHandleType.
> 
> I don't see why you wouldn't include them in the same API. TpHandleTypes can't
> be used as a bitwise mask.

I don't like this API. Get{Frequent,Recent}*Contacts* should only return contacts or be renamed. Furthermore, the semantic of being frequent/recent is not clear. Does it involve only chat or also call (once we'll suport them)?

(In reply to comment #12)
> > You could make use of tp_value_array_unpack()
> 
> Where? The only new calls to g_value_array_get_nth() are specifically
> retrieving the nth value of multiple structures.

You're right, sorry.
Comment 14 Danielle Madeley 2010-10-11 02:57:17 UTC
(In reply to comment #13)

> I don't like this API. Get{Frequent,Recent}*Contacts* should only return
> contacts or be renamed. Furthermore, the semantic of being frequent/recent is
> not clear. Does it involve only chat or also call (once we'll suport them)?

What should we rename it to?

Perhaps we should also include channel class.
Comment 15 Simon McVittie 2010-10-11 04:05:44 UTC
11:00 < smcv> danni: when you say Identifier do you implicitly mean the 
              identifier of a *contact* (and not a chatroom or a bee or 
              something)? I think it'd help clarity if you said so
11:01 < danni> smcv: it could also be a chatroom
11:01 < danni> which cassidy dislikes
11:01 < smcv> that's not GetFrequentContacts, that's GetFrequentEntities or 
              something

11:04 < smcv> how does the logger know how many contacts to return? is it a 
              hard-coded number or frequency threshold?
11:05 < smcv> that sounds uncomfortably like "if Empathy's UI design wants to 
              show more than Meego's, one of them has to patch the logger"
11:05 < danni> smcv: it actually returns all of them
11:05 < smcv> is that a guarantee?
11:06 < smcv> that doesn't sound as though it scales tbh

11:08 < danni> smcv: the idea was to be able to build a recency based contact 
               roster
11:08 < smcv> danni: right, for that you want "the rankings among the N most 
              recent have changed"
11:09 < smcv> danni: or "the contacts with frequency >= 0.6 have changed"
11:10 < danni> smcv: where it gets messy is when all the frequencies change 
               once a day
11:11 < danni> I'd forgotten what a hack that signal was :-/
11:11 < smcv> couldn't the UI just poll every n hours? :-)
11:12 < danni> smcv: yeah, certainly
11:12 < smcv> or the signal could have no actual content, just "poll now"
11:12 < danni> doesn't that have the scaling problem still?
11:12 < danni> the signal could include the frequency of just the changing 
               contact
11:13 < smcv> frequency and rank order?
11:13 < smcv> perhaps?
11:14 < danni> actually, I think the signal is only meant to give entities whos 
               frequency has changed
11:15 < danni> it's just an a(sd) in case there's more than one entity

As far as I can tell, the idea is to be able to display the N most frequent or most recent contacts (where N ought to be chosen by the UI rather than the logger), or to be able to display contacts whose frequency/"recency" is above a threshold (which should perhaps also be chosen by the UI, but the UI can only usefully do that if it knows what the numbers mean).

If you're not sure what the semantics are yet as regards little details like multiple-entity signals, whether an entity is a contact or a (contact|room), etc., then writing them down would probably help :-)

I think having a bitmask for the contact/room/both choice seems like premature optimization. If you want it to be in terms of handle types, I'd just have an 'au', Handle_Type[]; if the first thing the implementation does is to convert that into a non-API bitfield, then that's fine.

For choices between contacts and (contacts|rooms|both), or between IMs, "abstract events" and (IMs|calls|both|...), I really can't say anything without a use case. What information is it that you actually want?

I can imagine wanting to have some global idea of frequency, in which an IM is worth one point and a call is worth N points, or something.

I can also imagine wanting to have a call UI list the most frequently called contacts, have a text UI list the most frequently messaged contacts (not the same, at least on my phone), and have the address book list a mixture of the two...

I'd tend to err on the side of the logger providing data, and UIs choosing their interpretation, particularly if in practice we want to pipe this information through Folks for per-metacontact aggregation before the UIs interpret it?

You can't really measure chatrooms on the same scale as contacts (even a not-very-noisy room like #telepathy can easily log more messages than a long 1-1 conversation) and I don't think the user's interaction with chatrooms is the same either, so it doesn't necessarily make sense to force contacts and rooms into one API if they don't actually want the same semantics.
Comment 16 Nicolas Dufresne 2011-01-13 14:10:16 UTC
*** Bug 26585 has been marked as a duplicate of this bug. ***
Comment 17 GitLab Migration User 2019-12-03 19:25:22 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/telepathy/telepathy-logger/issues/1.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.