Bug 35882

Summary: Different event type should be logged in different files
Product: Telepathy Reporter: Nicolas Dufresne <nicolas>
Component: loggerAssignee: Nicolas Dufresne <nicolas>
Status: RESOLVED FIXED QA Contact: Telepathy bugs list <telepathy-bugs>
Severity: enhancement    
Priority: medium CC: david.laban, pochu27
Version: git master   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Bug Depends on:    
Bug Blocks: 35549, 36626    

Description Nicolas Dufresne 2011-04-01 10:17:19 UTC
I think logging different event type in different XML file would give more performance and simpler code. An example, _exist() could keep a simple implementation doing a "if_file_exist", search scope would be reduced and XML parsing would be done on smaller XML files.
Comment 1 Nicolas Dufresne 2011-05-04 09:32:47 UTC
Here's my implementation, please review.

http://cgit.collabora.co.uk/git/user/nicolas/telepathy-logger.git/log/?h=split-event
Comment 2 David Laban 2011-05-05 09:19:42 UTC
http://cgit.collabora.co.uk/git/user/nicolas/telepathy-logger.git/commit/?h=split-event&id=14314bd21d31b8502e85111a6dabc6fa09a2f56a
-log_store_xml_get_timestamp_filename (void)
+log_store_xml_get_timestamp_filename (GType type)

would you object to 
+log_store_xml_get_timestamp_filename (TplEvent *event)
(which looks at the timestamp of the event as well as its type) or 
+log_store_xml_get_timestamp_filename (GType type, gint64 timestamp) ?
I think I might need this for logging message-edits.


http://cgit.collabora.co.uk/git/user/nicolas/telepathy-logger.git/commit/?h=split-event&id=c22ca3933c5ef47a19682e7809c03469b8018a1d

Have we released a logger which puts call events in with text events? If so, get_dates(CALL) will miss some days for some users. If we haven't made any such release then we're safe.
(similarly with http://cgit.collabora.co.uk/git/user/nicolas/telepathy-logger.git/commit/?h=split-event&id=1ce4c111999aa5dfae42f4b3a326fca5c2678ee7 )


http://cgit.collabora.co.uk/git/user/nicolas/telepathy-logger.git/commit/?h=split-event&id=429771478868a128f9a8b1be7d77c872b5fb8021
kudos for exposing GQueues in the api (they're much nicer to work with) but:

- g_queue_push_tail (&events, event);
+ {
+ while (index != NULL &&
+ tpl_event_get_timestamp (event) <
+ tpl_event_get_timestamp (TPL_EVENT (index->data)))
+ index = g_list_next (index);
+
+ if (index != NULL)
+ {
+ g_queue_insert_after (events, index, event);
+ index = g_list_next (index);
+ }
+ else
+ {
+ g_queue_push_tail (events, event);
+ index = events->head;
+ }
+
+ num_events++;
+ }
Do you mean g_queue_insert_sorted() or something else?
Alternatively, could you just throw away most of this code and just g_queue_sort() at the end?

+ else if (type_mask & TPL_EVENT_MASK_CALL)
do you mean:
+
+ if (type_mask & TPL_EVENT_MASK_CALL)

The poet in me is pining for g_queue_concat (grand_list, list_to_swallow (transfer full)) (=== g_list_concat (grand_list->tail, queue_to_swallow->head); 
grand_list->tail = list_to_swallow->tail; 
grand_list->length += queue_to_swallow->length; 
*queue_to_swallow = G_QUEUE_INIT; 
g_queue_free(queue_to_swallow);) and/or g_queue_merge_sorted (grand_list, list_to_swallow) so that we don't have to pass in a mutable queue.

I can't quite see how these changes will make opening text chats any faster, but I can see how they would make extracting a call log faster (assuming that we haven't made any releases with the mixed log format, in which case it would break extracting a call log)
Comment 3 David Laban 2011-05-05 09:37:58 UTC
> I can't quite see how these changes will make opening text chats any faster,
> but I can see how they would make extracting a call log faster (assuming that
> we haven't made any releases with the mixed log format, in which case it would
> break extracting a call log)

Just read #36626  and I think I like cassidy's suggestion best (though s/older/newer/)
> Do we really need to parse each files? Shouldn't we start from the older files
> and stop as soon as we found enough messages?

If we decide to go for your solution then (for backwards compatibility with existing mixed call logs) I think we need to rename new chat logs to .chat.log and make get_dates(CALL) match .call.log and .log

Also, please test get_dates(EVERYTHING) before you merge.
Comment 4 Emilio Pozuelo Monfort 2011-05-05 10:37:14 UTC
(In reply to comment #2)
> Have we released a logger which puts call events in with text events? If so,
> get_dates(CALL) will miss some days for some users. If we haven't made any such
> release then we're safe.

We have, but you need to build the logger with --enable-call, which I don't think any distribution does yet, so I think we can safely change this for now.

> I can't quite see how these changes will make opening text chats any faster,
> but I can see how they would make extracting a call log faster (assuming that
> we haven't made any releases with the mixed log format, in which case it would
> break extracting a call log)

Opening text chats isn't the only use-case that will benefit from this. tpl_log_manager_exists() will improve from O(n) to O(1), since with this you don't need to parse the file but only check that it exists.
Comment 5 Nicolas Dufresne 2011-05-05 11:06:47 UTC
(In reply to comment #2)
> http://cgit.collabora.co.uk/git/user/nicolas/telepathy-logger.git/commit/?h=split-event&id=14314bd21d31b8502e85111a6dabc6fa09a2f56a
> -log_store_xml_get_timestamp_filename (void)
> +log_store_xml_get_timestamp_filename (GType type)
> 
> would you object to 
> +log_store_xml_get_timestamp_filename (TplEvent *event)
> (which looks at the timestamp of the event as well as its type) or 
> +log_store_xml_get_timestamp_filename (GType type, gint64 timestamp) ?
> I think I might need this for logging message-edits.
> 

Yes, you probably need that as we want timestamp to come from event for text edit (guessing you may want edits to be stored in the same file as the original message), I just didn't want to change implementation here as it's unrelated to this work.

> 
> http://cgit.collabora.co.uk/git/user/nicolas/telepathy-logger.git/commit/?h=split-event&id=c22ca3933c5ef47a19682e7809c03469b8018a1d
> 
> Have we released a logger which puts call events in with text events? If so,
> get_dates(CALL) will miss some days for some users. If we haven't made any such
> release then we're safe.
> (similarly with
> http://cgit.collabora.co.uk/git/user/nicolas/telepathy-logger.git/commit/?h=split-event&id=1ce4c111999aa5dfae42f4b3a326fca5c2678ee7
> )

The answer is yes, but only as "Experimental Call Support". No call logs reader have been released yet and no distro I know of enable this experimental option. But if they do, the effect is that old logs will just not show, no big deal imho.

> 
> 
> http://cgit.collabora.co.uk/git/user/nicolas/telepathy-logger.git/commit/?h=split-event&id=429771478868a128f9a8b1be7d77c872b5fb8021
> kudos for exposing GQueues in the api (they're much nicer to work with) but:

Could be an option for 0.3 ...

> 
> - g_queue_push_tail (&events, event);
> + {
> + while (index != NULL &&
> + tpl_event_get_timestamp (event) <
> + tpl_event_get_timestamp (TPL_EVENT (index->data)))
> + index = g_list_next (index);
> +
> + if (index != NULL)
> + {
> + g_queue_insert_after (events, index, event);
> + index = g_list_next (index);
> + }
> + else
> + {
> + g_queue_push_tail (events, event);
> + index = events->head;
> + }
> +
> + num_events++;
> + }
> Do you mean g_queue_insert_sorted() or something else?
> Alternatively, could you just throw away most of this code and just
> g_queue_sort() at the end?

g_queue_insert_sorted() would be much slower, since it would linearly lookup from start at every insert. Also, we could not control the way it is inserted, which may cause logs of same timestamp to not follow log order.

In this implementation, I iterate only once on the queue I'm merging in, skipping the element I just added, to preserve log order.

> 
> + else if (type_mask & TPL_EVENT_MASK_CALL)
> do you mean:
> +
> + if (type_mask & TPL_EVENT_MASK_CALL)

Oops, good catch ! I'll try and cover that in the unit tests.

> 
> The poet in me is pining for g_queue_concat (grand_list, list_to_swallow
> (transfer full)) (=== g_list_concat (grand_list->tail, queue_to_swallow->head); 
> grand_list->tail = list_to_swallow->tail; 
> grand_list->length += queue_to_swallow->length; 
> *queue_to_swallow = G_QUEUE_INIT; 
> g_queue_free(queue_to_swallow);) and/or g_queue_merge_sorted (grand_list,
> list_to_swallow) so that we don't have to pass in a mutable queue.

The list would not be ordered correctly, we need to merge it, not concatenate it.

> 
> I can't quite see how these changes will make opening text chats any faster,
> but I can see how they would make extracting a call log faster (assuming that
> we haven't made any releases with the mixed log format, in which case it would
> break extracting a call log)

log_store_xml_get_dates() has a big performance gain as it no longer have to look inside the file to figure-out if a date contains texts (or calls) events. The implementation of log_store_xml_get_filtered_events() (use to get inital 5 messages in Empathy) uses first call _get_dates() and iterates over those dates until the amount of required events is reached. We can clearly optimize more this case instead of doing all this split, but it would not allow implementing _exists() correctly (currently broken for performance reason) and would not optimize _get_dates() at the same time.
Comment 6 Nicolas Dufresne 2011-05-05 12:01:55 UTC
Updated with the mentioned fix and test that covers log_store_xml_get_events_for_date().

Same link, two last commits (will squash second when review is done):
http://cgit.collabora.co.uk/git/user/nicolas/telepathy-logger.git/log/?h=split-event
Comment 7 Nicolas Dufresne 2011-05-05 12:36:37 UTC
Merge upstream after David's approval on IRC.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.