Bug 27271

Summary: Cleanup TP Logger API
Product: Telepathy Reporter: Simon McVittie <smcv>
Component: loggerAssignee: Guillaume Desmottes <guillaume.desmottes>
Status: RESOLVED FIXED QA Contact: Telepathy bugs list <telepathy-bugs>
Severity: normal    
Priority: medium CC: cosimo.alfarano, guillaume.desmottes, pochu27
Version: git master   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Bug Depends on: 27642, 27772, 27773, 32477, 32902, 33033, 33034, 33212, 33213, 33252, 33546, 33993, 34143, 34145, 34349    
Bug Blocks:    

Description Simon McVittie 2010-03-23 11:51:59 UTC
Placeholder bug for C API review.

In Bug #27178, Cosimo writes:

> log-entry.c/log-entry-text.c
> 
> the naming follows the ChannelType.
> 
> There are several methods duplicated, my intention is to drop most of them and
> keep only
> tpl_log_entry_<method> for what is implemented in TplLogEntry and
> tpl_log_entry_text_<method> for what is implemented in TplLogEntryText
> 
> dropping things similar to:
> 
> const gchar* (*get_log_id) (TplLogEntry *self);
> 
> in TplLogEntryClass and the related
> 
> const gchar *tpl_log_entry_text_get_log_id (TplLogEntryText *self);
> 
> which create only confusion and keeping only
> 
> const gchar* tpl_log_entry_get_log_id (TplLogEntry *self);
> 
> Unless in tp-glib is done differently.
Comment 1 Simon McVittie 2010-03-23 11:57:00 UTC
*** Bug 27176 has been marked as a duplicate of this bug. ***
Comment 2 Simon McVittie 2010-03-23 12:44:24 UTC
Only a partial review so far...

In TplLogEntryClass:

>  void (*dispose) (GObject *obj);
>  void (*finalize) (GObject *obj);

These seem unused... which is good, because they make no sense.

I'm not sure that the rest of the virtual methods (apart from equal()) make sense either: why would a subclass ever want to override the generic get_timestamp method? The timestamp hopefully isn't going to wander off into a different database table for certain events?

(In reply to comment #0)
> In Bug #27178, Cosimo writes:
> > There are several methods duplicated, my intention is to drop most of them and
> > keep only
> > tpl_log_entry_<method> for what is implemented in TplLogEntry and
> > tpl_log_entry_text_<method> for what is implemented in TplLogEntryText
> > 
> > dropping things similar to:
> > 
> > const gchar* (*get_log_id) (TplLogEntry *self);
> > 
> > in TplLogEntryClass and the related
> > 
> > const gchar *tpl_log_entry_text_get_log_id (TplLogEntryText *self);
> > 
> > which create only confusion and keeping only
> > 
> > const gchar* tpl_log_entry_get_log_id (TplLogEntry *self);

Yes, please do (or I could, or whatever).

There are basically three useful patterns for a GLib method:

* non-virtual (C++: no special keyword, Java: final)

  The base class has a method. Subclasses can't override it.

  GLib example: tp_proxy_get_object_path() (note that there
  is no tp_channel_get_object_path() - that wouldn't be
  useful)

* virtual (C++: virtual, Java: no special keyword)

  The base class has a method which just calls a function
  pointer in the class struct. The base class also has a
  default implementation in *its* class struct, usually.

  GInterfaces often have these; base classes can have
  them too.

  GLib example: g_initable_init() just calls
  the object's implementation of GInitableIface.init.

* "virtual and protected" (analogous to C++ virtual protected)?

  The base class struct has a function pointer in it
  which is called in order to to implement some functionality.
  It doesn't correspond 1:1 to any public method.

  GLib example: tp_base_connection_change_status() is a
  non-virtual method which calls private function pointers
  TpBaseConnection.connecting, TpBaseConnection.shut_down
  etc.

I'm not sure which one TplLogEntry[Text] is trying to use, tbh...
Comment 3 Simon McVittie 2010-04-19 12:08:27 UTC
A few more comments:

Will code outside the library need to subclass TplLogEntry? If not, we could move TplLogEntryClass to a log-entry-internal.h, reducing our API surface area.

> #define TPL_LOG_ENTRY_MSG_ID_IS_VALID(msg) (msg >= 0)
> #define TPL_LOG_ENTRY_MSG_ID_UNKNOWN -2
> #define TPL_LOG_ENTRY_MSG_ID_ACKNOWLEDGED -1

I don't like this overloading of message IDs. I think the log entry should have a separate flag for whether the message has been acknowledged, and perhaps also a flag for whether the message ID is meaningful.

> gint tpl_log_entry_get_pending_msg_id (TplLogEntry *self);

Will API users ever care about this? It'd be good if this could be internal to the logger and the library.

> gboolean tpl_log_entry_is_pending (TplLogEntry *self);

This only seems to make sense for Text messages? Perhaps it should only exist in the subclass?

> const gchar *tpl_log_entry_get_channel_path (TplLogEntry *self);

What's this for? Does anything use it? Channel paths aren't unique or meaningful over time, so it seems unwise.
Comment 4 Cosimo Alfarano 2010-04-20 07:22:27 UTC
> Will code outside the library need to subclass TplLogEntry? If not, 
> we could move TplLogEntryClass to a log-entry-internal.h, reducing our
> API surface area.

No, subclassing is just for who wants to extend the logger, so only within the logger until TPL will support plug-ins

I'll move the class to -private.

> #define TPL_LOG_ENTRY_MSG_ID_IS_VALID(msg) (msg >= 0)
> #define TPL_LOG_ENTRY_MSG_ID_UNKNOWN -2
> #define TPL_LOG_ENTRY_MSG_ID_ACKNOWLEDGED -1

> I don't like this overloading of message IDs. I think the log entry should 
> have a separate flag for whether the message has been acknowledged, 
> and perhaps also a flag for whether the message ID is meaningful.

the pending_msg_id bits are used only to be able to understand what, in the pending message list, has been already logged, at TPL start up.

Yeah, it might confuse people when using such values in TPL and with different semantic in tp-glib.

I'll add another method with a flag.


> gint tpl_log_entry_get_pending_msg_id (TplLogEntry *self);
>> Will API users ever care about this? It'd be good if this could be internal
>> to the logger and the library.

OK.

> gboolean tpl_log_entry_is_pending (TplLogEntry *self);
>> This only seems to make sense for Text messages? Perhaps it should 
>> only exist in the subclass?

ok.

> const gchar *tpl_log_entry_get_channel_path (TplLogEntry *self);
>> What's this for? Does anything use it? Channel paths aren't unique 
>>or meaningful over time, so it seems unwise.

I understand your point but I needed it. I think you suggested a similar approach in the past, but if you have a way to remove it, I'd be glad to get rid of lots of methods carrying transient of misleading data.

This method would only be internal.

I might use a separate object/struct and add tpl_log_store_text_set_transient_data(), or similar, which might include both pending msg id related data and the channel path.


The method is used by TplLogStoreSqlite to retrieve all the pending messages for a channel and set by TplChannelText.

When I instantiate TplChannelText I prepare it with the following steps (among other things):
- prepare the object's parent
- hook the signal
- check the pending messages need to be logged

Especially I need to know if any message in the pending msg list has already been logged.

The only way to do it, not having a unique, trustable and always present message id (at the time), was to create a tpl-log-id as result of a hash function
tpl-log-id(channel_path, timestamp, pending_msg_id).

This function returns a hash which accomplishes uniqueness, trustfulness and it's always present since it's TPL creating it (I might change it in the future, after the tp-specs change about it). 

What I do, once obtained the tpl-log-id is storing it in a SQLite table, together with its channel path, so that on channel Created notification, I have a list of tpl-log-id for it to check.

The worse case happens when the created channel path matches one store in SQL without being the same channel (ie re-utilised), in this case no tpl-log-id would match so its just a useless comparison.

It's just a way to filter out data that are not interesting for me.

If I do not check against the couple (channel-path, tpl-log-id) but just look for tpl-log-id, I'd retrieve much more log-ids (related to any channel), slowing down the TplChannelText preparation.


For instance:

Pending message list: 1 2 3
SQLITE: log-id3 log-id4 log-id5
related to the same channel path.

I compute a tpl-log-id from the pending messages ids obtaining 
log-id1 log-id2 log-id3. -> I have to log only log-id1 and log-id2.


Pending message list: 1 2 3 -> log-id1 log-id2 log-id3
SQLITE: log-id4 log-id5

In this scenario I have nothing to log, I don't know if it's because I already logged them all or because it's because they refer actually to two completely different channels and I'm not interested in knowing it at this point.
Comment 5 Cosimo Alfarano 2010-04-23 04:14:29 UTC
I am waiting for some pending branches to be reviewed before starting working on any API change. I'd like the pending queue for TPL reviews is near to zero lenght.

Then I'll be able to change the API freely.
Comment 6 Cosimo Alfarano 2010-04-28 10:14:18 UTC
This branch should still wait for some other branch to be merged.


> Will code outside the library need to subclass TplLogEntry? If not, 
> > we could move TplLogEntryClass to a log-entry-internal.h, reducing our
> > API surface area.
> 
> No, subclassing is just for who wants to extend the logger, so only within the
> logger until TPL will support plug-ins
> 
> I'll move the class to -private.

Moved.

> > #define TPL_LOG_ENTRY_MSG_ID_IS_VALID(msg) (msg >= 0)
> > #define TPL_LOG_ENTRY_MSG_ID_UNKNOWN -2
> > #define TPL_LOG_ENTRY_MSG_ID_ACKNOWLEDGED -1
> 
> > I don't like this overloading of message IDs. I think the log entry should 
> > have a separate flag for whether the message has been acknowledged, 
> > and perhaps also a flag for whether the message ID is meaningful.
> 
> the pending_msg_id bits are used only to be able to understand what, in the
> pending message list, has been already logged, at TPL start up.
> 
> Yeah, it might confuse people when using such values in TPL and with different
> semantic in tp-glib.
> 
> I'll add another method with a flag.

Added _pending_status() methods instead.

Now before getting the pending message id a caller should always check if the pending status for the log entry is set to PENDING, otherwise the msg_id is meaningless.

This is the reason for which I extended the msg_id possible values to negative, even if originally they were unsigned, so I could exclusively set the msg_id (zero-postive) or the alternative meta-value for it (negative).

I'd like there were a possible value for msg_ids that means not-set.

tpl_log_entry_text_set_pending_msg_id() returns 0 on errors, which is actually OK, since it can happen only on wrong instance of @self with g_return_*

But I'd like to avoid that someone use _get_pending_msg_id() considering its value as a valid id, when actually the _get_pending_status() is not PENDING.

It seems that there is nothing I can do about it, but document it.

In some cases I needed to set a improbable value for the msg-id, for example in
log-store-xml.c:829
channel-text.c:763 and 1179
log-entry-text.c:263 and 365
and some other places.

I don't feel really satisfied with it, though.


> > gint tpl_log_entry_get_pending_msg_id (TplLogEntry *self);
> >> Will API users ever care about this? It'd be good if this could be internal
> >> to the logger and the library.
> 
> OK.

Actually Empathy uses it. I think that it makes a wrong use of pending messages and it can be replaced, I'll need to check Empathy for it and ask the E. team.

In the meantime I flag them as possible-internal methods.


> > gboolean tpl_log_entry_is_pending (TplLogEntry *self);
> >> This only seems to make sense for Text messages? Perhaps it should 
> >> only exist in the subclass?
> 
> ok.

superseded by tpl_log_entry_text_get_pending_status(), as above.

all the methods about pending msg ids are not in TplLogEntryText, since for what I understand there the concept of "pending/ack'd" is not a generic concept but just some channels have it.
 
> > const gchar *tpl_log_entry_get_channel_path (TplLogEntry *self);
> >> What's this for? Does anything use it? Channel paths aren't unique 
> >>or meaningful over time, so it seems unwise.

> I understand your point but I needed it. I think you suggested a similar
> approach in the past, but if you have a way to remove it, I'd be glad to get
> rid of lots of methods carrying transient of misleading data.
> 
> This method would only be internal.

I moved _{s,g}et_channel_path and _get_account_path to -internal.h

This way no public misuse of paths can be done.

If you have a better way to handle the situation, I'd be glad to adopt it.
Comment 7 Guillaume Desmottes 2010-05-26 07:56:37 UTC
Empathy uses the following API:

tpl_log_entry_get_account_path ()
tpl_log_entry_get_timestamp ()

tpl_log_entry_text_get_log_id ()
tpl_log_entry_text_get_message ()
tpl_log_entry_text_get_receiver ()
tpl_log_entry_text_get_sender ()

I made a branch keeping only those API public:
http://git.collabora.co.uk/?p=user/cassidy/telepathy-logger;a=shortlog;h=refs/heads/log-entry-api-27271
Comment 8 Guillaume Desmottes 2010-05-26 08:06:13 UTC
(In reply to comment #7)
> I made a branch keeping only those API public:
> http://git.collabora.co.uk/?p=user/cassidy/telepathy-logger;a=shortlog;h=refs/heads/log-entry-api-27271

merged
Comment 9 Guillaume Desmottes 2010-05-26 08:34:50 UTC
Remove API duplication and useless virtual methods:
http://git.collabora.co.uk/?p=user/cassidy/telepathy-logger;a=shortlog;h=refs/heads/log-entry-api-27271
Comment 10 Simon McVittie 2010-05-26 08:38:15 UTC
(In reply to comment #9)
> Remove API duplication and useless virtual methods:
> http://git.collabora.co.uk/?p=user/cassidy/telepathy-logger;a=shortlog;h=refs/heads/log-entry-api-27271

Ship it.

Also, it might be nice to make the TplLogEntryText methods take a TplLogEntry *, like the way many Gtk methods take a GtkWidget?
Comment 11 Guillaume Desmottes 2010-05-27 02:22:07 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > Remove API duplication and useless virtual methods:
> > http://git.collabora.co.uk/?p=user/cassidy/telepathy-logger;a=shortlog;h=refs/heads/log-entry-api-27271
> 
> Ship it.

merged.

> Also, it might be nice to make the TplLogEntryText methods take a TplLogEntry
> *, like the way many Gtk methods take a GtkWidget?

Don't know, I don't have a strong opinion on this.
Comment 12 Simon McVittie 2010-05-28 06:59:05 UTC
There might be implementations of some of these in:

http://git.collabora.co.uk/?p=user/kalfa/telepathy-logger.git;a=shortlog;h=refs/heads/log-entry-api-review

but it'll probably need significant merging.

> TplContact * tpl_log_entry_get_sender (TplLogEntry *self);
> TplContact * tpl_log_entry_get_receiver (TplLogEntry *self);

These are Bug #27269, and are out of scope here.

>   param_spec = g_param_spec_uint ("signal-type",
>       "SignalType",
>       "The signal type which caused the log entry",

This is public API despite not being in the header, because it's a property. The enum that lets you interpret it is not public API. Am I right in thinking that Empathy silently ignores every TplLogEntry that isn't a TplLogEntryText? That seems less than ideal, to say the least!

> gint tpl_log_entry_get_pending_msg_id (TplLogEntry *self);

This is specific to Text and should be in the subclass; KA's branch made this change.

See Bug #26838 for some thoughts on safety (or lack of) of using pending message IDs to try to construct unique identifiers for messages.

What does Empathy use this for?

This is a writeable property, which means library users can change it arbitrarily - that seems dangerous. Make it read-only, with a library-internal setter function, instead. timestamp, signal-type, chat-id, channel-path, sender and receiver have the same bug, with the same solution.

>   /**
>    * TplLogEntry::log-id:
>    *
>    * A token which can be trusted as unique over time within TPL.

Not really :-P See Bug #26838.

>   param_spec = g_param_spec_uint ("direction",
>       "Direction",
>       "The direction of the log entry (in/out)",

Derivable from whether the sender is self. This is public API because it's a property, but it's not useful because the enum isn't public API.

> const gchar *tpl_log_entry_text_get_message (TplLogEntryText *self);

This isn't even sufficient for the old Text interface, which also carried a TpChannelTextMessageType and TpChannelTextMessageFlags. At the very least, Empathy needs to be able to ignore "empty" messages with the Non_Text_Content flag.
Comment 13 Simon McVittie 2010-05-28 07:03:16 UTC
Another trivial thing: I wonder if this class should be called TplEntry, since the entire library is about logs :-)
Comment 14 Guillaume Desmottes 2010-06-07 07:43:16 UTC
http://git.collabora.co.uk/?p=user/cassidy/telepathy-logger;a=shortlog;h=refs/heads/entry-27271

(In reply to comment #13)
> Another trivial thing: I wonder if this class should be called TplEntry, since
> the entire library is about logs :-)

renamed.

> >   param_spec = g_param_spec_uint ("signal-type",
> >       "SignalType",
> >       "The signal type which caused the log entry",
> 
> This is public API despite not being in the header, because it's a property.
> The enum that lets you interpret it is not public API. Am I right in thinking
> that Empathy silently ignores every TplLogEntry that isn't a TplLogEntryText?
> That seems less than ideal, to say the least!

I removed this property as it's only used by the logger and not Empathy.

> > gint tpl_log_entry_get_pending_msg_id (TplLogEntry *self);
> 
> This is specific to Text and should be in the subclass; KA's branch made this
> change.
> 
> See Bug #26838 for some thoughts on safety (or lack of) of using pending
> message IDs to try to construct unique identifiers for messages.
> 
> What does Empathy use this for?

Empathy creates an EmpathyMessage from the log entry and uses this as message id:

	empathy_message_set_id (retval,
			tpl_log_entry_get_pending_msg_id (logentry));


> This is a writeable property, which means library users can change it
> arbitrarily - that seems dangerous. Make it read-only, with a library-internal
> setter function, instead. timestamp, signal-type, chat-id, channel-path, sender
> and receiver have the same bug, with the same solution.
> 
> >   /**
> >    * TplLogEntry::log-id:
> >    *
> >    * A token which can be trusted as unique over time within TPL.
> 
> Not really :-P See Bug #26838.
> 
> >   param_spec = g_param_spec_uint ("direction",
> >       "Direction",
> >       "The direction of the log entry (in/out)",
> 
> Derivable from whether the sender is self. This is public API because it's a
> property, but it's not useful because the enum isn't public API.

Moved TplEntryDirection to the API.

> > const gchar *tpl_log_entry_text_get_message (TplLogEntryText *self);
> 
> This isn't even sufficient for the old Text interface, which also carried a
> TpChannelTextMessageType and TpChannelTextMessageFlags. At the very least,
> Empathy needs to be able to ignore "empty" messages with the Non_Text_Content
> flag.
Comment 15 Guillaume Desmottes 2010-06-09 06:22:05 UTC
(In reply to comment #12)
> > gint tpl_log_entry_get_pending_msg_id (TplLogEntry *self);
> 
> This is specific to Text and should be in the subclass; 

done.

> This is a writeable property, which means library users can change it
> arbitrarily - that seems dangerous. Make it read-only, with a library-internal
> setter function, instead. timestamp, signal-type, chat-id, channel-path, sender
> and receiver have the same bug, with the same solution.

I made all the properties construct-only (so they can still be set during construction) which is fine as we always use internal setters.

> >   /**
> >    * TplLogEntry::log-id:
> >    *
> >    * A token which can be trusted as unique over time within TPL.
> 
> Not really :-P See Bug #26838.

Changing things requier more design and thoughts. What about removing this guarantee for now? In practice Empathy doesn't rely on it anyway.

> > const gchar *tpl_log_entry_text_get_message (TplLogEntryText *self);
> 
> This isn't even sufficient for the old Text interface, which also carried a
> TpChannelTextMessageType and TpChannelTextMessageFlags. At the very least,
> Empathy needs to be able to ignore "empty" messages with the Non_Text_Content
> flag.

That's what we get from the log store and was enough for Empathy so far.
Comment 16 Simon McVittie 2010-06-09 06:28:34 UTC
(In reply to comment #14)
> http://git.collabora.co.uk/?p=user/cassidy/telepathy-logger;a=shortlog;h=refs/heads/entry-27271

review+

(In reply to comment #15)
> > >   /**
> > >    * TplLogEntry::log-id:
> > >    *
> > >    * A token which can be trusted as unique over time within TPL.
> > 
> > Not really :-P See Bug #26838.
> 
> Changing things requier more design and thoughts. What about removing this
> guarantee for now? In practice Empathy doesn't rely on it anyway.

Sounds good to me. We should look at Bug #26838, though.

> > > const gchar *tpl_log_entry_text_get_message (TplLogEntryText *self);
> > 
> > This isn't even sufficient for the old Text interface, which also carried a
> > TpChannelTextMessageType and TpChannelTextMessageFlags. At the very least,
> > Empathy needs to be able to ignore "empty" messages with the Non_Text_Content
> > flag.
> 
> That's what we get from the log store and was enough for Empathy so far.

They're in the spec for a reason, and I think their absence is a bug, but we can always add API...
Comment 17 Cosimo Alfarano 2010-06-09 06:45:25 UTC
> > >   /**
> > >    * TplLogEntry::log-id:
> > >    *
> > >    * A token which can be trusted as unique over time within TPL.
> > 
> > Not really :-P See Bug #26838.
> 
> Changing things requier more design and thoughts. What about removing this
> guarantee for now? In practice Empathy doesn't rely on it anyway.

TplLogEntry::log-id is craeted by _tpl_create_message_token() (there is a comment about Bug #26838 before this function declaration).

Which is a hash of channel-path, timestamp and pending-msg-id.

This should guarantee uniqueness, at least for the fact that it's not possible (confirm? :) to have two messages having the same p-msg-id and timestamp within the same channel.

Hashing the triplet should be fine, AFAIK.
Comment 18 Simon McVittie 2010-06-09 07:21:22 UTC
(In reply to comment #17)
> TplLogEntry::log-id is craeted by _tpl_create_message_token() (there is a
> comment about Bug #26838 before this function declaration).
> 
> Which is a hash of channel-path, timestamp and pending-msg-id.

Channel object paths are not unique (in practice, the channel path depends on the peer handle in most connection managers). If all of this happens within the same second:

* receive a message which creates a channel /C, with pending message number 0
* close that channel
* receive a message from the same contact, which creates another channel /C with pending message number 0

then you'll get a collision.

> This should guarantee uniqueness, at least for the fact that it's not possible
> (confirm? :) to have two messages having the same p-msg-id and timestamp within
> the same channel.

It's not possible to have two messages with the same pending message ID within the same channel, assuming you don't receive 2**32 messages in one channel.

(CMs don't check for this, in practice; in principle they should open a new channel after 2**32 messages, or something, but even at 1000 messages per second, it would take 49 days to receive this many messages. Real IM sessions are neither that long nor that rapid :-)
Comment 19 Guillaume Desmottes 2010-06-09 08:25:24 UTC
I merged the branch and pushes 2 more patches: http://git.collabora.co.uk/?p=user/cassidy/telepathy-logger;a=shortlog;h=refs/heads/entry-27271
Comment 20 Simon McVittie 2010-06-09 11:03:44 UTC
(In reply to comment #19)
> I merged the branch and pushes 2 more patches:
> http://git.collabora.co.uk/?p=user/cassidy/telepathy-logger;a=shortlog;h=refs/heads/entry-27271

r+
Comment 21 Guillaume Desmottes 2010-06-10 01:50:09 UTC
Merged. I added a small patch fixing a regression:
http://git.collabora.co.uk/?p=user/cassidy/telepathy-logger;a=shortlog;h=refs/heads/entry
Comment 22 Simon McVittie 2010-06-10 03:06:14 UTC
(In reply to comment #21)
> Merged. I added a small patch fixing a regression:
> http://git.collabora.co.uk/?p=user/cassidy/telepathy-logger;a=shortlog;h=refs/heads/entry

review+
Comment 23 Nicolas Dufresne 2011-01-13 14:45:49 UTC
This seem all merged now ...
Comment 24 Nicolas Dufresne 2011-01-17 09:59:33 UTC
Reopening after noticing my error., still depends on #26590 #26648 #27135 and #27642
Comment 25 Nicolas Dufresne 2011-01-17 10:09:08 UTC
*** Bug 33030 has been marked as a duplicate of this bug. ***
Comment 26 Nicolas Dufresne 2011-04-01 10:48:51 UTC
Closing as most of this master bug item has been covered, there few enough bug now that it's manageable.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.