Bug 24618

Summary: [0.13] add TpStringPool, a refcounted pool of strings
Product: Telepathy Reporter: Simon McVittie <smcv>
Component: tp-glibAssignee: Cosimo Alfarano <cosimo.alfarano>
Status: RESOLVED WONTFIX QA Contact: Telepathy bugs list <telepathy-bugs>
Severity: enhancement    
Priority: low CC: cosimo.alfarano
Version: unspecified   
Hardware: Other   
OS: All   
URL: http://git.collabora.co.uk/?p=user/kalfa/telepathy-glib.git;a=shortlog;h=refs/heads/TpStringPool
Whiteboard: review-
i915 platform: i915 features:
Bug Depends on: 24209    
Bug Blocks:    

Description Simon McVittie 2009-10-19 06:22:27 UTC
In Bug #24120, Will wrote:
> We should really add TpStringPool. It's silly to keep using the dynamic handle
> repo and have to explain all over the place that they're not actually handles.

I agree; we should split out most of the functionality of TpDynamicHandleRepo into a TpStringPool class. I'm not sure whether the API should be in terms of guint32 or string: probably both (like a generalization of GQuark), with faster _by_id versions for the times when you already have the integer handy.
Comment 1 Cosimo Alfarano 2010-04-08 10:36:42 UTC
The branch holds a TpStringPool implementation and the patch against TpDynamicHandleRepo.
Comment 2 Simon McVittie 2010-04-08 11:04:07 UTC
> +/* Dynamic handle repo */
...
> + * TpStringPoolClass:
> + *
> + * The class of a dynamic handle repository. The contents of the struct
... (etc.)

Not true :-)

> +    /* Map GUINT_TO_POINTER(handle) -> (TpStringPoolHandlePriv *) */
> +    GHashTable *handle_to_priv;

I'd prefer the terminology in this string pool to be something other than "handle". I'm not sure what a good word is, though; perhaps "number"?

I think it'd be worth having a dual API in terms of numbers and strings, like the way GObject qdata has g_object_get_data *and* g_object_get_data_by_id.

> +    /* Normalization function */
> +    TpStringPoolNormalizeFunc normalize_function;

I still don't think this string pool is the right place for normalization, and I'd have left it in the dynamic handle repo.

Note, in particular, that this normalize function doesn't have the same signature as the one for the dynamic handle repo!

If you didn't put the normalization function and the normalization context in the string pool, then it wouldn't need properties, which would make it much simpler.

The documentation you cut and pasted into the string pool makes no sense for a generic string pool.

If the normalization stayed in the dynamic handle repo, then the string pool would just have a function equivalent to tp_string_pool_lookup_exact (but it should be called tp_string_pool_lookup_by_number or something), and wouldn't need a function equivalent to the current tp_string_pool_lookup (which involves a malloc/free cycle).

In the TpDynamicHandleRepo reimplementation, you don't seem to do anything with the normalization function - as far as I can see, it'll never be called. This would break Gabble, a lot.
Comment 3 Cosimo Alfarano 2010-04-15 04:37:15 UTC
Briefly:

I'll remove the normalization in the pool it can be added later if needed with some methods added.
the handle repo is still using the original normalize function, it hasn't been touched.

About the "handle" term renamed, I'd use "id" (probably better in _lookup_by_id() which is already used and more recognisable). "number" sounds so generic.

gtk-doc fixed, I'll complete the doc, also.

not ready for the next review.

Long version:

TpDynamicHandleRepo has its implementation of the normalisation function, the original one.
Normalisation is done ensuring and looking up using repo's
self->priv->normalize_function and set by the _new() using the object's props.

The pool's normalisation func is not used by the handle repo.

Rationale for the pool having its normalisation method:
There might be protocols, for instance, whose identifiers are not case sensitive but the server won't send them normalized. A CM might want to store them as they "arrive", being sure to reference "id" as the already known "Id" instead as a new identifier.

I'll remove it for sake of simplicity if the rationale is not strong enough, it can always be added later, though it would need more methods: _lookup() wouldn't have the context passed, and if added later, it would be needed _lookup_normalized() added, equivalent to the current couple _lookup_exact()/_lookup().

About the two lookup functions, there is _lookup_exact(s) _lookup(s,ctx) and _lookup_handle(guint) (which will renamed be _lookup_by_number/_lookup_by_id).


> and wouldn't need a function equivalent to the current 
> tp_string_pool_lookup (which involves a malloc/free cycle).

tp_string_pool_lookup() when used without a normalization function is almost equivalent to the _exact version, beside the time spent on checking two if's guards and stacking a g_free with a NULL ptr which would return immediately.
Comment 4 Cosimo Alfarano 2010-04-15 10:46:13 UTC
- norm. func. removed, and with it properties.
- gtk-doc updated/improved
- renamed _inspect() into _lookup_by_handle()
- renamed _lookup_exact() into _lookup()
- removed the former _lookup() (with normalization)
- tests passed
- I touched test/dbus/handle-repo.c adding a test double ensuring the same string.


Still to do:
- rename handle to something else.

It will be a matter of doing s/handle/new/g s/Handle/New/g s/HANDLE/NEW/g
the current code (I checked all the occurrences of the words, and they can be easily replaced with it).

The proposals are
- number (too generic, IMHO)
- id (clashes with other tp-glib use of the same work)
- reference (it is a numeric reference to a string)
- key

I'd like to have a clear name for it, that's why I still didn't change it, but I'm not a native speaker, so I cannot find anything more than common or maybe often misused names.

I'll set it as patch when I'll have a good name (thought can be reviewed being aware of it).
Comment 5 Simon McVittie 2010-04-15 10:55:33 UTC
I need to do a spec release first, but this is in the queue...
Comment 6 Simon McVittie 2010-04-20 04:18:19 UTC
Makefile.am needs to add string-pool.h to the HEADERS.

telepathy-glib.h should perhaps include string-pool.h.

> + * SECTION:string-pool
> + * @title: TpStringPool
> + * @short_description: arbitrary string pool, with dynamic handle allocation
> + * and recycling.
> + *
> + * The #TpStringPool:normalize-function property may be set to
> + * perform validation and normalization on handle ID strings.

I think a better description would be:

 * @short_description: a pool of reference-counted strings identified by number
 * 
 * This string pool implementation is a generalization of #GQuark.
 * Whereas the #GQuark mechanism leaves strings allocated permanently,
 * each #TpStringPool maintains a reference count for each string it contains.

> + * Copyright (C) 2005, 2007 Collabora Ltd.
> + * Copyright (C) 2005, 2007 Nokia Corp.

Please extend the Collabora copyright range to 2005-2010, leaving the Nokia one as-is.

> -static void dynamic_repo_iface_init (gpointer g_iface,
> +static inline void dynamic_repo_iface_init (gpointer g_iface,

There's no point in inlining this: G_IMPLEMENT_INTERFACE takes the address of the iface_init function, so it can't be inlined.

> +  self->string_pool = tp_string_pool_new (NULL, NULL);
> +
> +  if (self->string_pool == NULL)
> +    g_error ("Unable to create the string pool");

Why would this ever fail? If the answer is OOM, you don't need to check for that - GLib calls g_error on OOM anyway.

> +  if (self->string_pool != NULL)
> +    {
> +      g_object_unref (self->string_pool);
> +      self->string_pool = NULL;
> +    }

This is releasing a reference, so it should be in dispose().

> @@ -810,26 +690,18 @@ dynamic_ensure_handle (TpHandleRepoIface *irepo,
...
> +  handle = tp_string_pool_lookup_exact (self->string_pool, id);
> +  if (handle != 0)
...
> +  handle = tp_string_pool_ensure_handle (self->string_pool, id, context, error)

Why don't you just call the ensure() function to start with?

> + * The #TpStringPool:normalize-function property may be set to
> + * perform validation and normalization on handle ID strings.

I still don't think TpStringPool should have normalization. You're not using it in TpDynamicHandleRepo, and neither Mission Control nor Gabble is going to use it either.

If in doubt, remove it for now, and we can reinstate it later if necessary.

I haven't reviewed normalization functionality further.

> +guint tp_string_pool_lookup_exact (TpStringPool *self,
> +    const char *id);

I think it'd be worth having the TpStringPool's API guarantee that its numbers are guint32 (in practice, that's what TpDynamicHandleRepo has always done, and numbers above G_MAXUINT32 can't be Telepathy handles anyway).

I'm not sure about the API naming: perhaps something like this, replacing NUMBER with whatever handle-like concept we come up with? (GQuark is another good source of naming conventions for this.)

/* does not return a ref */
guint32 tp_string_pool_lookup_NUMBER (TpStringPool *self, const gchar *s);
/* returns a ref */
guint32 tp_string_pool_ensure_NUMBER (TpStringPool *self, const gchar *s) G_GNUC_WARN_UNUSED_RESULT;
const gchar *tp_string_pool_get_string (TpStringPool *self, guint32 NUMBER);
gboolean tp_string_pool_string_exists (TpStringPool *self, const gchar *s);
gboolean tp_string_pool_NUMBER_exists (TpStringPool *self, guint32 NUMBER);
/* ref should usually return the thing that was reffed */
guint32 tp_string_pool_ref_NUMBER (TpStringPool *self, guint32 NUMBER);
void tp_string_pool_unref_NUMBER (TpStringPool *self, guint32 NUMBER);

I'm not sure that we need tp_string_pool_handles_are_valid - the only reason for taking a GArray there is to be nice to D-Bus APIs that take an array of handles. I'd be inclined to do the iteration in the dynamic handle repo instead.

What's the difference between tp_string_pool_inspect_handle and tp_string_pool_lookup_handle? Are they, in fact, the same?

> +#include <config.h>
> +
> +#include <telepathy-glib/errors.h>
> +#include <telepathy-glib/heap.h>
> +#include <telepathy-glib/util.h>
> +#include <telepathy-glib/string-pool.h>

Please put the file's own header (string-pool.h) immediately after config.h, as an easy way to verify that it's self-contained.

> +static inline TpStringPool *
> +tp_string_pool_new (TpStringPoolNormalizeFunc normalize_func,
> +    gpointer default_normalize_context)

I think making this inline is overkill. It can just be a normal function; it's not as if it will be called very often.

> +tp_string_pool_ensure_handle (TpStringPool *self,
...
> +  g_return_val_if_fail (!tp_str_empty (id), 0);

A string pool should be able to store "". It's OK if it can't store NULL, although that fact should probably be documented. (lookup, lookup_exact have the same problem).

A lot of the functions don't seem to be documented. Please do so (it should be much easier once you've got rid of normalization and redundant API).

> +const gchar *tp_string_pool_lookup_handle (TpStringPool *self,
> +    guint handle)
> +{
> +  g_return_val_if_fail (handle != 0, NULL);
> +  g_return_val_if_fail (TP_IS_STRING_POOL (self), NULL);
> +
> +  return g_hash_table_lookup (self->priv->handle_to_priv,
> +      GUINT_TO_POINTER (handle));

This seems to return a HandlePriv * (an opaque struct that library users can't interpret), rather than the string you'd expect it to return, which makes the whole function rather useless. Why does this function exist?
Comment 7 Simon McVittie 2010-04-20 08:53:13 UTC
Diffing the right branch this time (sorry about that):

The handle-leak-debug machinery in handle-repo-dynamic.c seems to be unused now? You'll probably need to delete it, otherwise the build will fail if you --enable-handle-leak-debug. (Please do that and see whether everything still compiles.)

I don't like the replacement of "handle" with "reference"... we already have a thing called a reference, and this isn't it. Of your suggestions, I think "key" is my favourite, although we'd have to rename the key_id argument of the qdata functions (easy, call it "quark" or something).

Perhaps rename like this?

guint32 reference => guint32 key
TpStringPoolReferencePriv => TpStringPoolEntry
reference_to_priv => key_to_entry (or entries)
string_to_reference => string_to_key

> +#ifdef ENABLE_REFERENCE_LEAK_DEBUG

I think that's a poor name for this functionality: I'd automatically assume that was tracking GObject ref leaks. (--enable-string-pool-leak-debug might be better.) Also, you haven't changed configure.ac to remove --enable-handle-leak-debug and replace it with this?

The practical effect of this debug mode is to consider it to be a problem if a string pool is destroyed with strings allocated to it. Whether that's the case or not depends on usage - if we're going to consider that to be an error, then we should probably have a tp_string_pool_release_all() method that can be used just before destroying the string pool if you don't care if it still has contents, or something?

(Or, we could just drop this functionality...)

I think it'd make the code clearer if we called strings "s" rather than "id", probably - the string pool doesn't care what the string means, so the string isn't guaranteed to be any sort of identifier.
Comment 8 Cosimo Alfarano 2010-04-21 07:43:06 UTC
Comment related to comment#6
I ignored the comments related to the out-of-date branch you used when they where not applicable (usually already fixed).

Shortly: fixed the rest of comments. not ready for next review cycle.

>> + * Copyright (C) 2005, 2007 Collabora Ltd.
>> + * Copyright (C) 2005, 2007 Nokia Corp.

> Please extend the Collabora copyright range to 2005-2010, leaving the Nokia one
> as-is.

OK. Updated both handle-repo-dynamic.[ch] with 2005-2010 and changed string-pool.h from "2005, 2007, 2010" to "2005-2010".

>> +  self->string_pool = tp_string_pool_new (NULL, NULL);
>> +
>> +  if (self->string_pool == NULL)
>> +    g_error ("Unable to create the string pool");
> Why would this ever fail? If the answer is OOM, you don't need to check for
> that - GLib calls g_error on OOM anyway.

Removed.

> +  if (self->string_pool != NULL)
> +    {
> +      g_object_unref (self->string_pool);
> +      self->string_pool = NULL;
> +    }

> This is releasing a reference, so it should be in dispose().

Right.

> Why don't you just call the ensure() function to start with?

fixed. I tried to keep my code as much similar as I could to the old one, but this was silly.


>> +guint tp_string_pool_lookup_exact (TpStringPool *self,
>> +    const char *id);

> I think it'd be worth having the TpStringPool's API guarantee that its numbers
> are guint32 (in practice, that's what TpDynamicHandleRepo has always done, and
> numbers above G_MAXUINT32 can't be Telepathy handles anyway).

>I'm not sure about the API naming: perhaps something like this, replacing
>NUMBER with whatever handle-like concept we come up with? (GQuark is another
>good source of naming conventions for this.)

wrt the #telepathy chat we had, I'm going to use the following instead:

>/* does not return a ref */
>guint32 tp_string_pool_lookup_NUMBER (TpStringPool *self, const gchar *s);

tp_string_pool_lookup_by_string()
from tp_string_pool_lookup()

/* returns a ref */
> guint32 tp_string_pool_ensure_NUMBER (TpStringPool *self, const gchar *s)
> G_GNUC_WARN_UNUSED_RESULT;

the same.
from tp_string_pool_ensure()

> const gchar *tp_string_pool_get_string (TpStringPool *self, guint32 NUMBER);

tp_string_pool_lookup_by_NUMBER()
from tp_string_pool_lookup_by_handle()

> gboolean tp_string_pool_string_exists (TpStringPool *self, const gchar *s);

not existing, added.

> gboolean tp_string_pool_NUMBER_exists (TpStringPool *self, guint32 NUMBER);

the same
from tp_string_pool_is_valid()

> /* ref should usually return the thing that was reffed */
> guint32 tp_string_pool_ref_NUMBER (TpStringPool *self, guint32 NUMBER);

the same
from tp_string_pool_ref_handle()

> void tp_string_pool_unref_NUMBER (TpStringPool *self, guint32 NUMBER);

the same
from tp_string_pool_unref_handle()

> I'm not sure that we need tp_string_pool_handles_are_valid - the only reason
> for taking a GArray there is to be nice to D-Bus APIs that take an array of
> handles. I'd be inclined to do the iteration in the dynamic handle repo
> instead.

removed in TpStringPool, the dyn.repo is already iterating. It's using
dynamic_handle_is_valid() instead of tp_string_pool_NUMBER_exists(), I usually think it's more readable. Is it OK?


> +static inline TpStringPool *
> +tp_string_pool_new (TpStringPoolNormalizeFunc normalize_func,
> +    gpointer default_normalize_context)

> I think making this inline is overkill. It can just be a normal function; it's
> not as if it will be called very often.

OK, it was defined that way in the handle repo headers. fixed.

> +tp_string_pool_ensure_handle (TpStringPool *self,
...
> +  g_return_val_if_fail (!tp_str_empty (id), 0);

> A string pool should be able to store "". It's OK if it can't store NULL,
> although that fact should probably be documented. (lookup, lookup_exact have
> the same problem).

Empty string is legal, updated sanity checks and doc.
Comment 9 Cosimo Alfarano 2010-04-22 06:16:38 UTC
I finally understood why some tests where failing.

It's not possible to clear the GDataList after disposing the string pool in the handle repo.

See afb3ae5e9ec1f5f6765be12018fc8911ce3c874f.

About it there are three option I can do:
1) (they current way as per afb3ae5e9ec1f5f6765be12018fc8911ce3c874f) call g_datalist_clear() in dispose. It's safe to call it more than once so it shouldn't be a problem to be in dispose().

2) analyse the problem better and fix the problem at its root (I'd probably open a bug and solve it independently later, unless you see some collateral effect)


The rest of the branch is actually several diffs that do not do much in terms of code. Mainly renaming, improving doc, etc.

When the review cycle will be done, I'll stash them all in just some patches, mainly the TpStringPool creation, the Handle repo application of TpStringPool, plus a commit per fix that should not be stashed with them.
Comment 10 Simon McVittie 2010-04-22 06:24:08 UTC
(In reply to comment #9)
> It's not possible to clear the GDataList after disposing the string pool in the
> handle repo.

It's fine to clear things in dispose if they depend on things that hold a reference, so you're correct to clear the GDataList in dispose.
Comment 11 Simon McVittie 2010-09-13 03:17:32 UTC
Not going to happen for 0.12. Deferred.
Comment 12 Simon McVittie 2013-09-16 18:55:06 UTC
I don't think this is ever going to happen.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.