Bug 44331

Summary: Gabble plugin API fails at runtime on Windows
Product: Telepathy Reporter: Siraj Razick <siraj>
Component: gabbleAssignee: Siraj Razick <siraj>
Status: RESOLVED FIXED QA Contact: Telepathy bugs list <telepathy-bugs>
Severity: enhancement    
Priority: high CC: olli.salli
Version: unspecifiedKeywords: patch
Hardware: Other   
OS: All   
URL: http://cgit.collabora.com/git/user/siraj/telepathy-gabble.git/commit/?h=plugin-api-change&id=74b90716d13ff8bd24d806f3f130404dad58fe3e
Whiteboard: review+
i915 platform: i915 features:
Bug Depends on:    
Bug Blocks: 45799, 44256, 44447, 44535, 44649, 46245    

Description Siraj Razick 2011-12-30 14:23:17 UTC
When building plugins for windows to produce .dll files we need to define all the
symbols at link time. since all the current modules are compiled as -modules this problem is not show up on Linux builds. to solve this

1.) we need to fool the linker by creating a .dll from libgabble-convenience and specify this in LIBADD when building plugins 

2.) split libgabble-convenience into another library which can be used with only plugins. in the current code, the plugins code is very tightly coupled with almost all the files in libgabble-convenience
Comment 1 Siraj Razick 2011-12-30 14:25:56 UTC
wip patch for 1.) in comment #1

for 2.) i'm checking what can be factored out..so we can build a smaller .dll file for windows with the symbols only required by gabble plugins.

comments are welcome at this point.. not sure which way is the best .
Comment 2 Siraj Razick 2011-12-30 14:27:35 UTC
see also : https://bugs.freedesktop.org/show_bug.cgi?id=44256
Comment 3 Olli Salli 2012-01-03 11:07:11 UTC
2) is the only reasonable way to go IMO.

Here's some relevant IRC conversation:

--

19:32 < siraj`> jonnylamb, we can't produce .dll files for the plugins since, not all the symbols are defined at link time :(

19:32 < siraj`> jonnylamb, which is a must when producing .dll files for windows

19:33 < siraj`> jonnylamb, right now.. the plugin releated code is all tied up into the gabble-binary it self.. and on linux it will work

19:33 < siraj`> but on windows.. it fails 

19:33 < siraj`> since it expects to find all the dll's 

19:34 < siraj`> jonnylamb, so we need to split the code needed by the plugins from the main binary and compile it as a .dll or a tiny library . like MC does for it's plugins

19:34 < siraj`> jonnylamb, right now.. every thing is just too coupled up.. :( 

19:39 < oggis_> jonnylamb: the key is that when on linux gabble loads a plugin with dlopen, any symbols from gabble the plugin needs are resolved from the already loaded executable image

19:39 < oggis_> jonnylamb: so they can be left unresolved in the .so when linking it

19:39 < oggis_> jonnylamb: but on windows, .dlls need to have all external symbols resolved to link properly

19:40 < oggis_> jonnylamb: so either the plugins need to link to all of gabble (libgabble-convenience), or we need to split out the plugin specific symbols (which are declared in the gabble/ headers)

19:41 < jonnylamb> ohhhhhhhhhhh

19:42 < jonnylamb> siraj`, oggis_: thanks, now I understand

19:42 < oggis_> for example, mission-control has a separate -plugins.so for this purpose

19:56 < jonnylamb> I'd actually forgotten how gabble & salut did plugins and forgot it didn't have a -plugins.so

--

This blocks the Ytstenut plugin porting to Windows.
Comment 4 Jonny Lamb 2012-01-04 04:54:23 UTC
Yes, please do create a -plugins.so shared library (plugins rather than runtime to be more like mission-control).

At the moment you're only building it on windows but we could just build it and require it for linux too, no? It would make the Makefiles less complex and mean that this path would actually be tested by people on linux instead of breaking and no-one noticing. What do you think?
Comment 5 Simon McVittie 2012-01-04 05:46:15 UTC
(In reply to comment #4)
> At the moment you're only building it on windows but we could just build it and
> require it for linux too, no? It would make the Makefiles less complex and mean
> that this path would actually be tested by people on linux instead of breaking
> and no-one noticing. What do you think?

I think this is a better idea than having it Windows-specific.
Comment 6 Olli Salli 2012-01-04 13:35:53 UTC
Mmm, it looks like this alone doesn't suffice to make Ytstenut (and I believe other plugins) work on Windows.

Siraj updated the branch to have a proof of concept setup where the plugins library has all of the symbols in the gabble convenience library, and both the gabble executable and the plugins link to the plugins library. However, it linked correctly, but he hit the following assert, which isn't hit on Linux:

g_return_val_if_fail (g_simple_async_result_is_valid (result,
    G_OBJECT (plugin), gabble_plugin_create_sidecar), NULL);

The corresponding GAsyncResult is created in the ytstenut plugin for gabble like this (removed SALUT ifdefs for clarity):

GSimpleAsyncResult *result = g_simple_async_result_new (G_OBJECT (plugin),
    callback, user_data,
    /* sic: all plugins share {salut,gabble}_plugin_create_sidecar_finish() so we
     * need to use the same source tag.
     */
      gabble_plugin_create_sidecar
    );

gdb shows that g_simple_async_result_is_valid returns false because... tada, the source tag is not the same.

I think this is because, even though both pull the actual function implementation from the same plugins .dll, in Windows function pointers to DLL functions actually point to import stub functions which are like:

int func_from_dll(double param) {
  static int (*real_func) (double) = NULL;
  if (!real_func) ptr = GetProcAddress(... "func_from_dll");
  return (*ptr)(param);
}

A bunch of these are linked in from a "DLL import library", which is a static library which is produced along with the DLL when linking one.

A similar construct in Linux is the ELF indirection table, but it exists only in final executables, not all intermediate DLLs etc, and is produced by the linker when creating an executable. Thus, there's only one copy of these functions. In short, the runtime linking in linux does a bit more work, but as a reward global symbols are actually really global with just one definition 

But in Windows, these are pulled in from the small static import libs already at link time. As such, there's a separate copy of these small functions for each library linking to external DLLs in a running executable... and hence function pointers to them have different values depending on where you refer to them :o.

Thus, the ytstenut plugin has no way to create the async result in such a way that its source tag would be correct when it is checked in gabble (plugin .dll fills it with address to its own stub function copy, gabble verifies it against address to the copy in the main .exe).

So I was wondering, is this a fundamental incompatibility of GSimpleAsyncResult with Windows .dlls, or are we just using it wrong? Anybody has a lead on somebody who would be a guru in GIO and Windows?
Comment 7 Olli Salli 2012-01-04 13:54:26 UTC
> So I was wondering, is this a fundamental incompatibility of GSimpleAsyncResult
> with Windows .dlls, or are we just using it wrong? Anybody has a lead on
> somebody who would be a guru in GIO and Windows?

Answering myself in a separate comment because otherwise this is likely TL; DR. However, THIS comment has my .02€ on how to solve it, the above one is just rationale and a description of the practical problem we're having. SO PLEASE READ AT LEAST THIS ONE!!

My opinion is that we're actually using GAsyncResult wrong here. The gabble plugin interface has some odd hidden assumptions to fulfill. Its (UNDOCUMENTED!) GabblePluginCreateSidecarImpl create_sidecar function receives a callback and user_data parameter. It must somehow magically know that they should be used as a callback for a GSimpleAsyncResult, with the source tag set to a gabble symbol.

Isn't the general idiom in GIO that the function which creates the GASR, sets itself as the source tag? This is not currently the case, as the plugin implementations create it but set it to a symbol from gabble. (again, not possible on Windows).

This would be fixed most straightforwardly by starting to create the GASR in gabble, in the gabble_plugin_create_sidecar wrapper function, or even before in gabble_plugin_loader_create_sidecar (and changing the expected source tag in that case). Then, we'd pass this GASR through the plugin API, and have the plugin just call _complete() on it... This would work even in Windows, because the GASR is created and hence the source tag set in the same linker output file as is the check for the source tag done.

But actually, wouldn't even more idiomatic be having the plugin interface contain both a create_sidecar virtual function returning a GSimpleAsyncResult, and a create_sidecar_finish virtual function which digs out the resulting GabblePlugin from the GASR? Yeah this is two virtual funcs instead of one, and the second one is 95% the same for all plugins, but it's just a few lines anyway, and this way would be much cleaner IMO.

Here are some solution alternatives, but the question is, can we make such changes - that is, are external parties depending on the Gabble plugin API staying backwards compatible?

Otherwise, we'll just need to admit that the source tags can't be sensible on windows, and remove the asserts checking for them, or at least pass NULL to the g_simple_async_result_is_valid call, in which case the source tag check is skipped.

But we also need to make a mental note of not adding more stuff to the plugin API where the plugin would need to create a GASR to be finish()ed by Gabble code or the other way around. Or is there already other things like this?
Comment 8 Olli Salli 2012-01-04 13:59:36 UTC
Oh and Siraj also added some

> printf ("<function name>: %p\n", (void *) &gabble_plugin_create_sidecar);

in the plugin and in gabble - and indeed, they print a different pointer. So indeed, they resolve to the indirection stub functions, not the actual function from the DLL... See for yourself:

ytstenut-DEBUG: ytst_plugin_init: 014C3180
gabble-DEBUG: plugin_loader_try_to_load (plugin-loader.c:99): loaded 'Ytstenut p
lugin' version 0.2.0 (c:\gabble\libytstenut-gabble.dll), implementing these side
cars: org.freedesktop.ytstenut.xpmn.Status
tp-glib-DEBUG: started version 0.15.3.1 (telepathy-glib version 0.17.4.1)
ytstenut-DEBUG: ytstenut_plugin_create_channel_managers: 014C3180 on connection
014E90D0
ytstenut_plugin_create_sidecar:  69205810
ytstenut-DEBUG: ytstenut_plugin_create_sidecar: created side car for: org.freede
sktop.ytstenut.xpmn.Status
gabble-DEBUG: gabble_plugin_create_sidecar_finish (plugin.c:136): 6659EE50

Just after this, the assert is hit, obviously...
Comment 9 Jonny Lamb 2012-01-05 00:12:46 UTC
Interesting findings, thanks for the write-up.

To be honest, I can't imagine anyone is using gabble plugins apart from the one(s) in tree and Ytstenut so I wouldn't mind just breaking API & ABI here. Special casing Windows or removing assertions sounds rubbish to me.

So yeah, renaming create_sidecar to create_sidecar_async and adding a create_sidecar_finish seems the best option to me. While we're at it we could add versioning to the (beginning of the) GabblePluginInterface struct. I did this for Salut already but nothing looks it yet. I guess if we looked at said version number we wouldn't be breaking ABI for Salut, no?
Comment 10 Simon McVittie 2012-01-05 02:47:30 UTC
(In reply to comment #7)
> But actually, wouldn't even more idiomatic be having the plugin interface
> contain both a create_sidecar virtual function returning a GSimpleAsyncResult,
> and a create_sidecar_finish virtual function which digs out the resulting
> GabblePlugin from the GASR?

Yes, this is idiomatic: for instance see GSocketConnection's implementation of close_async and close_finish (those are from a base class, not a GInterface, but the principle is the same). This is annoyingly verbose but is apparently considered to be The Right Thing.
Comment 11 Siraj Razick 2012-01-05 16:33:10 UTC
> But actually, wouldn't even more idiomatic be having the plugin interface
> contain both a create_sidecar virtual function returning a GSimpleAsyncResult,
> and a create_sidecar_finish virtual function which digs out the resulting
> GabblePlugin from the GASR? Yeah this is two virtual funcs instead of one, and
> the second one is 95% the same for all plugins, but it's just a few lines
> anyway, and this way would be much cleaner IMO.
>

one detail is not clear to me about this approach, 

can we move gabble_plugin_create_sidecar_finish(...) from plugin.c to it's own version of
xxx_plugin_create_sidecar_finish(...) per plugin ? so that create_sidecar_cb(..) of plugin-loader.c will do something like 
plugin_iface->create_sidecar_async_finish and get a pointer in return ?
Comment 12 Olli Salli 2012-01-06 02:47:52 UTC
(In reply to comment #11)
> > But actually, wouldn't even more idiomatic be having the plugin interface
> > contain both a create_sidecar virtual function returning a GSimpleAsyncResult,
> > and a create_sidecar_finish virtual function which digs out the resulting
> > GabblePlugin from the GASR? Yeah this is two virtual funcs instead of one, and
> > the second one is 95% the same for all plugins, but it's just a few lines
> > anyway, and this way would be much cleaner IMO.
> >
> 
> one detail is not clear to me about this approach, 
> 
> can we move gabble_plugin_create_sidecar_finish(...) from plugin.c to it's own
> version of
> xxx_plugin_create_sidecar_finish(...) per plugin ? so that
> create_sidecar_cb(..) of plugin-loader.c will do something like 
> plugin_iface->create_sidecar_async_finish and get a pointer in return ?

Almost! Generally, utility functions such as the current gabble_plugin_create_sidecar() are provided for all GInterface virtual methods, so that users don't need to dig into the interface vtable structure themself.

So, we'd have functions like

GAsyncResult *gabble_plugin_create_sidecar_async(plugin, /* other params */, callback, user_data)
{
   GabblePluginInterface *iface = GABBLE_PLUGIN_GET_INTERFACE (plugin);
   /* ... any applicable sanity checks with g_return_val_if_fail etc */
   return iface->create_sidecar_async(plugin, ..., callback, user_data);
}

GabbleSidecar *gabble_plugin_create_sidecar_finish(plugin, /* other params */)
{
   GabblePluginInterface *iface = GABBLE_PLUGIN_GET_INTERFACE (plugin);
   /* ... any applicable sanity checks with g_return_val_if_fail etc */
   return iface->create_sidecar_finish(plugin, /* other params */);
}

and in all the plugins, implement create_sidecar_async much like the plugins currently do create_sidecar, but
1) return the GAsyncResult to the caller so that it can call _finish on it later when its callback is invoked
2) use the plugin's implementation function's name (ytstenut_plugin_create_sidecar_async or alike) as the source tag instead of trying to use gabble_plugin_create_sidecar... which doesn't work as we've seen!

and yeah, pretty much just copy current gabble_plugin_create_sidecar_finish as ytstenut_plugin_create_sidecar_finish in all the plugins, and set this as the iface->create_sidecar_finish function... but adjust for the changed source tag in the validity check assert in each of them!

The versioning suggested by Jonny would be something like:
- add a integer member as the first item in GabblePluginInterface after GTypeInterface
- document somehow that the current version of the plugin API is 1
- in plugins, fill the interface struct member with the version of the API the plugin was written against
- in code, which calls into plugins... switch/ifelse based on the version member, to use the correct members and expect corresponding behavior for each version
- whenever there's an incompatible change to the plugin API, bump the current version number... but retain the code which calls the old API, for the benefit of old plugins

Note that it wouldn't make sense to try and claim the current Gabble API is version 1, and our fixed version which actually e.g. works on Windows at all would be version 2. This is because adding the version member is already an API/ABI break so existing plugins would need to be changed anyway. So this only helps for future changes, not this one.

Hence, this isn't required for our current Ytstenut efforts. So between Jonny/tp upstream and you whether/when to do this. For Ytstenut purposes, we can just leave it as a Gabble TODO item for now... In any case, I think it should be done in a separate branch, separate bug - and this one used to make the plugin API workable for Windows usage.
Comment 13 Olli Salli 2012-01-06 02:49:30 UTC
Btw, when we're moving symbols for the plugin API out of the main gabble code... we might want to even move the resulting source files from src/ into gabble/, for clarity. That's where the corresponding headers are - the only reason I can see for the impl for them currently being in src/ is that it's interspersed with other Gabble code there. Jonny, Simon, opinions?
Comment 14 Jonny Lamb 2012-01-06 02:59:49 UTC
(In reply to comment #13)
> Btw, when we're moving symbols for the plugin API out of the main gabble
> code... we might want to even move the resulting source files from src/ into
> gabble/, for clarity. That's where the corresponding headers are - the only
> reason I can see for the impl for them currently being in src/ is that it's
> interspersed with other Gabble code there. Jonny, Simon, opinions?

This sounds extremely reasonable. I believe the least was done to make the source tree useful to build against when uninstalled, so that's just moving said headers into some gabble/ directory somewhere.
Comment 15 Olli Salli 2012-01-06 09:01:12 UTC
Some clarifications:

(In reply to comment #12)
> 
> GAsyncResult *gabble_plugin_create_sidecar_async(plugin, /* other params */,
> callback, user_data)
> {
>    GabblePluginInterface *iface = GABBLE_PLUGIN_GET_INTERFACE (plugin);
>    /* ... any applicable sanity checks with g_return_val_if_fail etc */
>    return iface->create_sidecar_async(plugin, ..., callback, user_data);
> }

Which should replace current gabble_plugin_create_sidecar completely. plugin-loader should start calling this one instead of it.

> 
> GabbleSidecar *gabble_plugin_create_sidecar_finish(plugin, /* other params */)
> {
>    GabblePluginInterface *iface = GABBLE_PLUGIN_GET_INTERFACE (plugin);
>    /* ... any applicable sanity checks with g_return_val_if_fail etc */
>    return iface->create_sidecar_finish(plugin, /* other params */);
> }
> 

Should be modified from current gabble_plugin_create_sidecar_finish. plugin-loader continues to call this function in the exactly same fashion as before - it's just its implementation guts which have been scattered over each plugin.
Comment 16 Siraj Razick 2012-01-06 13:44:48 UTC
First Patch with the proposed API changes + change applied to one plugin (test.c). if the Change is ok, I can change the other plugins as well. So r? please
Comment 17 Olli Salli 2012-01-07 11:58:50 UTC
(In reply to comment #16)
> First Patch with the proposed API changes + change applied to one plugin
> (test.c). if the Change is ok, I can change the other plugins as well. So r?
> please

-typedef void (*GabblePluginCreateSidecarImpl) (
+typedef GAsyncResult * (*GabblePluginCreateSidecarImpl) (

I'd additionally add "Async" to the name of the typedef to match the changed naming of the interface member

+  if (iface->create_sidecar_finish == NULL)
+    return NULL;
+

This might come a bit out of the blue to a caller. At least log a warning here?

Additionally, it might make sense to also verify that the plugin implements create_sidecar_finish, in gabble_plugin_create_sidecar_async (it already checks that the plugin implements create_sidecar_async). Because a plugin can never sensibly implement one without the other - so a create_sidecar_async operation should fail outright, if the corresponding _finish is not implemented.

Other than these minor adjustments, looks perfect! Please carry on to porting the other plugins. We should also cook up and attach a patch for the Ytstenut plugin and attach it to bug 44447.

Btw, given that you're changing the Gabble plugin API, and making the corresponding updates to a few plugins... could you briefly describe the required changes for plugins in NEWS, for the benefit of other out of three plugins besides Ytstenut? I know of at least one. We could make their porting effort a breeze with this information!
Comment 18 Jonny Lamb 2012-01-09 01:53:14 UTC
There's more wrong with this patch:

> Change the pluing API to create_sidecar_async and create_sidecar_finish

Perfect?!

You've only updated the test plugin; surely you also need to update the gateways and console plugins too?

Why are you returning the GAsyncResult in the create_sidecar_async function? That's really weird.

You should store the GAsyncResult in the plugin's private struct so create_sidecar_async isn't called twice on the plugin.

-    GabblePluginCreateSidecarImpl create_sidecar;
+     GabblePluginCreateSidecarFinishImpl create_sidecar_finish;

Wrong indentation.

   g_object_unref (result);
+  return g_object_ref (sidecar);

What is this I don't even? Store the GAsyncResult in the private struct of the plugin, complete it (in an idle if necessary to make sure it re-enters the mainloop) and unref & clear it straight afterwards. Returning a new ref to the sidecar is correct, though.
Comment 19 Jonny Lamb 2012-01-09 01:57:38 UTC
(In reply to comment #16)
> First Patch with the proposed API changes + change applied to one plugin
> (test.c). if the Change is ok, I can change the other plugins as well. So r?
> please

Sorry I read this as that if this is right you'll be pushing the same change into Salut and other Ytstenut stuff, please ignore my comments about not having touched the other plugins!
Comment 20 Olli Salli 2012-01-09 06:19:35 UTC
(In reply to comment #18)
> Why are you returning the GAsyncResult in the create_sidecar_async function?
> That's really weird.

Ah, you're absolutely right here; this is not GiOdiomatic at all. My bad! I guess this misconception of mine came from the Qt APIs, where we return pending operation / job objects, and you can use the pointer to them to associate a particular invocation with some context. But in GIO style, you use the user_data for passing context.

So Siraj, the return type of the _async functions can be changed to void. That's 
fairly simple as we don't even use the return value anywhere.

> 
>    g_object_unref (result);
> +  return g_object_ref (sidecar);
> 
> What is this I don't even? Store the GAsyncResult in the private struct of the
> plugin, complete it (in an idle if necessary to make sure it re-enters the
> mainloop) and unref & clear it straight afterwards. Returning a new ref to the
> sidecar is correct, though.

We don't even have to store it at all for the plugins I've looked at, because they all actually complete the operation already in the _async function starting it, through the mainloop as you say.

So Siraj: for this plugin and ytstenut, you can just keep the g_object_unref in the operation start function, after the call to g_simple_async_result_complete_in_idle() as it were. That's fairly natural once the return type is void... because then the only pointer in scope is held by GLib internals, so we must've dropped all of our object references at that point as well.

for any plugins which actually do some further asynchronous steps before completing the op: please store the op as Jonny says. But in any case, unref it immediately after the call to the g_simple_async_result_complete(_in_idle) function and setting it back to NULL if stored.

Additional observation:

+  iface->create_sidecar_async = test_plugin_create_sidecar;

would be more consistent to rename the implementation func to have _async suffix as well?
Comment 21 Siraj Razick 2012-01-09 13:26:38 UTC
updated patch which address the review comments.

1.) all gabble plugins updated to use the same API

2.) reverts back to "void gabble_plugin_create_sidecar_async(...)

3.) fail the operation if gabble_plugin_create_sidecar_finish is NULL and also additional warning when gabble_plugin_create_sidecar_finish is called and iface->create_sidecar_finish is NULL.

4.) other proposed minor fixes + news item
Comment 22 Jonny Lamb 2012-01-10 08:34:41 UTC
> Change the pluing API to create_sidecar_async and create_sidecar_finish

Please fix the commit message.

Looks good otherwise.
Comment 23 Jonny Lamb 2012-02-01 10:44:58 UTC
Okay, merged this for you.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.