Bug 41233 - Failure when connecting to two services provided by the same binary using autostart
Status: RESOLVED NOTOURBUG
Alias: None
Product: dbus
Classification: Unclassified
Component: core
Version: 1.4.x
Hardware: Other All
Importance: medium normal
Assignee: Havoc Pennington
QA Contact: John (J5) Palmieri
Reported: 2011-09-26 07:42 UTC by Sam Thursfield
Modified: 2011-09-28 14:27 UTC

Attachments
Testcase (requires tumbler) (1.68 KB, text/x-csrc)
2011-09-26 07:42 UTC, Sam Thursfield

Description Sam Thursfield 2011-09-26 07:42:23 UTC
Created attachment 51623
Testcase (requires tumbler)

I've noticed this failure when connecting to tumblerd[1], which is a thumbnail manager implementing the GNOME thumbnail spec[2]. The test case connects to two separate interfaces using g_dbus_proxy_new_sync(); if tumblerd is already running it succeeds, but if it is not, the following occurs:

sam@candylion:~$ ./dbus-start-service-by-name

** ERROR **: Unable to get manager interface: Error calling StartServiceByName for org.freedesktop.thumbnails.Thumbnailer1: GDBus.Error:org.freedesktop.DBus.Error.Spawn.ChildExited: Process /usr/lib/tumbler-1/tumblerd exited with status 1
aborting...
Aborted

sam@candylion:~$ ps aux | grep tumblerd
sam      21164  0.1  0.7  37144  8028 ?        Sl   15:06   0:00 /usr/lib/tumbler-1/tumblerd
sam      21197  0.0  0.0   4160   844 pts/5    S+   15:08   0:00 grep tumblerd

Note it's the *second* call to g_dbus_proxy_new_sync() that fails - I guess that both calls are running StartServiceByName() for the interface so two instances of the tumblerd binary are launched. The second races the first for ownership and so it loses and returns failure.

It seems like the dbus-daemon is at fault here for not realising that it's starting the same binary twice.

[1] http://git.xfce.org/xfce/tumbler/
[2] https://live.gnome.org/ThumbnailerSpec
Comment 1 Thiago Macieira 2011-09-26 08:05:37 UTC
Can you please check the following: is it possible that second instance of tumblerd exits before the first one has acquired the name being called?

Also, the others might correct me here, but I think that when D-Bus starts a service, it needs the process it started to acquire the name. Another process acquiring it might not be supported.
Comment 2 Simon McVittie 2011-09-26 09:11:34 UTC
(In reply to comment #1)
> Also, the others might correct me here, but I think that when D-Bus starts a
> service, it needs the process it started to acquire the name. Another process
> acquiring it might not be supported.

We should at least support the case where the process that was started has the name owner as a child, and doesn't exit until the name owner does so (so you can put a shell script wrapper around the real name owner without exec'ing it - which you have to do if the shell script wrapper needs to do cleanup after the real name owner has gone away).

In current dbus (>= 1.4) we also support (but do not recommend) services that daemonize and exit 0 before taking their name.

Combining those two can produce a nasty workaround for this, but it's not exactly a good thing.

Which version of dbus is this? If it's 1.4.8 or later, this is probably a regression caused by fixing Bug #35750.
Comment 3 Simon McVittie 2011-09-26 09:16:03 UTC
From the commit that I suspect broke this:

>   While I think it was broken for the service files to be changed
>   to Exec=/bin/false, we shouldn't be doing something here that's
>   not in the spec either.

Colin, do "GNOME 3 + systemd systems" (which I assume is secret code for "systems with current Fedora patches") still rely on this?
Comment 4 Sam Thursfield 2011-09-26 10:01:03 UTC
(In reply to comment #2)

> Which version of dbus is this? If it's 1.4.8 or later, this is probably a
> regression caused by fixing Bug #35750.

D-Bus 1.4.6

I'll get to the other questions this evening
Comment 5 Sam Thursfield 2011-09-27 01:47:54 UTC
> Can you please check the following: is it possible that second instance of
> tumblerd exits before the first one has acquired the name being called?

The second instance fails on acquiring the .Cache name which is the first one it tries to acquire; it gives "Error: Another thumbnail cache service is already running".

> Also, the others might correct me here, but I think that when D-Bus starts a
> service, it needs the process it started to acquire the name. Another process
> acquiring it might not be supported.

I think that's what we want to happen, the problem is that we want to make sure D-Bus doesn't launch two conflicting instances of the daemon.

It's hard to see an easy solution to this bug since each bus name is specified in a different .service file at the moment. dbus-daemon needs a way of knowing that "this binary I am executing will provide 3 different names" and behaving appropriately in further calls to StartServiceByName before the process has initialised.
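
To illustrate the setup described here, the activation files might look roughly like the sketch below. The bus names and the Exec path are the ones that appear in this report; the file names and the exact contents of tumbler's real .service files are assumptions. Because each file is read on its own, dbus-daemon has no hint that both names lead to the same process.

```ini
# org.freedesktop.thumbnails.Cache1.service (file name assumed)
[D-BUS Service]
Name=org.freedesktop.thumbnails.Cache1
Exec=/usr/lib/tumbler-1/tumblerd

# org.freedesktop.thumbnails.Thumbnailer1.service (file name assumed)
[D-BUS Service]
Name=org.freedesktop.thumbnails.Thumbnailer1
Exec=/usr/lib/tumbler-1/tumblerd
```

With two such files, two StartServiceByName requests arriving close together each spawn their own tumblerd.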
Comment 6 Thiago Macieira 2011-09-27 02:15:59 UTC
(In reply to comment #5)
> > Can you please check the following: is it possible that second instance of
> > tumblerd exits before the first one has acquired the name being called?
> 
> The second instance fails on acquiring the .Cache name which is the first one
> it tries to acquire; it gives "Error: Another thumbnail cache service is
> already running".

Problem found then. This program needs to be fixed so that the instance launched does not exit until the other one is answering on D-Bus. If it exits too soon, the daemon doesn't know where to send the message.

Sounds like NOTOURBUG to me.

> I think that's what we want to happen, the problem is that we want to make sure
> D-Bus doesn't launch two conflicting instances of the daemon.
> 
> It's hard to see an easy solution to this bug since each bus name is specified
> in a different .service file at the moment. dbus-daemon needs a way of knowing
> that "this binary I am executing will provide 3 different names" and behaving
> appropriately in further calls to StartServiceByName before the process has
> initialised.

Not really, and especially since we're talking about freedesktop.org generic services. A different implementation may have them in different binaries, so we can't join them together.

I'm against adding complexity to the bus daemon if we can easily fix this in the application.
Comment 7 Sam Thursfield 2011-09-27 03:18:43 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > > Can you please check the following: is it possible that second instance of
> > > tumblerd exits before the first one has acquired the name being called?
> > 
> > The second instance fails on acquiring the .Cache name which is the first one
> > it tries to acquire; it gives "Error: Another thumbnail cache service is
> > already running".
> 
> Problem found then. This program needs to be fixed so that the instance
> launched does not exit until the other one is answering on D-Bus. If it exits
> too soon, the daemon doesn't know where to send the message.
> 
> Sounds like NOTOURBUG to me.

I altered tumbler to return 0 when the name is not available and this does fix the problem. It seems a bit of an ugly solution but I accept that it's far simpler than the alternative fix inside dbus. I'll file a bug on Tumbler about it and see what they say.
Comment 8 Sam Thursfield 2011-09-27 03:42:44 UTC
https://bugzilla.xfce.org/show_bug.cgi?id=8001
Comment 9 Simon McVittie 2011-09-27 04:02:24 UTC
(In reply to comment #6)
> Problem found then. This program needs to be fixed so that the instance
> launched does not exit until the other one is answering on D-Bus.

I think waiting for the other one like this is a better solution than exiting 0 immediately. The "contract" that activated services were initially meant to provide is something like: "when dbus-daemon starts the executable named by Exec, it will arrange for the Service to be provided, or exit nonzero if it can't". We later diluted that by adding the "if you exit 0 we'll keep waiting" thing for apps that daemonize, but I think that's a workaround, and correct services shouldn't exit until the desired functionality has become available.
Comment 10 Jannis Pohlmann 2011-09-27 16:01:51 UTC
(In reply to comment #9)
> (In reply to comment #6)
> > Problem found then. This program needs to be fixed so that the instance
> > launched does not exit until the other one is answering on D-Bus.
> 
> We later diluted that by adding the "if you exit 0 we'll keep waiting"
> thing for apps that daemonize, but I think that's a workaround, and correct
> services shouldn't exit until the desired functionality has become available.

I would like to fix this in tumbler properly, so can you explain which of the two instances spawned should wait and what it should wait for?

If I understood correctly, we have two tumbler instances here that are spawned for different interfaces. The first instance is supposed to serve .Cache1, the second one is supposed to serve .Thumbnailer1. When the first instance is started, it assumes ownership of .Cache1 and .Thumbnailer1. The second instance spawned almost at the same time tries to acquire the same and fails because the first one already owns the names at this point. 

Gracefully exiting the second instance with error code 0 sounds wrong. The second instance fails to acquire the name on the bus and this is unexpected, so IMHO it needs to return an error. But who is to wait on whom and how? Any pointers to services already implementing this?
Comment 11 Thiago Macieira 2011-09-27 17:13:50 UTC
(In reply to comment #10)
> I would like to fix this in tumbler properly, so can you explain which of the
> two instances spawned should wait and what it should wait for?

Both spawned instances should wait. If the instance is spawned by D-Bus, it should not exit until the service name being started has become available on the bus.

One way of doing that is to connect to the bus and wait for the name to appear. Another is to use other mechanisms to be informed that the service is up and running (filesystem lock, PID file, etc.).

If the application may be started on behalf of multiple service names, you should wait for all of them to become available. Usually the startup sequence of an application is constant, so you can wait for the last one acquired to become available.
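
The "wait for the names to appear" approach could be sketched as follows. This is only an outline under stated assumptions: wait_for_names(), name_probe_fn and countdown_probe() are hypothetical names, and the probe is a stand-in so the sketch builds without libdbus. A real service would more likely back the probe with dbus_bus_name_has_owner() or by watching NameOwnerChanged signals, and would block on the bus instead of polling in a tight loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe: returns true once `name` has an owner on the bus. */
typedef bool (*name_probe_fn)(const char *name, void *user_data);

/* Polls until every name has an owner or `max_polls` rounds have passed.
 * Returns 0 if all names became available and 1 otherwise, so the result
 * can be used directly as the exit status of the activated process. */
int wait_for_names(const char *const *names, size_t n_names,
                   name_probe_fn probe, void *user_data,
                   unsigned max_polls)
{
    assert(names != NULL && probe != NULL);

    for (unsigned poll = 0; poll < max_polls; poll++) {
        size_t available = 0;

        for (size_t i = 0; i < n_names; i++) {
            if (probe(names[i], user_data))
                available++;
        }
        if (available == n_names)
            return 0;
        /* A real implementation would sleep or block on the bus here. */
    }
    return 1;
}

/* Stand-in probe for illustration only: reports "no owner" until the
 * counter that user_data points to has been decremented to zero. */
bool countdown_probe(const char *name, void *user_data)
{
    unsigned *remaining = user_data;

    (void)name;
    if (*remaining > 0) {
        (*remaining)--;
        return false;
    }
    return true;
}
```

The return value follows the convention discussed in this thread: 0 means the service is available, not that this particular process provides it.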

> If I understood correctly, we have two tumbler instances here that are spawned
> for different interfaces. The first instance is supposed to serve .Cache1, the
> second one is supposed to serve .Thumbnailer1. When the first instance is
> started, it assumes ownership of .Cache1 and .Thumbnailer1. The second instance
> spawned almost at the same time tries to acquire the same and fails because the
> first one already owns the names at this point. 

From what I understand, the same process (tumblerd) provides both services. When the first instance is started by D-Bus, it appears to daemonise itself. Whether it is properly waiting for the service to be available or it returned 0 and D-Bus is being nice, evidence hasn't shown.

But the second instance started runs into a filesystem lock and exits with code 1.

That was a race condition. It should be solved by having tumblerd detect the filesystem lock and wait for the service to become available before exiting with code 0. If the lock goes away with the service not becoming available, then this instance (the second) should daemonise and provide the services.

> Gracefully exiting the second instance with error code 0 sounds wrong. The
> second instance fails to acquire the name on the bus and this is unexpected,
> so IMHO it needs to return an error. But who is to wait on whom and how? Any
> pointers to services already implementing this?

It should return a code indicating whether the service is available, not whether it managed to start the service. That means starting it over and over again when the service is available should return 0 all the time.
Comment 12 Jannis Pohlmann 2011-09-28 04:56:45 UTC
(In reply to comment #11)
> From what I understand, the same process (tumblerd) provides both services.
> When the first instance is started by D-Bus, it appears to daemonise itself.
> Whether it is properly waiting for the service to be available or it returned 0
> and D-Bus is being nice, evidence hasn't shown.

No, it doesn't daemonise. It's activated by D-Bus, calls dbus_bus_request_name() (with DBUS_NAME_FLAG_DO_NOT_QUEUE) on org.freedesktop.thumbnails.Thumbnailer1, .Cache1 and .Manager1, and if any of these requests returns something != DBUS_REQUEST_NAME_REPLY_PRIMARY_OWNER, it exits with an error. If these requests succeed, it registers GObjects for each of the corresponding D-Bus paths and then enters a GLib main loop.

> But the second instance started runs into a filesystem lock and exits with code
> 1.
> 
> That was a race condition. It should be solved by having tumblerd detect the
> filesystem lock and wait for the service to become available before exiting
> with code 0. If the lock goes away with the service not becoming available,
> then this instance (the second) should daemonise and provide the services.

So what's the D-Bus way to do this? When I wrote tumbler, simply quitting the second instance seemed to make a lot of sense for two reasons: (1) the first instance may already be busy generating thumbnails and (2) what to wait for if there already is an instance of the service(s) in question?
 
> > Gracefully exiting the second instance with error code 0 sounds wrong. The
> > second instance fails to acquire the name on the bus and this is unexpected,
> > so IMHO it needs to return an error. But who is to wait on whom and how? Any
> > pointers to services already implementing this?
> 
> It should return a code indicating whether the service is available, not
> whether it managed to start the service. That means starting it over and over
> again when the service is available should return 0 all the time.

Ok, so wouldn't it be fine for the second instance to quit if dbus_bus_request_name() returns DBUS_REQUEST_NAME_REPLY_EXISTS?
Comment 13 Thiago Macieira 2011-09-28 07:20:19 UTC
(In reply to comment #12)
> (In reply to comment #11)
> No, it doesn't daemonise. It's activated by D-Bus, calls
> dbus_bus_request_name() (with DBUS_NAME_FLAG_DO_NOT_QUEUE) on
> org.freedesktop.thumbnails.Thumbnailer1, .Cache1 and .Manager1, and if any of
> these requests returns something != DBUS_REQUEST_NAME_REPLY_PRIMARY_OWNER, it
> exits with an error. If these requests succeed, it registers GObjects for each
> of the corresponding D-Bus paths and then enters a GLib main loop.

Make sure you register all objects before you start processing the D-Bus socket. You don't want to reply with UnknownObject to calls during the startup sequence.

> So what's the D-Bus way to do this? When I wrote tumbler, simply quitting the
> second instance seemed to make a lot of sense for two reasons: (1) the first
> instance may already be busy generating thumbnails and (2) what to wait for if
> there already is an instance of the service(s) in question?

I believe this is the way:

> Ok, so wouldn't it be fine for the second instance to quit if
> dbus_bus_request_name() returns DBUS_REQUEST_NAME_REPLY_EXISTS?
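
A minimal sketch of that decision, assuming the service requests its names one at a time with DBUS_NAME_FLAG_DO_NOT_QUEUE. The enum values mirror the DBUS_REQUEST_NAME_REPLY_* constants from libdbus and are redefined only so the sketch builds without it; step_for_reply(), exit_status_for_step() and the startup_step type are hypothetical names and not tumbler's actual code.

```c
#include <assert.h>
#include <stdlib.h>

/* Reply codes of the bus's RequestName method (values as in libdbus). */
enum name_reply {
    REPLY_PRIMARY_OWNER = 1,
    REPLY_IN_QUEUE = 2,
    REPLY_EXISTS = 3,
    REPLY_ALREADY_OWNER = 4
};

/* Hypothetical outcome type, not part of any D-Bus API. */
enum startup_step {
    STEP_CONTINUE,     /* name acquired: request the next one or start serving */
    STEP_EXIT_SUCCESS, /* another instance owns the name: service is available */
    STEP_EXIT_FAILURE  /* anything else: report failure to dbus-daemon */
};

/* Decides what to do after a single RequestName reply. */
enum startup_step step_for_reply(int reply)
{
    switch (reply) {
    case REPLY_PRIMARY_OWNER:
    case REPLY_ALREADY_OWNER:
        return STEP_CONTINUE;
    case REPLY_EXISTS:
        return STEP_EXIT_SUCCESS;
    default:
        /* REPLY_IN_QUEUE is not expected with DO_NOT_QUEUE; treat it and
         * any unknown code as a failure. */
        return STEP_EXIT_FAILURE;
    }
}

/* Maps a terminal step to a process exit status. */
int exit_status_for_step(enum startup_step step)
{
    assert(step != STEP_CONTINUE);
    return step == STEP_EXIT_SUCCESS ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

The exit status then reports whether the service is available rather than whether this process managed to start it, so an instance that loses the race no longer makes StartServiceByName fail.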
Comment 14 Jannis Pohlmann 2011-09-28 14:27:03 UTC
(In reply to comment #13)
> Make sure you register all objects before you start processing the D-Bus
> socket. You don't want to reply with UnknownObject to calls during the startup
> sequence.

Done.

> > So what's the D-Bus way to do this? When I wrote tumbler, simply quitting the
> > second instance seemed to make a lot of sense for two reasons: (1) the first
> > instance may already be busy generating thumbnails and (2) what to wait for if
> > there already is an instance of the service(s) in question?
> 
> I believe this is the way:
> 
> > Ok, so wouldn't it be fine for the second instance to quit if
> > dbus_bus_request_name() returns DBUS_REQUEST_NAME_REPLY_EXISTS?

Ok, thanks Thiago! I implemented this as well, so tumbler should behave correctly now.

