Bug 14121

Summary: Hal enabled xorg-server crash on dbus/hal restart, leaving distorted image
Product: xorg Reporter: Sascha Hlusiak <bugs>
Component: Server/Input/CoreAssignee: Xorg Project Team <xorg-team>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: critical    
Priority: medium CC: elylevy
Version: 7.3 (2007.09)   
Hardware: x86 (IA32)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Bug Depends on:    
Bug Blocks: 12560    
Attachments:
Description Flags
Doublecheck all connections before DBus kills us with an assert.
none
patches to stop dbus from bringing down the server. none

Description Sascha Hlusiak 2008-01-18 03:32:14 UTC
I am Using gentoo's xorg-server-1.4.0.90-r1 and using hal to automatically add devices which is working fine so far.

When I restart dbus (which restarts hal), my running X-Server crashes. KDM comes up again but the keyboard is not working anymore and the terminals (vt1-vt6) show red lines from top to bottom but no image. I'm using xf86-video-i810 on the 855gm chipset (The intel image distortion bug is reported somewhere else). 

It's unacceptable that it crashes but it's also unacceptable that it does not recover from the crash and I need to reboot. 

The /var/log/kdm.log contains this after the crash:

libhal.c 3506 : Error unsubscribing to signals, error=Connection is closed
process 2003: arguments to dbus_connection_get_dispatch_status() were
incorrect, assertion "connection != NULL" fail
ed in file dbus-connection.c line 4049.
This is normally a bug in some application using the D-Bus library.
  D-Bus not built with -rdynamic so unable to print a backtrace
Comment 1 E L 2008-01-23 08:37:53 UTC
happens to me as well.
In my case it does recover after the crash but the keyboard map is acting weird until I restart dbus/hal.

I use xserver 1.4.0.90 with dbus 1.1.4 and hal 0.5.10 
And I'm also on gentoo.
Comment 2 Peter Hutterer 2008-01-24 19:42:41 UTC
Created attachment 13915 [details] [review]
Doublecheck all connections before DBus kills us with an assert.

Please try the attached patched, they stop the errors for me.

Problem is simply the libdbus insists on using assert to kill us all if something isn't quite right. So if we try to shutdown hal when the connection is already down, libdbus kills us.
If we try to get the dispatch status of the current message and the connection is already down, libdbus kills us.
Comment 3 Daniel Stone 2008-01-24 20:00:58 UTC
Thanks Peter! They mostly look good, but the first one needs to also have an if (info->connection) guard _inside_ the do { } while block, else you get:
info->connection is NULL
do:
   dbus_blah(info->connection);
     ha ha, assert!

I'd actually fixed this in my Nokia tree, but forgot to cherry-pick the fix.  Please feel free to merge, and thanks again for picking up after me while I'm sans connectivity. :)
Comment 4 Peter Hutterer 2008-01-24 20:26:58 UTC
(In reply to comment #3)
> Thanks Peter! They mostly look good, but the first one needs to also have an if
> (info->connection) guard _inside_ the do { } while block, else you get:
> info->connection is NULL
> do:
>    dbus_blah(info->connection);
>      ha ha, assert!

I don't think so. Look at the context

if (info-connection != NULL)
   do
       dbus_oh_god_please_dont_kill_me_please_please_please()
   while(info->connection)

the first run is guarded and from then on the loop protects us.
Comment 5 Daniel Stone 2008-01-24 20:42:14 UTC
Oh, right, I'm just being daft.  Must've just meant to cherry-pick the while (info->connection) fix then, 'cause you can get disconnected after a dispatch call.  Thanks!
Comment 6 Sascha Hlusiak 2008-01-25 04:17:08 UTC
The patches fix it only half way for me. Stopping dbus+hald is no problem anymore, but when starting dbus again, X shuts down after about 1 second. I find this message in the log file:

[config/hal] couldn't initialise context: (null) ((null))
Comment 7 Peter Hutterer 2008-01-25 13:44:44 UTC
(In reply to comment #6)
> The patches fix it only half way for me. Stopping dbus+hald is no problem
> anymore, but when starting dbus again, X shuts down after about 1 second. I
> find this message in the log file:
> 
> [config/hal] couldn't initialise context: (null) ((null))
> 

Yeah, I found that yesterday too but ran out of time to fix it. the dbus API doesn't init properly again either. I'll get the patch out ASAP.
Comment 8 Peter Hutterer 2008-01-25 15:33:53 UTC
Created attachment 13951 [details] [review]
patches to stop dbus from bringing down the server.
Comment 9 Peter Hutterer 2008-01-25 15:39:41 UTC
(In reply to comment #6)
> The patches fix it only half way for me. Stopping dbus+hald is no problem
> anymore, but when starting dbus again, X shuts down after about 1 second. I
> find this message in the log file:
> 
> [config/hal] couldn't initialise context: (null) ((null))

Ok, I had a look but I couldn't reproduce your bug. The errors I got was from re-initialising the dbus-api, they are fixed with the new patch.

I think you're hitting a race condition. If hal isn't up yet but dbus is running, then initialising the hal layer will fail. 

This shouldn't bring down the server though, so anything reproducable (or more extensive log messages) would help greatly. if you define DEBUG when compiling xserver/config/ then there should me more spew to the console.

Comment 10 Sascha Hlusiak 2008-01-27 11:09:02 UTC
Thanks, attachment 13951 [details] [review] seems to fix it. I tried a lot with restarting dbus and hald and I can't reproduce the bug anymore.

If dbus is up but hald is not, it seems not to connect to hal anymore, even if it comes up later. So restarting dbus (which itself stops and starts hald on gentoo) can cause a tiny race condition, if you care. But it's not a crasher.
Comment 11 Peter Hutterer 2008-01-28 14:55:03 UTC
(In reply to comment #10)
> Thanks, attachment 13951 [details] [review] seems to fix it. I tried a lot with restarting dbus
> and hald and I can't reproduce the bug anymore.
> 
> If dbus is up but hald is not, it seems not to connect to hal anymore, even if
> it comes up later. So restarting dbus (which itself stops and starts hald on
> gentoo) can cause a tiny race condition, if you care. But it's not a crasher.
> 

pushed as commits 
975ab11799c819a81da1dfe83505194410dbcb95
7dde5a694a06efed0a9186f05d33f5be6f5dba71
f0ba7707161b8866e6fde32d6f25be6afcdecb48
2cb0ebec2b85d96289c23c17cfdcdf97ef6877d2

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.