Bug 23213 - Xserver terminates immediately without running hald (regression 1.6.2 --> 1.6.3)
Status: RESOLVED FIXED
Product: xorg
Classification: Unclassified
Component: Server/General
Version: unspecified
Hardware: Other All
Importance: medium critical
Assigned To: Havoc Pennington
QA Contact: John (J5) Palmieri
Duplicates: 23850
Reported: 2009-08-08 02:59 UTC by Stefan Dirsch
Modified: 2009-09-10 13:23 UTC

Attachments
0001-config-don-t-shutdown-the-libhal-ctx-if-it-failed-to.patch (2.17 KB, patch)
2009-08-13 16:51 UTC, Peter Hutterer

Description Stefan Dirsch 2009-08-08 02:59:17 UTC
After updating xorg-server from 1.6.2 to 1.6.3, the Xserver terminates immediately if hald is not running (yet).

# X -verbose 7
[...]
(EE) config/hal: couldn't initialise context: unknown error (null)
process 30383: Attempt to remove filter function 0x7f6fa1696f90 user data 0xadbdc0, but no such filter has been added
  D-Bus not built with -rdynamic so unable to print a backtrace
Aborted

Suspicious commits:
Comment 1 Stefan Dirsch 2009-08-08 03:04:12 UTC
Suspicious commits:

commit 546f913ff5461dd93d4a0b29b24d2267557326c7
Author: Alan Coopersmith <alan.coopersmith@sun.com>
Date:   Fri May 8 21:31:01 2009 -0700

    Don't printf NULL pointers on HAL connection error

commit c941479ecc2dead9c3deaee2620c9b9518c3da9a
Author: Rémi Cardona <remi@gentoo.org>
Date:   Mon Jul 27 12:07:51 2009 +0200

    config: add HAL error checks

Comment 2 Stefan Dirsch 2009-08-09 12:54:32 UTC
Reverting

commit c941479ecc2dead9c3deaee2620c9b9518c3da9a
Author: Rémi Cardona <remi@gentoo.org>
Date:   Mon Jul 27 12:07:51 2009 +0200

    config: add HAL error checks

fixes the issue.
Comment 3 Rémi Cardona 2009-08-09 22:40:56 UTC
Will look into it ASAP.

Thanks
Comment 4 Peter Hutterer 2009-08-12 18:41:34 UTC
I can't reproduce this with 1.6.3 in fedora or git master. Anything special I need aside from shutting down HAL?
Comment 5 Stefan Dirsch 2009-08-12 19:24:35 UTC
No, hald not running is enough.
Comment 6 Stefan Dirsch 2009-08-12 19:40:30 UTC
Oops. I just noticed that various hald 'helpers' are running when one starts the hald init script (at least on SuSE):

/usr/sbin/hald --daemon=yes        
hald-runner         
hald-addon-usb-csr: listening on 'MX1000 Laser Mouse'    
hald-addon-input: Listening on /dev/input/event0 /dev/input/event4 \
  /dev/input/event3    
hald-addon-storage: polling /dev/sr0 (every 16 sec)    
/usr/lib/hal/hald-addon-cpufreq         
hald-addon-acpi: listening on acpid socket /var/run/acpid.socket

If you kill either hald or hald-runner the issue occurs. Killing the others doesn't trigger the issue. Hope this helps.
Comment 7 Rémi Cardona 2009-08-12 23:52:17 UTC
Like Peter, I can't reproduce this on Gentoo unstable.

Could you try getting a backtrace when the server aborts?

Thanks
Comment 8 Stefan Dirsch 2009-08-13 09:10:35 UTC
(EE) config/hal: couldn't initialise context: unknown error (null)
process 9948: Attempt to remove filter function 0xb7f9cb60 user data 0x845b7c0, but no such filter has been added
  D-Bus not built with -rdynamic so unable to print a backtrace

Program received signal SIGABRT, Aborted.
0xffffe424 in __kernel_vsyscall ()
(gdb) bt
#0  0xffffe424 in __kernel_vsyscall ()
#1  0xb7c2c0bf in raise () from /lib/libc.so.6
#2  0xb7c2d9d7 in abort () from /lib/libc.so.6
#3  0xb7f7a6f5 in ?? () from /lib/libdbus-1.so.3
#4  0xb7f75da1 in ?? () from /lib/libdbus-1.so.3
#5  0xb7f59528 in dbus_connection_remove_filter () from /lib/libdbus-1.so.3
#6  0xb7f9c57b in libhal_ctx_shutdown () from /usr/lib/libhal.so.1
#7  0x080c6645 in connect_and_register (connection=0x8459af8, info=0x829beac) at hal.c:530
#8  0x080c698c in connect_hook (connection=0x8459af8, data=0x829beac) at hal.c:639
#9  0x080c5195 in connect_to_bus () at dbus-core.c:172
#10 0x080c51e2 in reconnect_timer (timer=0x82a4418, time=3807643, arg=0x0) at dbus-core.c:191
#11 0x08177073 in DoTimer (timer=0x82a4418, now=3807643, prev=0x829f44c) at WaitFor.c:425
#12 0x081767d4 in WaitForSomething (pClientsReady=0x8458f38) at WaitFor.c:277
#13 0x0808d9a1 in Dispatch () at dispatch.c:367
#14 0x080717a8 in main (argc=1, argv=0xbff67184, envp=0xbff6718c) at main.c:397
(gdb) 
Comment 9 Rémi Cardona 2009-08-13 15:07:05 UTC
Unless the X code or HAL is mis-using the dbus API, this looks like a dbus bug.

Thanks
Comment 10 Alan Coopersmith 2009-08-13 15:17:09 UTC
Since dbus also uses the freedesktop.org Bugzilla, changing this from Xorg:NOTOURBUG to dbus.
Comment 11 Peter Hutterer 2009-08-13 16:51:55 UTC
Created attachment 28609 [details] [review]
0001-config-don-t-shutdown-the-libhal-ctx-if-it-failed-to.patch

untested patch!

I think this should address the issue. I doubt it's a dbus problem, though, since it showed up between 1.6.2 and 1.6.3.
Anyway, it looks like we're calling libhal_ctx_shutdown even if libhal_ctx_init failed, and we might not be supposed to do that. Can you give this patch a try to see if it fixes the issue?
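
For readers following along, here is a minimal sketch of the control-flow change described in this comment. It is not the attached patch and does not use the real libhal or D-Bus APIs: every `toy_*` name and both `connect_*` functions are hypothetical stand-ins. It also assumes that initialisation fails before any D-Bus filter is registered, which is what the "no such filter has been added" message suggests but which this thread does not confirm.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a HAL context; not the real LibHalContext. */
struct toy_ctx {
    bool filter_added;  /* set only when init succeeds */
    int  bad_removals;  /* removals of a filter that was never added */
};

/* Models init: the filter is registered only if the daemon is reachable
 * (assumption: failure happens before the filter is added). */
bool toy_ctx_init(struct toy_ctx *ctx, bool daemon_running)
{
    if (!daemon_running)
        return false;
    ctx->filter_added = true;
    return true;
}

/* Models shutdown: it always tries to remove the filter. */
void toy_ctx_shutdown(struct toy_ctx *ctx)
{
    if (!ctx->filter_added) {
        /* In the reported crash, the D-Bus library aborts at this point;
         * the sketch only counts the event. */
        ctx->bad_removals++;
        return;
    }
    ctx->filter_added = false;
}

/* Flow resembling the regression: the error path shuts the context down
 * even though init never succeeded. Returns the number of bad removals. */
int connect_old(bool daemon_running)
{
    struct toy_ctx ctx = { false, 0 };
    if (!toy_ctx_init(&ctx, daemon_running)) {
        toy_ctx_shutdown(&ctx);
        return ctx.bad_removals;
    }
    toy_ctx_shutdown(&ctx);  /* regular teardown after a successful init */
    return ctx.bad_removals;
}

/* Flow resembling the fix: skip shutdown when init failed. */
int connect_fixed(bool daemon_running)
{
    struct toy_ctx ctx = { false, 0 };
    if (!toy_ctx_init(&ctx, daemon_running))
        return ctx.bad_removals;  /* nothing was registered, nothing to undo */
    toy_ctx_shutdown(&ctx);
    return ctx.bad_removals;
}
```

In this model, `connect_old(false)` records one invalid filter removal (the stand-in for the abort), while `connect_fixed(false)` records none. Both behave the same when the daemon is running.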
Comment 12 Stefan Dirsch 2009-08-13 17:32:21 UTC
Thanks, Peter. Your patch fixes the issue again. With your new patch I no longer need to revert commit c941479ecc2dead9c3deaee2620c9b9518c3da9a.
Comment 13 Peter Hutterer 2009-08-13 18:02:46 UTC
Pushed as 49046088f10cceaea7da97401d742d3fb59371f5. Thanks for testing.

Please nominate this patch for the 1.6 branch
Comment 14 Stefan Dirsch 2009-08-13 18:14:40 UTC
Thanks, Peter. How does nominating a patch currently work?
Comment 15 Peter Hutterer 2009-08-13 18:20:14 UTC
Just add it to http://www.x.org/wiki/Server16Branch
Comment 16 Stefan Dirsch 2009-08-13 21:39:20 UTC
Thanks. Nomination for 1.6 branch is done now.
Comment 17 Alan Coopersmith 2009-09-10 13:23:50 UTC
*** Bug 23850 has been marked as a duplicate of this bug. ***