Bug 82957 - Weston crash when weston-desktop-shell and weston-keyboard missing
Summary: Weston crash when weston-desktop-shell and weston-keyboard missing
Status: RESOLVED FIXED
Alias: None
Product: Wayland
Classification: Unclassified
Component: weston (show other bugs)
Version: 1.5.0
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Pekka Paalanen
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-08-22 16:31 UTC by Pekka Paalanen
Modified: 2014-09-01 06:59 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments

Description Pekka Paalanen 2014-08-22 16:31:55 UTC
If weston cannot find both weston-desktop-shell and weston-keyboard, it will crash as below. In my tests, I could not reproduce the crash if only one of them was missing.

[19:28:39.062] Loading module '/home/pq/local/lib/weston/desktop-shell.so'
[19:28:39.071] Loading module '/home/pq/local/lib/weston/xwayland.so'
[19:28:39.084] unlinking stale lock file /tmp/.X1-lock
[19:28:39.085] xserver listening on display :1
[19:28:39.098] launching '/home/pq/local/libexec/weston-desktop-shell'
[19:28:39.099] [19:28:39.099] input_method died, respawning...
compositor: executing '/home/pq/local/libexec/weston-desktop-shell' failed: No such file or directory
[19:28:39.099] launching '/home/pq/local/libexec/weston-keyboard'
[19:28:39.100] /home/pq/local/libexec/weston-desktop-shell died, respawning...
[19:28:39.100] launching '/home/pq/local/libexec/weston-desktop-shell'
[19:28:39.100] compositor: executing '/home/pq/local/libexec/weston-keyboard' failed: No such file or directory
[19:28:39.101] compositor: executing '/home/pq/local/libexec/weston-desktop-shell' failed: No such file or directory

Program received signal SIGSEGV, Segmentation fault.
0x000003e8000037eb in ?? ()
(gdb) bt
#0  0x000003e8000037eb in ?? ()
#1  0x00007ffff7bd231f in wl_signal_emit (data=0xab1a80, signal=0xab1af0) at src/wayland-server.h:261
#2  wl_client_destroy (client=0xab1a80) at src/wayland-server.c:667
#3  0x00007ffff7bd24e9 in wl_client_connection_data (fd=<optimized out>, mask=<optimized out>, data=0xab1a80)
    at src/wayland-server.c:348
#4  0x00007ffff7bd43e0 in wl_event_loop_dispatch (loop=0x6281c0, timeout=<optimized out>) at src/event-loop.c:419
#5  0x00007ffff7bd2a65 in wl_display_run (display=0x628130) at src/wayland-server.c:986
#6  0x0000000000408774 in main (argc=1, argv=0x7fffffffdcf8) at src/compositor.c:4420
Comment 1 Boyan Ding 2014-08-23 06:13:24 UTC
Well, the problem here is a quite tricky race condition.

If weston-desktop-shell isn't found (or dies), desktop-shell will do two things concurrently -- emitting the client's destroy signal and try to launch shell client again. If the destroy signal is emitted first, everything is okay. If a new shell client is launched first (calling launch_desktop_shell_process), things will go wrong because the shell->child.client_destroy_listener is registered the second time, breaking the listening list of the dead client's destroy signal. When destroy signal is finally emitted, iteration on the broken list will cause illegal invocation and the program will crash.

I guess the missing of two programs only increases the chance of race and sometimes weston won't crash in my observation.

I'll come up with a patch soon.
Comment 2 Boyan Ding 2014-08-23 06:35:24 UTC
I've just found that the only thing desktop_shell_client_destroy() do is setting shell->child.client to NULL and that's also done in desktop_shell_sigchld(). I think just getting rid of desktop_shell_client_destroy() and the client's destroy listener will solve this problem (and potentially many others).
Comment 3 Derek Foreman 2014-08-25 17:41:00 UTC
Someone on irc (rawoul) explained to me that the destroy listener is actually required because if it doesn't set client = NULL then something else might try to free it later.  This apparently can lead to a segfault on shutdown - I've not tried to reproduce this issue myself.

I've also tried removing the sigchld handler and pushing all its functionality into the destroy listener - that fails in other interesting ways:
- p->cleanup is always called, even if null. (need a no-op handler or to relax that constraint)

- if the destroy handler tries to add the destroy listener to the newly launched client's destroy notifiers it'll screw up the current pass through the old process's destroy handler list.

- the process struct (shell->child.process) is having its pid set to 0 in the destroy handler - this results in an incorrect log message when weston's sigchld handler can't find the dead process' pid.

I've tried deferring the call to launch_desktop_shell_process() with wl_event_loop_add_idle() but I don't know how to convince myself that the idle timer can't ever launch the new process before compositor.c processes the sigchld. (which would again result in a spurious unknown process error in the log)
Comment 4 Pekka Paalanen 2014-08-27 10:45:42 UTC
I sent a series, that supposedly fixes this and another related issue:
http://lists.freedesktop.org/archives/wayland-devel/2014-August/016901.html
Comment 5 Pekka Paalanen 2014-09-01 06:59:55 UTC
The three patches are upstream now.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.