Bug 62910

Summary: weston-launch leaves behind defunct procs when killed, hanging ttys
Product: Wayland Reporter: Joe Konno <joe.konno>
Component: westonAssignee: Wayland bug list <wayland-bugs>
Status: VERIFIED FIXED QA Contact:
Severity: major    
Priority: medium CC: brian.j.lovin, ullysses.a.eoff
Version: unspecified   
Hardware: All   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:

Description Joe Konno 2013-03-29 16:21:19 UTC
If one launches `weston-launch &` and then kills it, `weston-launch` leaves behind defunct processes and hangs the display. The only way to recover is to execute `weston-launch` again and "kill it properly"-- specifically, killing the 'weston' process.

  1. Source your environment, ensure no X server running (to force drm backend)
  2. From terminal,
    weston-launch &
  3. Kill it,
    killall weston-launch

The expectation is that weston-launch dies gracefully and doesn't leave the system in an unusable state. This could be a problem if a sysadmin tried to kill the weston-launch service manually or (possibly) through systemd (latter is unverified). After killing, display is hung and the only work-around requires remote access to machine, otherwise system is in an unusable state. Here are the defunct processes:

$ ps ax | grep weston
28909 tty2     Ss+    0:00 /wld/master/install/bin/weston
28911 tty2     Z+     0:00 [weston-desktop-] <defunct>
28914 pts/1    S+     0:00 grep --color=auto weston
$

The following work-around is only effective sometimes. You have to reboot the system when, e.g., weston-launch hangs on drm initialization:

  1. ssh to hung system
  2. Over ssh,
    weston-launch &
  3. Kill weston,
    killall weston
Comment 1 Kristian Høgsberg 2013-03-29 16:59:08 UTC
Right now weston-launch and weston cooperate to shut down cleanly.  Weston catches SIGSEGV and SIGABRT and cleans up as best as it can with help from weston-launch.  So killing weston is fine and should work, though SIGKILL can't be caught and doesn't let weston shut down nicely, so that will lock up the system.  Killing weston-launch isn't something we've tried to make work well and it will leave weston hanging trying to read from the socket to weston-launch while having the system stuck with keyboard disabled.

So in short, don't kill weston-launch, kill weston.  weston-launch now returns exit codes to reflect how weston exited: 0 means all is well, 10+signal means weston dies from signal and 128+signal means that weston-launch itself took a signal.

In the future, I think we'll  move some of the "dangerous stuff" out of weston and into weston-launcher.  Things like vt switch and disabling kb and switching the console into graphics mode may be better done in launcher so that no matter how weston dies, weston-launch can always clean it up.  When we do that, we'll also teach weston-launch to clean up properly when it itself crashes.
Comment 2 Joe Konno 2013-03-29 17:03:55 UTC
Appreciate the insights. Thanks, krh!

(In reply to comment #1)
> So in short, don't kill weston-launch, kill weston.  weston-launch now
> returns exit codes to reflect how weston exited: 0 means all is well,
> 10+signal means weston dies from signal and 128+signal means that
> weston-launch itself took a signal.
> 
> In the future, I think we'll  move some of the "dangerous stuff" out of
> weston and into weston-launcher.  Things like vt switch and disabling kb and
> switching the console into graphics mode may be better done in launcher so
> that no matter how weston dies, weston-launch can always clean it up.  When
> we do that, we'll also teach weston-launch to clean up properly when it
> itself crashes.
Comment 3 Rob Bradford 2013-05-03 12:46:10 UTC
I think catching SIGTERM and doing nothing (or perhaps printing a message) when launched by weston-launch might be helpful.
Comment 4 Kristian Høgsberg 2013-06-17 22:59:04 UTC
commit 1a81abb1ddb3fdc02a344624bef74710e31c0b73
Author: Kristian Høgsberg <krh@bitplanet.net>
Date:   Mon Jun 17 15:23:20 2013 -0400

    weston-launch: Don't exit on SIGTERM
    
    Instead, forward signal to weston and wait for weston to clean up nicely.
    Weston relies on weston-launch being around to shut down correctly,
    so don't exit until we get the SIGCHLD from weston.  This make
    killall weston-launch work properly.
    
    https://bugs.freedesktop.org/show_bug.cgi?id=62910
Comment 5 Joe Konno 2013-06-18 18:02:40 UTC
Almost there! When I killall weston-launch, with either SIGINT or SIGTERM, get the following:

libwayland: error marshalling arguments for leave (signature uo): null value passed for arg 1

I'm in the process of tracking down what's breaking, but when I attach to desktop-shell prior to killing, gdb registers a segv.

(In reply to comment #4)
> commit 1a81abb1ddb3fdc02a344624bef74710e31c0b73
> Author: Kristian Høgsberg <krh@bitplanet.net>
> Date:   Mon Jun 17 15:23:20 2013 -0400
> 
>     weston-launch: Don't exit on SIGTERM
>     
>     Instead, forward signal to weston and wait for weston to clean up nicely.
>     Weston relies on weston-launch being around to shut down correctly,
>     so don't exit until we get the SIGCHLD from weston.  This make
>     killall weston-launch work properly.
>     
>     https://bugs.freedesktop.org/show_bug.cgi?id=62910
Comment 6 Kristian Høgsberg 2013-06-18 18:24:33 UTC
That's unrelated, it's a different bug that Jason Ekstrand is working on a fix for.  As for the segv, this might be the fix:

diff --git a/clients/window.c b/clients/window.c
index c13d207..c1ea0ec 100644
--- a/clients/window.c
+++ b/clients/window.c
@@ -1522,7 +1522,7 @@ window_destroy(struct window *window)
 
        wl_list_for_each(input, &display->input_list, link) {
                if (input->pointer_focus == window)
-                       input->pointer_focus = NULL;
+                       input_remove_pointer_focus(input);
                if (input->keyboard_focus == window)
                        input->keyboard_focus = NULL;
                if (input->focus_widget &&

If you have time to check it out, that'd be cool.
Comment 7 U. Artie Eoff 2013-06-18 18:37:21 UTC
(In reply to comment #5)
> Almost there! When I killall weston-launch, with either SIGINT or SIGTERM,
> get the following:
> 
> libwayland: error marshalling arguments for leave (signature uo): null value
> passed for arg 1
> 

The error message above is already being tracked at https://bugs.freedesktop.org/show_bug.cgi?id=65726
Comment 8 Joe Konno 2013-06-18 19:36:19 UTC
Since the commit does fix the issue originally reported, I'll mark 'verified'. The issue identified in comment #5 will be addressed in another filing.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.