Bug 65961 - SIGSEGV in weston-desktop-shell when client is stopped
Status: VERIFIED FIXED
Alias: None
Product: Wayland
Classification: Unclassified
Component: weston
Version: unspecified
Hardware: Other All
Importance: medium critical
Assignee: Wayland bug list
Duplicates: 66001
Reported: 2013-06-20 10:00 UTC by Marek Chalupa
Modified: 2013-07-08 21:56 UTC (History)
Attachments
Check if widget is not NULL (741 bytes, text/plain)
2013-06-20 10:00 UTC, Marek Chalupa

Description Marek Chalupa 2013-06-20 10:00:23 UTC
Created attachment 81102
Check if widget is not NULL

Hey,
I hit a bug that resulted in a SIGTRAP in weston (after weston-desktop-shell had been respawned a few times following a SIGSEGV):

[11:15:19.134] 0: ./weston (on_caught_signal+0x17) [0x409517]
[11:15:19.134] 1: /lib64/libpthread.so.0 (__restore_rt+0x0) [0x35a7c0ef9f]
[11:15:19.135] 2: /home/marek/wayland/install/lib/libwayland-server.so.0 (wl_resource_post_event+0x66) [0x7fdbd550b836]
[11:15:19.136] 3: /home/marek/wayland/install/lib/weston/desktop-shell.so (shell_grab_start+0x7d) [0x7fdbd224831d]
[11:15:19.136] 4: ./weston (weston_pointer_set_focus+0x147) [0x40e3f7]
[11:15:19.136] 5: ./weston (default_grab_focus+0x5b) [0x40e53b]
[11:15:19.136] 6: ./weston (notify_motion+0x4d) [0x40e8fd]
[11:15:19.136] 7: /home/marek/wayland/install/lib/weston/x11-backend.so (x11_compositor_handle_event+0x43e) [0x7fdbd486b2fe]
[11:15:19.137] 8: /home/marek/wayland/install/lib/libwayland-server.so.0 (wl_event_loop_dispatch+0x62) [0x7fdbd550d8f2]
[11:15:19.137] 9: ./weston (weston_compositor_read_input+0x12) [0x408b92]
[11:15:19.137] 10: /home/marek/wayland/install/lib/libwayland-server.so.0 (wl_event_loop_dispatch+0x62) [0x7fdbd550d8f2]
[11:15:19.137] 11: /home/marek/wayland/install/lib/libwayland-server.so.0 (wl_display_run+0x25) [0x7fdbd550c535]
[11:15:19.137] 12: ./weston (main+0x537) [0x408277]
[11:15:19.194] 13: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x3f86421b75]
[11:15:19.194] 14: ./weston (_start+0x29) [0x408469]
[11:15:19.195] 15: ? (?+0x29) [0x29]

The bug appears when you run a client, suspend it (^Z in my case), and then hover over it with the mouse, click on it, or otherwise interact with it.
Additionally, when you use the right button, you can spin the window around its axis (or move it, e.g. with the smoke widget). I'm new to wayland, so I'm not sure whether that's normal behavior.

I tried to track it down in weston-desktop-shell, and the attached patch worked for me (as far as the SIGSEGV is concerned).

Cheers,
mch
Comment 1 U. Artie Eoff 2013-06-20 15:41:59 UTC
I've also recently noticed that weston-desktop-shell crashes and restarts.  I never did investigate the cause though. Good find.
Comment 2 U. Artie Eoff 2013-06-21 15:30:10 UTC
*** Bug 66001 has been marked as a duplicate of this bug. ***
Comment 3 Rob Bradford 2013-06-26 16:08:17 UTC
It looks like we're getting into the situation where window->main_surface is not in the window->subsurface_list.

This is a bit peculiar...
Comment 4 Rob Bradford 2013-06-26 16:13:45 UTC
But the attached patch does indeed solve the problem.
Comment 5 Kristian Høgsberg 2013-07-04 05:12:50 UTC
There are a few crashes at work here.  First, the weston crash happens when the desktop-shell helper client crashes too fast and we give up trying to re-launch it.  At that point shell.c doesn't have a desktop-shell to talk to.  It tries anyway to set the grab cursor (for example, when moving a window or setting the busy cursor) and crashes.  If the desktop-shell client is gone, the best we can do is to just not set the cursor:

commit c9974a0796fe2934299f10dc3a879d29c7045859
Author: Kristian Høgsberg <krh@bitplanet.net>
Date:   Wed Jul 3 19:24:57 2013 -0400

    shell: Dont set grab cursor if desktop-shell client died
    
    If we don't have a desktop-shell helper client, don't try to send events
    to it.
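
The guard described in that commit message can be illustrated with a minimal, self-contained sketch. This is not the code from the commit: `struct helper_client`, `struct shell`, and `shell_set_grab_cursor` are hypothetical stand-ins that only model "skip the cursor event when there is no helper client to receive it".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the desktop-shell helper client.
 * It only records what would have been sent to it. */
struct helper_client {
    int cursor_events_sent;
    int last_cursor;
};

/* Hypothetical stand-in for the shell state in shell.c. */
struct shell {
    /* NULL once the helper has died and is no longer re-launched. */
    struct helper_client *helper;
};

/* Ask the helper client to show a grab cursor.
 * Returns 1 if an event was "sent", 0 if it was skipped because
 * there is no helper client to talk to. */
static int shell_set_grab_cursor(struct shell *shell, int cursor)
{
    assert(shell != NULL);

    /* Without this check the code would dereference a dead helper. */
    if (shell->helper == NULL)
        return 0;

    shell->helper->last_cursor = cursor;
    shell->helper->cursor_events_sent++;
    return 1;
}
```

With a live helper the call records the cursor and returns 1; once `helper` is NULL the call returns 0 and touches nothing, so the grab still starts but no cursor is set.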

Next, the reason desktop-shell dies in the first place is that it gets a motion event for the cursor dummy surface with coordinates outside that surface.  That's why the widget is NULL, but this shouldn't happen, so the NULL check in attachment 81102 is papering over a deeper problem.  What happens is that the default_grab focus handler ends up triggering handle_pointer_focus in shell.c, which recognizes the unresponsive surface and starts the busy cursor grab.  However, when we return to notify_motion in input.c, the 'interface' local var still holds a cached pointer to the previous grab (the default grab), so we call default_grab_motion even though the current grab is now the busy cursor grab.  This means that the cursor surface gets a motion event, which should never happen and which triggers the desktop-shell crash:

commit da751b8f9a16177b56399f10ca193b4c8b746ad8
Author: Kristian Høgsberg <krh@bitplanet.net>
Date:   Thu Jul 4 00:58:07 2013 -0400

    input: Don't cache pointer grab interface between calls to focus and motion
    
    The focus callback for the current grab can change the grab, so we have
    to make sure we call the motion callback for the currently active grab.
    
    https://bugs.freedesktop.org/show_bug.cgi?id=65961
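
The stale-pointer problem behind that commit can be reproduced in a small model. The sketch below is an illustration under stated assumptions, not the weston source: the struct layouts, the counters, and the names `notify_motion_cached` and `notify_motion_reread` are hypothetical, and the real callbacks take more arguments. It only shows why the grab interface has to be re-read after the focus callback runs.

```c
#include <assert.h>
#include <stddef.h>

struct pointer;

/* Simplified grab interface: just the two callbacks discussed above. */
struct grab_interface {
    void (*focus)(struct pointer *p);
    void (*motion)(struct pointer *p);
};

/* Simplified pointer state (hypothetical fields for illustration). */
struct pointer {
    const struct grab_interface *grab; /* currently active grab */
    int default_motions;               /* motion events via the default grab */
    int busy_motions;                  /* motion events via the busy grab */
    int client_unresponsive;           /* nonzero if the client is stopped */
};

static void busy_focus(struct pointer *p) { (void)p; }
static void busy_motion(struct pointer *p) { p->busy_motions++; }
static const struct grab_interface busy_grab = { busy_focus, busy_motion };

/* The default focus handler may replace the active grab, as happens
 * when the shell detects an unresponsive surface. */
static void default_focus(struct pointer *p)
{
    if (p->client_unresponsive)
        p->grab = &busy_grab;
}
static void default_motion(struct pointer *p) { p->default_motions++; }
static const struct grab_interface default_grab = { default_focus, default_motion };

/* Buggy pattern: the interface is cached before focus() runs, so
 * motion() goes to the old grab even if focus() switched grabs. */
static void notify_motion_cached(struct pointer *p)
{
    assert(p != NULL && p->grab != NULL);
    const struct grab_interface *interface = p->grab;
    interface->focus(p);
    interface->motion(p);
}

/* Fixed pattern: re-read the active grab for each callback. */
static void notify_motion_reread(struct pointer *p)
{
    assert(p != NULL && p->grab != NULL);
    p->grab->focus(p);
    p->grab->motion(p);
}
```

In this model, with `client_unresponsive` set, the cached variant delivers the motion event through the default grab (the path that ends up hitting the cursor surface), while the re-reading variant delivers it through the busy grab.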

