Bug 80273 - SIGSEGV in weston_wm_window_destroy
Summary: SIGSEGV in weston_wm_window_destroy
Status: RESOLVED FIXED
Alias: None
Product: Wayland
Classification: Unclassified
Component: weston (show other bugs)
Version: unspecified
Hardware: Other All
: medium normal
Assignee: Wayland bug list
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-06-20 08:22 UTC by Marek Chalupa
Modified: 2014-07-24 21:37 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
Backtrace (9.25 KB, text/plain)
2014-06-20 08:22 UTC, Marek Chalupa
Details
Patch: Solution 1 (2.33 KB, patch)
2014-06-30 10:45 UTC, Tyler Veness
Details | Splinter Review

Description Marek Chalupa 2014-06-20 08:22:41 UTC
Created attachment 101418 [details]
Backtrace

Hi,

When you do this:
1. run weston with xwayland, run xterm
2. pres ctrl and keep pressing left/right button (so that popup menu appears)

After a while the menu stops appearing and when you close the xterm, the weston gets SIGSEGV. It works in both x11 backend and drm backend.

See attached backtrace

Regards,
Marek
Comment 1 Tyler Veness 2014-06-28 03:15:27 UTC
I encountered this bug on a machine of mine last night. I reproduce it with the following:

1. Start weston with weston-launch
2. Open weston-terminal
3. Run evince from weston-terminal
4. Click on the close button on evince's window

My backtrace is essentially the same as the one already attached. At wayland-util.c:54, there is the line "elm->prev->next = elm->next;" in which both elm->prev and elm->next are NULL. Since wl_list_remove sets them to that after removing them, wl_list_remove may have been called twice, the SIGSEGV being from the second time, or wl_list_insert was never called at all and the pointers are NULL because Linux inits memory to 0 when it's malloc'ed.

There are two instances of wl_list_remove in xwayland/window-manager.c at lines 685 and 1133 respectively. I set a breakpoint at the first and it was never called. The second is where the SIGSEGV occurs. I also set a breakpoint at line 1384 to figure out if wl_list_insert was called in the first place, which wasn't the case; the first branch with "xserver_map_shell_surface" was called instead. I think the check for window->surface_id at line 1132 assumed surface_id would be set to 0 at line 684.

There are three ways I see to avoid wl_list_remove causing a SIGSEGV at line 1133. I have patches for the first two which I know fix the SIGSEGV.

Possible Fixes:

1. Add "window->surface_id = 0;" to the if branch of the if statement in Weston's source at xwayland/window-manager.c:1380. However, I don't know what this affects elsewhere in the code.

2. Add NULL pointer checks to wl_list_remove in Wayland's source at wayland-util.c:52. This would work, but it sort of acts like a band-aid for the real problem, which is in Weston. One could argue that wayland shouldn't crash due to the negligence of the display compositor, but the C++ STL is like wayland in that it will crash or leak memory if improperly managed.

3. Add a way to keep track of which branch the execution path took in Weston's source at xwayland/window-manager.c:1380. This would essentially provide a way to determine whether the window is in the unpaired_window_list or not. I can't think of a clean way to do it though (i.e. without a global variable).
Comment 2 Pekka Paalanen 2014-06-30 06:06:42 UTC
(In reply to comment #1)
> Possible Fixes:
> 
> 2. Add NULL pointer checks to wl_list_remove in Wayland's source at
> wayland-util.c:52. This would work, but it sort of acts like a band-aid for
> the real problem, which is in Weston. One could argue that wayland shouldn't
> crash due to the negligence of the display compositor, but the C++ STL is
> like wayland in that it will crash or leak memory if improperly managed.

Yeah, at least this option 2 is invalid. wl_list functions assume that they are always used right: a double-remove is illegal. Even the set-to-NULL here is only a debug aid to make sure the program crashes rather than silently corrupts memory. Also checking for wl_list::next/prev for NULL is invalid if expected to be set by libwayland.
Comment 3 Tyler Veness 2014-06-30 10:45:01 UTC
Created attachment 102008 [details] [review]
Patch: Solution 1

This patch assumes that window->surface_id is being used as a flag for determining if wl_list_remove should be called when the window is destroyed and that window->surface_id is not used for any other purpose.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.