Bug 79609

Summary: VT switching crashes Xwayland and weston
Product: Wayland Reporter: Boyan Ding <stu_dby>
Component: XWaylandAssignee: Wayland bug list <wayland-bugs>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium CC: vebveb
Version: 1.5.0   
Hardware: All   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments: Weston launch
Fixes a segfault in Wayland in the event that XWayland crashes
Checks for some null pointers in XWayland to avoid segfaulting
A printout of the values in the struct pointed to by DeviceIntPtr
proposed patch
Updated patch

Description Boyan Ding 2014-06-04 00:29:15 UTC
Xwayland will crash under some actions on my computer.

Steps to reproduce:
1. weston-launch from a VT
2. Start weston-terminal
3. Launch an app that needs X from weston-terminal (glxgears for example)
4. Switch to another VT and switch back
5. Move mouse onto the X window (glxgears)

Then that X window will fade out (crashes), after the X window fade out completely, weston will crash too.
It is 100 percent reproducible on my machine

Environment:
Weston 1.5.0
Xwayland (HEAD) 1.15.99.902-0-g7ca4584
mesa 10.2-rc1

Following is the weston log generated from the steps above
Comment 1 Boyan Ding 2014-06-04 00:29:50 UTC
Created attachment 100374 [details]
Weston launch
Comment 2 Boyan Ding 2014-06-27 02:10:16 UTC
I digged a bit more into this issue and discovered the following info, hope it helps. I built a new version of xwayland with lower level of optimization and without stripping debug info (The log I post before is somewhat unclear because of optimization and stripping). Backtrace is as follows:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000491851 in miPointerSetCursorPosition ()
(gdb) bt
#0  0x0000000000491851 in miPointerSetCursorPosition ()
#1  0x000000000050017d in AnimCurSetCursorPosition ()
#2  0x00000000004337bb in pointer_handle_enter ()
#3  0x00007f732e3dcdf0 in ffi_call_unix64 () from /usr/lib/libffi.so.6
#4  0x00007f732e3dc861 in ffi_call () from /usr/lib/libffi.so.6
#5  0x00007f733053651d in ?? () from /usr/lib/libwayland-client.so.0
#6  0x00007f73305338cb in ?? () from /usr/lib/libwayland-client.so.0
#7  0x00007f7330533954 in ?? () from /usr/lib/libwayland-client.so.0
#8  0x00007f7330534684 in wl_display_dispatch_queue_pending ()
   from /usr/lib/libwayland-client.so.0
#9  0x0000000000431bdb in wakeup_handler ()
#10 0x0000000000567d35 in WakeupHandler ()
#11 0x0000000000591137 in WaitForSomething ()
#12 0x000000000056387e in Dispatch ()
#13 0x00000000005672cd in dix_main ()
#14 0x0000000000553d52 in main ()

and the SIGSESV is caused by
	   miPointerPtr pPointer = MIPOINTER(pDev); (line 262 of mi/mipointer.c)
where MIPOINTER(pDev) returns 0
Comment 3 Tyler Veness 2014-06-27 05:33:11 UTC
Created attachment 101836 [details] [review]
Fixes a segfault in Wayland in the event that XWayland crashes
Comment 4 Tyler Veness 2014-06-27 05:52:35 UTC
Created attachment 101838 [details] [review]
Checks for some null pointers in XWayland to avoid segfaulting
Comment 5 Tyler Veness 2014-06-27 06:14:06 UTC
Created attachment 101839 [details]
A printout of the values in the struct pointed to by DeviceIntPtr
Comment 6 Tyler Veness 2014-06-27 06:17:19 UTC
I noticed this myself and have been trying to track down the problem for two days now. I've been working off of the lastest commit to master in xserver as of yesterday. My backtrace of Weston is as follows:

Weston backtrace:
#0  0x00007fe832fb7b26 in wl_event_loop_add_idle (loop=0x0, func=0x7fe82c3010b9 <weston_wm_window_draw_decoration>, data=0x2670210) at src/event-loop.c:301
#1  0x00007fe82c301514 in weston_wm_window_schedule_repaint (window=0x2670210) at xwayland/window-manager.c:1045
#2  0x00007fe82c30099f in weston_wm_window_activate (listener=0x266f290, data=0x26f9cf0) at xwayland/window-manager.c:727
#3  0x0000000000411bd6 in wl_signal_emit (signal=0x25e93e8, data=0x26f9cf0) at /usr/include/wayland-server.h:260
#4  0x0000000000414124 in weston_surface_activate (surface=0x26f9cf0, seat=0x2669100) at src/input.c:1009
#5  0x00007fe82c520570 in activate (shell=0x25ebdf0, es=0x26f9cf0, seat=0x2669100, configure=true) at desktop-shell/shell.c:4578
#6  0x00007fe82c518344 in focus_state_surface_destroy (listener=0x2a8b560, data=0x2ad02e0) at desktop-shell/shell.c:695
#7  0x0000000000407b93 in wl_signal_emit (signal=0x2ad02e8, data=0x2ad02e0) at /usr/include/wayland-server.h:260
#8  0x000000000040b3f3 in weston_surface_destroy (surface=0x2ad02e0) at src/compositor.c:1406
#9  0x00007fe82c51db47 in fade_out_done (animation=0x26f5c10, data=0x2ad04d0) at desktop-shell/shell.c:3199
#10 0x000000000041e1a8 in weston_view_animation_destroy (animation=0x26f5c10) at src/animation.c:143
#11 0x000000000041e27d in weston_view_animation_frame (base=0x26f5c18, output=0x2687020, msecs=28460436) at src/animation.c:174
#12 0x000000000040c67a in weston_output_repaint (output=0x2687020, msecs=28460436) at src/compositor.c:1832
#13 0x000000000040c760 in weston_output_finish_frame (output=0x2687020, msecs=28460436) at src/compositor.c:1861
#14 0x00007fe83159285f in page_flip_handler (fd=12, frame=425573, sec=28460, usec=436680, data=0x2687020) at src/compositor-drm.c:761
#15 0x00007fe830d68e6e in drmHandleEvent () from /usr/lib/libdrm.so.2
#16 0x00007fe831593d81 in on_drm_input (fd=12, mask=1, data=0x25e9360) at src/compositor-drm.c:1259
#17 0x00007fe832fb756b in wl_event_source_fd_dispatch (source=0x2690a50, ep=0x7fff913542bc) at src/event-loop.c:86
#18 0x00007fe832fb7f19 in wl_event_loop_dispatch (loop=0x25e0220, timeout=-1) at src/event-loop.c:419
#19 0x00007fe832fb5d10 in wl_display_run (display=0x25e0190) at src/wayland-server.c:969
#20 0x0000000000411a71 in main (argc=1, argv=0x7fff91354768) at src/compositor.c:4378

weston passes in wm->server->loop from a weston_wm* to wayland's wl_event_loop_add_idle(). wayland dereferences some members from it and segfaults. I applied the patch in my first attachment to wayland to study the problem further. I also got a backtrace of XWayland but with line numbers as well:

XWayland backtrace:
#0  0x000000000046ffd6 in miPointerSetCursorPosition (pDev=0x2c94350, pScreen=0x27f42b0, x=33, y=167, generateEvent=1)
    at mipointer.c:264
#1  0x0000000000514aa6 in AnimCurSetCursorPosition (pDev=0x2c94350, pScreen=0x27f42b0, x=33, y=167, generateEvent=1)
    at animcur.c:245
#2  0x000000000042f742 in pointer_handle_enter (data=0x2b0abe0, pointer=0x2cc85d0, serial=43, surface=0x2cb4a90, sx_w=8635,
    sy_w=42824) at xwayland-input.c:160
#3  0x00007f1572748df0 in ffi_call_unix64 () from /usr/lib/libffi.so.6
#4  0x00007f1572748861 in ffi_call () from /usr/lib/libffi.so.6
#5  0x00007f15741ac382 in wl_closure_invoke (closure=0x2c92ac0, flags=1, target=0x2cc85d0, opcode=0, data=0x2b0abe0)
    at src/connection.c:934
#6  0x00007f15741a98e6 in dispatch_event (display=0x27f4970, queue=0x27f4a58) at src/wayland-client.c:1054
#7  0x00007f15741a9b41 in dispatch_queue (display=0x27f4970, queue=0x27f4a58) at src/wayland-client.c:1159
#8  0x00007f15741a9e7d in wl_display_dispatch_queue_pending (display=0x27f4970, queue=0x27f4a58) at src/wayland-client.c:1381
#9  0x00007f15741a9ee5 in wl_display_dispatch_pending (display=0x27f4970) at src/wayland-client.c:1458
#10 0x000000000042e53e in wakeup_handler (data=0x27f4800, err=1, read_mask=0x879000 <LastSelectMask>) at xwayland.c:419
#11 0x00000000005adc02 in WakeupHandler (result=1, pReadmask=0x879000 <LastSelectMask>) at dixutils.c:423
#12 0x00000000005ed01b in WaitForSomething (pClientsReady=0x2c9ad40) at WaitFor.c:229
#13 0x000000000059ebf2 in Dispatch () at dispatch.c:361
#14 0x00000000005acee2 in dix_main (argc=10, argv=0x7fff421f5c98, envp=0x7fff421f5cf0) at main.c:296
#15 0x000000000058f261 in main (argc=10, argv=0x7fff421f5c98, envp=0x7fff421f5cf0) at stubmain.c:34

This was another case of null pointers. I fixed all of them with the patch in my second attachment. XWayland no longer crashed, but X windows would no longer accept mouse input or be draggable due to the patch's change to hw/xwayland/xwayland-input.c. With that patch applied, the first one technically isn't needed, but should be included as a safeguard (the compositor shouldn't be able to break Wayland so easily). Maybe an assert would be better there so devs know if they broke things?

I traced the call to the MIPOINTER(pDev) macro made at mi/mipointer.c:262 and found that IsFloating(pDev) returns true, so the mouse pointer is a floating device. This lead to "dixLookupPrivate(&(dev)->devPrivates, miPointerPrivKey)" being called from the macro, which calls "dixGetPrivate" (include/privates.h:134), which returns "*(void **) dixGetPrivateAddr(privates, key)" (include/privates.h:120). This essentially returns "(char *) (*privates) + key->offset" dereferenced, which is NULL.

My third attachment contains a printout from GDB of the values in the struct from the DeviceIntPtr pDev located in the function miPointerDisplayCursor:193 in mi/mipointer.c. The values contained by the DeviceIntPtr don't fix themselves so the mouse input stays broken for X windows.

Native Wayland apps still accepted keyboard and mouse input properly.
Comment 7 Boyan Ding 2014-06-28 03:25:01 UTC
Ok, I think I have nearly found what causes the problem (thanks Tyler for the detailed info).

When switching VTs back and forth, weston will reprobe input devices. So after switching back from another VT, Xwayland, as a wayland client, will add new device for the pointer (line 473 of hw/xwayland/xwayland-input.c). However, the devPrivates field of the newly created device is not properly set (pointing to NULL) thus cause the problem.

The field is set in xwl_realize_cursor in xwayland-cursor.c and that function is passed into some functions as function pointer. However, from my debugging, it is never called after the new pointer device was created.

So the problem may lie in forgetting to init something.
Comment 8 Boyan Ding 2014-06-28 08:56:16 UTC
Created attachment 101908 [details] [review]
proposed patch

Finally worked it out. The same patch will go to xorg-devel list, too.

Review and suggestions are welcomed.
Comment 9 Boyan Ding 2014-06-29 09:34:17 UTC
Created attachment 101968 [details] [review]
Updated patch

This new patch seems to make more sense. We don't have to check for VCP and VCK, and we can put initialization of devices into one place.
Comment 10 Jasper St. Pierre 2014-06-30 01:34:57 UTC
*** Bug 73748 has been marked as a duplicate of this bug. ***
Comment 11 Boyan Ding 2014-07-27 02:23:21 UTC
commit 2172714c67d8701aa54c202e89f246f1dddac80a
Author: Carlos Garnacho <carlosg@gnome.org>
Date:   Tue Jul 22 17:55:25 2014 +0200

    xwayland: Only disable/enable devices on capabilities change

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.