Bug 96246

Summary: Xwayland crash with xorg-x11-server-Xwayland-1.18.3-4.fc25.x86_64
Product: xorg Reporter: Kevin Fenzi <kevin>
Component: Server/General Assignee: Xorg Project Team <xorg-team>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium CC: leho
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
[PATCH RFC xserver] wayland: clear resource for pixmap on unrealize none
[PATCH xserver v2] wayland: clear resource for pixmap on unrealize none

Description Kevin Fenzi 2016-05-27 16:54:23 UTC
I've been unable to isolate exactly what causes this, but some random time after logging in, it happens and my entire session is killed off. :( 

xorg-x11-server-Xwayland-1.18.3-4.fc25.x86_64

May 27 10:32:06 sheelba.scrye.com org.gnome.Shell.desktop[2138]: (<unknown>:3469): Gtk-CRITICAL **: gtk_widget_get_style_context: assertion 'GTK_IS_WIDGET (widget)' failed
May 27 10:32:06 sheelba.scrye.com org.gnome.Shell.desktop[2138]: (<unknown>:3469): Gtk-CRITICAL **: gtk_render_background: assertion 'GTK_IS_STYLE_CONTEXT (context)' failed
May 27 10:32:08 sheelba.scrye.com org.gnome.Shell.desktop[2138]: (<unknown>:3469): Gtk-CRITICAL **: gtk_widget_get_style_context: assertion 'GTK_IS_WIDGET (widget)' failed
May 27 10:32:08 sheelba.scrye.com org.gnome.Shell.desktop[2138]: (<unknown>:3469): Gtk-CRITICAL **: gtk_render_background: assertion 'GTK_IS_STYLE_CONTEXT (context)' failed
May 27 10:32:08 sheelba.scrye.com org.gnome.Shell.desktop[2138]: (<unknown>:3469): Gtk-CRITICAL **: gtk_widget_get_style_context: assertion 'GTK_IS_WIDGET (widget)' failed
May 27 10:32:08 sheelba.scrye.com org.gnome.Shell.desktop[2138]: (<unknown>:3469): Gtk-CRITICAL **: gtk_render_background: assertion 'GTK_IS_STYLE_CONTEXT (context)' failed
May 27 10:32:19 sheelba.scrye.com org.gnome.Shell.desktop[2138]: (EE)
May 27 10:32:19 sheelba.scrye.com org.gnome.Shell.desktop[2138]: (EE) Backtrace:
May 27 10:32:19 sheelba.scrye.com org.gnome.Shell.desktop[2138]: (EE) 0: /usr/bin/Xwayland (OsLookupColor+0x139) [0x5914c9]
May 27 10:32:19 sheelba.scrye.com org.gnome.Shell.desktop[2138]: (EE) 1: /lib64/libc.so.6 (__restore_rt+0x0) [0x7f1c65a796af]
May 27 10:32:19 sheelba.scrye.com org.gnome.Shell.desktop[2138]: (EE) 2: /usr/bin/Xwayland (ddxProcessArgument+0x14a) [0x424e7a]
May 27 10:32:19 sheelba.scrye.com org.gnome.Shell.desktop[2138]: (EE) 3: /usr/bin/Xwayland (CloseInput+0x7bc) [0x4276ec]
May 27 10:32:19 sheelba.scrye.com org.gnome.Shell.desktop[2138]: (EE) 4: /usr/bin/Xwayland (FreeCursor+0x53) [0x549c93]
May 27 10:32:19 sheelba.scrye.com org.gnome.Shell.desktop[2138]: (EE) 5: /usr/bin/Xwayland (AddTraps+0x7617) [0x4f5b67]
May 27 10:32:19 sheelba.scrye.com org.gnome.Shell.desktop[2138]: (EE) 6: /usr/bin/Xwayland (FreeCursor+0x53) [0x549c93]
May 27 10:32:19 sheelba.scrye.com org.gnome.Shell.desktop[2138]: (EE) 7: /usr/bin/Xwayland (ChangeWindowAttributes+0x8e0) [0x587ef0]
May 27 10:32:19 sheelba.scrye.com org.gnome.Shell.desktop[2138]: (EE) 8: /usr/bin/Xwayland (ProcBadRequest+0x1fd) [0x55034d]
May 27 10:32:19 sheelba.scrye.com org.gnome.Shell.desktop[2138]: (EE) 9: /usr/bin/Xwayland (SendErrorToClient+0x2df) [0x55638f]
May 27 10:32:19 sheelba.scrye.com org.gnome.Shell.desktop[2138]: (EE) 10: /usr/bin/Xwayland (remove_fs_handlers+0x463) [0x55a3c3]
May 27 10:32:19 sheelba.scrye.com org.gnome.Shell.desktop[2138]: (EE) 11: /lib64/libc.so.6 (__libc_start_main+0xf1) [0x7f1c65a64231]
May 27 10:32:19 sheelba.scrye.com org.gnome.Shell.desktop[2138]: (EE) 12: /usr/bin/Xwayland (_start+0x29) [0x423879]
May 27 10:32:19 sheelba.scrye.com org.gnome.Shell.desktop[2138]: (EE) 13: ? (?+0x29) [0x29]
May 27 10:32:19 sheelba.scrye.com org.gnome.Shell.desktop[2138]: (EE)
May 27 10:32:19 sheelba.scrye.com org.gnome.Shell.desktop[2138]: (EE) Segmentation fault at address 0x20
May 27 10:32:19 sheelba.scrye.com org.gnome.Shell.desktop[2138]: (EE)
May 27 10:32:19 sheelba.scrye.com org.gnome.Shell.desktop[2138]: Fatal server error:
May 27 10:32:19 sheelba.scrye.com org.gnome.Shell.desktop[2138]: (EE) Caught signal 11 (Segmentation fault). Server aborting
May 27 10:32:19 sheelba.scrye.com org.gnome.Shell.desktop[2138]: (EE)
Comment 1 Olivier Fourdan 2016-05-30 08:22:21 UTC
This is not the first time I have seen such a backtrace. IIRC, mclasen reported a similar backtrace on IRC but could not reproduce it either.
Comment 2 Olivier Fourdan 2016-05-30 15:47:41 UTC
Assuming the backtrace is not corrupted:

$ rpm -qf /usr/lib/debug/usr/bin/Xwayland.debug
xorg-x11-server-debuginfo-1.18.3-4.fc25.x86_64

$ addr2line -fe /usr/lib/debug/usr/bin/Xwayland.debug 0x424e7a 0x4276ec 0x549c93 0x4f5b67 0x549c93 0x587ef0 0x55034d 0x55638f 0x55a3c3

gives:

InitOutput
/usr/src/debug/xorg-server-1.18.3/hw/xwayland/xwayland.c:751
xwl_screen_init_output
:?
FreeCursor
/usr/src/debug/xorg-server-1.18.3/dix/cursor.c:120 (discriminator 3)
dixGetPrivateAddr
/usr/src/debug/xorg-server-1.18.3/miext/sync/../../include/privates.h:123
FreeCursor
/usr/src/debug/xorg-server-1.18.3/dix/cursor.c:120 (discriminator 3)
ChangeWindowAttributes
/usr/src/debug/xorg-server-1.18.3/dix/window.c:1567
ProcDestroyWindow
/usr/src/debug/xorg-server-1.18.3/dix/dispatch.c:712
Dispatch
/usr/src/debug/xorg-server-1.18.3/dix/dispatch.c:353
dix_main
/usr/src/debug/xorg-server-1.18.3/dix/main.c:340

Not sure I can make much sense of that...
Comment 3 Kevin Fenzi 2016-05-31 00:26:18 UTC
I tried to get more info, but not sure how much it helps... I logged in and attached to Xwayland and waited for the crash and got: 

Thread 1 "Xwayland" received signal SIGSEGV, Segmentation fault.
xwl_seat_set_cursor (xwl_seat=0x1839ce0) at xwayland-cursor.c:127
127	        memcpy(pixmap->devPrivate.ptr,
(gdb) 
Continuing.
[Thread 0x7fde28eb5700 (LWP 2002) exited]
[Thread 0x7fde296b6700 (LWP 2001) exited]
[Thread 0x7fde29eb7700 (LWP 2000) exited]
[Thread 0x7fde36125ec0 (LWP 1998) exited]
[Inferior 1 (process 1998) exited with code 01]
Comment 4 Olivier Fourdan 2016-05-31 07:02:58 UTC
(In reply to Kevin Fenzi from comment #3)
> I tried to get more info, but not sure how much it helps... I logged in and
> attached to Xwayland and waited for the crash and got: 
> 
> Thread 1 "Xwayland" received signal SIGSEGV, Segmentation fault.
> xwl_seat_set_cursor (xwl_seat=0x1839ce0) at xwayland-cursor.c:127
> 127	        memcpy(pixmap->devPrivate.ptr,
> (gdb) 

Oh, that's interesting because we had commit 1815540 cherry-picked in 1.18.x so that might be related, not sure exactly how yet, though...

https://cgit.freedesktop.org/xorg/xserver/commit/?id=1815540

Can you try to reproduce and capture a bit more data then:

 - a backtrace:

 (gdb) bt full

 - print the values of xwl_seat, pixmap, cursor and cursor->bits in xwl_seat_set_cursor() where it crashes:

 (gdb) p *xwl_seat
 (gdb) p *pixmap
 (gdb) p *cursor
 (gdb) p *cursor->bits

(Note: some values might be optimized out, in which case we won't get their contents.)

Thanks for your help!

PS: Still no idea how to reproduce? I don't understand why I don't see this at all...
Comment 5 Olivier Fourdan 2016-05-31 16:55:49 UTC
I wonder, is it possible to get a pointer_handle_enter() after an unrealize_cursor()?

If that happens, we would now end up calling xwl_seat_set_cursor() from pointer_handle_enter() with xwl_seat->cursor_frame_cb == NULL, so xwl_seat_set_cursor() would not return early and would do the memcpy() onto pixmap->devPrivate.ptr even though the pixmap had already been destroyed in xwl_unrealize_cursor().

Well, it's just a theory at this point...
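
To make the theory concrete, here is a minimal, self-contained sketch of the suspected ordering. It is not the Xwayland code: the toy_* types and functions are hypothetical stand-ins, and the NULL check on the pixmap is only an illustration of how an "enter after unrealize" could be made harmless.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not the real Xwayland types. */
struct toy_pixmap { void *ptr; size_t size; };
struct toy_cursor { struct toy_pixmap *pixmap; const void *bits; size_t size; };
struct toy_seat   { struct toy_cursor *cursor; void *frame_cb; };

/* Allocate the backing pixmap for a cursor; false on allocation failure. */
static bool toy_realize(struct toy_cursor *c)
{
    struct toy_pixmap *p = malloc(sizeof *p);
    if (!p)
        return false;
    p->ptr = malloc(c->size);
    if (!p->ptr) {
        free(p);
        return false;
    }
    p->size = c->size;
    c->pixmap = p;
    return true;
}

/* Models the unrealize step: the backing pixmap is destroyed and the
 * cursor's reference to it is cleared. */
static void toy_unrealize(struct toy_cursor *c)
{
    if (c->pixmap) {
        free(c->pixmap->ptr);
        free(c->pixmap);
    }
    c->pixmap = NULL;
}

/* Models set_cursor being called from a pointer-enter event.
 * Returns true only if the cursor bits were copied into the pixmap. */
static bool toy_set_cursor(struct toy_seat *seat)
{
    struct toy_pixmap *p;

    assert(seat != NULL);
    if (!seat->cursor)
        return false;
    if (seat->frame_cb)          /* a frame is still pending: defer */
        return false;
    p = seat->cursor->pixmap;
    if (!p)                      /* cursor was already unrealized */
        return false;
    memcpy(p->ptr, seat->cursor->bits, p->size);
    return true;
}
```

With frame_cb == NULL, calling toy_set_cursor() after toy_unrealize() reaches the pixmap check. Without that check (or without clearing the reference in toy_unrealize()), the memcpy() would write through a pointer to freed memory.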
Comment 6 Olivier Fourdan 2016-06-01 07:36:45 UTC
Created attachment 124226 [details] [review]
[PATCH RFC xserver] wayland: clear resource for pixmap on unrealize

Once you've captured the data as described in comment 4, you may want to try this patch. I wonder if that would make any difference...
Comment 7 Kevin Fenzi 2016-06-01 18:05:19 UTC
So, I waited for it to crash all yesterday and this morning and of course it didn't. ;) 

It finally crashed just now, but the trace was somewhat different:

Thread 1 "Xwayland" received signal SIGSEGV, Segmentation fault.
dixGetPrivate (key=0x816720 <xwl_pixmap_private_key>, privates=0x20)
    at ../../include/privates.h:137
137	    return *(void **) dixGetPrivateAddr(privates, key);
(gdb) where
#0  dixGetPrivate (key=0x816720 <xwl_pixmap_private_key>, privates=0x20)
    at ../../include/privates.h:137
#1  dixLookupPrivate (key=0x816720 <xwl_pixmap_private_key>, privates=0x20)
    at ../../include/privates.h:167
#2  xwl_pixmap_get (pixmap=pixmap@entry=0x0) at xwayland.c:179
#3  0x0000000000426f3c in xwl_shm_destroy_pixmap (pixmap=0x0) at xwayland-shm.c:234
#4  0x0000000000549c93 in FreeCursor (value=0x51bc390, cid=cid@entry=0)
    at cursor.c:122
#5  0x00000000004ee637 in AnimCurUnrealizeCursor (pDev=<optimized out>, 
    pScreen=0xb083c0, pCursor=0x5232180) at animcur.c:278
#6  0x0000000000549c93 in FreeCursor (value=value@entry=0x5232180, cid=cid@entry=0)
    at cursor.c:122
#7  0x0000000000587ef0 in ChangeWindowAttributes (pWin=0x13b8d10, 
    vmask=<optimized out>, vlist=vlist@entry=0x51c1fb0, 
    client=client@entry=0xb01410) at window.c:1567
#8  0x00000000005501ed in ProcChangeWindowAttributes (client=0xb01410)
    at dispatch.c:677
#9  0x000000000055630f in Dispatch () at dispatch.c:430
#10 0x000000000055a333 in dix_main (argc=10, argv=0x7ffd07acbd68, 
    envp=<optimized out>) at main.c:300
#11 0x00007f748e393231 in __libc_start_main (main=0x423840 <main>, argc=10, 
    argv=0x7ffd07acbd68, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7ffd07acbd58) at ../csu/libc-start.c:289
#12 0x0000000000423879 in _start ()
(gdb) p *xwl_seat
No symbol "xwl_seat" in current context.
(gdb) p *pixmap
No symbol "pixmap" in current context.

I'll set up to regather...
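
One possible reading of the trace above (an interpretation, not something confirmed in this report): pixmap is 0x0 while privates is 0x20, which is what you get when the address of a member at offset 0x20 is computed from a NULL base pointer. The sketch below uses a hypothetical layout whose 0x20-byte header was chosen only to match the trace; it is not the real pixmap structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: header size picked to match the 0x20 seen in the trace. */
#define DEMO_HEADER_SIZE 0x20

/* Hypothetical layout: some fixed-size header followed by a privates field. */
struct demo_pixmap_layout {
    unsigned char header[DEMO_HEADER_SIZE];
    void *dev_privates;
};

/* Address of the dev_privates member for an object located at 'base'.
 * Plain integer arithmetic only, so nothing is dereferenced. */
static uintptr_t demo_member_address(uintptr_t base)
{
    assert(offsetof(struct demo_pixmap_layout, dev_privates) == DEMO_HEADER_SIZE);
    return base + offsetof(struct demo_pixmap_layout, dev_privates);
}
```

With base 0 (a NULL pixmap), demo_member_address() yields 0x20, the same value as the privates argument in frames #0 and #1.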
Comment 8 Kevin Fenzi 2016-06-02 01:21:41 UTC
Just got another of the same ones:

Thread 1 "Xwayland" received signal SIGSEGV, Segmentation fault.
dixGetPrivate (key=0x816720 <xwl_pixmap_private_key>, privates=0x20)
    at ../../include/privates.h:137
137	    return *(void **) dixGetPrivateAddr(privates, key);
(gdb) where
#0  dixGetPrivate (key=0x816720 <xwl_pixmap_private_key>, privates=0x20)
    at ../../include/privates.h:137
#1  dixLookupPrivate (key=0x816720 <xwl_pixmap_private_key>, privates=0x20)
    at ../../include/privates.h:167
#2  xwl_pixmap_get (pixmap=pixmap@entry=0x0) at xwayland.c:179
#3  0x0000000000426f7c in xwl_shm_destroy_pixmap (pixmap=0x0) at xwayland-shm.c:234
#4  0x0000000000549cd3 in FreeCursor (value=0x35c6fc0, cid=cid@entry=0)
    at cursor.c:122
#5  0x00000000004ee677 in AnimCurUnrealizeCursor (pDev=<optimized out>, 
    pScreen=0x17373c0, pCursor=0x3633010) at animcur.c:278
#6  0x0000000000549cd3 in FreeCursor (value=value@entry=0x3633010, cid=cid@entry=0)
    at cursor.c:122
#7  0x0000000000587f30 in ChangeWindowAttributes (pWin=0x1fe2d70, 
    vmask=<optimized out>, vlist=vlist@entry=0x35d5060, 
    client=client@entry=0x1730410) at window.c:1567
#8  0x000000000055022d in ProcChangeWindowAttributes (client=0x1730410)
    at dispatch.c:677
#9  0x000000000055634f in Dispatch () at dispatch.c:430
#10 0x000000000055a373 in dix_main (argc=10, argv=0x7fff76b49e18, 
    envp=<optimized out>) at main.c:300
#11 0x00007f41a518f461 in __libc_start_main () from /lib64/libc.so.6
#12 0x00000000004238b9 in _start ()
Comment 9 Olivier Fourdan 2016-06-02 06:33:57 UTC
Created attachment 124254 [details] [review]
[PATCH xserver v2] wayland: clear resource for pixmap on unrealize

Can you try this patch instead?

It seems to me that FreeCursor() can be called more than once (FreeCursor() -> (*pscr->UnrealizeCursor) -> AnimCurUnrealizeCursor() -> FreeCursor()) so that we would end up calling xwl_unrealize_cursor() on a cursor with a pixmap that was already freed previously.
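
As an illustration of the idea in the patch title ("clear resource for pixmap on unrealize"), here is a minimal sketch of an unrealize step that tolerates being reached twice for the same cursor. The demo_* names are hypothetical and this is not the attached patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the real xserver types. */
struct demo_pixmap { int placeholder; };
struct demo_cursor { struct demo_pixmap *pixmap; };

/* Counts how many pixmaps were actually destroyed. */
static int demo_destroy_count;

/* Like the destroy routine in the trace, this must never see NULL. */
static void demo_destroy_pixmap(struct demo_pixmap *pixmap)
{
    assert(pixmap != NULL);
    free(pixmap);
    demo_destroy_count++;
}

/* Unrealize that is safe to call more than once: the stored pointer is
 * cleared before the pixmap is destroyed, so a second or nested call
 * finds NULL and returns without touching freed memory. */
static void demo_unrealize_cursor(struct demo_cursor *cursor)
{
    struct demo_pixmap *pixmap = cursor->pixmap;

    if (pixmap == NULL)
        return;
    cursor->pixmap = NULL;
    demo_destroy_pixmap(pixmap);
}
```

Calling demo_unrealize_cursor() twice on the same cursor destroys the pixmap once. The second call is a no-op instead of a NULL dereference or a double free.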
Comment 10 Kevin Fenzi 2016-06-02 15:54:36 UTC
ok. Running with that patch now. ;)
Comment 11 Kevin Fenzi 2016-06-02 20:11:49 UTC
So, it's not crashed yet with that patch... but I have seen an oddity: 

Twice now when I have been in an X app, the app and cursor have stopped responding at all. Switching to a vt and back and everything returns to normal. 

In my gdb session I see: 

Detaching after fork from child process 25619.
Detaching after fork from child process 439.

Will keep watching for a crash...
Comment 12 Olivier Fourdan 2016-06-03 07:04:27 UTC
(In reply to Kevin Fenzi from comment #11)
> So, it's not crashed yet with that patch... but I have seen an oddity: 
> 
> Twice now when I have been in an X app, the app and cursor have stopped
> responding at all. Switching to a vt and back and everything returns to
> normal. 
> [...]

Thanks! Did it stop responding to the pointer only, or to any input (including keyboard)? Was it the same X11 application in both cases (and if so, which one)?

I suspect my patch might fix the effect but not the root cause of the issue...
Comment 13 Kevin Fenzi 2016-06-03 12:54:12 UTC
It seemed to be any input (but it did take the control-alt-f3 to switch to vty3). 

It was claws-mail in all cases so far I think.
Comment 14 Olivier Fourdan 2016-06-03 13:11:54 UTC
(In reply to Kevin Fenzi from comment #13)
> It seemed to be any input (but it did take the control-alt-f3 to switch to
> vty3). 

OK, so it's probably unrelated to this patch ...
 
> It was claws-mail in all cases so far I think.

... and we cannot possibly rule out an application or even compositor (gnome-shell) bug for this.

I'll send the patch to the ML for further comments then.
Comment 15 Kevin Fenzi 2016-06-03 13:41:41 UTC
I'll keep trying to get it to crash again. 

Last night/this morning so far there have been 0 input issues, but I haven't yet set up gdb from my phone. Perhaps gdm was interfering somehow there and causing those input issues I was seeing?
Comment 16 Kevin Fenzi 2016-06-06 17:08:26 UTC
FYI, I have seen no crashes since I applied the patch in comment 9. ;)
Comment 17 Olivier Fourdan 2016-06-07 06:28:13 UTC
I have submitted a slightly different version, including Rui's suggestion, to the xorg-devel ML here:

https://patchwork.freedesktop.org/series/8234/
Comment 18 Olivier Fourdan 2016-06-14 06:52:38 UTC
The patch has landed in master, closing.
