Created attachment 141708
xwayland core dump from coredumpctl, can be used inside gdb

Arch Linux distro, GNOME as the desktop environment, GDM as the display manager.

Rolling back to xorg-server-1.20.0 fixes the issue.

A core file is attached. Coredump info:

```plain
           PID: 771 (Xwayland)
           UID: 1000 (nartes)
           GID: 1000 (nartes)
        Signal: 6 (ABRT)
     Timestamp: Sun 2018-09-23 17:04:52 +03 (56min ago)
  Command Line: /usr/bin/Xwayland :0 -rootless -terminate -accessx -core -listen 4 -listen 5 -displayfd 6
    Executable: /usr/bin/Xwayland
 Control Group: /user.slice/user-1000.slice/session-2.scope
          Unit: session-2.scope
         Slice: user-1000.slice
       Session: 2
     Owner UID: 1000 (nartes)
       Boot ID: 4ad1bc740df242c2b7e786fccd484039
    Machine ID: ec8ac8cf4cb14010ac2f461877ef63ac
      Hostname: siarhei_hp
       Storage: /var/lib/systemd/coredump/core.Xwayland.1000.4ad1bc740df242c2b7e786fccd484039.771.1537711492000000.lz4
       Message: Process 771 (Xwayland) of user 1000 dumped core.

                Stack trace of thread 771:
                #0  0x00007f168a998d7f raise (libc.so.6)
                #1  0x00007f168a983672 abort (libc.so.6)
                #2  0x0000562ff139e05a n/a (Xwayland)
                #3  0x0000562ff1396425 n/a (Xwayland)
                #4  0x0000562ff14bd3ec n/a (Xwayland)
                #5  0x00007f168a273e8a n/a (libwayland-client.so.0)
                #6  0x00007f168a26f1f9 n/a (libwayland-client.so.0)
                #7  0x00007f1688d781c8 ffi_call_unix64 (libffi.so.6)
                #8  0x00007f1688d77c2a ffi_call (libffi.so.6)
                #9  0x00007f168a272f5f n/a (libwayland-client.so.0)
                #10 0x00007f168a26f6ca n/a (libwayland-client.so.0)
                #11 0x00007f168a270bdf wl_display_dispatch_queue_pending (libwayland-client.so.0)
                #12 0x0000562ff14bebdb n/a (Xwayland)
                #13 0x0000562ff139fdd1 n/a (Xwayland)
                #14 0x0000562ff1467ad0 n/a (Xwayland)
                #15 0x0000562ff136210d n/a (Xwayland)
                #16 0x00007f168a985223 __libc_start_main (libc.so.6)
                #17 0x0000562ff136313e n/a (Xwayland)

                Stack trace of thread 781:
                #0  0x00007f1689672afc pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
                #1  0x00007f168655bf3c n/a (i965_dri.so)
                #2  0x00007f168655bc78 n/a (i965_dri.so)
                #3  0x00007f168966ca9d start_thread (libpthread.so.0)
                #4  0x00007f168aa5ca43 __clone (libc.so.6)

                Stack trace of thread 783:
                #0  0x00007f1689672afc pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
                #1  0x00007f1684936054 n/a (swrast_dri.so)
                #2  0x00007f1684935f98 n/a (swrast_dri.so)
                #3  0x00007f168966ca9d start_thread (libpthread.so.0)
                #4  0x00007f168aa5ca43 __clone (libc.so.6)

                Stack trace of thread 782:
                #0  0x00007f1689672afc pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
                #1  0x00007f1684936054 n/a (swrast_dri.so)
                #2  0x00007f1684935f98 n/a (swrast_dri.so)
                #3  0x00007f168966ca9d start_thread (libpthread.so.0)
                #4  0x00007f168aa5ca43 __clone (libc.so.6)

           PID: 789 (Xwayland)
           UID: 1000 (nartes)
           GID: 1000 (nartes)
```
Unfortunately, the core file (attachment 141708) is pretty useless without the symbols. Could you please provide a backtrace of the crash with the debugging symbols attached instead?
Yes, it is without them, because Arch Linux doesn't build with "-g", which is pretty standard practice. I have compiled the upstream code, but haven't yet checked whether it works.
The problem seems to be an invalid display_handle, or an authentication error. I'm just guessing from the stack trace.

In practice I reproduce the problem in the following way. A new gnome-session is started, which runs Xwayland. Then I switch to a TTY terminal. While I'm there, an autostart entry launches the skype process, which finally crashes Xwayland. So it's some problem within GNOME, skype, etc.

I've cross-checked xorg-server-1.20.0, ..., xorg-server-1.20.1, as well as the master branch. The problem is present in all of them.

```plain
(gdb) where
#0  0x00007fcfe9b25d7f in raise () from /usr/lib/libc.so.6
#1  0x00007fcfe9b10672 in abort () from /usr/lib/libc.so.6
#2  0x00005582a92a2fea in OsAbort () at ../xserver/os/utils.c:1350
#3  0x00005582a929b3e5 in AbortServer () at ../xserver/os/log.c:877
#4  FatalError (f=<optimized out>) at ../xserver/os/log.c:1015
#5  0x00005582a93c1e7c in xwl_log_handler (format=<optimized out>, args=<optimized out>) at ../xserver/hw/xwayland/xwayland.c:1147
#6  0x00007fcfe91e8cb8 in wl_log (fmt=0x7fcfe91e9102 "%s@%u: error %d: %s\n") at src/wayland-util.c:404
#7  0x00007fcfe91e3289 in display_handle_error (data=0x5582a9e268f0, display=0x5582a9e268f0, object=0x5582a9e2cc00, code=0, message=0x5582aa3c1af4 "authenicate failed") at src/wayland-client.c:898
#8  0x00007fcfe7ce11c8 in ffi_call_unix64 () from /usr/lib/libffi.so.6
#9  0x00007fcfe7ce0c2a in ffi_call () from /usr/lib/libffi.so.6
#10 0x00007fcfe91e7160 in wl_closure_invoke (closure=0x5582aa3c1a10, flags=1, target=0x5582a9e268f0, opcode=0, data=0x5582a9e268f0) at src/connection.c:1006
#11 0x00007fcfe91e417f in dispatch_event (display=0x5582a9e268f0, queue=0x5582a9e269a0) at src/wayland-client.c:1427
#12 0x00007fcfe91e443e in dispatch_queue (display=0x5582a9e268f0, queue=0x5582a9e269b8) at src/wayland-client.c:1566
#13 0x00007fcfe91e4766 in wl_display_dispatch_queue_pending (display=0x5582a9e268f0, queue=0x5582a9e269b8) at src/wayland-client.c:1815
#14 0x00007fcfe91e47d1 in wl_display_dispatch_pending (display=0x5582a9e268f0) at src/wayland-client.c:1878
#15 0x00005582a93c360b in xwl_read_events (xwl_screen=<optimized out>, xwl_screen=<optimized out>) at ../xserver/hw/xwayland/xwayland.c:820
#16 0x00005582a92a4d21 in ospoll_wait (ospoll=0x5582a9e1abe0, timeout=<optimized out>) at ../xserver/os/ospoll.c:651
#17 0x00005582a936c850 in WaitForSomething (are_ready=0) at ../xserver/os/WaitFor.c:207
#18 Dispatch () at ../xserver/dix/dispatch.c:421
#19 0x00005582a9266ced in dix_main (envp=<optimized out>, argv=<optimized out>, argc=<optimized out>) at ../xserver/dix/main.c:276
#20 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../xserver/dix/stubmain.c:34
```
That "authenicate failed" (with the typo in "authenicate") is typical of drm authentication in Mesa:

    static void
    drm_authenticate(struct wl_client *client,
                     struct wl_resource *resource, uint32_t id)
    {
        struct wl_drm *drm = wl_resource_get_user_data(resource);

        if (drm->callbacks.authenticate(drm->user_data, id) < 0)
            wl_resource_post_error(resource,
                                   WL_DRM_ERROR_AUTHENTICATE_FAIL,
                                   "authenicate failed");
        else
            wl_resource_post_event(resource, WL_DRM_AUTHENTICATED);
    }

https://cgit.freedesktop.org/mesa/mesa/tree/src/egl/wayland/wayland-drm/wayland-drm.c#n175

Does the issue occur if you wait for the X client to start before switching to another tty?
Nope, it only happens once the session has already been started, when some application is spawned while the active screen is a TTY with no graphics and Xwayland is running on another TTY screen.
Which library provides the drm_authenticate function? I can recompile that one with debug symbols and set a breakpoint.
(In reply to Siarhei from comment #0)
> Rollback to xorg-server-1.20.0 fixes the issue.

Well, apparently it doesn't.
Yeah, I think this is actually neither a regression nor a bug. By switching VT while Xwayland is starting, you hinder DRM auth, so Xwayland can't start.
In the era of render nodes, why does Xwayland still need to use DRM auth? Wouldn't that be a valid feature request, to avoid DRM auth?
(In reply to Pekka Paalanen from comment #9)
> In the era of render nodes, why does Xwayland still need to use DRM auth?
> Wouldn't that be a valid feature request, to avoid DRM auth?

Actually, it does:

https://gitlab.freedesktop.org/xorg/xserver/blob/master/hw/xwayland/xwayland-glamor-gbm.c#L550
Ok, so it's actually related to the problem that wl_drm is still the only way for a client to know which device the compositor is using.

Why wouldn't Xwayland look at the device and open the corresponding render node instead if there is one? Maybe the drmDevice API in libdrm would make that easy and portable nowadays?

Do we have an issue filed anywhere for making a companion extension to zwp_linux_dmabuf so that clients can find out which device their buffers should at least work on?
(In reply to Pekka Paalanen from comment #11)
> Ok, so it's actually related to the problem that wl_drm is still the only way
> for a client to know which device the compositor is using.
>
> Why wouldn't Xwayland look at the device and open the corresponding render
> node instead if there is one? Maybe the drmDevice API in libdrm would make
> that easy and portable nowadays?

Yeap, I reckon it should work, I can give that a try.

> Do we have an issue filed anywhere for making a companion extension to
> zwp_linux_dmabuf so that clients can find out which device their buffers
> should at least work on?

Not to my recollection, but maybe ask Daniel or Jonas?
> Do we have an issue filed anywhere for making a companion extension to
> zwp_linux_dmabuf so that clients can find out which device their buffers
> should at least work on?

Afraid not, but doing so would allow us to kill wl_drm completely. I wrote one quickly though: https://gitlab.freedesktop.org/wayland/wayland/issues/5
(In reply to Daniel Stone from comment #13)
> Afraid not, but doing so would allow us to kill wl_drm completely. I wrote
> one quickly though: https://gitlab.freedesktop.org/wayland/wayland/issues/5

That should be https://gitlab.freedesktop.org/wayland/wayland/issues/59
https://gitlab.freedesktop.org/xorg/xserver/merge_requests/26 is proposing a workaround. ofourdan, should this be re-opened, or?
(In reply to Pekka Paalanen from comment #15)
> https://gitlab.freedesktop.org/xorg/xserver/merge_requests/26 is proposing a
> workaround.
>
> ofourdan, should this be re-opened, or?

Sure, we can reopen.
Patch has landed in git master, re-closing.
I've tested the upstream build. The crash no longer occurs, but the Xwayland session doesn't process keyboard or mouse events anymore. It looks like it has been detached from that session, or the device is not considered a real seat in terms of loginctl.
Not too sure if bug 109069 is relevant here, but it looks very similar.

1. start weston (nohup weston)
2. start firefox in weston to start xwayland
3. switch to tty, export DISPLAY=:1, firefox

Error message from firefox:

    libGL error: Connection closed during DRI3 initialization failure
    Gdk-Message: 05:51:57.859: firefox: Fatal IO error 11 (Resource temporarily unavailable) on X server :1.

Error message from weston's nohup.out:

    [05:51:57.855] libwayland: error in client communication (pid 29706)
    (EE)
    Fatal server error:
    (EE) wl_drm@6: error 0: authenicate failed
    (EE)
    [05:51:57.859] xserver exited, code 256
i forgot, this is with: xorg-server-xwayland 1.20.3-1