Bug 108038

Summary: Xwayland crashes, display_handle_error, authenticate error inside libwayland-client.so, GUI app autostart whilst being switched to TTY
Product: Wayland
Reporter: Siarhei <serega.belarus>
Component: XWayland
Assignee: Wayland bug list <wayland-bugs>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: critical    
Priority: medium    
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Attachments: xwayland core dump from coredumpctl, can be used inside gdb

Description Siarhei 2018-09-24 09:37:13 UTC
Created attachment 141708 [details]
xwayland core dump from coredumpctl, can be used inside gdb

Arch Linux, with GNOME as the desktop environment and GDM as the display manager.

Rollback to xorg-server-1.20.0 fixes the issue.


A core file is attached.

Coredump info:

```plain
           PID: 771 (Xwayland)
           UID: 1000 (nartes)
           GID: 1000 (nartes)
        Signal: 6 (ABRT)
     Timestamp: Sun 2018-09-23 17:04:52 +03 (56min ago)
  Command Line: /usr/bin/Xwayland :0 -rootless -terminate -accessx -core -listen 4 -listen 5 -displayfd 6
    Executable: /usr/bin/Xwayland
 Control Group: /user.slice/user-1000.slice/session-2.scope
          Unit: session-2.scope
         Slice: user-1000.slice
       Session: 2
     Owner UID: 1000 (nartes)
       Boot ID: 4ad1bc740df242c2b7e786fccd484039
    Machine ID: ec8ac8cf4cb14010ac2f461877ef63ac
      Hostname: siarhei_hp
       Storage: /var/lib/systemd/coredump/core.Xwayland.1000.4ad1bc740df242c2b7e786fccd484039.771.1537711492000000.lz4
       Message: Process 771 (Xwayland) of user 1000 dumped core.
                
                Stack trace of thread 771:
                #0  0x00007f168a998d7f raise (libc.so.6)
                #1  0x00007f168a983672 abort (libc.so.6)
                #2  0x0000562ff139e05a n/a (Xwayland)
                #3  0x0000562ff1396425 n/a (Xwayland)
                #4  0x0000562ff14bd3ec n/a (Xwayland)
                #5  0x00007f168a273e8a n/a (libwayland-client.so.0)
                #6  0x00007f168a26f1f9 n/a (libwayland-client.so.0)
                #7  0x00007f1688d781c8 ffi_call_unix64 (libffi.so.6)
                #8  0x00007f1688d77c2a ffi_call (libffi.so.6)
                #9  0x00007f168a272f5f n/a (libwayland-client.so.0)
                #10 0x00007f168a26f6ca n/a (libwayland-client.so.0)
                #11 0x00007f168a270bdf wl_display_dispatch_queue_pending (libwayland-client.so.0)
                #12 0x0000562ff14bebdb n/a (Xwayland)
                #13 0x0000562ff139fdd1 n/a (Xwayland)
                #14 0x0000562ff1467ad0 n/a (Xwayland)
                #15 0x0000562ff136210d n/a (Xwayland)
                #16 0x00007f168a985223 __libc_start_main (libc.so.6)
                #17 0x0000562ff136313e n/a (Xwayland)
                
                Stack trace of thread 781:
                #0  0x00007f1689672afc pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
                #1  0x00007f168655bf3c n/a (i965_dri.so)
                #2  0x00007f168655bc78 n/a (i965_dri.so)
                #3  0x00007f168966ca9d start_thread (libpthread.so.0)
                #4  0x00007f168aa5ca43 __clone (libc.so.6)
                
                Stack trace of thread 783:
                #0  0x00007f1689672afc pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
                #1  0x00007f1684936054 n/a (swrast_dri.so)
                #2  0x00007f1684935f98 n/a (swrast_dri.so)
                #3  0x00007f168966ca9d start_thread (libpthread.so.0)
                #4  0x00007f168aa5ca43 __clone (libc.so.6)
                
                Stack trace of thread 782:
                #0  0x00007f1689672afc pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
                #1  0x00007f1684936054 n/a (swrast_dri.so)
                #2  0x00007f1684935f98 n/a (swrast_dri.so)
                #3  0x00007f168966ca9d start_thread (libpthread.so.0)
                #4  0x00007f168aa5ca43 __clone (libc.so.6)

           PID: 789 (Xwayland)
           UID: 1000 (nartes)
           GID: 1000 (nartes)

```
Comment 1 Olivier Fourdan 2018-09-24 09:49:41 UTC
Unfortunately, the core file (attachment 141708 [details]) is pretty useless without the symbols.

Could you please provide a backtrace of the crash with the debugging symbols attached instead?
Comment 2 Siarhei 2018-09-24 09:55:49 UTC
Yes, it is without them, because Arch Linux doesn't build with "-g", which is pretty standard practice.
I have compiled the upstream code, but haven't checked yet whether it works.
Comment 3 Siarhei 2018-09-25 10:55:06 UTC
The problem seems to be an invalid display handle or an authentication error. I'm only guessing from the stack trace.

In practice, I reproduce the problem in the following way.
A new gnome-session is started, which runs Xwayland.
Then I switch to a TTY.
While the TTY is active, an autostart entry launches the Skype process.
That finally crashes Xwayland.
So the problem is somewhere in GNOME, Skype, or something else.

I've cross-checked xorg-server-1.20.0 through xorg-server-1.20.1, as well as the master branch.
The problem is present in all of them.

```plain
(gdb) where
#0  0x00007fcfe9b25d7f in raise () from /usr/lib/libc.so.6
#1  0x00007fcfe9b10672 in abort () from /usr/lib/libc.so.6
#2  0x00005582a92a2fea in OsAbort () at ../xserver/os/utils.c:1350
#3  0x00005582a929b3e5 in AbortServer () at ../xserver/os/log.c:877
#4  FatalError (f=<optimized out>) at ../xserver/os/log.c:1015
#5  0x00005582a93c1e7c in xwl_log_handler (format=<optimized out>, args=<optimized out>) at ../xserver/hw/xwayland/xwayland.c:1147
#6  0x00007fcfe91e8cb8 in wl_log (fmt=0x7fcfe91e9102 "%s@%u: error %d: %s\n") at src/wayland-util.c:404
#7  0x00007fcfe91e3289 in display_handle_error (data=0x5582a9e268f0, display=0x5582a9e268f0, object=0x5582a9e2cc00, code=0, 
    message=0x5582aa3c1af4 "authenicate failed") at src/wayland-client.c:898
#8  0x00007fcfe7ce11c8 in ffi_call_unix64 () from /usr/lib/libffi.so.6
#9  0x00007fcfe7ce0c2a in ffi_call () from /usr/lib/libffi.so.6
#10 0x00007fcfe91e7160 in wl_closure_invoke (closure=0x5582aa3c1a10, flags=1, target=0x5582a9e268f0, opcode=0, data=0x5582a9e268f0) at src/connection.c:1006
#11 0x00007fcfe91e417f in dispatch_event (display=0x5582a9e268f0, queue=0x5582a9e269a0) at src/wayland-client.c:1427
#12 0x00007fcfe91e443e in dispatch_queue (display=0x5582a9e268f0, queue=0x5582a9e269b8) at src/wayland-client.c:1566
#13 0x00007fcfe91e4766 in wl_display_dispatch_queue_pending (display=0x5582a9e268f0, queue=0x5582a9e269b8) at src/wayland-client.c:1815
#14 0x00007fcfe91e47d1 in wl_display_dispatch_pending (display=0x5582a9e268f0) at src/wayland-client.c:1878
#15 0x00005582a93c360b in xwl_read_events (xwl_screen=<optimized out>, xwl_screen=<optimized out>) at ../xserver/hw/xwayland/xwayland.c:820
#16 0x00005582a92a4d21 in ospoll_wait (ospoll=0x5582a9e1abe0, timeout=<optimized out>) at ../xserver/os/ospoll.c:651
#17 0x00005582a936c850 in WaitForSomething (are_ready=0) at ../xserver/os/WaitFor.c:207
#18 Dispatch () at ../xserver/dix/dispatch.c:421
#19 0x00005582a9266ced in dix_main (envp=<optimized out>, argv=<optimized out>, argc=<optimized out>) at ../xserver/dix/main.c:276
#20 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../xserver/dix/stubmain.c:34

```
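Frame #6 of the trace above shows the format string libwayland uses when it logs a protocol error, `"%s@%u: error %d: %s\n"`, which is the line Xwayland then turns into a fatal error. As a rough aid for triaging such logs, here is a minimal sketch that splits a line of that shape into its fields. The names `struct wl_error_line` and `parse_wl_error_line` are hypothetical and are not part of libwayland or Xwayland; the optional `(EE) ` prefix handling is an assumption based on how the X server log prints the message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of a libwayland protocol-error line. */
struct wl_error_line {
    char interface[64];     /* e.g. "wl_drm" */
    unsigned int object_id; /* e.g. 6 */
    int code;               /* e.g. 0 */
    char message[128];      /* e.g. "authenicate failed" (typo is in the source) */
};

/*
 * Parse "<interface>@<id>: error <code>: <message>".
 * An optional leading "(EE) " (X server log prefix, assumed) is skipped.
 * Returns 1 if all four fields were found, 0 otherwise.
 */
static int
parse_wl_error_line(const char *line, struct wl_error_line *out)
{
    assert(line != NULL && out != NULL);

    if (strncmp(line, "(EE) ", 5) == 0)
        line += 5;

    int n = sscanf(line, "%63[^@]@%u: error %d: %127[^\n]",
                   out->interface, &out->object_id, &out->code, out->message);
    return n == 4;
}
```

Fed the message from this report, the sketch should yield the interface `wl_drm`, which is what points at DRM authentication rather than at the core `wl_display` object.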
Comment 4 Olivier Fourdan 2018-09-25 11:24:59 UTC
That "authenicate failed" (with the typo in "authenicate") is typical of drm authentication in Mesa:

static void
drm_authenticate(struct wl_client *client,
                 struct wl_resource *resource, uint32_t id)
{
        struct wl_drm *drm = wl_resource_get_user_data(resource);

        if (drm->callbacks.authenticate(drm->user_data, id) < 0)
                wl_resource_post_error(resource,
                                       WL_DRM_ERROR_AUTHENTICATE_FAIL,
                                       "authenicate failed");
        else
                wl_resource_post_event(resource, WL_DRM_AUTHENTICATED);
}


https://cgit.freedesktop.org/mesa/mesa/tree/src/egl/wayland/wayland-drm/wayland-drm.c#n175

Does the issue occur if you wait for the X client to start before switching to another tty?
Comment 5 Siarhei 2018-09-25 11:39:44 UTC
Nope, it only happens once the session has been started and some application is spawned whilst the active screen is a TTY with no graphics, while Xwayland is running on another VT.
Comment 6 Siarhei 2018-09-25 11:40:45 UTC
Which library provides the drm_authenticate function?
I can recompile that one with debug symbols and set a breakpoint.
Comment 7 Siarhei 2018-09-25 11:42:37 UTC
(In reply to Siarhei from comment #0)
> Rollback to xorg-server-1.20.0 fixes the issue.
Well, apparently it doesn't.
Comment 8 Olivier Fourdan 2018-09-25 14:27:42 UTC
Yeah, I think this is neither a regression nor a bug, actually: by switching VT while Xwayland is starting, you hinder DRM auth and Xwayland can't start.
Comment 9 Pekka Paalanen 2018-09-26 08:26:31 UTC
In the era of render nodes, why does Xwayland still need to use DRM auth?
Wouldn't that be a valid feature request, to avoid DRM auth?
Comment 10 Olivier Fourdan 2018-09-26 08:46:38 UTC
(In reply to Pekka Paalanen from comment #9)
> In the era of render nodes, why does Xwayland still need to use DRM auth?
> Wouldn't that be a valid feature request, to avoid DRM auth?

Actually, it does:

https://gitlab.freedesktop.org/xorg/xserver/blob/master/hw/xwayland/xwayland-glamor-gbm.c#L550
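Assuming the linked code refers to skipping wl_drm authentication when the opened device is a render node, the check could look roughly like the sketch below. It is not the Xwayland implementation: `fd_is_render_node`, `minor_is_render` and `SKETCH_DRM_MAJOR` are hypothetical names, and the values (DRM character-device major 226, render minors 128 to 191) are the conventional Linux numbering, stated here as an assumption.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Assumed: conventional Linux major number for DRM character devices. */
#define SKETCH_DRM_MAJOR 226

/* Assumed: render nodes (/dev/dri/renderD*) use minors 128..191. */
static int
minor_is_render(unsigned int m)
{
    return m >= 128 && m < 192;
}

/*
 * Returns 1 if fd refers to a DRM render node, 0 otherwise
 * (including when fstat() fails, e.g. for an invalid descriptor).
 * A client holding a render node does not need DRM authentication.
 */
static int
fd_is_render_node(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return 0;
    if (!S_ISCHR(st.st_mode))
        return 0;
    if (major(st.st_rdev) != SKETCH_DRM_MAJOR)
        return 0;
    return minor_is_render(minor(st.st_rdev));
}
```

With a check of this kind, the authentication round trip that fails in this report is only needed when the compositor advertises a primary node such as /dev/dri/card0.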
Comment 11 Pekka Paalanen 2018-09-26 09:18:57 UTC
Ok, so it's actually related to the problem that wl_drm is still the only way for a client to know which device the compositor is using.

Why wouldn't Xwayland look at the device and open the corresponding render node instead if there is one? Maybe the drmDevice API in libdrm would make that easy and portable nowadays?

Do we have an issue filed anywhere for making a companion extension to zwp_linux_dmabuf so that clients can find out which device their buffers should at least work on?
Comment 12 Olivier Fourdan 2018-09-26 12:41:42 UTC
(In reply to Pekka Paalanen from comment #11)
> Ok, so it's actually related to the problem that wl_drm is still the only way
> for a client to know which device the compositor is using.
> 
> Why wouldn't Xwayland look at the device and open the corresponding render
> node instead if there is one? Maybe the drmDevice API in libdrm would make
> that easy and portable nowadays?

Yeap, I reckon it should work, I can give that a try.

> Do we have an issue filed anywhere for making a companion extension to
> zwp_linux_dmabuf so that clients can find out which device their buffers
> should at least work on?

Not to my recollection, but maybe ask Daniel or Jonas?
Comment 13 Daniel Stone 2018-09-26 13:17:52 UTC
> Do we have an issue filed anywhere for making a companion extension to
> zwp_linux_dmabuf so that clients can find out which device their buffers
> should at least work on?

Afraid not, but doing so would allow us to kill wl_drm completely. I wrote one quickly though: https://gitlab.freedesktop.org/wayland/wayland/issues/5
Comment 14 Daniel Stone 2018-09-26 13:18:11 UTC
(In reply to Daniel Stone from comment #13)
> Afraid not, but doing so would allow us to kill wl_drm completely. I wrote
> one quickly though: https://gitlab.freedesktop.org/wayland/wayland/issues/5

That should be https://gitlab.freedesktop.org/wayland/wayland/issues/59
Comment 15 Pekka Paalanen 2018-09-27 10:40:40 UTC
https://gitlab.freedesktop.org/xorg/xserver/merge_requests/26 is proposing a workaround.

ofourdan, should this be re-opened, or?
Comment 16 Olivier Fourdan 2018-09-27 11:15:30 UTC
(In reply to Pekka Paalanen from comment #15)
> https://gitlab.freedesktop.org/xorg/xserver/merge_requests/26 is proposing a
> workaround.
> 
> ofourdan, should this be re-opened, or?

Sure, we can reopen.
Comment 17 Olivier Fourdan 2018-10-01 15:48:26 UTC
Patch has landed in git master, re-closing.
Comment 18 Siarhei 2018-10-03 07:45:20 UTC
I've tested the upstream build.
The crash no longer occurs.
But the Xwayland session no longer processes keyboard or mouse events.
It looks like it has been detached from that session, or the device is not considered a real seat in terms of loginctl.
Comment 19 soloturn 2018-12-16 05:36:17 UTC
Not too sure if bug 109069 is relevant here, but it looks very similar.

1. start weston (nohup weston)
2. start firefox in weston to start xwayland
3. switch to tty, export DISPLAY=:1, firefox

error message from firefox:

libGL error: Connection closed during DRI3 initialization failure
Gdk-Message: 05:51:57.859: firefox: Fatal IO error 11 (Resource temporarily unavailable) on X server :1.

error message from weston's nohup.out:

[05:51:57.855] libwayland: error in client communication (pid 29706)
(EE) 
Fatal server error:
(EE) wl_drm@6: error 0: authenicate failed
(EE) 
[05:51:57.859] xserver exited, code 256
Comment 20 soloturn 2018-12-16 05:37:10 UTC
i forgot, this is with:
xorg-server-xwayland 1.20.3-1
