107213 – [amdgpu/DisplayPort] KDE Wayland session is segfaulting right after login

Status:	RESOLVED MOVED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/AMDgpu
Version:	unspecified
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium normal
Assignee:	Default DRI bug account

Reported:	2018-07-13 01:46 UTC by Shmerl
Modified:	2019-11-19 08:43 UTC
CC List:	8 users

Attachments
dmesg output (87.31 KB, text/plain) 2018-07-13 14:12 UTC, Shmerl
amdgpu crash in dmesg output (109.91 KB, text/plain) 2018-09-03 01:53 UTC, george
dmesg crash output (86.32 KB, text/plain) 2018-09-04 09:19 UTC, Pau Ruiz Safont
dmesg error msg while suspending. (8.24 KB, text/plain) 2018-09-19 07:04 UTC, Lee Donaghy

Description Shmerl 2018-07-13 01:46:10 UTC

I just tried to use a Wayland session with KDE Plasma 5.13.2 (Debian testing) and it's segfaulting right after login and falling back to sddm.

That's what I see in dmesg:

[  176.359816] [drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout 10us * 3000 tries - dce110_stream_encoder_dp_blank line:956
[  176.814144] QThread[2620]: segfault at f ip 00007f30f30b0e60 sp 00007f30e506e4f0 error 4 in libwayland-client.so.0.3.0[7f30f30a9000+d000]
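The segfault line carries enough information to locate the faulting instruction inside the library: the bracketed pair is the start address and size of the mapping, so subtracting the start from ip gives an offset. The following Python sketch does that arithmetic. The helper name library_offset and the regular expression are illustrative only (not part of any kernel tool), and the result is relative to the start of the mapping, which equals the file offset only if the mapping begins at file offset zero.

```python
import re

# Illustrative pattern for the x86 user-space segfault report seen in dmesg.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>\d+) in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def library_offset(line):
    """Return (library name, offset of the faulting instruction in its mapping)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    offset = ip - base
    if not 0 <= offset < size:
        raise ValueError("instruction pointer lies outside the reported mapping")
    return m["lib"], offset


line = (
    "[  176.814144] QThread[2620]: segfault at f ip 00007f30f30b0e60 "
    "sp 00007f30e506e4f0 error 4 in libwayland-client.so.0.3.0[7f30f30a9000+d000]"
)
lib, off = library_offset(line)
print(lib, hex(off))
```

For the line above this gives offset 0x7e60 in libwayland-client.so.0.3.0, a value that could be passed to a tool such as addr2line together with matching debug symbols.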

Not sure whether it's an amdgpu problem or something with KWin.

GPU: AMD Vega 56, connected over DisplayPort.

OpenGL renderer string: Radeon RX Vega (VEGA10, DRM 3.25.0, 4.17.0-1-amd64, LLVM 6.0.1)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 18.1.3

Corresponding KDE bug: https://bugs.kde.org/show_bug.cgi?id=396066

Comment 1 Michel Dänzer 2018-07-13 07:55:30 UTC

Please attach the full dmesg.

Comment 2 Shmerl 2018-07-13 14:12:53 UTC

Created attachment 140624
dmesg output

See attached dmesg output. One thing to note: I consistently get a black screen after boot (the monitor goes into sleep mode). I need to turn the monitor off, turn it back on, switch to tty1 (then the monitor turns on), log in there and restart sddm. Only then does sddm appear on tty7. After that, the Wayland session log-in fails (I tried a couple of times, which is reflected in dmesg).

The first try didn't result in a segfault in dmesg, just the *ERROR* REG_WAIT, but the second attempt also added a segfault.

Comment 3 Shmerl 2018-08-27 03:11:52 UTC

Did anyone manage to narrow down the cause?

Comment 4 Shmerl 2018-08-29 04:27:49 UTC

Still a problem with kernel 4.18.5 and latest firmware for Vega (20180825).

Except now, the session doesn't crash but just hangs with a black screen.

A similar dmesg message can be seen:

[  162.743804] [drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout 10us * 3500 tries - dce_mi_free_dmif line:636
[  162.743830] WARNING: CPU: 6 PID: 1575 at /build/linux-ETX4PU/linux-4.18.5/drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:254 generic_reg_wait+0xe8/0x160 [amdgpu]
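The REG_WAIT messages encode a polling budget: a delay per poll and a maximum number of tries, so their product is roughly how long the driver waited before giving up. The Python sketch below extracts those fields and computes the total. The function name reg_wait_budget and the regular expression are illustrative, and the computed figure should be read as an approximate lower bound on the time spent waiting, not a precise measurement.

```python
import re

# Illustrative pattern for the amdgpu REG_WAIT timeout message format seen in this report.
REG_WAIT_RE = re.compile(
    r"REG_WAIT timeout (?P<delay>\d+)us \* (?P<tries>\d+) tries - "
    r"(?P<func>\w+) line:(?P<line>\d+)"
)


def reg_wait_budget(dmesg_line):
    """Return (calling function, source line, total wait in milliseconds) or None."""
    m = REG_WAIT_RE.search(dmesg_line)
    if m is None:
        return None
    total_us = int(m["delay"]) * int(m["tries"])
    return m["func"], int(m["line"]), total_us / 1000.0


samples = [
    "[  176.359816] [drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout "
    "10us * 3000 tries - dce110_stream_encoder_dp_blank line:956",
    "[  162.743804] [drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout "
    "10us * 3500 tries - dce_mi_free_dmif line:636",
]
for sample in samples:
    print(reg_wait_budget(sample))
```

With the two messages quoted in this report, the budgets come out to about 30 ms for dce110_stream_encoder_dp_blank and about 35 ms for dce_mi_free_dmif.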

Comment 5 george 2018-09-03 01:51:16 UTC

Hello, found this bug via web search. I am experiencing the *exact* same bug. I'm running Fedora 28 with MATE desktop, so I'm confident this is not a KDE problem.

My error message:
[39911.150851] [drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout 10us * 3000 tries - dce110_stream_encoder_dp_blank line:956
[39911.150927] WARNING: CPU: 5 PID: 1452 at drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:195 generic_reg_wait+0xe7/0x160 [amdgpu]

Mobo: Supermicro X9SRL
CPU: intel Xeon E5 2680 v2
GPU: MSI Radeon RX 480 4GB with latest polaris10 bin file
Kernel: 4.17.19
Mesa: 18.0.5

This must be an AMDGPU driver bug. I'm also connected via DisplayPort and the monitor frequently goes into power-save mode during boot. If I switch to TTY1 and do a Ctrl-Alt-Del reboot, it usually boots up normally after the blind "three finger salute" reboot.

Comment 6 george 2018-09-03 01:53:43 UTC

Created attachment 141418
amdgpu crash in dmesg output

Comment 7 Shmerl 2018-09-03 02:00:08 UTC

I'm waiting for kernel 4.19.x to see if it improves anything, since it apparently had some fix that looks related:

https://lists.freedesktop.org/archives/dri-devel/2018-August/185123.html

> drm/amd/display: Fix Vega10 black screen after mode change

Comment 8 Pau Ruiz Safont 2018-09-04 09:17:31 UTC

Same issue here, with the same error on dmesg:

GPU: R9 380 connected over Displayport
Monitor: DELL U2515H
CPU: AMD Ryzen 7 1700
Motherboard: ASRock AB350 Gaming-ITX/ac

OpenGL Renderer: AMD Radeon R9 380 Series (TONGA DRM 3.26.0 4.18.5-1-MANJARO LLVM 6.0.1)
OpenGL version: 4.5 Mesa 18.1.7

I'm using KDE Plasma 5.13.4

Comment 9 Pau Ruiz Safont 2018-09-04 09:19:13 UTC

Created attachment 141438
dmesg crash output

Comment 10 Sylvain BERTRAND 2018-09-04 11:45:28 UTC

This may be related to the following bug (if your kernel has the same faulty commit):
https://bugs.freedesktop.org/show_bug.cgi?id=107784

Comment 11 Nicholas Kazlauskas 2018-09-04 12:32:22 UTC

I'm inclined to believe that this is a userspace issue.

I can observe the crash happening on the newest stable Ubuntu/Debian releases. However, the crash does *not* occur for distributions that have newer userspace and kernel configurations (Fedora, Arch). I can boot and use Wayland under this ASIC and many others.

That said, I haven't investigated the root cause of the issue. It might be worth bisecting Wayland or the kernel. It shouldn't be specific to a particular ASIC at least.

Comment 12 Sylvain BERTRAND 2018-09-04 18:12:53 UTC

This may be a different bug then, because my bug
https://bugs.freedesktop.org/show_bug.cgi?id=107784 occurs with git userspace no older than a few days, and DisplayPort is broken whatever the screen resolution. I did manually bisect the kernel and found the faulty commit, though; I guess the guys at AMD are now looking into it.

Comment 13 Shmerl 2018-09-05 12:52:16 UTC

I have a recent kernel and userland stack with Debian testing, and it's still crashing and falling back into sddm. But with the newest kernel I don't see the segfault message in dmesg anymore.

Linux 4.19.0-rc2-amd64 #1 SMP Debian 4.19~rc2-1~exp1 (2018-09-03) x86_64 GNU/Linux

firmware-amd-graphics: 20180825+dfsg-1

I see this in dmesg:

[   21.111724] [drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout 10us * 3000 tries - dce110_stream_encoder_dp_blank line:922
[   21.111795] WARNING: CPU: 6 PID: 153 at drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:254 generic_reg_wait+0xe7/0x160 [amdgpu]
[   21.111796] Modules linked in: devlink(E) ebtable_filter(E) ebtables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) cmac(E) bnep(E) nls_ascii(E) nls_cp437(E) vfat(E) fat(E) arc4(E) amdkfd(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) uvcvideo(E) edac_mce_amd(E) btusb(E) snd_hda_codec_hdmi(E) btrtl(E) mxm_wmi(E) wmi_bmof(E) videobuf2_vmalloc(E) btbcm(E) btintel(E) amdgpu(E) kvm_amd(E) videobuf2_memops(E) iwlmvm(E) bluetooth(E) snd_hda_intel(E) chash(E) videobuf2_v4l2(E) kvm(E) irqbypass(E) snd_usb_audio(E) gpu_sched(E) snd_hda_codec(E) mac80211(E) snd_usbmidi_lib(E) videobuf2_common(E) snd_hda_core(E) crct10dif_pclmul(E) jitterentropy_rng(E) crc32_pclmul(E) ttm(E) snd_rawmidi(E) snd_seq_device(E) snd_hwdep(E) efi_pstore(E) videodev(E) evdev(E) drm_kms_helper(E) ghash_clmulni_intel(E)
[   21.111824]  snd_pcm(E) iwlwifi(E) pcspkr(E) drbg(E) efivars(E) media(E) drm(E) ansi_cprng(E) snd_timer(E) cfg80211(E) ecdh_generic(E) snd(E) soundcore(E) rfkill(E) sp5100_tco(E) crc16(E) k10temp(E) ccp(E) rng_core(E) sg(E) wmi(E) pcc_cpufreq(E) button(E) acpi_cpufreq(E) nct6775(E) hwmon_vid(E) parport_pc(E) ppdev(E) lp(E) parport(E) efivarfs(E) ip_tables(E) x_tables(E) autofs4(E) xfs(E) btrfs(E) xor(E) zstd_decompress(E) zstd_compress(E) xxhash(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) hid_generic(E) usbhid(E) hid(E) sd_mod(E) crc32c_intel(E) ahci(E) xhci_pci(E) aesni_intel(E) aes_x86_64(E) libahci(E) crypto_simd(E) xhci_hcd(E) igb(E) cryptd(E) glue_helper(E) libata(E) i2c_piix4(E) nvme(E) i2c_algo_bit(E) dca(E) usbcore(E) scsi_mod(E) usb_common(E) nvme_core(E) gpio_amdpt(E) gpio_generic(E)
[   21.111860] CPU: 6 PID: 153 Comm: kworker/6:3 Tainted: G            E     4.19.0-rc2-amd64 #1 Debian 4.19~rc2-1~exp1
[   21.111861] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X370 Taichi, BIOS L4.64 04/03/2018
[   21.111877] Workqueue: events drm_mode_rmfb_work_fn [drm]
[   21.111931] RIP: 0010:generic_reg_wait+0xe7/0x160 [amdgpu]
[   21.111932] Code: 44 24 58 8b 54 24 48 89 de 44 89 4c 24 08 48 8b 4c 24 50 48 c7 c7 20 dd 1e c2 e8 64 76 ab fe 83 7d 18 01 44 8b 4c 24 08 74 02 <0f> 0b 48 83 c4 10 44 89 c8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 41 0f
[   21.111933] RSP: 0018:ffffaf830207fa20 EFLAGS: 00010297
[   21.111935] RAX: 0000000000000000 RBX: 000000000000000a RCX: 0000000000000000
[   21.111936] RDX: 0000000000000000 RSI: ffff96dcceb966a8 RDI: ffff96dcceb966a8
[   21.111937] RBP: ffff96dcc61f1700 R08: 0000000000000005 R09: 0000000000010200
[   21.111938] R10: 0000000000000498 R11: ffffffff9a1dc6ed R12: 0000000000000bb9
[   21.111938] R13: 00000000000051e2 R14: 0000000000010000 R15: 0000000000000000
[   21.111940] FS:  0000000000000000(0000) GS:ffff96dcceb80000(0000) knlGS:0000000000000000
[   21.111941] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   21.111942] CR2: 000055ab41a08358 CR3: 00000003f4b52000 CR4: 00000000003406e0
[   21.111943] Call Trace:
[   21.112005]  dce110_stream_encoder_dp_blank+0x12c/0x1a0 [amdgpu]
[   21.112061]  core_link_disable_stream+0x54/0x220 [amdgpu]
[   21.112116]  dce110_reset_hw_ctx_wrap+0xc1/0x1e0 [amdgpu]
[   21.112170]  dce110_apply_ctx_to_hw+0x45/0x650 [amdgpu]
[   21.112224]  ? dc_remove_plane_from_context+0x1fc/0x240 [amdgpu]
[   21.112276]  dc_commit_state+0x2c6/0x520 [amdgpu]
[   21.112334]  amdgpu_dm_atomic_commit_tail+0x37a/0xd80 [amdgpu]
[   21.112338]  ? __wake_up_common_lock+0x89/0xc0
[   21.112341]  ? _cond_resched+0x15/0x30
[   21.112342]  ? wait_for_completion_timeout+0x3b/0x1a0
[   21.112399]  ? amdgpu_dm_atomic_commit_tail+0xd80/0xd80 [amdgpu]
[   21.112407]  commit_tail+0x3d/0x70 [drm_kms_helper]
[   21.112414]  drm_atomic_helper_commit+0xb4/0x120 [drm_kms_helper]
[   21.112428]  drm_framebuffer_remove+0x361/0x410 [drm]
[   21.112442]  drm_mode_rmfb_work_fn+0x4f/0x60 [drm]
[   21.112446]  process_one_work+0x1a7/0x360
[   21.112447]  worker_thread+0x30/0x390
[   21.112449]  ? pwq_unbound_release_workfn+0xd0/0xd0
[   21.112451]  kthread+0x112/0x130
[   21.112452]  ? kthread_bind+0x30/0x30
[   21.112454]  ret_from_fork+0x22/0x40
[   21.112456] ---[ end trace b22dbbbbffd241d9 ]---

Comment 14 Shmerl 2018-09-05 13:03:00 UTC

Correction: the segfault is still happening, it's just not consistent (not every time).

[  683.792530] QThread[3520]: segfault at f ip 00007f41f9f2ae60 sp 00007f41f19d4500 error 4 in libwayland-client.so.0.3.0[7f41f9f23000+d000]
[  683.792538] Code: 48 83 c4 10 5b c3 e8 cf d1 ff ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 41 55 41 54 49 89 cc 55 53 48 89 fb 48 83 ec 08 <48> 8b 7f 08 44 0f b6 07 45 84 c0 0f 84 17 01 00 00 48 89 f8 44 89
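The "error 4" field in the segfault line is the x86 page-fault error code, a small bit mask. The Python sketch below decodes the low bits as commonly documented for x86 (present, write, user, reserved, instruction fetch); the table and the function name decode_pf_error are illustrative rather than taken from kernel sources.

```python
# Low bits of the x86 page-fault error code: (mask, meaning if set, meaning if clear).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error(code):
    """Translate a page-fault error code into a list of human-readable facts."""
    facts = []
    for mask, if_set, if_clear in PF_BITS:
        text = if_set if code & mask else if_clear
        if text is not None:
            facts.append(text)
    return facts


print(decode_pf_error(4))
```

For error 4 this reads as a user-mode read of a page that is not mapped, which fits a dereference of a near-null pointer such as the reported fault address 0xf.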

Comment 15 Shmerl 2018-09-14 13:12:38 UTC

I managed to make it produce a core dump. It's from kwin_wayland. After installing the needed debug symbol packages, here is a backtrace:

Core was generated by `/usr/bin/kwin_wayland --xwayland --libinput --exit-with-session=/usr/lib/x86_64'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007eff59760f30 in wl_closure_init (message=message@entry=0x7, size=size@entry=52, num_arrays=num_arrays@entry=0x7eff5140858c, args=args@entry=0x0) at ../src/connection.c:562
562     ../src/connection.c: No such file or directory.
[Current thread is 1 (Thread 0x7eff51409700 (LWP 7249))]
(gdb) bt
#0  0x00007eff59760f30 in wl_closure_init (message=message@entry=0x7, size=size@entry=52, num_arrays=num_arrays@entry=0x7eff5140858c, args=args@entry=0x0) at ../src/connection.c:562
#1  0x00007eff59761aa0 in wl_connection_demarshal (connection=0x7eff440053e0, size=size@entry=52, objects=objects@entry=0x7eff440052e8, message=0x7) at ../src/connection.c:698
#2  0x00007eff5975fae8 in queue_event (len=52, display=0x7eff44005270) at ../src/wayland-client.c:1364
#3  read_events (display=0x7eff44005270) at ../src/wayland-client.c:1466
#4  wl_display_read_events (display=display@entry=0x7eff44005270) at ../src/wayland-client.c:1549
#5  0x00007eff59760169 in wl_display_dispatch_queue (display=0x7eff44005270, queue=0x7eff44005338) at ../src/wayland-client.c:1788
#6  0x00007eff5d123933 in KWayland::Client::ConnectionThread::Private::<lambda()>::operator() (__closure=0x7eff44009550) at ./src/client/connection_thread.cpp:129
#7  QtPrivate::FunctorCall<QtPrivate::IndexesList<>, QtPrivate::List<>, void, KWayland::Client::ConnectionThread::Private::setupSocketNotifier()::<lambda()> >::call (arg=<optimized out>, 
    f=...) at /usr/include/x86_64-linux-gnu/qt5/QtCore/qobjectdefs_impl.h:128
#8  QtPrivate::Functor<KWayland::Client::ConnectionThread::Private::setupSocketNotifier()::<lambda()>, 0>::call<QtPrivate::List<>, void> (arg=<optimized out>, f=...)
    at /usr/include/x86_64-linux-gnu/qt5/QtCore/qobjectdefs_impl.h:238
#9  QtPrivate::QFunctorSlotObject<KWayland::Client::ConnectionThread::Private::setupSocketNotifier()::<lambda()>, 0, QtPrivate::List<>, void>::impl(int, QtPrivate::QSlotObjectBase *, QObject *, void **, bool *) (which=<optimized out>, this_=0x7eff44009540, r=<optimized out>, a=<optimized out>, ret=<optimized out>)
    at /usr/include/x86_64-linux-gnu/qt5/QtCore/qobjectdefs_impl.h:421
#10 0x00007eff5e606910 in QtPrivate::QSlotObjectBase::call (a=0x7eff514087d0, r=0x564a80be84f0, this=0x7eff44009540) at ../../include/QtCore/../../src/corelib/kernel/qobjectdefs_impl.h:376
#11 QMetaObject::activate(QObject*, int, int, void**) () at kernel/qobject.cpp:3754
#12 0x00007eff5e606dd7 in QMetaObject::activate (sender=sender@entry=0x7eff44009440, m=m@entry=0x7eff5e863c60 <QSocketNotifier::staticMetaObject>, 
    local_signal_index=local_signal_index@entry=0, argv=argv@entry=0x7eff514087d0) at kernel/qobject.cpp:3633
#13 0x00007eff5e611ff9 in QSocketNotifier::activated (this=this@entry=0x7eff44009440, _t1=<optimized out>, _t2=...) at .moc/moc_qsocketnotifier.cpp:136
#14 0x00007eff5e612341 in QSocketNotifier::event (this=0x7eff44009440, e=0x7eff51408a30) at kernel/qsocketnotifier.cpp:266
#15 0x00007eff5e9cb4a1 in QApplicationPrivate::notify_helper(QObject*, QEvent*) () from /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#16 0x00007eff5e9d2ae0 in QApplication::notify(QObject*, QEvent*) () from /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#17 0x00007eff5e5dd579 in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at ../../include/QtCore/5.11.1/QtCore/private/../../../../../src/corelib/thread/qthread_p.h:307
#18 0x00007eff5e62fe4a in QCoreApplication::sendEvent (event=0x7eff51408a30, receiver=<optimized out>) at ../../include/QtCore/../../src/corelib/kernel/qcoreapplication.h:234
#19 socketNotifierSourceDispatch(_GSource*, int (*)(void*), void*) () at kernel/qeventdispatcher_glib.cpp:106
#20 0x00007eff5a647287 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#21 0x00007eff5a6474c0 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#22 0x00007eff5a64754c in g_main_context_iteration () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#23 0x00007eff5e62f223 in QEventDispatcherGlib::processEvents (this=0x7eff44000b20, flags=...) at kernel/qeventdispatcher_glib.cpp:423
#24 0x00007eff5e5dc24b in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at ../../include/QtCore/../../src/corelib/global/qflags.h:140
#25 0x00007eff5e42b176 in QThread::exec() () at ../../include/QtCore/../../src/corelib/global/qflags.h:120
#26 0x00007eff5e434d47 in QThreadPrivate::start(void*) () at thread/qthread_unix.cpp:367
#27 0x00007eff5efb5f2a in start_thread (arg=0x7eff51409700) at pthread_create.c:463
#28 0x00007eff5e0fdedf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
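The backtrace shows wl_closure_init being called with message=0x7, which is not a plausible pointer. One way to check that this matches the fault address reported in dmesg ("segfault at f") is to add the offset of the field that would be read first. The Python sketch below assumes the layout of struct wl_message is three pointers (name, signature, types) and that wl_closure_init reads message->signature; both are assumptions based on the public Wayland headers, not something confirmed in this report. The WlMessage class is a hypothetical mirror that models pointers as 64-bit integers for x86-64.

```python
import ctypes


class WlMessage(ctypes.Structure):
    # Hypothetical mirror of struct wl_message on x86-64 (assumed layout):
    # each pointer is modelled as a 64-bit integer.
    _fields_ = [
        ("name", ctypes.c_uint64),
        ("signature", ctypes.c_uint64),
        ("types", ctypes.c_uint64),
    ]


# Bogus "message" pointer taken from frame #0 of the backtrace.
message_ptr = 0x7

# Address that would be touched when reading message->signature.
fault_address = message_ptr + WlMessage.signature.offset
print(hex(fault_address))
```

Under those assumptions the computed address is 0xf, the same as the fault address in the dmesg segfault lines, which would be consistent with the crash being a read through a corrupted message pointer in libwayland-client.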

Comment 16 Michel Dänzer 2018-09-14 14:27:43 UTC

kwin crashes in libwayland code. You should probably report this to libwayland (which uses GitLab issues now) and/or kwin.

Comment 17 Shmerl 2018-09-14 14:35:36 UTC

Thanks. I already reported it for KWin (linked above): https://bugs.kde.org/show_bug.cgi?id=396066

I'll open a libwayland bug too.

Comment 18 Shmerl 2018-09-14 15:10:00 UTC

Corresponding wayland-client bug: https://gitlab.freedesktop.org/wayland/wayland/issues/56

Comment 19 Lee Donaghy 2018-09-19 07:04:38 UTC

Created attachment 141650
dmesg error msg while suspending.

Found this bug report from a Google search. I'm not using Wayland, and the machine appears to have suspended and resumed fine, but I did happen to see the same error in the logs.
Posting in case it helps narrow down the problem.


System:    Host: Plasma Kernel: 4.18.8-arch1-1-ARCH x86_64 bits: 64 Desktop: KDE Plasma 5.13.5 
           Distro: Antergos Linux 
CPU:       Topology: 6-Core model: Intel Core i7-5820K bits: 64 type: MT MCP L2 cache: 15.0 MiB 
           Speed: 2697 MHz min/max: 1200/3600 MHz Core speeds (MHz): 1: 2292 2: 1949 3: 2333 
           4: 2576 5: 2371 6: 3401 7: 2804 8: 2979 9: 2767 10: 2782 11: 3069 12: 3402 
Graphics:  Card-1: AMD Vega 10 XT [Radeon RX Vega 64] driver: amdgpu v: kernel 
           Display: x11 server: X.Org 1.20.1 driver: modesetting unloaded: fbdev,vesa 
           resolution: 2560x1440~144Hz, 1280x720~60Hz 
           OpenGL: renderer: Radeon RX Vega (VEGA10 DRM 3.26.0 4.18.8-arch1-1-ARCH LLVM 6.0.1) 
           v: 4.5 Mesa 18.2.0

Comment 20 Shmerl 2018-09-26 03:20:33 UTC

I also opened a downstream Debian bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=909636

Comment 21 Shmerl 2018-10-03 15:37:52 UTC

I tested it with an Intel GPU recently, and it doesn't crash. So it's amdgpu-specific.

Comment 22 Shmerl 2018-10-11 16:34:03 UTC

It looks like it's related to https://bugs.freedesktop.org/show_bug.cgi?id=107978

My monitor (Dell U2413) has a setting for toggling DisplayPort 1.2. When I disable it, the Wayland Plasma session no longer crashes and logs in properly! So it's likely an amdgpu issue after all.

Comment 23 Martin Peres 2019-11-19 08:43:28 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/445.
