Summary: AMDGPU RIP: dm_update_crtcs_state on kernel 4.17rc2
Product: DRI
Component: DRM/AMDgpu
Reporter: Kevin McCormack <harlemsquirrel>
Assignee: Leo Li <sunpeng.li>
Status: RESOLVED FIXED
Severity: major
Priority: high
CC: bjo, bugs.freedesktop.org, cig, ebiggers3, harry.wentland, jonemilj, levis.kool, sunpeng.li
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)

This happened to me too: the same BUG() with the same stack trace, on 4.17-rc2. I was away at the time, so it may have happened when the screen turned off due to inactivity. The graphics card is a Radeon RX 550 with one monitor connected via HDMI. It didn't happen with 4.16. I did not test 4.17-rc1.

Created attachment 139197 [details]
dmesg output linux 4.17 rc2
Comment on attachment 139197 [details]
dmesg output linux 4.17 rc2
This is from one boot to the next boot. I booted the system at 18:22 and used it until 20:00. I went for a cup of tea and the system froze at 20:08.
Then I booted the system again.
I am using a hand-compiled Linux kernel 4.17-rc2.
I know that I left the system at 20:00.

Apr 28 20:08:28 levis-desktop kernel: kernel BUG at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:4700!
Apr 28 20:08:28 levis-desktop kernel: RIP: 0010:dm_update_crtcs_state+0x347/0x3c0 [amdgpu]
Apr 28 20:08:28 levis-desktop kernel: amdgpu_dm_atomic_check+0x191/0x3e0 [amdgpu]
Apr 28 20:08:28 levis-desktop kernel: amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
Apr 28 20:08:28 levis-desktop kernel: RIP: dm_update_crtcs_state+0x347/0x3c0 [amdgpu] RSP: ffffa87b814bbb18

System: Host: levis-desktop Kernel: 4.17.0-rc2-next-20180426-ARCH x86_64 bits: 64 Desktop: KDE Plasma 5.12.4 Distro: Arch Linux
Machine: Type: Desktop Mobo: Micro-Star model: A320M PRO-VD PLUS (MS-7B38) v: 1.0 serial: N/A UEFI: American Megatrends v: 1.80 date: 03/15/2018
CPU: Topology: Quad Core model: AMD Ryzen 5 2400G with Radeon Vega Graphics bits: 64 type: MT MCP L2 cache: 2048 KiB
  Speed: 1368 MHz min/max: 1600/3600 MHz
  Core speeds (MHz): 1: 1611 2: 1381 3: 1373 4: 1470 5: 1834 6: 2727 7: 1375 8: 1376
Graphics: Card-1: AMD Raven Bridge [Radeon Vega Series / Radeon Vega Mobile Series] driver: amdgpu v: kernel
  Display: x11 server: X.Org 1.19.6 driver: modesetting unloaded: ati,fbdev,vesa resolution: 1600x900~60Hz
  OpenGL: renderer: AMD RAVEN (DRM 3.25.0 / 4.17.0-rc2-next-20180426-ARCH LLVM 6.0.0) v: 4.5 Mesa 18.0.1

I might be having the same issue; however, my stack looks a bit different, perhaps due to different kernel commits and the way it is triggered. Can you guys trigger this hang by locking the window manager session/screen and waiting for the display(s) to go to sleep? That's how I can reliably trigger my similar hang.

april 28 20:16:49 beist.localdomain kernel: kernel BUG at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:4708!
april 28 20:16:49 beist.localdomain kernel: invalid opcode: 0000 [#1] SMP NOPTI
april 28 20:16:49 beist.localdomain kernel: RIP: 0010:dm_update_crtcs_state+0x397/0x410 [amdgpu]
...
april 28 20:16:49 beist.localdomain kernel: Call Trace:
april 28 20:16:49 beist.localdomain kernel: amdgpu_dm_atomic_check+0x182/0x3b0 [amdgpu]
april 28 20:16:49 beist.localdomain kernel: drm_atomic_check_only+0x33a/0x4f0 [drm]
april 28 20:16:49 beist.localdomain kernel: drm_atomic_commit+0x13/0x50 [drm]
april 28 20:16:49 beist.localdomain kernel: drm_atomic_connector_commit_dpms+0xe5/0xf0 [drm]
april 28 20:16:49 beist.localdomain kernel: drm_mode_obj_set_property_ioctl+0x174/0x290 [drm]
april 28 20:16:49 beist.localdomain kernel: ? drm_mode_connector_set_obj_prop+0x70/0x70 [drm]
april 28 20:16:49 beist.localdomain kernel: drm_mode_connector_property_set_ioctl+0x3e/0x60 [drm]
april 28 20:16:49 beist.localdomain kernel: drm_ioctl_kernel+0x5b/0xb0 [drm]
april 28 20:16:49 beist.localdomain kernel: drm_ioctl+0x2c3/0x360 [drm]
april 28 20:16:49 beist.localdomain kernel: ? drm_mode_connector_set_obj_prop+0x70/0x70 [drm]
april 28 20:16:49 beist.localdomain kernel: amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
april 28 20:16:49 beist.localdomain kernel: do_vfs_ioctl+0xa4/0x620
april 28 20:16:49 beist.localdomain kernel: ksys_ioctl+0x70/0x80
april 28 20:16:49 beist.localdomain kernel: __x64_sys_ioctl+0x16/0x20
april 28 20:16:49 beist.localdomain kernel: do_syscall_64+0x5b/0x160
april 28 20:16:49 beist.localdomain kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
april 28 20:16:49 beist.localdomain kernel: RIP: 0033:0x7f7d092cc0f7

Created attachment 139205 [details] [review]
Partially revert commit introducing the BUG() where the "hang" happens

This is the patch I'm currently testing to avoid the hang in my case. It probably does not solve the issue, but it seems to avoid the hang for me. My original problem remains: after the displays go to sleep they stay blank until I reboot, even with the patch.

For me too, the issue occurs when the display goes black from inactivity, followed by a hang on wake.

Is this happening in 4.17rc3?

Seems to be resolved with 4.17rc3 so far!

This just happened to me again, now on v4.17-rc4. It's the 'BUG_ON(dm_new_crtc_state->stream == NULL);' at amdgpu_dm.c:4708, and it apparently happened when the screen turned off due to inactivity. The graphics card is a Radeon RX 550 with one monitor connected via HDMI.

I'm having trouble reproducing the BUG_ON in Ubuntu 18.04. For those who can, the info below will be helpful:
- Distribution & version
- Window manager used
- Xorg log
- journalctl log, if on Wayland
Xorg log is at /var/log/Xorg.x.log
Gzip journalctl via `$ journalctl | gzip - > journalctl.gz`

Created attachment 139602 [details] [review]
Fix v1

It seems I've reproduced it by using the default modesetting DDX driver. Please give the attached patch a shot.

Confirming I had the same issue on suspend. Patch "Fix v1" resolves the issue for me. Attaching "before" and "after" dmesg outputs.

Created attachment 139634 [details]
"Before fix" dmesg output
Created attachment 139635 [details]
"After Fix" dmesg output
Note a "new" error near the end. There is no loss in functionality with the error; the screen behaves normally. I cannot confirm whether the new trace is in response to the patch or not. The 84-second range is right at the time I blanked and un-blanked the screen. I was using `xset dpms force off`.

Just confirmed that the error in the "after" dmesg output occurs every time the screen is blanked. Will post a new bug once the patch listed here is mainlined.

So which rc version will contain the fix?

(In reply to Shmerl from comment #17)
> So which rc version will contain the fix?

It's already in Dave's drm-fixes:
https://cgit.freedesktop.org/~airlied/linux/log/?h=drm-fixes

So it should make it to the 4.17 release.

(In reply to Leo Li from comment #18)
> (In reply to Shmerl from comment #17)
> > So which rc version will contain the fix?
>
> It's already in Dave's drm-fixes:
> https://cgit.freedesktop.org/~airlied/linux/log/?h=drm-fixes
>
> So it should make it to the 4.17 release.

I can reproduce this (more precisely bug 104611, a likely duplicate) on 4.17.5.

Created attachment 139018 [details]
dmesg output running kernel 4.17-rc2

I left my computer on overnight. After I noticed the system was frozen just now, I rebooted to find these log entries.

5:06:46 AM RIP: dm_update_crtcs_state+0x347/0x3c0 [amdgpu] RSP: ffffb2560bc37b10
5:06:46 AM kernel BUG at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:4700!

Software versions:
OpenGL version string: 3.0 Mesa 18.0.1

CPU hardware:
x86_64 AMD Ryzen 7 1800X Eight-Core Processor
Max Speed: 4100 MHz
Current Speed: 3600 MHz

Memory:
Speed: 3200 MT/s
      total  used   free  shared  buff/cache  available
Mem:  15Gi   3.0Gi  11Gi  67Mi    1.5Gi       12Gi
Swap: 7.8Gi  0B     7.8Gi

GPU hardware:
OpenGL renderer string: AMD Radeon (TM) R9 Fury Series (FIJI / DRM 3.23.0 / 4.15.15-1-ARCH, LLVM 6.0.0)
0b:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Fiji [Radeon R9 FURY / NANO Series] [1002:7300] (rev c8)
0c:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/580] [1002:67df] (rev e7)

Motherboard:
ASUSTeK COMPUTER INC. CROSSHAIR VI HERO
BIOS Version: 3008

Storage:
Smart Log for NVME device:nvme0 namespace-id:ffffffff

I've attached the relevant dmesg output.
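
The reports in this thread quote the same `kernel BUG at ...amdgpu_dm.c` message at two different source lines (4700 and 4708), always with `dm_update_crtcs_state` in the RIP line. For anyone comparing their own saved dmesg or journalctl dump against these, here is a minimal sketch in Python that counts BUG locations and faulting functions in a log. The function name `summarize_bugs` and the two regular expressions are illustrative assumptions based only on the log lines quoted here; they are not part of any kernel or DRM tooling and may need adjusting for other log formats.

```python
import re
from collections import Counter

# Matches e.g. "kernel BUG at drivers/.../amdgpu_dm.c:4700!"
BUG_RE = re.compile(r"kernel BUG at (?P<path>\S+?):(?P<line>\d+)!")

# Matches e.g. "RIP: 0010:dm_update_crtcs_state+0x347/0x3c0" and the
# variant without the "0010:" segment prefix. User-space RIP lines such
# as "RIP: 0033:0x7f7d092cc0f7" have no "+0x.../0x..." part and are skipped.
RIP_RE = re.compile(
    r"RIP: (?:[0-9a-f]{4}:)?(?P<func>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize_bugs(log_text):
    """Return (bug_locations, rip_functions) as two Counters.

    bug_locations is keyed by (source_path, line_number);
    rip_functions is keyed by the kernel function named in a RIP line.
    """
    bug_locations = Counter()
    rip_functions = Counter()
    for line in log_text.splitlines():
        bug = BUG_RE.search(line)
        if bug:
            bug_locations[(bug["path"], int(bug["line"]))] += 1
        rip = RIP_RE.search(line)
        if rip:
            rip_functions[rip["func"]] += 1
    return bug_locations, rip_functions
```

To use it, read a saved log into a string (for example the text of an uncompressed journalctl dump) and pass it to `summarize_bugs`; the keys of the first Counter show which `amdgpu_dm.c` line the BUG fired at, which makes it easier to tell whether two reports hit the same check.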