Created attachment 142434 [details] dmesg

$ inxi -bM
System: Host: localhost.localdomain Kernel: 4.20.0-0.rc1.git4.1.fc30.x86_64 x86_64 bits: 64 Desktop: Gnome 3.30.1 Distro: Fedora release 30 (Rawhide)
Machine: Type: Desktop Mobo: ASUSTeK model: ROG STRIX X470-I GAMING v: Rev 1.xx serial: <root required>
         UEFI: American Megatrends v: 0901 date: 07/23/2018
CPU: 8-Core: AMD Ryzen 7 2700X type: MT MCP speed: 3427 MHz min/max: 2200/4000 MHz
Graphics: Device-1: Advanced Micro Devices [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] driver: amdgpu v: kernel
          Display: wayland server: Fedora Project X.org 1.20.3 driver: amdgpu resolution: 3840x2160~60Hz
          OpenGL: renderer: Radeon RX Vega (VEGA10 DRM 3.27.0 4.20.0-0.rc1.git4.1.fc30.x86_64 LLVM 7.0.0) v: 4.5 Mesa 18.2.4
Network: Device-1: Intel I211 Gigabit Network driver: igb
         Device-2: Realtek RTL8822BE 802.11a/b/g/n/ac WiFi adapter driver: r8822be
Drives: Local Storage: total: 11.36 TiB used: 5.93 TiB (52.2%)
Info: Processes: 455 Uptime: 16m Memory: 31.30 GiB used: 15.99 GiB (51.1%) Shell: bash inxi: 3.0.27

[ 3852.511166] gmc_v9_0_process_interrupt: 56 callbacks suppressed
[ 3852.511182] amdgpu 0000:0b:00.0: [mmhub] VMC page fault (src_id:0 ring:169 vmid:0 pasid:0, for process pid 0 thread pid 0)
[ 3852.511184] amdgpu 0000:0b:00.0: in page starting at address 0x000000401080c000 from 18
[ 3852.511186] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00040152
[ 3862.673344] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=72072, emitted seq=72074
[ 3862.673356] [drm] GPU recovery disabled.
[ 4044.170764] sysrq: SysRq : Show Blocked State
[ 4044.170959] task PC stack pid father
[ 4044.171026] kworker/u32:5 D10872 253 2 0x80000000
[ 4044.171060] Workqueue: events_unbound commit_work [drm_kms_helper]
[ 4044.171063] Call Trace:
[ 4044.171073] ? __schedule+0x2f3/0xb90
[ 4044.171077] ? __lock_acquire+0x279/0x1650
[ 4044.171085] ? dma_fence_default_wait+0x242/0x330
[ 4044.171089] schedule+0x2f/0x90
[ 4044.171092] schedule_timeout+0x31c/0x4f0
[ 4044.171096] ? find_held_lock+0x34/0xa0
[ 4044.171099] ? find_held_lock+0x34/0xa0
[ 4044.171104] ? mark_held_locks+0x57/0x80
[ 4044.171134] ? _raw_spin_unlock_irqrestore+0x4b/0x60
[ 4044.171140] ? dma_fence_default_wait+0x242/0x330
[ 4044.171143] dma_fence_default_wait+0x26e/0x330
[ 4044.171147] ? dma_fence_release+0x120/0x120
[ 4044.171153] dma_fence_wait_timeout+0x182/0x200
[ 4044.171160] reservation_object_wait_timeout_rcu+0x236/0x4e0
[ 4044.171263] amdgpu_dm_do_flip+0x112/0x380 [amdgpu]
[ 4044.171378] amdgpu_dm_atomic_commit_tail+0x6d0/0xd30 [amdgpu]
[ 4044.171386] ? _raw_spin_unlock_irq+0x29/0x40
[ 4044.171391] ? wait_for_completion_timeout+0x73/0x1a0
[ 4044.171408] commit_tail+0x3d/0x70 [drm_kms_helper]
[ 4044.171413] process_one_work+0x27d/0x600
[ 4044.171423] worker_thread+0x3c/0x390
[ 4044.171428] ? drain_workqueue+0x180/0x180
[ 4044.171433] kthread+0x120/0x140
[ 4044.171437] ? kthread_park+0x80/0x80
[ 4044.171442] ret_from_fork+0x27/0x50
[ 4044.172479] (time-dir) D13944 15221 1 0x00000000
[ 4044.172487] Call Trace:
[ 4044.172496] ? __schedule+0x2f3/0xb90
[ 4044.172501] ? prepare_to_wait_event+0xd2/0x180
[ 4044.172508] schedule+0x2f/0x90
[ 4044.172514] drm_sched_entity_flush+0x1df/0x1f0 [gpu_sched]
[ 4044.172518] ? finish_wait+0x80/0x80
[ 4044.172580] amdgpu_ctx_mgr_entity_flush+0x7c/0xc0 [amdgpu]
[ 4044.172637] amdgpu_flush+0x1f/0x30 [amdgpu]
[ 4044.172640] filp_close+0x34/0x70
[ 4044.172645] __x64_sys_close+0x1e/0x50
[ 4044.172649] do_syscall_64+0x60/0x1f0
[ 4044.172653] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 4044.172656] RIP: 0033:0x7f5a96622ec7
[ 4044.172662] Code: Bad RIP value.
[ 4044.172665] RSP: 002b:00007ffcce3d00e0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
[ 4044.172668] RAX: ffffffffffffffda RBX: 000000000000007c RCX: 00007f5a96622ec7
[ 4044.172671] RDX: 0000000000000000 RSI: 00007ffcce3d0180 RDI: 000000000000007c
[ 4044.172673] RBP: 000055d29a73aa60 R08: 000055d29a73b676 R09: 0000000000000000
[ 4044.172675] R10: 00007f5a965bbae0 R11: 0000000000000293 R12: 00007f5a95939750
[ 4044.172677] R13: 0000000000000000 R14: 0000000000000001 R15: 00007ffcce3d0180
[ 4057.229953] INFO: task kworker/u32:5:253 blocked for more than 120 seconds.
[ 4057.229957] Tainted: G WC 4.20.0-0.rc1.git4.1.fc30.x86_64 #1
[ 4057.229959] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4057.229962] kworker/u32:5 D10872 253 2 0x80000000
[ 4057.229979] Workqueue: events_unbound commit_work [drm_kms_helper]
[ 4057.229982] Call Trace:
[ 4057.229994] ? __schedule+0x2f3/0xb90
[ 4057.229998] ? __lock_acquire+0x279/0x1650
[ 4057.230006] ? dma_fence_default_wait+0x242/0x330
[ 4057.230010] schedule+0x2f/0x90
[ 4057.230013] schedule_timeout+0x31c/0x4f0
[ 4057.230017] ? find_held_lock+0x34/0xa0
[ 4057.230020] ? find_held_lock+0x34/0xa0
[ 4057.230025] ? mark_held_locks+0x57/0x80
[ 4057.230028] ? _raw_spin_unlock_irqrestore+0x4b/0x60
[ 4057.230034] ? dma_fence_default_wait+0x242/0x330
[ 4057.230037] dma_fence_default_wait+0x26e/0x330
[ 4057.230041] ? dma_fence_release+0x120/0x120
[ 4057.230047] dma_fence_wait_timeout+0x182/0x200
[ 4057.230052] reservation_object_wait_timeout_rcu+0x236/0x4e0
[ 4057.230134] amdgpu_dm_do_flip+0x112/0x380 [amdgpu]
[ 4057.230221] amdgpu_dm_atomic_commit_tail+0x6d0/0xd30 [amdgpu]
[ 4057.230228] ? _raw_spin_unlock_irq+0x29/0x40
[ 4057.230232] ? wait_for_completion_timeout+0x73/0x1a0
[ 4057.230249] commit_tail+0x3d/0x70 [drm_kms_helper]
[ 4057.230254] process_one_work+0x27d/0x600
[ 4057.230263] worker_thread+0x3c/0x390
[ 4057.230269] ? drain_workqueue+0x180/0x180
[ 4057.230272] kthread+0x120/0x140
[ 4057.230276] ? kthread_park+0x80/0x80
[ 4057.230281] ret_from_fork+0x27/0x50
[ 4057.230571] Showing all locks held in the system:
[ 4057.230581] 1 lock held by khungtaskd/94:
[ 4057.230583] #0: 00000000a1fc4e6f (rcu_read_lock){....}, at: debug_show_all_locks+0x15/0x183
[ 4057.230596] 3 locks held by kworker/u32:5/253:
[ 4057.230597] #0: 00000000156505f1 ((wq_completion)"events_unbound"){+.+.}, at: process_one_work+0x1f3/0x600
[ 4057.230603] #1: 000000000d248f14 ((work_completion)(&state->commit_work)){+.+.}, at: process_one_work+0x1f3/0x600
[ 4057.230608] #2: 000000003df03870 (reservation_ww_class_mutex){+.+.}, at: amdgpu_dm_do_flip+0xd6/0x380 [amdgpu]
[ 4057.230700] 2 locks held by gnome-shell/2152:
[ 4057.230702] #0: 00000000a2cb2cbf (crtc_ww_class_acquire){+.+.}, at: drm_mode_cursor_common+0x95/0x220 [drm]
[ 4057.230721] #1: 00000000e86bda0d (crtc_ww_class_mutex){+.+.}, at: drm_modeset_lock+0x101/0x120 [drm]
[ 4057.230746] 5 locks held by Xwayland/2222:
[ 4057.230784] 1 lock held by htop/3225:
[ 4057.230848] 1 lock held by CPU 0/KVM/4333:
[ 4057.230989] 1 lock held by (time-dir)/15221:
[ 4057.230991] #0: 000000006ef8a6af (&mgr->lock){+.+.}, at: amdgpu_ctx_mgr_entity_flush+0x3c/0xc0 [amdgpu]
[ 4057.231068] =============================================
Created attachment 142440 [details] yet another dmesg
Unfortunately, this annoying bug is still not fixed in 4.20 rc2.
Created attachment 142458 [details] dmesg 4.20 rc2
Does this patch help? https://patchwork.freedesktop.org/patch/261435/
Alex, unfortunately this patch didn't help me. I no longer see messages like those in comment 3:

[ 1136.956119] amdgpu 0000:0b:00.0: [gfxhub] VMC page fault (src_id:0 ring:171 vmid:2 pasid:32776, for process SOTTR.exe pid 12574 thread SOTTR.exe pid 12574)
[ 1136.956122] amdgpu 0000:0b:00.0: in page starting at address 0x00008001802c0000 from 27

but the GPU still hangs with the usual message:

[ 390.017999] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=58179, emitted seq=58215
[ 390.018001] [drm] GPU recovery disabled.
Created attachment 142467 [details] dmesg 4.20 rc2 with patch from comment 4
Created attachment 142482 [details] dmesg 4.20 rc2 with patch from comment 4 (GPU hang again and again)
Oh, I see the messages again (even with the proposed patch and Mesa 18.3.0-rc2):

[ 1784.721401] gmc_v9_0_process_interrupt: 1 callbacks suppressed
[ 1784.721406] amdgpu 0000:0b:00.0: [mmhub] VMC page fault (src_id:0 ring:169 vmid:0 pasid:0, for process pid 0 thread pid 0)
[ 1784.721409] amdgpu 0000:0b:00.0: in page starting at address 0x000000010001a000 from 18
[ 1784.721410] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00040152
[ 1795.007321] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=74616, emitted seq=74621
[ 1795.007324] [drm] GPU recovery disabled.
[ 1795.011389] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=233554, emitted seq=233557
[ 1795.011391] [drm] GPU recovery disabled.

$ inxi -bM
System: Host: localhost.localdomain Kernel: 4.20.0-0.rc2.git0.1.local.fc30.x86_64 x86_64 bits: 64 Desktop: Gnome 3.31.2 Distro: Fedora release 30 (Rawhide)
Machine: Type: Desktop Mobo: ASUSTeK model: ROG STRIX X470-I GAMING v: Rev 1.xx serial: <root required>
         UEFI: American Megatrends v: 0901 date: 07/23/2018
CPU: 8-Core: AMD Ryzen 7 2700X type: MT MCP speed: 2506 MHz min/max: 2200/4000 MHz
Graphics: Device-1: Advanced Micro Devices [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] driver: amdgpu v: kernel
          Display: wayland server: Fedora Project X.org 1.20.3 driver: amdgpu resolution: 3840x2160~60Hz
          OpenGL: renderer: Radeon RX Vega (VEGA10 DRM 3.27.0 4.20.0-0.rc2.git0.1.local.fc30.x86_64 LLVM 7.0.0) v: 4.5 Mesa 18.3.0-rc2
Network: Device-1: Intel I211 Gigabit Network driver: igb
         Device-2: Realtek RTL8822BE 802.11a/b/g/n/ac WiFi adapter driver: r8822be
Drives: Local Storage: total: 11.36 TiB used: 5.95 TiB (52.4%)
Info: Processes: 443 Uptime: 15m Memory: 31.34 GiB used: 15.94 GiB (50.9%) Shell: bash inxi: 3.0.27
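For anyone comparing the attached logs across kernels: the `amdgpu_job_timedout` lines quoted in this thread share one fixed shape, so the hung rings can be pulled out of a saved dmesg mechanically. Below is a minimal sketch, assuming Python 3 and that the message format matches the lines quoted in this report. The function name `find_ring_timeouts` and the dictionary keys are hypothetical names chosen for illustration; they are not part of any kernel or Mesa tooling.

```python
import re

# Matches the job-timeout message as quoted in this thread, e.g.
# "[ 1795.007321] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, ..."
# Assumption: other kernel versions may word this message differently.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+"
    r"\[drm:amdgpu_job_timedout \[amdgpu\]\] \*ERROR\* "
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def find_ring_timeouts(dmesg_text):
    """Return one dict per ring-timeout message found in dmesg_text."""
    results = []
    for match in TIMEOUT_RE.finditer(dmesg_text):
        signaled = int(match.group("signaled"))
        emitted = int(match.group("emitted"))
        results.append({
            "timestamp": float(match.group("ts")),
            "ring": match.group("ring"),
            "signaled": signaled,
            "emitted": emitted,
            # Difference between the last emitted and last signaled sequence numbers.
            "pending": emitted - signaled,
        })
    return results


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python3 this_script.py
    for entry in find_ring_timeouts(sys.stdin.read()):
        print(entry)
```

Because it uses `finditer` over the whole text rather than splitting on newlines, it also works on log excerpts that were pasted as a single run of text.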
Created attachment 142564 [details] 4.20rc3 still freezes
It looks like the problem went away after commit 94f371cb7394. In Fedora this is package 4.20.0-0.rc4.git2.1.fc30.x86_64.
I was able to reproduce this issue again with Mesa 18.3.0-rc5.
Created attachment 142726 [details] 4.20 g94f371cb7394 + mesa 18.3.0-rc5
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/604.