Created attachment 145491 [details]
./umr -O halt_waves -wa
Created attachment 145492 [details]
./umr -R gfx[.]
Created attachment 145493 [details]
./umr -O many,bits -r *.*.mmGRBM_STATUS*
Oops, while I was uploading the previous file, yet another hang occurred on the machine I am filing this bug report from. This machine also has a Vega 20 GPU.
Created attachment 145501 [details]
./umr -O many,bits -r *.*.mmCP_EOP_*
Created attachment 145502 [details]
./umr -O many,bits -r *.*.mmCP_PFP_HEADER_DUMP
Created attachment 145503 [details]
./umr -O many,bits -r *.*.mmCP_ME_HEADER_DUMP
Created attachment 145528 [details]
dmesg
Created attachment 145529 [details]
./umr -O halt_waves -wa
Created attachment 145530 [details]
./umr -R gfx[.]
Created attachment 145531 [details]
./umr -O many,bits -r *.*.mmGRBM_STATUS*
Created attachment 145532 [details]
./umr -O many,bits -r *.*.mmCP_EOP_*
Created attachment 145533 [details]
./umr -O many,bits -r *.*.mmCP_PFP_HEADER_DUMP
Created attachment 145534 [details]
./umr -O many,bits -r *.*.mmCP_ME_HEADER_DUMP
Created attachment 145550 [details]
trace-cmd start -e dma_fence -e gpu_scheduler -e amdgpu -v -e "amdgpu:amdgpu_mm_rreg" -e "amdgpu:amdgpu_mm_wreg" -e "amdgpu:amdgpu_iv"
Created attachment 145551 [details]
dmesg
Created attachment 145552 [details]
./umr -O halt_waves -wa
Created attachment 145553 [details]
./umr -R gfx[.]
Created attachment 145554 [details]
./umr -O many,bits -r *.*.mmGRBM_STATUS*
Created attachment 145555 [details]
./umr -O many,bits -r *.*.mmCP_EOP_*
Created attachment 145556 [details]
./umr -O many,bits -r *.*.mmCP_PFP_HEADER_DUMP
Created attachment 145557 [details]
./umr -O many,bits -r *.*.mmCP_ME_HEADER_DUMP
Created attachment 145588 [details]
dmesg
Also happens on a Lenovo E585 with the latest firmware (R0UET74W (1.54 )), AMD 2500U w/ Vega 8, Kernel 5.3.1-arch1-1-ARCH, mesa 19.1.7-1, llvm 8.0.1. It happened after I launched LibreOffice Sheet.
Sep 29 23:29:36 lzThinkpad gnome-shell[1676]: meta_window_set_stack_position_no_sync: assertion 'window->stack_position >= 0' failed
Sep 29 23:29:41 lzThinkpad kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out or interrupted!
Sep 29 23:29:45 lzThinkpad tracker-store[1810]: OK
Sep 29 23:29:45 lzThinkpad systemd[1613]: tracker-store.service: Succeeded.
Sep 29 23:29:46 lzThinkpad kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out or interrupted!
Sep 29 23:29:46 lzThinkpad kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=757, emitted seq=759
Sep 29 23:29:46 lzThinkpad kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 1676 thread gnome-shel:cs0 pid 1683
Sep 29 23:29:46 lzThinkpad kernel: [drm] GPU recovery disabled.
Created attachment 145589 [details]
dmesg of AMD 2500U w/ Vega 8
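For anyone comparing several of these logs, the "ring gfx timeout" line in the comment above carries two fence sequence numbers, and the gap between them shows how many submissions were still outstanding when the timeout fired. Below is a minimal Python sketch that pulls those numbers out. The function name `parse_ring_timeout` and the regular expression are my own assumptions based only on the message format quoted above, and sequence-number wraparound is ignored.

```python
import re

# Pattern modelled on the amdgpu_job_timedout message quoted in this report.
# This is an assumption about the format, not something taken from the driver.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeout(line):
    """Return (ring, signaled, emitted, pending), or None if the line does not match."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    signaled = int(m.group("signaled"))
    emitted = int(m.group("emitted"))
    # pending = fences emitted but not yet signaled (wraparound not handled)
    return m.group("ring"), signaled, emitted, emitted - signaled


sample = (
    "Sep 29 23:29:46 lzThinkpad kernel: [drm:amdgpu_job_timedout [amdgpu]] "
    "*ERROR* ring gfx timeout, signaled seq=757, emitted seq=759"
)
print(parse_ring_timeout(sample))  # ('gfx', 757, 759, 2)
```

For the line from this laptop the result is two outstanding fences on the gfx ring.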
Created attachment 145655 [details]
./umr -O halt_waves -wa
Created attachment 145656 [details]
./umr -R gfx[.]
Created attachment 145657 [details]
./umr -O many,bits -r *.*.mmGRBM_STATUS*
Created attachment 145658 [details]
./umr -O many,bits -r *.*.mmCP_EOP_*
Created attachment 145659 [details]
./umr -O many,bits -r *.*.mmCP_PFP_HEADER_DUMP
Created attachment 145660 [details]
./umr -O many,bits -r *.*.mmCP_ME_HEADER_DUMP
Created attachment 145661 [details]
dmesg
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/916.
Created attachment 145490 [details]
dmesg
The annoying GPU hangs continue on Vega 20 with Kernel 5.4 + mesa 9.3.0 + llvm 9.0.0. To reproduce, it is enough to launch the game Supraland from the Steam store while the machine is under memory pressure.
[48662.086736] INFO: task OnlineA-nstance:153979 blocked for more than 122 seconds.
[48662.086740] Not tainted 5.4.0-0.rc0.git4.1a.fc32.x86_64 #1
[48662.086743] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[48662.086746] OnlineA-nstance D12600 153979 153907 0x80004002
[48662.086753] Call Trace:
[48662.086760] ? __schedule+0x307/0x950
[48662.086770] schedule+0x40/0xc0
[48662.086775] schedule_timeout+0x289/0x3c0
[48662.086782] ? mark_held_locks+0x50/0x80
[48662.086787] ? _raw_spin_unlock_irqrestore+0x4b/0x60
[48662.086792] ? lockdep_hardirqs_on+0xf0/0x180
[48662.086803] dma_fence_wait_any_timeout+0x208/0x275
[48662.086881] amdgpu_sa_bo_new+0x44b/0x510 [amdgpu]
[48662.086982] amdgpu_ib_get+0x31/0x80 [amdgpu]
[48662.087075] amdgpu_job_alloc_with_ib+0x46/0x70 [amdgpu]
[48662.087081] ? find_held_lock+0x32/0x90
[48662.087154] amdgpu_vm_sdma_prepare+0x30/0x90 [amdgpu]
[48662.087243] amdgpu_vm_bo_update_mapping+0x7b/0xe0 [amdgpu]
[48662.087318] amdgpu_vm_clear_freed+0xd5/0x1d0 [amdgpu]
[48662.087395] amdgpu_gem_object_close+0x159/0x1b0 [amdgpu]
[48662.087407] ? lockdep_hardirqs_on+0xf0/0x180
[48662.087432] drm_gem_object_release_handle+0x30/0x90 [drm]
[48662.087447] ? drm_gem_object_handle_put_unlocked+0xa0/0xa0 [drm]
[48662.087453] idr_for_each+0x5e/0xd0
[48662.087459] ? mark_held_locks+0x50/0x80
[48662.087477] drm_gem_release+0x1c/0x30 [drm]
[48662.087492] drm_file_free.part.0+0x22e/0x270 [drm]
[48662.087509] drm_release+0xab/0xe0 [drm]
[48662.087517] __fput+0xdd/0x270
[48662.087525] task_work_run+0x93/0xd0
[48662.087533] do_exit+0x349/0xcd0
[48662.087539] ? find_held_lock+0x32/0x90
[48662.087548] do_group_exit+0x47/0xb0
[48662.087554] get_signal+0x17e/0xcb0
[48662.087565] do_signal+0x36/0x680
[48662.087580] exit_to_usermode_loop+0x8d/0x120
[48662.087588] syscall_return_slowpath+0x205/0x330
[48662.087594] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[48662.087599] RIP: 0033:0x7f0b10b4ffaa
[48662.087606] Code: Bad RIP value.
[48662.087610] RSP: 002b:00007f0ae77fdc40 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[48662.087615] RAX: fffffffffffffdfc RBX: 00000000000051ac RCX: 00007f0b10b4ffaa
[48662.087619] RDX: 0000000000000000 RSI: 0000000000000189 RDI: 00007f0b0ebf1170
[48662.087622] RBP: 00007f0b0ebf1148 R08: 0000000000000000 R09: 00000000ffffffff
[48662.087626] R10: 00007f0ae77fdd48 R11: 0000000000000246 R12: 0000000000000000
[48662.087629] R13: 00007f0b0ebf1120 R14: 00007f0b0ebf1170 R15: 00007f0ae77fdc80
[48662.087646] Showing all locks held in the system:
[48662.087662] 1 lock held by khungtaskd/96:
[48662.087665] #0: ffffffff8d693760 (rcu_read_lock){....}, at: debug_show_all_locks+0x15/0x174
[48662.087738] 1 lock held by CPU 0/KVM/3098:
[48662.087833] 2 locks held by dnf/104312:
[48662.087836] #0: ffff8d88dacc80a0 (&tty->ldisc_sem){++++}, at: tty_ldisc_ref_wait+0x24/0x50
[48662.087844] #1: ffffa1088052a2f0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xe3/0x980
[48662.088002] 3 locks held by kworker/15:0/152888:
[48662.088005] #0: ffff8d8936c21548 ((wq_completion)events){+.+.}, at: process_one_work+0x1e9/0x5a0
[48662.088012] #1: ffffa1088d61fe50 ((work_completion)(&(&bdev->wq)->work)){+.+.}, at: process_one_work+0x1e9/0x5a0
[48662.088018] #2: ffff8d892bf5c9f8 (reservation_ww_class_mutex){+.+.}, at: ttm_bo_delayed_delete+0x8d/0x200 [ttm]
[48662.088032] 3 locks held by OnlineA-nstance/153979:
[48662.088035] #0: ffffffffc0303070 (drm_global_mutex){+.+.}, at: drm_release+0x2c/0xe0 [drm]
[48662.088054] #1: ffffa1088d457b30 (reservation_ww_class_acquire){+.+.}, at: amdgpu_gem_object_close+0xce/0x1b0 [amdgpu]
[48662.088126] #2: ffff8d892bf5c9f8 (reservation_ww_class_mutex){+.+.}, at: ttm_eu_reserve_buffers+0x349/0x620 [ttm]
[48662.088146] =============================================
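When triaging, the "Showing all locks held in the system" part of this dmesg can be checked for lock addresses that appear under more than one task. Below is a rough Python sketch of that check. It assumes the log is available with one message per line. The helper name `shared_locks` and both regular expressions are my own guesses from the lines in this report, not an official format. As far as I understand, lockdep may list a lock that a task is still waiting to acquire, so a shared address is only a hint about who is contending, not proof of a deadlock.

```python
import re
from collections import defaultdict

# "N lock(s) held by <task>/<pid>:" header; greedy match so task names that
# themselves contain '/' or ':' (e.g. kworker/15:0) stay intact.
HOLDER_RE = re.compile(r"\d+ locks? held by (?P<task>.+)/(?P<pid>\d+):\s*$")
# "#i: <address> (<lock name>){<usage flags>}" entry under a header.
LOCK_RE = re.compile(r"#\d+: (?P<addr>[0-9a-f]{16}) \((?P<name>.+?)\)\{")


def shared_locks(lines):
    """Map (address, name) -> tasks, keeping only locks listed under 2+ tasks."""
    owners = defaultdict(list)
    current = None
    for line in lines:
        holder = HOLDER_RE.search(line)
        if holder:
            current = f"{holder.group('task')}/{holder.group('pid')}"
            continue
        lock = LOCK_RE.search(line)
        if lock and current:
            owners[(lock.group("addr"), lock.group("name"))].append(current)
    return {key: tasks for key, tasks in owners.items() if len(tasks) > 1}


sample = """\
[48662.088002] 3 locks held by kworker/15:0/152888:
[48662.088005] #0: ffff8d8936c21548 ((wq_completion)events){+.+.}, at: process_one_work+0x1e9/0x5a0
[48662.088012] #1: ffffa1088d61fe50 ((work_completion)(&(&bdev->wq)->work)){+.+.}, at: process_one_work+0x1e9/0x5a0
[48662.088018] #2: ffff8d892bf5c9f8 (reservation_ww_class_mutex){+.+.}, at: ttm_bo_delayed_delete+0x8d/0x200 [ttm]
[48662.088032] 3 locks held by OnlineA-nstance/153979:
[48662.088035] #0: ffffffffc0303070 (drm_global_mutex){+.+.}, at: drm_release+0x2c/0xe0 [drm]
[48662.088054] #1: ffffa1088d457b30 (reservation_ww_class_acquire){+.+.}, at: amdgpu_gem_object_close+0xce/0x1b0 [amdgpu]
[48662.088126] #2: ffff8d892bf5c9f8 (reservation_ww_class_mutex){+.+.}, at: ttm_eu_reserve_buffers+0x349/0x620 [ttm]
""".splitlines()

for (addr, name), tasks in shared_locks(sample).items():
    print(addr, name, tasks)
```

On the excerpt from this report, the only address listed under two tasks is ffff8d892bf5c9f8 (reservation_ww_class_mutex), which shows up under both kworker/15:0/152888 and OnlineA-nstance/153979.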