Bug 111803 - Annoying GPU hangs continue on Vega 20 with Kernel 5.4 + mesa 9.3.0 + llvm 9.0.0 [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: XOrg git
Hardware: Other All
Assignee: Default DRI bug account
Reported: 2019-09-24 17:54 UTC by mikhail.v.gavrilov
Modified: 2019-10-05 09:21 UTC (History)

Attachments
dmesg (303.11 KB, text/plain), 2019-09-24 17:54 UTC, mikhail.v.gavrilov
./umr -O halt_waves -wa (276.02 KB, text/plain), 2019-09-24 17:56 UTC, mikhail.v.gavrilov
./umr -R gfx[.] (171.48 KB, text/plain), 2019-09-24 17:56 UTC, mikhail.v.gavrilov
./umr -O many,bits -r *.*.mmGRBM_STATUS* (8.92 KB, text/plain), 2019-09-24 17:57 UTC, mikhail.v.gavrilov
./umr -O many,bits -r *.*.mmCP_EOP_* (1.73 KB, text/plain), 2019-09-24 19:05 UTC, mikhail.v.gavrilov
./umr -O many,bits -r *.*.mmCP_PFP_HEADER_DUMP (275 bytes, text/plain), 2019-09-24 19:05 UTC, mikhail.v.gavrilov
./umr -O many,bits -r *.*.mmCP_ME_HEADER_DUMP (273 bytes, text/plain), 2019-09-24 19:06 UTC, mikhail.v.gavrilov
dmesg (275.03 KB, text/plain), 2019-09-26 18:48 UTC, mikhail.v.gavrilov
./umr -O halt_waves -wa (273.77 KB, text/plain), 2019-09-26 18:48 UTC, mikhail.v.gavrilov
./umr -R gfx[.] (415.17 KB, text/plain), 2019-09-26 18:49 UTC, mikhail.v.gavrilov
./umr -O many,bits -r *.*.mmGRBM_STATUS* (8.92 KB, text/plain), 2019-09-26 18:49 UTC, mikhail.v.gavrilov
./umr -O many,bits -r *.*.mmCP_EOP_* (1.73 KB, text/plain), 2019-09-26 18:50 UTC, mikhail.v.gavrilov
./umr -O many,bits -r *.*.mmCP_PFP_HEADER_DUMP (275 bytes, text/plain), 2019-09-26 18:50 UTC, mikhail.v.gavrilov
./umr -O many,bits -r *.*.mmCP_ME_HEADER_DUMP (273 bytes, text/plain), 2019-09-26 18:51 UTC, mikhail.v.gavrilov
trace-cmd start -e dma_fence -e gpu_scheduler -e amdgpu -v -e "amdgpu:amdgpu_mm_rreg" -e "amdgpu:amdgpu_mm_wreg" -e "amdgpu:amdgpu_iv" (2.04 MB, application/x-xz), 2019-09-27 16:54 UTC, mikhail.v.gavrilov
dmesg (183.05 KB, text/plain), 2019-09-27 16:54 UTC, mikhail.v.gavrilov
./umr -O halt_waves -wa (273.77 KB, text/plain), 2019-09-27 16:54 UTC, mikhail.v.gavrilov
./umr -R gfx[.] (172.90 KB, text/plain), 2019-09-27 16:55 UTC, mikhail.v.gavrilov
./umr -O many,bits -r *.*.mmGRBM_STATUS* (8.92 KB, text/plain), 2019-09-27 16:55 UTC, mikhail.v.gavrilov
./umr -O many,bits -r *.*.mmCP_EOP_* (1.73 KB, text/plain), 2019-09-27 16:55 UTC, mikhail.v.gavrilov
./umr -O many,bits -r *.*.mmCP_PFP_HEADER_DUMP (275 bytes, text/plain), 2019-09-27 16:56 UTC, mikhail.v.gavrilov
./umr -O many,bits -r *.*.mmCP_ME_HEADER_DUMP (273 bytes, text/plain), 2019-09-27 16:56 UTC, mikhail.v.gavrilov
dmesg (275.89 KB, text/plain), 2019-09-30 04:18 UTC, mikhail.v.gavrilov
dmesg of AMD 2500U w/ Vega 8 (242.95 KB, text/plain), 2019-09-30 06:43 UTC, Zheng Luo
./umr -O halt_waves -wa (273.77 KB, text/plain), 2019-10-05 09:17 UTC, mikhail.v.gavrilov
./umr -R gfx[.] (874.57 KB, text/plain), 2019-10-05 09:18 UTC, mikhail.v.gavrilov
./umr -O many,bits -r *.*.mmGRBM_STATUS* (8.92 KB, text/plain), 2019-10-05 09:18 UTC, mikhail.v.gavrilov
./umr -O many,bits -r *.*.mmCP_EOP_* (1.73 KB, text/plain), 2019-10-05 09:19 UTC, mikhail.v.gavrilov
./umr -O many,bits -r *.*.mmCP_PFP_HEADER_DUMP (275 bytes, text/plain), 2019-10-05 09:20 UTC, mikhail.v.gavrilov
./umr -O many,bits -r *.*.mmCP_ME_HEADER_DUMP (273 bytes, text/plain), 2019-10-05 09:20 UTC, mikhail.v.gavrilov
dmesg (258.57 KB, text/plain), 2019-10-05 09:21 UTC, mikhail.v.gavrilov

Description mikhail.v.gavrilov 2019-09-24 17:54:40 UTC
Created attachment 145490 [details]
dmesg

Annoying GPU hangs continue on Vega 20 with Kernel 5.4 + mesa 9.3.0 + llvm 9.0.0.

To reproduce, it is enough to launch the game Supraland from the Steam store while the machine is under memory pressure.

[48662.086736] INFO: task OnlineA-nstance:153979 blocked for more than 122 seconds.
[48662.086740]       Not tainted 5.4.0-0.rc0.git4.1a.fc32.x86_64 #1
[48662.086743] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[48662.086746] OnlineA-nstance D12600 153979 153907 0x80004002
[48662.086753] Call Trace:
[48662.086760]  ? __schedule+0x307/0x950
[48662.086770]  schedule+0x40/0xc0
[48662.086775]  schedule_timeout+0x289/0x3c0
[48662.086782]  ? mark_held_locks+0x50/0x80
[48662.086787]  ? _raw_spin_unlock_irqrestore+0x4b/0x60
[48662.086792]  ? lockdep_hardirqs_on+0xf0/0x180
[48662.086803]  dma_fence_wait_any_timeout+0x208/0x275
[48662.086881]  amdgpu_sa_bo_new+0x44b/0x510 [amdgpu]
[48662.086982]  amdgpu_ib_get+0x31/0x80 [amdgpu]
[48662.087075]  amdgpu_job_alloc_with_ib+0x46/0x70 [amdgpu]
[48662.087081]  ? find_held_lock+0x32/0x90
[48662.087154]  amdgpu_vm_sdma_prepare+0x30/0x90 [amdgpu]
[48662.087243]  amdgpu_vm_bo_update_mapping+0x7b/0xe0 [amdgpu]
[48662.087318]  amdgpu_vm_clear_freed+0xd5/0x1d0 [amdgpu]
[48662.087395]  amdgpu_gem_object_close+0x159/0x1b0 [amdgpu]
[48662.087407]  ? lockdep_hardirqs_on+0xf0/0x180
[48662.087432]  drm_gem_object_release_handle+0x30/0x90 [drm]
[48662.087447]  ? drm_gem_object_handle_put_unlocked+0xa0/0xa0 [drm]
[48662.087453]  idr_for_each+0x5e/0xd0
[48662.087459]  ? mark_held_locks+0x50/0x80
[48662.087477]  drm_gem_release+0x1c/0x30 [drm]
[48662.087492]  drm_file_free.part.0+0x22e/0x270 [drm]
[48662.087509]  drm_release+0xab/0xe0 [drm]
[48662.087517]  __fput+0xdd/0x270
[48662.087525]  task_work_run+0x93/0xd0
[48662.087533]  do_exit+0x349/0xcd0
[48662.087539]  ? find_held_lock+0x32/0x90
[48662.087548]  do_group_exit+0x47/0xb0
[48662.087554]  get_signal+0x17e/0xcb0
[48662.087565]  do_signal+0x36/0x680
[48662.087580]  exit_to_usermode_loop+0x8d/0x120
[48662.087588]  syscall_return_slowpath+0x205/0x330
[48662.087594]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[48662.087599] RIP: 0033:0x7f0b10b4ffaa
[48662.087606] Code: Bad RIP value.
[48662.087610] RSP: 002b:00007f0ae77fdc40 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[48662.087615] RAX: fffffffffffffdfc RBX: 00000000000051ac RCX: 00007f0b10b4ffaa
[48662.087619] RDX: 0000000000000000 RSI: 0000000000000189 RDI: 00007f0b0ebf1170
[48662.087622] RBP: 00007f0b0ebf1148 R08: 0000000000000000 R09: 00000000ffffffff
[48662.087626] R10: 00007f0ae77fdd48 R11: 0000000000000246 R12: 0000000000000000
[48662.087629] R13: 00007f0b0ebf1120 R14: 00007f0b0ebf1170 R15: 00007f0ae77fdc80
[48662.087646] 
               Showing all locks held in the system:
[48662.087662] 1 lock held by khungtaskd/96:
[48662.087665]  #0: ffffffff8d693760 (rcu_read_lock){....}, at: debug_show_all_locks+0x15/0x174
[48662.087738] 1 lock held by CPU 0/KVM/3098:
[48662.087833] 2 locks held by dnf/104312:
[48662.087836]  #0: ffff8d88dacc80a0 (&tty->ldisc_sem){++++}, at: tty_ldisc_ref_wait+0x24/0x50
[48662.087844]  #1: ffffa1088052a2f0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xe3/0x980
[48662.088002] 3 locks held by kworker/15:0/152888:
[48662.088005]  #0: ffff8d8936c21548 ((wq_completion)events){+.+.}, at: process_one_work+0x1e9/0x5a0
[48662.088012]  #1: ffffa1088d61fe50 ((work_completion)(&(&bdev->wq)->work)){+.+.}, at: process_one_work+0x1e9/0x5a0
[48662.088018]  #2: ffff8d892bf5c9f8 (reservation_ww_class_mutex){+.+.}, at: ttm_bo_delayed_delete+0x8d/0x200 [ttm]
[48662.088032] 3 locks held by OnlineA-nstance/153979:
[48662.088035]  #0: ffffffffc0303070 (drm_global_mutex){+.+.}, at: drm_release+0x2c/0xe0 [drm]
[48662.088054]  #1: ffffa1088d457b30 (reservation_ww_class_acquire){+.+.}, at: amdgpu_gem_object_close+0xce/0x1b0 [amdgpu]
[48662.088126]  #2: ffff8d892bf5c9f8 (reservation_ww_class_mutex){+.+.}, at: ttm_eu_reserve_buffers+0x349/0x620 [ttm]

[48662.088146] =============================================
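For readers triaging the lock listing above, here is a minimal Python sketch that groups the "locks held" entries by lock address and reports any address listed under more than one task. The function name `shared_locks` and the regular expressions are illustrative assumptions, not part of any kernel tool; the sample lines are copied from the dump above. Note that the listing alone does not show which task actually owns a lock and which one is still waiting for it.

```python
import re
from collections import defaultdict

# Sample lines copied from the "Showing all locks held" section above.
LOCKDEP_DUMP = """\
[48662.088002] 3 locks held by kworker/15:0/152888:
[48662.088005]  #0: ffff8d8936c21548 ((wq_completion)events){+.+.}, at: process_one_work+0x1e9/0x5a0
[48662.088012]  #1: ffffa1088d61fe50 ((work_completion)(&(&bdev->wq)->work)){+.+.}, at: process_one_work+0x1e9/0x5a0
[48662.088018]  #2: ffff8d892bf5c9f8 (reservation_ww_class_mutex){+.+.}, at: ttm_bo_delayed_delete+0x8d/0x200 [ttm]
[48662.088032] 3 locks held by OnlineA-nstance/153979:
[48662.088035]  #0: ffffffffc0303070 (drm_global_mutex){+.+.}, at: drm_release+0x2c/0xe0 [drm]
[48662.088054]  #1: ffffa1088d457b30 (reservation_ww_class_acquire){+.+.}, at: amdgpu_gem_object_close+0xce/0x1b0 [amdgpu]
[48662.088126]  #2: ffff8d892bf5c9f8 (reservation_ww_class_mutex){+.+.}, at: ttm_eu_reserve_buffers+0x349/0x620 [ttm]
"""

# "N locks held by <task>/<pid>:" header line (hypothetical pattern for this log format).
HOLDER_RE = re.compile(r"\] \d+ locks? held by (?P<task>.+)/(?P<pid>\d+):$")
# " #i: <addr> (<name>){<flags>}, at: <call site>" entry line.
LOCK_RE = re.compile(
    r"\]\s+#\d+: (?P<addr>[0-9a-f]+) \((?P<name>.+)\)\{[^}]*\}, at: (?P<site>\S+)"
)


def shared_locks(dump):
    """Return {lock address: [(task, lock name, call site), ...]} for
    every address that is listed under at least two different tasks."""
    by_addr = defaultdict(list)
    task = None
    for line in dump.splitlines():
        holder = HOLDER_RE.search(line)
        if holder:
            task = f"{holder['task']}/{holder['pid']}"
            continue
        lock = LOCK_RE.search(line)
        if lock and task:
            by_addr[lock["addr"]].append((task, lock["name"], lock["site"]))
    return {
        addr: users
        for addr, users in by_addr.items()
        if len({user_task for user_task, _, _ in users}) > 1
    }


if __name__ == "__main__":
    for addr, users in shared_locks(LOCKDEP_DUMP).items():
        print(addr)
        for user_task, name, site in users:
            print(f"  {user_task}: {name} at {site}")
```

On the sample above, the only shared address is the `reservation_ww_class_mutex` at `ffff8d892bf5c9f8`, listed under both the blocked `OnlineA-nstance` task and the `kworker/15:0` running `ttm_bo_delayed_delete`.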
Comment 1 mikhail.v.gavrilov 2019-09-24 17:56:34 UTC
Created attachment 145491 [details]
./umr -O halt_waves -wa
Comment 2 mikhail.v.gavrilov 2019-09-24 17:56:53 UTC
Created attachment 145492 [details]
./umr -R gfx[.]
Comment 3 mikhail.v.gavrilov 2019-09-24 17:57:13 UTC
Created attachment 145493 [details]
./umr -O many,bits -r *.*.mmGRBM_STATUS*
Comment 4 mikhail.v.gavrilov 2019-09-24 19:04:26 UTC
Oops, while I was uploading the previous file, yet another hang happened on the machine where I am filing this bug report. This machine also has a Vega 20 GPU.
Comment 5 mikhail.v.gavrilov 2019-09-24 19:05:33 UTC
Created attachment 145501 [details]
./umr -O many,bits -r *.*.mmCP_EOP_*
Comment 6 mikhail.v.gavrilov 2019-09-24 19:05:50 UTC
Created attachment 145502 [details]
./umr -O many,bits -r *.*.mmCP_PFP_HEADER_DUMP
Comment 7 mikhail.v.gavrilov 2019-09-24 19:06:07 UTC
Created attachment 145503 [details]
./umr -O many,bits -r *.*.mmCP_ME_HEADER_DUMP
Comment 8 mikhail.v.gavrilov 2019-09-26 18:48:21 UTC
Created attachment 145528 [details]
dmesg
Comment 9 mikhail.v.gavrilov 2019-09-26 18:48:44 UTC
Created attachment 145529 [details]
./umr -O halt_waves -wa
Comment 10 mikhail.v.gavrilov 2019-09-26 18:49:02 UTC
Created attachment 145530 [details]
./umr -R gfx[.]
Comment 11 mikhail.v.gavrilov 2019-09-26 18:49:34 UTC
Created attachment 145531 [details]
./umr -O many,bits -r *.*.mmGRBM_STATUS*
Comment 12 mikhail.v.gavrilov 2019-09-26 18:50:19 UTC
Created attachment 145532 [details]
./umr -O many,bits -r *.*.mmCP_EOP_*
Comment 13 mikhail.v.gavrilov 2019-09-26 18:50:43 UTC
Created attachment 145533 [details]
./umr -O many,bits -r *.*.mmCP_PFP_HEADER_DUMP
Comment 14 mikhail.v.gavrilov 2019-09-26 18:51:05 UTC
Created attachment 145534 [details]
./umr -O many,bits -r *.*.mmCP_ME_HEADER_DUMP
Comment 15 mikhail.v.gavrilov 2019-09-27 16:54:01 UTC
Created attachment 145550 [details]
trace-cmd start -e dma_fence -e gpu_scheduler -e amdgpu -v -e "amdgpu:amdgpu_mm_rreg" -e "amdgpu:amdgpu_mm_wreg" -e "amdgpu:amdgpu_iv"
Comment 16 mikhail.v.gavrilov 2019-09-27 16:54:22 UTC
Created attachment 145551 [details]
dmesg
Comment 17 mikhail.v.gavrilov 2019-09-27 16:54:58 UTC
Created attachment 145552 [details]
./umr -O halt_waves -wa
Comment 18 mikhail.v.gavrilov 2019-09-27 16:55:15 UTC
Created attachment 145553 [details]
./umr -R gfx[.]
Comment 19 mikhail.v.gavrilov 2019-09-27 16:55:29 UTC
Created attachment 145554 [details]
./umr -O many,bits -r *.*.mmGRBM_STATUS*
Comment 20 mikhail.v.gavrilov 2019-09-27 16:55:44 UTC
Created attachment 145555 [details]
./umr -O many,bits -r *.*.mmCP_EOP_*
Comment 21 mikhail.v.gavrilov 2019-09-27 16:56:00 UTC
Created attachment 145556 [details]
./umr -O many,bits -r *.*.mmCP_PFP_HEADER_DUMP
Comment 22 mikhail.v.gavrilov 2019-09-27 16:56:17 UTC
Created attachment 145557 [details]
./umr -O many,bits -r *.*.mmCP_ME_HEADER_DUMP
Comment 23 mikhail.v.gavrilov 2019-09-30 04:18:28 UTC
Created attachment 145588 [details]
dmesg
Comment 24 Zheng Luo 2019-09-30 06:43:06 UTC
Also happens on a Lenovo E585 with the latest firmware (R0UET74W (1.54)), AMD 2500U w/ Vega 8, Kernel 5.3.1-arch1-1-ARCH, mesa 19.1.7-1, llvm 8.0.1. It happened after I launched LibreOffice Sheet.

Sep 29 23:29:36 lzThinkpad gnome-shell[1676]: meta_window_set_stack_position_no_sync: assertion 'window->stack_position >= 0' failed
Sep 29 23:29:41 lzThinkpad kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out or interrupted!
Sep 29 23:29:45 lzThinkpad tracker-store[1810]: OK
Sep 29 23:29:45 lzThinkpad systemd[1613]: tracker-store.service: Succeeded.
Sep 29 23:29:46 lzThinkpad kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out or interrupted!
Sep 29 23:29:46 lzThinkpad kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=757, emitted seq=759
Sep 29 23:29:46 lzThinkpad kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 1676 thread gnome-shel:cs0 pid 1683
Sep 29 23:29:46 lzThinkpad kernel: [drm] GPU recovery disabled.
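The `ring gfx timeout` line in the log above carries two sequence numbers. Below is a minimal Python sketch that extracts them and computes their difference. It assumes, without confirmation from this report, that `signaled seq` is the last fence the ring completed and `emitted seq` is the last one submitted, so the difference is the number of submissions still outstanding when the timeout fired. The helper name `pending_fences` is hypothetical.

```python
import re

# Line copied from the journal excerpt above.
LINE = (
    "Sep 29 23:29:46 lzThinkpad kernel: [drm:amdgpu_job_timedout [amdgpu]] "
    "*ERROR* ring gfx timeout, signaled seq=757, emitted seq=759"
)

TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def pending_fences(line):
    """Return (ring name, emitted - signaled) for a ring timeout line,
    or None if the line does not match the expected format."""
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    return match["ring"], int(match["emitted"]) - int(match["signaled"])


if __name__ == "__main__":
    print(pending_fences(LINE))
```

Running the same helper over each attached dmesg would show whether the gap between the two counters is consistent from one hang to the next.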
Comment 25 Zheng Luo 2019-09-30 06:43:46 UTC
Created attachment 145589 [details]
dmesg of AMD 2500U w/ Vega 8
Comment 26 mikhail.v.gavrilov 2019-10-05 09:17:52 UTC
Created attachment 145655 [details]
./umr -O halt_waves -wa
Comment 27 mikhail.v.gavrilov 2019-10-05 09:18:21 UTC
Created attachment 145656 [details]
./umr -R gfx[.]
Comment 28 mikhail.v.gavrilov 2019-10-05 09:18:53 UTC
Created attachment 145657 [details]
./umr -O many,bits -r *.*.mmGRBM_STATUS*
Comment 29 mikhail.v.gavrilov 2019-10-05 09:19:35 UTC
Created attachment 145658 [details]
./umr -O many,bits -r *.*.mmCP_EOP_*
Comment 30 mikhail.v.gavrilov 2019-10-05 09:20:11 UTC
Created attachment 145659 [details]
./umr -O many,bits -r *.*.mmCP_PFP_HEADER_DUMP
Comment 31 mikhail.v.gavrilov 2019-10-05 09:20:40 UTC
Created attachment 145660 [details]
./umr -O many,bits -r *.*.mmCP_ME_HEADER_DUMP
Comment 32 mikhail.v.gavrilov 2019-10-05 09:21:04 UTC
Created attachment 145661 [details]
dmesg

