From https://bugs.freedesktop.org/show_bug.cgi?id=106151#c30 : Still running into this issue, now while running Mario Party 9 through Dolphin. This is a particularly good test case, because I can reliably get it to crash in the main menu after seconds/minutes. This ONLY happens with the Vulkan renderer. Versions: Radeon RX Vega (VEGA10, DRM 3.27.0, 4.20.3-arch1-1-ARCH, LLVM 7.0.0) Mesa: 18.3.1 I have also managed to get a stack trace this time, which is hopefully useful for debugging: [ 858.970202] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=160177, emitted seq=160179 [ 858.970205] [drm] GPU recovery disabled. [ 982.906053] INFO: task kworker/u32:6:398 blocked for more than 120 seconds. [ 982.906055] Not tainted 4.20.3-arch1-1-ARCH #1 [ 982.906056] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 982.906057] kworker/u32:6 D 0 398 2 0x80000000 [ 982.906068] Workqueue: events_unbound commit_work [drm_kms_helper] [ 982.906069] Call Trace: [ 982.906075] ? __schedule+0x29b/0x8b0 [ 982.906077] ? __switch_to_asm+0x40/0x70 [ 982.906079] schedule+0x32/0x90 [ 982.906080] schedule_timeout+0x311/0x4a0 [ 982.906126] ? dce120_timing_generator_get_crtc_position+0x5b/0x70 [amdgpu] [ 982.906167] ? dce120_timing_generator_get_crtc_scanoutpos+0x70/0xb0 [amdgpu] [ 982.906170] dma_fence_default_wait+0x204/0x280 [ 982.906172] ? dma_fence_wait_timeout+0x120/0x120 [ 982.906173] dma_fence_wait_timeout+0x105/0x120 [ 982.906175] reservation_object_wait_timeout_rcu+0x1f2/0x370 [ 982.906178] ? preempt_count_add+0x79/0xb0 [ 982.906221] amdgpu_dm_do_flip+0x10d/0x370 [amdgpu] [ 982.906265] amdgpu_dm_atomic_commit_tail+0x6c4/0xd20 [amdgpu] [ 982.906267] ? _raw_spin_lock_irq+0x1a/0x40 [ 982.906268] ? wait_for_common+0x113/0x190 [ 982.906269] ? __switch_to_asm+0x34/0x70 [ 982.906275] commit_tail+0x3d/0x70 [drm_kms_helper] [ 982.906278] process_one_work+0x1eb/0x410 [ 982.906280] worker_thread+0x2d/0x3d0 [ 982.906282] ? process_one_work+0x410/0x410 [ 982.906283] kthread+0x112/0x130 [ 982.906284] ? kthread_park+0x80/0x80 [ 982.906286] ret_from_fork+0x22/0x40 [ 982.906290] INFO: task kworker/u32:8:404 blocked for more than 120 seconds. [ 982.906290] Not tainted 4.20.3-arch1-1-ARCH #1 [ 982.906291] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 982.906291] kworker/u32:8 D 0 404 2 0x80000000 [ 982.906297] Workqueue: events_unbound commit_work [drm_kms_helper] [ 982.906298] Call Trace: [ 982.906300] ? __schedule+0x29b/0x8b0 [ 982.906301] schedule+0x32/0x90 [ 982.906302] schedule_preempt_disabled+0x14/0x20 [ 982.906303] __ww_mutex_lock.isra.2+0x413/0x7f0 [ 982.906329] ? amdgpu_get_vblank_counter_kms+0x110/0x160 [amdgpu] [ 982.906370] amdgpu_dm_do_flip+0xd2/0x370 [amdgpu] [ 982.906412] amdgpu_dm_atomic_commit_tail+0x6c4/0xd20 [amdgpu] [ 982.906414] ? _raw_spin_lock_irq+0x1a/0x40 [ 982.906415] ? wait_for_common+0x113/0x190 [ 982.906416] ? __switch_to_asm+0x34/0x70 [ 982.906422] commit_tail+0x3d/0x70 [drm_kms_helper] [ 982.906424] process_one_work+0x1eb/0x410 [ 982.906425] worker_thread+0x2d/0x3d0 [ 982.906427] ? process_one_work+0x410/0x410 [ 982.906428] kthread+0x112/0x130 [ 982.906429] ? kthread_park+0x80/0x80 [ 982.906431] ret_from_fork+0x22/0x40 Please let me know if I can help debugging. The fact I can get it to crash reliably and easily should help immensely.
Still happening with Mesa 18.3.2, LVM 7.0.1 and 4.20.5 kernel.
Just tried to reproduce with these settings OpenGL renderer string: Radeon RX Vega (VEGA10, DRM 3.27.0, 4.20.5-arch1-1-ARCH, LLVM 7.0.1) OpenGL core profile version string: 4.5 (Core Profile) Mesa 18.3.2 No hangs so far with the Vulkan backend. Maybe you use some "Enhancements" settings?
I am using a stock install of Dolphin. The only thing I changed was switching from the OpenGL backend to the Vulkan backend. Are you using Vega 64 or Vega 56?
I cannot edit my previous post: Did you test with Mario party as well? I am in Europe, and am therefor using a PAL version. Not sure if that matters.
Yes, I tried Mario Party 9 on my vega 56.
Is there anything I can do, test or research to find out why this is happening? Could this be a Vega 64 bug that doesn't affect Vega 56?
Unlikely. What you can do is to capture a hang report as follows: export RADV_TRACE_FILE=$HOME/hang.trace export RADV_DEBUG=syncshaders,vmfaults,allbos dolphin-emu &> hang.report And attach the files here.
I've set the environment variables and started Dolphin as instructed. However, I can no longer get it to crash. The FPS is around 4-6 FPS, instead of the regular 60 FPS, probably because a trace is being generated. Is there anything different between having debugging enabled vs disabled? I've let it run for 30-60 minutes without a crash now.
Created attachment 143257 [details] Hang.report file, no crash. Contains an error though. I have just closed the emulator, since it still hadn't crashed. The hang.report file does contain an error, but it looks innocent. I have attached it just in case.
Do you have anything in dmesg when it freezes?
Can you reproduce the problem with "export RADV_DEBUG=zerovram" ?
I can still reproduce the hang with "export RADV_DEBUG=zerovram". The following message is present in dmesg: [ 3638.209907] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=167782, emitted seq=167784 [ 3638.209910] [drm] GPU recovery disabled.
I definitely can't reproduce the problem on my Vega 56. Can you attach the output of "R600_DEBUG=info glxgears"?
[jaap@Antergos-Desktop ~]$ R600_DEBUG=info glxgears Device info: pci (domain:bus:dev.func): 0000:09:00.0 pci_id = 0x687f family = 68 chip_class = 11 num_compute_rings = 4 num_sdma_rings = 2 clock_crystal_freq = 27000 tcc_cache_line_size = 64 Memory info: pte_fragment_size = 2097152 gart_page_size = 4096 gart_size = 8176 MB vram_size = 8176 MB vram_vis_size = 256 MB gds_size = 64 kB gds_gfx_partition_size = 4 kB max_alloc_size = 6541 MB min_alloc_size = 512 address32_hi = 4294934528 has_dedicated_vram = 1 CP info: gfx_ib_pad_with_type2 = 0 ib_start_alignment = 256 me_fw_version = 156 me_fw_feature = 42 pfp_fw_version = 177 pfp_fw_feature = 42 ce_fw_version = 77 ce_fw_feature = 42 Multimedia info: has_hw_decode = 1 uvd_enc_supported = 1 uvd_fw_version = 22483200 vce_fw_version = 922944512 vce_harvest_config = 0 Kernel & winsys capabilities: drm = 3.27.0 has_userptr = 1 has_syncobj = 1 has_syncobj_wait_for_submit = 1 has_fence_to_handle = 1 has_ctx_priority = 1 has_local_buffers = 0 kernel_flushes_hdp_before_ib = 1 htile_cmask_support_1d_tiling = 1 si_TA_CS_BC_BASE_ADDR_allowed = 1 has_bo_metadata = 1 has_gpu_reset_status_query = 1 has_gpu_reset_counter_query = 0 has_eqaa_surface_allocator = 1 has_format_bc1_through_bc7 = 1 kernel_flushes_tc_l2_after_ib = 1 has_indirect_compute_dispatch = 1 has_unaligned_shader_loads = 1 has_sparse_vm_mappings = 0 has_2d_tiling = 1 has_read_registers_query = 1 Shader core info: max_shader_clock = 1630 num_good_compute_units = 64 num_good_cu_per_sh = 16 num_tcc_blocks = 16 max_se = 4 max_sh_per_se = 1 Render backend info: num_render_backends = 16 num_tile_pipes = 4 pipe_interleave_bytes = 256 enabled_rb_mask = 0xffff max_alignment = 65536 GB_ADDR_CONFIG: num_pipes = 4 pipe_interleave_size = 256 max_compressed_frags = 2 bank_interleave_size = 1 num_banks = 16 shader_engine_tile_size = 32 num_shader_engines = 4 num_gpus = 0 (raw) multi_gpu_tile_size = 2 (raw) num_rb_per_se = 4 row_size = 4096 num_lower_pipes = 0 (raw) se_enable = 0 (raw) 471 frames in 5.0 seconds = 94.092 FPS 300 frames in 5.0 seconds = 59.908 FPS 300 frames in 5.0 seconds = 59.906 FPS 300 frames in 5.0 seconds = 59.911 FPS 300 frames in 5.0 seconds = 59.907 FPS XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0" after 1882 requests (65 known processed) with 0 events remaining. The error was me closing the Window (or should I have left it running?).
Does the attached patch help https://patchwork.freedesktop.org/patch/290846/?series=57689&rev=1 ?
I am only familiar with Building mesa through the AUR (Mesa-Git). Can I easily apply that patch somehow in the PKGBUILD or would it be easier for me to wait for the patch to be accepted into the repository and build from that?
Just got the Mesa 19.0 update in Arch's official repository, and the issue seems to be fixed :) I have been running Mario Party 9 in the main menu for many minutes now, where it would usually crash within 20-30 seconds. I will report back if it happens again, but if you don't hear back from me, consider this issue solved. Thank you very much for the amazing work you guys are doing on Mesa and RADV!
Thanks for confirming. Feel free to re-open if needed, hopefully not. :)
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.