Bug 100424

Summary: X hang (in kernel) after some event in Serious Sam Fusion using radv. 4.9/amd-staging-4.9
Product: Mesa
Component: Drivers/Vulkan/radeon
Status: RESOLVED WORKSFORME
Severity: major
Priority: medium
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Reporter: Darren Salt <bugspam>
Assignee: mesa-dev
QA Contact: mesa-dev
CC: john.ettedgui

Description Darren Salt 2017-03-27 22:35:38 UTC
To reproduce, play Serious Sam Fusion (Vulkan rendering). X should freeze after a few minutes; at the start of SS:TFE, venturing out into the desert or killing a Gnaar should be enough to trigger it.

It looks like a fence + buffer object 'interaction'.

This appears to affect all Linux 4.9.x and amd-staging-4.9; other kernels are as yet untested.

Mesa 17.0.2 or git, libdrm 2.4.75, llvm 4.0~svn294803. Hardware is RX 470.

Sample backtrace:

[  861.398271] INFO: task Xorg:4985 blocked for more than 120 seconds.
[  861.398292]       Tainted: G         C O    4.9.17+dc+ #1
[  861.398301] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  861.398306] Xorg            D    0  4985   4966 0x00000004
[  861.398317]  ffff88040d4037c8 ffff880409bc9480 ffff88041c9ca1c0 ffff88041ed96440
[  861.398339]  ffff88040d403340 ffffc9000acc3a70 ffffffff8155de4e 0000000000000002
[  861.398356]  ffff88040d403340 ffff880405ab1500 ffff880405ab1500 0000000000000001
[  861.398377] Call Trace:
[  861.398391]  [<ffffffff8155de4e>] ? __schedule+0x228/0x3bd
[  861.398400]  [<ffffffff8155e067>] schedule+0x84/0x98
[  861.398409]  [<ffffffff813194d5>] amd_sched_entity_push_job+0x69/0x86
[  861.398417]  [<ffffffff81059623>] ? __wake_up_sync+0xd/0xd
[  861.398425]  [<ffffffff81319c85>] amdgpu_job_submit+0x71/0x7f
[  861.398434]  [<ffffffff812dff16>] amdgpu_vm_bo_split_mapping+0x3df/0x47a
[  861.398443]  [<ffffffff812df742>] ? amdgpu_vm_adjust_mc_addr+0x1f/0x1f
[  861.398461]  [<ffffffff812e0f37>] amdgpu_vm_bo_update+0x17d/0x222
[  861.398472]  [<ffffffff812d5d41>] amdgpu_gem_va_ioctl+0x362/0x3d9
[  861.398486]  [<ffffffff812d4fed>] ? ttm_bo_unreserve+0x40/0x43
[  861.398495]  [<ffffffff8129859c>] ? drm_gem_handle_create+0x34/0x39
[  861.398503]  [<ffffffff81298ccc>] drm_ioctl+0x26c/0x38b
[  861.398510]  [<ffffffff81298ccc>] ? drm_ioctl+0x26c/0x38b
[  861.398516]  [<ffffffff812d59df>] ? amdgpu_gem_metadata_ioctl+0xe8/0xe8
[  861.398525]  [<ffffffff8104b554>] ? preempt_latency_start+0x21/0x5d
[  861.398532]  [<ffffffff8104b5f2>] ? preempt_count_add+0x62/0x65
[  861.398540]  [<ffffffff81560846>] ? _raw_spin_unlock_irqrestore+0x13/0x25
[  861.398549]  [<ffffffff812c1d17>] amdgpu_drm_ioctl+0x4a/0x7a
[  861.398557]  [<ffffffff810dbdbd>] vfs_ioctl+0x13/0x2f
[  861.398564]  [<ffffffff810dc2d0>] do_vfs_ioctl+0x47f/0x524
[  861.398573]  [<ffffffff810e48a3>] ? __fget+0x66/0x72
[  861.398581]  [<ffffffff810dc3b3>] SyS_ioctl+0x3e/0x5c
[  861.398588]  [<ffffffff81560be4>] entry_SYSCALL_64_fastpath+0x17/0x98
[  861.398654] INFO: task Sam2017:6271 blocked for more than 120 seconds.
[  861.398660]       Tainted: G         C O    4.9.17+dc+ #1
[  861.398664] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  861.398669] Sam2017         D    0  6271   6270 0x00000000
[  861.398762]  ffff8803902caa08 ffff88040d0d1800 ffffffff81c0c500 ffff88041ec16440
[  861.398778]  ffff8803902ca580 ffffc90002cdba78 ffffffff8155de4e 0000000000000002
[  861.398796]  ffff8803902ca580 7fffffffffffffff 0000000000000246 ffff88033dd3e100
[  861.398812] Call Trace:
[  861.398821]  [<ffffffff8155de4e>] ? __schedule+0x228/0x3bd
[  861.398829]  [<ffffffff8155e067>] schedule+0x84/0x98
[  861.398835]  [<ffffffff8155fdc5>] schedule_timeout+0x2f/0xf5
[  861.398843]  [<ffffffff8104b554>] ? preempt_latency_start+0x21/0x5d
[  861.398850]  [<ffffffff8104b5f2>] ? preempt_count_add+0x62/0x65
[  861.398857]  [<ffffffff813ae57a>] fence_default_wait+0x124/0x1c1
[  861.398863]  [<ffffffff813ae57a>] ? fence_default_wait+0x124/0x1c1
[  861.398869]  [<ffffffff813adf22>] ? fence_release+0x2b/0x2b
[  861.398875]  [<ffffffff813adee1>] fence_wait_timeout+0x2e/0x30
[  861.398881]  [<ffffffff812e4005>] amdgpu_ctx_add_fence+0x66/0x13b
[  861.398887]  [<ffffffff812d81b4>] amdgpu_cs_ioctl+0x1132/0x116a
[  861.398897]  [<ffffffff81298ccc>] drm_ioctl+0x26c/0x38b
[  861.398903]  [<ffffffff812d7082>] ? amdgpu_cs_find_mapping+0x7d/0x7d
[  861.398910]  [<ffffffff8104b554>] ? preempt_latency_start+0x21/0x5d
[  861.398917]  [<ffffffff8104b5f2>] ? preempt_count_add+0x62/0x65
[  861.398923]  [<ffffffff81560846>] ? _raw_spin_unlock_irqrestore+0x13/0x25
[  861.398931]  [<ffffffff812c1d17>] amdgpu_drm_ioctl+0x4a/0x7a
[  861.398938]  [<ffffffff810dbdbd>] vfs_ioctl+0x13/0x2f
[  861.398944]  [<ffffffff810dc2d0>] do_vfs_ioctl+0x47f/0x524
[  861.398951]  [<ffffffff810e48a3>] ? __fget+0x66/0x72
[  861.398958]  [<ffffffff810dc3b3>] SyS_ioctl+0x3e/0x5c
[  861.398965]  [<ffffffff81560be4>] entry_SYSCALL_64_fastpath+0x17/0x98
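
The two hung-task reports above can be condensed into a per-task list of reliable frames, which makes it easier to see where each process is sleeping. The following is a minimal sketch, not part of the original report. It assumes the 4.9-style frame format shown in the log (`[<address>] symbol+0xoff/0xlen`) and treats frames marked with `?` as unreliable. The helper name `summarize_hung_tasks` and the shortened `sample` text are illustrative only.

```python
import re

# Hung-task header, e.g. "INFO: task Xorg:4985 blocked for more than 120 seconds."
TASK_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
# 4.9-style frame, e.g. "[<ffffffff8155e067>] schedule+0x84/0x98".
# A "?" before the symbol marks a frame the unwinder could not confirm.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_hung_tasks(log_text):
    """Return a list of (task, pid, reliable_frames), innermost frame first."""
    reports = []
    for line in log_text.splitlines():
        task = TASK_RE.search(line)
        if task:
            # Start a new report for this blocked task.
            reports.append((task.group(1), int(task.group(2)), []))
            continue
        frame = FRAME_RE.search(line)
        # Keep only frames without the "?" marker, and only once a task header was seen.
        if frame and reports and not frame.group(1):
            reports[-1][2].append(frame.group(2))
    return reports


# Shortened excerpt of the log above (illustrative input only).
sample = """\
[  861.398271] INFO: task Xorg:4985 blocked for more than 120 seconds.
[  861.398391]  [<ffffffff8155de4e>] ? __schedule+0x228/0x3bd
[  861.398400]  [<ffffffff8155e067>] schedule+0x84/0x98
[  861.398409]  [<ffffffff813194d5>] amd_sched_entity_push_job+0x69/0x86
[  861.398654] INFO: task Sam2017:6271 blocked for more than 120 seconds.
[  861.398829]  [<ffffffff8155e067>] schedule+0x84/0x98
[  861.398835]  [<ffffffff8155fdc5>] schedule_timeout+0x2f/0xf5
[  861.398857]  [<ffffffff813ae57a>] fence_default_wait+0x124/0x1c1
"""

for task, pid, frames in summarize_hung_tasks(sample):
    print(f"{task} (pid {pid}): " + " <- ".join(frames))
```

On the full log, this kind of summary shows Xorg sleeping in `amd_sched_entity_push_job` via `amdgpu_gem_va_ioctl`, and the game sleeping in `fence_default_wait` via `amdgpu_cs_ioctl`.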

Comment 1 Darren Salt 2017-03-27 22:37:14 UTC
I'm aware (via IRC) that 4.11-rc3 should be fine. However, as I'm using amd-staging-4.9 for its HDMI audio support, this isn't an option.
Comment 2 Michel Dänzer 2017-03-28 02:34:43 UTC
Probably a radv (or LLVM) issue.
Comment 3 Darren Salt 2017-03-29 00:36:51 UTC
This also happens with 4.11-rc4, with essentially the same backtrace. I'll try bumping LLVM next (probably to 4.0~svn294803), although I expect little difference.
Comment 4 Darren Salt 2017-03-29 19:50:07 UTC
… okay, it's looking like the Steam overlay has a lot to do with this problem. (Tested with current Mesa git, but the same LLVM as before.)
Comment 5 Samuel Pitoiset 2018-04-10 20:03:21 UTC
Hi Darren,

Can you still reproduce the hang?

I regularly test Serious Sam Fusion on Polaris/Vega, and it never hung for me.
Comment 6 Samuel Pitoiset 2018-05-15 20:13:37 UTC
Closing. I tried to reproduce the issue again yesterday. Maybe I'm unlucky or not good enough at playing games, but it worked perfectly fine (tested on Vega 56). Feel free to re-open. Thanks!
