Created attachment 142929 [details]
drm-card-error log

As the kernel error recommends, I'm logging this with the sys crashlog. When trying to run the game in Wine with DXVK, an error is triggered and the program hangs.

Misc information:

Linux tiger 4.20.0-arch1-1-ARCH #1 SMP PREEMPT Mon Dec 24 03:00:40 UTC 2018 x86_64 GNU/Linux
OpenGL renderer string: Mesa DRI Intel(R) HD Graphics (Coffeelake 3x8 GT3)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 19.0.0-devel (git-c6b37e5412)

## Terminal / DXVK Output

info: DXGI: Setting display mode: 1920x1200@60
INTEL-MESA: error: ../mesa/src/intel/vulkan/anv_device.c:2098: GPU hung on one of our command buffers (VK_ERROR_DEVICE_LOST)
err: DxvkDevice: Command buffer submission failed: VK_ERROR_DEVICE_LOST
info: Presenter: Actual swap chain properties:
  Format: VK_FORMAT_B8G8R8A8_UNORM
  Present mode: VK_PRESENT_MODE_FIFO_KHR
  Buffer size: 1920x1200
  Image count: 3
err: DxvkDevice: Command buffer submission failed: VK_ERROR_DEVICE_LOST
err: DxvkDevice: Command buffer submission failed: VK_ERROR_DEVICE_LOST

## Kernel Output

[Tue Jan 1 01:27:53 2019] [drm] GPU HANG: ecode 9:0:0x85dffffb, in Monopoly.exe [849], reason: hang on rcs0, action: reset
[Tue Jan 1 01:27:53 2019] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[Tue Jan 1 01:27:53 2019] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[Tue Jan 1 01:27:53 2019] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[Tue Jan 1 01:27:53 2019] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[Tue Jan 1 01:27:53 2019] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[Tue Jan 1 01:27:53 2019] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Could you provide more information:
- Was it a one-off hang, or is it reproducible?
- If it is reproducible, could you record an apitrace of the game (up to the point where the hang occurs) as described in https://github.com/doitsujin/dxvk/wiki/Common-issues#apitrace and upload it somewhere?

Thanks for the report!
Created attachment 142966 [details] Log Pack and Trace It happens every time so it is reproducible. Attached a zip with the trace and the log files that DXVK itself creates in case they're also useful in some way.
Thanks! Will investigate.
I haven't been able to quickly find the underlying issue, so here are my intermediate findings:
- It can be reproduced on HD Graphics 620.
- It doesn't look like a bisectable issue: it also reproduces on 18.1.
- The trace hangs exactly on call 13203, which is a draw call.

Unfortunately I was unable to move forward. The only thing I found is that it's NOT due to the early discard using subgroup operations (the shader used for call 13203 is the only one using them, and disabling their usage changes nothing).
Two questions which may help in diagnosing the issue: 1. Does it hang on gen8 (Broadwell) hardware? 2. Does it still hang if you set INTEL_DEBUG=nohiz?
> 1. Does it hang on gen8 (Broadwell) hardware?

Unfortunately that's an architecture I don't have.

> 2. Does it still hang if you set INTEL_DEBUG=nohiz?

It doesn't hang with nohiz.
My wild guess from 30s looking at the error state and the fact that it doesn't hang with INTEL_DEBUG=nohiz is that it's something going wrong with the stencil PMA fix. Can you try editing want_stencil_pma_fix() in gen8_cmd_buffer.c (which is for gen8+) to make it unconditionally return false? That should help narrow things down further than nohiz.
Will check tomorrow, thanks.
> Can you try editing want_stencil_pma_fix() in gen8_cmd_buffer.c (which is for gen8+) to make it unconditionally return false?

No hang when want_stencil_pma_fix is disabled.
Is there anything I could check or investigate further, given that we now know the PMA fix causes it?
I noticed that our PMA equations assume that "Force Thread Dispatch Enable" is never set...but we do in fact set it these days. I wrote a series to stop using it, in favor of a different fix which doesn't have as dire of an impact on the PMA equations: https://gitlab.freedesktop.org/kwg/mesa/commits/vk-pma-fix Perhaps it would help?
With 0822bef84a332f89b7b8545fa25eaa2b5279d7a9 from https://gitlab.freedesktop.org/kwg/mesa/commits/vk-pma-fix it still hangs.
Alien: Isolation suffers from the same issue when launched through Proton (it also has a Linux version), and 0822bef84a332f89b7b8545fa25eaa2b5279d7a9 doesn't help there either. Here is a dx11 trace which can reproduce the hang: https://mega.nz/#!ZItyyYgL!_5-E-qaXor8KJUQtm1pIzTqcseEbb-T1o5fww5O637E
One more game which suffers from the same issue is "Heroes of The Storm", reported on the dxvk bug tracker: https://github.com/doitsujin/dxvk/issues/1130 . Disabling the PMA fix helps it too.
Did a bit of poking at this today. I figured out at least why Alien: Isolation was hanging and created an MR: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1347 Could you please try out the other games and see if this fixes the rest of them as well? Thanks!
Monopoly Plus, at least its trace, does not hang with this MR.
Hi Jason. I re-checked "Heroes of The Storm" with your patch and it looks like it really fixed it. The game doesn't hang (I navigated the main menu, then loaded a "training" match and played a bit). Before, a hang could appear anywhere (loading screen, main menu), and I couldn't even start a match.
Fixed by the following commit in master:

commit 6a441151c245d7b59b84502257a0ff1a300b8633 (HEAD -> master, origin/master, origin/HEAD)
Author: Jason Ekstrand <jason@jlekstrand.net>
Date:   Mon Jul 15 17:14:26 2019 -0500

    anv: Account for dynamic stencil write disables in the PMA fix

    In 6ce8592836b8 we started looking at the dynamic stencil state and
    disabling stencil writes when the stencil mask is zero. Unfortunately,
    we never updated the PMA fix code accordingly so
    3DSTATE_WM_DEPTH_STENCIL and the PMA fix were getting out-of-sync
    causing hangs.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109203
    Fixes: 6ce8592836 "anv: Disable stencil writes when both write..."
    Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
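For readers skimming the thread, the kind of mismatch the commit message describes can be illustrated with a small, self-contained C sketch. This is not Mesa's code: the struct, its fields, and every function name below are hypothetical, and treating stencil writes as disabled only when both the front and back dynamic write masks are zero is an assumption made for illustration. The idea is simply that the packet programming and the workaround decision should both derive "does this draw write stencil?" from the same helper, so they cannot disagree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified state; NOT Mesa's real structures. */
struct sketch_stencil_state {
    bool     pipeline_writes_stencil; /* static pipeline state allows stencil writes */
    uint32_t front_write_mask;        /* dynamic stencil write mask, front faces */
    uint32_t back_write_mask;         /* dynamic stencil write mask, back faces */
};

/* Single source of truth (assumption: writes count as disabled only
 * when both dynamic write masks are zero). */
bool sketch_stencil_write_enabled(const struct sketch_stencil_state *s)
{
    return s->pipeline_writes_stencil &&
           (s->front_write_mask != 0 || s->back_write_mask != 0);
}

/* What would be programmed into the depth/stencil packet. */
bool sketch_packet_stencil_write_enable(const struct sketch_stencil_state *s)
{
    return sketch_stencil_write_enabled(s);
}

/* Stale workaround input: consults only the static pipeline state, so it
 * can disagree with the packet once dynamic masks are taken into account. */
bool sketch_workaround_sees_writes_stale(const struct sketch_stencil_state *s)
{
    return s->pipeline_writes_stencil;
}

/* In-sync workaround input: uses the same helper as the packet. */
bool sketch_workaround_sees_writes(const struct sketch_stencil_state *s)
{
    return sketch_stencil_write_enabled(s);
}

/* Usage example: pipeline enables stencil writes, both dynamic masks are 0. */
void sketch_demo(void)
{
    struct sketch_stencil_state s = { true, 0, 0 };

    /* The packet says "no stencil writes"... */
    assert(!sketch_packet_stencil_write_enable(&s));
    /* ...while the stale check still believes stencil is written. */
    assert(sketch_workaround_sees_writes_stale(&s));
    /* The in-sync check always agrees with the packet. */
    assert(sketch_workaround_sees_writes(&s) ==
           sketch_packet_stencil_write_enable(&s));
}
```

Calling `sketch_demo()` exercises the one case where the two views diverge. With a non-zero write mask on either face, the stale and in-sync checks return the same value, which would be consistent with only some draw calls triggering the hang.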