Bug 109203 - [cfl dxvk] GPU Crash Launching Monopoly Plus (Iris Plus 655 / Wine + DXVK)
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Vulkan/intel
Version: unspecified
Hardware: x86-64 (AMD64) All
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
Reported: 2019-01-01 01:37 UTC by Benjamin Hodgetts
Modified: 2019-07-16 15:14 UTC

Attachments
drm-card-error log (105.69 KB, text/plain)
2019-01-01 01:37 UTC, Benjamin Hodgetts
Log Pack and Trace (204.38 KB, application/x-zip-compressed)
2019-01-03 20:06 UTC, Benjamin Hodgetts

Description Benjamin Hodgetts 2019-01-01 01:37:44 UTC
Created attachment 142929
drm-card-error log

As the kernel message recommends, I'm filing this with the GPU crash dump from /sys/class/drm/card0/error attached. When trying to run the game in Wine with DXVK, the GPU hangs and the program freezes.


Misc information:
Linux tiger 4.20.0-arch1-1-ARCH #1 SMP PREEMPT Mon Dec 24 03:00:40 UTC 2018 x86_64 GNU/Linux
OpenGL renderer string: Mesa DRI Intel(R) HD Graphics (Coffeelake 3x8 GT3)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 19.0.0-devel (git-c6b37e5412)


## Terminal / DXVK Output
info:  DXGI: Setting display mode: 1920x1200@60
INTEL-MESA: error: ../mesa/src/intel/vulkan/anv_device.c:2098: GPU hung on one of our command buffers (VK_ERROR_DEVICE_LOST)
err:   DxvkDevice: Command buffer submission failed: VK_ERROR_DEVICE_LOST
info:  Presenter: Actual swap chain properties:
  Format:       VK_FORMAT_B8G8R8A8_UNORM
  Present mode: VK_PRESENT_MODE_FIFO_KHR
  Buffer size:  1920x1200
  Image count:  3
err:   DxvkDevice: Command buffer submission failed: VK_ERROR_DEVICE_LOST
err:   DxvkDevice: Command buffer submission failed: VK_ERROR_DEVICE_LOST


## Kernel Output
[Tue Jan  1 01:27:53 2019] [drm] GPU HANG: ecode 9:0:0x85dffffb, in Monopoly.exe [849], reason: hang on rcs0, action: reset
[Tue Jan  1 01:27:53 2019] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[Tue Jan  1 01:27:53 2019] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[Tue Jan  1 01:27:53 2019] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[Tue Jan  1 01:27:53 2019] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[Tue Jan  1 01:27:53 2019] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[Tue Jan  1 01:27:53 2019] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Comment 1 Danylo 2019-01-03 14:42:22 UTC
Could you provide more information:

- Was it a one-off hang, or is it reproducible?

- If it is reproducible, could you record an apitrace of the game (when the hang occurs) as described in https://github.com/doitsujin/dxvk/wiki/Common-issues#apitrace and upload it somewhere?

Thanks for the report!
Comment 2 Benjamin Hodgetts 2019-01-03 20:06:30 UTC
Created attachment 142966
Log Pack and Trace

It happens every time so it is reproducible.

Attached a zip with the trace and the log files that DXVK itself creates in case they're also useful in some way.
Comment 3 Danylo 2019-01-04 14:40:25 UTC
Thanks!
Will investigate.
Comment 4 Danylo 2019-01-08 14:41:33 UTC
I was unable to quickly find the underlying issue, so here are my intermediate findings:

- It can be reproduced on HD Graphics 620.
- It doesn't look like a bisectable issue: it also reproduces on 18.1.
- The trace hangs exactly on call 13203, which is a draw call. Unfortunately, I was unable to get any further; the only thing I found is that it's NOT due to the early discard using subgroup operations (the shader used for call 13203 is the only one using them, and disabling their usage changes nothing).
Comment 5 Jason Ekstrand 2019-01-08 16:42:37 UTC
Two questions which may help in diagnosing the issue:

 1. Does it hang on gen8 (Broadwell) hardware?
 2. Does it still hang if you set INTEL_DEBUG=nohiz?
Comment 6 Danylo 2019-01-08 16:55:07 UTC
> 1. Does it hang on gen8 (Broadwell) hardware?
Unfortunately, that's an architecture I don't have.
 
> 2. Does it still hang if you set INTEL_DEBUG=nohiz?
It doesn't hang with nohiz.
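
For readers unfamiliar with the variable: INTEL_DEBUG takes a comma-separated list of flag names, so `nohiz` can be combined with other flags. The following is only a sketch of how such a list might be tested for one flag. The helper name `debug_list_has_flag` is made up for illustration and is not Mesa's actual parser.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not Mesa code): returns true if `flag` appears as a
 * whole entry in the comma-separated `list`, e.g. the value of INTEL_DEBUG. */
static bool
debug_list_has_flag(const char *list, const char *flag)
{
   if (list == NULL || flag == NULL || *flag == '\0')
      return false;

   size_t flag_len = strlen(flag);
   const char *p = list;

   while (*p != '\0') {
      /* Length of the current entry, up to the next comma or end of string. */
      size_t entry_len = strcspn(p, ",");

      if (entry_len == flag_len && strncmp(p, flag, flag_len) == 0)
         return true;

      p += entry_len;
      if (*p == ',')
         p++; /* skip the separator */
   }

   return false;
}
```

A caller would pass the environment value directly, for example `debug_list_has_flag(getenv("INTEL_DEBUG"), "nohiz")`; a NULL value (variable unset) simply yields false.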
Comment 7 Jason Ekstrand 2019-01-08 17:07:31 UTC
My wild guess from 30s looking at the error state and the fact that it doesn't hang with INTEL_DEBUG=nohiz is that it's something going wrong with the stencil PMA fix.  Can you try editing want_stencil_pma_fix() in gen8_cmd_buffer.c (which is for gen8+) to make it unconditionally return false?  That should help narrow things down further than nohiz.
Comment 8 Danylo 2019-01-08 18:19:25 UTC
Will check tomorrow, thanks.
Comment 9 Danylo 2019-01-09 10:21:32 UTC
> Can you try editing want_stencil_pma_fix() in gen8_cmd_buffer.c (which is for gen8+) to make it unconditionally return false?

No hang when want_stencil_pma_fix is disabled.
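
The experiment above amounts to short-circuiting the predicate that decides whether the stencil PMA fix is applied. A minimal sketch of that kind of debugging change follows. The struct, the `_sketch` names, the macro, and the placeholder condition are all assumptions made for illustration; the real function in gen8_cmd_buffer.c works on Mesa's own command-buffer state and evaluates the actual hardware PMA equation.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's command-buffer state; Mesa's real
 * structure has many more fields and different names. */
struct cmd_buffer_sketch {
   bool hiz_enabled;
   bool stencil_test_enabled;
   bool stencil_writes_enabled;
};

/* Debugging switch (an assumption of this sketch): 1 forces the fix off. */
#define FORCE_DISABLE_STENCIL_PMA_FIX 1

static bool
want_stencil_pma_fix_sketch(const struct cmd_buffer_sketch *cmd)
{
   /* The bisection step: unconditionally report that the fix is not wanted. */
   if (FORCE_DISABLE_STENCIL_PMA_FIX)
      return false;

   /* Placeholder condition only; NOT the real PMA equation. */
   return cmd->hiz_enabled &&
          cmd->stencil_test_enabled &&
          cmd->stencil_writes_enabled;
}
```

Because this only turns off the stencil PMA fix and leaves HiZ itself enabled, it narrows the cause further than INTEL_DEBUG=nohiz does.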
Comment 10 Danylo 2019-03-05 12:32:34 UTC
Is there anything I could check or investigate further, given that we know the PMA fix causes it?
Comment 11 Kenneth Graunke 2019-05-01 08:02:22 UTC
I noticed that our PMA equations assume that "Force Thread Dispatch Enable" is never set... but we do in fact set it these days.  I wrote a series to stop using it, in favor of a different fix which doesn't have as dire an impact on the PMA equations:

https://gitlab.freedesktop.org/kwg/mesa/commits/vk-pma-fix

Perhaps it would help?
Comment 12 Danylo 2019-05-02 09:18:38 UTC
With 0822bef84a332f89b7b8545fa25eaa2b5279d7a9 from https://gitlab.freedesktop.org/kwg/mesa/commits/vk-pma-fix it still hangs.
Comment 13 Danylo 2019-06-10 15:18:11 UTC
Alien: Isolation suffers from the same issue when launched through Proton (it also has a Linux version). And 0822bef84a332f89b7b8545fa25eaa2b5279d7a9 doesn't help there either.

Here is a dx11 trace which can reproduce the hang: https://mega.nz/#!ZItyyYgL!_5-E-qaXor8KJUQtm1pIzTqcseEbb-T1o5fww5O637E
Comment 14 Danylo 2019-07-15 12:56:15 UTC
One more game suffers from the same issue: "Heroes of The Storm", reported on the dxvk bug tracker at https://github.com/doitsujin/dxvk/issues/1130 . Disabling the PMA fix helps there too.
Comment 15 Jason Ekstrand 2019-07-15 22:17:57 UTC
Did a bit of poking at this today.  I figured out at least why Alien: Isolation was hanging and created an MR:

https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1347

Could you please try out the other games and see if this fixes the rest of them as well?  Thanks!
Comment 16 Danylo 2019-07-16 08:52:30 UTC
Monopoly Plus, at least its trace, does not hang with this MR.
Comment 17 Denis 2019-07-16 09:17:10 UTC
Hi Jason.
I re-checked "Heroes of The Storm" with your patch and it looks like it really fixed it.

The game doesn't hang (I navigated the main menu, then loaded a "training" match and played a bit). Before, the hang could appear anywhere (loading screen, main menu; I couldn't even start the match).
Comment 18 Jason Ekstrand 2019-07-16 15:14:54 UTC
Fixed by the following commit in master: 

commit 6a441151c245d7b59b84502257a0ff1a300b8633 (HEAD -> master, origin/master, origin/HEAD)
Author: Jason Ekstrand <jason@jlekstrand.net>
Date:   Mon Jul 15 17:14:26 2019 -0500

    anv: Account for dynamic stencil write disables in the PMA fix
    
    In 6ce8592836b8 we started looking at the dynamic stencil state and
    disabling stencil writes when the stencil mask is zero.  Unfortunately,
    we never updated the PMA fix code accordingly so 3DSTATE_WM_DEPTH_STENCIL
    and the PMA fix were getting out-of-sync causing hangs.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109203
    Fixes: 6ce8592836 "anv: Disable stencil writes when both write..."
    Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
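
The mismatch described in the commit message can be illustrated with a small sketch. This is not the actual patch: the struct and function names are hypothetical, and it assumes that stencil writes are dynamically disabled when both the front and back write masks are zero. The point is that the value programmed into the depth/stencil state and the value fed to the PMA fix decision should come from the same computation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state for illustration; not Mesa's real structures. */
struct stencil_state_sketch {
   bool pipeline_writes_stencil; /* static pipeline state */
   uint32_t write_mask_front;    /* dynamic state */
   uint32_t write_mask_back;     /* dynamic state */
};

/* Single source of truth: writes only happen if the pipeline enables them
 * and at least one dynamic write mask is non-zero (assumed condition). */
static bool
effective_stencil_writes(const struct stencil_state_sketch *s)
{
   return s->pipeline_writes_stencil &&
          (s->write_mask_front != 0 || s->write_mask_back != 0);
}

/* What would be programmed as the stencil write enable. */
static bool
depth_stencil_write_enable(const struct stencil_state_sketch *s)
{
   return effective_stencil_writes(s);
}

/* Pre-fix behaviour as described: the PMA decision ignores dynamic state. */
static bool
pma_sees_stencil_writes_out_of_sync(const struct stencil_state_sketch *s)
{
   return s->pipeline_writes_stencil;
}

/* Post-fix behaviour: the PMA decision uses the same effective value. */
static bool
pma_sees_stencil_writes_in_sync(const struct stencil_state_sketch *s)
{
   return effective_stencil_writes(s);
}
```

With a pipeline that enables stencil writes but has both dynamic write masks set to zero, the out-of-sync variant disagrees with the programmed write enable, while the in-sync variant matches it.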

