Bug 109203 - [cfl dxvk] GPU Crash Launching Monopoly Plus (Iris Plus 655 / Wine + DXVK)
Summary: [cfl dxvk] GPU Crash Launching Monopoly Plus (Iris Plus 655 / Wine + DXVK)
Status: NEEDINFO
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Vulkan/intel
Version: unspecified
Hardware: x86-64 (AMD64) All
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
 
Reported: 2019-01-01 01:37 UTC by Benjamin Hodgetts
Modified: 2019-06-10 15:18 UTC


Attachments
drm-card-error log (105.69 KB, text/plain)
2019-01-01 01:37 UTC, Benjamin Hodgetts
Log Pack and Trace (204.38 KB, application/x-zip-compressed)
2019-01-03 20:06 UTC, Benjamin Hodgetts

Description Benjamin Hodgetts 2019-01-01 01:37:44 UTC
Created attachment 142929
drm-card-error log

As the kernel message recommends, I'm filing this with the GPU crash dump from sysfs attached. When I try to run the game in Wine with DXVK, a GPU hang is triggered and the program stops responding.


Misc information:
Linux tiger 4.20.0-arch1-1-ARCH #1 SMP PREEMPT Mon Dec 24 03:00:40 UTC 2018 x86_64 GNU/Linux
OpenGL renderer string: Mesa DRI Intel(R) HD Graphics (Coffeelake 3x8 GT3)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 19.0.0-devel (git-c6b37e5412)


## Terminal / DXVK Output
info:  DXGI: Setting display mode: 1920x1200@60
INTEL-MESA: error: ../mesa/src/intel/vulkan/anv_device.c:2098: GPU hung on one of our command buffers (VK_ERROR_DEVICE_LOST)
err:   DxvkDevice: Command buffer submission failed: VK_ERROR_DEVICE_LOST
info:  Presenter: Actual swap chain properties:
  Format:       VK_FORMAT_B8G8R8A8_UNORM
  Present mode: VK_PRESENT_MODE_FIFO_KHR
  Buffer size:  1920x1200
  Image count:  3
err:   DxvkDevice: Command buffer submission failed: VK_ERROR_DEVICE_LOST
err:   DxvkDevice: Command buffer submission failed: VK_ERROR_DEVICE_LOST


## Kernel Output
[Tue Jan  1 01:27:53 2019] [drm] GPU HANG: ecode 9:0:0x85dffffb, in Monopoly.exe [849], reason: hang on rcs0, action: reset
[Tue Jan  1 01:27:53 2019] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[Tue Jan  1 01:27:53 2019] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[Tue Jan  1 01:27:53 2019] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[Tue Jan  1 01:27:53 2019] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[Tue Jan  1 01:27:53 2019] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[Tue Jan  1 01:27:53 2019] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
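
When comparing several hang reports, it can help to pull the key fields out of the "GPU HANG" kernel line. The following is a minimal sketch, not an official tool: the function name `parse_gpu_hang` is hypothetical, and the labels given to the three colon-separated `ecode` fields (`gen`, `engine`, `ecode`) are an assumption about their meaning based on the line format above.

```python
import re

# Matches the "[drm] GPU HANG: ..." line format shown in the kernel output above.
# The group names for the three ecode fields are an assumed interpretation.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<engine>[0-9a-fA-F]+):(?P<ecode>0x[0-9a-fA-F]+), "
    r"in (?P<process>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return a dict of fields from a GPU HANG line, or None if it does not match."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])
    return fields


sample = (
    "[Tue Jan  1 01:27:53 2019] [drm] GPU HANG: ecode 9:0:0x85dffffb, "
    "in Monopoly.exe [849], reason: hang on rcs0, action: reset"
)
hang = parse_gpu_hang(sample)
print(hang)
```

Lines that do not contain a hang report simply yield `None`, so the function can be applied to every line of a `dmesg` capture.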
Comment 1 Danylo 2019-01-03 14:42:22 UTC
Could you provide more information:

- Was it a one-off hang, or is it reproducible?

- If it is reproducible, could you record an apitrace of the game (up to the point where the hang occurs) as described in https://github.com/doitsujin/dxvk/wiki/Common-issues#apitrace and upload it somewhere?

Thanks for the report!
Comment 2 Benjamin Hodgetts 2019-01-03 20:06:30 UTC
Created attachment 142966
Log Pack and Trace

It happens every time so it is reproducible.

Attached a zip with the trace and the log files that DXVK itself creates in case they're also useful in some way.
Comment 3 Danylo 2019-01-04 14:40:25 UTC
Thanks!
Will investigate.
Comment 4 Danylo 2019-01-08 14:41:33 UTC
I'm unable to quickly find the underlying issue, so here are my intermediate findings:

- It can be reproduced on HD Graphics 620.
- It doesn't look like a bisectable issue: it also reproduces on 18.1.
- The trace hangs exactly on call 13203, which is a draw call. Unfortunately, I was unable to get any further; the only thing I found is that it's NOT due to the early discard using subgroup operations (the shader used for call 13203 is the only one using them, and disabling their usage changes nothing).
Comment 5 Jason Ekstrand 2019-01-08 16:42:37 UTC
Two questions which may help in diagnosing the issue:

 1. Does it hang on gen8 (Broadwell) hardware?
 2. Does it still hang if you set INTEL_DEBUG=nohiz?
Comment 6 Danylo 2019-01-08 16:55:07 UTC
> 1. Does it hang on gen8 (Broadwell) hardware?
Unfortunately, that's an architecture I don't have.
 
> 2. Does it still hang if you set INTEL_DEBUG=nohiz?
It doesn't hang with nohiz.
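
For anyone repeating this check, here is a minimal sketch of launching a program with `nohiz` added to `INTEL_DEBUG` without discarding flags that are already set. The helper names and the example command line are hypothetical; the sketch assumes `INTEL_DEBUG` takes a comma-separated list of flags.

```python
import os
import subprocess


def env_with_intel_debug(flag, base=None):
    """Return a copy of the environment with `flag` appended to INTEL_DEBUG."""
    env = dict(os.environ if base is None else base)
    # Keep any flags that are already present (assumed comma-separated).
    flags = [f for f in env.get("INTEL_DEBUG", "").split(",") if f]
    if flag not in flags:
        flags.append(flag)
    env["INTEL_DEBUG"] = ",".join(flags)
    return env


def launch(cmd, flag="nohiz"):
    """Run `cmd` with the debug flag set and return its exit code."""
    return subprocess.run(cmd, env=env_with_intel_debug(flag)).returncode


# Hypothetical usage; the actual Wine command line depends on the local setup:
#   launch(["wine", "Monopoly.exe"])
```

Building the environment in a separate function keeps the flag handling easy to check without starting the game.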
Comment 7 Jason Ekstrand 2019-01-08 17:07:31 UTC
My wild guess from 30s looking at the error state and the fact that it doesn't hang with INTEL_DEBUG=nohiz is that it's something going wrong with the stencil PMA fix.  Can you try editing want_stencil_pma_fix() in gen8_cmd_buffer.c (which is for gen8+) to make it unconditionally return false?  That should help narrow things down further than nohiz.
Comment 8 Danylo 2019-01-08 18:19:25 UTC
Will check tomorrow, thanks.
Comment 9 Danylo 2019-01-09 10:21:32 UTC
> Can you try editing want_stencil_pma_fix() in gen8_cmd_buffer.c (which is for gen8+) to make it unconditionally return false?

No hang when want_stencil_pma_fix is disabled.
Comment 10 Danylo 2019-03-05 12:32:34 UTC
Is there anything I could check or investigate further, given that we know the stencil PMA fix causes it?
Comment 11 Kenneth Graunke 2019-05-01 08:02:22 UTC
I noticed that our PMA equations assume that "Force Thread Dispatch Enable" is never set... but we do in fact set it these days. I wrote a series to stop using it, in favor of a different fix which doesn't have as dire of an impact on the PMA equations:

https://gitlab.freedesktop.org/kwg/mesa/commits/vk-pma-fix

Perhaps it would help?
Comment 12 Danylo 2019-05-02 09:18:38 UTC
With commit 0822bef84a332f89b7b8545fa25eaa2b5279d7a9 from https://gitlab.freedesktop.org/kwg/mesa/commits/vk-pma-fix, it still hangs.
Comment 13 Danylo 2019-06-10 15:18:11 UTC
Alien: Isolation suffers from the same issue when launched through Proton (it also has a Linux version), and 0822bef84a332f89b7b8545fa25eaa2b5279d7a9 doesn't help there either.

Here is a dx11 trace which can reproduce the hang: https://mega.nz/#!ZItyyYgL!_5-E-qaXor8KJUQtm1pIzTqcseEbb-T1o5fww5O637E

