Bug 107941 - GPU hang and system crash with Dota 2 using Vulkan
Summary: GPU hang and system crash with Dota 2 using Vulkan
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Vulkan/intel
Version: git
Hardware: Other Linux (All)
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-09-15 04:19 UTC by leozinho29_eu
Modified: 2018-10-17 13:44 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:


Attachments
Compressed file with relevant files (2.53 MB, application/gzip)
2018-09-15 04:19 UTC, leozinho29_eu
Logs and screenshot (2.72 MB, application/gzip)
2018-09-15 17:31 UTC, leozinho29_eu
error.tar.xz (1.33 MB, application/x-xz)
2018-10-17 11:30 UTC, Sergii Romantsov
error.decode.tar.xz (2.76 MB, application/x-xz)
2018-10-17 11:30 UTC, Sergii Romantsov

Description leozinho29_eu 2018-09-15 04:19:42 UTC
Created attachment 141570 [details]
Compressed file with relevant files

When playing Dota 2 with Vulkan, a GPU hang happens on many occasions and, with a 4.18 or newer kernel, the system crashes a few seconds later. The GPU hang is also observed with 4.17.19, but the system crash is not.

The workaround is to use INTEL_DEBUG=nohiz as one of the environment variables, but there is performance degradation. 

The attached file has the following content:

The GPU hang log (2.6 MB);
The dmesg output, which unfortunately has nothing about the system crash;
One screenshot (with the two monitors) showing the screen at the moment the GPU hang happened;
One screenshot (single monitor) showing Dota 2's video settings;
One sound file with an approximation of the sound heard when the system crashed. Please note the sound is unpleasant to hear.

The GPU hang seems similar to https://bugs.freedesktop.org/show_bug.cgi?id=107760

Dota 2 is being affected by https://bugs.freedesktop.org/show_bug.cgi?id=107899 too.

Processor: Intel Core i3-6100U;
Video: Intel HD Graphics 520;
Architecture: amd64;
Mesa: 18.3.0-devel (git-914bd3014f);
Kernel version: drm-tip (feeccde66999c5e87be3550f2159e5d7eeb61c67)
Distribution: Xubuntu 18.04.1 amd64.
Comment 1 Jason Ekstrand 2018-09-15 17:10:59 UTC
Yeah, we appear to have a HiZ bug that's crept in some time in the not-so-distant past.  How reproducible is this?  Can I go into a "test out a character" game and get a hang fairly quickly?  If you've got a way to reliably reproduce the issue, it'll be much easier to fix.

The system crash is a separate issue and it's a kernel bug.  Probably best to file another bug for that one so we can track them independently.
Comment 2 leozinho29_eu 2018-09-15 17:31:54 UTC
Created attachment 141574 [details]
Logs and screenshot

The hang happens when, for example, I try to see a character description. In the screenshot I sent, I had clicked to see Earthshaker's description. At the moment the 3D model should appear, the GPU hang happens. The steps are:

Open the game, then click to see the heroes, then choose one of them. Not all heroes trigger the GPU hang, but Earthshaker does, for example.

Another possibility is in the Learn tab, in tutorial 2: when the Dragon Knight should appear, the GPU hang happens.

One thing I noticed is that a few heroes, including the heroine Luna, do not cause the GPU hang, which is fortunate as she's the first one that appears, so I have enough time to test.

The attached file has data from a hang using the 4.17.19 kernel, as anything newer causes the system to crash. I will report the system crash issue separately from this one.
Comment 3 Jason Ekstrand 2018-09-17 13:40:00 UTC
I've been able to successfully reproduce with Earthshaker.  Unfortunately, I don't have a second machine with me right now (and won't for a couple of weeks), so I can't really debug it, as running the game brings down my dev laptop.  I'll try to look at it in more detail in the first week of October.  Wanted to give you an update so you don't think I'm ignoring you for three weeks.
Comment 4 Sergii Romantsov 2018-09-19 15:46:01 UTC
On Skylake with kernel 4.15.0-33-generic
hang bisected to commit:
commit 79270d2140ec4fe5e4351f35150ed2d14687af07 (HEAD, refs/bisect/bad)
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Wed Jul 11 16:31:02 2018 -0700

    anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV
    
    We've had several broadwell hangs that have come down to this bit just
    not working correctly.  Most recently, we've had a pile of hangs
    reported with apps running under DXVK:
    
    https://github.com/doitsujin/dxvk/issues/469
    
    Instead, use the bit that doesn't try to imply weird D3D coherency
    things and just force-enables the PS like we want.
    
    cc: mesa-stable@lists.freedesktop.org
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
    (cherry picked from commit abd629eb3d4027b89c13158e90c6732b412e550e)
Comment 5 Jason Ekstrand 2018-09-19 15:52:16 UTC
Bah!  Thanks for your bisection!  I think we need to only do what that patch does for gen8 and then do the old thing on gen9.
Comment 6 Sergii Romantsov 2018-09-19 16:22:06 UTC
Initial version: https://patchwork.freedesktop.org/patch/250989/
A v2 has also been uploaded, but I will check it tomorrow.
Comment 7 leozinho29_eu 2018-09-19 20:05:49 UTC
After reverting that commit, the GPU hang no longer happens: I can see Earthshaker's description and Dragon Knight's tutorial with no problem. In fact, I could cycle through all heroes and no hang happened.

Removing INTEL_DEBUG=nohiz also increased the framerate significantly, from 28 to 105 FPS.
Comment 8 Sergii Romantsov 2018-09-20 07:25:57 UTC
Patch https://patchwork.freedesktop.org/patch/250989/
v2 works on Skylake and Kaby Lake.
Comment 9 Jason Ekstrand 2018-10-16 18:22:29 UTC
This should be fixed by the following commit in master:

commit 0fa9e6d7b304f6a8064ed78a4b9c557e1026e7e5 (public/master)
Author: Sergii Romantsov <sergii.romantsov@gmail.com>
Date:   Wed Sep 19 19:21:11 2018 +0300

    anv/skylake: disable ForceThreadDispatchEnable
    
    On Skylake enabling of ForceThreadDispatchEnable causes gpu-hang.
    
    -v2: enabling of  ForceThreadDispatchEnable is only for gen8, for
         gen9 and higher reverted enabling of PixelShaderHasUAV.
    
    -v3 (Jason Ekstrand): Rework the comments a bit.
    
    CC: Jason Ekstrand <jason.ekstrand@intel.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107941
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107760
    Fixes: 79270d2140ec (anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV)
    Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
    Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Comment 10 Jason Ekstrand 2018-10-16 19:56:09 UTC
Before we decide that this is completely done and dusted, could you please try the following branch as well?  It seems to fix Dota 2 for me.

https://gitlab.freedesktop.org/jekstrand/mesa/tree/wip/dota-dirt-hiz-fix
Comment 11 Sergii Romantsov 2018-10-17 11:29:49 UTC
Hello, Jason.
Checked Dota 2 with your branch (commit 1a9cac2a8ef19be9c796fd78f6ed577086f2172d).

And it hangs.
card0/error: error.tar.xz
decoded: error.decode.tar.xz
Comment 12 Sergii Romantsov 2018-10-17 11:30:32 UTC
Created attachment 142070 [details]
error.tar.xz
Comment 13 Sergii Romantsov 2018-10-17 11:30:57 UTC
Created attachment 142071 [details]
error.decode.tar.xz
Comment 14 leozinho29_eu 2018-10-17 13:44:36 UTC
When I downloaded that branch, HEAD was 3c08f47027adab569e8f94d4c03c689c8f9cba69 and it had no GPU hangs. Apparently the commit was reverted after I had already started the download.

I'll have to test with 1a9cac2a8ef19be9c796fd78f6ed577086f2172d.

