Bug 107760 - GPU Hang when Playing DiRT 3 Complete Edition using Steam Play with DXVK
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Vulkan/intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 107763
Reported: 2018-08-30 15:46 UTC by leozinho29_eu
Modified: 2018-10-16 22:50 UTC (History)

See Also:
i915 platform:
i915 features:


Attachments
/sys/class/drm/card0/error (1.19 MB, text/plain), 2018-08-30 15:46 UTC, leozinho29_eu
Patch to try which may fix the issue (1.46 KB, patch), 2018-08-30 17:08 UTC, Jason Ekstrand
The moment DiRT 3 froze and GPU Hang happened (1.59 MB, image/png), 2018-08-30 18:46 UTC, leozinho29_eu
/sys/class/drm/card0/error (22.08 KB, text/plain), 2018-08-31 20:47 UTC, leozinho29_eu
/sys/class/drm/card0/error (19.84 KB, text/plain), 2018-08-31 23:51 UTC, leozinho29_eu
/sys/class/drm/card0/error (936.64 KB, text/plain), 2018-09-02 03:02 UTC, leozinho29_eu
Second patch to try (2.15 KB, patch), 2018-09-03 14:00 UTC, Jason Ekstrand
/sys/class/drm/card0/error (23.37 KB, text/plain), 2018-09-03 16:27 UTC, leozinho29_eu
/sys/class/drm/card0/error (1.09 MB, text/plain), 2018-09-03 17:07 UTC, leozinho29_eu
Logs and screenshot (3.17 MB, application/gzip), 2018-09-04 01:34 UTC, leozinho29_eu
Third patch (806 bytes, patch), 2018-09-04 03:49 UTC, Jason Ekstrand
Logs and screenshot (2.58 MB, application/gzip), 2018-09-04 16:25 UTC, leozinho29_eu
Logs and screenshot (2.89 MB, application/gzip), 2018-09-06 01:49 UTC, leozinho29_eu
Syslog errors when the system stopped working (4.25 KB, application/gzip), 2018-09-08 15:52 UTC, leozinho29_eu
Logs and screenshot (2.80 MB, application/gzip), 2018-09-15 01:13 UTC, leozinho29_eu
Logs (631.26 KB, application/gzip), 2018-09-19 21:05 UTC, leozinho29_eu
attachment-27387-0.html (339 bytes, text/html), 2018-09-19 21:16 UTC, Jason Ekstrand
Logs and screenshot (2.94 MB, application/gzip), 2018-09-21 15:49 UTC, leozinho29_eu
The xml files (1.37 KB, application/gzip), 2018-09-26 15:38 UTC, leozinho29_eu

Description leozinho29_eu 2018-08-30 15:46:30 UTC
Created attachment 141383 [details]
/sys/class/drm/card0/error

When playing the game DiRT 3 Complete Edition using Steam Play with DXVK, the game may cause a GPU hang. Generally the system recovers but the game does not, and it has to be forced to exit.

Steps to reproduce:

-Install Steam for Linux;
-Join Steam Beta;
-Enable Steam Play for all titles;
-Install DiRT 3 Complete Edition;
-Play DiRT 3 Complete Edition.

The hang may happen after a few minutes, but it's possible to play for many hours with no issues. This issue is rare, but when it happens the kernel log shows:

[67118.867752] [drm] GPU HANG: ecode 9:0:0x86cdffff, in dirt3_game.exe [13301], reason: Hang on rcs0, action: reset
[67118.867754] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[67118.867755] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[67118.867756] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[67118.867756] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[67118.867757] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[67118.867772] i915 0000:00:02.0: Resetting rcs0 after gpu hang

Mesa was built using LLVM 7 instead of the LLVM 6 available on Ubuntu 18.04. With LLVM 6 there was more stuttering and lower performance.

System specifications:

Processor: Intel Core i3-6100U;
Video: Intel HD Graphics 520;
Architecture: amd64;
Mesa: 18.3.0-devel (git-9de34b4dde);
Kernel version: 4.17.18-lowlatency;
Distribution: Xubuntu 18.04.1 amd64.
Comment 1 Jason Ekstrand 2018-08-30 17:08:44 UTC
Created attachment 141385 [details] [review]
Patch to try which may fix the issue

I've attached a patch to the bug which may fix the issue you're seeing.  Could you please give it a try and let me know if it helps.
Comment 2 leozinho29_eu 2018-08-30 18:46:05 UTC
This patch did not work. It's still having GPU hangs as before and, apparently, there was no new card0/error created.
Comment 3 leozinho29_eu 2018-08-30 18:46:55 UTC
Created attachment 141387 [details]
The moment DiRT 3 froze and GPU Hang happened
Comment 4 Jason Ekstrand 2018-08-31 17:09:38 UTC
Can you tell me exactly what graphics settings you're using?  A screenshot or two of the graphics settings menu would be perfect.

(In reply to leozinho29_eu from comment #2)
> This patch did not work. It's still having GPU hangs as before and,
> apparently, there was no new card0/error created.

You can reset the error state by writing something to /sys/class/drm/card0/error.  I usually do something like

$ sudo sh -c "echo 1 > /sys/class/drm/card0/error"

Otherwise, the system will only hang on to the first error state captured.
Comment 5 Jason Ekstrand 2018-08-31 17:10:21 UTC
Also, the more error states you can get the better.  Sometimes it helps to look at several of them and look for similarities.
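
When collecting several error states, the capture-then-reset step can be scripted. The following is only a sketch: the function name save_and_reset_error_state and the output file naming are made up for illustration, and on a real system the first argument would be /sys/class/drm/card0/error, which needs root to write.

```shell
#!/bin/sh
# Copy the current error state to a timestamped file, then clear it so the
# kernel can capture the next hang. Both paths are parameters (an assumption
# made here so the function can also be tried on ordinary files).
save_and_reset_error_state() {
    error_file=$1
    out_dir=$2
    out_file="$out_dir/gpu-error-$(date +%Y%m%d-%H%M%S).txt"

    mkdir -p "$out_dir" || return 1
    cat "$error_file" > "$out_file" || return 1
    # Writing anything to the error file discards the stored error state.
    echo 1 > "$error_file" || return 1
    # Print where the copy was saved.
    echo "$out_file"
}
```

Run as root, `save_and_reset_error_state /sys/class/drm/card0/error /root/gpu-errors` would print the path of the saved copy and leave the kernel ready to capture the next hang.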
Comment 6 leozinho29_eu 2018-08-31 20:47:31 UTC
Created attachment 141399 [details]
/sys/class/drm/card0/error

I think the patch has really improved the situation. It took a very long time before another GPU hang happened, and I think the GPU hang happening now is related to memory pressure (the card0/error is significantly smaller).

When the game is opened for the first time with low memory usage (1 GB of 7,76 GB), there is a long freeze of around 2 seconds at the moment the car should appear.

Then, after choosing the race and waiting for it to load, the sound of the screen transition plays but the screen stays frozen for around 2 seconds before the transition happens. After that, when choosing to start the race, the screen turns black and then shows the car(s) on the starting line. At the moment the screen should show the cars, a long freeze of up to 5 seconds, or even a GPU hang, is relatively common.

It seems like the video memory is not fast enough and Vulkan/DXVK/something else is not ready for unified memory. The notebook has 8 GB of DDR4 RAM at 2133 MHz in dual channel, which is not a slow configuration for an integrated GPU.

The screenshots with settings:

https://i.imgur.com/w5naEne.jpg
https://i.imgur.com/qXsp4ld.jpg
https://i.imgur.com/A7Ntr8a.jpg

The only setting I may change sometimes is the resolution, reducing it to 960x540 for races in the morning.
Comment 7 Jason Ekstrand 2018-08-31 21:06:42 UTC
That error state looks a lot like the new hang is in the kernel somewhere.  I'll have some kernel people look at it and see what we can dig up.
Comment 8 leozinho29_eu 2018-08-31 23:51:33 UTC
Created attachment 141400 [details]
/sys/class/drm/card0/error

Note: I can't test newer kernels right now as anything newer than 4.17 makes LXD containers fail, so I have to wait for a LXD update before testing newer kernel versions.

Reading dmesg, it's not absurd to think there is an issue with the kernel, as there are thousands of messages like:

[20571.238992] dragonrise 0003:0079:0006.0001: output queue full

Right now there are 587692 messages like this one. They are related to the controller I have. Maybe this is unrelated to this hang, but none of the other titles I tested have this issue; so far it is exclusive to DiRT 3 Complete Edition.

Here is a brief video showing DiRT 3 Complete Edition gameplay while the system was under memory pressure (6,5 GB + 500 MB swap):

https://cdn.discordapp.com/attachments/457747189616214019/485216723688226847/saida2.webm

Without the patch from attachment 141385, there were at least 3 points in the video where a GPU hang and consequent game freeze would probably have happened: at 2 seconds (after selecting "Race"), at 13 seconds (when the camera changes to the driver's camera) and at 18 seconds (when the green lights lit up).

The video is this short because after that I fell off the road and the computer froze for 5 minutes due to high memory usage (playing + recording seems to be too much), but even with that extremely long freeze the computer returned and DiRT 3 did not crash, which was unthinkable before the patch.
Comment 9 Jason Ekstrand 2018-09-01 14:09:49 UTC
The HiZ fix has landed in master:

commit 62378c5e9e5e1863bf8695af1df68b0338f5d4ea (public/master)
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Thu Aug 30 12:05:06 2018 -0500

    anv/blorp: Do more flushing around HiZ clears
    
    We make the flush after a HiZ clear unconditional and add a flush/stall
    before the clear as well.
    
    Cc: mesa-stable@lists.freedesktop.org
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107760
    Reviewed-by: Chad Versace <chadversary@chromium.org>
    Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Comment 10 Jason Ekstrand 2018-09-01 16:34:32 UTC
As far as the low-memory hangs go, I'm not sure what to tell you.  Graphics on a UMA with low memory is always difficult because your graphics may get swapped out to disk just like anything else and when that happens, things start timing out and apps get very confused.  I'm not at all surprised that you're seeing kernel hangs under those conditions.  It's also possible that what you're seeing isn't GPU hangs at all but it's something getting swapped to and from disk and causing things to stall.  If that's the case, then you're just running into normal Linux low-memory behavior.
Comment 11 leozinho29_eu 2018-09-02 03:02:02 UTC
Created attachment 141409 [details]
/sys/class/drm/card0/error

After this GPU hang happened and this report was generated, the system behaved as if it was in an unstable state.

It happened when I was using 4.18.5 kernel. To simulate a memory pressure situation, I used the following command:

sh -c "echo 2048 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages"

That creates huge pages that use 4 GB of RAM and can't be swapped. Then I opened the game and tried to play it but the GPU hang happened when showing the track. It's important to note htop was still showing 1 GB of free memory when the hang happened.
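
The 4 GB figure follows from the page count: 2048 pages of 2048 kB each is 4 GiB. A small sketch of that arithmetic, with a hypothetical helper name hugepages_for_gib and assuming the 2048 kB page size used in the command above:

```shell
# Number of 2048 kB huge pages needed to reserve a given number of GiB.
# 1 GiB = 1048576 kB, so each GiB takes 1048576 / 2048 = 512 pages.
hugepages_for_gib() {
    gib=$1
    page_kb=2048
    echo $(( gib * 1048576 / page_kb ))
}
```

`hugepages_for_gib 4` prints 2048, the value written to nr_hugepages above.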

The worrisome event was after this hang. The system crashed when I used the command `lxc list` in the terminal. The sound from Audacity kept looping a very short portion of the track endlessly. Nothing was working: not the tty, not num lock, and opening and closing the notebook lid would not turn the panel on or off.

In a second try under the same conditions, the system crashed when I opened xfce4-display-settings. It crashed as hard as before.

No logs were generated after those crashes, as if the logging services had crashed too.

I understand that allocating huge pages to simulate memory pressure is an artificial experiment, but it was useful for finding this "unstable state" problem.
Comment 12 Jason Ekstrand 2018-09-03 13:52:29 UTC
That latest error state appears to have the same problem as the first error state so the patch which we merged, while it helps, appears not to be a full fix.
Comment 13 Jason Ekstrand 2018-09-03 14:00:29 UTC
Created attachment 141426 [details] [review]
Second patch to try

I've attached a second patch for you to try.  This patch works together with the first patch to do a bit (not too much) more flushing in the hopes that it will sort out your hang.
Comment 14 leozinho29_eu 2018-09-03 16:27:40 UTC
Created attachment 141428 [details]
/sys/class/drm/card0/error

This second patch is great. I could no longer make a hang happen by allocating 4 GB of huge pages. The game seemed to run better, with shorter loading times (the freezes while loading).

To see how far it could go, I tested by allocating 5 GB of huge pages (leaving only 2,76 GB to the system). I could still play the game well, and when I tried to record the game with FFmpeg with settings intentionally above what the system is capable of encoding, the game FPS fell to 0,0 for 5 minutes and, when the OOM killer killed the FFmpeg process, the game resumed! I am impressed.

Later, it crashed with the attached dump when transitioning after selecting "RACE". It seems to be the kernel problem again, but I know I pushed it way too far this time (it still had 960 MB of swap on a 5400 RPM HDD), precisely to see how it would fail (if it would).

Is there an answer about the hangs that seem related to the kernel? I discovered that kernels 4.18 and 4.19 not only make the LXD containers fail but also have a regression that makes my gamepad fail to work.
Comment 15 leozinho29_eu 2018-09-03 17:07:40 UTC
Created attachment 141429 [details]
/sys/class/drm/card0/error

Is there any restriction/recommendation such as "after a GPU hang, always reboot"? After the first GPU hang it seems easy to trigger other GPU hangs. The attachment is from a GPU hang that happened after the first one I sent today.

This was the screen when the hang happened: https://i.imgur.com/W3ZUIjj.png
Comment 16 leozinho29_eu 2018-09-03 20:30:54 UTC
I'm seeing a pattern in the GPU hangs (I don't know how to interpret them, sorry). The GPU hangs are happening mostly when the track is in the morning or evening. With other weather conditions I'm no longer seeing GPU hangs.

I noticed the shadows blink when some of the GPU hangs happened. 

One GPU hang happened after selecting the track, before options such as "RACE" and "VEHICLE SETUP" appeared. This is the screenshot when it happened: https://i.imgur.com/9EE3Kto.png and the error file can be seen at: https://paste.ubuntu.com/p/WPrj7k4N7C/

Another happened when watching the replay, unfortunately I forgot to echo 1 and have no card0/error. This was the screen: https://i.imgur.com/d02ptX7.png

A third one happened (the first in a place other than Monte Carlo since the second patch) when watching the replay too. The screen can be seen at https://i.imgur.com/T2wDrue.png and the error file can be seen at https://paste.ubuntu.com/p/ZygZ3qMwkG/

I have increased verbosity from the logs. I have noticed two logs related to DXVK. The log called dirt3_game_dxgi.log has: 

info:  Game: dirt3_game.exe
info:  DXVK: v0.62-2-g4ab5682
warn:  OpenVR: Failed to locate module
err:   Required Vulkan extension VK_KHR_get_physical_device_properties2 not supported
err:   Required Vulkan extension VK_KHR_surface not supported
err:   Required Vulkan extension VK_KHR_win32_surface not supported
err:   DxvkInstance: Failed to create instance

The log called dirt3_game_d3d11.log has:

err:   D3D11CreateDevice: Failed to create a DXGI factory

I have noticed that, of the three extensions shown in the error, only VK_KHR_win32_surface is not available. Could the lack of this particular extension be the cause? But isn't it exclusive to Windows (as it has win32 in its name)?
Comment 17 leozinho29_eu 2018-09-04 01:34:36 UTC
Created attachment 141435 [details]
Logs and screenshot

I updated Proton to 3.7-5 Beta which uses DXVK 0.70. Now there are DXVK and Steam logs with more meaningful messages, hopefully they will help us to understand what is happening.

The attached file has the relevant logs and the screenshot showing the moment the game froze.
Comment 18 Jason Ekstrand 2018-09-04 03:49:58 UTC
Created attachment 141436 [details] [review]
Third patch

Please apply the newly attached patch.  I highly doubt it'll fix anything but it will provide me with more data next time you get a hang and send me an error state.
Comment 19 leozinho29_eu 2018-09-04 16:25:04 UTC
Created attachment 141448 [details]
Logs and screenshot

Here are the logs and the screenshot with the moment the GPU hang happened. I don't know what this patch should do, as it seems to be as big as the others.
Comment 20 leozinho29_eu 2018-09-06 01:49:36 UTC
Created attachment 141466 [details]
Logs and screenshot

The attached file has logs and the screenshot. What the screenshot does not show is that all shadows disappeared while the hang was happening. After the game froze the shadows reappeared. Could the shadows be the culprit of the hang I'm experiencing?

I'm seeing GPU hangs in stages in the morning. Under rain, fog, wet or night I've seen no hangs after the patches. I would have to test more in evening, but so far I was unable to cause the hang in tracks at evening.

To test this hypothesis, I raced a stage at night (Baroque, the same stage where the hangs in this and the previous screenshots in the tar.gz files happened in the morning) and left its replay playing while I was asleep. Seven hours later the game was still running, but with significant memory usage.

This is the image showing memory usage after the replay playing for 7 hours: https://i.imgur.com/65Nxk7i.png

Notice the high swap usage even with many gigabytes "free". Opening new applications caused more swapping; memory usage never went above 2,86 GB, as if those 5038 MiB "free" were in fact used. One piece of information in the logs makes this strange behavior a bit more logical. The log file dirt3_game_dxgi.log has:

info:    Memory Heap[0]: 
info:      Size: 4936 MiB

I don't think this is a coincidence. As it kept running for so long, maybe it allocated all the memory DXVK asked for, basically locking it away from other applications and making most of the system swap.

Please understand I'm not a developer, I'm just making guesses with the information I have and I can understand.
Comment 21 Jason Ekstrand 2018-09-07 19:59:31 UTC
Would you mind setting the INTEL_DEBUG environment variable to "nohiz" the next few times you play?  If that prevents the hang then it tells me my first notion was correct.  If it doesn't, then I need to look elsewhere.
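
One way to set the variable for a single program is to prefix the command with it. This is a minimal sketch; the wrapper name run_without_hiz is hypothetical, and only the variable name and value come from the request above.

```shell
# Run the given command with INTEL_DEBUG=nohiz set in its environment only,
# leaving the calling shell's environment untouched.
run_without_hiz() {
    INTEL_DEBUG=nohiz "$@"
}
```

For a game started through Steam, the equivalent would presumably be a launch option of the form `INTEL_DEBUG=nohiz %command%`.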
Comment 22 leozinho29_eu 2018-09-08 15:52:49 UTC
Created attachment 141489 [details]
Syslog errors when the system stopped working

Adding INTEL_DEBUG=nohiz made the GPU hang stop and I noticed a performance decrease. I was able to play in stages in the morning or evening with no GPU hangs.

I left the game replaying a stage in the morning. The game did not crash, but the memory usage problem happened again: 2,4 GiB used, 5,36 GiB free but unusable and 2,6 GiB of swap. Ultimately the system crashed. The syslog output from two of these crashes is attached.

Should I report this memory issue as a new bug?
Comment 23 leozinho29_eu 2018-09-15 01:13:26 UTC
Created attachment 141568 [details]
Logs and screenshot

I have updated Mesa to git-c79aad30ae, which edited the files patched before, so I had to revert the patches. There are GPU hangs still happening and INTEL_DEBUG=nohiz is still required to avoid the GPU hangs.
Comment 24 Jason Ekstrand 2018-09-19 15:53:07 UTC
Can you please try reverting 79270d2140ec4fe5e4351f35150ed2d14687af07?  A similar hang has been bisected to it so maybe that's the problem you're having.
Comment 25 leozinho29_eu 2018-09-19 21:05:10 UTC
Created attachment 141655 [details]
Logs

Reverting that commit was not enough to fix this issue for me; GPU hangs are still happening.

The attached file has the relevant logs of two GPU hangs that happened.
Comment 26 Jason Ekstrand 2018-09-19 21:16:56 UTC
Created attachment 141656 [details]
attachment-27387-0.html

That's a bummer.  I was hoping that maybe the Dota bug was related and would help with the debugging. I guess we get to keep guessing in the dark...
Comment 27 leozinho29_eu 2018-09-19 23:27:24 UTC
The cause of the GPU hangs in DiRT 3 Complete Edition is definitely the shadows. In the settings, if Shadows is set to anything higher than Ultra Low, the GPU hang may happen in races in the morning, in the evening or even in fog (there are detailed shadows in fog races starting from the Low setting).

Changing the setting to Ultra Low makes the GPU hangs stop. Ultra Low has very simple shadows, as if they were just drawn on the track instead of being dynamic; for example, the car is not shadowed when it passes through a shadowed area. With the Low setting and above, the car is shadowed when passing through a shadowed area.

I noticed only now that the second dmesg was truncated because of the dragonrise messages. The second dmesg is the continuation of the first dmesg.
Comment 28 Denis 2018-09-21 13:32:04 UTC
Hi guys. I also have this game, but unfortunately I couldn't reproduce the issue (I followed the steps from the last comment, changing the shadows setting).
I played 2 rounds on 2 maps, "morning sunshine" and "wet", and launched the game via Steam beta.

Here is my configuration:

vulkaninfo | grep 'apiVersion'
	apiVersion     = 0x401000  (1.1.0)

Linux and-Vostro-15-3568 4.18.0-041800-generic 
OpenGL renderer string: Mesa DRI Intel(R) HD Graphics 520 (Skylake GT2) 
Intel Core i3-6006U
Ubuntu 16.04

____________________________________

I also tried once on another laptop (KBL) and could reproduce neither a hang nor a crash:

vulkaninfo | grep "apiVersion"
	apiVersion     = 0x401050  (1.1.80)
Intel Core i7-7500U
4.17.0-041700-generic
Mesa DRI Intel(R) HD Graphics 620 (Kaby Lake GT2) 
Ubuntu 16.04
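
The hexadecimal apiVersion values in these configurations are packed version numbers. As a sketch, assuming the classic Vulkan packing (major from bit 22 upward, minor in bits 12 to 21, patch in the low 12 bits), a hypothetical helper decode_vk_version recovers the dotted form:

```shell
# Decode a packed Vulkan apiVersion value into major.minor.patch.
# Accepts decimal or 0x-prefixed hexadecimal input.
decode_vk_version() {
    v=$(( $1 ))
    echo "$(( v >> 22 )).$(( (v >> 12) & 0x3ff )).$(( v & 0xfff ))"
}
```

`decode_vk_version 0x401050` prints 1.1.80 and `decode_vk_version 0x401000` prints 1.1.0, matching the values vulkaninfo shows in parentheses.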

Will try to play more.
Comment 29 leozinho29_eu 2018-09-21 15:49:02 UTC
Created attachment 141675 [details]
Logs and screenshot

The best way I found to reproduce the hang is to choose the following:

Single Player -> Single Race -> Rally -> Monte Carlo -> Baroque -> 60s -> Morning, Sun or Fog -> Renault Alpine. I drive in first person and there are moments when there is sun reflection on the panel, maybe contributing to this.

As it's a pretty long race (around 3min45s with 60s vehicles) and there is one part after the bridge with many trees, I found it is a pretty reliable stage to reproduce the hang.

If it does not hang during the race, leave the replay playing.

The attached file has dmesg, logs, the card0/error and a screenshot with the moment the GPU hang happened.

Dmesg is a bit strange because the GPU hang message is buried among the dragonrise messages.
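
To dig the hang lines out of such a log, the gamepad driver messages can be filtered away first. A minimal sketch, with a hypothetical function name filter_gpu_hangs, matching only the message texts quoted earlier in this report:

```shell
# Read kernel log text on stdin, drop the dragonrise gamepad driver noise,
# and print only the i915 hang and engine reset lines.
filter_gpu_hangs() {
    grep -v 'dragonrise' | grep -E 'GPU HANG|Resetting .* after gpu hang'
}
```

It would typically be used as `dmesg | filter_gpu_hangs`, or with a saved log file redirected to its standard input.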
Comment 30 Marina Chernish 2018-09-26 09:49:04 UTC
Hi,
I have tried to reproduce the issue on the mentioned configuration: Single Player -> Single Race -> Rally -> Monte Carlo -> Baroque -> 60s -> Morning, Sun; Fog -> Renault Alpine. Also observed sun reflection on the panel but still no hangs while playing or replaying the race. I tried several times with different graphic settings.
Graphic options were turned to Medium and then to High but still no issue.

Environment I was playing on:
Ubuntu 18 kernel 4.18.0
Intel Core i3-6006U
DRI Intel HD Graphics 520 (Skylake GT2)
vulkaninfo | grep "apiVersion"
	apiVersion     = 0x401050  (1.1.80)
Comment 31 Marina Chernish 2018-09-26 10:01:54 UTC
Also I observed problems when starting the game: just the initial "dirt3" image appears on the black background and that's it; the game doesn't proceed to load. It happened quite often, and the game finally started after relaunching the game or Steam.
No hangs were observed related to this.

Have you faced similar problem?
Comment 32 leozinho29_eu 2018-09-26 14:17:00 UTC
The issue of the game failing to open is common and seems to be unrelated to Intel, according to https://spcr.netlify.com/ (on raw reports, as it's unlisted on Steam).

There is one important detail: DiRT 3 has DirectX 9 and DirectX 11 modes. DX11 mode can use DXVK while DX9 uses OpenGL translation from Wine.

One reliable way to know whether you are opening the game in DirectX 9 or 11 mode is the game's opening screen. If the first small square shows the DiRT 3 logo, you are using DX9 mode, which does not use Vulkan and won't be able to reproduce this. In DX11 mode, the small square does not show the DiRT 3 logo (it looks like some graphical bug).

There is one particular configuration file at ${WHERESTEAMISINSTALLED}/Steam/steamapps/compatdata/321040/pfx/drive_c/users/steamuser/My Documents/My Games/DiRT3/hardwaresettings/hardware_settings_config.xml

One of the XML settings is:

<directx forcedx9="false" />

This setting has to be set to false, otherwise DXVK won't be used. If it is set to true, the game uses DX9.
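
The setting can also be checked from a terminal. The sketch below assumes the attribute appears in the file exactly as quoted above; the function name d3d_mode is hypothetical, and the path to hardware_settings_config.xml is passed in as an argument.

```shell
# Report which Direct3D path the game is configured to use, based on the
# forcedx9 attribute in hardware_settings_config.xml.
d3d_mode() {
    config=$1
    if grep -q 'forcedx9="false"' "$config"; then
        echo "dx11"    # DXVK (Vulkan) can be used
    elif grep -q 'forcedx9="true"' "$config"; then
        echo "dx9"     # Wine's OpenGL translation; DXVK is not used
    else
        echo "unknown"
    fi
}
```

A result of dx9 would mean the hang described in this report cannot be reproduced with that configuration.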

I recommend enabling the DXVK HUD by setting the following in the DiRT 3 launch options on Steam:

DXVK_HUD=1 %command%

So you can be sure that the game is using DXVK.

It appears you are using DX9 because of the square that shows the logo; with DX11 the square should be empty.
Comment 33 Marina Chernish 2018-09-26 15:27:02 UTC
Thank you for the hint!
Denis and I checked the game settings and we have <directx forcedx9="false" />. Also, no logo is shown in the initial small black square.

Since hardware_settings_config.xml contains settings of the game it would be helpful if you could share it. So we could try the exact settings where this issue happened. 

Thanks!
Comment 34 leozinho29_eu 2018-09-26 15:38:41 UTC
Created attachment 141752 [details]
The xml files

The XML files are attached
Comment 35 leozinho29_eu 2018-09-29 00:43:20 UTC
I updated Mesa, Vulkan and Proton yesterday. I couldn't do it in the last few days because my LXD containers were failing to start, as I was using an old LXD version.

With new Mesa 18.3.0-devel (git-90cda2a005), Vulkan 1.1.85 and Proton 3.7-7 Beta, I can no longer reproduce this GPU hang. To be sure, I let replays playing in two Morning, Sun stages (Finland and Monte Carlo).

After 21358.817 seconds (nearly six hours) no GPU hang happened. As apparently nothing can be perfect, there was a Wine error and the game froze because of the Wine error.

No GPU issues happened, as terminals and pavucontrol were still working fine and dmesg had no information about a hang. In a GPU hang episode everything would freeze, and in this case only DiRT 3 froze. The message that appeared in the log was:

21358.817:0008:002a:err:ntdll:RtlpWaitForCriticalSection section 0x7297f68 "?" wait timed out in thread 002a, blocked by 0057, retrying (60 sec)

Which seems Wine related.

I can test a bit further but from what I tested so far GPU hangs are no longer happening with this new Mesa version.
Comment 36 leozinho29_eu 2018-10-16 01:27:34 UTC
Considering Mesa HEAD 26a2ce35aba4e63b17dfb95f7b0d9e61fa71cc72:

Without the patches from https://bugs.freedesktop.org/show_bug.cgi?id=107941 there are GPU hangs still happening. All other cases of GPU hangs I had before are no longer happening (those which INTEL_DEBUG=nohiz was needed) so, once that patch to fix the issue with Dota 2 lands, DiRT 3 should work with no hangs with master.

For now, applying that patch makes DiRT 3 work well and I no longer face GPU hangs caused by Vulkan or DXVK.
Comment 37 Jason Ekstrand 2018-10-16 18:23:23 UTC
(In reply to leozinho29_eu from comment #36)
> Considering Mesa HEAD 26a2ce35aba4e63b17dfb95f7b0d9e61fa71cc72:
> 
> Without the patches from https://bugs.freedesktop.org/show_bug.cgi?id=107941
> there are GPU hangs still happening. All other cases of GPU hangs I had
> before are no longer happening (those which INTEL_DEBUG=nohiz was needed)
> so, once that patch to fix the issue with Dota 2 lands, DiRT 3 should work
> with no hangs with master.
> 
> For now, applying that patch makes DiRT 3 work well and I no longer face GPU
> hangs caused by Vulkan or DXVK.

Cool.  Glad that fixes it.  I've merged that patch to master:

commit 0fa9e6d7b304f6a8064ed78a4b9c557e1026e7e5 (public/master)
Author: Sergii Romantsov <sergii.romantsov@gmail.com>
Date:   Wed Sep 19 19:21:11 2018 +0300

    anv/skylake: disable ForceThreadDispatchEnable
    
    On Skylake enabling of ForceThreadDispatchEnable causes gpu-hang.
    
    -v2: enabling of  ForceThreadDispatchEnable is only for gen8, for
         gen9 and higher reverted enabling of PixelShaderHasUAV.
    
    -v3 (Jason Ekstrand): Rework the comments a bit.
    
    CC: Jason Ekstrand <jason.ekstrand@intel.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107941
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107760
    Fixes: 79270d2140ec (anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV)
    Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
    Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Comment 38 Jason Ekstrand 2018-10-16 19:55:41 UTC
Before we decide that this is completely done and dusted, could you please try this branch that contains a different fix for the issue.  I know it fixes Dota2; I'd like to know if it fixes DiRT 3 as well.

https://gitlab.freedesktop.org/jekstrand/mesa/tree/wip/dota-dirt-hiz-fix
Comment 39 leozinho29_eu 2018-10-16 20:27:51 UTC
Meson is failing because it is asking for:

Native dependency libdrm_intel found: NO found '2.4.91' but need: '>=2.4.93'

Even if the meson.build file has:

 _drm_intel_ver = '2.4.75'

Apparently meson thinks the required libdrm_intel version is the same as the one amdgpu and freedreno require, instead of its own specified version.

Autotools worked but the build will take some time, as I suppose it will need the 64-bit and 32-bit builds as DiRT 3 is 32-bit and Dota 2 is 64-bit.
Comment 40 leozinho29_eu 2018-10-16 22:50:13 UTC
I tested with that Mesa build and both DiRT 3 and Dota 2 work properly. No GPU hang happened.

