Bug 104654 - r600/sb: Alien Isolation GPU lock
Summary: r600/sb: Alien Isolation GPU lock
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/r600
Version: git
Hardware: Other All
Importance: medium normal
Assignee: mesa-dev
QA Contact: mesa-dev
 
Reported: 2018-01-16 10:52 UTC by Gert Wollny
Modified: 2018-03-06 18:21 UTC

Attachments
reduce stuttering (608 bytes, patch)
2018-01-19 08:04 UTC, Gert Wollny
dmesg after reboot showing the GPU lockups (82.76 KB, text/x-log)
2018-01-21 11:15 UTC, Gert Wollny
Preliminary fix, needs testing (1.57 KB, patch)
2018-02-24 01:12 UTC, Gert Wollny

Description Gert Wollny 2018-01-16 10:52:57 UTC
When running Alien Isolation with the sb optimizer it locks up in Mission 10, in the beginning of the area "Gemini Exoplanet Solutions". With R600_DEBUG=nosb the problem can be alleviated. 

My first hunch was that the problem lies in sb not always ending an ALU clause after a KILL instruction (which, according to the HW documentation, should be the case). After trying to force this I'm not so sure any more, because this kind of wrong scheduling also happens earlier in the game, even in the start-up screen.

Graphics card: Radeon HD 6870 (Barts).
Mesa-version: git 8045f01e2 + some patches to glsl_to_tgsi regarding register and array merging that should have no influence on r600/sb.
Comment 1 Gert Wollny 2018-01-16 12:07:25 UTC
It also locks up if the KILL instruction is not issued at all, so its scheduling should not be the problem.
Comment 2 Gert Wollny 2018-01-17 00:25:03 UTC
Actually, the GPU lockups also happen without sb, only not to the point that one has to kill the program. It is likely that #104665 is actually a duplicate of this.
Comment 3 Gert Wollny 2018-01-17 00:33:40 UTC
Now, considering that compute shaders don't use sb, there are likely two issues at hand here. With sb, AI hangs quite often for a second or so (especially when opening the map), and it hung completely as reported. Without sb it has far fewer issues, and never to the point that the program needs to be killed.
Comment 4 Gert Wollny 2018-01-19 08:04:57 UTC
Created attachment 136842
reduce stuttering

This change is taken from Dave Airlie's CS WIP tree and is one of the hacks that didn't make it into the final patch. It seems to reduce the GPU lockups in AI. There is still at least one reproducible lockup when Alt-Tabbing out of the game. It doesn't do anything for #104665 though.
Comment 5 Gert Wollny 2018-01-21 11:15:32 UTC
Created attachment 136871
dmesg after reboot showing the GPU lockups
Comment 6 Gert Wollny 2018-01-21 11:34:30 UTC
The proposed patch is actually useless. Given the way the game provides saves, I didn't initially test it with the specific scene that had the lockup, and I hadn't seen any further lockups so far. However, going back, I see the bug is still there.

Running the game with

 export MESA_EXTENSION_OVERRIDE=-GL_ARB_compute_shader

also has no influence on the GPU lockup in the described scene, so it shouldn't be related to compute shaders at all. The shader TGSI dump still reports some COMP shaders being processed, but apparently they are only compiled and not run, because the scene shows visible artifacts that are not there when compute shaders are enabled.
Comment 7 russianneuromancer 2018-02-22 20:51:38 UTC
> but apparently they are just compiled but not run, because the scene shows visible artifacts that are not there when the compute shaders are enabled

Does it look like the screenshot in bug 105213, by any chance?
Comment 8 Darius Spitznagel 2018-02-22 22:33:14 UTC
Maybe this helps.

Every "older" Feral port has its own "shader warmer".

Disable it in the preferences file in ~/.local/share/feral-interactive/AlienIsolation like below...

<value name="EnableShaderWarmer" type="integer">0</value>
Comment 9 Gert Wollny 2018-02-23 09:06:35 UTC
@russianneuromancer@ya.ru

That's exactly the type of visual artefact that I see when compute shaders are disabled. In any case, it is not important for this bug; I only wanted to see whether this bug and [1] are related (which they are not).

[1] https://bugs.freedesktop.org/show_bug.cgi?id=104665
Comment 10 Gert Wollny 2018-02-23 18:59:13 UTC
@Darius Spitznagel: This flag doesn't change anything.
Comment 11 Gert Wollny 2018-02-24 01:12:35 UTC
Created attachment 137570
Preliminary fix, needs testing

The problem is actually not directly with sb: in the translation from TGSI the jump offsets are not correctly evaluated when the last CF slot is ALU_EXT. I have a patch ready and only need to test whether there are any piglit regressions.
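
To illustrate the kind of off-by-one described here, the following is a toy model only, not Mesa's code. It assumes that an ALU_EXT entry occupies two CF slots while every other CF instruction occupies one; the function names and the example program are hypothetical and exist just for this sketch.

```python
# Toy model of CF slot addressing. All names are hypothetical, and the
# "ALU_EXT takes two slots" rule is an assumption made for illustration.

def slots(op):
    """Number of CF slots an instruction occupies in this toy model."""
    return 2 if op == "ALU_EXT" else 1

def slot_addresses(program):
    """Return the starting slot address of each CF instruction."""
    addrs = []
    addr = 0
    for op in program:
        addrs.append(addr)
        addr += slots(op)
    return addrs

def jump_target_naive(program, last_index):
    """Assume the last instruction of the block is always one slot wide."""
    return slot_addresses(program)[last_index] + 1

def jump_target_fixed(program, last_index):
    """Account for the real width of the last instruction of the block."""
    return slot_addresses(program)[last_index] + slots(program[last_index])

# A conditional block whose last instruction is ALU_EXT (index 2).
prog = ["JUMP", "ALU", "ALU_EXT", "POP"]
print(slot_addresses(prog))        # starting address of every instruction
print(jump_target_naive(prog, 2))  # lands inside the ALU_EXT entry
print(jump_target_fixed(prog, 2))  # lands on the POP that follows
```

In this model the naive target only differs from the correct one when the block ends in ALU_EXT, which would fit the observation that the lockup shows up in specific scenes rather than everywhere.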
Comment 12 Gert Wollny 2018-03-06 17:39:49 UTC
This has been fixed in git as of 
c7cadcbda47537d474eea52b9e77e57ef9287f9b

