105670 – [regression][hang] Trine1EE hangs GPU after loading screen on Mesa3D-17.3 and later

Bug 105670 - [regression][hang] Trine1EE hangs GPU after loading screen on Mesa3D-17.3 and later

Summary: [regression][hang] Trine1EE hangs GPU after loading screen on Mesa3D-17.3 and...

Status:	RESOLVED FIXED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	glsl-compiler (show other bugs)
Version:	17.3
Hardware:	Other All

Importance:	medium normal
Assignee:	mesa-dev
QA Contact:	Intel 3D Bugs Mailing List

URL:
Whiteboard:
Keywords:	regression

Depends on:
Blocks:

Reported:	2018-03-22 00:15 UTC by i.kalvachev
Modified:	2018-03-27 23:54 UTC (History)
CC List:	0 users

See Also:
i915 platform:
i915 features:

Attachments
Fragment/Pixel Shader generated by Wine3.3, extracted from lookup of #55251 op in the trace. (7.80 KB, text/plain) 2018-03-24 12:07 UTC, i.kalvachev	Details
View All

Description i.kalvachev 2018-03-22 00:15:23 UTC

The game is Trine1 Enchanted Edition, running under Wine-3.3.

The game works fine with Mesa-17.2.0, but with Mesa-17.3.0 hangs right after the loading screen.

The hang is soft, the driver tries to reset itself each 10 seconds and turns off my monitor. I am able to switch to text console (kms one) and kill the Xorg server, then reboot the machine. The kernel driver refuses to accept any new commands.


Using software render I was able to capture a small apitrace (90MB compressed) that successfully reproduces the issue. It could be found here:

https://drive.google.com/open?id=1RNKExfdBXUCN7SIhcrdiMwsvwIoCVMTg

By using the qapitrace lookup, I managed to locate the exact operation that hangs. Not surprising, it is a draw operation:

#55251 - works
#55252 - hangs


I tried to git bisect between 17.2.0 and 17.3.0, but I only managed to narrow it down to few steps:
git bisect good 375c4868efa3cf549699557989c8f5c08c0710f0
git bisect bad  09f6bd5ef27c1b16b1468441b070b60c2d57523d

The rest of my bisect log is full of skips, because I cannot find a commit that would work at all. All of them fail to run even `glxgears`, some crash, other give asserts in R600 code, in xmlconfig etc...


My hardware is AMD Radeon HD5670 Redwood Evergreen (R600 driver). Using "R600_DEBUG=nosb" does not workaround the issue.
Also the bug is reproducible on AMD RX480, running latest mesa3d master, llvm-svn 7.0.0svn_r328112 and experimental kernel.

Comment 1 Tapani Pälli 2018-03-22 11:33:10 UTC

Could be related to loop unrolling (?) While playing back trace on i965 I saw this:

--- 8< ---
glretrace: ../src/compiler/nir/nir_opt_loop_unroll.c:414: complex_unroll: Assertion `unroll_loc->type == nir_cf_node_block && exec_list_is_empty(&nir_cf_node_as_block(unroll_loc)->instr_list)' failed.

Comment 2 Gert Wollny 2018-03-22 17:00:11 UTC

I can confirm this on Radeon 6870 HD. 

I was able to track the issue to the series beginning with  

   ab23b759f24 glsl: don't drop instructions from unreachable 
               terminators continue branch

The first time around I had a few difficulties get the correct patch, it seems to be either 646621c66da9 or 7a7fb90af75. In any case, yes it is related to loop unrolling.

The original TGSI goes like 

BGNLOOP
  ISGE TEMP[13].x, TEMP[8].xxxx, IMM[7].yyyy
  UIF TEMP[13].xxxx
     BRK
  ENDIF
  ...
ENDLOOP

and with the series in question applied it is 

BGNLOOP
  ISLT TEMP[13].x, TEMP[8].xxxx, IMM[7].yyyy
  UIF TEMP[13].xxxx
  ...
  ENDI
ENDLOOP

This UIF should have an else path with the BRK to resemble the original code. 
(There are more BRK statements in the LOOP but they are the same in both versions.

Regarding bisecting this, after a failure one usually must reboot the system, otherwise the graphics card is in a bad state. But given the nature of the bug one should also be able to reproduce the endless loop with LIBGL_ALWAYS_SOFTWARE=1 thereby not clobbering the hardware.

Comment 3 Roland Scheidegger 2018-03-22 17:28:53 UTC

(In reply to Gert Wollny from comment #2)
> Regarding bisecting this, after a failure one usually must reboot the
> system, otherwise the graphics card is in a bad state. But given the nature
> of the bug one should also be able to reproduce the endless loop with
> LIBGL_ALWAYS_SOFTWARE=1 thereby not clobbering the hardware.

Just to avoid any confusion, llvmpipe won't show an infinite loop, since it uses a loop limiter on all loops (might still take quite some time, though). But of course you can still see the bogus tgsi.

Comment 4 almos 2018-03-22 19:53:44 UTC

The problem is not loop unrolling. The problem is that userspace code can hang the GPU unrecoverably, and thus bringing down the entire system.

BTW I can confirm this on Pitcairn with radeon drm in linux 4.15.

Comment 5 Roland Scheidegger 2018-03-22 20:15:03 UTC

(In reply to almos from comment #4)
> The problem is not loop unrolling. The problem is that userspace code can
> hang the GPU unrecoverably, and thus bringing down the entire system.
> 
> BTW I can confirm this on Pitcairn with radeon drm in linux 4.15.

There isn't much you can do about shaders not terminating (loop limiting in llvmpipe is quite a hack, you could legitimately have a loop which has more iterations).

But yes, gpu reset actually working reliably would be nice. I haven't really seen it succeed lately neither (but it can happen...).

I disagree that loop unrolling isn't a problem. Clearly there's two problems:
- loop unrolling shouldn't turn perfectly fine loops into loops which don't terminate, this is what this bug is about.
- gpu reset should work reliably

Comment 6 Timothy Arceri 2018-03-22 22:47:13 UTC

(In reply to Gert Wollny from comment #2)
> I can confirm this on Radeon 6870 HD. 
> 
> I was able to track the issue to the series beginning with  
> 
>    ab23b759f24 glsl: don't drop instructions from unreachable 
>                terminators continue branch
> 
> The first time around I had a few difficulties get the correct patch, it
> seems to be either 646621c66da9 or 7a7fb90af75. In any case, yes it is
> related to loop unrolling.
> 
> The original TGSI goes like 
> 
> BGNLOOP
>   ISGE TEMP[13].x, TEMP[8].xxxx, IMM[7].yyyy
>   UIF TEMP[13].xxxx
>      BRK
>   ENDIF
>   ...
> ENDLOOP
> 
> and with the series in question applied it is 
> 
> BGNLOOP
>   ISLT TEMP[13].x, TEMP[8].xxxx, IMM[7].yyyy
>   UIF TEMP[13].xxxx
>   ...
>   ENDI
> ENDLOOP
> 
> This UIF should have an else path with the BRK to resemble the original
> code. 
> (There are more BRK statements in the LOOP but they are the same in both
> versions.
> 
> Regarding bisecting this, after a failure one usually must reboot the
> system, otherwise the graphics card is in a bad state. But given the nature
> of the bug one should also be able to reproduce the endless loop with
> LIBGL_ALWAYS_SOFTWARE=1 thereby not clobbering the hardware.

Thanks for tracking down the problem. Do you think you would be able to create a piglit test to reproduce the bug? Or failing that can you copy the full glsl loop an attach it to the bug so that I can attempt to recreate the issue in piglit. Thanks.

Comment 7 Gert Wollny 2018-03-23 17:49:39 UTC

I tried to create a piglit, but so far i failed. So here is a dumped down version of the loop contained in the relevant shader: 

const vec4 ps_lc21 = vec4(5.00000000e-01, -5.00000000e-01, 
                          0.00000000e+00, 0.00000000e+00);
const vec4 ps_lc20 = vec4(1.00000000e+00, 3.92156886e-03, 
                          1.53787005e-05, 2.00000003e-01);
const vec4 ps_lc19 = vec4(3.30000013e-01, -1.70000002e-01, 
                          4.00000000e+00, 1.00000001e-01);
const vec4 ps_lc18 = vec4(2.00000000e+00, -1.00000000e+00, 
                          1.00000005e-03, 0.00000000e+00);
const vec4 ps_lc17 = vec4(-5.00000000e-01, 1.00000000e+00, 
                          2.00000000e+00, -9.99999978e-03);

R0.w = (ps_lc17.y);
for (tmpInt0 = 0; tmpInt0 < 4; tmpInt0++) {
   R6.xyz = ((R5.xyz * ps_lc20.www) + R6.xyz);
   R9.xyzw = (textureLod(ps_sampler4, R6.xy, R6.w).xyzw);
   R3.w = ((R9.w * ps_lc20.y) + R9.z);
   R5.w = ((R3.w * -ps_c[11].y) + ps_in[7].z);
   tmp0.w = (R5.w >= 0.0 ? abs(ps_lc18.w) : abs(ps_lc18.y));
   R5.w = (tmp0.w);
   R3.w = ((R3.w * ps_c[11].y) + -R6.z);
   tmp0.w = (R3.w >= 0.0 ? abs(ps_lc18.w) : abs(ps_lc18.y));
   R3.w = (tmp0.w);
   R3.w = (R5.w * R3.w);
   if (R3.w != -R3.w) {
      R8.xy = (R6.xy);
      R3.w = (R0.w);
      if (R3.w != -R3.w) 
         break;
   }
}

the working TGSI looks like this:

IMM[0] FLT32 {    2.0000,    -1.0000,     1.0000,     0.0010}
IMM[1] FLT32 {    0.3300,    -0.1700,    -0.0000,    -1.0000}
IMM[2] FLT32 {   -0.5000,     0.0000,     0.1000,    -2.0000}
IMM[3] FLT32 {    1.0000,     0.0039,     0.0000,    -0.0100}
IMM[4] FLT32 {   -0.5000,     1.0000,    -0.0100,     0.0000}
IMM[5] FLT32 {   -1.0000,     0.0000,     4.0000,     0.5000}
IMM[6] FLT32 {    0.5000,    -0.5000,     0.2000,     0.0000}
IMM[7] INT32 {0, 4, 1, 0}

...
218  MOV TEMP[2].w, IMM[0].zzzz
219  MOV TEMP[8].x, IMM[7].xxxx
220: BGNLOOP
221:   ISGE TEMP[13].x, TEMP[8].xxxx, IMM[7].yyyy
222:   UIF TEMP[13].xxxx
223:     BRK
224:   ENDIF
225:   MAD TEMP[14].xyz, TEMP[11].xyzz, IMM[6].zzzz, TEMP[14].xyzz
226:   MOV TEMP[13].xy, TEMP[14].xyyy
227:   MOV TEMP[13].w, TEMP[14].wwww
228:   TXL TEMP[13], TEMP[13], SAMP[4], 2D
229:   MOV TEMP[15], TEMP[13]
230:   MAD TEMP[13].x, TEMP[13].wwww, IMM[3].yyyy, TEMP[13].zzzz
231:   MAD TEMP[15].x, TEMP[13].xxxx, -CONST[0][11].yyyy, TEMP[1].zzzz
232:   FSGE TEMP[15].x, TEMP[15].xxxx, IMM[2].yyyy
233:   UIF TEMP[15].xxxx
234:     MOV TEMP[9].x, IMM[2].yyyy
235:   ELSE
236:     MOV TEMP[9].x, IMM[0].zzzz
237:   ENDIF
238:   MOV TEMP[11].w, TEMP[9].xxxx
239:   MAD TEMP[13].x, TEMP[13].xxxx, CONST[0][11].yyyy, -TEMP[14].zzzz
240:   FSGE TEMP[13].x, TEMP[13].xxxx, IMM[2].yyyy
241:   UIF TEMP[13].xxxx
242:     MOV TEMP[12].x, IMM[2].yyyy
243:   ELSE
244:     MOV TEMP[12].x, IMM[0].zzzz
245:   ENDIF
246:   MOV TEMP[6].w, TEMP[12].xxxx
247:   MUL TEMP[6].x, TEMP[9].xxxx, TEMP[12].xxxx
248:   MOV TEMP[7].w, TEMP[6].xxxx
249:   FSNE TEMP[6].x, TEMP[6].xxxx, -TEMP[6].xxxx
250:   UIF TEMP[6].xxxx
251:     MOV TEMP[4].xy, TEMP[14].xyxx
252:     MOV TEMP[7].w, TEMP[2].wwww
253:     FSNE TEMP[6].x, TEMP[2].wwww, -TEMP[2].wwww
254:     UIF TEMP[6].xxxx
255:       BRK
256:     ENDIF
257:   ENDIF
258:   UADD TEMP[8].x, TEMP[8].xxxx, IMM[7].zzzz
259: ENDLOOP

Comment 8 i.kalvachev 2018-03-24 12:07:35 UTC

Created attachment 138335 [details]
Fragment/Pixel Shader generated by Wine3.3, extracted from lookup of #55251 op in the trace.

It's quite simple to use qapitrace to get all needed information. Just go to op#55251, right click and select "lookup", it would start `glretrace` and dump all information about current state, including the shaders, textures etc.

On unrelated to this bug note, this condition looks really fishy to me:
   if (R3.w != -R3.w) break;
Will this condition ever fail?

If variables were two's complement integers, the above line would have been false if `R3.w==0` (or INT_MIN). However with float numbers we can have separate +0.0 and -0.0, just like we have +inf and -inf.

Comment 9 Gert Wollny 2018-03-24 13:00:28 UTC

Actually, 
  if (R3.w != -R3.w) 
will never fail, because R3.w = R0.w = (ps_lc17.y) = 1.0;

The compiler should optimize this away, and in my attempts to create a piglit it always did so far.

Comment 10 i.kalvachev 2018-03-24 23:00:59 UTC

(In reply to Gert Wollny from comment #9)
> Actually, 
>   if (R3.w != -R3.w) 
> will never fail, because R3.w = R0.w = (ps_lc17.y) = 1.0;

You are correct for the "if" with the "break".

However my question still stands for the other "if (R3.w != -R3.w)" right before the "break" one.

Comment 11 Gert Wollny 2018-03-25 05:36:10 UTC

The if statement can be be true because  

   tmp0.w = (R3.w >= 0.0 ? abs(ps_lc18.w) : abs(ps_lc18.y));

with ps_lc18.y = -1 and ps_lc18.w = 0.0, and then 

   R3.w = (tmp0.w);
   R3.w = (R5.w * R3.w);

which means at this point R3.w can be zero.

Comment 12 i.kalvachev 2018-03-25 09:43:22 UTC

(In reply to Gert Wollny from comment #11)
> The if statement can be be true because  
> which means at this point R3.w can be zero.

But that's the point in my initial question.
When R3.w=0.0, will "if(+0.0 != -0.0)" be true of false, because float points do have 2 different zeroes - positive and negative one, and both zero floats do have different binary representation aka +0.0 = 0x00000000; -0.0 = 0x80000000.

My question is more of the lines: are shader float point operations always IEEE complaint, and is it guaranteed that (+0.0 == -0.0) is true for all their implementations.
(For example, SSE float operations sometimes deviate from IEEE for speed reasons.)

Comment 13 Gert Wollny 2018-03-25 14:01:36 UTC

The GLSL spec says that "negative zeros are generated as dictated by IEEE" and that "==" returns the correct result, which I'd assume means (0.0 == -0.0) is true. 

In any case, I don't see how this is relevant for this bug.

Comment 14 almos 2018-03-25 20:25:28 UTC

(In reply to Roland Scheidegger from comment #5)
> (In reply to almos from comment #4)
> > The problem is not loop unrolling. The problem is that userspace code can
> > hang the GPU unrecoverably, and thus bringing down the entire system.
> > 
> > BTW I can confirm this on Pitcairn with radeon drm in linux 4.15.
> 
> There isn't much you can do about shaders not terminating (loop limiting in
> llvmpipe is quite a hack, you could legitimately have a loop which has more
> iterations).
You can detect the stall and kill the process that sent the drawing command.

> 
> But yes, gpu reset actually working reliably would be nice. I haven't really
> seen it succeed lately neither (but it can happen...).
It wouldn't be nice. It's essential.

Comment 15 Timothy Arceri 2018-03-25 23:33:18 UTC

Here is a fix for drivers that use GLSL IR loop unrolling. I'm still looking into NIR unrolling it seems there is a different bug in that pass.

https://patchwork.freedesktop.org/patch/212881/

Comment 16 i.kalvachev 2018-03-26 00:32:40 UTC

(In reply to Timothy Arceri from comment #15)
> Here is a fix for drivers that use GLSL IR loop unrolling. I'm still looking
> into NIR unrolling it seems there is a different bug in that pass.
> 
> https://patchwork.freedesktop.org/patch/212881/

I can confirm that this patch fixes the bug for me.

Thank you for reacting so quickly to it.


Is NIR unrolling bug triggered by the same shader/trace?
If so, I should not close this bug until it is fixed too.

Comment 17 Timothy Arceri 2018-03-26 01:38:49 UTC

We still need to create a piglit test for this but here is the fix for NIR also:

https://patchwork.freedesktop.org/patch/212882/

It seems once the loop in unrolled NIR then optimises this whole shader down to:

	vec4 32 ssa_0 = load_const (0x00000000 /* 0.000000 */, 0x00000000 /* 0.000000 */, 0x00000000 /* 0.000000 */, 0x00000000 /* 0.000000 */)
	intrinsic store_var (ssa_0) (ps_out) (15) /* wrmask=xyzw */
	/* succs: block_0 */
	block block_0:

Comment 18 i.kalvachev 2018-03-27 23:43:16 UTC

Both fixes has been committed to git master, so I can close this bug.

https://cgit.freedesktop.org/mesa/mesa/commit/?id=56b867395dee1a48594b27987d3bf68a4e745dda

https://cgit.freedesktop.org/mesa/mesa/commit/?id=629ee690addad9b3dc8f68cfff5ae09858f31caf

(In reply to Timothy Arceri from comment #17)
> We still need to create a piglit test for this but here is the fix for NIR
> also:
> 

Don't forget to do that.

> It seems once the loop in unrolled NIR then optimises this whole shader down
> to:
> 
> 	vec4 32 ssa_0 = load_const (0x00000000 /* 0.000000 */, 0x00000000 /*
> 0.000000 */, 0x00000000 /* 0.000000 */, 0x00000000 /* 0.000000 */)
> 	intrinsic store_var (ssa_0) (ps_out) (15) /* wrmask=xyzw */
> 	/* succs: block_0 */
> 	block block_0:

Is that correct? It doesn't look correct.
Have you found more bugs that need fixing?

Comment 19 Timothy Arceri 2018-03-27 23:54:32 UTC

Ok here is a piglit test for the original GLSL IR bug:

https://patchwork.freedesktop.org/patch/213332/

The NIR bug is a little harder to test reliably for since it depends on the order in which optimisations are called so I just going to leave it for now.

Anyway this bug should now be fixed with the following commits:

commit 56b867395dee1a48594b27987d3bf68a4e745dda
Author: Timothy Arceri <tarceri@itsqueeze.com>
Date:   Mon Mar 26 10:31:26 2018 +1100

    glsl: fix infinite loop caused by bug in loop unrolling pass
    
    Just checking for 2 jumps is not enough to be sure we can do a
    complex loop unroll. We need to make sure we also have also found
    2 loop terminators.
    
    Without this we were attempting to unroll a loop where the second
    jump was nested inside multiple ifs which loop analysis is unable
    to detect as a terminator. We ended up splicing out the first
    terminator but failed to actually unroll the loop, this resulted
    in the creation of a possible infinite loop.
    
    Fixes: 646621c66da9 "glsl: make loop unrolling more like the nir unrolling path"
    
    Tested-by: Gert Wollny <gw.fossdev@gmail.com>
    Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105670

commit 629ee690addad9b3dc8f68cfff5ae09858f31caf
Author: Timothy Arceri <tarceri@itsqueeze.com>
Date:   Mon Mar 26 11:41:51 2018 +1100

    nir: fix crash in loop unroll corner case
    
    When an if nesting inside anouther if is optimised away we can
    end up with a loop terminator and following block that looks like
    this:
    
            if ssa_596 {
                    block block_5:
                    /* preds: block_4 */
                    vec1 32 ssa_601 = load_const (0xffffffff /* -nan */)
                    break
                    /* succs: block_8 */
            } else {
                    block block_6:
                    /* preds: block_4 */
                    /* succs: block_7 */
            }
            block block_7:
            /* preds: block_6 */
            vec1 32 ssa_602 = phi block_6: ssa_552
            vec1 32 ssa_603 = phi block_6: ssa_553
            vec1 32 ssa_604 = iadd ssa_551, ssa_66
    
    The problem is the phis. Loop unrolling expects the last block in
    the loop to be empty once we splice the instructions in the last
    block into the continue branch. The problem is we cant move phis
    so here we lower the phis to regs when preparing the loop for
    unrolling. As it could be possible to have multiple additional
    blocks/ifs following the terminator we just convert all phis at
    the top level of the loop body for simplicity.
    
    We also add some comments to loop_prepare_for_unroll() while we
    are here.
    
    Fixes: 51daccb289eb "nir: add a loop unrolling pass"
    
    Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105670

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.