The game is Trine1 Enchanted Edition, running under Wine-3.3. The game works fine with Mesa-17.2.0, but with Mesa-17.3.0 hangs right after the loading screen. The hang is soft, the driver tries to reset itself each 10 seconds and turns off my monitor. I am able to switch to text console (kms one) and kill the Xorg server, then reboot the machine. The kernel driver refuses to accept any new commands. Using software render I was able to capture a small apitrace (90MB compressed) that successfully reproduces the issue. It could be found here: https://drive.google.com/open?id=1RNKExfdBXUCN7SIhcrdiMwsvwIoCVMTg By using the qapitrace lookup, I managed to locate the exact operation that hangs. Not surprising, it is a draw operation: #55251 - works #55252 - hangs I tried to git bisect between 17.2.0 and 17.3.0, but I only managed to narrow it down to few steps: git bisect good 375c4868efa3cf549699557989c8f5c08c0710f0 git bisect bad 09f6bd5ef27c1b16b1468441b070b60c2d57523d The rest of my bisect log is full of skips, because I cannot find a commit that would work at all. All of them fail to run even `glxgears`, some crash, other give asserts in R600 code, in xmlconfig etc... My hardware is AMD Radeon HD5670 Redwood Evergreen (R600 driver). Using "R600_DEBUG=nosb" does not workaround the issue. Also the bug is reproducible on AMD RX480, running latest mesa3d master, llvm-svn 7.0.0svn_r328112 and experimental kernel.
Could be related to loop unrolling (?) While playing back trace on i965 I saw this: --- 8< --- glretrace: ../src/compiler/nir/nir_opt_loop_unroll.c:414: complex_unroll: Assertion `unroll_loc->type == nir_cf_node_block && exec_list_is_empty(&nir_cf_node_as_block(unroll_loc)->instr_list)' failed.
I can confirm this on Radeon 6870 HD. I was able to track the issue to the series beginning with ab23b759f24 glsl: don't drop instructions from unreachable terminators continue branch The first time around I had a few difficulties get the correct patch, it seems to be either 646621c66da9 or 7a7fb90af75. In any case, yes it is related to loop unrolling. The original TGSI goes like BGNLOOP ISGE TEMP[13].x, TEMP[8].xxxx, IMM[7].yyyy UIF TEMP[13].xxxx BRK ENDIF ... ENDLOOP and with the series in question applied it is BGNLOOP ISLT TEMP[13].x, TEMP[8].xxxx, IMM[7].yyyy UIF TEMP[13].xxxx ... ENDI ENDLOOP This UIF should have an else path with the BRK to resemble the original code. (There are more BRK statements in the LOOP but they are the same in both versions. Regarding bisecting this, after a failure one usually must reboot the system, otherwise the graphics card is in a bad state. But given the nature of the bug one should also be able to reproduce the endless loop with LIBGL_ALWAYS_SOFTWARE=1 thereby not clobbering the hardware.
(In reply to Gert Wollny from comment #2) > Regarding bisecting this, after a failure one usually must reboot the > system, otherwise the graphics card is in a bad state. But given the nature > of the bug one should also be able to reproduce the endless loop with > LIBGL_ALWAYS_SOFTWARE=1 thereby not clobbering the hardware. Just to avoid any confusion, llvmpipe won't show an infinite loop, since it uses a loop limiter on all loops (might still take quite some time, though). But of course you can still see the bogus tgsi.
The problem is not loop unrolling. The problem is that userspace code can hang the GPU unrecoverably, and thus bringing down the entire system. BTW I can confirm this on Pitcairn with radeon drm in linux 4.15.
(In reply to almos from comment #4) > The problem is not loop unrolling. The problem is that userspace code can > hang the GPU unrecoverably, and thus bringing down the entire system. > > BTW I can confirm this on Pitcairn with radeon drm in linux 4.15. There isn't much you can do about shaders not terminating (loop limiting in llvmpipe is quite a hack, you could legitimately have a loop which has more iterations). But yes, gpu reset actually working reliably would be nice. I haven't really seen it succeed lately neither (but it can happen...). I disagree that loop unrolling isn't a problem. Clearly there's two problems: - loop unrolling shouldn't turn perfectly fine loops into loops which don't terminate, this is what this bug is about. - gpu reset should work reliably
(In reply to Gert Wollny from comment #2) > I can confirm this on Radeon 6870 HD. > > I was able to track the issue to the series beginning with > > ab23b759f24 glsl: don't drop instructions from unreachable > terminators continue branch > > The first time around I had a few difficulties get the correct patch, it > seems to be either 646621c66da9 or 7a7fb90af75. In any case, yes it is > related to loop unrolling. > > The original TGSI goes like > > BGNLOOP > ISGE TEMP[13].x, TEMP[8].xxxx, IMM[7].yyyy > UIF TEMP[13].xxxx > BRK > ENDIF > ... > ENDLOOP > > and with the series in question applied it is > > BGNLOOP > ISLT TEMP[13].x, TEMP[8].xxxx, IMM[7].yyyy > UIF TEMP[13].xxxx > ... > ENDI > ENDLOOP > > This UIF should have an else path with the BRK to resemble the original > code. > (There are more BRK statements in the LOOP but they are the same in both > versions. > > Regarding bisecting this, after a failure one usually must reboot the > system, otherwise the graphics card is in a bad state. But given the nature > of the bug one should also be able to reproduce the endless loop with > LIBGL_ALWAYS_SOFTWARE=1 thereby not clobbering the hardware. Thanks for tracking down the problem. Do you think you would be able to create a piglit test to reproduce the bug? Or failing that can you copy the full glsl loop an attach it to the bug so that I can attempt to recreate the issue in piglit. Thanks.
I tried to create a piglit, but so far i failed. So here is a dumped down version of the loop contained in the relevant shader: const vec4 ps_lc21 = vec4(5.00000000e-01, -5.00000000e-01, 0.00000000e+00, 0.00000000e+00); const vec4 ps_lc20 = vec4(1.00000000e+00, 3.92156886e-03, 1.53787005e-05, 2.00000003e-01); const vec4 ps_lc19 = vec4(3.30000013e-01, -1.70000002e-01, 4.00000000e+00, 1.00000001e-01); const vec4 ps_lc18 = vec4(2.00000000e+00, -1.00000000e+00, 1.00000005e-03, 0.00000000e+00); const vec4 ps_lc17 = vec4(-5.00000000e-01, 1.00000000e+00, 2.00000000e+00, -9.99999978e-03); R0.w = (ps_lc17.y); for (tmpInt0 = 0; tmpInt0 < 4; tmpInt0++) { R6.xyz = ((R5.xyz * ps_lc20.www) + R6.xyz); R9.xyzw = (textureLod(ps_sampler4, R6.xy, R6.w).xyzw); R3.w = ((R9.w * ps_lc20.y) + R9.z); R5.w = ((R3.w * -ps_c[11].y) + ps_in[7].z); tmp0.w = (R5.w >= 0.0 ? abs(ps_lc18.w) : abs(ps_lc18.y)); R5.w = (tmp0.w); R3.w = ((R3.w * ps_c[11].y) + -R6.z); tmp0.w = (R3.w >= 0.0 ? abs(ps_lc18.w) : abs(ps_lc18.y)); R3.w = (tmp0.w); R3.w = (R5.w * R3.w); if (R3.w != -R3.w) { R8.xy = (R6.xy); R3.w = (R0.w); if (R3.w != -R3.w) break; } } the working TGSI looks like this: IMM[0] FLT32 { 2.0000, -1.0000, 1.0000, 0.0010} IMM[1] FLT32 { 0.3300, -0.1700, -0.0000, -1.0000} IMM[2] FLT32 { -0.5000, 0.0000, 0.1000, -2.0000} IMM[3] FLT32 { 1.0000, 0.0039, 0.0000, -0.0100} IMM[4] FLT32 { -0.5000, 1.0000, -0.0100, 0.0000} IMM[5] FLT32 { -1.0000, 0.0000, 4.0000, 0.5000} IMM[6] FLT32 { 0.5000, -0.5000, 0.2000, 0.0000} IMM[7] INT32 {0, 4, 1, 0} ... 218 MOV TEMP[2].w, IMM[0].zzzz 219 MOV TEMP[8].x, IMM[7].xxxx 220: BGNLOOP 221: ISGE TEMP[13].x, TEMP[8].xxxx, IMM[7].yyyy 222: UIF TEMP[13].xxxx 223: BRK 224: ENDIF 225: MAD TEMP[14].xyz, TEMP[11].xyzz, IMM[6].zzzz, TEMP[14].xyzz 226: MOV TEMP[13].xy, TEMP[14].xyyy 227: MOV TEMP[13].w, TEMP[14].wwww 228: TXL TEMP[13], TEMP[13], SAMP[4], 2D 229: MOV TEMP[15], TEMP[13] 230: MAD TEMP[13].x, TEMP[13].wwww, IMM[3].yyyy, TEMP[13].zzzz 231: MAD TEMP[15].x, TEMP[13].xxxx, -CONST[0][11].yyyy, TEMP[1].zzzz 232: FSGE TEMP[15].x, TEMP[15].xxxx, IMM[2].yyyy 233: UIF TEMP[15].xxxx 234: MOV TEMP[9].x, IMM[2].yyyy 235: ELSE 236: MOV TEMP[9].x, IMM[0].zzzz 237: ENDIF 238: MOV TEMP[11].w, TEMP[9].xxxx 239: MAD TEMP[13].x, TEMP[13].xxxx, CONST[0][11].yyyy, -TEMP[14].zzzz 240: FSGE TEMP[13].x, TEMP[13].xxxx, IMM[2].yyyy 241: UIF TEMP[13].xxxx 242: MOV TEMP[12].x, IMM[2].yyyy 243: ELSE 244: MOV TEMP[12].x, IMM[0].zzzz 245: ENDIF 246: MOV TEMP[6].w, TEMP[12].xxxx 247: MUL TEMP[6].x, TEMP[9].xxxx, TEMP[12].xxxx 248: MOV TEMP[7].w, TEMP[6].xxxx 249: FSNE TEMP[6].x, TEMP[6].xxxx, -TEMP[6].xxxx 250: UIF TEMP[6].xxxx 251: MOV TEMP[4].xy, TEMP[14].xyxx 252: MOV TEMP[7].w, TEMP[2].wwww 253: FSNE TEMP[6].x, TEMP[2].wwww, -TEMP[2].wwww 254: UIF TEMP[6].xxxx 255: BRK 256: ENDIF 257: ENDIF 258: UADD TEMP[8].x, TEMP[8].xxxx, IMM[7].zzzz 259: ENDLOOP
Created attachment 138335 [details] Fragment/Pixel Shader generated by Wine3.3, extracted from lookup of #55251 op in the trace. It's quite simple to use qapitrace to get all needed information. Just go to op#55251, right click and select "lookup", it would start `glretrace` and dump all information about current state, including the shaders, textures etc. On unrelated to this bug note, this condition looks really fishy to me: if (R3.w != -R3.w) break; Will this condition ever fail? If variables were two's complement integers, the above line would have been false if `R3.w==0` (or INT_MIN). However with float numbers we can have separate +0.0 and -0.0, just like we have +inf and -inf.
Actually, if (R3.w != -R3.w) will never fail, because R3.w = R0.w = (ps_lc17.y) = 1.0; The compiler should optimize this away, and in my attempts to create a piglit it always did so far.
(In reply to Gert Wollny from comment #9) > Actually, > if (R3.w != -R3.w) > will never fail, because R3.w = R0.w = (ps_lc17.y) = 1.0; You are correct for the "if" with the "break". However my question still stands for the other "if (R3.w != -R3.w)" right before the "break" one.
The if statement can be be true because tmp0.w = (R3.w >= 0.0 ? abs(ps_lc18.w) : abs(ps_lc18.y)); with ps_lc18.y = -1 and ps_lc18.w = 0.0, and then R3.w = (tmp0.w); R3.w = (R5.w * R3.w); which means at this point R3.w can be zero.
(In reply to Gert Wollny from comment #11) > The if statement can be be true because > which means at this point R3.w can be zero. But that's the point in my initial question. When R3.w=0.0, will "if(+0.0 != -0.0)" be true of false, because float points do have 2 different zeroes - positive and negative one, and both zero floats do have different binary representation aka +0.0 = 0x00000000; -0.0 = 0x80000000. My question is more of the lines: are shader float point operations always IEEE complaint, and is it guaranteed that (+0.0 == -0.0) is true for all their implementations. (For example, SSE float operations sometimes deviate from IEEE for speed reasons.)
The GLSL spec says that "negative zeros are generated as dictated by IEEE" and that "==" returns the correct result, which I'd assume means (0.0 == -0.0) is true. In any case, I don't see how this is relevant for this bug.
(In reply to Roland Scheidegger from comment #5) > (In reply to almos from comment #4) > > The problem is not loop unrolling. The problem is that userspace code can > > hang the GPU unrecoverably, and thus bringing down the entire system. > > > > BTW I can confirm this on Pitcairn with radeon drm in linux 4.15. > > There isn't much you can do about shaders not terminating (loop limiting in > llvmpipe is quite a hack, you could legitimately have a loop which has more > iterations). You can detect the stall and kill the process that sent the drawing command. > > But yes, gpu reset actually working reliably would be nice. I haven't really > seen it succeed lately neither (but it can happen...). It wouldn't be nice. It's essential.
Here is a fix for drivers that use GLSL IR loop unrolling. I'm still looking into NIR unrolling it seems there is a different bug in that pass. https://patchwork.freedesktop.org/patch/212881/
(In reply to Timothy Arceri from comment #15) > Here is a fix for drivers that use GLSL IR loop unrolling. I'm still looking > into NIR unrolling it seems there is a different bug in that pass. > > https://patchwork.freedesktop.org/patch/212881/ I can confirm that this patch fixes the bug for me. Thank you for reacting so quickly to it. Is NIR unrolling bug triggered by the same shader/trace? If so, I should not close this bug until it is fixed too.
We still need to create a piglit test for this but here is the fix for NIR also: https://patchwork.freedesktop.org/patch/212882/ It seems once the loop in unrolled NIR then optimises this whole shader down to: vec4 32 ssa_0 = load_const (0x00000000 /* 0.000000 */, 0x00000000 /* 0.000000 */, 0x00000000 /* 0.000000 */, 0x00000000 /* 0.000000 */) intrinsic store_var (ssa_0) (ps_out) (15) /* wrmask=xyzw */ /* succs: block_0 */ block block_0:
Both fixes has been committed to git master, so I can close this bug. https://cgit.freedesktop.org/mesa/mesa/commit/?id=56b867395dee1a48594b27987d3bf68a4e745dda https://cgit.freedesktop.org/mesa/mesa/commit/?id=629ee690addad9b3dc8f68cfff5ae09858f31caf (In reply to Timothy Arceri from comment #17) > We still need to create a piglit test for this but here is the fix for NIR > also: > Don't forget to do that. > It seems once the loop in unrolled NIR then optimises this whole shader down > to: > > vec4 32 ssa_0 = load_const (0x00000000 /* 0.000000 */, 0x00000000 /* > 0.000000 */, 0x00000000 /* 0.000000 */, 0x00000000 /* 0.000000 */) > intrinsic store_var (ssa_0) (ps_out) (15) /* wrmask=xyzw */ > /* succs: block_0 */ > block block_0: Is that correct? It doesn't look correct. Have you found more bugs that need fixing?
Ok here is a piglit test for the original GLSL IR bug: https://patchwork.freedesktop.org/patch/213332/ The NIR bug is a little harder to test reliably for since it depends on the order in which optimisations are called so I just going to leave it for now. Anyway this bug should now be fixed with the following commits: commit 56b867395dee1a48594b27987d3bf68a4e745dda Author: Timothy Arceri <tarceri@itsqueeze.com> Date: Mon Mar 26 10:31:26 2018 +1100 glsl: fix infinite loop caused by bug in loop unrolling pass Just checking for 2 jumps is not enough to be sure we can do a complex loop unroll. We need to make sure we also have also found 2 loop terminators. Without this we were attempting to unroll a loop where the second jump was nested inside multiple ifs which loop analysis is unable to detect as a terminator. We ended up splicing out the first terminator but failed to actually unroll the loop, this resulted in the creation of a possible infinite loop. Fixes: 646621c66da9 "glsl: make loop unrolling more like the nir unrolling path" Tested-by: Gert Wollny <gw.fossdev@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105670 commit 629ee690addad9b3dc8f68cfff5ae09858f31caf Author: Timothy Arceri <tarceri@itsqueeze.com> Date: Mon Mar 26 11:41:51 2018 +1100 nir: fix crash in loop unroll corner case When an if nesting inside anouther if is optimised away we can end up with a loop terminator and following block that looks like this: if ssa_596 { block block_5: /* preds: block_4 */ vec1 32 ssa_601 = load_const (0xffffffff /* -nan */) break /* succs: block_8 */ } else { block block_6: /* preds: block_4 */ /* succs: block_7 */ } block block_7: /* preds: block_6 */ vec1 32 ssa_602 = phi block_6: ssa_552 vec1 32 ssa_603 = phi block_6: ssa_553 vec1 32 ssa_604 = iadd ssa_551, ssa_66 The problem is the phis. Loop unrolling expects the last block in the loop to be empty once we splice the instructions in the last block into the continue branch. The problem is we cant move phis so here we lower the phis to regs when preparing the loop for unrolling. As it could be possible to have multiple additional blocks/ifs following the terminator we just convert all phis at the top level of the loop body for simplicity. We also add some comments to loop_prepare_for_unroll() while we are here. Fixes: 51daccb289eb "nir: add a loop unrolling pass" Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105670
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.