Summary: | [GLSL] "i<5 && i<4" in for loop fails | ||
---|---|---|---|
Product: | Mesa | Reporter: | Gordon Jin <gordon.jin> |
Component: | glsl-compiler | Assignee: | Ian Romanick <idr> |
Status: | VERIFIED FIXED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | brianp, eric, tstellar |
Version: | git | ||
Hardware: | All | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
new piglit test case
Extend variable lifetimes to the start of the loop containing a def [PATCH] mesa: Extend register live intervals to the beginning of loops if they are referenced outside of the loop [PATCH 1/2] mesa: Extend register live intervals to the beginning of loops if they are referenced outside of the loop [PATCH 2/2] mesa: Cache registers in current top level loop when searching for live intervals |
Description
Gordon Jin
2011-02-16 19:50:12 UTC
Created attachment 43464 [details]
new piglit test case
I've tried this as a vertex shader and a fragment shader (I'll push the piglit test cases soon). The output of INTEL_DEBUG=vs is very instructive. In line 18 we can see that 'a' is TEMP[1]. However, TEMP[1] is also used as a temporary in the break condition (line 14). I'm not yet sure whether this is register allocation or some other optimization going wrong. # Vertex Program/Shader 3 0: MUL TEMP[0], STATE[0], INPUT[0].xxxx; 1: MAD TEMP[1], STATE[1], INPUT[0].yyyy, TEMP[0]; 2: MAD TEMP[0], STATE[2], INPUT[0].zzzz, TEMP[1]; 3: MAD TEMP[1], STATE[3], INPUT[0].wwww, TEMP[0]; 4: MOV OUTPUT[0], TEMP[1]; 5: MOV TEMP[0].x, CONST[4].xxxx; 6: BGNLOOP; # (end at 21) 7: SLT.C TEMP[1].x, TEMP[0].xxxx, CONST[4].yyyy; 8: IF (NE.xxxx); # (if false, goto 11); 9: SLT TEMP[1].x, TEMP[0].xxxx, CONST[4].zzzz; 10: MOV TEMP[2].x, TEMP[1].xxxx; 11: ELSE; # (goto 13) 12: MOV TEMP[2].x, CONST[4].xxxx; 13: ENDIF; 14: SEQ.C TEMP[1].x, TEMP[2].xxxx, CONST[4].xxxx; 15: IF (NE.xxxx); # (if false, goto 17); 16: BRK (TR.xxxx); # (goto 21); 17: ENDIF; 18: MOV TEMP[1].x, TEMP[0].xxxx; 19: ADD TEMP[2].x, TEMP[0].xxxx, CONST[4].wwww; 20: MOV TEMP[0].x, TEMP[2].xxxx; 21: ENDLOOP; # (goto 6) 22: SEQ.C TEMP[0].x, TEMP[1].xxxx, CONST[5].xxxx; 23: IF (NE.xxxx); # (if false, goto 25); 24: MOV OUTPUT[16], CONST[4].xwxw; 25: ELSE; # (goto 40) 26: SEQ.C TEMP[0].x, TEMP[1].xxxx, CONST[4].xxxx; 27: IF (NE.xxxx); # (if false, goto 29); 28: MOV OUTPUT[16], CONST[4].wxxw; 29: ELSE; # (goto 39) 30: SEQ.C TEMP[0].x, TEMP[1].xxxx, CONST[4].zzzz; 31: IF (NE.xxxx); # (if false, goto 33); 32: MOV OUTPUT[16], CONST[4].xxww; 33: ELSE; # (goto 38) 34: MOV TEMP[0].w, CONST[4].wwww; 35: RCP TEMP[2].x, TEMP[1].xxxx; 36: MOV TEMP[0].xyz, TEMP[2].xxxx; 37: MOV OUTPUT[16], TEMP[0]; 38: ENDIF; 39: ENDIF; 40: ENDIF; 41: END I've narrowed this down to a problem somewhere in _mesa_optimize_program. The problem is in the register allocator in prog_optimize. There is a pretty serious problem with the way it handles loops. The situation has three parts: 1. The variable is defined in a loop. 2. The variable has no other use inside the loop. 3. The definition inside the loop is reachable outside the loop. The problem is that the register allocator only marks the live range for the variable as extending from the definition inside the loop to the use outside the loop. The live interval should extend from the beginning of the loop to the use outside the loop. With the existing infrastructure in prog_optimize, I don't see how to fix this. Created attachment 44141 [details] [review] Extend variable lifetimes to the start of the loop containing a def This patch extends the liveness interval back to the start of the loop containing the definition. This fixes the test case at hand, but may still fail with nested loops where only the innermost loop contains a definition, and that definition is reachable outside all the loops. Moreover, this results in unnecessarily large liveness intervals for values only reachable inside the loop. The proper fix would likely require significant work. This fixes both test cases in swrast and the VS test for 965. Since 965 fragment shaders don't use this register allocator, register_allocate.c must have some similar bug. Created attachment 44708 [details] [review] [PATCH] mesa: Extend register live intervals to the beginning of loops if they are referenced outside of the loop This patch should also work for nested loops and doesn't extend the lifetimes unnecessaryly. It does, however, complicate the code somewhat. (In reply to comment #5) > Created an attachment (id=44141) [details] > Extend variable lifetimes to the start of the loop containing a def > > This patch extends the liveness interval back to the start of the loop > containing the definition. This fixes the test case at hand, but may still > fail with nested loops where only the innermost loop contains a definition, and > that definition is reachable outside all the loops. Moreover, this results in > unnecessarily large liveness intervals for values only reachable inside the > loop. > > The proper fix would likely require significant work. The patch I posted on the mailing list http://lists.freedesktop.org/archives/mesa-dev/2011-March/006227.html is pretty much the same as this one except that it extends the start of the liveness interval back to the start of the outermost loop. Created attachment 44971 [details] [review] [PATCH 1/2] mesa: Extend register live intervals to the beginning of loops if they are referenced outside of the loop I split the patch into two smaller patches to ease reviewing. Created attachment 44972 [details] [review] [PATCH 2/2] mesa: Cache registers in current top level loop when searching for live intervals Note that it's not just "used outside of the loop" that can cause the reg to need to be live across the whole loop. For example: int i = 0; int j; while (true) { i++; if (i < 4) j = 0; else continue; if (i > 10) } gah, hit tab and space while still constructing a case Something shaped like this, but with extra code that might smash j after the j = i assignment. int i = 0; int j; while (true) { if (i == 0) j = 0; i = j + 1; j = i; if (i < 4) continue; if (i > 10) break; } (While our structured control flow often makes our lives easy, for live interval calculations it really doesn't make things any easier -- we should be breaking down to basic blocks and calculating liveness that way. Oh well.) It looks to me like Tom's patch is the only one that covers this completely -- idr's might miss behavior like this in a middle loop and just set the start to the innermost loop. So, while Tom's patch is conservative, it's also Reviewed-by: Eric Anholt <eric@anholt.net> Since there is a release happening soon and this fixes a bug. I'll try to push my patch to master and the 7.10 and 7.9 branches some time tomorrow, unless there are objections. This should be fixed on master by the commit below. This has been cherry picked to 7.10 (a947d9be61dddca8e06d8cd2f066336a2907a0ed) and 7.9 (9e2c9cdf4864eb4f707790573bfa7bcf725f7006). commit 18dcbd358f1d4fd5e4a40fa26c6d3bf99485884e Author: Tom Stellard <tstellar@gmail.com> Date: Sun Mar 27 01:17:43 2011 -0700 prog_optimize: Fix reallocating registers for shaders with loops Registers that are used inside of loops need to be considered live starting with the first instruction of the outermost loop. https://bugs.freedesktop.org/show_bug.cgi?id=34370 NOTE: This is a candidate for the 7.9 and 7.10 branches. Reviewed-by: Eric Anholt <eric@anholt.net> piglit glsl-fs-loop-redundant-condition still fails on my side, with Tom's patch. It was fixed (only on i965) with mesa master branch by below commit: commit 963431829055f63ec94d88c97a5d07d30e49833a Author: Eric Anholt <eric@anholt.net> Date: Sat Apr 2 18:11:32 2011 -1000 i965/fs: Remove broken optimization for live intervals in loops. The theory here was to detect a temporary variable used within a loop, and avoid considering it live across the entire loop. However, it was overeager and failed when the first definition of the variable appeared within the loop but was only conditionally defined. Fixes glsl-fs-loop-redundant-condition. i965 with mesa 7.10 still fails. i915 still fails even with mesa master. It now passes on both i965 and i915, on both mesa master and 7.11 branch. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.