Bug 34370 - [GLSL] "i<5 && i<4" in for loop fails
[GLSL] "i<5 && i<4" in for loop fails
Status: VERIFIED FIXED
Product: Mesa
Classification: Unclassified
Component: glsl-compiler
git
All Linux (All)
: medium normal
Assigned To: Ian Romanick
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-02-16 19:50 UTC by Gordon Jin
Modified: 2011-10-24 20:30 UTC (History)
3 users (show)

See Also:
i915 platform:
i915 features:


Attachments
new piglit test case (448 bytes, text/plain)
2011-02-16 19:51 UTC, Gordon Jin
Details
Extend variable lifetimes to the start of the loop containing a def (1.96 KB, patch)
2011-03-04 14:38 UTC, Ian Romanick
Details | Splinter Review
[PATCH] mesa: Extend register live intervals to the beginning of loops if they are referenced outside of the loop (4.26 KB, patch)
2011-03-22 04:44 UTC, Fabian Bieler
Details | Splinter Review
[PATCH 1/2] mesa: Extend register live intervals to the beginning of loops if they are referenced outside of the loop (3.79 KB, patch)
2011-03-28 13:10 UTC, Fabian Bieler
Details | Splinter Review
[PATCH 2/2] mesa: Cache registers in current top level loop when searching for live intervals (3.48 KB, patch)
2011-03-28 13:10 UTC, Fabian Bieler
Details | Splinter Review

Note You need to log in before you can comment on or make changes to this bug.
Description Gordon Jin 2011-02-16 19:50:12 UTC
The attached test case fails on mesa master, with "a" assigned as 0;

Changing "i<5 && i<4" to "i<4" or "i<4 && i<5" makes it pass.
Comment 1 Gordon Jin 2011-02-16 19:51:37 UTC
Created attachment 43464 [details]
new piglit test case
Comment 2 Ian Romanick 2011-03-02 16:29:14 UTC
I've tried this as a vertex shader and a fragment shader (I'll push the piglit test cases soon).  The output of INTEL_DEBUG=vs is very instructive.  In line 18 we can see that 'a' is TEMP[1].  However, TEMP[1] is also used as a temporary in the break condition (line 14).

I'm not yet sure whether this is register allocation or some other optimization going wrong.

# Vertex Program/Shader 3
  0: MUL TEMP[0], STATE[0], INPUT[0].xxxx;
  1: MAD TEMP[1], STATE[1], INPUT[0].yyyy, TEMP[0];
  2: MAD TEMP[0], STATE[2], INPUT[0].zzzz, TEMP[1];
  3: MAD TEMP[1], STATE[3], INPUT[0].wwww, TEMP[0];
  4: MOV OUTPUT[0], TEMP[1];
  5: MOV TEMP[0].x, CONST[4].xxxx;
  6: BGNLOOP; # (end at 21)
  7:    SLT.C TEMP[1].x, TEMP[0].xxxx, CONST[4].yyyy;
  8:    IF (NE.xxxx); # (if false, goto 11);
  9:       SLT TEMP[1].x, TEMP[0].xxxx, CONST[4].zzzz;
 10:       MOV TEMP[2].x, TEMP[1].xxxx;
 11:    ELSE; # (goto 13)
 12:       MOV TEMP[2].x, CONST[4].xxxx;
 13:    ENDIF;
 14:    SEQ.C TEMP[1].x, TEMP[2].xxxx, CONST[4].xxxx;
 15:    IF (NE.xxxx); # (if false, goto 17);
 16:       BRK (TR.xxxx); # (goto 21);
 17:    ENDIF;
 18:    MOV TEMP[1].x, TEMP[0].xxxx;
 19:    ADD TEMP[2].x, TEMP[0].xxxx, CONST[4].wwww;
 20:    MOV TEMP[0].x, TEMP[2].xxxx;
 21: ENDLOOP; # (goto 6)
 22: SEQ.C TEMP[0].x, TEMP[1].xxxx, CONST[5].xxxx;
 23: IF (NE.xxxx); # (if false, goto 25);
 24:    MOV OUTPUT[16], CONST[4].xwxw;
 25: ELSE; # (goto 40)
 26:    SEQ.C TEMP[0].x, TEMP[1].xxxx, CONST[4].xxxx;
 27:    IF (NE.xxxx); # (if false, goto 29);
 28:       MOV OUTPUT[16], CONST[4].wxxw;
 29:    ELSE; # (goto 39)
 30:       SEQ.C TEMP[0].x, TEMP[1].xxxx, CONST[4].zzzz;
 31:       IF (NE.xxxx); # (if false, goto 33);
 32:          MOV OUTPUT[16], CONST[4].xxww;
 33:       ELSE; # (goto 38)
 34:          MOV TEMP[0].w, CONST[4].wwww;
 35:          RCP TEMP[2].x, TEMP[1].xxxx;
 36:          MOV TEMP[0].xyz, TEMP[2].xxxx;
 37:          MOV OUTPUT[16], TEMP[0];
 38:       ENDIF;
 39:    ENDIF;
 40: ENDIF;
 41: END
Comment 3 Ian Romanick 2011-03-03 13:50:15 UTC
I've narrowed this down to a problem somewhere in _mesa_optimize_program.
Comment 4 Ian Romanick 2011-03-03 14:29:09 UTC
The problem is in the register allocator in prog_optimize.  There is a pretty serious problem with the way it handles loops.  The situation has three parts:

1. The variable is defined in a loop.
2. The variable has no other use inside the loop.
3. The definition inside the loop is reachable outside the loop.

The problem is that the register allocator only marks the live range for the variable as extending from the definition inside the loop to the use outside the loop.  The live interval should extend from the beginning of the loop to the use outside the loop.

With the existing infrastructure in prog_optimize, I don't see how to fix this.
Comment 5 Ian Romanick 2011-03-04 14:38:14 UTC
Created attachment 44141 [details] [review]
Extend variable lifetimes to the start of the loop containing a def

This patch extends the liveness interval back to the start of the loop containing the definition.  This fixes the test case at hand, but may still fail with nested loops where only the innermost loop contains a definition, and that definition is reachable outside all the loops.  Moreover, this results in unnecessarily large liveness intervals for values only reachable inside the loop.
    
The proper fix would likely require significant work.

This fixes both test cases in swrast and the VS test for 965.  Since 965 fragment shaders don't use this register allocator, register_allocate.c must have some similar bug.
Comment 6 Fabian Bieler 2011-03-22 04:44:12 UTC
Created attachment 44708 [details] [review]
[PATCH] mesa: Extend register live intervals to the beginning of loops if they are referenced outside of the loop

This patch should also work for nested loops and doesn't extend the lifetimes unnecessaryly. It does, however, complicate the code somewhat.
Comment 7 Tom Stellard 2011-03-27 22:14:51 UTC
(In reply to comment #5)
> Created an attachment (id=44141) [details]
> Extend variable lifetimes to the start of the loop containing a def
> 
> This patch extends the liveness interval back to the start of the loop
> containing the definition.  This fixes the test case at hand, but may still
> fail with nested loops where only the innermost loop contains a definition, and
> that definition is reachable outside all the loops.  Moreover, this results in
> unnecessarily large liveness intervals for values only reachable inside the
> loop.
> 
> The proper fix would likely require significant work.

The patch I posted on the mailing list http://lists.freedesktop.org/archives/mesa-dev/2011-March/006227.html is pretty much the same as this one except that it extends the start of the liveness interval back to the start of the outermost loop.
Comment 8 Fabian Bieler 2011-03-28 13:10:04 UTC
Created attachment 44971 [details] [review]
[PATCH 1/2] mesa: Extend register live intervals to the beginning of loops if they are referenced outside of the loop

I split the patch into two smaller patches to ease reviewing.
Comment 9 Fabian Bieler 2011-03-28 13:10:49 UTC
Created attachment 44972 [details] [review]
[PATCH 2/2] mesa: Cache registers in current top level loop when searching for live intervals
Comment 10 Eric Anholt 2011-03-29 09:06:23 UTC
Note that it's not just "used outside of the loop" that can cause the reg to need to be live across the whole loop.  For example:

int i = 0;
int j;
while (true) {
  i++;
  if (i < 4)
    j = 0;
  else
    continue;
  if (i > 10)

}
Comment 11 Eric Anholt 2011-03-29 09:06:50 UTC
gah, hit tab and space while still constructing a case
Comment 12 Eric Anholt 2011-03-29 09:23:42 UTC
Something shaped like this, but with extra code that might smash j after the j = i assignment.

int i = 0;
int j;

while (true) {
  if (i == 0)
    j = 0;

  i = j + 1;
  j = i;

  if (i < 4)
    continue;

  if (i > 10)
    break;
}

(While our structured control flow often makes our lives easy, for live interval calculations it really doesn't make things any easier -- we should be breaking down to basic blocks and calculating liveness that way. Oh well.)

It looks to me like Tom's patch is the only one that covers this completely -- idr's might miss behavior like this in a middle loop and just set the start to the innermost loop.  So, while Tom's patch is conservative, it's also Reviewed-by: Eric Anholt <eric@anholt.net>
Comment 13 Tom Stellard 2011-03-30 21:49:30 UTC
Since there is a release happening soon and this fixes a bug.  I'll try to push my patch to master and the 7.10 and 7.9 branches some time tomorrow, unless there are objections.
Comment 14 Ian Romanick 2011-04-04 09:59:56 UTC
This should be fixed on master by the commit below.  This has been cherry picked to 7.10 (a947d9be61dddca8e06d8cd2f066336a2907a0ed) and 7.9 (9e2c9cdf4864eb4f707790573bfa7bcf725f7006).

commit 18dcbd358f1d4fd5e4a40fa26c6d3bf99485884e
Author: Tom Stellard <tstellar@gmail.com>
Date:   Sun Mar 27 01:17:43 2011 -0700

    prog_optimize: Fix reallocating registers for shaders with loops
    
    Registers that are used inside of loops need to be considered live
    starting with the first instruction of the outermost loop.
    
    https://bugs.freedesktop.org/show_bug.cgi?id=34370
    
    NOTE: This is a candidate for the 7.9 and 7.10 branches.
    
    Reviewed-by: Eric Anholt <eric@anholt.net>
Comment 15 Gordon Jin 2011-06-01 20:39:59 UTC
piglit glsl-fs-loop-redundant-condition still fails on my side, with Tom's patch. 

It was fixed (only on i965) with mesa master branch by below commit:
commit 963431829055f63ec94d88c97a5d07d30e49833a
Author: Eric Anholt <eric@anholt.net>
Date:   Sat Apr 2 18:11:32 2011 -1000

    i965/fs: Remove broken optimization for live intervals in loops.

    The theory here was to detect a temporary variable used within a loop,
    and avoid considering it live across the entire loop.  However, it was
    overeager and failed when the first definition of the variable
    appeared within the loop but was only conditionally defined.

    Fixes glsl-fs-loop-redundant-condition.


i965 with mesa 7.10 still fails.
i915 still fails even with mesa master.
Comment 16 Gordon Jin 2011-10-24 19:14:16 UTC
It now passes on both i965 and i915, on both mesa master and 7.11 branch.