Bug 23743 - For loop from 0 to 0 not optimized out
Status: RESOLVED FIXED
Product: Mesa
Classification: Unclassified
Component: Mesa core
Version: git
Hardware: All All
Importance: medium normal
Assigned To: Ian Romanick
Depends on:
Blocks: 29044
Reported: 2009-09-06 08:12 UTC by Hans Nieser
Modified: 2010-09-03 12:02 UTC (History)


Attachments
phong vertex shader (581 bytes, text/plain)
2009-09-06 08:12 UTC, Hans Nieser
phong pixel shader (2.12 KB, text/plain)
2009-09-06 08:12 UTC, Hans Nieser

Description Hans Nieser 2009-09-06 08:12:23 UTC
Created attachment 29268 [details]
phong vertex shader

I have a basic Phong GLSL shader (attached) with a for loop in its pixel shader. The loop runs up to MAX_LIGHTS, which in my case on Intel hardware is #defined to 0, so the loop should be skipped entirely. It is not, however: during compilation the compiler prints "Note: 'for (i ... )' body is too large/complex to unroll", and it then seems to run the loop body at least once anyway, since all my lighting is off (which makes sense, as I didn't set any parameters for the OpenGL lights) and performance plummets. If I remove the entire loop (#if 0/#endif), the lighting is fine and the FPS is much higher.

Obviously it is simple to put #if MAX_LIGHTS > 0 around the loop as a workaround, but I guess this might be a problem for other applications, so I thought I would report it anyway.

I am using Mesa from git (my latest pull may be a few days old at the time of writing) with an Intel® GM45 Express integrated GPU.
Comment 1 Hans Nieser 2009-09-06 08:12:48 UTC
Created attachment 29269 [details]
phong pixel shader
Comment 2 Ian Romanick 2010-08-27 10:48:38 UTC
I added the piglit test glsl-fs-loop-zero-iter to reproduce this behavior.  The test produces the correct result, but, as can be seen in the INTEL_DEBUG=wm output below, the loop is not optimized away.

brw_wm_glsl_emit:
pre-fp:
# Fragment Program/Shader 3
  0: MOV OUTPUT[1], CONST[0];
  1: MOV TEMP[0].x, CONST[0].xxxx;
  2: BGNLOOP; # (end at 10)
  3:    SGE.C TEMP[1].x, TEMP[0].xxxx, CONST[0].xxxx;
  4:    IF (NE.xxxx); # (if false, goto 6);
  5:       BRK (TR.xxxx); # (goto 10);
  6:    ENDIF;
  7:    MOV OUTPUT[1], CONST[0].yxzw;
  8:    ADD TEMP[1].x, TEMP[0].xxxx, CONST[0].yyyy;
  9:    MOV TEMP[0].x, TEMP[1].xxxx;
 10: ENDLOOP; # (goto 2)
 11: END

pass_fp:
  0: MOV OUTPUT[1], CONST[0];
  1: MOV TEMP[0].x, CONST[0].xxxx;
  2: BGNLOOP; # (end at 10)
  3: SGE.C TEMP[1].x, TEMP[0].xxxx, CONST[0].xxxx;
  4: IF (NE.xxxx); # (if false, goto 6);
  5: BRK (TR.xxxx); # (goto 10);
  6: ENDIF;
  7: MOV OUTPUT[1], CONST[0].yxzw;
  8: ADD TEMP[1].x, TEMP[0].xxxx, CONST[0].yyyy;
  9: MOV TEMP[0].x, TEMP[1].xxxx;
 10: ENDLOOP; # (goto 2)
 11: FB_WRITE  ???, OUTPUT[1], FILE14[30], OUTPUT[0];

wm-native:
mov(1)          a0<1>UW         0x00e0UW                        { align1 };
mov(8)          g9<1>F          0F                              { align1 };
mov(8)          g10<1>F         1F                              { align1 };
mov(8)          g11<1>F         0F                              { align1 };
mov(8)          g12<1>F         0F                              { align1 };
mov(8)          g13<1>F         0F                              { align1 };
do(8)                                                           { align1 };
cmp.ge(8)       null            g13<8,8,1>F     0F              { align1 };
mov(8)          g14<1>F         0F                              { align1 };
(+f0) mov(8)    g14<1>F         1F                              { align1 };
(+f0) iff(8)    ip              ip              3D              { align1 switch };
break(8)                        ip              9D              { align1 };
endif(8)                        g0<4,4,1>UD     65536D          { align1 switch };
mov(8)          g9<1>F          1F                              { align1 };
mov(8)          g10<1>F         0F                              { align1 };
mov(8)          g11<1>F         0F                              { align1 };
mov(8)          g12<1>F         0F                              { align1 };
add(8)          g14<1>F         g13<8,8,1>F     1F              { align1 };
mov(8)          g13<1>F         g14<8,8,1>F                     { align1 };
while(8)                        ip              65524D          { align1 };
mov(8)          m2<1>F          g9<8,8,1>F                      { align1 };
mov(8)          m3<1>F          g10<8,8,1>F                     { align1 };
mov(8)          m4<1>F          g11<8,8,1>F                     { align1 };
mov(8)          m5<1>F          g12<8,8,1>F                     { align1 };
mov(8)          m1<1>F          g1<8,8,1>F                      { align1 nomask };
send(8) 0       null            g0<8,8,1>UW
                write (0, 12, 4, 0) mlen 6 rlen 0               { align1 EOT };

brw_wm_glsl_emit done:
Comment 3 Ian Romanick 2010-09-03 12:02:10 UTC
commit 7850ce0a9990c7f752e43a1dd88c204a7cf090aa
Author: Ian Romanick <ian.d.romanick@intel.com>
Date:   Fri Aug 27 11:26:08 2010 -0700

    glsl2: Eliminate zero-iteration loops