Bug 102905

Summary: [R600] Miscompilation of TGSI to VLIW causes artifacts in Gallium Nine with Crysis2 bump mapping
Product: Mesa
Reporter: i.kalvachev
Component: Drivers/Gallium/r600
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
QA Contact: Default DRI bug account <dri-devel>
Severity: normal    
Priority: medium    
Version: 17.2   
Hardware: Other   
OS: All   

Description i.kalvachev 2017-09-20 16:39:10 UTC
In short, the miscompilation happens at a "cmp" instruction whose destination register is also one of its source registers.

I suspect that the problem is caused by splitting "cmp" into two operations that are executed in different VLIW instruction groups, so that the result of the first one changes the outcome of the second. Using a temporary register in "cmp" works around the issue.
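
To illustrate the suspected hazard, here is a minimal Python sketch. It is not driver code: the function names and the input values are made up for illustration, and it only assumes the documented TGSI CMP semantics (dst = src0 < 0 ? src1 : src2, per component). It models TGSI instruction 38 from the dump below, once with all sources read before any write, and once with the x component written before the y component is evaluated.

```python
# Hypothetical model of TGSI instruction 38 from the dump:
#   CMP TEMP[0].xy, TEMP[0].xxxx, TEMP[0].ywzw, |TEMP[1]|
# TGSI CMP semantics: dst = (src0 < 0) ? src1 : src2, per component.

def cmp_reference(t0, t1):
    """All sources are read before any destination component is written."""
    cond = t0["x"]
    new_x = t0["y"] if cond < 0 else abs(t1["x"])
    new_y = t0["w"] if cond < 0 else abs(t1["y"])
    return {**t0, "x": new_x, "y": new_y}

def cmp_split(t0, t1):
    """Suspected hazard: x is written first, then y re-reads the updated x."""
    out = dict(t0)
    out["x"] = out["y"] if out["x"] < 0 else abs(t1["x"])
    # The condition below reads out["x"] after it was overwritten above.
    out["y"] = out["w"] if out["x"] < 0 else abs(t1["y"])
    return out

# Made-up inputs, chosen so the condition changes sign after the first write.
temp0 = {"x": -0.25, "y": 0.5, "z": 0.0, "w": 0.75}
temp1 = {"x": 0.1, "y": -0.2, "z": 0.3, "w": 0.0}

expected = cmp_reference(temp0, temp1)
observed = cmp_split(temp0, temp1)
print("reference:", expected["x"], expected["y"])
print("split    :", observed["x"], observed["y"])
```

With these inputs the two variants agree on x but disagree on y, which is the kind of difference I would expect if the second half of the "cmp" sees the already updated destination.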

The bug is NOT related to Shader Backend. Artifacts are still visible with "R600_DEBUG=nosb".

A D3D9 apitrace can be obtained from the iXit Nine FTP.
The bug is also tracked at https://github.com/iXit/Mesa-3D/issues/288 .

My hardware is Radeon HD5670 (Evergreen, Redwood).
Not reproducible on Radeon SI.

Below are the D3D9 Pixel Shader, TGSI and R600 VLIW assembly:
--------------------------------------------------------------
445252 @1 IDirect3DDevice9::CreatePixelShader(this = 0xf1e0a290, pFunction = "
//
// Generated by Microsoft (R) D3DX9 Shader Compiler
//
// Parameters:
//
//   sampler2D BlendMapSampler;
//   sampler2D BumpMap2Sampler;
//   float4 MatSpecColor;
//   float3 __0BlendFactor__1BlendMaskTiling__2BlendFalloff__3;
//   sampler2D bumpMapSampler;
//   sampler2D normalsSampler2D;
//
//
// Registers:
//
//   Name                                               Reg   Size
//   -------------------------------------------------- ----- ----
//   MatSpecColor                                       c13      1
//   __0BlendFactor__1BlendMaskTiling__2BlendFalloff__3 c20      1
//   bumpMapSampler                                     s1       1
//   normalsSampler2D                                   s2       1
//   BumpMap2Sampler                                    s3       1
//   BlendMapSampler                                    s4       1
//

    ps_3_0
    def c0, 2, -1, 1, 0.5
    def c1, 0.00392156886, 0, 0, 0
    dcl_texcoord_centroid v0.x
    dcl_texcoord1 v1.xy
    dcl_texcoord2_pp v2
    dcl_texcoord3_pp v3.xyz
    dcl_texcoord4 v4.xyz
    dcl_2d s1
    dcl_2d s2
    dcl_2d s3
    dcl_2d s4
    mov r0.x, c1.x
    mul_pp oC1.w, r0.x, c13.w
    texld_pp r0, v4, s4
    mul_pp r0.x, r0.x, v4.z
    pow_sat_pp r1.x, r0.x, c20.z
    texld_pp r0, v1, s1
    mad_pp r0.xy, r0, c0.x, c0.y
    dp2add_sat_pp r0.w, r0, -r0, c0.z
    rsq_pp r0.w, r0.w
    rcp_pp r0.z, r0.w
    texld_pp r2, v1, s3
    mad_pp r1.yz, r2.xxyw, c0.x, c0.y
    dp2add_sat_pp r0.w, r1.yzzw, -r1.yzzw, c0.z
    rsq_pp r0.w, r0.w
    rcp_pp r1.w, r0.w
    lrp_pp r2.xyz, r1.x, r1.yzww, r0
    mul_pp r0.xyz, r2.y, v3
    mad_pp r0.xyz, r2.x, v2, r0
    mov_pp r1.xyz, v2
    mul_pp r2.xyw, r1.zxzy, v3.yzzx
    mad_pp r1.xyz, r1.yzxw, v3.zxyw, -r2.xyww
    mul_pp r1.xyz, r1, v2.w
    mad_pp r0.xyz, r2.z, r1, r0
    mad_pp r0.xyz, r0, c0.w, c0.w
    mad_pp r0.xyz, r0, c0.x, c0.y
    nrm_pp r1.xyz, r0
    max_pp r0.x, r1_abs.x, r1_abs.y
    max_pp r2.x, r1_abs.z, r0.x
    add r0.xy, r1_abs.zyzw, -r2.x
    rcp r0.z, r2.x
    cmp_pp r0.yw, r0.y, r1_abs.xxzz, r1_abs.xyzz
    cmp_pp r0.xy, r0.x, r1_abs, r0.ywzw    //<===== this one
    mul_pp r1.xyz, r1, r0.z
    add r0.z, -r0.y, r0.x
    cmp r0.xy, r0.z, r0, r0.yxzw
    rcp r1.w, r0.x
    mul r0.z, r0.y, r1.w
    mov r0.w, c1.y
    texldl r0, r0.xzww, s2
    mul_pp r0.xyz, r1, r0.w
    mad_pp oC1.xyz, r0, c0.w, c0.w
    mov_pp oC0, v0.x

// approximately 49 instruction slots used (5 texture, 44 arithmetic)
", ppShader = [0xbca9a080]) = D3D_OK
--------------------------------------------------------------

Making the following change in the above code:
---
-    cmp_pp r0.xy, r0.x, r1_abs, r0.ywzw    //<===== this one
+    mov r8, r0
+    cmp_pp r0.xy, r8.x, r1_abs, r0.ywzw
---
works around the issue.
An alternative workaround is to use a different destination register; it just needs more changes in the follow-up instructions.
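
As a rough sanity check of why the extra "mov" helps, here is a small Python sketch. It is only a model under assumptions, not what the driver does: the helper names and input values are hypothetical, the D3D9 cmp semantics (dst = src0 >= 0 ? src1 : src2, per component) are taken as given, and the components are assumed to be written one at a time with the condition register re-read before each write.

```python
# Hypothetical model of: cmp_pp r0.xy, <cond>.x, r1_abs, r0.ywzw
# D3D9 cmp semantics: dst = (src0 >= 0) ? src1 : src2, per component.
# Assumption: components are written in order and the condition register
# is re-read before each write (the suspected hazard).

def cmp_in_order(r0, cond, r1):
    r0["x"] = abs(r1["x"]) if cond["x"] >= 0 else r0["y"]
    r0["y"] = abs(r1["y"]) if cond["x"] >= 0 else r0["w"]
    return r0

def make_inputs():
    # Made-up values: the condition is negative before the first write.
    r0 = {"x": -0.25, "y": 0.5, "z": 0.0, "w": 0.75}
    r1 = {"x": 0.1, "y": -0.2, "z": 0.3, "w": 0.0}
    return r0, r1

# Original shader: the condition is r0.x, which the first write overwrites.
r0, r1 = make_inputs()
buggy = cmp_in_order(r0, r0, r1)

# Workaround: "mov r8, r0" first, then use r8.x as the condition.
r0, r1 = make_inputs()
r8 = dict(r0)
fixed = cmp_in_order(r0, r8, r1)

print("without mov:", buggy["x"], buggy["y"])
print("with mov   :", fixed["x"], fixed["y"])
```

In this model the copy in r8 keeps the original condition alive, so both components are selected from the same branch. Using a different destination register would have the same effect, since r0.x would then never be overwritten by the "cmp".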

With NINE_DEBUG=ps R600_DEBUG=ps the buggy shader looks like this:
--------------------------------------------------------------
FRAG
PROPERTY FS_COORD_ORIGIN UPPER_LEFT
PROPERTY MUL_ZERO_WINS 1
DCL IN[0], GENERIC[0], PERSPECTIVE, CENTROID
DCL IN[1], GENERIC[1], PERSPECTIVE
DCL IN[2], GENERIC[2], PERSPECTIVE
DCL IN[3], GENERIC[3], PERSPECTIVE
DCL IN[4], GENERIC[4], PERSPECTIVE
DCL OUT[0], COLOR[1]
DCL OUT[1], COLOR
DCL SAMP[1]
DCL SAMP[2]
DCL SAMP[3]
DCL SAMP[4]
DCL CONST[0..20]
DCL TEMP[0..1]
DCL TEMP[2], LOCAL
DCL TEMP[3]
IMM[0] FLT32 {    2.0000,    -1.0000,     1.0000,     0.5000}
IMM[1] FLT32 {    0.0039,     0.0000, 340282346638528859811704183484516925440.0000,     0.0000}
  0: MOV TEMP[0].x, IMM[1].xxxx
  1: MUL OUT[0].w, TEMP[0].xxxx, CONST[13].wwww
  2: TEX TEMP[0], IN[4], SAMP[4], 2D
  3: MUL TEMP[0].x, TEMP[0].xxxx, IN[4].zzzz
  4: POW_SAT TEMP[1].x, |TEMP[0].xxxx|, CONST[20].zzzz
  5: TEX TEMP[0], IN[1], SAMP[1], 2D
  6: MAD TEMP[0].xy, TEMP[0], IMM[0].xxxx, IMM[0].yyyy
  7: DP2 TEMP[2].x, TEMP[0], -TEMP[0]
  8: ADD_SAT TEMP[0].w, IMM[0].zzzz, TEMP[2].xxxx
  9: RSQ TEMP[2], |TEMP[0].wwww|
 10: MIN TEMP[0].w, IMM[1].zzzz, TEMP[2]
 11: RCP TEMP[0].z, TEMP[0].wwww
 12: TEX TEMP[3], IN[1], SAMP[3], 2D
 13: MAD TEMP[1].yz, TEMP[3].xxyw, IMM[0].xxxx, IMM[0].yyyy
 14: DP2 TEMP[2].x, TEMP[1].yzzw, -TEMP[1].yzzw
 15: ADD_SAT TEMP[0].w, IMM[0].zzzz, TEMP[2].xxxx
 16: RSQ TEMP[2], |TEMP[0].wwww|
 17: MIN TEMP[0].w, IMM[1].zzzz, TEMP[2]
 18: RCP TEMP[1].w, TEMP[0].wwww
 19: LRP TEMP[3].xyz, TEMP[1].xxxx, TEMP[1].yzww, TEMP[0]
 20: MUL TEMP[0].xyz, TEMP[3].yyyy, IN[3]
 21: MAD TEMP[0].xyz, TEMP[3].xxxx, IN[2], TEMP[0]
 22: MOV TEMP[1].xyz, IN[2]
 23: MUL TEMP[3].xyw, TEMP[1].zxzy, IN[3].yzzx
 24: MAD TEMP[1].xyz, TEMP[1].yzxw, IN[3].zxyw, -TEMP[3].xyww
 25: MUL TEMP[1].xyz, TEMP[1], IN[2].wwww
 26: MAD TEMP[0].xyz, TEMP[3].zzzz, TEMP[1], TEMP[0]
 27: MAD TEMP[0].xyz, TEMP[0], IMM[0].wwww, IMM[0].wwww
 28: MAD TEMP[0].xyz, TEMP[0], IMM[0].xxxx, IMM[0].yyyy
 29: DP3 TEMP[2].x, TEMP[0], TEMP[0]
 30: RSQ TEMP[2].x, TEMP[2].xxxx
 31: MIN TEMP[2].x, IMM[1].zzzz, TEMP[2].xxxx
 32: MUL TEMP[1].xyz, TEMP[0], TEMP[2].xxxx
 33: MAX TEMP[0].x, |TEMP[1].xxxx|, |TEMP[1].yyyy|
 34: MAX TEMP[3].x, |TEMP[1].zzzz|, TEMP[0].xxxx
 35: ADD TEMP[0].xy, |TEMP[1].zyzw|, -TEMP[3].xxxx
 36: RCP TEMP[0].z, TEMP[3].xxxx
 37: CMP TEMP[0].yw, TEMP[0].yyyy, |TEMP[1].xyzz|, |TEMP[1].xxzz|
 38: CMP TEMP[0].xy, TEMP[0].xxxx, TEMP[0].ywzw, |TEMP[1]|
 39: MUL TEMP[1].xyz, TEMP[1], TEMP[0].zzzz
 40: ADD TEMP[0].z, -TEMP[0].yyyy, TEMP[0].xxxx
 41: CMP TEMP[0].xy, TEMP[0].zzzz, TEMP[0].yxzw, TEMP[0]
 42: RCP TEMP[1].w, TEMP[0].xxxx
 43: MUL TEMP[0].z, TEMP[0].yyyy, TEMP[1].wwww
 44: MOV TEMP[0].w, IMM[1].yyyy
 45: TXL TEMP[0], TEMP[0].xzww, SAMP[2], 2D
 46: MUL TEMP[0].xyz, TEMP[1], TEMP[0].wwww
 47: MAD OUT[0].xyz, TEMP[0], IMM[0].wwww, IMM[0].wwww
 48: MOV OUT[1], IN[0].xxxx
 49: END

===== SHADER #81 ==================================== PS/REDWOOD/EVERGREEN =====
===== 408 dw ===== 18 gprs ===== 0 stack =======================================
0000  4000000b a0a80000 ALU 43 @22 KC0[CB0:0-15]
 0022  00380c00 00146b80     1      x: INTERP_ZW          __.x,  R0.w, Param0.x         VEC_210
 0024  00380800 20146b80            y: INTERP_ZW          __.y,  R0.z, Param0.x         VEC_210
 0026  00380c00 40346b90            z: INTERP_ZW          R1.z,  R0.w, Param0.x         VEC_210
 0028  80380800 60346b90            w: INTERP_ZW          R1.w,  R0.z, Param0.x         VEC_210
 0030  00380c00 00346b10     2      x: INTERP_XY          R1.x,  R0.w, Param0.x         VEC_210
 0032  00380800 20346b10            y: INTERP_XY          R1.y,  R0.z, Param0.x         VEC_210
 0034  00380c00 40146b00            z: INTERP_XY          __.z,  R0.w, Param0.x         VEC_210
 0036  80380800 60146b00            w: INTERP_XY          __.w,  R0.z, Param0.x         VEC_210
 0038  00382400 00146b80     3      x: INTERP_ZW          __.x,  R0.y, Param1.x         VEC_210
 0040  00382000 20146b80            y: INTERP_ZW          __.y,  R0.x, Param1.x         VEC_210
 0042  00382400 40546b90            z: INTERP_ZW          R2.z,  R0.y, Param1.x         VEC_210
 0044  80382000 60546b90            w: INTERP_ZW          R2.w,  R0.x, Param1.x         VEC_210
 0046  00382400 00546b10     4      x: INTERP_XY          R2.x,  R0.y, Param1.x         VEC_210
 0048  00382000 20546b10            y: INTERP_XY          R2.y,  R0.x, Param1.x         VEC_210
 0050  00382400 40146b00            z: INTERP_XY          __.z,  R0.y, Param1.x         VEC_210
 0052  80382000 60146b00            w: INTERP_XY          __.w,  R0.x, Param1.x         VEC_210
 0054  00384400 00146b80     5      x: INTERP_ZW          __.x,  R0.y, Param2.x         VEC_210
 0056  00384000 20146b80            y: INTERP_ZW          __.y,  R0.x, Param2.x         VEC_210
 0058  00384400 40746b90            z: INTERP_ZW          R3.z,  R0.y, Param2.x         VEC_210
 0060  80384000 60746b90            w: INTERP_ZW          R3.w,  R0.x, Param2.x         VEC_210
 0062  00384400 00746b10     6      x: INTERP_XY          R3.x,  R0.y, Param2.x         VEC_210
 0064  00384000 20746b10            y: INTERP_XY          R3.y,  R0.x, Param2.x         VEC_210
 0066  00384400 40146b00            z: INTERP_XY          __.z,  R0.y, Param2.x         VEC_210
 0068  80384000 60146b00            w: INTERP_XY          __.w,  R0.x, Param2.x         VEC_210
 0070  00386400 00146b80     7      x: INTERP_ZW          __.x,  R0.y, Param3.x         VEC_210
 0072  00386000 20146b80            y: INTERP_ZW          __.y,  R0.x, Param3.x         VEC_210
 0074  00386400 40946b90            z: INTERP_ZW          R4.z,  R0.y, Param3.x         VEC_210
 0076  80386000 60946b90            w: INTERP_ZW          R4.w,  R0.x, Param3.x         VEC_210
 0078  00386400 00946b10     8      x: INTERP_XY          R4.x,  R0.y, Param3.x         VEC_210
 0080  00386000 20946b10            y: INTERP_XY          R4.y,  R0.x, Param3.x         VEC_210
 0082  00386400 40146b00            z: INTERP_XY          __.z,  R0.y, Param3.x         VEC_210
 0084  80386000 60146b00            w: INTERP_XY          __.w,  R0.x, Param3.x         VEC_210
 0086  00388400 00146b80     9      x: INTERP_ZW          __.x,  R0.y, Param4.x         VEC_210
 0088  00388000 20146b80            y: INTERP_ZW          __.y,  R0.x, Param4.x         VEC_210
 0090  00388400 40b46b90            z: INTERP_ZW          R5.z,  R0.y, Param4.x         VEC_210
 0092  80388000 60b46b90            w: INTERP_ZW          R5.w,  R0.x, Param4.x         VEC_210
 0094  00388400 00b46b10    10      x: INTERP_XY          R5.x,  R0.y, Param4.x         VEC_210
 0096  00388000 20b46b10            y: INTERP_XY          R5.y,  R0.x, Param4.x         VEC_210
 0098  00388400 40146b00            z: INTERP_XY          __.z,  R0.y, Param4.x         VEC_210
 0100  00388000 60146b00            w: INTERP_XY          __.w,  R0.x, Param4.x         VEC_210
 0102  800000fd 01000c90            t: MOV                R8.x,  [0x3b808081 0.00392157].x
 0104  3b808081 
 0106  8191a0ff 60c00090    11      w: MUL                R6.w,  PS, KC0[13].w
0002  00000036 80400000 TEX 1 @108
 0108  00051410 f00d1008 fc820000 SAMPLE              R8.xyzw, R5.xy__,   RID:20, SID:4 CT:NNNN
0004  40000038 a01c0004 ALU 8 @112 KC0[CB0:16-31]
 0112  8100a008 01000090    12      x: MUL                R8.x,  R8.x, R5.z
 0114  800000fe 01e04191    13      t: LOG_IEEE           R15.x,  |PV.x|
 0116  801fe884 01e00090    14      x: MUL                R15.x,  KC0[4].z, PS
 0118  800000fe 01e04090    15      t: EXP_IEEE           R15.x,  PV.x
 0120  000000ff 81200c90    16      x: MOV_sat            R9.x,  PS
 0122  000000ff a1200c80            y: MOV_sat            __.y,  PS
 0124  000000ff c1200c80            z: MOV_sat            __.z,  PS
 0126  800000ff e1200c80            w: MOV_sat            __.w,  PS
0006  00000040 80400000 TEX 1 @128
 0128  00021110 f00d1008 fc808000 SAMPLE              R8.xyzw, R2.xy__,   RID:17, SID:1 CT:NNNN
0008  00000042 a04c0000 ALU 20 @132
 0132  001fa008 010294f9    17      x: MULADD             R8.x,  R8.x, [0x40000000 2].x, -1.0
 0134  801fa408 210294f9            y: MULADD             R8.y,  R8.y, [0x40000000 2].x, -1.0
 0136  40000000 
 0138  021fc0fe 01405f10    18      x: DOT4               R10.x,  PV.x, -PV.x
 0140  029fc4fe 21405f00            y: DOT4               __.y,  PV.y, -PV.y
 0142  021f00f8 41405f00            z: DOT4               __.z,  0, -0
 0144  821f00f8 61405f00            w: DOT4               __.w,  0, -0
 0146  801fc8f9 e1000010    19      w: ADD_sat            R8.w,  1.0, PV.x
 0148  80000cfe 01e04391    20      t: RECIPSQRT_CLAMPED  R15.x,  |PV.w|
 0150  000000ff 01400c90    21      x: MOV                R10.x,  PS
 0152  000000ff 21400c90            y: MOV                R10.y,  PS
 0154  000000ff 41400c90            z: MOV                R10.z,  PS
 0156  800000ff 61400c90            w: MOV                R10.w,  PS
 0158  819fc0fd 61000210    22      w: MIN                R8.w,  [0x7f7fffff 3.40282e+38].x, PV.w
 0160  7f7fffff 
 0162  80000cfe 01e04310    23      t: RECIP_IEEE         R15.x,  PV.w
 0164  000000ff 01000c80    24      x: MOV                __.x,  PS
 0166  000000ff 21000c80            y: MOV                __.y,  PS
 0168  000000ff 41000c90            z: MOV                R8.z,  PS
 0170  800000ff 61000c80            w: MOV                __.w,  PS
0010  00000056 80400000 TEX 1 @172
 0172  00021310 f00d100b fc818000 SAMPLE              R11.xyzw, R2.xy__,   RID:19, SID:3 CT:NNNN
0012  00000058 a1980000 ALU 103 @176
 0176  001fa00b 212294f9    25      y: MULADD             R9.y,  R11.x, [0x40000000 2].x, -1.0
 0178  801fa40b 412294f9            z: MULADD             R9.z,  R11.y, [0x40000000 2].x, -1.0
 0180  40000000 
 0182  029fc4fe 01405f10    26      x: DOT4               R10.x,  PV.y, -PV.y
 0184  031fc8fe 21405f00            y: DOT4               __.y,  PV.z, -PV.z
 0186  021f00f8 41405f00            z: DOT4               __.z,  0, -0
 0188  821f00f8 61405f00            w: DOT4               __.w,  0, -0
 0190  801fc8f9 e1000010    27      w: ADD_sat            R8.w,  1.0, PV.x
 0192  80000cfe 01e04391    28      t: RECIPSQRT_CLAMPED  R15.x,  |PV.w|
 0194  000000ff 01400c90    29      x: MOV                R10.x,  PS
 0196  000000ff 21400c90            y: MOV                R10.y,  PS
 0198  000000ff 41400c90            z: MOV                R10.z,  PS
 0200  800000ff 61400c90            w: MOV                R10.w,  PS
 0202  819fc0fd 61000210    30      w: MIN                R8.w,  [0x7f7fffff 3.40282e+38].x, PV.w
 0204  7f7fffff 
 0206  80000cfe 01e04310    31      t: RECIP_IEEE         R15.x,  PV.w
 0208  000000ff 01200c80    32      x: MOV                __.x,  PS
 0210  000000ff 21200c80            y: MOV                __.y,  PS
 0212  000000ff 41200c80            z: MOV                __.z,  PS
 0214  800000ff 61200c90            w: MOV                R9.w,  PS
 0216  020120f9 01e00010    33      x: ADD                R15.x,  1.0, -R9.x
 0218  020120f9 21e00010            y: ADD                R15.y,  1.0, -R9.x
 0220  820120f9 41e00010            z: ADD                R15.z,  1.0, -R9.x
 0222  000100fe 01e00090    34      x: MUL                R15.x,  PV.x, R8.x
 0224  008104fe 21e00090            y: MUL                R15.y,  PV.y, R8.y
 0226  810108fe 41e00090            z: MUL                R15.z,  PV.z, R8.z
 0228  00812009 016280fe    35      x: MULADD             R11.x,  R9.x, R9.y, PV.x
 0230  01012009 216284fe            y: MULADD             R11.y,  R9.x, R9.z, PV.y
 0232  81812009 416288fe            z: MULADD             R11.z,  R9.x, R9.w, PV.z
 0234  000084fe 01000090    36      x: MUL                R8.x,  PV.y, R4.x
 0236  008084fe 21000090            y: MUL                R8.y,  PV.y, R4.y
 0238  810084fe 41000090            z: MUL                R8.z,  PV.y, R4.z
 0240  0000600b 010280fe    37      x: MULADD             R8.x,  R11.x, R3.x, PV.x
 0242  0080600b 210284fe            y: MULADD             R8.y,  R11.x, R3.y, PV.y
 0244  8100600b 410288fe            z: MULADD             R8.z,  R11.x, R3.z, PV.z
 0246  00000003 01200c90    38      x: MOV                R9.x,  R3.x
 0248  00000403 21200c90            y: MOV                R9.y,  R3.y
 0250  80000803 41200c90            z: MOV                R9.z,  R3.z
 0252  008088fe 01600090    39      x: MUL                R11.x,  PV.z, R4.y
 0254  010080fe 21600090            y: MUL                R11.y,  PV.x, R4.z
 0256  800084fe 61600090            w: MUL                R11.w,  PV.y, R4.x
 0258  01008409 012290fe    40      x: MULADD             R9.x,  R9.y, R4.z, -PV.x
 0260  00008809 212294fe            y: MULADD             R9.y,  R9.z, R4.x, -PV.y
 0262  80808009 41229cfe            z: MULADD             R9.z,  R9.x, R4.y, -PV.w
 0264  018060fe 01200090    41      x: MUL                R9.x,  PV.x, R3.w
 0266  018064fe 21200090            y: MUL                R9.y,  PV.y, R3.w
 0268  818068fe 41200090            z: MUL                R9.z,  PV.z, R3.w
 0270  001fc80b 01028008    42      x: MULADD             R8.x,  R11.z, PV.x, R8.x
 0272  009fc80b 21028408            y: MULADD             R8.y,  R11.z, PV.y, R8.y
 0274  811fc80b 41028808            z: MULADD             R8.z,  R11.z, PV.z, R8.z
 0276  019f80fe 01028cfc    43      x: MULADD             R8.x,  PV.x, 0.5, 0.5
 0278  019f84fe 21028cfc            y: MULADD             R8.y,  PV.y, 0.5, 0.5
 0280  819f88fe 41028cfc            z: MULADD             R8.z,  PV.z, 0.5, 0.5
 0282  001fa0fe 010294f9    44      x: MULADD             R8.x,  PV.x, [0x40000000 2].x, -1.0
 0284  001fa4fe 210294f9            y: MULADD             R8.y,  PV.y, [0x40000000 2].x, -1.0
 0286  801fa8fe 410294f9            z: MULADD             R8.z,  PV.z, [0x40000000 2].x, -1.0
 0288  40000000 
 0290  001fc0fe 01405f10    45      x: DOT4               R10.x,  PV.x, PV.x
 0292  009fc4fe 21405f00            y: DOT4               __.y,  PV.y, PV.y
 0294  011fc8fe 41405f00            z: DOT4               __.z,  PV.z, PV.z
 0296  801f00f8 61405f00            w: DOT4               __.w,  0, 0
 0298  800000fe 01e04391    46      t: RECIPSQRT_CLAMPED  R15.x,  |PV.x|
 0300  000000ff 01400c90    47      x: MOV                R10.x,  PS
 0302  000000ff 21400c80            y: MOV                __.y,  PS
 0304  000000ff 41400c80            z: MOV                __.z,  PS
 0306  800000ff 61400c80            w: MOV                __.w,  PS
 0308  801fc0fd 01400210    48      x: MIN                R10.x,  [0x7f7fffff 3.40282e+38].x, PV.x
 0310  7f7fffff 
 0312  001fc008 01200090    49      x: MUL                R9.x,  R8.x, PV.x
 0314  001fc408 21200090            y: MUL                R9.y,  R8.y, PV.x
 0316  801fc808 41200090            z: MUL                R9.z,  R8.z, PV.x
 0318  809fc0fe 01000193    50      x: MAX                R8.x,  |PV.x|, |PV.y|
 0320  801fc809 01600191    51      x: MAX                R11.x,  |R9.z|, PV.x
 0322  021fc809 01000011    52      x: ADD                R8.x,  |R9.z|, -PV.x
 0324  021fc409 21000011            y: ADD                R8.y,  |R9.y|, -PV.x
 0326  800000fe 01e04310            t: RECIP_IEEE         R15.x,  PV.x
 0328  000000ff 01000c80    53      x: MOV                __.x,  PS
 0330  000000ff 21000c80            y: MOV                __.y,  PS
 0332  000000ff 41000c90            z: MOV                R8.z,  PS
 0334  000000ff 61000c80            w: MOV                __.w,  PS
 0336  80000009 22200c91            t: MOV                R17.y,  |R9.x|
 0338  80000409 22000c91    54      y: MOV                R16.y,  |R9.y|
 0340  00822408 210364fe    55      y: CNDGE              R8.y,  R8.y, R17.y, PV.y
 0342  00000809 62200c91            w: MOV                R17.w,  |R9.z|
 0344  80000809 62000c91            t: MOV                R16.w,  |R9.z|
 0346  00000009 02000c91    56      x: MOV                R16.x,  |R9.x|
 0348  819fc4fe 610360ff            w: CNDGE              R8.w,  PV.y, PV.w, PS
 0350  001fc008 01036408    57      x: CNDGE              R8.x,  R8.x, PV.x, R8.y
 0352  80000409 22000c91            y: MOV                R16.y,  |R9.y|
 0354  01010009 01200090    58      x: MUL                R9.x,  R9.x, R8.z
 0356  009fc0fe 21036c08            y: CNDGE              R8.y,  PV.x, PV.y, R8.w
 0358  01010809 41200090            z: MUL                R9.z,  R9.z, R8.z
 0360  81010409 21200090            t: MUL                R9.y,  R9.y, R8.z
 0362  800114fe 41000010    59      z: ADD                R8.z,  -PV.y, R8.x
 0364  000108fe 01036408    60      x: CNDGE              R8.x,  PV.z, R8.x, R8.y
 0366  808108fe 21036008            y: CNDGE              R8.y,  PV.z, R8.y, R8.x
 0368  800000fe 01e04310    61      t: RECIP_IEEE         R15.x,  PV.x
 0370  000000ff 01200c80    62      x: MOV                __.x,  PS
 0372  000000ff 21200c80            y: MOV                __.y,  PS
 0374  000000ff 41200c80            z: MOV                __.z,  PS
 0376  800000ff 61200c90            w: MOV                R9.w,  PS
 0378  019fc408 41000090    63      z: MUL                R8.z,  R8.y, PV.w
 0380  800004f8 61000c90            w: MOV                R8.w,  0
0014  000000c0 80400000 TEX 1 @384
 0384  00081211 f00d1008 6d010000 SAMPLE_L            R8.xyzw, R8.xzww,   RID:18, SID:2 CT:NNNN
0016  000000c2 a0240000 ALU 10 @388
 0388  01810009 01000090    64      x: MUL                R8.x,  R9.x, R8.w
 0390  01810409 21000090            y: MUL                R8.y,  R9.y, R8.w
 0392  81810809 41000090            z: MUL                R8.z,  R9.z, R8.w
 0394  019f80fe 00c28cfc    65      x: MULADD             R6.x,  PV.x, 0.5, 0.5
 0396  019f84fe 20c28cfc            y: MULADD             R6.y,  PV.y, 0.5, 0.5
 0398  819f88fe 40c28cfc            z: MULADD             R6.z,  PV.z, 0.5, 0.5
 0400  00000001 00e00c90    66      x: MOV                R7.x,  R1.x
 0402  00000001 20e00c90            y: MOV                R7.y,  R1.x
 0404  00000001 40e00c90            z: MOV                R7.z,  R1.x
 0406  80000001 60e00c90            w: MOV                R7.w,  R1.x
0018  c0030001 94c00688 EXPORT             PIXEL 1     R6.xyzw
0020  c0038000 95200688 EXPORT_DONE        PIXEL 0     R7.xyzw  EOP
===== SHADER_END ===============================================================
Comment 1 i.kalvachev 2018-01-08 01:52:29 UTC
This is what the modified (working) shader looks like when compiled.
---
FRAG
PROPERTY FS_COORD_ORIGIN UPPER_LEFT
PROPERTY MUL_ZERO_WINS 1
DCL IN[0], GENERIC[0], PERSPECTIVE, CENTROID
DCL IN[1], GENERIC[1], PERSPECTIVE
DCL IN[2], GENERIC[2], PERSPECTIVE
DCL IN[3], GENERIC[3], PERSPECTIVE
DCL IN[4], GENERIC[4], PERSPECTIVE
DCL OUT[0], COLOR[1]
DCL OUT[1], COLOR
DCL SAMP[1]
DCL SAMP[2]
DCL SAMP[3]
DCL SAMP[4]
DCL CONST[0..20]
DCL TEMP[0..1]
DCL TEMP[2], LOCAL
DCL TEMP[3..4]
IMM[0] FLT32 {    2.0000,    -1.0000,     1.0000,     0.5000}
IMM[1] FLT32 {    0.0039,     0.0000, 340282346638528859811704183484516925440.0000,     0.0000}
  0: MOV TEMP[0].x, IMM[1].xxxx
  1: MUL OUT[0].w, TEMP[0].xxxx, CONST[13].wwww
  2: TEX TEMP[0], IN[4], SAMP[4], 2D
  3: MUL TEMP[0].x, TEMP[0].xxxx, IN[4].zzzz
  4: POW_SAT TEMP[1].x, |TEMP[0].xxxx|, CONST[20].zzzz
  5: TEX TEMP[0], IN[1], SAMP[1], 2D
  6: MAD TEMP[0].xy, TEMP[0], IMM[0].xxxx, IMM[0].yyyy
  7: DP2 TEMP[2].x, TEMP[0], -TEMP[0]
  8: ADD_SAT TEMP[0].w, IMM[0].zzzz, TEMP[2].xxxx
  9: RSQ TEMP[2], |TEMP[0].wwww|
 10: MIN TEMP[0].w, IMM[1].zzzz, TEMP[2]
 11: RCP TEMP[0].z, TEMP[0].wwww
 12: TEX TEMP[3], IN[1], SAMP[3], 2D
 13: MAD TEMP[1].yz, TEMP[3].xxyw, IMM[0].xxxx, IMM[0].yyyy
 14: DP2 TEMP[2].x, TEMP[1].yzzw, -TEMP[1].yzzw
 15: ADD_SAT TEMP[0].w, IMM[0].zzzz, TEMP[2].xxxx
 16: RSQ TEMP[2], |TEMP[0].wwww|
 17: MIN TEMP[0].w, IMM[1].zzzz, TEMP[2]
 18: RCP TEMP[1].w, TEMP[0].wwww
 19: LRP TEMP[3].xyz, TEMP[1].xxxx, TEMP[1].yzww, TEMP[0]
 20: MUL TEMP[0].xyz, TEMP[3].yyyy, IN[3]
 21: MAD TEMP[0].xyz, TEMP[3].xxxx, IN[2], TEMP[0]
 22: MOV TEMP[1].xyz, IN[2]
 23: MUL TEMP[3].xyw, TEMP[1].zxzy, IN[3].yzzx
 24: MAD TEMP[1].xyz, TEMP[1].yzxw, IN[3].zxyw, -TEMP[3].xyww
 25: MUL TEMP[1].xyz, TEMP[1], IN[2].wwww
 26: MAD TEMP[0].xyz, TEMP[3].zzzz, TEMP[1], TEMP[0]
 27: MAD TEMP[0].xyz, TEMP[0], IMM[0].wwww, IMM[0].wwww
 28: MAD TEMP[0].xyz, TEMP[0], IMM[0].xxxx, IMM[0].yyyy
 29: DP3 TEMP[2].x, TEMP[0], TEMP[0]
 30: RSQ TEMP[2].x, TEMP[2].xxxx
 31: MIN TEMP[2].x, IMM[1].zzzz, TEMP[2].xxxx
 32: MUL TEMP[1].xyz, TEMP[0], TEMP[2].xxxx
 33: MAX TEMP[0].x, |TEMP[1].xxxx|, |TEMP[1].yyyy|
 34: MAX TEMP[3].x, |TEMP[1].zzzz|, TEMP[0].xxxx
 35: ADD TEMP[0].xy, |TEMP[1].zyzw|, -TEMP[3].xxxx
 36: RCP TEMP[0].z, TEMP[3].xxxx
 37: CMP TEMP[0].yw, TEMP[0].yyyy, |TEMP[1].xyzz|, |TEMP[1].xxzz|
 38: MOV TEMP[4], TEMP[0]
 39: CMP TEMP[0].xy, TEMP[4].xxxx, TEMP[0].ywzw, |TEMP[1]|
 40: MUL TEMP[1].xyz, TEMP[1], TEMP[0].zzzz
 41: ADD TEMP[0].z, -TEMP[0].yyyy, TEMP[0].xxxx
 42: CMP TEMP[0].xy, TEMP[0].zzzz, TEMP[0].yxzw, TEMP[0]
 43: RCP TEMP[1].w, TEMP[0].xxxx
 44: MUL TEMP[0].z, TEMP[0].yyyy, TEMP[1].wwww
 45: MOV TEMP[0].w, IMM[1].yyyy
 46: TXL TEMP[0], TEMP[0].xzww, SAMP[2], 2D
 47: MUL TEMP[0].xyz, TEMP[1], TEMP[0].wwww
 48: MAD OUT[0].xyz, TEMP[0], IMM[0].wwww, IMM[0].wwww
 49: MOV OUT[1], IN[0].xxxx
 50: END

===== SHADER #81 ==================================== PS/REDWOOD/EVERGREEN =====
===== 416 dw ===== 19 gprs ===== 0 stack =======================================
0000  4000000b a0a80000 ALU 43 @22 KC0[CB0:0-15]
 0022  00380c00 00146b80     1      x: INTERP_ZW          __.x,  R0.w, Param0.x         VEC_210
 0024  00380800 20146b80            y: INTERP_ZW          __.y,  R0.z, Param0.x         VEC_210
 0026  00380c00 40346b90            z: INTERP_ZW          R1.z,  R0.w, Param0.x         VEC_210
 0028  80380800 60346b90            w: INTERP_ZW          R1.w,  R0.z, Param0.x         VEC_210
 0030  00380c00 00346b10     2      x: INTERP_XY          R1.x,  R0.w, Param0.x         VEC_210
 0032  00380800 20346b10            y: INTERP_XY          R1.y,  R0.z, Param0.x         VEC_210
 0034  00380c00 40146b00            z: INTERP_XY          __.z,  R0.w, Param0.x         VEC_210
 0036  80380800 60146b00            w: INTERP_XY          __.w,  R0.z, Param0.x         VEC_210
 0038  00382400 00146b80     3      x: INTERP_ZW          __.x,  R0.y, Param1.x         VEC_210
 0040  00382000 20146b80            y: INTERP_ZW          __.y,  R0.x, Param1.x         VEC_210
 0042  00382400 40546b90            z: INTERP_ZW          R2.z,  R0.y, Param1.x         VEC_210
 0044  80382000 60546b90            w: INTERP_ZW          R2.w,  R0.x, Param1.x         VEC_210
 0046  00382400 00546b10     4      x: INTERP_XY          R2.x,  R0.y, Param1.x         VEC_210
 0048  00382000 20546b10            y: INTERP_XY          R2.y,  R0.x, Param1.x         VEC_210
 0050  00382400 40146b00            z: INTERP_XY          __.z,  R0.y, Param1.x         VEC_210
 0052  80382000 60146b00            w: INTERP_XY          __.w,  R0.x, Param1.x         VEC_210
 0054  00384400 00146b80     5      x: INTERP_ZW          __.x,  R0.y, Param2.x         VEC_210
 0056  00384000 20146b80            y: INTERP_ZW          __.y,  R0.x, Param2.x         VEC_210
 0058  00384400 40746b90            z: INTERP_ZW          R3.z,  R0.y, Param2.x         VEC_210
 0060  80384000 60746b90            w: INTERP_ZW          R3.w,  R0.x, Param2.x         VEC_210
 0062  00384400 00746b10     6      x: INTERP_XY          R3.x,  R0.y, Param2.x         VEC_210
 0064  00384000 20746b10            y: INTERP_XY          R3.y,  R0.x, Param2.x         VEC_210
 0066  00384400 40146b00            z: INTERP_XY          __.z,  R0.y, Param2.x         VEC_210
 0068  80384000 60146b00            w: INTERP_XY          __.w,  R0.x, Param2.x         VEC_210
 0070  00386400 00146b80     7      x: INTERP_ZW          __.x,  R0.y, Param3.x         VEC_210
 0072  00386000 20146b80            y: INTERP_ZW          __.y,  R0.x, Param3.x         VEC_210
 0074  00386400 40946b90            z: INTERP_ZW          R4.z,  R0.y, Param3.x         VEC_210
 0076  80386000 60946b90            w: INTERP_ZW          R4.w,  R0.x, Param3.x         VEC_210
 0078  00386400 00946b10     8      x: INTERP_XY          R4.x,  R0.y, Param3.x         VEC_210
 0080  00386000 20946b10            y: INTERP_XY          R4.y,  R0.x, Param3.x         VEC_210
 0082  00386400 40146b00            z: INTERP_XY          __.z,  R0.y, Param3.x         VEC_210
 0084  80386000 60146b00            w: INTERP_XY          __.w,  R0.x, Param3.x         VEC_210
 0086  00388400 00146b80     9      x: INTERP_ZW          __.x,  R0.y, Param4.x         VEC_210
 0088  00388000 20146b80            y: INTERP_ZW          __.y,  R0.x, Param4.x         VEC_210
 0090  00388400 40b46b90            z: INTERP_ZW          R5.z,  R0.y, Param4.x         VEC_210
 0092  80388000 60b46b90            w: INTERP_ZW          R5.w,  R0.x, Param4.x         VEC_210
 0094  00388400 00b46b10    10      x: INTERP_XY          R5.x,  R0.y, Param4.x         VEC_210
 0096  00388000 20b46b10            y: INTERP_XY          R5.y,  R0.x, Param4.x         VEC_210
 0098  00388400 40146b00            z: INTERP_XY          __.z,  R0.y, Param4.x         VEC_210
 0100  00388000 60146b00            w: INTERP_XY          __.w,  R0.x, Param4.x         VEC_210
 0102  800000fd 01000c90            t: MOV                R8.x,  [0x3b808081 0.00392157].x
 0104  3b808081 
 0106  8191a0ff 60c00090    11      w: MUL                R6.w,  PS, KC0[13].w
0002  00000036 80400000 TEX 1 @108
 0108  00051410 f00d1008 fc820000 SAMPLE              R8.xyzw, R5.xy__,   RID:20, SID:4 CT:NNNN
0004  40000038 a01c0004 ALU 8 @112 KC0[CB0:16-31]
 0112  8100a008 01000090    12      x: MUL                R8.x,  R8.x, R5.z
 0114  800000fe 02004191    13      t: LOG_IEEE           R16.x,  |PV.x|
 0116  801fe884 02000090    14      x: MUL                R16.x,  KC0[4].z, PS
 0118  800000fe 02004090    15      t: EXP_IEEE           R16.x,  PV.x
 0120  000000ff 81200c90    16      x: MOV_sat            R9.x,  PS
 0122  000000ff a1200c80            y: MOV_sat            __.y,  PS
 0124  000000ff c1200c80            z: MOV_sat            __.z,  PS
 0126  800000ff e1200c80            w: MOV_sat            __.w,  PS
0006  00000040 80400000 TEX 1 @128
 0128  00021110 f00d1008 fc808000 SAMPLE              R8.xyzw, R2.xy__,   RID:17, SID:1 CT:NNNN
0008  00000042 a04c0000 ALU 20 @132
 0132  001fa008 010294f9    17      x: MULADD             R8.x,  R8.x, [0x40000000 2].x, -1.0
 0134  801fa408 210294f9            y: MULADD             R8.y,  R8.y, [0x40000000 2].x, -1.0
 0136  40000000 
 0138  021fc0fe 01405f10    18      x: DOT4               R10.x,  PV.x, -PV.x
 0140  029fc4fe 21405f00            y: DOT4               __.y,  PV.y, -PV.y
 0142  021f00f8 41405f00            z: DOT4               __.z,  0, -0
 0144  821f00f8 61405f00            w: DOT4               __.w,  0, -0
 0146  801fc8f9 e1000010    19      w: ADD_sat            R8.w,  1.0, PV.x
 0148  80000cfe 02004391    20      t: RECIPSQRT_CLAMPED  R16.x,  |PV.w|
 0150  000000ff 01400c90    21      x: MOV                R10.x,  PS
 0152  000000ff 21400c90            y: MOV                R10.y,  PS
 0154  000000ff 41400c90            z: MOV                R10.z,  PS
 0156  800000ff 61400c90            w: MOV                R10.w,  PS
 0158  819fc0fd 61000210    22      w: MIN                R8.w,  [0x7f7fffff 3.40282e+38].x, PV.w
 0160  7f7fffff 
 0162  80000cfe 02004310    23      t: RECIP_IEEE         R16.x,  PV.w
 0164  000000ff 01000c80    24      x: MOV                __.x,  PS
 0166  000000ff 21000c80            y: MOV                __.y,  PS
 0168  000000ff 41000c90            z: MOV                R8.z,  PS
 0170  800000ff 61000c80            w: MOV                __.w,  PS
0010  00000056 80400000 TEX 1 @172
 0172  00021310 f00d100b fc818000 SAMPLE              R11.xyzw, R2.xy__,   RID:19, SID:3 CT:NNNN
0012  00000058 a1a80000 ALU 107 @176
 0176  001fa00b 212294f9    25      y: MULADD             R9.y,  R11.x, [0x40000000 2].x, -1.0
 0178  801fa40b 412294f9            z: MULADD             R9.z,  R11.y, [0x40000000 2].x, -1.0
 0180  40000000 
 0182  029fc4fe 01405f10    26      x: DOT4               R10.x,  PV.y, -PV.y
 0184  031fc8fe 21405f00            y: DOT4               __.y,  PV.z, -PV.z
 0186  021f00f8 41405f00            z: DOT4               __.z,  0, -0
 0188  821f00f8 61405f00            w: DOT4               __.w,  0, -0
 0190  801fc8f9 e1000010    27      w: ADD_sat            R8.w,  1.0, PV.x
 0192  80000cfe 02004391    28      t: RECIPSQRT_CLAMPED  R16.x,  |PV.w|
 0194  000000ff 01400c90    29      x: MOV                R10.x,  PS
 0196  000000ff 21400c90            y: MOV                R10.y,  PS
 0198  000000ff 41400c90            z: MOV                R10.z,  PS
 0200  800000ff 61400c90            w: MOV                R10.w,  PS
 0202  819fc0fd 61000210    30      w: MIN                R8.w,  [0x7f7fffff 3.40282e+38].x, PV.w
 0204  7f7fffff 
 0206  80000cfe 02004310    31      t: RECIP_IEEE         R16.x,  PV.w
 0208  000000ff 01200c80    32      x: MOV                __.x,  PS
 0210  000000ff 21200c80            y: MOV                __.y,  PS
 0212  000000ff 41200c80            z: MOV                __.z,  PS
 0214  800000ff 61200c90            w: MOV                R9.w,  PS
 0216  020120f9 02000010    33      x: ADD                R16.x,  1.0, -R9.x
 0218  020120f9 22000010            y: ADD                R16.y,  1.0, -R9.x
 0220  820120f9 42000010            z: ADD                R16.z,  1.0, -R9.x
 0222  000100fe 02000090    34      x: MUL                R16.x,  PV.x, R8.x
 0224  008104fe 22000090            y: MUL                R16.y,  PV.y, R8.y
 0226  810108fe 42000090            z: MUL                R16.z,  PV.z, R8.z
 0228  00812009 016280fe    35      x: MULADD             R11.x,  R9.x, R9.y, PV.x
 0230  01012009 216284fe            y: MULADD             R11.y,  R9.x, R9.z, PV.y
 0232  81812009 416288fe            z: MULADD             R11.z,  R9.x, R9.w, PV.z
 0234  000084fe 01000090    36      x: MUL                R8.x,  PV.y, R4.x
 0236  008084fe 21000090            y: MUL                R8.y,  PV.y, R4.y
 0238  810084fe 41000090            z: MUL                R8.z,  PV.y, R4.z
 0240  0000600b 010280fe    37      x: MULADD             R8.x,  R11.x, R3.x, PV.x
 0242  0080600b 210284fe            y: MULADD             R8.y,  R11.x, R3.y, PV.y
 0244  8100600b 410288fe            z: MULADD             R8.z,  R11.x, R3.z, PV.z
 0246  00000003 01200c90    38      x: MOV                R9.x,  R3.x
 0248  00000403 21200c90            y: MOV                R9.y,  R3.y
 0250  80000803 41200c90            z: MOV                R9.z,  R3.z
 0252  008088fe 01600090    39      x: MUL                R11.x,  PV.z, R4.y
 0254  010080fe 21600090            y: MUL                R11.y,  PV.x, R4.z
 0256  800084fe 61600090            w: MUL                R11.w,  PV.y, R4.x
 0258  01008409 012290fe    40      x: MULADD             R9.x,  R9.y, R4.z, -PV.x
 0260  00008809 212294fe            y: MULADD             R9.y,  R9.z, R4.x, -PV.y
 0262  80808009 41229cfe            z: MULADD             R9.z,  R9.x, R4.y, -PV.w
 0264  018060fe 01200090    41      x: MUL                R9.x,  PV.x, R3.w
 0266  018064fe 21200090            y: MUL                R9.y,  PV.y, R3.w
 0268  818068fe 41200090            z: MUL                R9.z,  PV.z, R3.w
 0270  001fc80b 01028008    42      x: MULADD             R8.x,  R11.z, PV.x, R8.x
 0272  009fc80b 21028408            y: MULADD             R8.y,  R11.z, PV.y, R8.y
 0274  811fc80b 41028808            z: MULADD             R8.z,  R11.z, PV.z, R8.z
 0276  019f80fe 01028cfc    43      x: MULADD             R8.x,  PV.x, 0.5, 0.5
 0278  019f84fe 21028cfc            y: MULADD             R8.y,  PV.y, 0.5, 0.5
 0280  819f88fe 41028cfc            z: MULADD             R8.z,  PV.z, 0.5, 0.5
 0282  001fa0fe 010294f9    44      x: MULADD             R8.x,  PV.x, [0x40000000 2].x, -1.0
 0284  001fa4fe 210294f9            y: MULADD             R8.y,  PV.y, [0x40000000 2].x, -1.0
 0286  801fa8fe 410294f9            z: MULADD             R8.z,  PV.z, [0x40000000 2].x, -1.0
 0288  40000000 
 0290  001fc0fe 01405f10    45      x: DOT4               R10.x,  PV.x, PV.x
 0292  009fc4fe 21405f00            y: DOT4               __.y,  PV.y, PV.y
 0294  011fc8fe 41405f00            z: DOT4               __.z,  PV.z, PV.z
 0296  801f00f8 61405f00            w: DOT4               __.w,  0, 0
 0298  800000fe 02004391    46      t: RECIPSQRT_CLAMPED  R16.x,  |PV.x|
 0300  000000ff 01400c90    47      x: MOV                R10.x,  PS
 0302  000000ff 21400c80            y: MOV                __.y,  PS
 0304  000000ff 41400c80            z: MOV                __.z,  PS
 0306  800000ff 61400c80            w: MOV                __.w,  PS
 0308  801fc0fd 01400210    48      x: MIN                R10.x,  [0x7f7fffff 3.40282e+38].x, PV.x
 0310  7f7fffff 
 0312  001fc008 01200090    49      x: MUL                R9.x,  R8.x, PV.x
 0314  001fc408 21200090            y: MUL                R9.y,  R8.y, PV.x
 0316  801fc808 41200090            z: MUL                R9.z,  R8.z, PV.x
 0318  809fc0fe 01000193    50      x: MAX                R8.x,  |PV.x|, |PV.y|
 0320  801fc809 01600191    51      x: MAX                R11.x,  |R9.z|, PV.x
 0322  021fc809 01000011    52      x: ADD                R8.x,  |R9.z|, -PV.x
 0324  021fc409 21000011            y: ADD                R8.y,  |R9.y|, -PV.x
 0326  800000fe 02004310            t: RECIP_IEEE         R16.x,  PV.x
 0328  000000ff 01000c80    53      x: MOV                __.x,  PS
 0330  000000ff 21000c80            y: MOV                __.y,  PS
 0332  000000ff 41000c90            z: MOV                R8.z,  PS
 0334  000000ff 61000c80            w: MOV                __.w,  PS
 0336  80000009 22400c91            t: MOV                R18.y,  |R9.x|
 0338  80000409 22200c91    54      y: MOV                R17.y,  |R9.y|
 0340  00824408 210364fe    55      y: CNDGE              R8.y,  R8.y, R18.y, PV.y
 0342  00000809 62400c91            w: MOV                R18.w,  |R9.z|
 0344  80000809 62200c91            t: MOV                R17.w,  |R9.z|
 0346  819fc4fe 610360ff    56      w: CNDGE              R8.w,  PV.y, PV.w, PS
 0348  00000008 01800c90    57      x: MOV                R12.x,  R8.x
 0350  00000408 21800c90            y: MOV                R12.y,  R8.y
 0352  00000808 41800c90            z: MOV                R12.z,  R8.z
 0354  00000cfe 61800c90            w: MOV                R12.w,  PV.w
 0356  80000009 02200c91            t: MOV                R17.x,  |R9.x|
 0358  001fe0fe 01036408    58      x: CNDGE              R8.x,  PV.x, PS, R8.y
 0360  80000409 22200c91            y: MOV                R17.y,  |R9.y|
 0362  01010009 01280090    59      x: MUL                R9.x,  R9.x, R8.z             VEC_120
 0364  009fc00c 21036c08            y: CNDGE              R8.y,  R12.x, PV.y, R8.w
 0366  01010809 41200090            z: MUL                R9.z,  R9.z, R8.z
 0368  81010409 21200090            t: MUL                R9.y,  R9.y, R8.z
 0370  800114fe 41000010    60      z: ADD                R8.z,  -PV.y, R8.x
 0372  000108fe 01036408    61      x: CNDGE              R8.x,  PV.z, R8.x, R8.y
 0374  808108fe 21036008            y: CNDGE              R8.y,  PV.z, R8.y, R8.x
 0376  800000fe 02004310    62      t: RECIP_IEEE         R16.x,  PV.x
 0378  000000ff 01200c80    63      x: MOV                __.x,  PS
 0380  000000ff 21200c80            y: MOV                __.y,  PS
 0382  000000ff 41200c80            z: MOV                __.z,  PS
 0384  800000ff 61200c90            w: MOV                R9.w,  PS
 0386  019fc408 41000090    64      z: MUL                R8.z,  R8.y, PV.w
 0388  800004f8 61000c90            w: MOV                R8.w,  0
0014  000000c4 80400000 TEX 1 @392
 0392  00081211 f00d1008 6d010000 SAMPLE_L            R8.xyzw, R8.xzww,   RID:18, SID:2 CT:NNNN
0016  000000c6 a0240000 ALU 10 @396
 0396  01810009 01000090    65      x: MUL                R8.x,  R9.x, R8.w
 0398  01810409 21000090            y: MUL                R8.y,  R9.y, R8.w
 0400  81810809 41000090            z: MUL                R8.z,  R9.z, R8.w
 0402  019f80fe 00c28cfc    66      x: MULADD             R6.x,  PV.x, 0.5, 0.5
 0404  019f84fe 20c28cfc            y: MULADD             R6.y,  PV.y, 0.5, 0.5
 0406  819f88fe 40c28cfc            z: MULADD             R6.z,  PV.z, 0.5, 0.5
 0408  00000001 00e00c90    67      x: MOV                R7.x,  R1.x
 0410  00000001 20e00c90            y: MOV                R7.y,  R1.x
 0412  00000001 40e00c90            z: MOV                R7.z,  R1.x
 0414  80000001 60e00c90            w: MOV                R7.w,  R1.x
0018  c0030001 94c00688 EXPORT             PIXEL 1     R6.xyzw
0020  c0038000 95200688 EXPORT_DONE        PIXEL 0     R7.xyzw  EOP
===== SHADER_END ===============================================================
Comment 2 i.kalvachev 2018-03-12 16:01:33 UTC
Back when I discovered the bug, glennk suggested that this commit might be involved:
https://cgit.freedesktop.org/mesa/mesa/commit/?id=acef65503e79ce61a16bdba92462f0ed8a7b52c2

"r600g: fix abs() support on ALU 3 source operands instructions
Since alu does not support abs() modifier on source operands, spill
and apply the modifiers to a temp register when needed."

It turns:
  cndge r0.xy, r1, |r2|, r3   

into:

  mov  r12.x, |r2|
  cndge r0.x, r1, r12, r3
  mov  r12.y, |r2|
  cndge r0.y, r1, r12, r3


This splits the "cndge" across separate VLIW instruction groups.

The bug occurs when the first half of the code above changes the condition checked by the second half, which cannot happen if "cndge" executes as a single VLIW instruction group.
In my case, "cndge r0.xy, r0.xx, |r2|, r3" is turned into:
  mov  r12.x, |r2|
  cndge r0.x, r0.x, r12, r3
  mov  r12.y, |r2|
  cndge r0.y, r0.x, r12, r3

I think we should copy the whole register first.
The generated code should look like this:

  mov r12, |r2|
  cndge r0.xy, r1, r12, r3
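
To illustrate the hazard, here is a minimal Python sketch. It is not Mesa code: the helper names (cndge, split_lowering, whole_register_lowering) and the register values are made up for illustration. It models cndge as "dst = src0 >= 0 ? src1 : src2" and compares the per-channel lowering with the whole-register one for "cndge r0.xy, r0.xx, |r2|, r3".

```python
def cndge(src0, src1, src2):
    """Scalar model of CNDGE: pick src1 when src0 >= 0, else src2."""
    return src1 if src0 >= 0.0 else src2


def split_lowering(r0, r2, r3):
    """Per-channel lowering: a mov/cndge pair for x, then one for y."""
    r0 = list(r0)
    r12 = [0.0, 0.0]                       # hypothetical temp register
    r12[0] = abs(r2[0])                    # mov   r12.x, |r2|
    r0[0] = cndge(r0[0], r12[0], r3[0])    # cndge r0.x, r0.x, r12, r3
    r12[1] = abs(r2[1])                    # mov   r12.y, |r2|
    r0[1] = cndge(r0[0], r12[1], r3[1])    # cndge r0.y, r0.x, ... (r0.x already overwritten)
    return r0


def whole_register_lowering(r0, r2, r3):
    """Copy the whole register first, then evaluate cndge as one vector op."""
    r12 = [abs(v) for v in r2]             # mov r12, |r2|
    cond = r0[0]                           # all sources are read before any write
    return [cndge(cond, r12[i], r3[i]) for i in range(2)]


# Illustrative values only: r0.x starts negative, r3.x is non-negative.
r0, r2, r3 = [-1.0, 0.0], [2.0, -3.0], [5.0, 7.0]
print(split_lowering(r0, r2, r3))           # [5.0, 3.0]
print(whole_register_lowering(r0, r2, r3))  # [5.0, 7.0]
```

In this model the two lowerings can only disagree when r0.x starts negative and r3.x is non-negative, because the first cndge then flips the sign of the condition that the second one reads.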
Comment 3 Roland Scheidegger 2018-03-13 03:31:46 UTC
(In reply to iive from comment #2)
> The problem that causes this bug is when the first half of the above code,
> changes the condition check in the second. Something that will not happen,
> if "cndge" is executed as a single VLIW.
> Aka I have "cndge r0.xy, r0.xx, |r2|, r3" that is turned into:
>   mov  r12.x, |r2|
>   cndge r0.x, r0.x, r12, r3
>   mov  r12.y, |r2|
>   cndge r0.y, r0.x, r12, r3
> 
> I think that we should copy the whole register first.
> The generated code should looks like this:
> 
>   mov r12, |r2|
>   cndge r0.xy, r1, r12, r3

That analysis looks spot on. I've just sent a patch to mesa-dev which should fix this, can you verify it works?
Comment 4 i.kalvachev 2018-03-13 20:32:47 UTC
(In reply to Roland Scheidegger from comment #3)
> I've just sent a patch to mesa-dev which should
> fix this, can you verify it works?

Yes, it works nicely.

Can't wait to see the patch in git master. :)
Comment 5 Roland Scheidegger 2018-03-14 03:56:50 UTC
Fixed by 274f8bf05ef526d65f01614313dda65bc7ec7a87.
Thanks for the analysis! (the code wasn't really all that difficult after that...)
