Bug 99349

Summary: Failed to build shader (translation from TGSI)
Product: Mesa Reporter: Enver Balalic <balalic.enver>
Component: Drivers/Gallium/r600 Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED QA Contact: Default DRI bug account <dri-devel>
Severity: normal    
Priority: medium CC: balalic.enver, elia.argentieri, gw.fossdev, mirh, xavier.giannakopoulos
Version: 13.0   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Attachments: glxinfo
TGSI for failing shader
Patch proposed on mesa-dev to work around too many temporaries

Description Enver Balalic 2017-01-10 17:24:46 UTC
Created attachment 128864
glxinfo

I'm using a Radeon HD 6950, and Mesa fails to build a shader when playing "War Thunder": it renders the login screen fine, starts to build shaders, then fails and spams the following in the console:

EE r600_state_common.c:799 r600_shader_select - Failed to build shader variant (type=1) -1
EE r600_shader.c:183 r600_pipe_shader_create - translation from TGSI failed !

The same thing happens with a lot of shaders from shadertoy.com.

OS: OpenSUSE Tumbleweed, glxinfo attached
Comment 1 Enver Balalic 2017-01-10 17:44:10 UTC
With the R600_DEBUG=sbsafemath flag the game starts, but it still spams the console with the error. The skybox is not rendered, and a pink box flickers on the screen.
Comment 2 Bronson 2017-05-17 06:45:25 UTC
Yes, I've also got this issue with the new Godot engine (3.0 git).

More info here:
https://github.com/godotengine/godot/issues/8774
Comment 3 Gert Wollny 2017-05-24 13:10:43 UTC
I had the same error output when experimenting with the Unreal Editor on an HD 6850 (BARTS) using the latest mesa-git.

To track it down, I added some debugging output to Mesa and got the following trace leading to the error:

r600/r600_asm.c:615 
  check_and_set_bank_swizzle - Couldn't find a working swizzle

drivers/r600/r600_shader.c:3975 
  tgsi_op2_s - Error in tgsi_op2_s, i = 3, r600_bytecode_add_alu returned  -1

r600/r600_shader.c:3332 
r600_shader_from_tgsi - Failed to build shader at ctx.inst_info->process, 
   chip class: 6, opcode: 7 result: -1

Since I added output for every "return -1" in r600/r600_asm.c, I also get a lot of other messages, but I guess these are normal.

While debugging, I also found that check_and_set_bank_swizzle exiting with "Couldn't find a working swizzle" is not necessarily an error, so I'm now a bit lost and not sure how to continue debugging this problem.
Comment 4 Gert Wollny 2017-05-24 14:56:21 UTC
The same problem also seems to be discussed in these mails 

http://mesa-dev.freedesktop.narkive.com/cHAXj1eT/bug-50338-radeon-tgsi-takes-more-than-two-cfiles-from-r600-shader

It is very likely that this is actually a duplicate of #50338.
Comment 5 Gert Wollny 2017-05-25 08:10:39 UTC
After adding yet more debugging output, it seems so far that only one operation fails: a multiplication of two operands with a write mask of 0xF (see log below).

I also tested gzdoom, as the poster in the thread referenced in my last comment did, but on my machine it works fine.

r600_asm.c:1297 r600_bytecode_add_alu_type - check_and_set_bank_swizzle returned -1
r600_asm.c:1303 r600_bytecode_add_alu_type -   slot[0]: op: 2, bank_swizzle:0 bank_swizzle_force: 0
r600_asm.c:1303 r600_bytecode_add_alu_type -   slot[1]: op: 2, bank_swizzle:0 bank_swizzle_force: 0
r600_asm.c:1303 r600_bytecode_add_alu_type -   slot[2]: op: 2, bank_swizzle:0 bank_swizzle_force: 0
r600_asm.c:1303 r600_bytecode_add_alu_type -   slot[3]: op: 2, bank_swizzle:0 bank_swizzle_force: 0
r600_asm.c:1305 r600_bytecode_add_alu_type -   slot[4] = 0: 0
r600_shader.c:3979 tgsi_op2_s - Error in tgsi_op2_s, i = 3, lasti = 3, r600_bytecode_add_alu returned  -1
r600_shader.c:3981 tgsi_op2_s -   op=2, num src registers: 2, write_mask=15
r600_shader.c:3983 tgsi_op2_s -   alu.dst: {sel: 15, chan: 3, clamp: 0, write: 1, rel: 0
r600_shader.c:3986 tgsi_op2_s -   alu.src0: {sel: 160, chan: 3, kc_bank:0
r600_shader.c:3988 tgsi_op2_s -   alu.src1: {sel: 535, chan: 3, kc_bank:0
r600_shader.c:3332 r600_shader_from_tgsi - Failed to build shader at ctx.inst_info->process, chip class: 6, opcode: 7 result: -1
r600_shader.c:183 r600_pipe_shader_create - translation from TGSI failed !
r600_state_common.c:787 r600_shader_select - Failed to build shader variant (type=1) -1
Comment 6 Gert Wollny 2017-05-25 09:32:55 UTC
The actual instruction failing is 

   MUL TEMP[11], CONST[26], CONST[23]

i.e. the multiplication of two constants.
Comment 7 Gert Wollny 2017-05-26 09:01:01 UTC
However, just multiplying two constants/uniforms does not necessarily trigger the bug. With a simple shader program like

uniform vec4 base_color;
uniform vec4 test;
uniform vec4 test2;
uniform vec4 test3;
	
void main()
{
 vec4 h1 = base_color * test;
 vec4 h2 = test2 * test3;
 gl_FragColor = h1 * h2;
}

for both constant-constant multiplications one of the constants is always addressed via a GPR, i.e. I get:

  1: MUL TEMP[0], CONST[0], CONST[1]
r600_shader.c:3986 tgsi_op2_s - About to multiply two constants
r600_shader.c:4000 tgsi_op2_s -  ctx->src[0]: 
                  sel:7   // this is a GPR address 
              swizzle:0 1 2 3
                  neg:0
                  abs:0
                  rel:0
              kc_bank:0
               kc_rel:0
                value:0 0 0 0

r600_shader.c:4000 tgsi_op2_s -  ctx->src[1]: 
                  sel:513  // this is a cfile address 
              swizzle:0 1 2 3
                  neg:0
                  abs:0
                  rel:0
              kc_bank:0
               kc_rel:0
                value:0 0 0 0

and check_vector/reserve_cfile can then successfully assign the cfile read ports, because only the four components of a single constant need to be read.


However, for a more complicated shader I get the following:  

250: MUL TEMP[11], CONST[26], CONST[23]
r600_shader.c:3986 tgsi_op2_s - About to multiply two constants
r600_shader.c:4000 tgsi_op2_s -  ctx->src[0]: 
                  sel:160  // cfile kcache after  translation 
              swizzle:0 1 2 3
                  neg:0
                  abs:0
                  rel:0
              kc_bank:0
               kc_rel:0
                value:0 0 0 0

r600_shader.c:4000 tgsi_op2_s -  ctx->src[1]: 
                  sel:535 // cfile kcache before  translation 
              swizzle:0 1 2 3
                  neg:0
                  abs:0
                  rel:0
              kc_bank:0
               kc_rel:0
                value:0 0 0 0

r600_asm.c:472 check_vector -  bs->hw_cfile_addr:[-1 -1]  bs->hw_cfile_elem: [-1 -1] bank_swizzle:0  num_src:2
r600_asm.c:494 check_vector -  src 0: sel:160 elem:0
r600_asm.c:423 reserve_cfile -   res=0: bs->hw_cfile_addr:-1 bs->hw_cfile_elem:-1 sel:160 chan:0
r600_asm.c:494 check_vector -  src 1: sel:535 elem:0
r600_asm.c:423 reserve_cfile -   res=0: bs->hw_cfile_addr:160 bs->hw_cfile_elem:0 sel:535 chan:0
r600_asm.c:423 reserve_cfile -   res=1: bs->hw_cfile_addr:-1 bs->hw_cfile_elem:-1 sel:535 chan:0
r600_asm.c:472 check_vector -  bs->hw_cfile_addr:[160 535]  bs->hw_cfile_elem: [0 0] bank_swizzle:0  num_src:2
r600_asm.c:494 check_vector -  src 0: sel:160 elem:1
r600_asm.c:423 reserve_cfile -   res=0: bs->hw_cfile_addr:160 bs->hw_cfile_elem:0 sel:160 chan:0
r600_asm.c:494 check_vector -  src 1: sel:535 elem:1
r600_asm.c:423 reserve_cfile -   res=0: bs->hw_cfile_addr:160 bs->hw_cfile_elem:0 sel:535 chan:0
r600_asm.c:423 reserve_cfile -   res=1: bs->hw_cfile_addr:535 bs->hw_cfile_elem:0 sel:535 chan:0
r600_asm.c:472 check_vector -  bs->hw_cfile_addr:[160 535]  bs->hw_cfile_elem: [0 0] bank_swizzle:0  num_src:2
r600_asm.c:494 check_vector -  src 0: sel:160 elem:2
r600_asm.c:423 reserve_cfile -   res=0: bs->hw_cfile_addr:160 bs->hw_cfile_elem:0 sel:160 chan:1
r600_asm.c:423 reserve_cfile -   res=1: bs->hw_cfile_addr:535 bs->hw_cfile_elem:0 sel:160 chan:1
r600_asm.c:436 reserve_cfile - All cfile read ports are used, cannot reference vector element.

In summary, allocating a read port for elem >= 2 fails, because it would mean reading more than four values in one instruction group, and this is not possible according to section 4.7.5 of the AMD Evergreen-Family instruction set manual.
Comment 8 Gert Wollny 2017-05-26 09:03:39 UTC
The mesa-code with the added debugging output can be found at: 

https://github.com/gerddie/mesa
Comment 9 Gert Wollny 2017-05-26 13:31:12 UTC
It turns out that tgsi_split_constant in r600_shader.c is supposed to move constants into the GPR range, but for large shaders this is not sufficient, since the temporary registers it uses may have indices beyond 127, the highest valid GPR index.

tgsi_split_constant doesn't move all constants, and if an instruction uses the same constant as a source more than once, one of the instances is moved to a new address, which may even be counterproductive.

As a partial workaround for this bug, the repository linked above contains a patch that changes the register handling to reserve a few registers in the low range and moves constants there if necessary. Note, however, that it also contains additional debugging output.

However, for the instruction 

 LRP TEMP[0].xyz, CONST[31].wwww, CONST[31].xyzz, TEMP[0].xyzz

check_and_set_bank_swizzle still fails. (This is, by the way, one such case where 
tgsi_split_constant moves one of the instances of CONST[31] to another place.) 

I will try to change tgsi_split_constant so that it does not move values around if they originally come from the same source, and see whether this fixes the problem.
Comment 10 Gert Wollny 2017-05-26 20:24:10 UTC
Well, it turns out that the shader simply uses too many registers. Since this is only checked at the end, at some point the indices of the temporaries used to store constants go beyond the GPR range, which makes the translation from TGSI fail: it then tries to use two or more cfile addresses in one instruction, and this is not allowed.
Comment 11 Gert Wollny 2017-05-29 13:28:29 UTC
Created attachment 131567
TGSI for failing shader

This is the failing shader. For some reason, 151 GPRs are allocated as TEMP, but only 40 of them actually appear as a source operand; the remaining ones are only written to and then discarded.
Comment 12 Gert Wollny 2017-06-02 17:55:30 UTC
Created attachment 131683
Patch proposed on mesa-dev to work around too many temporaries
Comment 13 Elia Argentieri 2017-06-25 22:29:04 UTC
I tried your patch, but unfortunately it didn't solve my problem with the Godot engine... I don't get any error about GPR limits, just this:

EE r600_shader.c:190 r600_pipe_shader_create - translation from TGSI failed !
EE r600_state_common.c:816 r600_shader_select - Failed to build shader variant (type=1) -1

There must be another bug. Thank you for your effort.
Comment 14 Gert Wollny 2017-06-26 06:03:16 UTC
Actually that patch was more of a bad hack. Try this new patch set that goes to the source of the problem: 

https://patchwork.freedesktop.org/series/26330/

For me, it solved the "translation from TGSI" problem with large shaders in the GpuTest 0.7.0 Piano and Volplosion benchmarks.

Please note that after applying these patches you will have to set the environment variable MESA_GLSL_TO_TGSI_NEW_MERGE to activate it.
Comment 15 Elia Argentieri 2017-06-26 13:41:30 UTC
Now it works! I also had to set MESA_GLSL_CACHE_DISABLE to make it work; maybe it was picking up the old cached shader. Thank you very much.
Comment 16 higuita 2017-07-11 23:07:56 UTC
So if the patch works, will it be merged? Or is it already merged, and if so, in which version?
Comment 17 Gert Wollny 2017-07-16 21:17:51 UTC
An updated version of the patch set is currently under review: 

https://patchwork.freedesktop.org/series/25594/
Comment 18 Gert Wollny 2017-09-08 05:56:05 UTC
Fix applied and consolidated with 
c4741bbb6fb98f78551f9e42ae570dcc924e0031
Comment 19 Emil Velikov 2017-09-08 11:05:07 UTC
Note that the series adds an extra optimisation pass. As such it's not suitable for stable. You'll have to use mesa from git until 17.3 is out.
Comment 20 mirh 2018-03-26 13:13:53 UTC
http://www.graphicsfuzz.com/benchmark/android-v1.html
I'm still having the same problem with mesa-git and Firefox Nightly (HD 6310):

r600_shader_from_tgsi - GPR limit exceeded - shader requires 130 registers
EE r600_shader.c:183 r600_pipe_shader_create - translation from TGSI failed !
EE r600_state_common.c:872 r600_shader_select - Failed to build shader variant (type=1) -12
Comment 21 mirh 2018-03-26 18:26:40 UTC
This specific issue was reported in bug 105371; the discussion continues there.
Sorry for the bother.
