Bug 103142

Summary: R600g+sb: optimizer apparently stuck in an endless loop
Product: Mesa
Reporter: Gert Wollny <gw.fossdev>
Component: Drivers/Gallium/r600
Assignee: mesa-dev
Status: RESOLVED FIXED
QA Contact: mesa-dev
Severity: normal
Priority: medium
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Attachments:
  TGSI, byte code and logging output
  Shader triggering the endless loop
  Patch to fix shader in UE4 (added for completeness)

Description Gert Wollny 2017-10-07 22:37:27 UTC
Created attachment 134738
TGSI, byte code and logging output

With sb enabled, some shaders apparently trigger an endless loop in the sb optimizer.

The attached TGSI and log come from a shader in Unreal Editor version 4.17.1 that triggers this behaviour.

The log was created with 

  #define PSC_DEBUG 1 

in src/gallium/drivers/r600/sb/sb_sched.cpp.
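PSC_DEBUG is a compile-time switch, so the trace only appears after rebuilding the driver. The sketch below shows roughly how such a switch can gate trace output. The PSC_DUMP macro name matches the one mentioned in Comment 1, but its exact definition in sb_sched.cpp is an assumption here, and try_schedule() is a hypothetical function used only for illustration.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>

// Compile-time switch: 1 enables the scheduler trace, 0 compiles it out.
#define PSC_DEBUG 1

#if PSC_DEBUG
// Assumed wrapper: emits the wrapped statements only when PSC_DEBUG is set.
#define PSC_DUMP(a) do { a } while (0)
#else
#define PSC_DUMP(a)
#endif

// Hypothetical scheduler step: traces the slot it works on when tracing is
// enabled and returns the index of the next slot.
int try_schedule(int slot, std::ostream &out) {
    (void)out;  // avoids an unused-parameter warning when PSC_DEBUG is 0
    PSC_DUMP( out << "slot: " << slot << "\n"; );
    return slot + 1;
}
```

With PSC_DEBUG set to 0 the PSC_DUMP calls expand to nothing, so release builds pay no cost for the tracing code.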

The problem is probably related to the "FIXME rework this loop" comment at line 1808 of the same file. I also have the impression that it has something to do with MULADD when the sources are a mix of CONST and IMM operands.

I'd try to fix it myself, but right now I don't really have a clue how to approach this bug.

best, 
Gert
Comment 1 Gert Wollny 2017-10-09 07:51:15 UTC
Created attachment 134759
Shader triggering the endless loop

I think the previous log was not correct, i.e. it was not from the right shader. This new log shows different error messages. The endless loop happens in "post_scheduler".

I've run the code with R600_DEBUG=nocw,sbdump in addition to the PSC_DUMP output.
I've also tried R600_DEBUG=sbsafemath, but to no avail.

Snip of the log: 

# REGMAP :
    current_AR: R42.x.199||@R1.x
  current_AR is R42.x.199||@R1.x  trying to use R41.x.235||@R0.z
  current_AR is R42.x.199||@R1.x  trying to use R42.x.200@R10.w
  current_AR is R42.x.199||@R1.x  trying to use R44.x.77@R7.z
!!!!!! interf slot: 2  : ADD     t116||@R2.z,    A100.y[R41.x.235||@R0.z]_763F@R8.y, A100.y[R43.x.126@R2.z]_764F@R8.y
					    rels: A100.y[R41.x.235||@R0.z]_763F@R8.y :  <= R100.y.1F, R101.y.1F, R102.y.1F, R103.y.1F, R104.y.1F, R105.y.1F, R106.y.1F, R107.y.1F, R108.y.1F, R109.y.1F
					    rels: A100.y[R43.x.126@R2.z]_764F@R8.y :  <= R100.y.1F, R101.y.1F, R102.y.1F, R103.y.1F, R104.y.1F, R105.y.1F, R106.y.1F, R107.y.1F, R108.y.1F, R109.y.1F
!!!!!! interf slot: 3  : MOV     R43.z.49||@R10.w,    A100.y[R42.x.200@R10.w]_759F@R8.y
					    rels: A100.y[R42.x.200@R10.w]_759F@R8.y :  <= R100.y.1F, R101.y.1F, R102.y.1F, R103.y.1F, R104.y.1F, R105.y.1F, R106.y.1F, R107.y.1F, R108.y.1F, R109.y.1F
!!!!!! interf slot: 4  : MOV     R43.y.48||@R12.z,    A100.x[R44.x.77@R7.z]_755F@R8.x
					    rels: A100.x[R44.x.77@R7.z]_755F@R8.x :  <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
ci: discarding slots 28
discard_slots : packed_ops : 0
discarding slot 2 : ADD     t116||@R2.z,    A100.y[R41.x.235||@R0.z]_763F@R8.y, A100.y[R43.x.126@R2.z]_764F@R8.y
					    rels: A100.y[R41.x.235||@R0.z]_763F@R8.y :  <= R100.y.1F, R101.y.1F, R102.y.1F, R103.y.1F, R104.y.1F, R105.y.1F, R106.y.1F, R107.y.1F, R108.y.1F, R109.y.1F
					    rels: A100.y[R43.x.126@R2.z]_764F@R8.y :  <= R100.y.1F, R101.y.1F, R102.y.1F, R103.y.1F, R104.y.1F, R105.y.1F, R106.y.1F, R107.y.1F, R108.y.1F, R109.y.1F
discarding slot 3 : MOV     R43.z.49||@R10.w,    A100.y[R42.x.200@R10.w]_759F@R8.y
					    rels: A100.y[R42.x.200@R10.w]_759F@R8.y :  <= R100.y.1F, R101.y.1F, R102.y.1F, R103.y.1F, R104.y.1F, R105.y.1F, R106.y.1F, R107.y.1F, R108.y.1F, R109.y.1F
discarding slot 4 : MOV     R43.y.48||@R12.z,    A100.x[R44.x.77@R7.z]_755F@R8.x
					    rels: A100.x[R44.x.77@R7.z]_755F@R8.x :  <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
check_interferences: after: 
# REGMAP :
    current_AR: R42.x.199||@R1.x
update_local_interferences : [R26.x.7F R26.y.7F R26.z.7F R27.x.7F R27.y.7F R27.z.7F R28.x.7F R28.y.7F R28.z.7F R100.x.1F R101.x.1F R100.y.1F R101.y.1F R102.x.1F R102.y.1F R103.x.1F R104.x.1F R103.y.1F R104.y.1F R105.x.1F R105.y.1F R106.x.1F R107.x.1F R106.y.1F R107.y.1F R108.x.1F R108.y.1F R109.x.1F R109.y.1F R4.x.410||@R6.w R41.x.194||@R4.y R42.x.184||@R2.y R43.x.112||@R12.y R43.y.43||@R14.w R41.x.202||@R0.w R42.x.188||@R1.z R43.x.114||@R13.w R43.y.44||@R7.w R4.x.423||@R5.w R41.x.213||@R2.w R42.x.195||@R3.y R43.x.119||@R17.w R43.y.48||@R12.z R43.z.48||@R10.w R41.x.221||@R0.y R42.x.199||@R1.x R44.x.78||@R3.x R43.z.49||@R10.w R4.x.436||@R4.w R40.x.206||@R1.y R41.x.231||@R1.w R42.x.206||@R16.z R42.y.71||@R10.z R42.z.71||@R8.w R40.x.214||@R0.x R41.x.235||@R0.z R42.x.208||@R9.z R42.y.72||@R3.z R42.z.72||@R8.w t111||@R8.z t112||@R13.z t113||@R7.z t114||@R3.x t115||@R3.w t116||@R2.z ]
p_a_g: MOV     R42.x.206||@R16.z,    A100.x[R41.x.231||@R1.w]_760F@R8.x
					    rels: A100.x[R41.x.231||@R1.w]_760F@R8.x :  <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
slot: 2
current group:
slot 2 : MOV     R42.x.206||@R16.z,    A100.x[R41.x.231||@R1.w]_760F@R8.x
					    rels: A100.x[R41.x.231||@R1.w]_760F@R8.x :  <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
p_a_g: MOV     R43.z.48||@R10.w,    A100.x[R42.x.196@R11.z]_756F@R8.x
					    rels: A100.x[R42.x.196@R11.z]_756F@R8.x :  <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
slot: 3
current group:
slot 2 : MOV     R42.x.206||@R16.z,    A100.x[R41.x.231||@R1.w]_760F@R8.x
					    rels: A100.x[R41.x.231||@R1.w]_760F@R8.x :  <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
slot 3 : MOV     R43.z.48||@R10.w,    A100.x[R42.x.196@R11.z]_756F@R8.x
					    rels: A100.x[R42.x.196@R11.z]_756F@R8.x :  <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
p_a_g: MOV     R42.z.72||@R8.w,    A100.y[R41.x.236@R8.w]_765F@R8.y
					    rels: A100.y[R41.x.236@R8.w]_765F@R8.y :  <= R100.y.1F, R101.y.1F, R102.y.1F, R103.y.1F, R104.y.1F, R105.y.1F, R106.y.1F, R107.y.1F, R108.y.1F, R109.y.1F
slot: 4
current group:
slot 2 : MOV     R42.x.206||@R16.z,    A100.x[R41.x.231||@R1.w]_760F@R8.x
					    rels: A100.x[R41.x.231||@R1.w]_760F@R8.x :  <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
slot 3 : MOV     R43.z.48||@R10.w,    A100.x[R42.x.196@R11.z]_756F@R8.x
					    rels: A100.x[R42.x.196@R11.z]_756F@R8.x :  <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
slot 4 : MOV     R42.z.72||@R8.w,    A100.y[R41.x.236@R8.w]_765F@R8.y
					    rels: A100.y[R41.x.236@R8.w]_765F@R8.y :  <= R100.y.1F, R101.y.1F, R102.y.1F, R103.y.1F, R104.y.1F, R105.y.1F, R106.y.1F, R107.y.1F, R108.y.1F, R109.y.1F
p_a_g: ADD     t112||@R13.z,    A100.y[R42.x.188||@R1.z]_749F@R8.y, A100.y[R44.x.73@R2.x]_750F@R8.y
					    rels: A100.y[R42.x.188||@R1.z]_749F@R8.y :  <= R100.y.1F, R101.y.1F, R102.y.1F, R103.y.1F, R104.y.1F, R105.y.1F, R106.y.1F, R107.y.1F, R108.y.1F, R109.y.1F
					    rels: A100.y[R44.x.73@R2.x]_750F@R8.y :  <= R100.y.1F, R101.y.1F, R102.y.1F, R103.y.1F, R104.y.1F, R105.y.1F, R106.y.1F, R107.y.1F, R108.y.1F, R109.y.1F
   no suitable slots
p_a_g: ADD     t115||@R3.w,    A100.x[R41.x.231||@R1.w]_760F@R8.x, A100.x[R43.x.125@R3.w]_761F@R8.x
					    rels: A100.x[R41.x.231||@R1.w]_760F@R8.x :  <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
					    rels: A100.x[R43.x.125@R3.w]_761F@R8.x :  <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
   no suitable slots
p_a_g: MOV     R43.x.114||@R13.w,    A100.y[R42.x.188||@R1.z]_749F@R8.y
					    rels: A100.y[R42.x.188||@R1.z]_749F@R8.y :  <= R100.y.1F, R101.y.1F, R102.y.1F, R103.y.1F, R104.y.1F, R105.y.1F, R106.y.1F, R107.y.1F, R108.y.1F, R109.y.1F
   no suitable slots
p_a_g: ADD     t113||@R7.z,    A100.x[R42.x.195||@R3.y]_754F@R8.x, A100.x[R44.x.77@R7.z]_755F@R8.x
					    rels: A100.x[R42.x.195||@R3.y]_754F@R8.x :  <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
					    rels: A100.x[R44.x.77@R7.z]_755F@R8.x :  <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
   no suitable slots
p_a_g: MOV     R42.z.71||@R8.w,    A100.x[R41.x.232@R15.w]_762F@R8.x
					    rels: A100.x[R41.x.232@R15.w]_762F@R8.x :  <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
   no suitable slots
p_a_g: MOV     R42.y.72||@R3.z,    A100.y[R43.x.126@R2.z]_764F@R8.y
					    rels: A100.y[R43.x.126@R2.z]_764F@R8.y :  <= R100.y.1F, R101.y.1F, R102.y.1F, R103.y.1F, R104.y.1F, R105.y.1F, R106.y.1F, R107.y.1F, R108.y.1F, R109.y.1F
   no suitable slots
p_a_g: ADD     t111||@R8.z,    A100.x[R42.x.184||@R2.y]_744F@R8.x, A100.x[R44.x.72@R8.z]_745F@R8.x
					    rels: A100.x[R42.x.184||@R2.y]_744F@R8.x :  <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
					    rels: A100.x[R44.x.72@R8.z]_745F@R8.x :  <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
   no suitable slots
p_a_g: MOV     R43.y.43||@R14.w,    A100.x[R44.x.72@R8.z]_745F@R8.x
					    rels: A100.x[R44.x.72@R8.z]_745F@R8.x :  <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
   no suitable slots
p_a_g: MOV     R42.x.208||@R9.z,    A100.y[R41.x.235||@R0.z]_763F@R8.y
					    rels: A100.y[R41.x.235||@R0.z]_763F@R8.y :  <= R100.y.1F, R101.y.1F, R102.y.1F, R103.y.1F, R104.y.1F, R105.y.1F, R106.y.1F, R107.y.1F, R108.y.1F, R109.y.1F
   no suitable slots


...
Comment 2 Gert Wollny 2017-10-09 19:54:47 UTC
In summary, the optimizer gets stuck in an infinite loop in schedule_alu(),
because prepare_alu_group() does not find a proper scheduling.
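Why a failing prepare_alu_group() turns into a hang: if a pass of the scheduling loop can place zero instructions without changing any state, the next pass retries the identical situation, forever. The following C++ sketch is heavily simplified. AluInst, prepare_group() and schedule_all() are hypothetical and model only this idea, not the real sb data structures. The progress check reflects the idea behind the later workaround (bail out instead of spinning), but it is not that patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified ALU instruction: the value it uses as relative
// index (address register, AR), or -1 if it uses no relative addressing.
struct AluInst {
    int rel_index;
};

// Hypothetical stand-in for prepare_alu_group(): moves up to `max_slots`
// instructions that are compatible with `current_ar` out of `pending`
// and returns how many were placed in the group.
std::size_t prepare_group(std::vector<AluInst> &pending, int current_ar,
                          std::size_t max_slots) {
    std::size_t placed = 0;
    for (auto it = pending.begin(); it != pending.end() && placed < max_slots;) {
        if (it->rel_index == -1 || it->rel_index == current_ar) {
            it = pending.erase(it);
            ++placed;
        } else {
            ++it;
        }
    }
    return placed;
}

// Schedules all pending instructions. The naive loop
//     while (!pending.empty()) prepare_group(...);
// never terminates once no pending instruction fits. Here a pass that makes
// no progress ends scheduling with `false` instead.
bool schedule_all(std::vector<AluInst> pending, int current_ar) {
    const std::size_t kSlots = 5;  // x, y, z, w, t on VLIW5 parts such as BARTS
    while (!pending.empty()) {
        if (prepare_group(pending, current_ar, kSlots) == 0)
            return false;  // "no suitable slots": give up rather than spin
    }
    return true;
}
```

Returning false lets the caller give up on optimizing the shader instead of hanging. As described in Comment 6, the landed workaround additionally asserts in debug builds; the sketch omits that so it can return normally.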
Comment 3 Gert Wollny 2017-10-10 10:23:47 UTC
Created attachment 134771
Patch to fix shader in UE4 (added for completeness)
Comment 4 Gert Wollny 2017-10-10 10:25:34 UTC
The bug can be triggered on BARTS (HD 6850) by running Unreal Engine 4.17.1 or 4.18.0-pre3 with patch [1] applied, which fixes a shader that would otherwise use too many registers.

(Note that with a Mesa debug build, UE4Editor will likely crash at some point because of bug #102387 [2].)

[1] Attached "Patch to fix shader in UE4"
[2] https://bugs.freedesktop.org/show_bug.cgi?id=102387
Comment 5 Emil Velikov 2017-11-09 15:33:40 UTC
Gert, should we close this considering 69eee511c63 ("r600/sb: bail out if prepare_alu_group() doesn't find a proper scheduling") has landed?
Comment 6 Gert Wollny 2017-11-09 17:07:46 UTC
In debug mode an assertion fires as a reminder that this patch only works around the real, not yet understood bug. For that reason I think it would be better to keep this open (the aforementioned bug #102387 is at least of a similar nature).
Comment 7 Gert Wollny 2018-02-07 17:28:54 UTC
It seems sb tries to create an instruction that uses two distinct relative indices; this is forbidden, and the scheduler gets stuck.

From the post_scheduler dump:

 # 0.y => t175||FP@R0.y
  # 0.z => t194||FP@R0.z
  new current_AR assigned: R13.x.235@R0.w
  current_AR is R13.x.235@R0.w  trying to use R15.x.126@R1.x
  current_AR is R13.x.235@R0.w  trying to use R15.x.126@R1.x
!!!!!! interf slot: 0  : 
  ADD     t80@R1.x,    A26.y[R13.x.235@R0.w]_608F@R5.y,  \
                       A26.y[R15.x.126@R1.x]_609F@R5.y
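The constraint can be phrased as a per-instruction check: all relatively addressed sources must share one index value, because only one address-register value is current at a time. In the dump above, the ADD reads A26.y[R13.x.235] and A26.y[R15.x.126], which would fail such a check. This is a hypothetical C++17 sketch: Operand and uses_single_rel_index() are not sb code, and index values are reduced to plain integers.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical, simplified source operand: `rel_index` identifies the value
// used as relative index (e.g. 235 for R13.x.235), or is empty for a direct
// operand.
struct Operand {
    std::optional<int> rel_index;
};

// Returns true if every relatively addressed source uses the same index
// value, i.e. a single address-register value can serve the instruction.
bool uses_single_rel_index(const std::vector<Operand> &srcs) {
    std::optional<int> seen;
    for (const Operand &op : srcs) {
        if (!op.rel_index)
            continue;  // direct operand: needs no address register
        if (seen && *seen != *op.rel_index)
            return false;  // two distinct indices in one instruction
        seen = op.rel_index;
    }
    return true;
}
```

A check like this identifies instructions that no choice of current_AR can satisfy. It can only detect the problem, though; the fix has to stop the optimizer from creating such instructions in the first place.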
Comment 8 Gert Wollny 2018-02-14 09:52:51 UTC
Fixed in master as of c36172e387b68aed083bb751d48733919f59bef7
