Created attachment 134738 [details] TGSI, byte code and logging output

With sb enabled, some shaders apparently trigger an endless loop in the sb optimizer. The attached TGSI and log of a shader triggering this behaviour come from the Unreal Editor version 4.17.1. The log was created with

#define PSC_DEBUG 1

in src/gallium/drivers/r600/sb_sched.cpp.

The problem is probably related to the "FIXME rework this loop" on line 1808 in the same file, and I also have the impression that it has something to do with MULADD when the sources are a mix of CONST and IMM.

I'd try to fix it myself, but right now I don't really have a clue how to approach this bug.

best,
Gert
Created attachment 134759 [details] Shader triggering the endless loop

I think the last log was not correct, i.e. it was not the right shader. This new log shows different error messages. The endless loop is happening in "post_scheduler". I've run the code with R600_DEBUG=nocw,sbdump in addition to the PSC_DUMP. I've also tried R600_DEBUG=sbsafemath, but to no avail.

Snip of the log:

# REGMAP :
current_AR: R42.x.199||@R1.x
current_AR is R42.x.199||@R1.x trying to use R41.x.235||@R0.z
current_AR is R42.x.199||@R1.x trying to use R42.x.200@R10.w
current_AR is R42.x.199||@R1.x trying to use R44.x.77@R7.z
!!!!!! interf slot: 2 : ADD t116||@R2.z, A100.y[R41.x.235||@R0.z]_763F@R8.y, A100.y[R43.x.126@R2.z]_764F@R8.y
rels: A100.y[R41.x.235||@R0.z]_763F@R8.y : <= R100.y.1F, R101.y.1F, R102.y.1F, R103.y.1F, R104.y.1F, R105.y.1F, R106.y.1F, R107.y.1F, R108.y.1F, R109.y.1F
rels: A100.y[R43.x.126@R2.z]_764F@R8.y : <= R100.y.1F, R101.y.1F, R102.y.1F, R103.y.1F, R104.y.1F, R105.y.1F, R106.y.1F, R107.y.1F, R108.y.1F, R109.y.1F
!!!!!! interf slot: 3 : MOV R43.z.49||@R10.w, A100.y[R42.x.200@R10.w]_759F@R8.y
rels: A100.y[R42.x.200@R10.w]_759F@R8.y : <= R100.y.1F, R101.y.1F, R102.y.1F, R103.y.1F, R104.y.1F, R105.y.1F, R106.y.1F, R107.y.1F, R108.y.1F, R109.y.1F
!!!!!! interf slot: 4 : MOV R43.y.48||@R12.z, A100.x[R44.x.77@R7.z]_755F@R8.x
rels: A100.x[R44.x.77@R7.z]_755F@R8.x : <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
ci: discarding slots 28
discard_slots : packed_ops : 0
discarding slot 2 : ADD t116||@R2.z, A100.y[R41.x.235||@R0.z]_763F@R8.y, A100.y[R43.x.126@R2.z]_764F@R8.y
rels: A100.y[R41.x.235||@R0.z]_763F@R8.y : <= R100.y.1F, R101.y.1F, R102.y.1F, R103.y.1F, R104.y.1F, R105.y.1F, R106.y.1F, R107.y.1F, R108.y.1F, R109.y.1F
rels: A100.y[R43.x.126@R2.z]_764F@R8.y : <= R100.y.1F, R101.y.1F, R102.y.1F, R103.y.1F, R104.y.1F, R105.y.1F, R106.y.1F, R107.y.1F, R108.y.1F, R109.y.1F
discarding slot 3 : MOV R43.z.49||@R10.w, A100.y[R42.x.200@R10.w]_759F@R8.y
rels: A100.y[R42.x.200@R10.w]_759F@R8.y : <= R100.y.1F, R101.y.1F, R102.y.1F, R103.y.1F, R104.y.1F, R105.y.1F, R106.y.1F, R107.y.1F, R108.y.1F, R109.y.1F
discarding slot 4 : MOV R43.y.48||@R12.z, A100.x[R44.x.77@R7.z]_755F@R8.x
rels: A100.x[R44.x.77@R7.z]_755F@R8.x : <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
check_interferences: after:
# REGMAP :
current_AR: R42.x.199||@R1.x
update_local_interferences : [R26.x.7F R26.y.7F R26.z.7F R27.x.7F R27.y.7F R27.z.7F R28.x.7F R28.y.7F R28.z.7F R100.x.1F R101.x.1F R100.y.1F R101.y.1F R102.x.1F R102.y.1F R103.x.1F R104.x.1F R103.y.1F R104.y.1F R105.x.1F R105.y.1F R106.x.1F R107.x.1F R106.y.1F R107.y.1F R108.x.1F R108.y.1F R109.x.1F R109.y.1F R4.x.410||@R6.w R41.x.194||@R4.y R42.x.184||@R2.y R43.x.112||@R12.y R43.y.43||@R14.w R41.x.202||@R0.w R42.x.188||@R1.z R43.x.114||@R13.w R43.y.44||@R7.w R4.x.423||@R5.w R41.x.213||@R2.w R42.x.195||@R3.y R43.x.119||@R17.w R43.y.48||@R12.z R43.z.48||@R10.w R41.x.221||@R0.y R42.x.199||@R1.x R44.x.78||@R3.x R43.z.49||@R10.w R4.x.436||@R4.w R40.x.206||@R1.y R41.x.231||@R1.w R42.x.206||@R16.z R42.y.71||@R10.z R42.z.71||@R8.w R40.x.214||@R0.x R41.x.235||@R0.z R42.x.208||@R9.z R42.y.72||@R3.z R42.z.72||@R8.w t111||@R8.z t112||@R13.z t113||@R7.z t114||@R3.x t115||@R3.w t116||@R2.z ]
p_a_g: MOV R42.x.206||@R16.z, A100.x[R41.x.231||@R1.w]_760F@R8.x
rels: A100.x[R41.x.231||@R1.w]_760F@R8.x : <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
slot: 2
current group:
slot 2 : MOV R42.x.206||@R16.z, A100.x[R41.x.231||@R1.w]_760F@R8.x
rels: A100.x[R41.x.231||@R1.w]_760F@R8.x : <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
p_a_g: MOV R43.z.48||@R10.w, A100.x[R42.x.196@R11.z]_756F@R8.x
rels: A100.x[R42.x.196@R11.z]_756F@R8.x : <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
slot: 3
current group:
slot 2 : MOV R42.x.206||@R16.z, A100.x[R41.x.231||@R1.w]_760F@R8.x
rels: A100.x[R41.x.231||@R1.w]_760F@R8.x : <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
slot 3 : MOV R43.z.48||@R10.w, A100.x[R42.x.196@R11.z]_756F@R8.x
rels: A100.x[R42.x.196@R11.z]_756F@R8.x : <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
p_a_g: MOV R42.z.72||@R8.w, A100.y[R41.x.236@R8.w]_765F@R8.y
rels: A100.y[R41.x.236@R8.w]_765F@R8.y : <= R100.y.1F, R101.y.1F, R102.y.1F, R103.y.1F, R104.y.1F, R105.y.1F, R106.y.1F, R107.y.1F, R108.y.1F, R109.y.1F
slot: 4
current group:
slot 2 : MOV R42.x.206||@R16.z, A100.x[R41.x.231||@R1.w]_760F@R8.x
rels: A100.x[R41.x.231||@R1.w]_760F@R8.x : <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
slot 3 : MOV R43.z.48||@R10.w, A100.x[R42.x.196@R11.z]_756F@R8.x
rels: A100.x[R42.x.196@R11.z]_756F@R8.x : <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
slot 4 : MOV R42.z.72||@R8.w, A100.y[R41.x.236@R8.w]_765F@R8.y
rels: A100.y[R41.x.236@R8.w]_765F@R8.y : <= R100.y.1F, R101.y.1F, R102.y.1F, R103.y.1F, R104.y.1F, R105.y.1F, R106.y.1F, R107.y.1F, R108.y.1F, R109.y.1F
p_a_g: ADD t112||@R13.z, A100.y[R42.x.188||@R1.z]_749F@R8.y, A100.y[R44.x.73@R2.x]_750F@R8.y
rels: A100.y[R42.x.188||@R1.z]_749F@R8.y : <= R100.y.1F, R101.y.1F, R102.y.1F, R103.y.1F, R104.y.1F, R105.y.1F, R106.y.1F, R107.y.1F, R108.y.1F, R109.y.1F
rels: A100.y[R44.x.73@R2.x]_750F@R8.y : <= R100.y.1F, R101.y.1F, R102.y.1F, R103.y.1F, R104.y.1F, R105.y.1F, R106.y.1F, R107.y.1F, R108.y.1F, R109.y.1F
no suitable slots
p_a_g: ADD t115||@R3.w, A100.x[R41.x.231||@R1.w]_760F@R8.x, A100.x[R43.x.125@R3.w]_761F@R8.x
rels: A100.x[R41.x.231||@R1.w]_760F@R8.x : <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
rels: A100.x[R43.x.125@R3.w]_761F@R8.x : <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
no suitable slots
p_a_g: MOV R43.x.114||@R13.w, A100.y[R42.x.188||@R1.z]_749F@R8.y
rels: A100.y[R42.x.188||@R1.z]_749F@R8.y : <= R100.y.1F, R101.y.1F, R102.y.1F, R103.y.1F, R104.y.1F, R105.y.1F, R106.y.1F, R107.y.1F, R108.y.1F, R109.y.1F
no suitable slots
p_a_g: ADD t113||@R7.z, A100.x[R42.x.195||@R3.y]_754F@R8.x, A100.x[R44.x.77@R7.z]_755F@R8.x
rels: A100.x[R42.x.195||@R3.y]_754F@R8.x : <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
rels: A100.x[R44.x.77@R7.z]_755F@R8.x : <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
no suitable slots
p_a_g: MOV R42.z.71||@R8.w, A100.x[R41.x.232@R15.w]_762F@R8.x
rels: A100.x[R41.x.232@R15.w]_762F@R8.x : <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
no suitable slots
p_a_g: MOV R42.y.72||@R3.z, A100.y[R43.x.126@R2.z]_764F@R8.y
rels: A100.y[R43.x.126@R2.z]_764F@R8.y : <= R100.y.1F, R101.y.1F, R102.y.1F, R103.y.1F, R104.y.1F, R105.y.1F, R106.y.1F, R107.y.1F, R108.y.1F, R109.y.1F
no suitable slots
p_a_g: ADD t111||@R8.z, A100.x[R42.x.184||@R2.y]_744F@R8.x, A100.x[R44.x.72@R8.z]_745F@R8.x
rels: A100.x[R42.x.184||@R2.y]_744F@R8.x : <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
rels: A100.x[R44.x.72@R8.z]_745F@R8.x : <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
no suitable slots
p_a_g: MOV R43.y.43||@R14.w, A100.x[R44.x.72@R8.z]_745F@R8.x
rels: A100.x[R44.x.72@R8.z]_745F@R8.x : <= R100.x.1F, R101.x.1F, R102.x.1F, R103.x.1F, R104.x.1F, R105.x.1F, R106.x.1F, R107.x.1F, R108.x.1F, R109.x.1F
no suitable slots
p_a_g: MOV R42.x.208||@R9.z, A100.y[R41.x.235||@R0.z]_763F@R8.y
rels: A100.y[R41.x.235||@R0.z]_763F@R8.y : <= R100.y.1F, R101.y.1F, R102.y.1F, R103.y.1F, R104.y.1F, R105.y.1F, R106.y.1F, R107.y.1F, R108.y.1F, R109.y.1F
no suitable slots
...
In summary, the optimizer gets stuck in an infinite loop in schedule_alu, because prepare_alu_group() does not find a proper scheduling.
Created attachment 134771 [details] [review] Patch to fix shader in UE4 (added for completeness)
The bug can be triggered on BARTS (Radeon HD 6850) by using Unreal Engine 4.17.1 or 4.18.0-pre3 with this patch applied [1] to fix a shader that would otherwise use too many registers. (Note that in mesa-debug mode UE4Editor will likely crash at some point because of #102387 [2].)

[1] Attached "Patch to fix shader in UE4"
[2] https://bugs.freedesktop.org/show_bug.cgi?id=102387
Gert, should we close this considering 69eee511c63 ("r600/sb: bail out if prepare_alu_group() doesn't find a proper scheduling") has landed?
In debug mode an assertion fires as a reminder that this patch only works around the real, yet-to-be-understood bug. For that reason I think it would be better to keep this bug open (at least the aforementioned bug #102387 is of a similar nature).
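
For readers unfamiliar with the workaround discussed above, the general idea of bailing out of a scheduling loop that makes no progress might be sketched as follows. This is not the Mesa code: schedule_all, fits, and the default of five slots per group are hypothetical names and values chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical model: each int is a pending ALU op, and `fits` says whether
// an op can be placed in the group currently being built.
// Returns true if every op was scheduled, false if a pass placed nothing.
bool schedule_all(std::vector<int> ready,
                  const std::function<bool(int)>& fits,
                  std::size_t slots_per_group = 5) {
    assert(slots_per_group > 0);
    while (!ready.empty()) {
        std::vector<int> leftover;
        std::size_t placed = 0;
        for (int op : ready) {
            if (placed < slots_per_group && fits(op))
                ++placed;                // op goes into the current group
            else
                leftover.push_back(op);  // retry in a later group
        }
        if (placed == 0)
            return false;  // no progress: bail out instead of looping forever
        ready.swap(leftover);
    }
    return true;
}
```

Without the `placed == 0` check, an op that never fits would keep the loop spinning, which matches the symptom reported here. Returning false only reports the failure; it does not explain why no schedule exists.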
It seems sb is trying to create an operation that uses two distinct relative indices within the same instruction. This is forbidden, so the scheduler gets stuck. From the post-scheduler dump:

# 0.y => t175||FP@R0.y
# 0.z => t194||FP@R0.z
new current_AR assigned: R13.x.235@R0.w
current_AR is R13.x.235@R0.w trying to use R15.x.126@R1.x
current_AR is R13.x.235@R0.w trying to use R15.x.126@R1.x
!!!!!! interf slot: 0 : ADD t80@R1.x, A26.y[R13.x.235@R0.w]_608F@R5.y, \
A26.y[R15.x.126@R1.x]_609F@R5.y
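
The constraint described above (at most one distinct relative index per instruction) could be checked with something like the sketch below. The source struct and uses_single_rel_index are hypothetical and do not correspond to sb's real data structures; an empty rel_index is assumed to mean a directly addressed operand.

```cpp
#include <string>
#include <vector>

// Hypothetical operand description, not sb's value/node types.
struct source {
    std::string name;       // e.g. "A26.y"
    std::string rel_index;  // e.g. "R13.x.235"; empty if not relatively addressed
};

// Returns true if all relatively addressed sources share one index value.
bool uses_single_rel_index(const std::vector<source>& srcs) {
    std::string seen;  // first relative index encountered, if any
    for (const source& s : srcs) {
        if (s.rel_index.empty())
            continue;  // direct operands place no demand on the address register
        if (!seen.empty() && seen != s.rel_index)
            return false;  // a second, different index would be needed
        seen = s.rel_index;
    }
    return true;
}
```

Applied to the ADD in the dump, the two sources are indexed by R13.x.235 and R15.x.126, so such a check would reject the instruction.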
Fixed in master as of c36172e387b68aed083bb751d48733919f59bef7