Created attachment 89688 [details]
Shader dump when test fails
I did a little investigation on why the piglit test: vs-temp-array-mat3-index-col-rd fails with SB.
If I disable SB with R600_DEBUG=nosb the test passes.
I did a bisect and found this commit:
If I revert that commit the test passes with SB, but I'm pretty sure that commit only exposed a bug that weren't hit before.
I looked at the shader dumps and the only thing I found that looked a little strange was this.
z: SETE T0.z, R[2+AR].y, KC0.y
w: SETE T1.w, R[2+AR].z, KC0.z
x: SETE T1.x, R[2+AR].x, KC0.x
w: SETE_DX10 T0.w, R[2+AR].z, KC0.z
z: SETE_DX10 T0.z, R[2+AR].y, KC0.y
x: SETE_DX10 T1.x, R[0+AR].x, KC0.x
Is R[0+AR].x correct in the last line or should it be R[2+AR].x?
Tested with mesa master: 751e8697f2099a0ceaba09e0d8775e4636f209b1
Created attachment 89689 [details]
Shader dump when test passes
(In reply to comment #0)
> If I revert that commit the test passes with SB, but I'm pretty sure that
> commit only exposed a bug that weren't hit before.
Yes, I think you are right.
> I looked at the shader dumps and the only thing I found that looked a little
> strange was this.
> before: 7ae9cc71f097af5ae1f83f77f75de2198849faca
> z: SETE T0.z, R[2+AR].y, KC0.y
> w: SETE T1.w, R[2+AR].z, KC0.z
> x: SETE T1.x, R[2+AR].x, KC0.x
> after: 7ae9cc71f097af5ae1f83f77f75de2198849faca
> w: SETE_DX10 T0.w, R[2+AR].z, KC0.z
> z: SETE_DX10 T0.z, R[2+AR].y, KC0.y
> x: SETE_DX10 T1.x, R[0+AR].x, KC0.x
> Is R[0+AR].x correct in the last line or should it be R[2+AR].x?
It's not necessarily incorrect, IIRC components of the indirectly addressed array are considered as separate arrays by sb, they are allocated independently and may have different base GPR indices, e.g. yz components of the original array may be stored in R2.yz-R10.yz (if R0.yz and/or R1.yz are used for something else) and x component in R0.x-R8.x.
AFAICS the problem is here:
0106 12 x: MOV R1.x, [0x40800000 4].x
0108 y: MOV R2.y, [0x40000000 2].y
0110 z: MUL T0.z, KC0.w, R1.x
0112 w: MUL T1.w, KC0.z, R1.x
0116 13 x: MOV R0.x, 1.0
0118 y: MUL T0.y, KC0.x, R1.x
Instruction 0106 writes 4 to R1.x, but instruction 0118 in the next group expects old value in R1.x (the value that was in R1.x before 0106). So it looks like scheduler failed to handle that data dependency correctly for some reason, instruction 0118 should be scheduled in the same group as 0106 or earlier.
Please attach the output with R600_DEBUG=vs,sbdump
Created attachment 89694 [details]
Output with R600_DEBUG=vs,sbdump
Created attachment 89964 [details]
Output from PSC_DEBUG
I tried to look a little closer at the group of instructions where you thought the problem was. But I can't really wrap my head around how the scheduling works.
When is the data dependency checking done? Is it when one instruction is added to a group or after the whole group is done or even later?
(In reply to comment #5)
> I tried to look a little closer at the group of instructions where you
> thought the problem was. But I can't really wrap my head around how the
> scheduling works.
Yes, sometimes it looks unnecessarily complicated to me, rewriting some parts of the scheduler and adding new features on top of existing code made it a bit messy. I planned to rewrite it completely to make it more clean and simple, but I suspect it won't happen in foreseeable future.
> When is the data dependency checking done? Is it when one instruction is
> added to a group or after the whole group is done or even later?
I'm not sure about exact terminology and classification of dependencies, I said it's a data dependency but I guess problematic dependency in this case is actually called anti-dependency (write-after-read), and it's checked for the entire group in the post_scheduler::check_interferences ("interference" here means the case when two different values are allocated to the same register at some point - we should prevent it). The check is not performed per instruction because we want to take into account parallel execution and allow reading old value from GPR and writing new value to that GPR in the same group. This allows to process the groups like this:
x: MOV R0.x, R0.y
y: MOV R0.y, R0.x
Per-instruction check would disallow any of these instructions that comes first because it has dependency with the other one that is not processed yet.
The check uses register map (post_scheduler::regmap) where the mapping of GPR index/component to value is maintained. If at some point value V is stored in R0.x, then the map contains entry (R0.x => V) and we know that we can't put new value into R0.x until the entry for previous value in the map is removed. When we process first use of some value (scheduling is performed in the bottom-up order, so actually it's the last use in the program execution order), corresponding entry is added to regmap, when we process it's definition the entry is removed from the map so that new value can be placed into that register.
As for the data dependencies (read-after-write), the way of releasing instructions for scheduling ensures they are not broken - before scheduling alu block (one or more consecutive alu clauses) we compute the number of uses of each value that is defined in that block, and instruction that defines the value is released for scheduling only after all uses are scheduled, so the write is always scheduled after all reads (before all reads in normal non-bottom-up order).
In this particular case I think the check for anti-dependency doesn't work as expected because there are special cases - some values can be preallocated/pinned to registers before the scheduling, and probably some situations are handled incorrectly or not handled at all. There are special flags that are shown in the dumps (with sbdump option), e.g. "R23.x.6||FP@R1.x" means that value R23.x.6 (ssa-value R23.x with version 6, name contains original register used in source bytecode) has some special flags printed as "||", "F" and "P", and is stored in R1.x. Indirectly addressed GPR arrays are also allocated before the post-scheduler, and it seems there is some conflict between these preallocated registers that post-scheduler doesn't take into account.
Hope this helps, because I'm not sure when/if I'll be able to look into it myself. But I'll be glad to help as I can, so let me know if you have any other questions.
Thanks for the explanation. I made a hack that prevented the "y: MUL T0.y, KC0.x, R1.x" instruction from being added to group 13. Then those two groups became this:
0102 000000fd 00200c90 12 x: MOV R1.x, [0x40800000 4].x
0104 00002083 2f800090 y: MUL T0.y, KC0.x, R1.x
0106 000004fd 40400c90 z: MOV R2.z, [0x40400000 3].y
0108 80002883 6fa00090 w: MUL T1.w, KC0.z, R1.x
0112 000000f9 00000c90 13 x: MOV R0.x, 1.0
0114 000000fd 20400c90 y: MOV R2.y, [0x40000000 2].x
0116 00002c83 4f800090 z: MUL T0.z, KC0.w, R1.x
0118 81104a02 6f800610 w: SETE_DX10 T0.w, R[2+AR].z, KC0.z
The test passes with this hack, but shouldn't instruction 116 also be in conflict with instruction 102?
(In reply to comment #7)
> The test passes with this hack, but shouldn't instruction 116 also be in
> conflict with instruction 102?
Yes, it looks wrong too, probably incorrect result of that instruction is not critical for the test (or maybe it's even correct by accident).
-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/472.