Bug 71944

Summary: vs-temp-array-mat3-index-col-rd test fails with SB
Product: Mesa Reporter: Martin Andersson <g02maran>
Component: Drivers/Gallium/r600Assignee: mesa-dev
Status: RESOLVED MOVED QA Contact:
Severity: normal    
Priority: medium    
Version: git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments: Shader dump when test fails
Shader dump when test passes
Output with R600_DEBUG=vs,sbdump
Output from PSC_DEBUG

Description Martin Andersson 2013-11-23 17:22:32 UTC
Created attachment 89688 [details]
Shader dump when test fails

I did a little investigation on why the piglit test: vs-temp-array-mat3-index-col-rd fails with SB.

If I disable SB with R600_DEBUG=nosb the test passes.

I did a bisect and found this commit:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=7ae9cc71f097af5ae1f83f77f75de2198849faca

If I revert that commit the test passes with SB, but I'm pretty sure that commit only exposed a bug that weren't hit before.

I looked at the shader dumps and the only thing I found that looked a little strange was this.

before: 7ae9cc71f097af5ae1f83f77f75de2198849faca
z: SETE               T0.z,  R[2+AR].y, KC0[2].y
w: SETE               T1.w,  R[2+AR].z, KC0[2].z
x: SETE               T1.x,  R[2+AR].x, KC0[2].x

after: 7ae9cc71f097af5ae1f83f77f75de2198849faca
w: SETE_DX10          T0.w,  R[2+AR].z, KC0[2].z
z: SETE_DX10          T0.z,  R[2+AR].y, KC0[2].y
x: SETE_DX10          T1.x,  R[0+AR].x, KC0[2].x

Is R[0+AR].x correct in the last line or should it be R[2+AR].x?

Tested with mesa master: 751e8697f2099a0ceaba09e0d8775e4636f209b1
Comment 1 Martin Andersson 2013-11-23 17:23:23 UTC
Created attachment 89689 [details]
Shader dump when test passes
Comment 2 Vadim Girlin 2013-11-23 18:31:50 UTC
(In reply to comment #0)
> If I revert that commit the test passes with SB, but I'm pretty sure that
> commit only exposed a bug that weren't hit before.

Yes, I think you are right.

> 
> I looked at the shader dumps and the only thing I found that looked a little
> strange was this.
> 
> before: 7ae9cc71f097af5ae1f83f77f75de2198849faca
> z: SETE               T0.z,  R[2+AR].y, KC0[2].y
> w: SETE               T1.w,  R[2+AR].z, KC0[2].z
> x: SETE               T1.x,  R[2+AR].x, KC0[2].x
> 
> after: 7ae9cc71f097af5ae1f83f77f75de2198849faca
> w: SETE_DX10          T0.w,  R[2+AR].z, KC0[2].z
> z: SETE_DX10          T0.z,  R[2+AR].y, KC0[2].y
> x: SETE_DX10          T1.x,  R[0+AR].x, KC0[2].x
> 
> Is R[0+AR].x correct in the last line or should it be R[2+AR].x?

It's not necessarily incorrect, IIRC components of the indirectly addressed array are considered as separate arrays by sb, they are allocated independently and may have different base GPR indices, e.g. yz components of the original array may be stored in R2.yz-R10.yz (if R0.yz and/or R1.yz are used for something else) and x component in R0.x-R8.x.

AFAICS the problem is here:

 0106      12      x: MOV                R1.x,  [0x40800000 4].x
 0108              y: MOV                R2.y,  [0x40000000 2].y
 0110              z: MUL                T0.z,  KC0[3].w, R1.x
 0112              w: MUL                T1.w,  KC0[3].z, R1.x
 0114  40800000 
 0115  40000000 
 0116      13      x: MOV                R0.x,  1.0
 0118              y: MUL                T0.y,  KC0[3].x, R1.x

Instruction 0106 writes 4 to R1.x, but instruction 0118 in the next group expects old value in R1.x (the value that was in R1.x before 0106). So it looks like scheduler failed to handle that data dependency correctly for some reason, instruction 0118 should be scheduled in the same group as 0106 or earlier.

Please attach the output with R600_DEBUG=vs,sbdump
Comment 3 Martin Andersson 2013-11-23 21:07:52 UTC
Created attachment 89694 [details]
Output with R600_DEBUG=vs,sbdump
Comment 4 Martin Andersson 2013-11-28 18:20:23 UTC
Created attachment 89964 [details]
Output from PSC_DEBUG
Comment 5 Martin Andersson 2013-11-28 18:20:53 UTC
I tried to look a little closer at the group of instructions where you thought the problem was. But I can't really wrap my head around how the scheduling works.

When is the data dependency checking done? Is it when one instruction is added to a group or after the whole group is done or even later?
Comment 6 Vadim Girlin 2013-11-28 20:06:52 UTC
(In reply to comment #5)
> I tried to look a little closer at the group of instructions where you
> thought the problem was. But I can't really wrap my head around how the
> scheduling works.

Yes, sometimes it looks unnecessarily complicated to me, rewriting some parts of the scheduler and adding new features on top of existing code made it a bit messy. I planned to rewrite it completely to make it more clean and simple, but I suspect it won't happen in foreseeable future.

> When is the data dependency checking done? Is it when one instruction is
> added to a group or after the whole group is done or even later?

I'm not sure about exact terminology and classification of dependencies, I said it's a data dependency but I guess problematic dependency in this case is actually called anti-dependency (write-after-read), and it's checked for the entire group in the post_scheduler::check_interferences ("interference" here means the case when two different values are allocated to the same register at some point - we should prevent it). The check is not performed per instruction because we want to take into account parallel execution and allow reading old value from GPR and writing new value to that GPR in the same group. This allows to process the groups like this:

 x: MOV R0.x, R0.y
 y: MOV R0.y, R0.x

Per-instruction check would disallow any of these instructions that comes first because it has dependency with the other one that is not processed yet.
The check uses register map (post_scheduler::regmap) where the mapping of GPR index/component to value is maintained. If at some point value V is stored in R0.x, then the map contains entry (R0.x => V) and we know that we can't put new value into R0.x until the entry for previous value in the map is removed. When we process first use of some value (scheduling is performed in the bottom-up order, so actually it's the last use in the program execution order), corresponding entry is added to regmap, when we process it's definition the entry is removed from the map so that new value can be placed into that register.

As for the data dependencies (read-after-write), the way of releasing instructions for scheduling ensures they are not broken - before scheduling alu block (one or more consecutive alu clauses) we compute the number of uses of each value that is defined in that block, and instruction that defines the value is released for scheduling only after all uses are scheduled, so the write is always scheduled after all reads (before all reads in normal non-bottom-up order).

In this particular case I think the check for anti-dependency doesn't work as expected because there are special cases - some values can be preallocated/pinned to registers before the scheduling, and probably some situations are handled incorrectly or not handled at all. There are special flags that are shown in the dumps (with sbdump option), e.g. "R23.x.6||FP@R1.x" means that value R23.x.6 (ssa-value R23.x with version 6, name contains original register used in source bytecode) has some special flags printed as "||", "F" and "P", and is stored in R1.x. Indirectly addressed GPR arrays are also allocated before the post-scheduler, and it seems there is some conflict between these preallocated registers that post-scheduler doesn't take into account.

Hope this helps, because I'm not sure when/if I'll be able to look into it myself. But I'll be glad to help as I can, so let me know if you have any other questions.
Comment 7 Martin Andersson 2013-12-01 16:23:43 UTC
Thanks for the explanation. I made a hack that prevented the "y: MUL T0.y,  KC0[3].x, R1.x" instruction from being added to group 13. Then those two groups became this:

0102  000000fd 00200c90    12      x: MOV                R1.x,  [0x40800000 4].x
0104  00002083 2f800090            y: MUL                T0.y,  KC0[3].x, R1.x
0106  000004fd 40400c90            z: MOV                R2.z,  [0x40400000 3].y
0108  80002883 6fa00090            w: MUL                T1.w,  KC0[3].z, R1.x
0110  40800000
0111  40400000
0112  000000f9 00000c90    13      x: MOV                R0.x,  1.0
0114  000000fd 20400c90            y: MOV                R2.y,  [0x40000000 2].x
0116  00002c83 4f800090            z: MUL                T0.z,  KC0[3].w, R1.x
0118  81104a02 6f800610            w: SETE_DX10          T0.w,  R[2+AR].z, KC0[2].z

The test passes with this hack, but shouldn't instruction 116 also be in conflict with instruction 102?
Comment 8 Vadim Girlin 2013-12-01 17:46:56 UTC
(In reply to comment #7)
> The test passes with this hack, but shouldn't instruction 116 also be in
> conflict with instruction 102?

Yes, it looks wrong too, probably incorrect result of that instruction is not critical for the test (or maybe it's even correct by accident).
Comment 9 GitLab Migration User 2019-09-18 19:11:54 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/472.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.