The following commit drops SynMark2 CSDof test performance on all platforms supporting compute shaders:

commit 4d35683d91e3d61bf14b76d801bf6ae17237e162
Author: Ian Romanick <ian.d.romanick@intel.com>
Date:   Wed Oct 19 08:53:10 2016 -0700

    nir: Optimize integer division and modulus with 1

    The previous power-of-two rules didn't catch idiv (because i965 doesn't
    set lower_idiv) and imod cases.  The udiv and umod cases should have
    been caught, but I included them for orthogonality.

    This fixes silly code observed from compute shaders with
    local_size_[xy] = 1.

The commit looks like a clear optimization, so I assume this regression is some kind of bad interaction between optimization passes. On SKL GT2 the drop is 4.5%, and it is larger on GT4(e). INTEL_DEBUG=perf reports a lot of register spilling and warns about inefficient fallback code for CS variable indexing with this test. No other tests besides CSDof were affected, but it's the only test in our set that currently spills registers.
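For context, here is a rough sketch (not the i965 backend's actual code) of the kind of local invocation ID arithmetic these NIR rules clean up; with local_size_x/y = 1 the divisions and moduli are by 1 and can be folded away:

#version 430

// Sketch only: the driver conceptually derives gl_LocalInvocationID from a
// linear channel index with udiv/umod by the workgroup dimensions.  With
// local_size_x/y = 1 those operations become divisions and moduli by 1,
// which the new NIR rules fold to the identity / zero.
layout(local_size_x = 1, local_size_y = 1, local_size_z = 64) in;

layout(std430, binding = 0) buffer Out { uvec4 ids[]; };

uvec3 local_id_from_index(uint index)
{
    uvec3 size = gl_WorkGroupSize;                  // (1, 1, 64) here
    uint x = index % size.x;                        // umod by 1 -> 0
    uint y = (index / size.x) % size.y;             // udiv/umod by 1 -> 0
    uint z = index / (size.x * size.y);             // udiv by 1*1 -> index
    return uvec3(x, y, z);
}

void main()
{
    ids[gl_GlobalInvocationID.z] =
        uvec4(local_id_from_index(gl_LocalInvocationIndex), 0u);
}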
I see the same perf regression in my results.
Strange. I don't see shader-db differences, and I just recaptured the CSDof shaders and don't see differences there either. None of the shaders seem to use intdiv and only one uses intmod (by 0xc0).
Jordan noted that I likely needed to revert

    i965/cs: Use udiv/umod for local IDs
    i965/cs: Don't use a thread channel ID for small local sizes

as well in order to see a difference. Indeed, I do, though it looks like the three patches helped instruction and cycle counts in four compute shaders by small amounts. Notably, no differences in spills or fills.
Confirming Matt's observation.

Reverting the indicated change from Mesa HEAD doesn't affect performance. However, going back to the version before this commit (or the commits before it) shows 4-5% better perf than the version built from the commit, i.e. changes made after the indicated commit also have an effect on the CSDof test.
(In reply to Eero Tamminen from comment #4)
> Confirming Matt's observation.
>
> Reverting the indicated change from Mesa HEAD doesn't affect performance.
> However, going back to the version before this commit (or the commits
> before it) shows 4-5% better perf than the version built from the commit,
> i.e. changes made after the indicated commit also have an effect on the
> CSDof test.

When reverting:

4d35683 nir: Optimize integer division and modulus with 1

did you also revert these?

64c3d73 i965/cs: Don't use a thread channel ID for small local sizes
1fa000a i965/cs: Use udiv/umod for local IDs

It would be good to know if taking the current master and reverting these 3 patches regains the 4-5%.
(In reply to Jordan Justen from comment #5)
> When reverting:
>
> 4d35683 nir: Optimize integer division and modulus with 1
>
> Did you also revert these?

No.

> 64c3d73 i965/cs: Don't use a thread channel ID for small local sizes
> 1fa000a i965/cs: Use udiv/umod for local IDs
>
> It would be good to know if taking the current master and reverting these
> 3 patches regains the 4-5%.

Tried that now. If one builds commit 4d35683, reverting just that commit fixes the drop. With HEAD that's not enough, but reverting all 3 patches does fix the drop there too.
Note: performance-wise, I'm less worried about the perf drop in this test than about these INTEL_DEBUG=perf warnings (and the resulting spilling):

-----------------
Unsupported form of variable indexing in CS; falling back to very inefficient code generation
-----------------
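To illustrate, a hypothetical pattern (not taken from the actual CSDof shaders) that can trigger this kind of fallback is non-constant indexing into a temporary array:

#version 430
layout(local_size_x = 8, local_size_y = 8) in;

layout(std430, binding = 0) buffer SrcBuf { vec4 src[]; };
layout(std430, binding = 1) buffer DstBuf { vec4 dst[]; };

void main()
{
    // A temporary array indexed with a value only known at run time.  The
    // backend cannot keep such an array purely in registers, so it falls
    // back to the slow path the INTEL_DEBUG=perf warning mentions, which
    // also raises register pressure and can lead to spilling.
    vec4 taps[16];
    for (int i = 0; i < 16; i++)
        taps[i] = src[gl_GlobalInvocationID.x * 16u + uint(i)];

    uint idx = gl_LocalInvocationIndex % 16u;       // dynamic, non-constant index
    dst[gl_GlobalInvocationID.x] = taps[idx];
}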
Probably not worth tracking this old regression - we've since improved performance a lot. Closing.
verified/closed