Summary: | i965/fs: Better heuristics on when to drop from SIMD16 to SIMD8, on register spilling | | |
---|---|---|---|
Product: | Mesa | Reporter: | Eero Tamminen <eero.t.tamminen> |
Component: | Drivers/DRI/i965 | Assignee: | Matt Turner <mattst88> |
Status: | RESOLVED MOVED | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Severity: | normal | | |
Priority: | medium | | |
Version: | git | | |
Hardware: | Other | | |
OS: | All | | |
Whiteboard: | | | |
i915 platform: | | i915 features: | |
Bug Depends on: | | | |
Bug Blocks: | 77547 | | |
Description
Eero Tamminen
2014-10-27 09:01:03 UTC
I'd be surprised to see a shader that spilled the same number of registers in SIMD8 and SIMD16. But yes, there are improvements we should make. I've noticed that we often schedule the SIMD8 program to minimize delays between dependent instructions at the cost of higher register usage, but since the SIMD16 program uses more registers, we're not able to use the same schedule for it. So instead we schedule the SIMD16 program to minimize register usage, which drastically increases latency. I suspect that in cases like this the SIMD16 program is slower. I've been planning to look at this, so I'll assign this bug to myself.

I modified the code that discards the SIMD16 shader when it spills registers so that it accepts the shader instead, and tested GpuTest Volplosion with the resulting Mesa.

Before (using SIMD8): frame time 244ms
After (using SIMD16, spilling): frame time 235ms

More anecdotal evidence: in SynMark2, test OglShMapVsm, the SIMD8 shader is 2% faster. The SIMD16 shader also does a lot more spilling.

That was on HSW GT2. On GT3e it's the same: forcing SIMD16 even with register spilling gives a few percent better results with Volplosion and a few percent worse results with ShMapVsm.

The difference between these tests is that Volplosion spills only a little relative to the number of instructions in it and has no texture accesses, so the latency hiding from SIMD16 improves performance. ShMapVsm spills more and does shadow buffer accesses.

A good heuristic for how serious register spilling is should take into account how many instructions the shader has versus how many large-latency fetches it needs to do: spilled registers from scratch, pull parameters, and textures through the sampler. Because the color and depth buffers alone are larger than a normal LLC, I think a first rule of thumb could be to assume system memory fetch latency for all of these accesses and compare that against the number of instructions.
If the ratio is good enough, there's no need to drop to SIMD8, even if the SIMD8 program wouldn't spill at all.

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1459.
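
For illustration, the rule of thumb proposed in the comments above (charge every scratch, pull-parameter, and sampler access the system memory fetch latency, then compare that against the instruction count) could be sketched roughly as follows. This is not code from Mesa: `ShaderStats`, `latency_ratio`, `keep_simd16`, the 300-cycle latency, and the threshold are all hypothetical names and values chosen for the example, and real numbers would have to come from measurements.

```python
from dataclasses import dataclass

# Assumed placeholder for system memory fetch latency, in cycles.
# Not a measured value for any real GPU.
ASSUMED_MEMORY_LATENCY_CYCLES = 300


@dataclass
class ShaderStats:
    """Static counts for one compiled shader variant (hypothetical)."""
    instructions: int      # total instruction count
    scratch_reads: int     # fills of spilled registers from scratch
    pull_loads: int        # pull parameter loads
    sampler_fetches: int   # texture fetches through the sampler


def latency_ratio(stats, latency=ASSUMED_MEMORY_LATENCY_CYCLES):
    """Estimated fetch latency per instruction; lower is better."""
    if stats.instructions <= 0:
        raise ValueError("shader must contain at least one instruction")
    fetches = stats.scratch_reads + stats.pull_loads + stats.sampler_fetches
    return fetches * latency / stats.instructions


def keep_simd16(simd16_stats, threshold=2.0):
    """Keep the spilling SIMD16 program if its ratio is good enough.

    The threshold is an arbitrary example value, not a tuned one.
    """
    return latency_ratio(simd16_stats) <= threshold


# Made-up counts, not taken from Volplosion or ShMapVsm.
light_spilling = ShaderStats(instructions=1000, scratch_reads=4,
                             pull_loads=0, sampler_fetches=0)
heavy_spilling = ShaderStats(instructions=200, scratch_reads=20,
                             pull_loads=2, sampler_fetches=8)
```

With these made-up counts, `keep_simd16(light_spilling)` accepts the SIMD16 program and `keep_simd16(heavy_spilling)` rejects it. That matches the direction of the observations in the thread (little spilling and no texture access favors SIMD16), but it says nothing about where a real threshold should sit.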