Currently SIMD16 versions of shaders are dropped on register spilling.
If the SIMD8 version spills the same number of registers, it most likely makes sense to keep the SIMD16 version.
I'd be surprised to see a shader that spilled the same number of registers in SIMD8 and SIMD16. But yes, there are improvements we should make.
I've noticed that we often schedule the SIMD8 program to minimize delays between dependent instructions at the cost of higher register usage, but since the SIMD16 program uses more registers we're not able to perform the same schedule. So instead we schedule the SIMD16 program to minimize register usage, which drastically increases latency. I suspect in cases like this the SIMD16 program is slower.
I've been planning to look at this, so I'll assign this bug to myself.
I modified the code that discards the SIMD16 shader when it spills registers so that it accepts the shader instead. Tested GpuTest Volplosion with the resulting Mesa.
Before (using SIMD8): frame time 244ms
After (using SIMD16, spilling): frame time 235ms
More anecdotal evidence: SynMark2, test OglShMapVsm: SIMD8 shader is 2% faster. The SIMD16 shader also does a lot more spilling.
That was on HSW GT2. On GT3e the result is the same: forcing SIMD16 even with register spilling gives a few percent better results with Volplosion and a few percent worse with ShMapVsm.
The difference between these tests is that Volplosion spills only a little compared to the number of instructions in it and has no texture accesses, so the latency compensation from SIMD16 improves performance. ShMapVsm spills more and does shadow buffer accesses.
A good heuristic for how serious register spilling is should take into account how many instructions the shader has versus how many large-latency fetches it needs to do: spilled registers from scratch, pull parameters, and textures through the sampler.
Because the color and depth buffers alone are larger than a normal LLC, I think a first rule of thumb for these accesses could be to assume system memory fetch latency for all of them and compare that against the number of instructions. If the ratio is good enough, there's no need to drop to SIMD8, even if the SIMD8 version wouldn't spill at all.
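To make the rule of thumb above concrete, here is a rough sketch in C. It is not Mesa code: the struct, the function names, the 300-cycle latency figure and the threshold are all made up for illustration, and real values would have to come from measurement.

```c
#include <stdbool.h>

/* Hypothetical per-shader statistics; not an existing Mesa structure. */
struct shader_stats {
    unsigned instructions;     /* total instructions in the compiled shader */
    unsigned scratch_fills;    /* reads of spilled registers from scratch */
    unsigned pull_loads;       /* pull parameter loads */
    unsigned texture_fetches;  /* texture fetches through the sampler */
};

/* Assumed placeholder: treat every fetch as a system memory access. */
#define ASSUMED_MEM_LATENCY_CYCLES 300u

/* Estimated cycles spent waiting on large-latency fetches. */
unsigned long long fetch_latency_cost(const struct shader_stats *s)
{
    unsigned long long fetches = (unsigned long long)s->scratch_fills
                               + s->pull_loads
                               + s->texture_fetches;
    return fetches * ASSUMED_MEM_LATENCY_CYCLES;
}

/*
 * Keep the spilling SIMD16 shader when it has enough instructions
 * relative to its estimated fetch latency. min_ratio is a tunable,
 * hypothetical threshold (instructions per latency cycle).
 */
bool keep_spilling_simd16(const struct shader_stats *simd16, double min_ratio)
{
    unsigned long long cost = fetch_latency_cost(simd16);

    if (cost == 0)
        return true; /* no large-latency fetches at all */

    return (double)simd16->instructions / (double)cost >= min_ratio;
}
```

With these made-up numbers, a shader with 1000 instructions and 2 scratch fills would be kept at a threshold of 0.5, while one with 200 instructions and 35 fetches in total would fall back to SIMD8.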