Bug 85505

Summary:	i965/fs: Better heuristics on when to drop from SIMD16 to SIMD8, on register spilling
Product:	Mesa	Reporter:	Eero Tamminen <eero.t.tamminen>
Component:	Drivers/DRI/i965	Assignee:	Matt Turner <mattst88>
Status:	RESOLVED MOVED	QA Contact:	Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity:	normal
Priority:	medium
Version:	git
Hardware:	Other
OS:	All
Whiteboard:
i915 platform:		i915 features:
Bug Depends on:
Bug Blocks:	77547

Description Eero Tamminen 2014-10-27 09:01:03 UTC

Currently the SIMD16 version of a shader is dropped whenever it spills registers.

If the SIMD8 version also spills the same registers (or the same number of them), it most likely makes sense to keep the SIMD16 version.
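
To illustrate the rule proposed above, here is a minimal sketch. It assumes the compiler can report a spill count for each variant; the function and parameter names are hypothetical and are not Mesa API. Treating "SIMD16 spills no more than SIMD8" as the cut-off is also an assumption, the description only covers the equal case.

```python
def keep_simd16_by_spill_count(simd8_spills: int, simd16_spills: int) -> bool:
    """Decide whether to keep the SIMD16 variant (hypothetical helper).

    simd8_spills / simd16_spills: number of registers each variant spills.
    """
    # No spilling in SIMD16: there is nothing to decide.
    if simd16_spills == 0:
        return True
    # Assumption: SIMD16 is still worth keeping when SIMD8 spills
    # at least as many registers.
    return simd16_spills <= simd8_spills
```

With this rule, a shader that spills 4 registers in both widths keeps its SIMD16 variant, while one that spills only in SIMD16 falls back to SIMD8 as it does today.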

Comment 1 Matt Turner 2014-10-27 16:25:45 UTC

I'd be surprised to see a shader that spilled the same number of registers in SIMD8 and SIMD16. But yes, there are improvements we should make.

I've noticed that we often schedule the SIMD8 program to minimize delays between dependent instructions at the cost of higher register usage, but since the SIMD16 program uses more registers we're not able to use the same schedule. So instead we schedule the SIMD16 program to minimize register usage, which drastically increases latency. I suspect that in cases like this the SIMD16 program is slower.

I've been planning to look at this, so I'll assign this bug to myself.

Comment 2 Petri Latvala 2014-10-28 15:34:55 UTC

I modified the code that discards the SIMD16 shader when it does register spilling to instead accept it. Tested GpuTest volplosion with the resulting Mesa.

Before (using SIMD8): frame time 244ms
After (using SIMD16, spilling): frame time 235ms

Comment 3 Petri Latvala 2014-10-28 15:58:37 UTC

More anecdotal evidence: SynMark2, test OglShMapVsm: SIMD8 shader is 2% faster. The SIMD16 shader also does a lot more spilling.

Comment 4 Eero Tamminen 2014-11-03 13:03:33 UTC

That was on HSW GT2. On GT3e it's the same: forcing SIMD16 even with register spilling gives a few percent better results with Volplosion and a few percent worse results with ShMapVsm.

The difference between these tests is that Volplosion spills only a little relative to the number of instructions in it and has no texture accesses, so the latency compensation from SIMD16 improves performance. ShMapVsm spills more and does shadow buffer accesses.

A good heuristic for how serious register spilling is should take into account how many instructions the shader has vs. how many high-latency fetches it needs to do: spilled registers from scratch, pull parameters, and textures through the sampler.

Because the color and depth buffers alone are larger than a normal LLC, I think a first rule of thumb for these accesses could be to use the system memory fetch latency for all of them and compare that against the number of instructions. If the ratio is good enough, there's no need to drop to SIMD8, even if the SIMD8 version wouldn't spill at all.
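
The rule of thumb above could be sketched roughly as follows. Everything here is an assumption for illustration: the `ShaderStats` fields, the 200-cycle memory latency, and the threshold of 1.0 are placeholders, not measured values or anything taken from Mesa.

```python
from dataclasses import dataclass


@dataclass
class ShaderStats:
    """Hypothetical per-shader counts for the SIMD16 variant."""
    instructions: int      # total instruction count
    scratch_accesses: int  # spill/fill accesses to scratch space
    pull_loads: int        # pull parameter loads
    texture_fetches: int   # sampler accesses


def latency_ratio(stats: ShaderStats, mem_latency_cycles: int = 200) -> float:
    """Estimated memory latency per instruction.

    Every high-latency access is charged the same (assumed) system
    memory latency, as the rule of thumb suggests.
    """
    fetches = stats.scratch_accesses + stats.pull_loads + stats.texture_fetches
    return fetches * mem_latency_cycles / max(stats.instructions, 1)


def keep_simd16_by_latency(stats: ShaderStats, max_ratio: float = 1.0) -> bool:
    """Keep SIMD16 while the latency-to-instruction ratio stays low enough."""
    return latency_ratio(stats) <= max_ratio


# Made-up numbers, only to show the two kinds of shader discussed above.
long_shader_few_spills = ShaderStats(2000, 4, 0, 0)
short_shader_many_fetches = ShaderStats(300, 20, 0, 8)
print(keep_simd16_by_latency(long_shader_few_spills))     # True
print(keep_simd16_by_latency(short_shader_many_fetches))  # False
```

A real threshold would have to be tuned per platform against benchmarks such as the two mentioned in this bug.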

Comment 5 GitLab Migration User 2019-09-25 18:52:40 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1459.