Summary: | 6% performance drop in GpuTest v0.7 Volplosion with "remove GLSL IR optimisation loop" (-> fails to compile as SIMD16) | ||
---|---|---|---|
Product: | Mesa | Reporter: | Eero Tamminen <eero.t.tamminen> |
Component: | Drivers/DRI/i965 | Assignee: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Status: | RESOLVED MOVED | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Severity: | normal | ||
Priority: | medium | CC: | kenneth, mattst88, t_arceri |
Version: | git | ||
Hardware: | Other | ||
OS: | All | ||
See Also: |
https://bugs.freedesktop.org/show_bug.cgi?id=99221 https://bugs.freedesktop.org/show_bug.cgi?id=101064 |
||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: | Register liveness & fragmentation for the SIMD16 assembly before the change |
Description
Eero Tamminen
2017-05-03 13:36:40 UTC
After the change, the main fragment shader no longer compiles as SIMD16 (the benchmark seems to compile shader 7 twice, so you get the warning twice):

FS compile failed: Failure to register allocate. Reduce number of live scalar values to avoid this.

The resulting SIMD8 shader has about the same number of instructions (nearly two thousand) as before, with the following changes:
- One mad() more
- One add() & mul() less
- Slightly more register bank conflicts
- At most 71 live registers instead of the earlier 61

-> somehow this change messes up the current register allocation.

Before the change, the SIMD16 variant of the shader had at most 109 live registers.

Looks like this regression could/should have been caught by shader-db:

LOST: shaders/closed/gputest/volplosion/7.shader_test FS SIMD16

Separately, Curro is having an interesting time with this particular shader with his new scheduler. Perhaps if that is sorted out, the regression will be fixed.

(In reply to Matt Turner from comment #2)
> Looks like this regression could/should have been caught by shader-db:
>
> LOST: shaders/closed/gputest/volplosion/7.shader_test FS SIMD16

Sorry about that - I saw 48 LOST and 48 GAINED in a random assortment of programs and didn't think much of it. There are always some teetering on the edge, and most of them aren't crazy ALU-bound like volplosion, where it makes such a large difference.

Created attachment 131197 [details]
Register liveness & fragmentation for the SIMD16 assembly before the change
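A minimal sketch of how LOST lines like the one quoted above could be filtered so that a known ALU-bound shader does not get lost among dozens of LOST/GAINED entries. It assumes the report text uses exactly the "LOST: <path> <stage> SIMD<n>" shape quoted in this thread. The function name, the watch list and the second path in the sample are hypothetical and not part of shader-db.

```python
import re

# Matches lines of the form quoted in this thread, e.g.
# "LOST: shaders/closed/gputest/volplosion/7.shader_test FS SIMD16".
# Assumption: the report uses exactly this shape for per-program entries.
LINE_RE = re.compile(r"^(LOST|GAINED):\s+(\S+)\s+(\S+)\s+(SIMD\d+)\s*$")


def lost_programs(report_text, watch=("volplosion",)):
    """Return (path, stage, width) for LOST entries whose path contains
    any of the watched substrings (hypothetical helper)."""
    hits = []
    for line in report_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            # Summary lines such as "LOST: 17 GAINED: 40" do not match.
            continue
        status, path, stage, width = match.groups()
        if status == "LOST" and any(name in path for name in watch):
            hits.append((path, stage, width))
    return hits


if __name__ == "__main__":
    sample = (
        "LOST: shaders/closed/gputest/volplosion/7.shader_test FS SIMD16\n"
        "GAINED: shaders/example/1.shader_test FS SIMD16\n"  # made-up path
        "LOST: 17 GAINED: 40\n"
    )
    for entry in lost_programs(sample):
        print(entry)
```

Which shaders belong on the watch list is a judgment call; the point is only that a short list of shaders known to be sensitive to SIMD16 could be checked separately from the overall LOST/GAINED totals.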
(In reply to Matt Turner from comment #2)
> Looks like this regression could/should have been caught by shader-db:
>
> LOST: shaders/closed/gputest/volplosion/7.shader_test FS SIMD16
>
>
> Separately, Curro is having an interesting time with this particular shader
> with his new scheduler. Perhaps if that is sorted out, the regression will
> be fixed.

Right. As Ken said, there were differences noticed, but it would be pretty difficult to not see any difference considering the change. The results from BDW as per the commit message were:

LOST: 17
GAINED: 40

Looking at the output, the only real difference I can see is the ordering of the final nir instructions, which seems to make a big difference to the scheduler. It appears that when the vectors are split in nir, the instructions are grouped together as per the execution order of the vectors in the shader. In contrast, with GLSL IR the instructions seem to be ordered such that they are grouped slightly more towards executing more instructions on a channel one after the other. This is probably enough to free up some registers.

I suspect the only thing we can do is wait for Curro's new scheduler. We could revert the change, but that just feels like it hides the problem rather than fixes it, and would be a step backwards IMO.

By the way, I should also note that I did look into a large number of the regressions when working on this series (otherwise I couldn't have gotten to this point), and all the remaining regressions (besides a couple of instructions here and there) appeared to be scheduler related, as per the commit message.

I was able to do better bisection with ezBench on SKL GT2. There the drop in Volplosion was slightly smaller (5%), and FurMark actually improved by 1%.

The previous "nir: shuffle constants to the top" commit also affected both of these tests. According to ezBench, on SKL GT2 Volplosion dropped by 1% and FurMark improved by 2%.
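As a rough back-of-the-envelope check on the register pressure discussed in this thread, the peak live-register counts from the description can be compared against the 128 general registers an EU thread has on these GPUs. This is only a sketch: the linear scaling from the SIMD8 to the SIMD16 peak is an assumption made for illustration, the function name is hypothetical, and it says nothing about what the register allocator actually did.

```python
# General register file size per EU thread on these Intel GPUs.
GRF_PER_THREAD = 128


def estimate_simd16_peak(simd8_peak, ref_simd8=61, ref_simd16=109):
    """Two crude estimates of the SIMD16 peak for a given SIMD8 peak.

    ref_simd8/ref_simd16 are the peaks reported in this bug for the
    shader before the change. Assumption: the SIMD16 peak scales
    linearly with the SIMD8 peak, which real shaders need not obey.
    """
    scaled = simd8_peak * ref_simd16 / ref_simd8  # same ratio as before
    doubled = simd8_peak * 2                      # every value takes 2 regs
    return scaled, doubled


if __name__ == "__main__":
    scaled, doubled = estimate_simd16_peak(71)  # SIMD8 peak after the change
    print(f"ratio-scaled estimate: {scaled:.1f} of {GRF_PER_THREAD}")
    print(f"naive doubling:        {doubled} of {GRF_PER_THREAD}")
```

Under these assumptions the ratio-scaled estimate lands just under the register file size and naive doubling lands above it, which would be consistent with a shader sitting right at the edge of fitting in SIMD16. Part of the register file also holds the thread payload, so the usable budget is smaller than 128.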
Most of the perf regression got fixed less than a year ago, in Jan/Feb 2018, but the main shader still fails to build as SIMD16 and therefore perf remains below the original level.

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1593.