Bug 99398 - 1% perf drop in GFXBench v4 tessellation test with "nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences"
Status: VERIFIED WONTFIX
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
 
Reported: 2017-01-13 14:25 UTC by Eero Tamminen
Modified: 2017-09-15 08:05 UTC

Description Eero Tamminen 2017-01-13 14:25:52 UTC
The following commit drops GFXBench v4 tessellation test performance (onscreen and offscreen) by 0.5 - 1% on GEN8 and GEN9:
-----------------------------------------------
commit 3371de38f282c77461bbe5007a2fec2a975776df
Author:     Kenneth Graunke <kenneth@whitecape.org>
AuthorDate: Tue Aug 9 01:44:38 2016 -0700
Commit:     Timothy Arceri <timothy.arceri@collabora.com>
CommitDate: Mon Jan 9 12:32:16 2017 +1100

    nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences
-----------------------------------------------
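
For readers unfamiliar with the opcodes in the commit title: a bcsel is a boolean select and b2f is a bool-to-float conversion, so a select between 0.0 and +/- 1.0 can be expressed as a conversion, optionally negated. The following is a minimal Python model of that equivalence, not Mesa code. The function names are borrowed from the NIR opcode names, and the exact set of patterns the commit matches is an assumption here.

```python
import math

def bcsel(cond, a, b):
    """Illustrative model of a boolean select: a if cond else b."""
    return a if cond else b

def b2f(cond):
    """Illustrative model of bool-to-float: True -> 1.0, False -> 0.0."""
    return 1.0 if cond else 0.0

def same_float(x, y):
    """Compare value and sign, so that 0.0 and -0.0 are told apart."""
    return x == y and math.copysign(1.0, x) == math.copysign(1.0, y)

def check_equivalences():
    """Check the assumed rewrites for both values of the condition."""
    for cond in (True, False):
        # bcsel(c, 1.0, 0.0) behaves like b2f(c)
        assert same_float(bcsel(cond, 1.0, 0.0), b2f(cond))
        # bcsel(c, 0.0, 1.0) behaves like b2f(!c)
        assert same_float(bcsel(cond, 0.0, 1.0), b2f(not cond))
        # bcsel(c, -1.0, -0.0) behaves like -b2f(c)
        assert same_float(bcsel(cond, -1.0, -0.0), -b2f(cond))
    return True
```

The rewrite removes the select itself, which is consistent with the shaders below each losing one instruction, but it says nothing about how the backend schedules the result.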

The commit affects the tessellation evaluation shaders in this test case:
-----------------------------------------------
 Native code for unnamed tessellation evaluation shader GLSL10
-SIMD8 shader: 324 instructions. 1 loops. 7774 cycles. 0:0 spills:fills. Promoted 11 constants. Compacted 5184 to 3312 bytes (36%)
+SIMD8 shader: 323 instructions. 1 loops. 8050 cycles. 0:0 spills:fills. Compacted 5168 to 3280 bytes (37%)
...
 Native code for unnamed tessellation evaluation shader GLSL15
-SIMD8 shader: 328 instructions. 1 loops. 7778 cycles. 0:0 spills:fills. Promoted 11 constants. Compacted 5248 to 3360 bytes (36%)
+SIMD8 shader: 327 instructions. 1 loops. 8034 cycles. 0:0 spills:fills. Promoted 11 constants. Compacted 5232 to 3328 bytes (36%)
-----------------------------------------------
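
To put the cycle counts above in relative terms, here is a small Python sketch that parses two stats lines and computes the percentage change. The helper names (parse_stats, percent_change) and the regular expression are illustrative assumptions, not part of any Mesa tool. The cycle figures are the numbers printed in the stats lines, not measured frame times.

```python
import re

# Matches the "N instructions. N loops. N cycles." part of a stats line.
STATS_RE = re.compile(r"(\d+) instructions\. (\d+) loops\. (\d+) cycles\.")

def parse_stats(line):
    """Extract instruction, loop and cycle counts from one stats line."""
    m = STATS_RE.search(line)
    if m is None:
        raise ValueError("not a shader stats line: %r" % line)
    return {
        "instructions": int(m.group(1)),
        "loops": int(m.group(2)),
        "cycles": int(m.group(3)),
    }

def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before

# Stats for the GLSL10 shader, copied from the diff above.
before = parse_stats("SIMD8 shader: 324 instructions. 1 loops. 7774 cycles. 0:0 spills:fills.")
after = parse_stats("SIMD8 shader: 323 instructions. 1 loops. 8050 cycles. 0:0 spills:fills.")

cycle_delta = percent_change(before["cycles"], after["cycles"])
print("cycles: %+.2f%%" % cycle_delta)
```

For the two shaders above, the cycle counts rise by roughly 3.5% (7774 to 8050) and 3.3% (7778 to 8034), while the instruction counts drop by one in each case.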

They're otherwise the same shader, except that the latter defines USE_GEOMSHADER, which adds a few extra calculations for frag_color at the end.

At a quick look, there aren't many differences in the generated assembly.  Two sel() instructions have changed to extra cmp.ge.f0() instructions.  The end of the shader seems to be scheduled worse.  The first shader now also has more register bank conflicts.

-> I'm OK if this is handled as WONTFIX; I just wanted to document it.

Besides a marginal change in the pixels rendered by GpuTest Piano, this commit caused no functional or measurable performance changes in the rest of the tests we're tracking.

