Bug 99398

Summary: 1% perf drop in GFXBench v4 tessellation test with "nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences"
Product: Mesa
Reporter: Eero Tamminen <eero.t.tamminen>
Component: Drivers/DRI/i965
Assignee: Intel 3D Bugs Mailing List <intel-3d-bugs>
Status: VERIFIED WONTFIX
QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal
Priority: medium
Version: git
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:

Description Eero Tamminen 2017-01-13 14:25:52 UTC
The following commit reduces GFXBench v4 tessellation test performance (onscreen and offscreen) by 0.5 - 1% on GEN8 and GEN9:
-----------------------------------------------
commit 3371de38f282c77461bbe5007a2fec2a975776df
Author:     Kenneth Graunke <kenneth@whitecape.org>
AuthorDate: Tue Aug 9 01:44:38 2016 -0700
Commit:     Timothy Arceri <timothy.arceri@collabora.com>
CommitDate: Mon Jan 9 12:32:16 2017 +1100

    nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences
-----------------------------------------------
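For readers unfamiliar with the NIR opcode names in the commit title, here is a minimal Python sketch of the identity the title describes. The functions `bcsel` and `b2f` below are hypothetical models of the NIR opcodes, not Mesa code, and the four patterns checked are an assumption based on the title alone (the commit's actual rules are not quoted in this report).

```python
import math


def bcsel(cond: bool, a: float, b: float) -> float:
    """Hypothetical model of NIR bcsel: pick a when cond is true, else b."""
    return a if cond else b


def b2f(cond: bool) -> float:
    """Hypothetical model of NIR b2f: True -> 1.0, False -> 0.0."""
    return 1.0 if cond else 0.0


def same_float(x: float, y: float) -> bool:
    """Equality that also distinguishes +0.0 from -0.0."""
    return x == y and math.copysign(1.0, x) == math.copysign(1.0, y)


# Assumed rewrite patterns (inferred from the commit title, not from its diff).
for cond in (False, True):
    assert same_float(bcsel(cond, 1.0, 0.0), b2f(cond))
    assert same_float(bcsel(cond, 0.0, 1.0), b2f(not cond))
    assert same_float(bcsel(cond, -1.0, -0.0), -b2f(cond))
    assert same_float(bcsel(cond, -0.0, -1.0), -b2f(not cond))
```

The sketch only shows that both forms compute the same value; whether the b2f form is cheaper depends on how the backend lowers it, which is what this bug is about.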

The commit affects the tessellation evaluation shaders in this test case:
-----------------------------------------------
 Native code for unnamed tessellation evaluation shader GLSL10
-SIMD8 shader: 324 instructions. 1 loops. 7774 cycles. 0:0 spills:fills. Promoted 11 constants. Compacted 5184 to 3312 bytes (36%)
+SIMD8 shader: 323 instructions. 1 loops. 8050 cycles. 0:0 spills:fills. Compacted 5168 to 3280 bytes (37%)
...
 Native code for unnamed tessellation evaluation shader GLSL15
-SIMD8 shader: 328 instructions. 1 loops. 7778 cycles. 0:0 spills:fills. Promoted 11 constants. Compacted 5248 to 3360 bytes (36%)
+SIMD8 shader: 327 instructions. 1 loops. 8034 cycles. 0:0 spills:fills. Promoted 11 constants. Compacted 5232 to 3328 bytes (36%)
-----------------------------------------------
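To put the stat lines above in proportion, the following Python sketch parses the instruction and cycle counts and prints the relative change. The regular expression and the helper names `parse_stats` and `pct_change` are illustrative assumptions, not part of any Mesa or shader-db tooling; the numbers are copied from the lines above.

```python
import re

# Matches the "N instructions. N loops. N cycles." part of a stat line.
STAT_RE = re.compile(r"(\d+) instructions\. (\d+) loops\. (\d+) cycles\.")


def parse_stats(line: str) -> dict:
    """Extract instruction, loop, and cycle counts from one stat line."""
    match = STAT_RE.search(line)
    if match is None:
        raise ValueError(f"no shader stats found in: {line!r}")
    instructions, loops, cycles = (int(g) for g in match.groups())
    return {"instructions": instructions, "loops": loops, "cycles": cycles}


def pct_change(before: int, after: int) -> float:
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before


# (before, after) stat lines, shortened to the fields that are parsed.
PAIRS = {
    "GLSL10": ("SIMD8 shader: 324 instructions. 1 loops. 7774 cycles.",
               "SIMD8 shader: 323 instructions. 1 loops. 8050 cycles."),
    "GLSL15": ("SIMD8 shader: 328 instructions. 1 loops. 7778 cycles.",
               "SIMD8 shader: 327 instructions. 1 loops. 8034 cycles."),
}

results = {}
for name, (old_line, new_line) in PAIRS.items():
    old, new = parse_stats(old_line), parse_stats(new_line)
    results[name] = (new["instructions"] - old["instructions"],
                     pct_change(old["cycles"], new["cycles"]))
    print(f"{name}: {results[name][0]:+d} instructions, "
          f"{results[name][1]:+.2f}% cycles")
```

Each shader loses one instruction while the compiler's cycle estimate rises by roughly 3.5% and 3.3%. These are static estimates, so they need not match the 0.5 - 1% drop measured in the benchmark.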

They are the same shader, except that the latter defines USE_GEOMSHADER, which causes a few extra calculations for frag_color at the end.

At a quick look, there are not many differences in the generated assembly. Two sel() instructions have changed into extra cmp.ge.f0() instructions. The end of the shader seems to be scheduled worse. The first shader now also has more register bank conflicts.

-> I'm OK if this is handled as WONTFIX, I just wanted to document it.

Apart from a marginal change in the pixels rendered by GpuTest Piano, this commit caused no functional or measurable performance changes in the rest of the tests we are tracking.
