Bug 103517 - 10% perf drop in CPU bound tests with "i965: Call gen6_upload_push_constants() even when the stage is disabled"
Summary: 10% perf drop in CPU bound tests with "i965: Call gen6_upload_push_constants(...
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965 (show other bugs)
Version: git
Hardware: Other All
: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
Depends on:
Reported: 2017-10-30 15:34 UTC by Eero Tamminen
Modified: 2019-08-12 17:12 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Description Eero Tamminen 2017-10-30 15:34:07 UTC
ezBench bisected following commit:
commit 66b4a7a79e3da03e11233acd32b6e37bf3969dd8
Author:     Kenneth Graunke <kenneth@whitecape.org>
AuthorDate: Fri Oct 20 15:53:50 2017 -0700
Commit:     Kenneth Graunke <kenneth@whitecape.org>
CommitDate: Tue Oct 24 16:14:04 2017 -0700

    i965: Call gen6_upload_push_constants() even when the stage is disabled.
    This properly sets stage_state->push_constant_dirty = true, so that we
    emit 3DSTATE_CONSTANT_XS to disable the constant buffer for the shader
    stage.  It also sets stage_state->push_const_size = 0.
    Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>

As the cause for ~10% performance drop in following CPU bound 3D tests:
- GfxBench v4 Driver2 onscreen/offscreen
- SynMark v7 Batch6, Batch7 & Multithread

Most visible the drop was on SKL i5 6600K (it's never TDP limited), then SKL GT4e & BDW GT2/GT3.  On other platforms it's visible in trends, but impact is a bit smaller and variance is too large to bisect it well.

There were no related performance improvements visible.

When looking at the system state before and after the change, main difference is that CPU usage seems to have moved slightly from user-space to something requested from kernel side, which makes sense for this commit.
Comment 1 Eero Tamminen 2017-11-09 16:50:13 UTC
According to ezBench:
* Indicated commit also broke Xonotic rendering, in addition to performance regression
* This commit fixed the rendering issue, but not the performance drop:
commit 877dd14e88de6ee115617a2a6412e86ba67db443
Author:     Kenneth Graunke <kenneth@whitecape.org>
AuthorDate: Fri Oct 20 15:38:52 2017 -0700
Commit:     Kenneth Graunke <kenneth@whitecape.org>
CommitDate: Mon Oct 30 20:38:08 2017 -0700

    i965: Don't flag BRW_NEW_SURFACES unless some push constants are dirty.
    Due to a gaffe on my part, we were re-emitting all binding table entries
    on every single draw call.  The push_constant_packets atom listens to
    BRW_NEW_DRAW_CALL, but skips emitting 3DSTATE_CONSTANT_XS for each stage
    unless stage_state->push_constants_dirty is true.  However, it flagged
    BRW_NEW_SURFACES unconditionally at the end, by mistake.
    Instead, it should only flag it if we actually emit 3DSTATE_CONSTANT_XS
    for a stage.  We can move it a few lines up, inside the loop - the early
    continues will skip over it if push constants aren't dirty for a stage.
    With INTEL_NO_HW=1 set, improves performance of GFXBench5 gl_driver_2
    on Apollolake at 1280x720 by 1.01122% +/- 0.470723% (n=35).
    Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

(And indeed, the perf is still at the regressed level, it hasn't improved noticeably since the regression.)
Comment 2 Eero Tamminen 2019-07-12 11:29:11 UTC
i965 CPU bound perf is still low, it actually dropped another ~10% year later, around mid November in 2018.
Comment 3 Kenneth Graunke 2019-07-14 07:25:58 UTC
Closing as WONTFIX.  Not planning to optimize the CPU usage of i965 constant handling.  iris does a much better job at this already.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.