Bug 103632 - [i965 bisected] commit 6ee4b352c90e93c82d900328716c1068a7188b91 regress ASTROKILL and EVERSPACE on Haswell
Summary: [i965 bisected] commit 6ee4b352c90e93c82d900328716c1068a7188b91 regress ASTRO...
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965 (show other bugs)
Version: 17.3
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
Depends on:
Reported: 2017-11-08 23:59 UTC by Darius Spitznagel
Modified: 2017-11-16 08:31 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Description Darius Spitznagel 2017-11-08 23:59:25 UTC
Hello Intel Mesa team,

ASTROKILL and EVERSPACE (both OpenGL) get a performance hit by 66% on my Haswell Iris Pro 5200 by this commit...

6ee4b352c90e93c82d900328716c1068a7188b91 is the first bad commit
commit 6ee4b352c90e93c82d900328716c1068a7188b91
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Thu Sep 28 19:03:38 2017 -0700

    i965: Use prog->info.num_images for needs_dc computation
    This should be just as good as looking in prog_data but removes our one
    state setup dependency on brw_stage_prog_data::nr_image_param.
    Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

:040000 040000 30226198325dfb57e721eed2536d5f6a8f8d6068 db3921cc2c5ee969cef70b422f60bb744b5c144d M	src

Checking out to parent commit restores full performance in both games.
Both games use UE4 Engine.
ASTROKILL (Version comes also with a Vulkan build which is broken even with parent commit.
I will bisect this later.

Both games work with latest stable Mesa 17.2.4 and also the Vulkan build of ASTROKILL.

I hope you find a fix before Mesa 17.3.0 will be released.
Comment 1 Darius Spitznagel 2017-11-14 22:36:03 UTC
Hello devs,

regression is now fixed in mesa-17.3.0-rc4 and mesa master git-5041ea96a0.
This report can be closed.

Thank you.
Comment 2 Matt Turner 2017-11-14 23:45:59 UTC
Could you do us a huge favor and bisect the fix?

I'm surprised to hear it's fixed. There's nothing in the commit log that indicates anyone has investigated the bug.
Comment 3 Darius Spitznagel 2017-11-15 17:48:28 UTC
(In reply to Matt Turner from comment #2)
> Could you do us a huge favor and bisect the fix?
> I'm surprised to hear it's fixed. There's nothing in the commit log that
> indicates anyone has investigated the bug.

I'm glad to help Matt...

4cf6b9e7ed8a962f50b5e89ae36d6b7f3a92b96e is the first fixed commit
commit 4cf6b9e7ed8a962f50b5e89ae36d6b7f3a92b96e
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Tue Oct 31 00:56:24 2017 -0700

    i965: properly initialize brw->cs.base.stage to MESA_SHADER_COMPUTE
    This has a bit of a surprising effect:
    For the render pipeline, the upload_sampler_state_table atom emits
    3DSTATE_BINDING_TABLE_POINTERS_XS.  It tries to avoid this for compute:
       if (GEN_GEN >= 7 && stage_state->stage != MESA_SHADER_COMPUTE) {
          /* Emit a 3DSTATE_SAMPLER_STATE_POINTERS_XS packet. */
          genX(emit_sampler_state_pointers_xs)(brw, stage_state);
       } ...
    However, we were failing to initialize brw->cs.base.stage, so it was
    left as 0 (MESA_SHADER_VERTEX), causing this condition to break.  We
    then emitted 3DSTATE_SAMPLER_STATE_POINTERS_VS in GPGPU mode, when
    trying to upload CS samplers.  Nothing good can come of this.
    Found by inspection while debugging a GPU hang.  Jordan believes this
    helps the Deus Ex: Mankind Divided benchmark mode's stability when
    running with shader cache.
    Cc: mesa-stable@lists.freedesktop.org
    Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
    Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
    (cherry picked from commit a16dc04ad51c32e5c7d136e4dd6273d983385d3f)

:040000 040000 84b0b8b9d018cebba864022ac039a00923dd4cb7 2e21e6e28301fafab7bb35f10ebc833864d1d656 M	src
Comment 4 Kenneth Graunke 2017-11-16 08:31:30 UTC
Thanks for tracking that down, Darius!  Apparently there was a pretty nasty bug...the L3 config code was doing:

      const struct gl_program *prog =

but because we'd failed to set brw->cs.stage = MESA_SHADER_COMPUTE, it was initialized to 0 aka MESA_SHADER_VERTEX.  So, when calculating the data cache needs, prog_data properly pointed at the compute shader, but prog incorrectly was the vertex shader.

That means Jason's commit (the one regressing things) caused us to start considering the vertex shader's number of images rather than the compute shader.  That was probably near 0, so we thought we didn't need to bother allocating any cache to DC.  But, his patch was fine...and in fact, the bogus prog had been around for quite a while.  The real bug was the one I fixed.

While looking at this, I found another bug in the area, and sent a patch:

Again, thanks for helping us catch these issues!

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.