Bug 91788 - [HSW Regression] Synmark2_v6 Multithread performance case FPS reduced by 36%
Summary: [HSW Regression] Synmark2_v6 Multithread performance case FPS reduced by 36%
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965 (show other bugs)
Version: unspecified
Hardware: Other All
: medium normal
Assignee: Ian Romanick
QA Contact: Intel 3D Bugs Mailing List
Depends on:
Reported: 2015-08-28 08:07 UTC by wendy.wang
Modified: 2015-10-15 04:32 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:

Xorg.0.log info (18.40 KB, text/plain)
2015-08-28 08:18 UTC, ye.tian
drop_image_unit_validation.patch (8.63 KB, patch)
2015-08-29 13:15 UTC, Francisco Jerez
Details | Splinter Review

Description wendy.wang 2015-08-28 08:07:56 UTC
System Environment:
Mesa Regression: yes, but cannot bisect now

Failed Platform: HSW
Observed HSW performance FPS reduced by 36% when running Synmark2_v6 Multithread case

Bug detailed description:
Compare with good commit, Synmark2_v6 Multithread case performance FPS dropped by 36%

Currently cannot bisect, only find good and bad commit.
Good commit: 
commit 87cea61b9e2681e5365e989c7fa7a0298e4005fa    
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Mon Aug 10 15:35:21 2015 -0400

    radeonsi: add new OLAND pci id

Bad commit: 
commit ecebd3dbfcb769b44e99733279c8fb0745818708   
Author: Ilia Mirkin <imirkin@alum.mit.edu>
Date:   Wed Aug 19 18:43:47 2015 -0400

    glsl: expose textureQueryLod in GLSL 4.00+ fragment shaders

    See issue from the ARB_texture_query_lod spec for LOD vs Lod confusion:

        (3) The core specification uses the "Lod" spelling, not "LOD".  Should
            this extension be modified to use "Lod"?

Bisect progress log:
1,  bf3c50fba221f216e38d3f60f89161ced4c684c0  bad
2, 1bba29ed403e735ba0bf04ed8aa2e571884fcaaf  bad
3,d32c45ca7bd5a81b312504ba99cdab3d748251f7    skip
4, 3c04a90e91a64a4a09d77c76c6ddcaca949e9b0e   skip
5, bfac8ba9d32be351277c7ea814ac9848bdcb1f16    bad
6, c58534c1384dc63bb1b13eb37c06bdb4652c13ff  skip
7, c4b4bad68a90510406c0bef97039f7d0b4f8f5fe    skip
8, f72fead4a28d5d8a16bbc20781218ea7df0b9c9a  bad
9,13a04abc277089275217dce119e18acf4d4ce52d   skip
10. fb19df7a626d02cb54614d4610af2d14720a2ef3     skip
11, a47ae8de2cf30fbe45318a18a2ea032f30ab7d10    skip
12. b88f14702d9c02a34d517f95fe840527961631cd     skip
13, fe55ab2d12202236ba5bf9beae09803dfe97a7ac    skip
14, 86dbd8af40deaa99aedf011e863b908173e63012   skip
15, f9094691378722304dd94deb76ad013bd65c7a5b  skip
16, 50545882113b389decc3f05771764f6c62213af3  skip
17, 2cdb24a7c2238843d23b468275d479553f537e7e skip 
18, 7e8be000101cc6fe3846745b559f2d785430e253 skip
19, 5f1d5b1c7857f8680b47a7a450ee9e4530e22c6f     can't start gnome-session  can't run
20, caae52561dabb2d20f2369c547e660d078974285 skip
21, e3eb91af804f449005a2ff535c805eaa1d579d99  can't start gnome-session  can't run

Reproduce steps:
1.  xinit& 
2.  gnome-session&
3.  cd /usr/local/games/OpenGL/SynMark2_6/
4. ./synmark2 OglMultithread
Comment 1 ye.tian 2015-08-28 08:18:04 UTC
Created attachment 117961 [details]
Xorg.0.log  info
Comment 2 Matt Turner 2015-08-29 08:40:54 UTC
I'm bisecting it to

commit 3569742ec458c0a881857d9deb782c1e11f195d8
Author: Francisco Jerez <currojerez@riseup.net>
Date:   Fri Nov 22 16:00:33 2013 -0800

    i965: Define implementation constants for ARB_shader_image_load_store.
    Reviewed-by: Paul Berry <stereotype441@gmail.com>
    v2: Drop VS support pre-Gen8, drop GS support.
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Specifically, commenting out the assignment of ctx->Const.MaxImageUnits brings performance back to previous levels.

Running strings on synmark indeed shows "GL_ARB_shader_image_load_store", so the benchmark may be doing additional work with the extension enabled.
Comment 3 Kenneth Graunke 2015-08-29 09:39:07 UTC
Except that the extension isn't enabled at that point - it's turned on several commits later.  I'm pretty sure only tests with "CS" in their name use ARB_shader_image_load_store, and this isn't one of them.
Comment 4 Francisco Jerez 2015-08-29 13:15:10 UTC
Yeah, apparently the regression is caused by the additional validation done for each image unit when _NEW_TEXTURE is flagged.  I would expect validation for a single disabled image unit to be cheap, little more than a couple of pointer dereferences and a branch, but still it seems to add up to a measurable overhead due to the huge number of image units we expose.  The attached patch removes the _NEW_TEXTURE-dependent state (actually just the _Valid field) from the image unit struct so this per-image unit validation is no longer necessary.  Instead the driver is required to call a function to calculate the same value when needed, what may make per-image state updates in the driver a little more expensive but is likely to be a net win because the overall complexity is reduced since now only image units which are actually bound to a shader need to be validated.
Comment 5 Francisco Jerez 2015-08-29 13:15:49 UTC
Created attachment 117980 [details] [review]
Comment 6 ye.tian 2015-09-01 07:47:15 UTC
Tested on the latest mesa with the patch, this problem does not exists.
Comment 7 Francisco Jerez 2015-09-03 13:54:02 UTC
(In reply to ye.tian from comment #6)
> Tested on the latest mesa with the patch, this problem does not exists.

Thanks, I've formatted the change as a patch series and sent it for review with your tested-by. [1]

[1] http://lists.freedesktop.org/archives/mesa-dev/2015-September/093362.html
Comment 8 Francisco Jerez 2015-10-09 15:08:58 UTC
I've just landed the series that fixes this regression, marking as resolved.
Comment 9 Jani Saarinen 2015-10-14 11:45:58 UTC
Thanks Curro

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.