Bug 103343 - [GEN9+] 2-11% performance drop in GfxBench ALU2 & SynMark TexFilterTri from "i965/tex: Use blorp texture upload for all CCS_E textures"
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Jason Ekstrand
QA Contact: Intel 3D Bugs Mailing List
Reported: 2017-10-18 11:27 UTC by Eero Tamminen
Modified: 2019-09-25 19:04 UTC
Description Eero Tamminen 2017-10-18 11:27:24 UTC
While the following change:
------------------------------------------------
commit 157faa407f51829fb8b2d2af723547dc8a0d3849
Author:     Jason Ekstrand <jason.ekstrand@intel.com>
AuthorDate: Wed May 31 17:53:34 2017 -0700
Commit:     Kenneth Graunke <kenneth@whitecape.org>
CommitDate: Thu Oct 12 19:58:40 2017 -0700

    i965/tex: Use blorp texture upload for all CCS_E textures
    
    This improves the FillTex benchmark in GLBench 2.7 by 30% on my Broxton.
    On Ken's Broxton which only has single-channel ram, it improves by 210%.
    
    v2 (Ken): Check mt->aux_usage == ISL_AUX_USAGE_CCS_E rather than using
              intel_miptree_is_lossless_compressed().
    
    Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
------------------------------------------------

improved GLB 2.7 Fill hugely and SynMark PSPom slightly on all GEN9+ SoC platforms, it also clearly regressed a few tests on *all* GEN9+ platforms.

These regressions are (depending on platform):
* 3-5% in GfxBench v4 ALU2 onscreen & offscreen tests
* 2-6% in SynMark v7 TexFilterTri test
Comment 1 Eero Tamminen 2017-10-18 12:48:33 UTC
The TexFilterTri trilinear filtering regression is the more mysterious of the two.

This simple test is fully memory-bandwidth bound, and its 256x256 RGBA8 textures are almost completely black (one edge has just a stripe or two), i.e. they should compress very well with CCS_E.  The test does several similar draw calls, with the shader sampling 8 textures which have 8 mipmap levels, down to 1x1 size.

Each of the mipmap levels is uploaded separately with glTexImage2D().  I tried skipping the blorp usage for the smaller levels, but that didn't change anything; only skipping blorp usage for all levels of these 256x256 textures makes the regression go away.  I.e. the issue isn't related to texture / mipmap level sizes.

The full texture settings are:
- type: TEXTURE_2D / RGBA8 / UNSIGNED_BYTE
- level 0: 256x256
- level 8: 1x1
- GL_TEXTURE_MAX_LEVEL = 8
- GL_TEXTURE_MIN_FILTER = GL_LINEAR_MIPMAP_LINEAR
- GL_TEXTURE_MAG_FILTER = GL_LINEAR
- GL_TEXTURE_WRAP_S = GL_REPEAT
- GL_TEXTURE_WRAP_T = GL_REPEAT
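
For reference, the level sizes implied by these settings can be enumerated with a small sketch. It is only arithmetic on the numbers listed above (256x256 base, RGBA8 at 4 bytes per texel, levels 0 through 8); the function name `mip_chain` is made up for illustration and has nothing to do with the driver code. It assumes each level halves both dimensions, clamped at 1.

```python
def mip_chain(width, height):
    """Return (level, width, height) tuples for a full mip chain down to 1x1."""
    levels = []
    level = 0
    while True:
        levels.append((level, width, height))
        if width == 1 and height == 1:
            break
        # Each successive level halves both dimensions, never going below 1.
        width = max(1, width // 2)
        height = max(1, height // 2)
        level += 1
    return levels


BYTES_PER_TEXEL = 4  # RGBA8: four 8-bit channels

chain = mip_chain(256, 256)
total_bytes = sum(w * h * BYTES_PER_TEXEL for _, w, h in chain)

for level, w, h in chain:
    print(f"level {level}: {w}x{h}, {w * h * BYTES_PER_TEXEL} bytes")
print(f"{len(chain)} levels, {total_bytes} bytes per texture (uncompressed)")
```

With a 256x256 base this gives levels 0 to 8, i.e. the base level plus 8 smaller ones, matching GL_TEXTURE_MAX_LEVEL = 8.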


The ALU2 test is partly GPU ALU bound, partly memory-bandwidth bound.  Its textures won't compress as well as the TexFilterTri ones should, and they're larger, with the following formats:
* TEXTURE_2D / DEPTH_COMPONENT / UNSIGNED_INT:
  - size: 1920x1080 (8100 kiB)
* TEXTURE_2D / DEPTH_COMPONENT24:
  - size: 1920x1080 (2025 kiB)
* TEXTURE_2D / RGBA / UNSIGNED_BYTE:
  - max level: 0
  - min size: 256x256 (256 kiB)
  - max size: 1920x1080 (8100 kiB)
* TEXTURE_2D / RGBA8:
  - max level: 1
  - min size: 256x256 (256 kiB)
  - max size: 1920x1080 (8100 kiB)
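
The sizes listed above can be cross-checked with plain arithmetic. This is a minimal sketch, assuming tightly packed, uncompressed storage with no padding or tiling overhead; `size_kib` is a hypothetical helper name, and the bytes-per-texel values are simply the ones that reproduce the listed figures.

```python
def size_kib(width, height, bytes_per_texel):
    """Uncompressed size of a tightly packed 2D image in KiB."""
    return width * height * bytes_per_texel / 1024


# 4 bytes per texel reproduces the 8100 kiB and 256 kiB figures.
full_hd_4bpp = size_kib(1920, 1080, 4)
small_4bpp = size_kib(256, 256, 4)

# The 2025 kiB figure corresponds to 1 byte per texel at 1920x1080.
full_hd_1bpp = size_kib(1920, 1080, 1)

print(full_hd_4bpp, small_4bpp, full_hd_1bpp)
```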
Comment 2 Eero Tamminen 2017-10-18 12:50:10 UTC
On SkullCanyon (SKL GT4e), the TexFilterTri perf drop is 11-12%.
Comment 3 Mark Janes 2017-10-27 17:06:00 UTC
When I measured this, I found that part of the performance drop was due to the two subsequent commits:

cdf626294e * i965: Use blorp instead of meta for PBO pixel reads
f933ef00e1 * i965: Use blorp instead of meta for PBO texture

I confirmed Eero's finding that 157faa407f51829fb8b2d2af723547dc8a0d3849 caused the biggest drop.
Comment 4 Mark Janes 2017-11-16 22:18:37 UTC
This benchmark is not significant enough to warrant blocking the release.
Comment 5 Nanley Chery 2018-03-05 19:51:17 UTC
Perhaps the regressions are due to the fact that a blorp upload will use a linear texture as its source in the non-PBO case.
Comment 6 Eero Tamminen 2018-03-06 08:33:12 UTC
(In reply to Nanley Chery from comment #5)
> Perhaps the regressions are due to the fact that a blorp upload will use a
> linear texture as its source in the non-PBO case.

Uploading happens before the benchmarking starts, so what happens during the upload doesn't matter; only the resulting format (tiling, MOCS, compression...) does.
Comment 7 GitLab Migration User 2019-09-25 19:04:46 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1639.

