Bug 107223

Summary: [GEN9+] 50% perf drop in SynMark Fill* tests (E2E RBC gets disabled?)
Product: Mesa Reporter: Eero Tamminen <eero.t.tamminen>
Component: Drivers/DRI/i965Assignee: Jason Ekstrand <jason>
Status: VERIFIED FIXED QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal    
Priority: medium CC: agomez, mark.a.janes
Version: gitKeywords: bisected, regression
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Bug Depends on:    
Bug Blocks: 107457    

Description Eero Tamminen 2018-07-13 14:06:55 UTC
Setup:
* Ubuntu 16.04
* kernel & X server built from Git
* Modifiers support enabled (requires git version , or latest X release, which isn't yet in distros)

There's a large performance drop visible in many onscreen tests when using graphics stack supporting modifiers.

Performance drops on all GEN9+ platforms.  Drop is largest in SynMark Fill* tests on eDRAM machines where the drop is 50%.  On GT2 machines they drop ~30% and on BXT 20%.  This drop is visible both with Unity desktop and when using just plain X server.

Smaller performance drop is visible in most of the other SynMark tests.

I bisected the regression to following commit:
-------------------------------------------------
commit a26693493570a9d0f0fba1be617e01ee7bfff4db
Author:     Jason Ekstrand <jason.ekstrand@intel.com>
AuthorDate: Wed Jun 6 10:24:01 2018 -0700
Commit:     Jason Ekstrand <jason.ekstrand@intel.com>
CommitDate: Thu Jun 7 11:23:34 2018 -0700

    i965/screen: Return false for unsupported formats in query_modifiers

    Cc: mesa-stable@lists.freedesktop.org
    Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
-------------------------------------------------

Based on the size of the regressions, and GeomPoint improving with the change (only test which regresses with RBC), I'd say that above commit disables E2E RBC for these tests.

Any idea why?  Rendering looked fine before this change.


On GEN9+ GT2 machines, there's also 3-4% performance drop visible in GfxBench onscreen ALU2 & Tessellation tests from this commit, and maybe 1% in GfxBench Manhattan 3.0, but those regressions are slightly less than half of what the E2E RBC improvement was.


Interestingly, this same commit also improved SynMark VSInstancing performance noticeably on GEN7 & GEN8 (which don't support RBC).
Comment 1 Eero Tamminen 2018-07-20 11:12:20 UTC
SynMark uses EGL with GL tests, whereas most other benchmarks use GLX -> maybe this issue has something to do with EGL?
Comment 2 Mark Janes 2018-08-08 15:10:16 UTC
It took me a while to set up, but I did't see a drop in synmark OglFillPixel on Debian testing for the bisected commit.  For SKLGT2, I'm getting a score around 33.
Comment 3 Eero Tamminen 2018-08-08 16:31:35 UTC
(In reply to Mark Janes from comment #2)
> It took me a while to set up, but I did't see a drop in synmark OglFillPixel
> on Debian testing for the bisected commit.  For SKLGT2, I'm getting a score
> around 33.

If you were running it on FullHD resolution, you you don't have working E2E RBC setup (and your memory is 2x 2100Mhz DDR4?).  You should get 45-50 FPS with E2E RBC.


Maybe you didn't enable modifier support in your X server config:
--------------------------------
Section "ServerFlags"
    Option "Debug" "dmabuf_capable"
EndSection
--------------------------------
?
Comment 4 Eero Tamminen 2018-08-10 10:32:16 UTC
And naturally X server & Mesa XCB etc dependencies need to be from April or later, otherwise they're missing things required for X / modifier support being built in.

When I was testing modifier support in spring with X & Wayland, I needed to build following from git:
- git://anongit.freedesktop.org/xorg/proto/xorgproto
- git://anongit.freedesktop.org/xcb/proto
- git://anongit.freedesktop.org/xcb/libxcb
- git://anongit.freedesktop.org/mesa/drm
- git://anongit.freedesktop.org/mesa/mesa
- https://github.com/anholt/libepoxy (commit 516b4fb8d0bce)
- git://anongit.freedesktop.org/xorg/xserver
- git://anongit.freedesktop.org/xorg/app/xkbcomp
- git://anongit.freedesktop.org/xorg/driver/xf86-input-libinput

After the build, libxcb header needs to match:
grep "XCB_DRI3_M[AI][JN]OR_VERSION" $(find $PREFIX -name dri3.h)

And X server build:
grep \
-e GBM_BO_WITH_MODIFIERS \
-e GLAMOR_HAS_DRM_ATOMIC \
-e GLAMOR_HAS_DRM_MODIFIERS \
-e GLAMOR_HAS_DRM_NAME_FROM_FD_2 \
-e GLAMOR_HAS_EGL_QUERY_DMABUF \
-e GLAMOR_HAS_GBM \
-e GLAMOR_HAS_GBM_LINEAR \
include/dix-config.h

Otherwise you haven't built a working modifier supporting setup.

(Jason can tell more.)
Comment 5 Mark Janes 2018-08-16 01:40:18 UTC
I must not have a new enough X on debian testing.  I added the xorg config with no useful result.

I don't doubt Eero's results, I just can't easily confirm them to get traction on this bug.

Jason, do you have a properly configured system where you can measure performance for the synmark fill tests?
Comment 6 Andrés Gómez García 2018-08-24 15:45:09 UTC
Jason, Mark, any news on this bug ?

Notice that I've extenced the Release Candidates period for 18.2 in order to get room for fixing blockers from bug 107457.

If you don't think this can be solved in a reasonable period of time we should target for 18.3 and warn in the release notes about the known reduction in performance.

Please, let me know as soon as possible.
Comment 7 Jason Ekstrand 2018-08-25 21:21:05 UTC
I haven't had a system configured for this stuff for quite some time now.  I think I still have the scripts so maybe I can cook something up.
Comment 8 Kenneth Graunke 2018-08-27 08:35:09 UTC
Eero is spot on, I can confirm this.  OglFillPixel on my Skull Canyon (SKL GT4e) improves from ~46 FPS to ~93 FPS with the following change on top of master (partial revert of the cited commit):

diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c
index eaf5a3b9feb..19ddf2b4072 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -1333,9 +1333,6 @@ intel_query_dma_buf_modifiers(__DRIscreen *_screen, int fourcc, int max,
    if (f == NULL)
       return false;
 
-   if (!intel_image_format_is_supported(&screen->devinfo, f))
-      return false;
-
    for (i = 0; i < ARRAY_SIZE(supported_modifiers); i++) {
       uint64_t modifier = supported_modifiers[i].modifier;
       if (!modifier_is_supported(&screen->devinfo, f, 0, modifier))
Comment 9 Kenneth Graunke 2018-08-27 08:37:10 UTC
FWIW, it's easy to reproduce this on Archlinux - the xserver, xcb, etc. in their [testing] repo is new enough that you don't need to build any of that.  I only swapped in my Mesa, and added the xorg.conf.d change Eero suggested in comment #3.
Comment 10 Kenneth Graunke 2018-08-27 08:45:24 UTC
Mesa's DRI3 loader is calling intel_query_dma_buf_modifiers with __DRI_IMAGE_FOURCC_SARGB8888 every time, which Jason's patch rejects, leading to an empty modifier list every time.
Comment 11 Eero Tamminen 2018-08-27 08:52:36 UTC
FillPixel is pretty optimal case for E2E RBC.  It does layered write to onscreen buffer, with a well compressible moving gradient with its performance being fully dictated by the bandwidth used by those writes.
Comment 12 Jason Ekstrand 2018-08-27 13:47:03 UTC
Well, I know how to make the current bug go away but I'm not sure if it would introduce additional bugs. :-(  The problem is that we outright reject SRGB formats because there are no corresponding FOURCC formats and so you can't really create a truely SRGB modified surface.  However, we need to support it somehow because most apps ask for sRGB and really ought to get it.  I'll have to give this a think.
Comment 13 Andrés Gómez García 2018-08-28 14:43:02 UTC
18.2.0-rc5 is tomorrow and by Jason comment it seems solving this bug won't happen soon.

Will it be OK with you to go ahead making 18.2.0-rc5 the final for 18.2.0 (unless some other regression happens) and have this eventually fixed in a 18.2.x fix release ?
Comment 14 Mark Janes 2018-08-28 15:39:57 UTC
Jason is looking at this bug today.  50% is a pretty big perf regression, and end-to-end render buffer compression is a big feature for i965.  We'd like to get it fixed before release.
Comment 15 Jason Ekstrand 2018-08-29 01:05:33 UTC
I've got three patches on the mailing list.  I'm hoping Daniel Stone gets a chance to look at and review them tomorrow and we should be able to merge soon.
Comment 16 Eero Tamminen 2018-08-29 13:09:37 UTC
https://patchwork.freedesktop.org/series/48842/
Comment 17 Andrés Gómez García 2018-08-29 13:32:21 UTC
(In reply to Mark Janes from comment #14)
> Jason is looking at this bug today.  50% is a pretty big perf regression,
> and end-to-end render buffer compression is a big feature for i965.  We'd
> like to get it fixed before release.

I will extended yet another week and have a -rc6 before final release.

Thanks for the feedback and the big effort to make this regression disappear before 18.2.0.
Comment 18 Jason Ekstrand 2018-08-30 16:46:14 UTC
Fixed by the following commit in master:

commit d9cf4308ceea3762c1aab48f9c48e12a72162b5a (public/master)
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Tue Aug 28 15:25:23 2018 -0500

    i965/screen: Allow modifiers on sRGB formats
    
    This effectively reverts a26693493570a9d0f0fba1be617e01ee7bfff4db which
    was a misguided attempt at protecting intel_query_dma_buf_modifiers from
    invalid formats.  Unfortunately, in some internal EGL cases, we can get
    an SRGB format validly in this function.  Rejecting such formats caused
    us to not allow CCS in some cases where we should have been allowing it.
    This regressed the performance of some SynMark tests as well as GfxBench
    ALU2, Tessellation and Manhattan 3.0 tests
    
    There's some question of whether or not we really should be using SRGB
    "fourcc" formats that aren't actually in drm_foucc.h but there's not
    much harm in allowing them through here.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107223
    Fixes: a26693493570 "i965/screen: Return false for unsupported..."
    Tested-By: Eero Tamminen <eero.t.tamminen@intel.com>
    Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Comment 19 Eero Tamminen 2018-08-31 16:34:56 UTC
Thanks!  Verified with SynMark tests.

(GfxBench ALU2 perf is currently hidden behind bug 107772, and Tessalation behind bug 107706 though.)
Comment 20 Eero Tamminen 2018-09-05 15:04:50 UTC
FYI: this series seems to have fixed also earlier perf regression (from around May) that was visible on GEN6-GEN8 in (unity/compiz) composited windowed 3D benchmarks, but only with modifier enabled gfxstack.
Comment 21 Eero Tamminen 2018-10-04 13:12:07 UTC
(In reply to Eero Tamminen from comment #20)
> FYI: this series seems to have fixed also earlier perf regression (from
> around May) that was visible on GEN6-GEN8 in (unity/compiz) composited
> windowed 3D benchmarks, but only with modifier enabled gfxstack.

Btw. while it improved a lot of onscreen tests on older GENs too, it dropped perf on older GENs in SynMark ZBuffer & VSInstancing tests which draw a large amount of geometry on top of each other.  Apparently caching changed so that (depth?) reads are now more expensive on those older (6-8) GENs?

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.