Bug 110344 - Performance regression in mpv caused by enabling SENDS for surface writes
Status: RESOLVED DUPLICATE of bug 109517
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: 19.0
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
 
Reported: 2019-04-06 03:06 UTC by Nicolas Frattaroli
Modified: 2019-04-17 14:36 UTC



Attachments
compute shader_dumped (12.48 KB, text/plain)
2019-04-10 08:27 UTC, Denis
itel_debug=cs with bad commit (561.26 KB, text/plain)
2019-04-10 08:28 UTC, Denis
itel_debug=cs without commit (534.50 KB, text/plain)
2019-04-10 08:29 UTC, Denis
itel_debug=cs previous commit (559.50 KB, text/plain)
2019-04-10 09:13 UTC, Denis

Description Nicolas Frattaroli 2019-04-06 03:06:23 UTC
When playing back 720p50 (as in, yes, 50 FPS) content fullscreened on a 1080p screen in mpv using the gpu-hq profile and the ewa_lanczossharp scaler, mpv massively drops frames in Mesa 19.0, which it doesn't do with Mesa 18.3.

00:02.0 VGA compatible controller: Intel Corporation HD Graphics 620 (rev 02)

To test this:

mpv --no-config --profile=gpu-hq --scale=ewa_lanczossharp --video-sync=display-resample --fullscreen testfile.mp4

Latest mpv from git, though it also affects the 0.29 version.

The person to fix this will be sent artisanal Swiss chocolate if so desired.
Comment 1 Denis 2019-04-08 11:54:01 UTC
Hi Nicolas, thanks for the report.
Could you please clarify how you measured the performance degradation?

In my case I could only enable the FPS counter, and I saw these values:

mesa-18.1.5 (system) - 43.* FPS
mesa-19.0.0 (from git, release) - 36.* FPS
video in use - http://cdn.clipcanvas.com/sample/clipcanvas_14348_ProResHQ_720p50.mov

My monitor is Dell (system resolution 1920x1080)

To enable the FPS counter I removed --no-config and added this to the mpv.conf file:
>osd-msg1="FPS: ${estimated-display-fps}"

Path to the config file:
>~/.config/mpv/mpv.conf

To summarize, I think this FPS drop could count as a degradation, but I want to clarify what you meant.

Oh, by the way, the mpv version is:

den@den-HP-ZBook-14u-G4:~/repositories/mesa$ mpv --version
mpv git-2019-02-24-5370069 Copyright © 2000-2018 mpv/MPlayer/mplayer2 projects
 built on Sun Mar 10 00:58:59 UTC 2019
ffmpeg library versions:
   libavutil       56.26.100
   libavcodec      58.47.102
   libavformat     58.26.101
   libswscale      5.4.100
   libavfilter     7.48.100
   libswresample   3.4.100
ffmpeg version: git-2019-02-27-4571c7c

Taken from this PPA:
https://launchpad.net/~mc3man/+archive/ubuntu/mpv-tests
Comment 2 Nicolas Frattaroli 2019-04-08 15:36:09 UTC
You can see the dropped frames in mpv's terminal status line, or by hitting I (capital i) to show the stats overlay, which also lists dropped frames. On Mesa 18.x I get next to no dropped frames; on 19.x I get many. --video-sync=display-resample also seems to play a role: on X11 it makes mpv drive the display loop at the monitor's refresh rate, which amplifies the effect because mpv will essentially attempt to redraw the same frame or render a new frame 60 times a second.
Comment 3 Denis 2019-04-08 16:34:02 UTC
Do you mean exactly the "Dropped" frame count printed after the video finishes?

Here are my values:
mesa-18.1.5 - Dropped: 118
mesa-19.0.0 - Dropped: 158
mesa-18.3.2 - Dropped: 125

The number has apparently increased, but I didn't see "0" with the old Mesa versions either.
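
For comparing runs like these, the final "Dropped" count can be pulled out of captured mpv terminal output with a few lines of Python. This is only a minimal sketch: it assumes the output contains the text "Dropped: N" as quoted above, the sample status line and the helper names are made up for illustration, and only the three counts come from this comment.

```python
import re

# Matches the "Dropped: N" field; the rest of the status line is ignored.
DROPPED_RE = re.compile(r"Dropped:\s*(\d+)")


def last_dropped_count(output):
    """Return the last 'Dropped: N' value in captured output, or None."""
    matches = DROPPED_RE.findall(output)
    return int(matches[-1]) if matches else None


def relative_increase(baseline, current):
    """Fractional increase of current over baseline (0.25 means +25%)."""
    return (current - baseline) / baseline


# Counts reported in this comment, keyed by Mesa version.
reported = {"18.1.5": 118, "18.3.2": 125, "19.0.0": 158}

if __name__ == "__main__":
    # Hypothetical status line; mpv's real output may carry other fields.
    sample = "AV: 00:00:10 / 00:00:20 (50%) A-V: 0.000 Dropped: 158"
    print(last_dropped_count(sample))
    for version in ("18.1.5", "18.3.2"):
        delta = relative_increase(reported[version], reported["19.0.0"])
        print(f"19.0.0 vs {version}: {delta:+.0%}")
```

Taking the last match matters because the status line is rewritten many times during playback, so earlier values are stale.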

I am running all tests under the same conditions (2 browsers, YouTube, and a lot of other apps open). Tomorrow I will try to run them on as clean a machine as possible, without extra processes and apps.

I may also try to find a fresher PPA with mpv (I couldn't build it from source because of missing ffmpeg dependencies, which I was asked to build from source as well). Or maybe I will try on Manjaro, where all the needed packages should be installed.

BTW my GPU is the same as yours (KBL, HD 620).
kernel 4.20.6
Ubuntu 18.04 with unity (on X)
Comment 4 Nicolas Frattaroli 2019-04-08 19:03:41 UTC
In 18.3, I get 7 dropped frames with the test clip you linked:

mpv --no-config --profile=gpu-hq --scale=ewa_lanczossharp --video-sync=display-resample --fullscreen clipcanvas_14348_ProResHQ_720p50.mov

and 1 dropped frame in 18.3 if I re-encode it to H.264 to take the ProRes decoder out of the equation:

ffmpeg -i clipcanvas_14348_ProResHQ_720p50.mov -c:v libx264 -preset veryslow -profile:v high422 -level 4.2 -crf 19 -c:a libopus testclip.mkv
mpv --no-config --profile=gpu-hq --scale=ewa_lanczossharp --video-sync=display-resample --fullscreen testclip.mkv

Maybe your GPU is stuck in a lower power state.
Comment 5 Denis 2019-04-10 08:25:00 UTC
Hi again. If the dropped-frame count is the main indicator of the degradation for now, I think the information below may help.

In my case (unlike yours, Nicolas), I had about 50-60 drops with old Mesa versions and 120+ drops with Mesa 19+, so I decided to bisect between them.
Here is the full bisect log with the "Dropped" count for each commit:

den@den-HP-ZBook-14u-G4:~/repositories/mesa$ git bisect log
git bisect start

    good: [190a79f462710f04d67eaefe498ef6ae5b7f5b1a] docs: add release notes for 18.3.3
    git bisect good 190a79f462710f04d67eaefe498ef6ae5b7f5b1a
    Dropped: 52

    bad: [5925a5725831b22a92f4597388d1081126d8bc91] docs: Add release notes for 19.0.0
    git bisect bad 5925a5725831b22a92f4597388d1081126d8bc91
    Dropped: 130

    good: [1f41104b9bab50652050bf4524f2b9f371f7ca9b] meson: don't install translation files
    git bisect good 1f41104b9bab50652050bf4524f2b9f371f7ca9b
    Dropped: 52

    good: [e890aaabed777e7c7736a519e94aef648081bd1d] travis: meson: add unwind handling
    git bisect good e890aaabed777e7c7736a519e94aef648081bd1d
    Dropped: 62

    good: [5486c9d526f393eff4b189e0e0a44eafeedf4407] freedreno/a6xx: Turn on texture tiling by default
    git bisect good 5486c9d526f393eff4b189e0e0a44eafeedf4407
    Dropped: 64

    good: [41a0acd6a149ec9f47ea527ad08a2b29bf1ee6b2] Switch imx to kmsro and remove the imx winsys
    git bisect good 41a0acd6a149ec9f47ea527ad08a2b29bf1ee6b2
    Dropped: 52

    bad: [fb3485bc9248a12f47b07b593f0a81d58cbb3155] gallium/u_threaded: fix EXPLICIT_FLUSH for flush offsets > 0
    git bisect bad fb3485bc9248a12f47b07b593f0a81d58cbb3155
    Dropped: 111

    bad: [82365595e9b4d947f1bdeec2b2eff1cdb226de5a] automake: Add float64.glsl to dist tarball
    git bisect bad 82365595e9b4d947f1bdeec2b2eff1cdb226de5a
    Dropped: 112

    good: [7f1cf046cd1fb8a3af0e24b622179e4adb398764] intel/fs: Add a generic SEND opcode
    git bisect good 7f1cf046cd1fb8a3af0e24b622179e4adb398764
    Dropped: 51

    good: [014edff0d20d52191570a4cb125c37b63955d664] intel/fs: Add interference between SENDS sources
    git bisect good 014edff0d20d52191570a4cb125c37b63955d664
    Dropped: 51

    bad: [bcefa0f1cb99229b6dc241ff50b2c88da1dad950] freedreno: fix invalidate logic
    git bisect bad bcefa0f1cb99229b6dc241ff50b2c88da1dad950
    Dropped: 115

    bad: [820dfcea431e4f96f25e6b340edd9cd1e449158b] egl/wayland-drm: Only announce formats via wl_drm which the driver supports.
    git bisect bad 820dfcea431e4f96f25e6b340edd9cd1e449158b
    Dropped: 117

    bad: [a34b0d68bbf8571e4d858cf4e1176766a50364de] egl/wayland: Allow client->server format conversion for PRIME offload. (v2)
    git bisect bad a34b0d68bbf8571e4d858cf4e1176766a50364de
    Dropped: 118

    bad: [a920979d4f30a48a23f8ff375ce05fa8a947dd96] intel/fs: Use split sends for surface writes on gen9+
    git bisect bad a920979d4f30a48a23f8ff375ce05fa8a947dd96
    Dropped: 117

    first bad commit: [a920979d4f30a48a23f8ff375ce05fa8a947dd96] intel/fs: Use split sends for surface writes on gen9+
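
The good/bad verdicts in the log above follow a clear gap in the dropped-frame counts, so the decision could in principle be scripted. A minimal Python sketch, assuming a cutoff of 90 dropped frames (an arbitrary choice; any value above 64 and at most 111 separates the two groups in this log). The counts and abbreviated hashes are copied from the log; the function and variable names are hypothetical.

```python
# (abbreviated commit hash, verdict from the log, dropped frames)
BISECT_LOG = [
    ("190a79f4", "good", 52),
    ("5925a572", "bad", 130),
    ("1f41104b", "good", 52),
    ("e890aaab", "good", 62),
    ("5486c9d5", "good", 64),
    ("41a0acd6", "good", 52),
    ("fb3485bc", "bad", 111),
    ("82365595", "bad", 112),
    ("7f1cf046", "good", 51),
    ("014edff0", "good", 51),
    ("bcefa0f1", "bad", 115),
    ("820dfcea", "bad", 117),
    ("a34b0d68", "bad", 118),
    ("a920979d", "bad", 117),
]

# Assumed cutoff, not something stated in the report.
THRESHOLD = 90


def classify(dropped, threshold=THRESHOLD):
    """Label a run 'bad' when it drops at least `threshold` frames."""
    return "bad" if dropped >= threshold else "good"


def bisect_exit_code(dropped, threshold=THRESHOLD):
    """Exit status in the convention of `git bisect run`: 0 good, 1 bad."""
    return 1 if classify(dropped, threshold) == "bad" else 0


def log_is_consistent(log=BISECT_LOG):
    """Check that the cutoff reproduces every verdict recorded in the log."""
    return all(classify(dropped) == verdict for _, verdict, dropped in log)


if __name__ == "__main__":
    print(log_is_consistent())
```

If the measurement step itself were automated, a wrapper returning `bisect_exit_code(...)` as its exit status could be handed to `git bisect run`, which treats 0 as good and 1 as bad.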

____

commit a920979d4f30a48a23f8ff375ce05fa8a947dd96
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Fri Nov 16 10:46:27 2018 -0600

    intel/fs: Use split sends for surface writes on gen9+
    
    Surface reads don't need them because they just have the one address
    payload.  With surface writes, on the other hand, we can put the address
    and the data in the different halves and avoid building the payload all
    together.
    
    The decrease in register pressure and added freedom in register
    allocation resulting from this change reduces spilling enough to improve
    the performance of one customer benchmark by about 2x.
    
    Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Comment 6 Denis 2019-04-10 08:27:07 UTC
Created attachment 143912
compute shader_dumped

Jason, could you please take a look at this? In the attachments you will find the compute shader that might be causing the regression, and 2 results of its compilation (with the bad commit and without it).
Comment 7 Denis 2019-04-10 08:28:47 UTC
Created attachment 143913
itel_debug=cs with bad commit
Comment 8 Denis 2019-04-10 08:29:10 UTC
Created attachment 143914
itel_debug=cs without commit
Comment 9 Denis 2019-04-10 09:13:32 UTC
Created attachment 143915
itel_debug=cs previous commit

Re-uploaded the log file for the commit right before the "bad" one:
git-014edff0d2
Comment 10 Jason Ekstrand 2019-04-13 22:53:56 UTC
It's pretty clear what's going on here.  The change in a920979d4f30 caused RA to either succeed or fail differently with respect to scheduling, so the scheduling algorithm changed and the new scheduling is utterly horrible compared to the old one.  In other words, our scheduler sucks.  Unfortunately, this isn't news....  We've got a new scheduler in the works (in theory) which will hopefully degrade more gracefully.

In the meantime, for this particular bug, one could look into why it's failing (or succeeding; I don't know) to register allocate with a920979d4f30 and maybe try to improve it.  It's entirely possible, however, that what *should* be an improvement in RA is causing worse performance due to the terrible scheduler.
Comment 11 Jason Ekstrand 2019-04-14 01:56:01 UTC
More specifically, it's post-RA scheduling that's blowing up.  The shader register allocates on the first try in both cases.  RA must now be creating a more restrictive allocation which prevents post-RA scheduling from being able to schedule nicely and/or gives more freedom and post-RA scheduling makes a hash of things.
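
To illustrate the first of the two possibilities above (a more restrictive allocation limiting what post-RA scheduling can do), here is a toy Python model. It is not Mesa's scheduler, register allocator, or instruction set; the registers, latencies, and function names are invented, and every dependency is simplified to "wait until the earlier instruction completes". It only shows how reusing one physical register for two independent values adds ordering constraints that a scheduler cannot reorder around.

```python
# Toy model only. Each instruction is
# (destination register, source registers, latency in cycles).

def dependencies(program, index):
    """Earlier instructions that must stay ahead of program[index]."""
    dst, srcs, _ = program[index]
    deps = set()
    for j in range(index):
        prev_dst, prev_srcs, _ = program[j]
        if prev_dst in srcs:    # read-after-write: true data dependency
            deps.add(j)
        if prev_dst == dst:     # write-after-write: same register rewritten
            deps.add(j)
        if dst in prev_srcs:    # write-after-read: register reused too early
            deps.add(j)
    return deps


def critical_path(program):
    """Length of the longest dependency chain, a lower bound on schedule length."""
    finish = []
    for i, (_, _, latency) in enumerate(program):
        start = max((finish[j] for j in dependencies(program, i)), default=0)
        finish.append(start + latency)
    return max(finish)


# Two independent load+add pairs, each pair in its own registers.
separate = [
    ("r1", (), 10),
    ("r2", ("r1",), 1),
    ("r3", (), 10),
    ("r4", ("r3",), 1),
]

# Same work, but the second load reuses r1.
reused = [
    ("r1", (), 10),
    ("r2", ("r1",), 1),
    ("r1", (), 10),
    ("r4", ("r1",), 1),
]

if __name__ == "__main__":
    print(critical_path(separate), critical_path(reused))
```

In this model the two pairs in `separate` can overlap, while in `reused` the second load has to wait for the first pair, which doubles the longest chain. Whether anything like this is what actually happens in the shader from this bug is not established here.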
Comment 12 Jason Ekstrand 2019-04-16 14:35:27 UTC
*** Bug 110412 has been marked as a duplicate of this bug. ***
Comment 13 Jason Ekstrand 2019-04-17 14:36:08 UTC

*** This bug has been marked as a duplicate of bug 109517 ***

