Bug 106175 - amdgpu.dc=1 shows performance issues with Xorg compositors when moving windows
Status: NEW
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
Reported: 2018-04-22 11:05 UTC by tempel.julian
Modified: 2018-12-07 21:25 UTC

Attachments
dmesg (54.67 KB, text/plain)
2018-04-22 11:05 UTC, tempel.julian
xorg log (26.57 KB, text/plain)
2018-04-22 11:05 UTC, tempel.julian
GALLIUM_HUD showing stuttering (334.97 KB, image/png)
2018-07-27 14:32 UTC, grmat
Patch that "fixes" the problem. (2.97 KB, patch)
2018-11-21 23:53 UTC, Brandon Wright
0001-drm-amd-display-Add-fast-path-for-legacy-cursor-plan.patch (4.65 KB, patch)
2018-11-22 19:30 UTC, Nicholas Kazlauskas

Description tempel.julian 2018-04-22 11:05:36 UTC
Created attachment 138987 [details]
dmesg

When amdgpu.dc=1 and an Xorg compositor are enabled at the same time, there is stuttering when moving windows.
It's most visible with Compton (which is completely stutter-free when amdgpu.dc=0), but also KWin and to a lesser extent also Gnome-Mutter.
It happens also with GPU clocks forced to maximum, so it doesn't seem to be a powersaving issue.
In the recent past, there was an issue where performance degraded when amdgpu.dc=1, an Xorg compositor and the hardware mouse cursor were all in use; maybe this is still related (just guessing though)?

Tested with Linux 4.17 RC1 and drm-next-4.18-wip (4.16.1.52132fd03)
xorg-server 1.19.6+13+gd0d1a694f (amdgpu DDX & modesetting)
RX 560
Comment 1 tempel.julian 2018-04-22 11:05:56 UTC
Created attachment 138988 [details]
xorg log
Comment 2 Michel Dänzer 2018-04-23 09:56:57 UTC
Does this also happen without overriding the EDID of the DVI-D output?
Comment 3 tempel.julian 2018-04-23 11:08:22 UTC
Yes, then it also occurs with the monitor's resolution/refresh rate provided by its own EDID (2560x1440 59.95Hz). Sorry, should have mentioned that.

Without compositor, moving of windows looks smooth. I also tried all vsync settings provided by Compton, but all show the stuttery behavior with amdgpu.dc.
Comment 4 tempel.julian 2018-04-23 11:56:41 UTC
I found out that it's still related to the hardware cursor.
When I set Option "SWCursor" "true" in Xorg config, moving windows is smooth.

The mouse uses 1000Hz polling rate, if that makes any difference.
To spot the stuttering best, set acceleration profile to flat for libinput.

Btw: There is also a little problem with Redshift and hardware cursor, it turns more yellow/orange than the rest of the screen. Using software cursor works around this problem as well.
Comment 5 Mariusz Ceier 2018-04-24 14:30:42 UTC
amdgpu.dc=1 also causes performance issue with 2 games I own: "Rise of the
Tomb Raider" and "Helium Rain" (UE4 game with sources publicly available on 
github).

I have 2 monitors (1st is 60Hz, 2nd is 144Hz). During tests I was using
1920x1080 resolution in both games which is 60Hz on both monitors.

1. With amdgpu.dc=0 everything is fine.

2. With amdgpu.dc=1:

The issue was showing up only in menus when cursor was visible and was in the
game window e.g. when Helium Rain was running on 2nd monitor (fullscreen)
and I was moving mouse over its window - mouse was lagging/stuttering/not 
responding and Xorg server was producing messages like below  in Xorg.0.log:

   (II) event5  - USB Gaming Mouse: SYN_DROPPED event - some input events have been lost.
   (EE) client bug: timer event5 debounce short: offset negative (-7ms)     


When I moved mouse to the 1st monitor the messages were not produced and mouse
was not lagging.

I tried a few things (including switching from full dyntick system to idle
dyntick) until finally MrCooper at #xorg-devel suggested trying amdgpu.dc=0
which fixed the issue.

With amdgpu.dc=1 latencytop was showing that drm_modeset_backoff was taking ~20ms.

I was using kernel 4.17.0-rc1 and 4.17.0-rc2, Mesa 18.2.0-devel 
(git-d136a5fad9) and X.Org X Server 1.19.99.904 (1.20.0 RC 4) with
xf86-video-amdgpu and radeonsi.

I'm using 1000Hz gaming mouse.
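
To put the ~20 ms drm_modeset_backoff figure in perspective, here is a rough sketch in plain arithmetic (not a measurement). It compares the stall with the per-frame time budget at the two refresh rates mentioned above. The function names are illustrative only, and the assumption that the whole stall lands inside one frame's presentation path is mine.

```python
import math

def frame_budget_ms(refresh_hz: float) -> float:
    """Time available per frame at a given refresh rate, in milliseconds."""
    return 1000.0 / refresh_hz

def refresh_intervals_spanned(stall_ms: float, refresh_hz: float) -> int:
    """Number of whole refresh intervals needed to cover a stall of stall_ms."""
    return math.ceil(stall_ms / frame_budget_ms(refresh_hz))

for hz in (60, 144):
    print(f"{hz} Hz: budget {frame_budget_ms(hz):.2f} ms, "
          f"a 20 ms stall spans {refresh_intervals_spanned(20, hz)} refresh intervals")
```

Under these assumptions a single 20 ms stall is longer than one frame at 60 Hz and close to three frames at 144 Hz, which would be consistent with visibly dropped frames.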
Comment 6 Mariusz Ceier 2018-04-24 14:34:56 UTC
I forgot to mention: I have a Radeon Fury X graphics card.
Comment 7 Harry Wentland 2018-04-24 14:57:05 UTC
Does your kernel tree have the following patches?

90fef6476917 Revert "drm/amd/display: disable CRTCs with NULL FB on their primary plane (V2)"
c7bd22893408 Revert "drm/amd/display: fix dereferencing possible ERR_PTR()"

If not can you grab the latest drm-next-4.18-wip and check again? Those reverts should have fixed problems where mouse movement would slow the system down.
Comment 8 Mariusz Ceier 2018-04-24 16:56:20 UTC
I had these patches in the kernel tree - mine is from 22nd April, while these patches were committed on 12th April.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?ofs=350
Comment 9 tempel.julian 2018-04-24 17:26:04 UTC
I got them too. Before those commits, my issue was way more severe. It's still really nasty stutter though.
Comment 10 Mike Bendel 2018-04-24 23:14:50 UTC
I have the same issue. For me amdgpu.dc=0 does not really fix it either. I have a 3840x1600 monitor running at 75 Hz. This is the different behavior I noticed when toggling the DC setting:

amdgpu.dc=0

Moving windows smooth most of the time but cursor frequently skips frames

amdgpu.dc=1

There is stutter/tearing in the movement of windows but the cursor is completely smooth otherwise.

Running 4.16.2 kernel here with a Radeon Pro WX 7100.
Comment 11 tempel.julian 2018-04-30 12:49:52 UTC
I noticed that this issue also exists apart from Xorg compositors.
When I run Serious Sam: Fusion (both OpenGL and Vulkan) in fullscreen (no Xorg compositor enabled in the background), the mouse cursor (hardware cursor) in the main menu can be moved without stuttering. But as soon as I enable vsync in the game, its movement becomes stuttery. Again no problem with amdgpu.dc=0.
Comment 12 tempel.julian 2018-05-18 14:47:42 UTC
Latest drm-next-4.18-wip aa1bce17d841a362d40da940487e13affe4c7b3b still shows the same behavior.
I'd be happy if more users would comment on this, since it makes use of amdgpu.dc totally impossible for me.
Comment 13 tempel.julian 2018-05-26 12:47:34 UTC
When I use modesetting driver with Option "PageFlip" "false", the stuttering is gone (however, as expected tearing is not fully prevented anymore).
So there might be an actual connection to pageflipping?
Comment 14 Michel Dänzer 2018-05-28 08:42:21 UTC
(In reply to tempel.julian from comment #13)
> So there might be an actual connection to pageflipping?

Yeah, the problem seems to be a bad interaction between page flipping and cursor updates.

FWIW, page flipping can be disabled with xf86-video-amdgpu as well, with

 Option "EnablePageFlip" "false"
Comment 15 tempel.julian 2018-05-28 09:27:31 UTC
I just tried that option with the xf86 amdgpu DDX driver and as expected, stuttering disappears in exchange for tearing close to the very top of the screen.

I'm really glad you could confirm the issue, the absent reports of other users really worried me that I'd have to live forever with it.
Comment 16 Z G 2018-06-03 23:03:59 UTC
I have this issue too, disabling page flipping fixes it for me on my vega10. It started with 4.16rc1 IIRC
Comment 17 Michel Dänzer 2018-06-06 07:59:36 UTC
https://patchwork.freedesktop.org/patch/227925/ might provide inspiration for how this could be solved.
Comment 18 David Francis 2018-06-15 13:54:59 UTC
My hypothesis is that this has something to do with the mouse polling rate.

Could you set the polling rate to 125 Hz (8 ms) and see if the problem persists?

This information will help us troubleshoot the problem.

Set mouse polling rate:
https://wiki.archlinux.org/index.php/mouse_polling_rate
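
The interval/rate relationship quoted above (125 Hz = 8 ms) can be checked with a small helper. This is only a sketch with made-up function names; it also gives an upper bound on how many mouse reports could arrive during one display refresh, which is what the polling-rate hypothesis is about.

```python
def polling_rate_hz(interval_ms: float) -> float:
    """Convert a polling interval in milliseconds to a rate in Hz."""
    if interval_ms <= 0:
        raise ValueError("interval must be positive")
    return 1000.0 / interval_ms

def reports_per_refresh(polling_hz: float, refresh_hz: float) -> float:
    """Upper bound on mouse reports arriving during one display refresh."""
    return polling_hz / refresh_hz

print(polling_rate_hz(8))                        # 8 ms interval
print(round(reports_per_refresh(1000, 60), 1))   # 1000 Hz mouse, 60 Hz display
print(round(reports_per_refresh(125, 60), 1))    # 125 Hz mouse, 60 Hz display
```

At 60 Hz a 1000 Hz mouse can deliver roughly eight times as many reports per frame as a 125 Hz one, so if cursor updates interfere with page flips, a lower polling rate would be expected to reduce the effect rather than remove it.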
Comment 19 Michel Dänzer 2018-06-15 14:02:47 UTC
(In reply to David Francis from comment #18)
> My hypothesis is that this has something to do with the mouse polling rate.

What is that hypothesis based on?

The kernel is supposed to be able to process any number of DRM_IOCTL_MODE_CURSOR(2) ioctls in parallel with a DRM_IOCTL_MODE_PAGE_FLIP ioctl, without them interfering with each other. Most likely there's an issue in the DC code interfering with this. See the patch I referenced in comment 17 for an example of what might need to be done to solve this.
Comment 20 Michel Dänzer 2018-06-15 14:17:01 UTC
Note that the ioctls don't literally run "in parallel"; both ioctls are called by the Xorg main thread, so they can't preempt each other. What I mean is that any number of cursor ioctls can happen while a page flip is pending.
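
The difference between the expected behaviour and the suspected one can be illustrated with a toy model. This is purely hypothetical and not derived from the DC code: the function name, the timings and the "cursor update waits for the pending flip" rule are all assumptions made for illustration. It only shows why a single-threaded caller would lose most of a frame if one cursor update had to wait for a pending page flip.

```python
def main_thread_blocked_ms(cursor_times_ms, flip_queued_ms, vblank_ms,
                           cursor_waits_for_flip):
    """Time the single X main thread spends blocked in cursor updates.

    cursor_times_ms: times at which cursor updates are issued in this interval.
    flip_queued_ms:  time at which the page flip is queued.
    vblank_ms:       time at which the pending flip completes.
    """
    if not cursor_waits_for_flip:
        # Expected behaviour: cursor updates return right away,
        # no matter how many arrive while the flip is pending.
        return 0.0
    blocked = 0.0
    thread_free_at = 0.0
    for t in cursor_times_ms:
        start = max(t, thread_free_at)  # a blocked thread handles events late
        if flip_queued_ms <= start < vblank_ms:
            blocked += vblank_ms - start
            thread_free_at = vblank_ms
    return blocked

REFRESH_MS = 1000.0 / 60.0
cursor_events = [float(ms) for ms in range(16)]  # 1000 Hz mouse: one update per ms

print(main_thread_blocked_ms(cursor_events, 2.0, REFRESH_MS, False))           # 0.0
print(round(main_thread_blocked_ms(cursor_events, 2.0, REFRESH_MS, True), 2))  # 14.67
```

In the blocking variant the thread is stuck for about 14.7 ms of a 16.7 ms frame, leaving little time to prepare the next frame before the following vblank.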
Comment 21 tempel.julian 2018-07-21 12:42:58 UTC
Don't want to nag at anyone, but this bug still makes DC unusable for me and thus is a real dealbreaker. Does implementing a fix for it require a lot of effort?
Comment 22 grmat 2018-07-27 14:32:24 UTC
Created attachment 140858 [details]
GALLIUM_HUD showing stuttering

I can confirm the issue.

Having a Radeon R9 290X (Hawaii XT), DC introduces heavy stuttering on a composited X desktop while interacting with the window manager. This stuttering can't be resolved by forcing high power states.

Attached is a screenshot to give an impression of the stuttering.
Top left is the GALLIUM_HUD with the compositor's frame rate and frame time graph. On the right, there is Firefox with the page https://testufo.com/photo#photo=quebec.jpg&pps=960&pursuit=0&height=0 having hardware acceleration force-enabled and also showing fps/frametime graphs. The web page has continuous movement, ensures that there is a screen update every frame and makes it easy to detect stutter.
This looks perfectly fine and the graphs represent that as well.
On the bottom there is the same setup but while changing window focus or moving a third window around. Firefox stutters as hell (suspecting vsync stuttering) and the graphs show that as well.

Disabling DC resolves the issue completely and the bottom scenario would look and feel the same as the upper one with DC.

In this current situation, I could disable DC and have a smooth desktop at the cost of several dozens of watts idle power or save power and use a stuttering desktop with DC enabled.
Comment 23 tempel.julian 2018-07-31 12:22:37 UTC
If I bought Vega, Raven Ridge or, in the future, Navi, I'd be really annoyed by this bug because I would have to turn off page flipping, resulting in unacceptable tearing. :(

Could we please get an update?
Comment 24 Jordan L 2018-09-11 17:57:16 UTC
The challenge here is that we still can't seem to reproduce this internally on any of our setups. Can anyone identify a commonality in setup to help isolate the reproducing behaviour?
Comment 25 tempel.julian 2018-09-11 18:23:31 UTC
It doesn't seem to be related to a certain GCN generation, as there are exactly matching reports of at least Hawaii, Polaris 10/11 and Vega 10 (probably also Fiji).

It probably isn't related to the type of display output either, as I am using DL-DVI and grmat afaik uses DisplayPort.
And I suppose we all tried the standard refresh rate of 60Hz without success.

So, unfortunately, I am rather clueless.

But I just noticed we haven't yet followed David's idea of setting a low "standard" mouse polling rate of 125Hz.
I currently don't have my Radeon installed, so I can't give this a quick try (but can do so in the future).

Does anybody have this issue with a native resolution of 1920x1080 60Hz? I haven't tested such a display yet.
Comment 26 grmat 2018-09-11 22:05:23 UTC
Yes, I'm using DP (required for 144 Hz with WQHD).

However, I just reproduced the issue on a 19" monitor with 1280x1024 at 60 Hz and with a cheap old mouse with a 100 Hz polling rate. The issue is not *that* bad with the low polling rate but still very much noticeable.
Comment 27 tempel.julian 2018-10-14 12:38:54 UTC
Is this commit related to it?
https://lists.freedesktop.org/archives/amd-gfx/2018-October/027726.html
Comment 28 Nicholas Kazlauskas 2018-10-15 13:22:08 UTC
(In reply to tempel.julian from comment #27)
> Is this commit related to it?
> https://lists.freedesktop.org/archives/amd-gfx/2018-October/027726.html

It shouldn't be. You would likely be experiencing a driver hang in this case because of the fault.
Comment 29 tempel.julian 2018-10-24 20:26:50 UTC
I gave it a try again: Unfortunately, there are no improvements to report with latest 4.21-wip vs. the status of some months ago.
I really wonder how you can have trouble reproducing. This is not meant as a reproach, but it's really frustrating.
Comment 30 bmilreu 2018-10-31 23:17:44 UTC
https://github.com/yshui/compton/issues/25 - related to this issue

Some tests that others and I did with compton show there is some relation between the vsync issues and the HW cursor. When I turn swcursor on in the xorg config, both the kwin compositor and compton get significantly smoother.
Similar behavior is noticeable in some games for me, game stays smooth when playing with keyboard or gamepad then you move the mouse and it starts stuttering hard. With swcursor I get a bit of input lag but smoother performance overall.
Comment 31 Michel Dänzer 2018-11-01 11:11:16 UTC
Note that SWcursor completely disables page flipping, at least with xf86-video-amdgpu, because the two things are fundamentally incompatible with each other. Does only disabling page flipping also avoid the problem?
Comment 32 grmat 2018-11-01 12:26:34 UTC
(In reply to Michel Dänzer from comment #31)
> Does only disabling page flipping also avoid the problem?

Not from what I can tell.

> Option "EnablePageFlip" "off"

results in 

>[ 35496.178] (II) AMDGPU(0): KMS Pageflipping: disabled

obvious stuttering is still present with TearFree on.
Comment 33 tempel.julian 2018-11-01 12:30:16 UTC
I suppose TearFree forces pageflipping regardless, as we don't see any tearing with that configuration.
Comment 34 Michel Dänzer 2018-11-01 15:32:25 UTC
(In reply to tempel.julian from comment #33)
> I suppose TearFree forces pageflipping regardless, as we don't see any
> tearing with that configuration.

Right, you'd have to disable TearFree as well. Can be done at runtime with

xrandr --output <output name> --set TearFree off
Comment 35 bmilreu 2018-11-01 17:51:29 UTC
(In reply to Michel Dänzer from comment #31)
> Note that SWcursor completely disables page flipping, at least with
> xf86-video-amdgpu, because the two things are fundamentally incompatible
> with each other. Does only disabling page flipping also avoid the problem?

Just tested it and yes, disabling pageflip also gets rid of stutter for me.
Comment 36 bmilreu 2018-11-01 18:01:29 UTC
So, to help find the origin of the issue, there are a few options that get rid of stutter when compositing:

1 - amdgpu.dc=0 - The old (non-DC) display code seems unaffected by the bug.
2 - SWcursor on - Unaffected by bug because it disables pageflipping
3 - Pageflipping off
Comment 37 tempel.julian 2018-11-01 18:15:44 UTC
I think software cursor would also be unusable even if it left pageflipping on. It causes nasty issues like flickering cursor or other visual corruption.
Comment 38 bmilreu 2018-11-01 18:40:31 UTC
(In reply to tempel.julian from comment #37)
> I think software cursor would also be unusable even if it left pageflipping
> on. It causes nasty issues like flickering cursor or other visual corruption.

Yes I also noticed those, I think we can open another issue for that.
Comment 39 grmat 2018-11-02 00:21:23 UTC
(In reply to Michel Dänzer from comment #34)
> 
> Right, you'd have to disable TearFree as well.

Then I think the logs should represent that, even when the manpage tells me that TearFree is using page flipping. If I set it explicitly to off, and the log says so, I expect it to be off.

And yes, disabling page flipping "resolves" the issue, but that's not new knowledge.
Comment 40 Michel Dänzer 2018-11-02 10:48:44 UTC
For the DC guys: We've now confirmed that the problem is due to some bad interaction between page flips and HW cursor updates.


(In reply to tempel.julian from comment #37)
> I think software cursor would also be unusable even if it left pageflipping
> on. It causes nasty issues like flickering cursor or other visual corruption.

Yeah, that's why xf86-video-amdgpu disables DRI page flipping while there's an SW cursor, as I said in comment 31. Note that the modesetting driver doesn't do this, allowing users to run into those issues.


(In reply to grmat from comment #39)
> (In reply to Michel Dänzer from comment #34)
> > 
> > Right, you'd have to disable TearFree as well.
> 
> Then I think the logs should represent that, even when the manpage tells me
> that tearfree is using page flipping.  If i set explicitly to off, and the
> log says so, I expect it to be off.

Patches or at least specific suggestions welcome, but I'm afraid it's tricky to describe all possible interactions concisely and clearly. DRI page flipping and TearFree are mostly separate things, but they use the same kernel page flipping mechanism, which is what matters for this issue.
Comment 41 bmilreu 2018-11-08 05:35:17 UTC
Guys, please take a closer look at this. It's actually a lot worse than what OP describes and affects a lot of other use cases. Vsync is a vital feature for any kind of PC activity; literally everything you do on a computer sucks with tearing.


amdgpu.dc=1 has been default for a few kernels, has been updated almost daily with features and various other fixes but basic vital stuff like vsync and higher frequencies (flickering and screen glitches) have been broken for many people with all ranges of cards for a while now.


BTW, TearFree is slow and stuttery even with old dc for me. I'd love to provide more info about any of those issues and help testing, as I'm sure other users would; we just need a bit more attention from you devs.

Features are awesome, I love to wake up every day with new mesa/drm features, but what I love even more is waking up with an annoying bug fixed.
Comment 42 Brandon Wright 2018-11-19 00:40:33 UTC
This is pretty serious. Just moving the mouse cursor around while something slightly GPU-heavy is running at 60hz can produce frame-skipping.

I switched the display core off with amdgpu.dc=0 and everything got significantly smoother and chromium doesn't chug on heavy pages any more.

I'm using 4.19.x. I haven't tried the drm-next-4.21-wip tree yet.
Comment 43 bmilreu 2018-11-19 01:28:24 UTC
(In reply to Brandon Wright from comment #42)
> This is pretty serious. Just moving the mouse cursor around while something
> slightly GPU-heavy is running at 60hz can produce frame-skipping.
> 
> I switched the display core off with amdgpu.dc=0 and everything got
> significantly smoother and chromium doesn't chug on heavy pages any more.
> 
> I'm using 4.19.x. I haven't tried the drm-next-4.21-wip tree yet.
Don't need to try drm-next-4.21-wip, I just did and it still has the issue.

If devs want an easy test case, use these links for reproducing it in chromium:

https://www.vsynctester.com/
https://www.testufo.com/photo
https://www.slither.io

move the cursor around, move/resize some windows. you will notice it

the vsync/cursor stutters and frame-skips are pretty noticeable with dc=1 on all three links

KWin, compton, TearFree, mutter, xfwm4 all have the same problems.
Comment 44 Brandon Wright 2018-11-19 01:48:43 UTC
You're too late, I already tried it. But as you say, there's no improvement.
Comment 45 rropid 2018-11-19 03:22:14 UTC
(In reply to bmilreu from comment #43)
> If devs want an easy test case, use these links for reproducing it in
> chromium:
> 
> https://www.vsynctester.com/
> https://www.testufo.com/photo
> https://www.slither.io
> 
> move the cursor around, move/resize some windows. you will notice it
> 
> the vsync/cursor stutters and frame-skips are pretty noticeable with dc=1 on
> all three links
> 
> KWin, compton, TearFree, mutter, xfwm4 all have the same problems.

I just tried dc=1 and I only seem to have a problem if I use TearFree. Things are totally fine without TearFree.

To be clear about what I'm doing here right now:

I made sure DC is enabled:

  $ systool -vm amdgpu | grep dc
      dc                  = "1"
  $ dmesg | grep -i display
  [    1.014297] [drm] Display Core initialized with v3.1.59!

I removed TearFree from my X config:

  $ cat /etc/X11/xorg.conf.d/20-amdgpu.conf 
  Section "OutputClass"
      Identifier "my amdgpu settings"
      MatchDriver "amdgpu"
      Option "DRI" "3"
  EndSection

And I started Compton like this to make sure it's a clean config:

  $ compton --config /dev/null --backend glx --vsync opengl

With this setup, I don't seem to have any stutter. I visited the websites you mention in a Chromium window, then opened another window and tried moving things around and resizing. It behaves fine, same as what I know from normally using dc=0.

Kernel is 4.19.2, Mesa 18.2.4, Xorg 1.20.3, the GPU is a RX480, monitor is 60 Hz.

After I had typed this, I have now added TearFree to the X config and restarted X:

  $ cat /etc/X11/xorg.conf.d/20-amdgpu.conf 
  Section "OutputClass"
      Identifier "my amdgpu settings"
      MatchDriver "amdgpu"
      Option "TearFree" "true"
      Option "DRI" "3"
  EndSection

Now, with TearFree enabled, things are super terrible. Trying to move a window around has extreme stutter, it seems to drop frames. If I restart Compton with "GALLIUM_HUD=fps" and then try moving a window around in circles, I can see it stays below 40 fps instead of hitting the 60 fps that it should be running at.
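
The "is DC enabled?" check above can also be scripted. The following is a minimal sketch that reads the same module parameter systool reports, assuming it is exposed under /sys/module/amdgpu/parameters/dc. The helper names are made up, and the reading of -1 as "auto" is an assumption based on the amdgpu module parameter description rather than something stated in this report.

```python
from pathlib import Path

# Assumed sysfs location of the amdgpu "dc" module parameter.
DC_PARAM = Path("/sys/module/amdgpu/parameters/dc")

def describe_dc(raw: str) -> str:
    """Translate the raw parameter value into a human-readable state."""
    value = raw.strip()
    states = {
        "1": "enabled",
        "0": "disabled",
        "-1": "auto (driver decides)",  # assumption, see note above
    }
    return states.get(value, f"unknown ({value!r})")

def read_dc_setting(path: Path = DC_PARAM) -> str:
    """Read and describe the dc parameter, tolerating a missing file."""
    try:
        return describe_dc(path.read_text())
    except OSError:
        return "unavailable (amdgpu not loaded or parameter not exposed?)"

print("amdgpu.dc:", read_dc_setting())
```

A value of "auto" does not by itself say whether DC is active, so the dmesg line ("Display Core initialized") remains the more direct confirmation.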
Comment 46 Brandon Wright 2018-11-19 16:09:02 UTC
I've never run TearFree, so that's not the case here. My Xorg config is similar to yours, just amdgpu and DRI 3. I did have an extra section to use evdev instead of libinput, but I tried removing that and there's still no change.
Comment 47 Michel Dänzer 2018-11-19 16:19:41 UTC
FWIW, note that TearFree can be toggled at runtime using the RandR output property of the same name. At its default value "auto", TearFree is automatically enabled for an output using rotation / scaling / other transformations.

(In reply to bmilreu from comment #41)
> BTW, TearFree is slow and stuttery even with old dc for me.

Sounds like the issue you're seeing with TearFree might be different from the one this report is about.
Comment 48 tempel.julian 2018-11-19 16:44:53 UTC
With amdgpu.dc=0, TearFree works as expected for me (no tearing without compositor, scrolling in Firefox windowed is free of stutter, no issues with compositor vsync either).

I think we should leave TearFree out of this as it's entirely unrelated, apart from the fact that it forces pageflipping.

Regarding the original issue with amdgpu.dc=1:
Still totally unchanged for me with latest stable versions and also 4.21-wip, llvm-svn, mesa-git, libdrm-git, xorg-server 1.20.3 and modesetting / xf86-video-amdgpu-git on Arch.

I'm getting an Asus Vega 56 Strix card tomorrow, which I will try instead of my current MSI Aero RX 560 card. But since there were already reports for Vega, I'm not hopeful.
Comment 49 Brandon Wright 2018-11-19 16:53:52 UTC
I'm going to speculate that maybe the hardware cursor updates are triggering an update to the vsync timestamp counter or msc that's incorrect and throwing off the timing.
Comment 50 Brandon Wright 2018-11-19 21:14:08 UTC
> I have this issue too, disabling page flipping fixes it for me on my vega10. It started with 4.16rc1 IIRC

Negative. I checked back as far as the DC/DAL was integrated (4.15) and it's been there from the start. 

It's in the kernel somewhere, in the DC DRM layer above the device specific stuff. I looked in and couldn't see anything that's grossly problematic. I suspect Michel's suggestion for async cursor updates might be the fix, but I can't help wondering why the legacy DRM code is unaffected.
Comment 51 Alex Deucher 2018-11-19 21:33:54 UTC
DC uses the atomic KMS interface, the old code uses the legacy KMS interface.
Comment 52 Brandon Wright 2018-11-19 22:48:37 UTC
Ok, I think I understand what's going on. Forgive me if this sounds stupid, I'm looking at the DRM code for the first time.

The old KMS interface uses what's flagged as "legacy" cursor updates. These are "asynchronous" in that they're handled and passed to the hardware as they come in. On the vertical retrace interrupt, it uses whatever the last data passed in was. 

My theory is the DC interface isn't passing these on to the hardware immediately. It's aggregating them until the next sync, when they're all handled at once. And that is what's causing the disturbance at page-flip time. High-report-rate mice might exacerbate it.

Intel's driver hasn't merged that async code yet. It's still using legacy cursor updates and working around this.

The DC code seems to have a TODO comment in amdgpu_dm.c that suggests something about the legacy_cursor_update flag, but it doesn't do anything with it.
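
The "legacy" behaviour described above (updates accepted as they come in, with the vertical retrace using whatever was passed in last) can be sketched as a latest-wins latch. This is a conceptual model only, with a made-up class name; it is not kernel code and does not reflect how either driver path is actually implemented.

```python
class LatestWinsCursor:
    """Toy model: cursor moves never block, vblank latches the newest one."""

    def __init__(self):
        self._pending = None       # newest position written since last vblank
        self.scanned_out = (0, 0)  # position currently shown on screen
        self.updates_seen = 0

    def move(self, x, y):
        """Accept an update immediately by overwriting the pending position."""
        self._pending = (x, y)
        self.updates_seen += 1

    def vblank(self):
        """At vertical retrace, show the most recent position, if any."""
        if self._pending is not None:
            self.scanned_out = self._pending
            self._pending = None
        return self.scanned_out

cursor = LatestWinsCursor()
for step in range(16):           # roughly one 60 Hz frame of 1000 Hz mouse input
    cursor.move(100 + step, 200)
print(cursor.vblank(), cursor.updates_seen)
```

In this model sixteen updates cost nothing beyond sixteen overwrites, and only the last position reaches the screen, so the number of updates per frame has no effect on page-flip timing.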
Comment 53 bmilreu 2018-11-20 00:35:50 UTC
(In reply to rropid from comment #45)
> And I started Compton like this to make sure it's a clean config:
> 
>   $ compton --config /dev/null --backend glx --vsync opengl

"compton --vsync opengl" is a case less/not affected by this in my setup, try --vsync opengl-swc, --vsync opengl-oml or --vsync opengl-mswc

Also try other compositors. Kwin, mutter, xfwm4
Comment 54 tempel.julian 2018-11-20 09:10:21 UTC
You should btw. also set CPU clock governor to either acpi-cpufreq performance or intel_pstate performance, since governors like powersave, ondemand or schedutil can already cause severe stuttering at vsynctester.com, even without a compositor.

The result should be 100% stutter free, at least that's the case for me with amdgpu.dc=0. This way you should be able to be absolutely sure if the result is badly affected by amdgpu.dc=1.
Comment 55 Brandon Wright 2018-11-21 23:53:30 UTC
Created attachment 142558 [details] [review]
Patch that "fixes" the problem.

I've attached a patch that fixes the problem for me. It copies parts from the intel patch and uses the existing async infrastructure for the cursor. 

It's really tiny, so I hope this is helpful enough to get this problem fixed quick.
Comment 56 bmilreu 2018-11-22 01:22:29 UTC
(In reply to Brandon Wright from comment #55)
> Created attachment 142558 [details] [review] [review]
> Patch that "fixes" the problem.
> 
> I've attached a patch that fixes the problem for me. It copies parts from
> the intel patch and uses the existing async infrastructure for the cursor. 
> 
> It's really tiny, so I hope this is helpful enough to get this problem fixed
> quick.

Tested and solved for me on Polaris RX580. This also solves my stuttering with TearFree, which makes it possible to avoid using a compositor only for vsync. Games that stuttered with mouse movement are also fixed.

Review and push this asap as a fix, you are a hero.
Comment 57 bmilreu 2018-11-22 01:41:03 UTC
@Brandon Wright
Sorry for double posting, but I think if you send the patch to amd-gfx mailing-list directly it might get reviewed faster.
Comment 58 Nicholas Kazlauskas 2018-11-22 14:32:55 UTC
(In reply to Brandon Wright from comment #55)
> Created attachment 142558 [details] [review] [review]
> Patch that "fixes" the problem.
> 
> I've attached a patch that fixes the problem for me. It copies parts from
> the intel patch and uses the existing async infrastructure for the cursor. 
> 
> It's really tiny, so I hope this is helpful enough to get this problem fixed
> quick.

This is a nice attempt but it only resolves the problem because it relies on the blocking behavior in atomic check that amdgpu_dm currently does (and shouldn't be doing).

Asynchronous updates can and will occur in parallel with other commits on worker threads. Without the wait in atomic_check you'll see the IGT legacy cursor tests break with this patch (and there will probably be system faults as well).

There are larger problems within amdgpu_dm's commit tail that if addressed should resolve this issue for compton I'd imagine.
Comment 59 bmilreu 2018-11-22 15:32:44 UTC
(In reply to Nicholas Kazlauskas from comment #58)
> This is a nice attempt but it only resolves the problem because it relies on
> the blocking behavior in atomic check that amdgpu_dm currently does (and
> shouldn't be doing).
> 
> Asynchronous updates can and will occur in parallel with other commits on
> worker threads. Without the wait in atomic_check you'll see the IGT legacy
> cursor tests break with this patch (and there will probably be system faults
> as well).
> 
> There are larger problems within amdgpu_dm's commit tail that if addressed
> should resolve this issue for compton I'd imagine.

Since you've been working on Freesync, you should know your patches are also affected by this bug in some wine games. Any chance you could kindly try to tackle this?

btw, I don't have igt on my system atm, nor got any system fault yet with the patch. I really need dc for the extra headphone jack, mine is broken atm :(
Comment 60 Brandon Wright 2018-11-22 16:09:37 UTC
> There are larger problems within amdgpu_dm's commit tail that if addressed 
> should resolve this issue for compton I'd imagine.
Honestly, I don't care about compton. I don't think you realize the effects of this issue. It seriously affects performance when the cursor is in motion with any page-flipping application. GNOME and KDE, while the window motion is less affected, stutter in composited client applications. 

> This is a nice attempt but it only resolves the problem because it relies on
> the blocking behavior in atomic check that amdgpu_dm currently does 
> (and shouldn't be doing).
>
> Asynchronous updates can and will occur in parallel with other commits on 
> worker threads. Without the wait in atomic_check you'll see the IGT legacy 
> cursor tests break with this patch (and there will probably be system faults 
> as well).
You'd have to point this out to me, because I didn't see anything that would obviously block, unless it's buried in dc_validate_plane.

Since, as you say, atomic_check is blocking for now, why not work around this issue with a tiny change? If someone ever gets around to doing things the correct way, it's no big deal to remove.
Comment 61 tempel.julian 2018-11-22 17:31:26 UTC
Thanks a lot @ Brandon Wright, your patch really does the trick. I also totally agree with your opinion that it should be mainlined as at least a temporary solution (and also get backported to older kernels).

I just noticed that it works fine with xf86-video-amdgpu driver, but with modesetting driver, xorg or the driver freezes when starting/logging in. Not sure if this is related to latest 4.21-wip-changes or the cursor patch though.
Comment 62 Brandon Wright 2018-11-22 18:51:24 UTC
(In reply to tempel.julian from comment #61)
> I just noticed that it works fine with xf86-video-amdgpu driver, but with
> modesetting driver, xorg or the driver freezes when starting/logging in. Not
> sure if this is related to latest 4.21-wip-changes or the cursor patch
> though.
I'm getting the modesetting freeze, too, on 4.20-rc3, so it's likely the cursor patch. I called it a "fix", in quotation marks for a reason. I've barely looked at the KMS/DRM stuff for an hour, so I have no clue what I'm doing. I just wanted to show the AMD guys that we have pinpointed the problem, give them something that we can confirm no longer produces the problem, and hope that they'd go ahead and do things correctly.
Comment 63 bmilreu 2018-11-22 19:09:34 UTC
(In reply to Brandon Wright from comment #62)
> (In reply to tempel.julian from comment #61)
> > I just noticed that it works fine with xf86-video-amdgpu driver, but with
> > modesetting driver, xorg or the driver freezes when starting/logging in. Not
> > sure if this is related to latest 4.21-wip-changes or the cursor patch
> > though.
> I'm getting the modesetting freeze, too, on 4.20-rc3, so it's likely the
> cursor patch. I called it a "fix", in quotation marks for a reason. I've
> barely looked at the KMS/DRM stuff for an hour, so I have no clue what I'm
> doing. I just wanted to show the AMD guys that we have pinpointed the
> problem, give them something that we can confirm no longer produces the
> problem, and hope that they'd go ahead and do things correctly.

It's probably easy to make the workaround only activate on xf86-video-amdgpu. Luckily, I don't need the modesetting driver for anything that I'm aware of. What do you guys use that driver for? Is it for GPU switching?
Comment 64 Nicholas Kazlauskas 2018-11-22 19:30:10 UTC
Created attachment 142574 [details] [review]
0001-drm-amd-display-Add-fast-path-for-legacy-cursor-plan.patch

This patch is similar to the async_update one but it takes care to lock if anything is modifying the plane. It's very close to what i915 does with a few minor differences with framebuffer handling.

I've tested it for compton with Gallium HUD up and I no longer see the issue on mouse movement (cursor fb changes are still a bit slow, so you'll still probably see spikes on cursor changes).

You can try this on top of amd-staging-drm-next and I'd imagine it'd fix your problems.
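
For readers following along, the "fast path unless something else is modifying the plane" idea can be illustrated with a small user-space toy. This is only a sketch under stated assumptions and is not the code from the attached patch: `toy_cursor_plane`, `toy_plane_try_claim`, `toy_plane_release` and `toy_cursor_fast_update` are hypothetical names, a C11 atomic flag stands in for the real DRM locking, and treating a cursor image (fb) change as a reason to fall back is an assumption made for illustration, loosely based on the remark that fb changes remain slow.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a cursor plane; not a DRM structure. */
struct toy_cursor_plane {
    atomic_bool busy;    /* true while some commit is modifying the plane */
    int x, y;            /* current cursor position */
    unsigned int fb_id;  /* current cursor image */
};

static void toy_plane_init(struct toy_cursor_plane *p, unsigned int fb_id)
{
    atomic_init(&p->busy, false);
    p->x = 0;
    p->y = 0;
    p->fb_id = fb_id;
}

/* Try to claim the plane without waiting; false means someone else holds it. */
static bool toy_plane_try_claim(struct toy_cursor_plane *p)
{
    return !atomic_exchange(&p->busy, true);
}

static void toy_plane_release(struct toy_cursor_plane *p)
{
    atomic_store(&p->busy, false);
}

/*
 * Attempt a position-only cursor update in place.
 * Returns true if the fast path was taken. Returns false, leaving the
 * plane untouched, if the plane is busy or the cursor image changes;
 * the caller would then fall back to a full (slow) commit.
 */
static bool toy_cursor_fast_update(struct toy_cursor_plane *p,
                                   int x, int y, unsigned int fb_id)
{
    assert(p != NULL);

    if (!toy_plane_try_claim(p))
        return false;            /* another commit is touching the plane */

    bool fast = (fb_id == p->fb_id);
    if (fast) {
        p->x = x;
        p->y = y;
    }

    toy_plane_release(p);
    return fast;
}
```

The point of the toy is only that a cursor move never waits: it either applies immediately or reports that the regular commit path has to be used.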
Comment 65 bmilreu 2018-11-22 21:00:31 UTC
(In reply to Nicholas Kazlauskas from comment #64)
> Created attachment 142574 [details] [review]
> 0001-drm-amd-display-Add-fast-path-for-legacy-cursor-plan.patch
> 
> This patch is similar to the async_update one but it takes care to lock if
> anything is modifying the plane. It's very close to what i915 does with a
> few minor differences with framebuffer handling.
> 
> I've tested it for compton with Gallium HUD up and I no longer see the issue
> on mouse movement (cursor fb changes are still a bit slow, so you'll still
> probably see spikes on cursor changes).
> 
> You can try this on top of amd-staging-drm-next and I'd imagine it'd fix
> your problems.

Patch does work for me.

Is there an easy way to backport this to 4.19 mainline? Would be very useful to integrate the fix into stable kernels.

As it currently stands, it won't work on 4.19 because it uses <drm/drm_atomic_uapi.h>, which isn't mainlined yet. Brandon's hack works on 4.19, just in case it matters.

Last question, is this patch https://patchwork.freedesktop.org/patch/263412/ you just submitted related to this issue? 

Thanks a LOT for tackling this Nicholas and Brandon
Comment 66 Brandon Wright 2018-11-22 21:33:12 UTC
(In reply to bmilreu from comment #65)
> Is there an easy way to backport this to 4.19 mainline? Would be very useful
> to integrate the fix into stable kernels.
> 
> As it is currently it won't work on 4.19 because it uses
> <drm/drm_atomic_uapi.h> which isn't mainlined yet. Brandon's hack works on
> 4.19 just in case it matters.
Just remove the header include. There was some refactoring, and the functions needed in that file are in the others included.

> Last question, is this patch https://patchwork.freedesktop.org/patch/263412/
> you just submitted related to this issue? 
Looks like it's related. Thanks for taking on our issue, Nicholas.
Comment 67 tempel.julian 2018-11-24 15:24:42 UTC
Just wanted to note that applying
[PATCH 1/2] drm/amd/display: Use private obj helpers for dm_atomic_state
[PATCH 2/2] drm/amd/display: Remove wait for hw/flip done in atomic check
does not solve/workaround the issue, unlike Brandon's patch.
Comment 68 bmilreu 2018-11-24 18:30:41 UTC
(In reply to tempel.julian from comment #67)
> Just wanted to note that applying
> [PATCH 1/2] drm/amd/display: Use private obj helpers for dm_atomic_state
> [PATCH 2/2] drm/amd/display: Remove wait for hw/flip done in atomic check
> does not solve/workaround the issue, unlike Brandon's patch.

try 0001-drm-amd-display-Add-fast-path-for-legacy-cursor-plan.patch
Comment 69 tempel.julian 2018-11-24 19:38:41 UTC
(In reply to bmilreu from comment #68)
> try 0001-drm-amd-display-Add-fast-path-for-legacy-cursor-plan.patch

That one works, also with modesetting driver.

Regarding your question whether the modesetting driver is of any benefit: I'd say generally not, as it doesn't offer every feature of the xf86 DDX driver.
But it can be sufficient in many cases, and I also just found a bug with xf86 driver + amdgpu.dc=1 causing stutter in mpv. So I'm lucky to have modesetting as a fallback in the meantime.
Comment 70 Brandon Wright 2018-11-24 20:07:27 UTC
Comment on attachment 142558 [details] [review]
Patch that "fixes" the problem.

Marked my patch obsolete.
Comment 71 bmilreu 2018-11-28 21:28:13 UTC
@Nicholas Kazlauskas
any reason not to push this fix to staging or next?
Comment 72 Brandon Wright 2018-11-29 01:32:11 UTC
(In reply to bmilreu from comment #71)
> @Nicholas Kazlauskas
> any reason not to push this fix to staging or next?
I agree. This will reduce stuttering for everyone, especially those who think the problem is caused elsewhere and just discount it as bad software or graphics card performance like I did.
Comment 73 tempel.julian 2018-11-30 12:35:38 UTC
Yeah, it'd be extremely disappointing if this didn't land before the Linux 4.21 DRM merge window closes.
Like I already said, I think this is even worth getting backported to older kernels, as I'd consider it an important fix. Likely every AMD Xorg user has degraded performance because of this.
Comment 74 Brandon Wright 2018-12-04 23:18:02 UTC
Is anyone from the AMD driver team still following this? 

Could we please have a review of Nicholas's patch and try to get it into 4.20? It's not that disruptive code-wise, but it makes a big smoothness difference. I can quickly compile a kernel/module for myself pretty easily, but most users aren't going to be that technical or even know why things are so stuttery.
Comment 75 tempel.julian 2018-12-07 10:24:53 UTC
Any update, please?
Comment 76 Brandon Wright 2018-12-07 16:55:49 UTC
https://patchwork.freedesktop.org/series/53589/

A new patch has been submitted. So it's in the pipeline for inclusion now.
Comment 77 bmilreu 2018-12-07 21:25:58 UTC
@Nicholas Kazlauskas
is there anything important in the new patch vs the first one? it fails a hunk on 4.19 for me
thanks for submitting it to amd-gfx

