|Summary:||pageflipping seems to cause jittering on mouse input when running Hitman 2 in Wine/DXVK with amdgpu.dc=1|
|Component:||DRM/AMDgpu||Assignee:||Default DRI bug account <dri-devel>|
|Status:||RESOLVED NOTOURBUG||QA Contact:|
Description tempel.julian 2019-05-10 11:15:17 UTC
Created attachment 144213 [details] xorg log
The jittering is already noticeable in the game's menu. It's most prominent with vsync enabled, but there is also noticeable jittering in the game when vsync is turned off. There seems to be no jittering with keyboard input alone, only with mouse input.
The issue does not occur when:
-playing the game in windowed mode
-turning off pageflipping in xorg config
-using modesetting driver instead
-using legacy DC via amdgpu.dc=0
I haven't yet encountered this in other games with DXVK.
xorg 1.20.5
xf86-video-amdgpu 19.0.1/-git
mesa-git
Comment 2 tempel.julian 2019-05-10 13:06:48 UTC
Funny, this MR leads to the same behavior with the modesetting driver: https://gitlab.freedesktop.org/xorg/xserver/merge_requests/180
Comment 3 Michel Dänzer 2019-05-15 10:15:30 UTC
(In reply to tempel.julian from comment #0)
> The issue does not occur when:
> [...]
> -using legacy DC via amdgpu.dc=0
Probably a DC issue then.
Comment 4 Nicholas Kazlauskas 2019-05-15 12:17:49 UTC
What's the latest commit in your WIP kernel? I know there was a regression caused by https://patchwork.freedesktop.org/patch/304544/ that forces full updates on every commit, leading to pretty poor performance. I have a patch that fixes this that didn't make it into that set of DC patches.
But I also don't think this last set is merged yet into amd-staging-drm-next, so it's likely something other than this - and likely something in the legacy codepath if disabling atomic support in modesetting causes the issue.
Comment 5 tempel.julian 2019-05-16 16:06:10 UTC
Yes, it also happens with Linux 5.1. It btw. runs fine on xwayland inside a Plasma Wayland session.
Comment 6 tempel.julian 2019-05-24 19:34:57 UTC
Situation is unchanged with 5.3-wip.
It also occurs with amdvlk instead of radv if you turn on pageflipping via UseFlipHint,1 in amdPalSettings.cfg (for incomprehensible reasons it is disabled by default and the amdvlk developers unfortunately seem to ignore user complaints regarding it). Instead of pageflipping, the issue can also be triggered with amdvlk + TearFree.
Btw: There is a free demo of Hitman 2 on Steam, it might work out of the box with Steam Play/Proton.
-----
Little off-topic: I had to rewrite this entire comment because freedesktop.org servers are a nightmare. Complete migration to GitLab would be a great thing.
Comment 7 tempel.julian 2019-05-26 20:08:39 UTC
Playing Skyrim with Gallium Nine also shows this issue, it makes the games unplayable. Is it really certain that it's an amdgpu.dc problem when the modesetting DDX doesn't show this issue?
Comment 8 Nicholas Kazlauskas 2019-05-27 13:05:55 UTC
Do you happen to know if this was a regression?
Comment 9 tempel.julian 2019-05-27 14:13:19 UTC
Until I get a new GPU or a FreeSync display, I use amdgpu.dc=1 only for testing purposes. So I can't judge if this is a regression or has always existed. But I gave Linux 4.19.46 LTS a try and it shows the same behavior. Hm, maybe no one noticed because pageflipping wasn't working before this commit? https://gitlab.freedesktop.org/xorg/driver/xf86-video-amdgpu/commit/bf61e6d7ac1a5754b1026d7f80acf25ef622c491 Will retest with latest stable versions of xorg / amdgpu DDX. It's btw. really not happening in every game, e.g. Elex seems to be fine.
Comment 10 tempel.julian 2019-05-27 14:22:42 UTC
Nope, not related to it. Happens also with stable versions.
Comment 11 tempel.julian 2019-05-27 17:00:40 UTC
Happens also with plain wined3d inside official Steam Proton builds. In the case of Skyrim, it also affects the rendering performance and thus is visible in the frametime graph (unlike Hitman 2 with DXVK): https://abload.de/img/screenshot_20190527_1t1ktp.png
Those spikes occur by just moving the mouse. Pressing keyboard buttons doesn't trigger them.
Comment 12 Nicholas Kazlauskas 2019-05-27 17:11:08 UTC
I'm wondering if this is the async cursor update bug again. Maybe something with WINE or the game is trying to swap cursor buffers frequently and it's interacting with the cursor double buffering in xf86-video-amdgpu. We still can't do fast cursor updates for swapping cursor framebuffers because we'll hit page faults that can kill the driver due to the cursor framebuffer not being properly refcounted. The fix for this particular bug is still under review in DRM. I plan on removing the restriction I added in amdgpu DM after the fix has been merged. But for now, whenever the cursor swaps framebuffers we can't perform fast cursor updates so we're forced to wait for the previous flip to finish and the vblank event to be sent back to userspace. This can cause small jitters depending on how often the cursor is updating and when it updates during the vblank interval.
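To give a feel for the size of the jitter described in this comment, here is a toy calculation (not driver code) of how long a cursor update would wait if it cannot be applied as a fast update and has to wait for the next vblank instead. The function name, the idealized vblank schedule and the 60 Hz default are assumptions made purely for illustration.

```python
def wait_until_next_vblank_ms(arrival_ms, refresh_hz=60.0):
    """Time a blocked update waits until the next vblank (toy model).

    Assumes vblanks occur at exact multiples of the refresh period,
    starting at t = 0, and that an update arriving exactly on a vblank
    waits one full period.
    """
    period_ms = 1000.0 / refresh_hz
    phase_ms = arrival_ms % period_ms
    return period_ms - phase_ms


if __name__ == "__main__":
    # Updates arriving early in the refresh interval wait the longest.
    for arrival in (1.0, 8.0, 16.0):
        wait = wait_until_next_vblank_ms(arrival)
        print(f"update arriving at {arrival:5.1f} ms waits {wait:5.2f} ms")
```

Under these assumptions the added delay varies with the arrival phase, which is consistent with the remark that the jitter depends on when the cursor updates during the vblank interval.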
Comment 13 tempel.julian 2019-05-27 17:31:07 UTC
Thanks for letting me know! Could you please provide me with a loose estimate if those general atomic modesetting performance limitations can be overcome in the next months? Would really put my mind at ease. :)
Comment 14 Nicholas Kazlauskas 2019-05-27 17:53:33 UTC
(In reply to tempel.julian from comment #13)
> Thanks for letting me know!
> Could you please provide me with a loose estimate if those general atomic
> modesetting performance limitations can be overcome in the next months?
> Would really put my mind at ease. :)
The core bits + the bits that affect amdgpu are reviewed. But I think it's still waiting on review from maintainers of the other drivers the patch impacts. I wouldn't expect it to land before 5.3 or even 5.4 at the earliest unfortunately.
I would still need to debug to know for sure if that's the actual bug that's going on here but it seems likely given that it's atomic DC + cursor movement + xf86-video-amdgpu that's causing the issue.
Comment 15 tempel.julian 2019-05-27 17:59:02 UTC
Well, hope on the horizon. If applying debug patches would be helpful for trying to shed light into this issue, I would of course do it.
Comment 16 Nicholas Kazlauskas 2019-05-27 18:07:37 UTC
Created attachment 144354 [details] [review] 0001-drm-amd-display-Allow-fast-updates-again-for-swappin.patch
Sure, you can try the patch I've attached, applied on top of the series fixing the problem in DRM: https://patchwork.kernel.org/cover/10837847/
Not sure if that applies cleanly, however. The important patches from that series should be:
https://patchwork.kernel.org/patch/10837849/
https://patchwork.kernel.org/patch/10837853/
Comment 17 tempel.julian 2019-05-27 19:42:16 UTC
I applied your patch and patches 1 and 3 of that series on linux 5.2-rc2, but it unfortunately doesn't show any effect:
-There is still the mouse input issue for the games described in this ticket.
-Opening new windows still creates stutter.
-And so do gamma adjustments via RedShift.
Comment 18 tempel.julian 2019-05-29 17:05:06 UTC
Huh, with modesetting driver, those patches eliminate the stutter when new windows are shown. Does the xf86-video-amdgpu driver need adjustments for this? However, turning on nightlight in Plasma Wayland still causes stutter, which is not there with amdgpu.dc=0. RedShift btw. is completely broken with amdgpu.dc=1 + modesetting DDX, it simply has no effect anymore (not related to the experimental atomic modesetting patches).
Comment 19 Nicholas Kazlauskas 2019-05-29 18:05:54 UTC
(In reply to tempel.julian from comment #18)
> Huh, with modesetting driver, those patches eliminate the stutter when new
> windows are shown. Does the xf86-video-amdgpu driver need adjustments for
> this?
It should eliminate stuttering for that case in xf86-video-amdgpu if it's the problem I think it is (double buffering the cursor).
> However, turning on nightlight in Plasma Wayland still causes stutter, which
> is not there with amdgpu.dc=0.
1. Gamma updates are slow updates that do a lot of register programming. Nightlight and RedShift issue a lot of these updates.
2. Gamma updates, like everything that isn't a cursor update, currently target the next vblank period.
3. If the pageflip is in a separate commit or update than the gamma update, then it'll need to wait for the gamma update to finish and for the next vblank interval. If this takes too long then we might miss the next vblank interval and have to wait for the one after that.
I think it's a combination of these 3 issues. Even though it's Wayland and should be using the full atomic API, I'm not sure if plasma is actually issuing all that state in the same commit or not. My guess would be no, since you're seeing the stuttering.
We do have a bug with (2) for legacy gamma updates, since there isn't really any reason those should be waiting for the next flip / vblank other than to be consistent with the rest of the atomic commit framework.
> RedShift btw. is completely broken with amdgpu.dc=1 + modesetting DDX, it
> simply has no effect anymore (not related to the experimental atomic
> modesetting patches).
Not sure what the issue here would be. Gamma seems to work fine for legacy and atomic on amdgpu (we pass the IGT tests for this) and it works fine in legacy desktops like GNOME on Xorg with the xf86-video-amdgpu DDX. Was this still on Plasma, but on X?
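The interaction between points 2 and 3 can be sketched with a toy model: commits are serialized and each one targets the next vblank, so a page flip submitted separately from a gamma update lands one vblank later than it would as part of the same commit. This is only an illustration of the reasoning, not how the driver is implemented; the function names, the idealized vblank schedule and the 100 Hz period are assumptions, and register programming time is ignored.

```python
import math


def next_vblank_ms(t_ms, period_ms):
    """First vblank strictly after t_ms (vblanks at multiples of the period)."""
    return (math.floor(t_ms / period_ms) + 1) * period_ms


def latch_times_ms(submit_times_ms, period_ms):
    """Toy model: serialized commits, each targeting the next vblank.

    A commit cannot start before the previous one has latched, so two
    commits submitted within one refresh interval land on two
    consecutive vblanks.
    """
    latched = []
    previous_done = 0.0
    for submitted in submit_times_ms:
        start = max(submitted, previous_done)
        done = next_vblank_ms(start, period_ms)
        latched.append(done)
        previous_done = done
    return latched


if __name__ == "__main__":
    period = 10.0  # 100 Hz, chosen only to keep the numbers round
    # Gamma update at 1 ms and page flip at 2 ms as separate commits.
    print(latch_times_ms([1.0, 2.0], period))
    # The same state submitted together as a single commit at 2 ms.
    print(latch_times_ms([2.0], period))
```

In this model the separately submitted flip is delayed by a full refresh period, which would show up as a dropped frame with vsync on.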
Comment 20 tempel.julian 2019-05-29 20:56:40 UTC
I forgot that I patched this PR into my Xserver: https://gitlab.freedesktop.org/xorg/xserver/merge_requests/36
It is responsible for the blocked gamma adjustment and the better desktop window performance of the modesetting Xorg driver with the experimental atomic modesetting kernel patches vs. the xf86-video-amdgpu driver.
So, since everything got a bit messy, let me recap the results and add a few more details:
-The experimental atomic modesetting kernel patches actually improve the performance for desktop window usage for one aspect: When I open www.vsynctester.com in Chromium and quickly hover the mouse cursor over my system tray to trigger popup windows, this doesn't result in stuttering anymore. The same applies to little text popups (e.g. URLs of links) during regular web browsing. This is the case with both modesetting and xf86-video-amdgpu, window compositing is enabled and 100% free of tearing at the same time.
-But there is still stutter on www.vsynctester.com in Chromium (please don't use Firefox for this, it even stutters on MS Windows when doing this...) when I hide and show any other window, e.g. of running Dolphin file browser by clicking its starter icon in the taskbar. It's just the window that is shown and hidden, the program itself continues running all the time. This applies to both modesetting and xf86-video-amdgpu driver.
-But when I apply the aforementioned "WIP: modesetting: Use atomic more atomically" patch to Xserver (additionally to the experimental atomic modesetting kernel patches), the modesetting driver is also 100% free of stutter in this situation, while the xf86-video-amdgpu driver is not. Question is: Can this also be incorporated into the xf86-video-amdgpu driver? This would be a VAST improvement, the stuttering during gamma adjustments imho is not close to being as important.
-Now back to the stutter in games when moving the mouse: This is completely untouched by all this. The xf86-video-amdgpu driver always shows stuttering in the mentioned games (as long as amdgpu.dc=1), while modesetting and also xwayland don't.
Oof, I hope I didn't forget anything. ;)
Comment 21 tempel.julian 2019-06-07 09:03:16 UTC
I'm open to trying out other patches, e.g. concerning double buffering for the cursor. :)
Comment 22 tempel.julian 2019-06-08 17:50:13 UTC
The Witcher 3 is affected as well (a bit less obvious, but still quite bad vs. modesetting or amdgpu.dc=0). So, it seems this is a real dealbreaker for playing games on Linux, which imho justifies raising this ticket's priority. :(
Comment 23 tempel.julian 2019-06-20 10:40:01 UTC
Any news on this? I'd really like to have this sorted out before I wholeheartedly recommend Navi for Linux gaming. I can imagine that Navi causes a ton of work, but still this issue is painful.
Comment 24 tempel.julian 2019-06-27 15:17:20 UTC
I've mentioned kwin-lowlatency in this ticket: https://bugs.freedesktop.org/show_bug.cgi?id=108917#add_comment
It can be used as some kind of workaround for this wine issue, as the stutter doesn't occur when kwin compositing (and thus vsync) is enabled on top of the games' vsync. Of course this is far from being optimal, as
1. It breaks FreeSync.
2. There is an additional backbuffer queue, causing additional input (or perhaps better output) latency.
3. It may introduce additional stutter when framerate drops below refreshrate
---
FreeSync situation for wine games seems really terrible due to this bug. :(
Comment 25 tempel.julian 2019-07-10 09:42:32 UTC
Applying this MR and disabling HW cursor "fixes" the mouse skipping in the menu of Hitman 2 (as there is a cursor visible and thus pageflipping is turned off): https://gitlab.freedesktop.org/xorg/driver/xf86-video-amdgpu/merge_requests/38
But in the actual game, there is no cursor visible and so there is severe stutter again.
I also reported the bug to the wine devs (still I think this is rather a bug of xf86-video-amdgpu): https://bugs.winehq.org/show_bug.cgi?id=47428
There I mentioned that setting "MouseWarpOverride = disable" (a Wine feature to work around/solve mouse issues) fixes the problem for wined3d/gallium nine. However, it does not fix the issue in Hitman 2. The issue in Hitman 2 also is a bit different, as it doesn't seem to have slowdowns regarding the rendering performance, but instead the mouse input rather seems to be partially blocked or discarded.
But again: This does not occur without xf86-video-amdgpu or amdgpu.dc=1.
Comment 26 Michel Dänzer 2019-07-10 13:50:06 UTC
(In reply to tempel.julian from comment #25)
> I also reported the bug to the wine devs (still I think this is rather a bug
> of xf86-video-amdgpu):
It's a kernel issue, not an xf86-video-amdgpu one.
Comment 27 tempel.julian 2019-07-10 19:08:33 UTC
(In reply to Michel Dänzer from comment #26)
> It's a kernel issue, not an xf86-video-amdgpu one.
Thanks for clarifying.
I could also reproduce this issue with Doom OpenGL in Steam Play/Proton 4.9. As soon as I move the mouse enough, there are frametime spikes (the red ones in the "total" graph): https://abload.de/img/screenshot_20190710_26sk21.png
When I turn off pageflipping in xorg config, the red spikes there are gone: https://abload.de/img/screenshot_20190710_2k7k67.png
Luckily, the Vulkan renderer of the game doesn't show the issue. But it once again makes clear that this bug can affect a wide variety of software in Wine.
Comment 28 tempel.julian 2019-07-18 21:30:03 UTC
Situation is still unchanged with latest drm-next-5.4-wip kernel branch from a few minutes ago. :(
Comment 29 tempel.julian 2019-08-10 10:56:53 UTC
The patches by Nicholas are now merged in drm-next-5.4 branch (tested with recent commit that bases the branch on 5.3-rc3), but the mouse input issue in certain games is still unaffected. I was also able to reproduce it with a different system (also with RX 580) which features a 60Hz FreeSync display, it definitely makes FreeSync impossible to use in the aforementioned titles.
Comment 30 Nicholas Kazlauskas 2019-08-21 17:49:58 UTC
(In reply to tempel.julian from comment #29)
> The patches by Nicholas are now merged in drm-next-5.4 branch (tested with
> recent commit that bases the branch on 5.3-rc3), but the mouse input issue
> in certain games is still unaffected.
>
> I was also able to reproduce it with a different system (also with RX 580)
> which features a 60Hz FreeSync display, it definitely makes FreeSync
> impossible to use in the aforementioned titles.
I still can't reproduce on any setup I've tried. Here is the current setup I have:
RX580
1920x1080 @ 144Hz
amd-staging-drm-next (6c7a8d5c0772)
xf86-video-amdgpu 19.0.1-1
mesa 19.1.4-1
plasma-desktop 5.16.4-1
There are no spikes in the DXVK overlay in Hitman 2 when moving the mouse and no noticeable jitter in input.
I am using plasma with the compositor enabled, tearing prevention auto and fullscreen redirection allowed. PageFlip is enabled in xf86-video-amdgpu.
I do see stutters in the graph when moving the cursor over the mission tiles on the menu, but this is GFX stuttering - not display. Moving the cursor at the top of the screen in the menu produces no stuttering.
FreeSync isn't active since this is DXVK.
Comment 31 tempel.julian 2019-08-21 18:37:42 UTC
Created attachment 145117 [details] new dmesg log with staging-drm-next kernel default parameters
Comment 32 tempel.julian 2019-08-21 18:40:25 UTC
Created attachment 145118 [details] issue demonstrated with D9VK frametime graph in game Oblivion
Comment 33 Nicholas Kazlauskas 2019-08-21 18:48:13 UTC
I haven't seen anything like the video in my testing. It also doesn't seem to happen every time so I'm wondering if something else is going on in the background that's issuing atomic commits.
Do you mind posting the relevant portion of a dmesg log after running the following:
echo 0x52 > /sys/module/drm/parameters/debug
This will generate a ton of debug information and will likely kill your performance, but if you can reproduce the portion of the frametime graph with the red spikes and post that part of the log that would help.
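For readers wondering what a debug mask such as 0x52 selects, the following sketch decodes it into category names. The bit values are the DRM debug categories as I understand them from the kernel's drm_print.h; treat the table as an assumption and check it against the documentation of the kernel you are running. The helper name decode_drm_debug is made up for this example.

```python
# Assumed bit assignments for the drm.debug module parameter
# (DRM_UT_* categories); verify against your kernel version.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
    0x40: "STATE",
    0x80: "LEASE",
    0x100: "DP",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by `mask`."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


if __name__ == "__main__":
    mask = 0x52
    print(hex(mask), "->", ", ".join(decode_drm_debug(mask)))
```

With this table, 0x52 would enable driver messages plus the atomic and verbose atomic state output, which fits the goal of seeing which atomic commits are being issued.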
Comment 34 tempel.julian 2019-08-21 18:53:26 UTC
Thank you for still being with me on this.
I've downgraded to stock packages provided by Arch stable repository, which is:
xorg-server 1.20.5
xf86-video-amdgpu 19.0.1
mesa 19.1.4
stock (read: no) xorg config
no custom kernel parameters (except for disabling intel_pstate, see new dmesg.log attached)
And I've also installed amd-staging-drm-next (6c7a8d5c0772) just like you.
But: The issue is unchanged. To further illustrate the issue, I've recorded a capture of it in the game TES IV: Oblivion in D9VK (no difference to WineD3D or Gallium Nine regarding this issue). The capturing process in OBS Studio via Xcomposite breaks pageflipping, but I can turn it on again via TearFree which I enable via hotkey on the fly. The result 100% matches running the game with modesetting DDX or amdgpu.dc=0 (no spikes) vs. xf86-video-amdgpu + amdgpu.dc=1 (nasty spikes).
Hitman 2 is a bit different, as it doesn't show render spikes for me either (I think I was first mistaken regarding that difference, sorry for the confusion), but the mouse input just blocks/skips heavily and is even more unplayable than Oblivion/Skyrim etc.
I was just writing this while I read your new reply. I'll gladly try what you have suggested.
Comment 35 tempel.julian 2019-08-21 19:17:04 UTC
Created attachment 145119 [details] debug dmesg.log
Comment 36 tempel.julian 2019-08-21 19:18:18 UTC
The log now starts at [ 2788.164016], I hope nothing important is cut out. Else I'd have to recheck my log size limits.
Comment 37 Nicholas Kazlauskas 2019-08-21 19:58:57 UTC
Are you running a color management tool in the background?
The difference between my setup and yours is that there isn't anything locking the connectors hundreds of times per second and performing full updates. This is confirmed in your log by lines like the following:
[ 2788.165907] [drm:drm_atomic_add_affected_connectors [drm]] Adding all current connectors for [CRTC:47:crtc-0] to 00000000acb155e9
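A quick way to check how often such lines occur is to bucket them by the dmesg timestamp. The sketch below does that for any substring. The first sample line is the one quoted in this comment; the other two are invented for illustration, and the helper name matches_per_second is hypothetical.

```python
import re
from collections import Counter

# Matches the "[ 2788.165907]" uptime prefix that dmesg prints.
TIMESTAMP = re.compile(r"^\[\s*(\d+)\.(\d+)\]")


def matches_per_second(log_text, needle):
    """Count log lines containing `needle`, bucketed by whole second of uptime."""
    buckets = Counter()
    for line in log_text.splitlines():
        if needle not in line:
            continue
        match = TIMESTAMP.match(line.strip())
        if match:
            buckets[int(match.group(1))] += 1
    return dict(buckets)


# First line taken from the thread; the second and third are made up.
SAMPLE = """\
[ 2788.165907] [drm:drm_atomic_add_affected_connectors [drm]] Adding all current connectors for [CRTC:47:crtc-0] to 00000000acb155e9
[ 2788.171002] [drm:drm_atomic_add_affected_connectors [drm]] Adding all current connectors for [CRTC:47:crtc-0] to 00000000acb155e9
[ 2789.000113] [drm:drm_atomic_add_affected_connectors [drm]] Adding all current connectors for [CRTC:47:crtc-0] to 00000000acb155e9
"""

if __name__ == "__main__":
    print(matches_per_second(SAMPLE, "drm_atomic_add_affected_connectors"))
```

On a real log, counts in the hundreds per second would match the behavior described here, while a handful per second would point elsewhere.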
Comment 38 tempel.julian 2019-08-21 20:26:19 UTC
Huh, that seems suspicious. I'm not aware of any such tool which would be active for me all the time. I had redshift installed, but it wasn't running in background, so not surprisingly uninstalling it and restarting the system didn't help. There is colord running, but killing its process didn't help either. So I'm a bit clueless how this comes about. If I knew how, I'd try to turn it off.
Could you try manually disabling KWin compositing via Shift + Alt + F12 before starting Hitman 2? I just tricked myself when trying out Xfce instead of Plasma, but the reason why Xfce wasn't showing the issue was that it didn't turn off compositing in fullscreen (despite an option telling otherwise). After manually turning off compositing, the issue was the same as on Plasma without compositor.
Comment 39 Nicholas Kazlauskas 2019-08-22 14:44:04 UTC
Disabling the compositor doesn't make a difference as far as stuttering goes for Hitman 2's DXVK - I don't see any commits in the log that lock the connector and all the planes.
I don't have Oblivion on my machine to test, but I tried running the DX9 version of Heaven under proton and I don't see stuttering or any gamma/color adjustment commits under that either. No issues with FreeSync when running it either from what I can tell with vsync both on/off.
Those commits are definitely what's causing your stuttering, but I'm not sure what's actually creating them. My initial guess was something in the compatibility layer for DX9 games, but I don't see that on my setup. Is it only Oblivion that has this issue for you?
I'm not sure how much of this can be a kernel level fix - I think we need to lock all the planes whenever gamma or color adjustments have been made and that probably includes the cursor plane as well. If the cursor plane is included that will block asynchronous cursor updates from occurring until the color adjustments have been done. This is why the cursor causes stuttering.
A check could potentially be made to not lock all the planes for redundant color management commits, but I'm not sure if the color adjustments requested are redundant or not. It could be the case that the application is requesting different color adjustments every single time.
Comment 40 tempel.julian 2019-08-22 16:21:07 UTC
Created attachment 145137 [details] video demonstrating the issue in Hitman 2 (different to Oblivion/Skyrim)
Comment 41 tempel.julian 2019-08-22 16:40:59 UTC
Created attachment 145138 [details] debug dmesg.log after running Hitman 2 with the issue
Comment 42 tempel.julian 2019-08-22 17:00:27 UTC
(In reply to Nicholas Kazlauskas from comment #39)
> Disabling the compositor doesn't make a difference as far as stuttering goes
> for Hitman 2's DXVK - I don't see any commits in the log that lock the
> connector and all the planes.
Thanks for trying!
> I don't have Oblivion on my machine to test, but I tried running the DX9
> version of Heaven under proton and I don't see stuttering or any gamma/color
> adjustment commits under that either. No issues with FreeSync when running
> it either from what I can tell with vsync both on/off.
I've given Heaven a try, it doesn't show the issue for me either (DX9 via D9VK).
> Those commits are definitely what's causing your stuttering, but I'm not
> sure what's actually creating them. My initial guess was something in the
> compatibility layer for DX9 games, but I don't see that on my setup.
I've attached a debug dmesg log after triggering the issue in Hitman 2. It is also what is shown in the new video capture I've provided. As you can see, there are no rendering spikes, but instead the mouse input (and perhaps partially also keyboard) seems to be discarded a lot, causing such jumps. This is also there without vsync enabled, but less obvious.
Just like the render spikes in Oblivion/Skyrim, this issue completely disappears by turning off pageflipping in xorg config, switching to modesetting DDX or disabling atomic modesetting via amdgpu.dc=0. I wonder if the log confirms that it's the same issue (or the issue has the same roots)?
> Is it only Oblivion that has this issue for you?
I found out that also the native OpenGL renderer of Doom 2016 (which also has a free demo on Steam) shows the same behavior as Oblivion/Skyrim, despite no 3D API wrapper being involved. For whatever reason, the Vulkan renderer of the game doesn't show the issue, it seems to run flawlessly with both pageflipping + vsync.
> I'm not sure how much of this can be a kernel level fix - I think we need to
> lock all the planes whenever gamma or color adjustments have been made and
> that probably includes the cursor plane as well. If the cursor plane is
> included that will block asynchronous cursor updates from occurring until
> the color adjustments have been done. This is why the cursor causes
> stuttering.
Would it be possible to provide a test patch that completely blocks any gamma adjustment either in Xorg or the kernel? Then we'd have ultimate proof. :)
> A check could potentially be made to not lock all the planes for redundant
> color management commits, but I'm not sure if the color adjustments
> requested are redundant or not. It could be the case that the application is
> requesting different color adjustments every single time.
It seems some suboptimal behavior of Wine can trigger this issue, but I suppose it would automatically be fixed together with this issue which I reported regarding gamma adjustment performance and atomic modesetting: https://bugs.freedesktop.org/show_bug.cgi?id=108917
I btw. can reproduce that issue by simply booting Fedora 30 Workstation Gnome Live and enabling the nightlight feature, the color grading phase makes everything stutter.
Comment 43 Nicholas Kazlauskas 2019-08-22 17:09:55 UTC
Created attachment 145139 [details] [review] 0001-drm-amd-display-Test-patch-for-disabling-color-adjus.patch
From your video it looks like something is issuing a lot of full updates. Those are slow enough that they can miss the current vblank window and be forced to wait until the next one with vsync on.
I've attached a debug patch you can try that should disable color adjustments from triggering full updates. I've also added some debug information to know when full updates are being issued in case it was something other than color management. You can view that output with log level 4, i.e.
echo 0x4 > /sys/module/drm/parameters/debug
Comment 44 tempel.julian 2019-08-22 18:14:45 UTC
I applied the patch to linux 5.2 (along with 0001-drm-amd-display-Allow-fast-updates-again-for-swappin.patch) and as expected, gamma adjustments have stopped working. Unfortunately, the games still show the issue.
Should the debug information be contained in dmesg? After doing echo 0x4 > /sys/module/drm/parameters/debug and starting Hitman 2, there doesn't seem to be any comprehensible debug information inside the dmesg log, at least not to my layman eyes (attaching right now nonetheless).
Comment 45 tempel.julian 2019-08-22 18:15:35 UTC
Created attachment 145140 [details] new dmesg log with debug patch applied after starting Hitman 2
Comment 46 tempel.julian 2019-08-25 15:44:59 UTC
Created attachment 145152 [details] new debug dmesg log saved after running Oblivion with drm-next kernel
Comment 47 tempel.julian 2019-08-25 15:48:05 UTC
I got a new 1440p 144 Hz FreeSync display, and as expected, the issue is unchanged with it.
With it, I've created a new debug dmesg log for render stutter in Oblivion, this time with your patch applied to drm-next kernel.
Perhaps this could be interesting?
[ 529.556752] [drm:drm_mode_addfb2 [drm]] [FB:79]
[ 529.557106] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] VRR packet update: crtc=47 enabled=1 state=3
[ 529.557164] [drm:dc_commit_updates_for_stream [amdgpu]] debug: full update issued
[ 529.564401] [drm:drm_mode_addfb2 [drm]] [FB:86]
[ 531.420971] [drm:drm_mode_addfb2 [drm]] [FB:95]
[ 531.459067] [drm:drm_mode_addfb2 [drm]] [FB:96]
[ 544.144771] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] VRR packet update: crtc=47 enabled=0 state=2
[ 544.145961] [drm:dc_commit_updates_for_stream [amdgpu]] debug: full update issued
[ 544.169447] [drm:drm_mode_addfb2 [drm]] [FB:79]
[ 544.172953] [drm:drm_mode_addfb2 [drm]] [FB:94]
Comment 48 Nicholas Kazlauskas 2019-08-26 12:39:37 UTC
(In reply to tempel.julian from comment #47)
> I got a new 1440p 144 Hz FreeSync display, and as expected, the issue is
> unchanged with it.
>
> With it, I've created a new debug dmesg log for render stutter in Oblivion,
> this time with your patch applied to drm-next kernel.
>
> Perhaps this could be interesting?
>
> [ 529.556752] [drm:drm_mode_addfb2 [drm]] [FB:79]
> [ 529.557106] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] VRR packet update: crtc=47 enabled=1 state=3
> [ 529.557164] [drm:dc_commit_updates_for_stream [amdgpu]] debug: full update issued
> [ 529.564401] [drm:drm_mode_addfb2 [drm]] [FB:86]
> [ 531.420971] [drm:drm_mode_addfb2 [drm]] [FB:95]
> [ 531.459067] [drm:drm_mode_addfb2 [drm]] [FB:96]
> [ 544.144771] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] VRR packet update: crtc=47 enabled=0 state=2
> [ 544.145961] [drm:dc_commit_updates_for_stream [amdgpu]] debug: full update issued
> [ 544.169447] [drm:drm_mode_addfb2 [drm]] [FB:79]
> [ 544.172953] [drm:drm_mode_addfb2 [drm]] [FB:94]
This is normal behavior for toggling into and out of VRR.
What I was expecting to see was a log with hundreds of full updates issued, but since this isn't the case I think it's something more fundamental with vblank timing, though I'm still not quite sure why I can't reproduce it in my testing.
Comment 49 tempel.julian 2019-08-26 16:26:45 UTC
Instead of my Arch installation, I tried a fresh Fedora 30 Workstation Gnome installation. It shows the same behavior, so I think we can rule out a packaging issue. Too bad, this issue makes me boot Windows much more often. It's not really a recommendable state. :(
Comment 50 tempel.julian 2019-08-29 12:54:05 UTC
Are we already out of options for debug output? :)
Comment 51 Nicholas Kazlauskas 2019-08-29 13:03:53 UTC
(In reply to tempel.julian from comment #50)
> Are we already out of options for debug output? :)
Might help to see what IOCTLs are being specifically called by userspace. I think you can enable that log with:
echo 0x3f > /sys/module/drm/parameters/debug
Comment 52 tempel.julian 2019-08-29 13:23:38 UTC
Created attachment 145207 [details] dmesg ioctl log
Comment 53 tempel.julian 2019-08-29 13:26:37 UTC
Hm, it seems that maximum log size isn't enough for even one whole second?
Comment 54 tempel.julian 2019-08-29 13:35:32 UTC
Not sure if it was helpful, but I tried setting log_buf_len=131072 and ran sleep 5s && dmesg in background while I was provoking the issue in Oblivion (dmesg-ioctl_2.log).
Comment 55 tempel.julian 2019-08-29 13:36:06 UTC
Created attachment 145208 [details] hopefully extended dmesg ioctl log
Comment 56 Nicholas Kazlauskas 2019-08-29 13:43:01 UTC
It seems to be the frequent calls to DRM_IOCTL_MODE_SETPROPERTY that's causing the issue but I'm not entirely sure what specifically it's trying to set that's doing this. This probably isn't triggering anything that really needs hardware programming but it will end up waiting for all outstanding atomic commits to finish which would likely explain the stuttering. Does this patch improve the issue? https://patchwork.freedesktop.org/patch/309217/?series=61778&rev=1
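To see which ioctls dominate a log like this, one option is to tally every DRM_IOCTL_* name that appears in it. The sketch below does only that. The sample lines are invented and merely approximate what the DRM core prints for each ioctl, so treat their exact layout as an assumption; the regular expression relies only on the ioctl name itself.

```python
import re
from collections import Counter

# Any token that looks like a DRM ioctl name, wherever it appears in a line.
IOCTL_NAME = re.compile(r"\bDRM_IOCTL_[A-Z0-9_]+\b")


def ioctl_histogram(log_text):
    """Count how often each DRM_IOCTL_* name occurs in the log text."""
    return Counter(IOCTL_NAME.findall(log_text))


# Hypothetical sample lines, made up for illustration only.
SAMPLE = """\
[  100.000001] [drm:drm_ioctl [drm]] pid=1234, dev=0xe200, auth=1, DRM_IOCTL_MODE_SETPROPERTY
[  100.000050] [drm:drm_ioctl [drm]] pid=1234, dev=0xe200, auth=1, DRM_IOCTL_MODE_CURSOR2
[  100.000900] [drm:drm_ioctl [drm]] pid=1234, dev=0xe200, auth=1, DRM_IOCTL_MODE_SETPROPERTY
[  100.001700] [drm:drm_ioctl [drm]] pid=1234, dev=0xe200, auth=1, DRM_IOCTL_MODE_PAGE_FLIP
[  100.002500] [drm:drm_ioctl [drm]] pid=1234, dev=0xe200, auth=1, DRM_IOCTL_MODE_SETPROPERTY
"""

if __name__ == "__main__":
    for name, count in ioctl_histogram(SAMPLE).most_common():
        print(f"{count:6d}  {name}")
```

Run against a real capture, a histogram like this would show at a glance whether DRM_IOCTL_MODE_SETPROPERTY really outnumbers page flips and cursor updates.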
Comment 57 tempel.julian 2019-08-29 14:06:48 UTC
Unfortunately unchanged :( . New log:
Comment 58 tempel.julian 2019-08-29 14:07:19 UTC
Created attachment 145209 [details] new ioctl log with patch applied
Comment 59 tempel.julian 2019-08-31 10:28:00 UTC
Created attachment 145222 [details] debug dmesg output for patch applied to drm-next
Comment 60 tempel.julian 2019-08-31 10:29:53 UTC
I've applied your patch to current drm-next branch head (tag drm-next-5.4-2019-08-30), situation is still unchanged. I've also attached a new log with it. This time I ran Oblivion with Gallium Nine statetracker instead of D9VK.
Comment 61 tempel.julian 2019-09-01 12:23:58 UTC
Did I get it right that modesetting DDX and xwayland aren't affected by this problem because xf86-video-amdgpu technically differs in some substantial aspects? Apart from FreeSync (which is triggered by pageflipping, thus disabling pageflipping doesn't help at all as a workaround), it would be bearable if modesetting DDX was usable. But then there is this bug, which forces the user to play everything with vsync, which is not always desirable either: https://gitlab.freedesktop.org/xorg/xserver/issues/629
Comment 62 tempel.julian 2019-09-02 13:41:32 UTC
I could reproduce the issue on a system with a Radeon RX 5700 XT Navi 10 GPU + drm-next kernel in Hitman 2. Really devastating.
Comment 63 Michel Dänzer 2019-09-03 16:15:52 UTC
Can you find out which property is getting set? If there's no (easy) way to get that out of the kernel, one possibility is to (from another machine via SSH) attach gdb to the Xorg process while an affected app is running, set breakpoints in drmModeConnectorSetProperty and drmModeObjectSetProperty, and get backtraces of where they're getting called from.
Comment 64 tempel.julian 2019-09-03 20:43:24 UTC
I can try that. But I really wonder why there are differences between systems showing the issue or not.
Comment 65 tempel.julian 2019-09-04 09:28:03 UTC
@Nicholas Since this commit, the modesetting driver shows the same behavior as xf86-video-amdgpu: https://gitlab.freedesktop.org/xorg/xserver/commit/f0d78b47ac49977a6007f5fe081f00c6eb19a12e So, now only xwayland isn't affected. But I think this can very well be just because Wayland compositors yet don't support turning compositing off in fullscreen. We might see the same effect as on Xorg, where keeping some compositors enabled seems to "fix" the issue (at the cost of high input latency, other performance issues and non-functional FreeSync).
Comment 66 Michel Dänzer 2019-09-04 15:14:13 UTC
(In reply to tempel.julian from comment #65) > Since this commit, the modesetting driver shows the same behavior as > xf86-video-amdgpu: > https://gitlab.freedesktop.org/xorg/xserver/commit/ > f0d78b47ac49977a6007f5fe081f00c6eb19a12e Hmm, could the property update be part of the legacy => atomic compatibility code in the kernel? > So, now only xwayland isn't affected. But I think this can very well be just > because Wayland compositors yet don't support turning compositing off in > fullscreen. They still use page flipping though, similar to TearFree. Which Wayland compositor(s) have you tried?
Comment 67 tempel.julian 2019-09-04 20:34:36 UTC
(In reply to Michel Dänzer from comment #66) > They still use page flipping though, similar to TearFree. Which Wayland > compositor(s) have you tried? I have tried a Wayland Gnome and Plasma session, in both cases the render stutter of Oblivion and the "mouse blocking" of Hitman 2 are gone. But: As I have stated earlier, this can also be achieved on Xorg with Xfce's xfwm and kwin-lowlatency (a KWin fork with different vsync implementation) compositing. They also trigger pageflipping and thus achieve a tearing-free result. I would guess that having that extra vsync + extra backbuffer queue of a compositor can avoid that "collision" of pageflipping + mouse input, in which case we can't be sure if Wayland really isn't affected.
Comment 68 tempel.julian 2019-09-06 12:08:09 UTC
I did it, but it stopped after hitting the two breakpoints the first time without me having moved the mouse at all. I suppose this isn't enough? Would it be possible to provide me with a short hint how to let it run longer? I'd have to read quite some doc pages otherwise. Anyhow, here's the log.
Comment 70 Michel Dänzer 2019-09-06 13:53:31 UTC
(In reply to tempel.julian from comment #68) > I did it, but it stopped after hitting the two breakpoints the first time > without me having moved the mouse at all. I suppose this isn't enough? Would > it be possible to provide me with a short hint how to let it run longer? Enter "continue" (or just "c") at the gdb prompt to continue, and wait for a few breakpoint hits, to see if they're getting hit from multiple places. Also, make sure debugging symbols are available for /usr/lib/xorg/modules/drivers/amdgpu_drv.so (and ideally also for Xorg and its modules), otherwise the backtraces are useless.
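A rough sketch of how the "continue and wait for a few hits" step could be automated, assuming gdb is installed and the command is run from another machine over SSH as suggested in comment #63. The helper name `gdb_backtrace_argv` and the PID 1234 are made up for illustration; only the standard gdb options `-p`, `-batch` and `-ex` are relied on. The script only builds and prints the command line, it does not run gdb itself.

```python
import shlex

# Functions named in comment #63 as breakpoint targets.
BREAKPOINTS = ["drmModeConnectorSetProperty", "drmModeObjectSetProperty"]


def gdb_backtrace_argv(pid, hits=5):
    """Build a gdb command line that attaches to `pid`, sets the
    breakpoints, and prints a backtrace at each of the first `hits` stops."""
    argv = ["gdb", "-p", str(pid), "-batch"]
    for name in BREAKPOINTS:
        argv += ["-ex", f"break {name}"]
    for _ in range(hits):
        # "continue" resumes Xorg until the next breakpoint hit,
        # "bt" then prints where the call came from.
        argv += ["-ex", "continue", "-ex", "bt"]
    return argv


if __name__ == "__main__":
    # 1234 is a placeholder; substitute the real PID of the Xorg process.
    print(shlex.join(gdb_backtrace_argv(1234, hits=3)))
```

Printing the command instead of executing it keeps the sketch harmless; the output can be pasted into the SSH session once the affected app is running. Without debugging symbols the resulting backtraces will still be of little use.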
Comment 71 tempel.julian 2019-09-06 15:02:13 UTC
Hope this helps:
Comment 72 tempel.julian 2019-09-06 15:02:50 UTC
Created attachment 145280 [details] 2nd gdb backtrace log, now with debug symbols
Comment 73 Michel Dänzer 2019-09-06 16:41:42 UTC
Looks like some client repeatedly calls XForceScreenSaver (probably to prevent the monitors from blanking), which results in the DPMS property getting re-set over and over. Nicholas, maybe the kernel could ignore such no-op property "updates"?
Comment 74 Nicholas Kazlauskas 2019-09-06 16:45:43 UTC
(In reply to Michel Dänzer from comment #73) > Looks like some client repeatedly calls XForceScreenSaver (probably to > prevent the monitors from blanking), which results in the DPMS property > getting re-set over and over. Nicholas, maybe the kernel could ignore such > no-op property "updates"? Even if it's a no-op update, it's still going to be locking the connector and potentially blocking other commits from occurring, so ideally the client userspace wouldn't be doing this. I can try writing up a patch that doesn't lock everything if the connector state hasn't changed, at the very least, but I'm not sure if it will fully address the issue or not.
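The idea of skipping the lock when nothing changes can be illustrated with a toy model. This is not kernel code and not the patch mentioned above; `ToyConnector` and its fields are hypothetical names, and Python is used only to show the control flow.

```python
import threading


class ToyConnector:
    """Toy stand-in for a display connector; not kernel code."""

    def __init__(self, dpms="On"):
        self.lock = threading.Lock()
        self.props = {"DPMS": dpms}
        self.commits = 0  # how many times the expensive path ran

    def set_property(self, name, value):
        """Return True if a commit happened, False for a no-op request."""
        # Early exit: the requested value is already set, so neither the
        # lock nor the commit path is touched. A real driver would have
        # to make this check safe against concurrent updates.
        if self.props.get(name) == value:
            return False
        with self.lock:
            self.props[name] = value
            self.commits += 1
        return True
```

In this model, a client that keeps re-setting DPMS to its current value never takes the lock, so it cannot hold up other work that needs it. Whether the same shortcut is safe and sufficient in the actual atomic code path is exactly the open question raised in the comment.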
Comment 75 tempel.julian 2019-09-06 17:38:49 UTC
Is it possible that Wine or the affected programs in Wine are the clients that are at fault for this? As it happens with my Arch Plasma setup and also a fresh Fedora Gnome installation, it seems impossible to identify the "culprit" for me.
Comment 76 Michel Dänzer 2019-09-09 14:06:36 UTC
(In reply to tempel.julian from comment #75) > Is it possible that Wine or the affected programs in Wine are the clients > that are at fault for this? Certainly. It looks like the intention is to prevent the monitors from entering power saving mode.
Comment 77 tempel.julian 2019-09-09 15:12:57 UTC
(In reply to Michel Dänzer from comment #76) > Certainly. It looks like the intention is to prevent the monitors from > entering power saving mode. I turned DPMS off in Xorg config which leaves the issue unchanged. Is this to be expected? I'll open a ticket for the issue on the Wine tracker, but I'd still be happy to try out any patch for Xorg or kernel.
Comment 78 Michel Dänzer 2019-09-09 15:29:21 UTC
(In reply to tempel.julian from comment #77) > I turned DPMS off in Xorg config which leaves the issue unchanged. Is this > to be expected? Yeah, this isn't directly related to X's DPMS functionality.
Comment 79 Pierre-Eric Pelloux-Prayer 2019-09-09 15:44:05 UTC
(In reply to Michel Dänzer from comment #76) > (In reply to tempel.julian from comment #75) > > Is it possible that Wine or the affected programs in Wine are the clients > > that are at fault for this? > > Certainly. It looks like the intention is to prevent the monitors from > entering power saving mode. Maybe the code from this wine patch https://www.winehq.org/pipermail/wine-devel/2018-July/129014.html ("winex11.drv: Wake up the display on user input") is causing this? (adding a breakpoint in wine for the XResetScreenSaver function should confirm)
Comment 80 tempel.julian 2019-09-09 17:53:08 UTC
Oh my. I've tried the oldest Proton build offered by Steam, which is based on Wine 3.7, and indeed it doesn't show the issue (neither in Oblivion with wined3d OGL nor Hitman 2 DXVK Vulkan). I tried with the older Proton 3.16 version before, which unfortunately is the one that started showing the issue. I don't know why I didn't try 3.7 in the first place. I'm sorry for the time you have invested into this issue. :( Though it would appear to me that the old non-atomic DC is very resilient toward such issues. To be sure, I also tested with untouched Arch 5.2.13 kernel: Without that Wine commit, it is totally free of that stutter issue as well. Pierre-Eric, I reverted that commit c6b6935bb433dbbd30f5ba122a7c45ad3a2d6eed, and indeed, it introduced this issue. Should I create a Wine bug ticket for this? Closing now. Big thanks @ Michel and Nicholas for their great support. AMD's kernel and Linux windowing support is simply outstanding, a damn good reason to stay with team (former) red. :)
Comment 81 Andrew Eikum 2019-09-17 15:06:48 UTC
(In reply to Nicholas Kazlauskas from comment #74) > (In reply to Michel Dänzer from comment #73) > > Looks like some client repeatedly calls XForceScreenSaver (probably to > > prevent the monitors from blanking), which results in the DPMS property > > getting re-set over and over. Nicholas, maybe the kernel could ignore such > > no-op property "updates"? > > Even if it's no-op update it's still going to be locking the connector and > potentially blocking other commits from occurring so ideally the client > userspace wouldn't be doing this. I've submitted a patch to Wine to throttle our calls to XResetScreenSaver to once every five seconds: https://source.winehq.org/patches/data/169958 However, I'd argue our previous behavior isn't obviously wrong. What we were doing was calling XResetScreenSaver each time we received user input on a joystick in order to delay the screensaver coming up despite no keyboard/mouse input. It's not obvious that this behavior was incorrect. Are you sure you shouldn't be able to handle this behavior from a client?
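For readers unfamiliar with the approach, throttling to "at most once every five seconds" can be sketched as follows. This is an illustration in Python, not the Wine patch itself (which is C code in winex11.drv); `Throttle`, `on_joystick_input` and the `reset_screensaver` callback are hypothetical names standing in for the real input handler and the XResetScreenSaver call.

```python
import time


class Throttle:
    """Allow an action at most once per `interval` seconds."""

    def __init__(self, interval, clock=time.monotonic):
        self.interval = interval
        self.clock = clock  # injectable so the logic can be tested
        self._last = None   # time of the last allowed call, None if never

    def ready(self):
        """Return True (and record the time) if the action may run now."""
        now = self.clock()
        if self._last is None or now - self._last >= self.interval:
            self._last = now
            return True
        return False


def on_joystick_input(throttle, reset_screensaver):
    # Hypothetical input handler: forward to the screensaver reset only
    # when the throttle allows it, and drop the request otherwise.
    if throttle.ready():
        reset_screensaver()
```

With `Throttle(5.0)`, a burst of joystick events produces one reset immediately and then at most one more per five-second window, which keeps the screensaver away while cutting the number of property updates reaching the display driver.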
Comment 82 tempel.julian 2019-09-17 17:20:28 UTC
(In reply to Andrew Eikum from comment #81) > I've submitted a patch to Wine to throttle our calls to XResetScreenSaver to > once every five seconds: https://source.winehq.org/patches/data/169958 > > However, I'd argue our previous behavior isn't obviously wrong. What we were > doing was calling XResetScreenSaver each time we received user input on a > joystick in order to delay the screensaver coming up despite no > keyboard/mouse input. It's not obvious that this behavior was incorrect. Are > you sure you shouldn't be able to handle this behavior from a client? Thank you. Subjectively, I'd say it completely solves the issue. I don't see any noteworthy spikes in frametime graphs and I haven't noticed any suspicious stutter/blocking until now (will do some more tests to be sure). I'd be very happy if this could be included in 4.17 to increase chances that it gets picked up by one of the next Proton releases. It'd be interesting to know at what calling rate the display driver starts to have issues.
Comment 83 tempel.julian 2019-09-17 20:58:10 UTC
Andrew, with your patch the issue is still there in a weaker form: When I open the inventory in TES IV Oblivion, don't move the cursor for some seconds and then move it again, there is always a frametime spike: https://abload.de/img/screenshot_20190917_2f4jds.png It's not there with XResetScreenSaver calls completely disabled in Wine (by reverting the commit mentioned by Pierre-Eric). It appears 100% reproducible.
Comment 84 Andrew Eikum 2019-09-18 16:55:57 UTC
If one call every five seconds is causing a problem, surely it's not Wine's bug, right?
Comment 85 tempel.julian 2019-09-18 17:54:49 UTC
Thanks for the new Proton release including your fix (didn't realize you are the release manager :) ). As far as I can judge until now, the result is good enough in practice. There is this reproducible spike, but moving the camera while in game seems free of stutter (a bit hard to judge, since the game is very stuttery in general). It's definitely not the only case where atomic modesetting is hypersensitive, I wouldn't be surprised if it required some kind of rather comprehensive restructuring to make it more resilient in general. E.g. there is still this aforementioned problem with gamma adjustments: https://bugs.freedesktop.org/show_bug.cgi?id=108917 And there is also weak stutter each time when other windows on Xorg receive focus. All those issues were non-existent with old legacy DC (amdgpu.dc=0 on Polaris and older). Intel probably is affected the same.
Comment 86 tempel.julian 2019-09-18 19:10:59 UTC
While the Gallium performance overlay clearly shows a spike in the game's GPU render time, I'm fairly certain it doesn't exist in the actual display output. It's not limited to the character inventory, but instead happens anywhere, even in the main menu. But I can see neither the cursor nor the in-game camera skipping, despite the graph's spike. I also noticed that there is no such spike when the framerate is below the refresh rate. I also tested if there's something similar to observe in Hitman 2: I reduced visual fidelity to achieve constant 75fps/Hz with vsync, and it always seems to be 100% free of stutter, also when resuming to move the mouse after suspending it for a few seconds.