Since last week I have been getting random freezes with the amdgpu driver (running on Kaveri).
Once the issue occurs, the display freezes. Switching to VT2 and back does not fix it.

In Xorg.0.log I find the following multiple times:

[ 92357.021] (WW) AMDGPU(0): flip queue failed: Device or resource busy
[ 92357.021] (WW) AMDGPU(0): Page flip failed: Device or resource busy
[ 92357.021] (EE) AMDGPU(0): present flip failed

No related messages in the journal or dmesg afaics.

It does not seem to be related to a specific event (like a video playing), but just happens out of nowhere. I didn't find a way to reproduce it specifically.

Possibly related packages that I built in that time:

* dev-lang/llvm-scm::arbor 2016-06-11 07:42:19 UTC
* dev-lang/llvm-scm::arbor 2016-06-19 07:29:42 UTC
* x11-dri/mesa-12.0.0-rc4::x11 2016-06-21 21:40:48 UTC
* dev-lang/llvm-scm::arbor 2016-07-02 11:57:34 UTC
* dev-lang/clang-scm::arbor 2016-07-02 12:43:00 UTC
* dev-lang/llvm-3.8.0-r1::arbor 2016-07-12 20:04:14 UTC
* dev-lang/clang-3.8.0::arbor 2016-07-12 20:48:27 UTC
* x11-dri/mesa-12.0.0::x11 2016-07-13 04:42:47 UTC
* x11-dri/mesa-12.0.1::x11 2016-07-17 14:25:44 UTC
* x11-server/xorg-server-1.18.4::x11 2016-07-20 16:06:18 UTC

I couldn't get mesa 12 to build with llvm-scm anymore, so I downgraded llvm. Still, I doubt it's related.

It could be a regression that came with mesa 12, possibly with mesa-12.0.0. I'm pretty sure I hadn't seen the freeze before 12.0.0 final, but it's hard to be certain with an issue this random.

In case it matters, my xorg settings are:

Section "Device"
    Identifier "AMDGPU"
    Driver "amdgpu"
    Option "TearFree" "Off"
    Option "EnablePageFlip" "On"
    Option "DRI" "3"
EndSection

IIRC, this is now standard, so nothing special here.
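The three Xorg.0.log messages quoted in the report can be counted mechanically, which helps when comparing logs from different sessions. This is a minimal Python sketch that assumes the line layout shown in the report (`[ seconds] (WW|EE) AMDGPU(0): message`). The function name `count_flip_failures` is hypothetical and is not part of any Xorg tool.

```python
import re
from collections import Counter

# Matches lines such as:
# [ 92357.021] (WW) AMDGPU(0): flip queue failed: Device or resource busy
# The layout is an assumption taken from the lines quoted in the report.
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\((?P<level>WW|EE)\)\s+"
    r"AMDGPU\(\d+\):\s+(?P<msg>.*flip.*)"
)


def count_flip_failures(log_text):
    """Count flip-related AMDGPU warnings and errors, keyed by message text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            counts[match.group("msg").strip()] += 1
    return counts


sample = """\
[ 92357.021] (WW) AMDGPU(0): flip queue failed: Device or resource busy
[ 92357.021] (WW) AMDGPU(0): Page flip failed: Device or resource busy
[ 92357.021] (EE) AMDGPU(0): present flip failed
"""

for message, count in count_flip_failures(sample).items():
    print(count, message)
```

To use it on a real log, read the file contents (for example from /var/log/Xorg.0.log, if that is where the server writes it on your system) and pass the text to `count_flip_failures`.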
Please attach the Xorg log and dmesg output corresponding to the problem.

(In reply to Bernd Steinhauser from comment #0)
> * x11-server/xorg-server-1.18.4::x11 2016-07-20 16:06:18 UTC

Which version of xorg-server were you using before? Does going back to that fix the problem?
(In reply to Michel Dänzer from comment #1)
> Please attach the Xorg log and dmesg output corresponding to the problem.
>
> (In reply to Bernd Steinhauser from comment #0)
> > * x11-server/xorg-server-1.18.4::x11 2016-07-20 16:06:18 UTC
>
> Which version of xorg-server were you using before? Does going back to that
> fix the problem?

Before it was 1.18.3 installed in April.
I hoped that the update might improve the situation, but it didn't. So I'm pretty sure that the xorg-server update is unrelated.
I noticed that I updated my kernel from 4.6.3 to 4.6.4 on the 12th of July, so I thought it could be related and investigated a little.
Then I stumbled across this log, which I think is from the first time this happened. This is from journald:

Jul 09 08:59:43 orionis kernel: Linux version 4.6.3-amdgpu (root@orionis) (gcc version 5.3.0 (GCC) ) #1 SMP PREEMPT Sat Jun 25 21:20:12 CEST 2016
[...]
Jul 09 17:04:08 orionis kernel: [drm:amdgpu_crtc_page_flip] *ERROR* failed to get vblank before flip
Jul 09 17:04:09 orionis kernel: [drm:amdgpu_crtc_page_flip] *ERROR* failed to get vblank before flip

No idea why in this case I can find some messages in the journal and in the other cases not.

Anyway, this means that the origin is not the update mesa-12.0.0-rc4 -> final and also not linux 4.6.3 -> 4.6.4.
4.6.2 -> 4.6.3 is also unlikely, since (as you can see above) that kernel was built approx. 2 weeks earlier, and within that amount of time I would surely have experienced the problem. (I've had it approx. 8 to 10 times during the last 2 weeks.)

Another message I found in a different log is:

Jul 24 00:45:17 orionis kernel: [drm:amdgpu_atombios_dp_link_train] *ERROR* displayport link status failed
Jul 24 00:45:17 orionis kernel: [drm:amdgpu_atombios_dp_link_train] *ERROR* clock recovery failed

Not sure if it is related.

With regard to which package started to bring this up, I'm now almost out of ideas. The only thing left would be kwin/plasma 5. The update from Plasma 5.6.95 to 5.7.0 was performed on the 5th of July.
So, since kwin is what I use as a compositor (and Plasma 5 as a desktop), it might be that this triggers a bug?
Still looking for the full Xorg log and dmesg output, preferably captured after the problem occurred. Does restarting kwin recover from the hang?
Sorry, missed that request in your post above. dmesg output I don't have available as I didn't have ssh activated when the problem occurred. (now I do) I could attach the journald kernel output if that would be sufficient?
Created attachment 125329 [details] Xorg.0.log
Created attachment 125333 [details]
dmesg output

dmesg output from the currently running system. Attaching this as I noticed that I do get those vblank/flip messages even now, when I didn't experience the bug (yet).
I noticed that those two lines coincide with a certain event I can trigger: switching the DP-0 display off (an Eizo EV2455). This disconnects the DP connection, and that (somehow) leads to the quoted messages about the failed vblank. (I'm not sure whether the disconnect is actually a bug in the kernel (as it's a DP1.2 display) or whether my hardware/mainboard/gpu is too old.)

However, this disconnect does not lead straight to the freeze. So far I haven't seen the bug directly after a DP disconnect, only at some random point.
Created attachment 125433 [details]
dmesg output after the freeze

I logged into the machine during a freeze and saved the dmesg output. Unfortunately, it doesn't seem to contain additional information.
Since the weekend, I have been running kwin without compositing, and I haven't seen the freeze since. So I think this is a bug that is triggered by kwin when compositing, likely since 5.7.0.
Does explicitly disabling the DP output in the KDE configuration before turning off the monitor avoid the problem?
It does prevent the vblank messages in dmesg, but I don't know if it'll prevent the freeze.
One more remark: I've only observed the effect when the OpenGL 3.1 compositing backend in kwin is active. I tested with OpenGL 2 backend over the last week and have not seen this happening since. I should also mention that I've had the egl interface activated, which is not recommended for kwin. I've not had issues with it before, but it could be related, so the next thing I'm testing is glx/OpenGL 3.1 and hope I can narrow this down this way.
Ok, it's not egl, the same happens with glx/OpenGL3.
I tried a few things, but wasn't really able to nail this down.
I downgraded to mesa 11.2 to see if that helps, but it does not.

However, today I had plasmashell freezing after unlocking the screen.
Only plasmashell froze, everything else kept working as expected. I contacted Martin on IRC and he thought it might be related to this. I'll attach the log from the conversation as well as the backtrace.

He might be right, because around the time when this happened, I get these messages in dmesg:

[88765.431890] [drm:amdgpu_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip
[88765.436865] [drm:amdgpu_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip
[88765.441940] [drm:amdgpu_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip
[88765.446861] [drm:amdgpu_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip
[88765.451865] [drm:amdgpu_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip
[88765.456903] [drm:amdgpu_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip
[89579.510005] [drm:amdgpu_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip
[89579.514998] [drm:amdgpu_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip
[89579.520053] [drm:amdgpu_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip
[89579.525158] [drm:amdgpu_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip
[113833.139104] [drm:amdgpu_atombios_dp_link_train] *ERROR* displayport link status failed
[113833.139117] [drm:amdgpu_atombios_dp_link_train] *ERROR* clock recovery failed
[113833.361471] [drm:amdgpu_atombios_dp_link_train] *ERROR* displayport link status failed
[113833.361484] [drm:amdgpu_atombios_dp_link_train] *ERROR* clock recovery failed
[113836.962993] [drm:amdgpu_crtc_page_flip] *ERROR* failed to get vblank before flip
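The dmesg lines in this comment fall into separate clusters in time, and only one of them can correspond to the plasmashell freeze. A minimal Python sketch for splitting such output into bursts follows, assuming the default `[seconds.microseconds]` dmesg prefix seen in the comment. The function name `group_bursts` and the 60 second threshold are arbitrary illustrative choices, not anything the kernel or dmesg provides.

```python
import re

# Leading dmesg timestamp, e.g. "[88765.431890]" or "[   14.404932]".
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")


def group_bursts(lines, max_gap=60.0):
    """Split dmesg lines into bursts separated by more than max_gap seconds."""
    bursts = []
    last_ts = None
    for line in lines:
        match = TS_RE.match(line)
        if not match:
            continue  # skip lines without a timestamp prefix
        ts = float(match.group(1))
        if last_ts is None or ts - last_ts > max_gap:
            bursts.append([])  # gap exceeded: start a new burst
        bursts[-1].append((ts, line))
        last_ts = ts
    return bursts


# A shortened selection of the lines quoted in the comment.
sample = [
    "[88765.431890] [drm:amdgpu_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip",
    "[88765.456903] [drm:amdgpu_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip",
    "[89579.510005] [drm:amdgpu_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip",
    "[113833.139104] [drm:amdgpu_atombios_dp_link_train] *ERROR* displayport link status failed",
    "[113836.962993] [drm:amdgpu_crtc_page_flip] *ERROR* failed to get vblank before flip",
]

for burst in group_bursts(sample):
    print(f"{burst[0][0]:.3f} .. {burst[-1][0]:.3f}: {len(burst)} line(s)")
```

Each burst can then be matched against the wall-clock time at which a freeze was observed.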
Created attachment 126131 [details] IRC conversation with Martin Grässlin
Created attachment 126132 [details] plasmashell backtrace
(In reply to Bernd Steinhauser from comment #15)
> However, today I had plasmashell freezing after unlocking the screen.
> Only plasmashell froze, everything else kept working as expected. [...]
> [...] I get these messages in dmesg:

There are some messages with a timestamp around 89xxx and some with a timestamp around 11383x. Almost 7 hours passed in between, so which group of messages corresponds to the plasmashell freeze? Probably the latter? Those look again like the DP connection is lost.

Were you able to determine if explicitly disabling the DP output in the kwin settings avoids the freezes?
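The "almost 7 hours" figure follows directly from the dmesg timestamps, which count seconds. A small Python check, using the last timestamp of the 89xxx group and the first of the 11383x group from comment #15 (the helper name `gap_hours` is only illustrative):

```python
def gap_hours(earlier, later):
    """Return the gap between two dmesg timestamps (in seconds) as hours."""
    return (later - earlier) / 3600.0


# Last "failed to reserve new rbo buffer" line and first DP link training
# error, both taken from the dmesg excerpt in comment #15.
gap = gap_hours(89579.525158, 113833.139104)
print(f"{gap:.2f} h")  # prints "6.74 h"
```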
Yes, the ones around 11383x. I can't yet be sure about DP, but I'll check again. The problem is that I can't find a way to trigger it, it just happens randomly. The DisplayPort Monitor is my main screen, it would mean I have to work for 1 week or so without it.
Ok, I have been running for approx. 4 days now with DP-0 deactivated and so far haven't spotted any problems.
Only at the very start did I find these messages, but that was before running kde:

[ 14.404932] [drm:amdgpu_atombios_dp_link_train] *ERROR* displayport link status failed
[ 14.404939] [drm:amdgpu_atombios_dp_link_train] *ERROR* clock recovery failed

Still, it's hard to tell with a problem that occurs this randomly.
I'll check whether I have another DP cable, so I can test with that.
Ok, so I replaced the DP cable and reenabled the screen.
Immediately after that I got these messages in dmesg. Note the time.

[338324.267684] [drm:amdgpu_crtc_page_flip] *ERROR* failed to get vblank before flip
[338324.489710] [drm:amdgpu_crtc_page_flip] *ERROR* failed to get vblank before flip
[338526.834794] [drm:amdgpu_atombios_dp_link_train] *ERROR* displayport link status failed
[338526.834801] [drm:amdgpu_atombios_dp_link_train] *ERROR* clock recovery failed
[338526.838652] [drm:amdgpu_atombios_dp_link_train] *ERROR* displayport link status failed
[338526.838655] [drm:amdgpu_atombios_dp_link_train] *ERROR* clock recovery failed

After that, no messages (related to the graphics stack) have appeared in dmesg so far.

However, the X server log is now spammed with messages every few seconds:

[338324.859] (WW) AMDGPU(0): flip queue failed: Invalid argument
[338324.859] (WW) AMDGPU(0): Page flip failed: Invalid argument
[338324.859] (EE) AMDGPU(0): present flip failed
[338324.940] (WW) AMDGPU(0): get vblank counter failed: Invalid argument
[338324.942] (WW) AMDGPU(0): get vblank counter failed: Invalid argument
[338324.942] (WW) AMDGPU(0): flip queue failed: Device or resource busy
[338324.942] (WW) AMDGPU(0): Page flip failed: Device or resource busy

This started right after activating the DP screen. I guess sooner or later that will result in the freeze that I'm seeing. (I'll upload both dmesg and Xorg.0.log.)

So yeah, it seems like this is a problem with the DP connection. Since I don't think that I have two broken DP cables, I guess the problem is somewhere else.
If that would help, I can connect one of the other screens via DP and see if that makes a difference.
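The point of "note the time" is that the first X server warnings appear well under a second after the kernel's vblank errors. A minimal Python sketch of that correlation follows. It assumes that the dmesg and Xorg.0.log timestamps count seconds from roughly the same origin, which the near-identical values in this comment suggest but which is not guaranteed on every system. The function name `preceding_kernel_event` and the 5 second window are hypothetical choices.

```python
from bisect import bisect_right


def preceding_kernel_event(kernel_ts, xorg_ts, window=5.0):
    """Return the latest kernel timestamp at most `window` seconds
    before xorg_ts, or None if there is none.

    kernel_ts must be sorted in ascending order.
    """
    i = bisect_right(kernel_ts, xorg_ts)
    if i == 0:
        return None  # no kernel message at or before this Xorg message
    candidate = kernel_ts[i - 1]
    return candidate if xorg_ts - candidate <= window else None


# Timestamps copied from the dmesg and Xorg.0.log excerpts in the comment.
kernel = [338324.267684, 338324.489710, 338526.834794]
xorg = [338324.859, 338324.940, 338324.942]

for ts in xorg:
    match = preceding_kernel_event(kernel, ts)
    if match is not None:
        print(f"Xorg {ts:.3f} follows kernel {match:.6f} by {ts - match:.3f} s")
```

With these values every Xorg warning is paired with the second "failed to get vblank before flip" line, which fits the observation that the spam started right after the DP screen was activated.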
Created attachment 126285 [details] dmesg after reenabled DP
Created attachment 126286 [details] Xorg.0.log after reenabled DP
I am experiencing what I think may be a similar issue. When my display sleeps, it often does not wake up on keypress. I have to wait anywhere from a few seconds to a few minutes and then have errors in my log like the following:

[drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock recovery failed
[drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock recovery failed

I am running Antergos 64-bit with GNOME 3.22.2 on Wayland
Kernels 4.8.13 and 4.10.0-rc3-ga121103c9228
AMD FX-8370
Sapphire Fury X
Created attachment 128916 [details] Delayed recovery from display sleep logs
Does https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=daf8809626c0ee7a152f9c34058fc3b43385dd51 help for this?
Thanks, I'm testing it right now on linux 4.16.8. Although I'm not sure if it works as expected, since the display does still seem to disconnect when I turn the screen off. At least the messages in dmesg are gone, so it's definitely different compared to previous tests. Can't say anything about the freezes without extensive testing, though.
(In reply to Bernd Steinhauser from comment #27)
>
> Although I'm not sure if it works as expected, since the display does still
> seem to disconnect when I turn the screen off.

AFAIK that's either a monitor or general DisplayPort issue. The drivers can't prevent it but have to cope with it.
(In reply to Michel Dänzer from comment #28)
> (In reply to Bernd Steinhauser from comment #27)
> >
> > Although I'm not sure if it works as expected, since the display does still
> > seem to disconnect when I turn the screen off.
>
> AFAIK that's either a monitor or general DisplayPort issue. The drivers
> can't prevent it but have to cope with it.

Quite possible. I've seen such behaviour on Windows as well on some displays. I don't really get it; it's very annoying if your windows are rearranged just because you turned off a display to save some power.

Anyway, back to topic:

[595475.710884] [drm:amdgpu_atombios_dp_link_train] *ERROR* displayport link status failed
[595475.710902] [drm:amdgpu_atombios_dp_link_train] *ERROR* clock recovery failed

I do still get those messages sometimes, but at least I haven't experienced any lockups or freezes.
note, experiencing the same (or at least similar) issues -- my story is bug'd here: * https://bugs.freedesktop.org/show_bug.cgi?id=107560
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/80.