Summary: | Black screen when reconnecting display | ||
---|---|---|---|
Product: | DRI | Reporter: | Bernd Steinhauser <linux> |
Component: | DRM/Radeon | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | mario.kleiner |
Version: | unspecified | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Description
Bernd Steinhauser
2016-01-17 21:05:07 UTC
Please attach your xorg log and dmesg output.

Created attachment 121189 [details]
dmesg output
Output by dmesg up to the point where the screen was switched off and on again, including a VT change (working around the problem) afterwards.
Created attachment 121190 [details]
Xorg log
Corresponding X log.
(In reply to Bernd Steinhauser from comment #0)
> The issue goes away when I set
> Option "DRI" "2"

I think I have to correct myself here. I've seen this with DRI2 as well. I have yet to find out what changed.

Does this only happen with Option "TearFree"? Only with a 4.4 kernel?

It sounds like it could be the problem discussed in http://lists.freedesktop.org/archives/dri-devel/2016-January/098823.html and followups.

P.S. How does this relate to bug 90987?

(In reply to Michel Dänzer from comment #5)
> Does this only happen with Option "TearFree"? Only with a 4.4 kernel?

No, it doesn't seem to be related to the Pageflip option either.

> It sounds like it could be the problem discussed in
> http://lists.freedesktop.org/archives/dri-devel/2016-January/098823.html and
> followups.

Could be, yes. However, I think that I had problems with Linux 4.2 as well, and with 4.3 for sure. I bought this screen (the Eizo) in September, and since 4.2 came out in August, this seems likely. I will try it out, though, during my next test run tomorrow.

> P.S. How does this relate to bug 90987?

Not sure. The difference is that in this one only that single screen misbehaves, whereas in bug 90987 all of them do. I can reproduce both individually. Also, while this one is fixable by switching to VT2 and back, bug 90987 seems fixable only by restarting X (or at least I haven't found another way yet).

I'm not 100% sure, but quite certain, that I couldn't reproduce this bug with kernel 4.2.5, so I think it's at least related to the linked email.

I was also able to reproduce this in such a way that all screens go black. From the description of the option, I think that this was because I didn't have TearFree set. So I'm no longer sure whether it is really a separate bug or a dupe of 90987. It could be that the options just change the buffer behavior in such a way that it can either be fixed by switching to VT2 or not.
If it's the same, I would have been able to reproduce with 4.2.5, with the effect described in bug 90987.

(In reply to Bernd Steinhauser from comment #7)
> I'm not 100% sure, but quite certain, that I couldn't reproduce this bug
> with kernel 4.2.5, so I think it's at least related to the linked email.

The problem referenced there was only introduced in 4.4, so if you can reproduce this with 4.3, it's probably a different issue; can you bisect in that case?

(In reply to Michel Dänzer from comment #8)
> (In reply to Bernd Steinhauser from comment #7)
> > I'm not 100% sure, but quite certain, that I couldn't reproduce this bug
> > with kernel 4.2.5, so I think it's at least related to the linked email.
>
> The problem referenced there was only introduced in 4.4, so if you can
> reproduce this with 4.3, it's probably a different issue; can you bisect in
> that case?

Will try, but it could take a while; I don't know if I'll find time for that during the weekend.

Ok, I'm pretty sure I tracked it down:

commit 5b5561b3660db734652fbd02b4b6cbe00434d96b
Author: Mario Kleiner
Date: Wed Nov 25 20:14:31 2015 +0100

    drm/radeon: Fixup hw vblank counter/ts for new drm_update_vblank_count() (v2)

Obviously the same (or a similar) commit did go into amdgpu, which I didn't test as it is marked experimental for Kaveri.

The revision before that didn't show the issue; with that revision, it occurred. To be sure, I also tested 4.4.1 vanilla and with the commit reverted. With vanilla, reconnecting the screen resulted in a black screen as described. With the patch reverted, it did not happen.

Ok, I compiled a kernel with amdgpu enabled and radeon disabled. The ddx is now xf86-video-amdgpu-scm. So far I have not seen this issue with amdgpu, even though the corresponding version of that commit is still applied (commit 8e36f9d33c134d5c6448ad65b423a9fd94e045cf). So at least it doesn't happen as often as with radeon (during my last tests I saw it every time, but I am not sure it always happens).
I will try it for a couple of days to see whether it occurs or not.

Should note, I set the same options for amdgpu as for radeon:

Section "Device"
    Identifier "AMDGPU"
    Driver "amdgpu"
    Option "TearFree" "On"
    Option "DRI" "3"
EndSection

(In reply to Bernd Steinhauser from comment #12)
> Should note, I set the same options for amdgpu as for radeon:
> Section "Device"
> Identifier "AMDGPU"
> Driver "amdgpu"
> Option "TearFree" "On"
> Option "DRI" "3"
> EndSection

Hi, I just cc'ed you on a series of patches with fixes for Linux 4.4 and later. Can you check whether applying those on top of 4.4 helps? At least it should remove the problems discussed in http://lists.freedesktop.org/archives/dri-devel/2016-January/098823.html, and maybe this is related.

Documenting some progress made "outside" the bug reporter. So attached is a proposed patch, and a trimmed and annotated syslog output from some debugging with Bernd's help:

> On 15/02/16 19:34, Mario Kleiner wrote:
>> Bernd, can you apply the attached patch on top of the others and test with
>> drm.debug=35 again, and ideally provide syslog output with the microsecond
>> timestamps included? This adds some debug statements to see if something goes
>> wrong in radeon_flip_work_func on radeon-kms.
> Sure, I tried that, but I ran into this problem:
> Feb 16 21:21:17.870174 orionis systemd-journald[618]: Missed 3 kernel messages
>
> The log is spammed with these messages. So I did some research and was
> pointed to the kernel parameter log_buf_len, which I set to 16M
> (actually tried values up to 128M), but that didn't help.
>
> I'll attach the log anyway, but if you tell me how to get rid of that
> issue, I can of course give it another go.
> For journald, I set RateLimitBurst=0, which should prevent it from not
> accepting messages due to the log spam.
>
> I also did another experiment. The Dell U2415 is normally connected to
> HDMI-0.
> I connected that to DP-0 (with the Eizo disconnected) and found out
> that now the Dell shows the same behavior as the Eizo. When disconnected
> and reconnected, the screen will be black.
> If I turn on the DP1.2 setting for the Dell, it will do so even when
> switched off, most likely because the connection is cut with DP1.2
> turned on (really really annoying behavior, that's also the reason why I
> have so many problems with the Eizo, but for that DP1.2 cannot be
> switched off).
>
> Thus, it doesn't seem like the problem is related to specific display
> hardware.
> If you'd like, I can test the HP ZR24W as well (normally connected to DVI).
>
> Best Regards,
> Bernd

Thanks, that one was useful. Can you revert the debug patch I sent you last and instead apply the attached patch and retest? Maybe also remove the patch "[PATCH 1/2] drm/radeon: Make vbl counter/ts queries robust against dpms on/off. [RFC]", so your tree corresponds more closely to what is in 4.4/4.5rc.

This one is a proposed fix for the problem, also for stable 4.4.

It seems that when the connection gets cut while a pageflip gets queued by the userspace driver, the radeon-kms driver does DPMS OFF as part of its hotplug work function (apparently only for DP displays, if I understand the code correctly). Then, when radeon_flip_work_func() executes the "wait until real start of vblank" code introduced in Linux 4.4 to fix other regressions, that code executes while the display engine is already disabled and the scanout is no longer moving. That leads the wait code to go into an infinite loop, hence the huge amount of messages in your syslog: flip_work_func hangs -> pageflip hangs -> game over.

So the attached patch should make that new wait code robust against such unexpected things as dpms off/on in parallel. Maybe we could manage to also get this into 4.5-rc5, now that the other vblank fixes have landed in Linus' tree.
I don't know if it is expected behavior that pageflips can be queued by userspace while the display is disabled, or if the ddx shouldn't already prevent that? The log suggests that the ddx got the hot(un)plug event before it tried to page flip anyway in TearFree.

Created attachment 121821 [details] [review]
First proposed patch to fix this on radeon-kms

Created attachment 121822 [details]
Annotated and trimmed kernel log which points to the cause of the problem.
See "-->" for key events.
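
The hang described above (a wait loop polling a scanout position that stops advancing once the display engine is off) can be illustrated with a small userspace sketch in C. This is not the attached patch and not radeon-kms code: struct fake_crtc, fake_crtc_tick() and wait_for_vblank_start() are hypothetical names invented for this illustration, and the poll limit is an assumed extra safeguard rather than something taken from the thread. The point is only that a loop of this kind needs an exit condition that does not depend on the scanout moving.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a CRTC; not the radeon-kms data structure. */
struct fake_crtc {
    bool enabled;      /* false once DPMS OFF has run */
    int vpos;          /* current simulated scanout line */
    int vblank_start;  /* first line of the vertical blank */
    int vtotal;        /* total lines per frame */
};

enum wait_result {
    WAIT_REACHED_VBLANK,
    WAIT_CRTC_DISABLED,
    WAIT_TIMED_OUT
};

/* Advance the simulated scanout by one line; a disabled CRTC does not move. */
void fake_crtc_tick(struct fake_crtc *crtc)
{
    if (crtc->enabled)
        crtc->vpos = (crtc->vpos + 1) % crtc->vtotal;
}

/*
 * Poll until the scanout reaches the start of vblank.
 * Unlike a naive "while (vpos < vblank_start)" loop, this gives up when the
 * CRTC is disabled (the scanout would never move) or after max_polls polls.
 */
enum wait_result wait_for_vblank_start(struct fake_crtc *crtc, int max_polls)
{
    assert(crtc != NULL);

    for (int i = 0; i < max_polls; i++) {
        if (!crtc->enabled)
            return WAIT_CRTC_DISABLED;
        if (crtc->vpos >= crtc->vblank_start)
            return WAIT_REACHED_VBLANK;
        fake_crtc_tick(crtc);
    }
    return WAIT_TIMED_OUT;
}
```

With an enabled CRTC the function returns once the simulated scanout line reaches vblank_start; with enabled set to false it returns immediately instead of spinning, which is the situation the hotplug-triggered DPMS OFF creates in this bug.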
(In reply to Mario Kleiner from comment #15)
> Created attachment 121821 [details] [review]
> First proposed patch to fix this on radeon-kms

This does work; I no longer observe the problem, thanks.

Maybe the reporter of bug 90987 should test the patch as well, since it wasn't clear whether the bug was the same or different (or maybe the patch fixes both bugs). (I know that this was a regression from 4.4, but maybe there was a different way to trigger this before.)

Created attachment 121834 [details] [review]
Patch for fix on radeon-kms (v2) reviewed and tested.

Final patch for Linux 4.4 stable and later.

Created attachment 121835 [details] [review]
Port of the radeon-kms patch v2 to amdgpu

Identical patch for amdgpu.

Thanks for the report. Resolving, as Mario's fixes landed long ago.