Summary: | Screen regularly turns black, reboot needed
---|---
Product: | DRI
Component: | DRM/AMDgpu
Status: | RESOLVED MOVED
Severity: | critical
Priority: | medium
Version: | XOrg git
Hardware: | x86-64 (AMD64)
OS: | Linux (All)
Reporter: | Vik-T <viktor>
Assignee: | Default DRI bug account <dri-devel>
CC: | harry.wentland, nethershaw, nicholas.kazlauskas, sunpeng.li, taijian

Description
Vik-T
2018-09-14 09:44:25 UTC
Please attach the full dmesg output.

Created attachment 141560
Dmesg Output

I hope it's OK if I bump this bug report. It's been almost a month since I reported it, and I haven't received any feedback since. I have nothing really new to report. The problem still exists: the driver crashes regularly. I noticed a possible correlation with usage: when I'm working on the computer, the problem may happen several times a day. When I'm not touching it, it may take up to 2-3 days for the driver to crash. Sometimes logging in through ssh and restarting the display manager (lxdm) is enough. But most of the time, only a reboot solves the issue.

@ Vik-T: The way you describe your symptoms makes it seem possible to me that you are experiencing the very same long-standing bug that I reported in https://bugs.freedesktop.org/show_bug.cgi?id=102322

If you want to verify whether that bug and yours are actually the same, you could try the following:

(a) Check whether you also experience your bug after disabling dynamic power management. To do this, switch to manual power management like this:
> cd /sys/class/drm/card0/device
> echo manual >power_dpm_force_performance_level
> echo 0 >pp_dpm_mclk
> echo 0 >pp_dpm_sclk
In my case, the bug does not occur while clocks are set manually.
Caveat: These settings are ignored/overwritten by the amdgpu driver after each display mode change and each off/on of the display output or monitor. So this test is meaningful only if the manual settings are re-activated after each such display mode change / switch-on. (I reported this bug as https://bugs.freedesktop.org/show_bug.cgi?id=107141 )

(b) You could check whether you can reproduce the symptom more quickly with a certain load pattern:
(1) Enable dynamic power management (which is also the default)
(2) Start X11, but not any client (or desktop environment) that draws anything on the screen
(3) Replay an (at least 1080p) video at only 3 frames per second, e.g. via:
"mpv --no-correct-pts --fps=3 --ao=null some_arbitrary_video.webm"
This kind of load causes (at least in the case of my system) frequent changes to the pp_dpm_mclk and pp_dpm_sclk values, and the system crashes after only a short while (seconds up to 15 minutes) under this kind of load, with the symptom (blanked screen, system crash) you described.

@dwagner: Thanks for your comment. I could not reproduce the error with the 3 fps 1080p video on naked X. I let it run for 25 minutes without any issues. Besides, unlike you, I have never experienced any sort of full system crash. I can always shut down and reboot cleanly. As I mentioned, what happens, at random, is that the screens go blank and the log shows the call trace I posted in the first message. But the system has never fully crashed so far. I can usually ssh into the box. Sometimes a restart of the display manager solves the issue, but more often a reboot is necessary.

@ Vik-T: Thanks for testing, so at least we know it's a different bug that haunts your system.

I am able to reproduce this bug report in every detail on my machine. The only difference is that I am never present to directly observe the driver deadlock; it always occurs when I have left the machine idle for at least a few hours. Both tests dwagner proposed yielded negative results. I am attaching dmesg logs from the most recent instance of the problem. Please advise. I run Gentoo, and am able to easily introduce patches into any part of the system for testing.

Created attachment 142018
Trimmed dmesg logs
Created attachment 142022
Full dmesg logs

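Editorial note on test (a) above: dwagner's caveat says the manual power settings are overwritten by the amdgpu driver after each display mode change, so the test only means something if the settings are re-applied each time. The following is a minimal Python sketch of that idea, not a tested tool. It writes the same three values quoted in the thread and re-applies them when the performance level no longer reads back as "manual". The function names are hypothetical, the `card0` path is taken from the comment and may differ on other systems, and writing to sysfs is assumed to require root.

```python
from pathlib import Path

# Sysfs directory quoted in the thread; card0 may differ on other systems.
DEFAULT_DEVICE_DIR = Path("/sys/class/drm/card0/device")

# File name -> value, in the same order as the shell commands in the thread.
MANUAL_SETTINGS = (
    ("power_dpm_force_performance_level", "manual"),
    ("pp_dpm_mclk", "0"),
    ("pp_dpm_sclk", "0"),
)


def apply_manual_dpm(device_dir=DEFAULT_DEVICE_DIR):
    """Write the three values from the thread into the device directory."""
    device_dir = Path(device_dir)
    for name, value in MANUAL_SETTINGS:
        # Equivalent to `echo VALUE >NAME` (echo appends a newline).
        (device_dir / name).write_text(value + "\n")


def is_manual(device_dir=DEFAULT_DEVICE_DIR):
    """Return True if the performance level reads back as 'manual'."""
    level_file = Path(device_dir) / "power_dpm_force_performance_level"
    return level_file.read_text().strip() == "manual"


def reapply_if_reset(device_dir=DEFAULT_DEVICE_DIR):
    """Re-apply the settings if they were reverted; return True if re-applied."""
    if is_manual(device_dir):
        return False
    apply_manual_dpm(device_dir)
    return True
```

One way to use this would be to call `reapply_if_reset()` periodically (for example every few seconds in a loop run as root) while testing, so that a display mode change does not silently return the card to dynamic power management. Only the performance level is read back, because the clock files are assumed to report a list of levels rather than the single value written.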
I've determined that the deadlock and stack trace found in my dmesg logs is emitted precisely when I attempt to wake the machine's display from sleep by touching the keyboard or mouse, and not before. If I leave the machine on a terminal console instead of in a running X session, the display never sleeps, and the deadlock never occurs.

The first instance of the deadlock on my machine occurred during a session following an upgrade of the xf86-video-amdgpu drivers from version 18.0.1 to 18.1.0, and simultaneously, of Mesa to a checkout of the master branch ca. commit 0d495bec25bd7584de4e988c2b4528c1996bc1d0, or approximately 2018-09-26 04:16 UTC. I am attempting now to revert both of these upgrades one at a time in order to determine whether either of them is implicated.

Downgrading from x11-drivers/xf86-video-amdgpu-18.1.0 to x11-drivers/xf86-video-amdgpu-18.0.1 has prevented the issue from occurring on my system.

(In reply to Matthew Vaughn from comment #11)
> Downgrading from x11-drivers/xf86-video-amdgpu-18.1.0 to
> x11-drivers/xf86-video-amdgpu-18.0.1 has prevented the issue from occurring
> on my system.

Can you bisect xf86-video-amdgpu?

(In reply to Matthew Vaughn from comment #11)
> Downgrading from x11-drivers/xf86-video-amdgpu-18.1.0 to
> x11-drivers/xf86-video-amdgpu-18.0.1 has prevented the issue from occurring
> on my system.

Premature. Does not work consistently.

Same thing on Fedora 29. 4.18.16-300.fc29.x86_64; also RX Vega 64. One monitor, connected either via DP or MDP.

Someone might find this helpful: I managed to reduce the number of driver crashes considerably by disabling "suspend" and "off" mode in X.

Section "ServerFlags"
    Option "SuspendTime" "0"
    Option "OffTime" "0"
EndSection

In the last 5-6 days, the driver crashed only once and I managed to bring X back without rebooting. For me, that's a huge improvement over the situation before where I had to reboot 2-3 times a day.

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/525.