Summary: | Distorted colours after suspend / resume cycle | |
---|---|---|---
Product: | xorg | Reporter: | Jan-Marek Glogowski <glogow>
Component: | Driver/Radeon | Assignee: | xf86-video-ati maintainers <xorg-driver-ati>
Status: | RESOLVED FIXED | QA Contact: | Xorg Project Team <xorg-team>
Severity: | normal | |
Priority: | medium | CC: | felix.schwarz
Version: | 7.7 (2012.06) | |
Hardware: | x86-64 (AMD64) | |
OS: | Linux (All) | |
URL: | https://bugs.launchpad.net/bugs/1643843 | |
Description
Jan-Marek Glogowski
2016-11-23 16:42:51 UTC
Created attachment 128167 [details]
Broken unity login screen after resume
I can't bisect the problem between v4.0 and v4.1, because for the two kernels I built in this range, suspend was broken and the PC didn't wake up from the 2nd suspend, so I don't have a way to reproduce the bug :-( I'm thinking about building the radeon driver only as an external module. And I just saw it's possible to include a path in the bisect, which might help to bisect just the changes in drivers/gpu/drm/radeon/…

I also tried 4.9-rc6 (albeit on an Arch Linux), which shows the same bug.

Maybe we need to add a radeon_crtc_load_lut call somewhere in the resume path.

Just finished my directory-based bisecting, which came to the same conclusion:

$ git bisect log
# bad: [b953c0d234bc72e8489d3bf51a276c5c4ec85345] Linux 4.1
# good: [39a8804455fb23f09157341d3ba7db6d7ae6ee76] Linux 4.0
git bisect start 'v4.1' 'v4.0' 'drivers/gpu/drm/radeon/'
# bad: [9e87e48f8e5de2146842fd0ff436e0256b52c4a9] Merge tag 'drm-intel-next-2015-03-27-merge' of git://anongit.freedesktop.org/drm-intel into drm-next
git bisect bad 9e87e48f8e5de2146842fd0ff436e0256b52c4a9
# bad: [c6d2ac2c36f80b8be15d47a8da6fca803a432e1c] drm/radeon: add get_allowed_info_register for r6xx/r7xx
git bisect bad c6d2ac2c36f80b8be15d47a8da6fca803a432e1c
# bad: [296deb7167b960d935025de770f3e3c6c2998fbd] drm/radeon/rv7xx/eg: implement get_current_sclk/mclk
git bisect bad 296deb7167b960d935025de770f3e3c6c2998fbd
# good: [a1dcc2778b682361351a369652b66dd2d66cf1d9] drm/radeon: setup quantization_range in AVI infoframe
git bisect good a1dcc2778b682361351a369652b66dd2d66cf1d9
# bad: [d7dbce09b61dbd8c00ea401a2dc734193309cb91] drm/radeon/dpm: add new callbacks to get the current sclk/mclk
git bisect bad d7dbce09b61dbd8c00ea401a2dc734193309cb91
# bad: [d6d2a1882a79c1a5425d6f82b2fc7b934916f893] drm/radeon: add INFO query for GPU temperature
git bisect bad d6d2a1882a79c1a5425d6f82b2fc7b934916f893
# bad: [b9729b17a414f99c61f4db9ac9f9ed987fa0cbfe] drm/radeon: dont switch vt on suspend
git bisect bad b9729b17a414f99c61f4db9ac9f9ed987fa0cbfe
# first bad commit: [b9729b17a414f99c61f4db9ac9f9ed987fa0cbfe] drm/radeon: dont switch vt on suspend

What else can I provide to help fix the real bug? I would like to avoid rolling out the revert as a fix.

So I can put radeon_crtc_load_lut somewhere in the code path in evergreen_resume, but that will take quite a while without any further advice.

Created attachment 128190 [details] [review]
Ubuntu Linux kernel diff of 3.13

So I thought this bug was the actual problem I'm seeing, or at least had the same origin, but it's not. Using Linux 3.13, I also see this bug when returning after a longer idle time. No suspend is involved. And still an "xset dpms force off" call fixes the palette problem.

Now I had a look at the HWE kernel of our previous Precise (12.04) based release, where we didn't see this problem with the same HW. The attached diff includes all radeon changes between our working version and the current kernel (~2k lines). The most prominent change is probably the default switch to DPM for CHIP_CAICOS, but that's really a long shot (profile => dpm in /sys/class/drm/card0/device/power_method).

While trying to reproduce the original bug, which involves long idle / wait times, I'm currently at the point where I'm quite sure that radeon.dpm=0 prevents the distorted colours. I don't know if there is a code path which is shared between DPMS, DPM and resume.

I saw the same bug on my Radeon HD6450 and I used to revert the mentioned commit locally for quite some time (which fixed the bug). However, I stopped doing that when I noticed I could work around the issue simply by switching the VT manually (Ctrl+Alt+F?). Also, I have not noticed the bug anymore since I switched to Wayland (F25).

Btw: Bug 99163 is about HDMI audio but supposedly fixed by reverting this commit.
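For readers who want to double-check a path-limited bisect like the one above (started with `git bisect start <bad> <good> <path>`), the following is a minimal sketch that pulls the verdicts out of `git bisect log` output. The function name `summarize_bisect_log` is made up for this illustration, and the parser only assumes the `# good:` / `# bad:` / `# first bad commit:` comment lines visible in the log above.

```python
import re

# Matches comment lines such as "# bad: [<sha>] <subject>" in `git bisect log` output.
_MARK = re.compile(r"^# (good|bad|first bad commit): \[([0-9a-f]{7,40})\] (.*)$")


def summarize_bisect_log(text):
    """Return (first_bad, steps) parsed from `git bisect log` output.

    first_bad is a (sha, subject) tuple, or None if the bisect is unfinished.
    steps is a list of (verdict, sha, subject) tuples in log order.
    """
    first_bad = None
    steps = []
    for line in text.splitlines():
        m = _MARK.match(line.strip())
        if not m:
            continue  # skip the replayable "git bisect ..." command lines
        verdict, sha, subject = m.groups()
        if verdict == "first bad commit":
            first_bad = (sha, subject)
        else:
            steps.append((verdict, sha, subject))
    return first_bad, steps
```

Fed the log from this report, it should return b9729b17a414f99c61f4db9ac9f9ed987fa0cbfe ("drm/radeon: dont switch vt on suspend") as the first bad commit.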
Seems I actually forgot to post my last comment…

So I added various versions of:

printk(KERN_ALERT "JMG - %s:%d: ...\n", __FUNCTION__, __LINE__, ....);

And it seems the problem occurs less often now, which adds to my suspicion of a race condition, as DPMS wake-up doesn't always show broken colours. I couldn't really identify a place with a missing radeon_crtc_load_lut:

[Mo Dez 5 16:10:43 2016] JMG - dce5_crtc_load_lut:108: 0
[Mo Dez 5 16:12:56 2016] JMG - atombios_crtc_dpms:294: 3 (off)
[Mo Dez 5 16:12:56 2016] JMG - atombios_crtc_dpms:294: 1 (standby)
[Mo Dez 5 16:13:56 2016] JMG - atombios_crtc_dpms:294: 2 (suspend)
[Mo Dez 5 16:14:56 2016] JMG - atombios_crtc_dpms:294: 3 (off)
[Mo Dez 5 16:45:52 2016] JMG - atombios_crtc_dpms:294: 0 (on)
[Mo Dez 5 16:45:52 2016] JMG - radeon_crtc_load_lut:196: 1
[Mo Dez 5 16:45:52 2016] JMG - dce5_crtc_load_lut:108: 0

This was definitely calling dce5_crtc_load_lut, but nevertheless the colours were wrong. I'm not sure why "atombios_crtc_dpms:294: 3" is called twice, probably also for the inactive port? I couldn't reproduce by just waiting for 5 minutes or setting radeon.dpm=0.

So we did a rollout for my few test machines a few days ago with DPM disabled. Currently the feedback is positive, so we're planning to do the rollout to our few thousand machines next year. I don't care that much about suspend / resume, but the broken colours after DPMS (so no suspend involved) are highly irritating for the users.

I'm experiencing this bug with an R5 230 (basically a 6450) and Linux 4.11.1. I can confirm that reverting b9729b17a414f99c61f4db9ac9f9ed987fa0cbfe solves the issue for me. Since it's a one-liner and it seems completely harmless, could we revert this upstream?

Yes, supposedly this should have been fixed in 4.15 (18c437caa5b18a235dd65cec224eab54bebcee65). Checks it. Reverted upstream and stable.
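The manual check of the printk trace above (does every DPMS "on" get a LUT reload?) can be automated with a small script. This is only a sketch under assumptions: it relies on the ad-hoc `JMG - <function>:<line>: <value>` format from that debug patch, treats value 0 of atombios_crtc_dpms as "on" as the trace annotates it, and the name `on_events_without_lut` is invented here. As the trace itself shows, a present LUT call does not guarantee correct colours, so a result of 0 only rules out a missing call.

```python
import re

# Matches the ad-hoc debug lines, e.g. "JMG - atombios_crtc_dpms:294: 0 (on)".
_EVENT = re.compile(r"JMG - (\w+):\d+: (\d+)")


def on_events_without_lut(lines):
    """Count DPMS "on" transitions that are not followed by a *_load_lut
    call before the next DPMS transition (or the end of the trace)."""
    missing = 0
    pending_on = False
    for line in lines:
        m = _EVENT.search(line)
        if not m:
            continue
        func, value = m.group(1), int(m.group(2))
        if func == "atombios_crtc_dpms":
            if pending_on:
                missing += 1  # previous "on" never saw a LUT reload
            pending_on = (value == 0)  # 0 is logged as "(on)" in the trace
        elif func.endswith("_load_lut"):
            pending_on = False
    return missing + (1 if pending_on else 0)
```

It could be run over saved kernel log output, for example `on_events_without_lut(open("dmesg.txt"))`, where `dmesg.txt` is a hypothetical capture file.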
I still can reproduce the bug on a current bionic system (which has 4.15 now)

One possible workaround for this bug is setting the kernel (module) parameter radeon.dpm=0

(In reply to Christoph Lutz from comment #12)
> I still can reproduce the bug on a current bionic system (which has 4.15 now)

Sounds like you may be seeing a different bug.
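When testing the radeon.dpm=0 workaround, it helps to confirm that the parameter actually reached the kernel. A minimal sketch, assuming the parameter was passed on the kernel command line (readable from /proc/cmdline); the helper name `dpm_disabled_on_cmdline` is made up for this example:

```python
def dpm_disabled_on_cmdline(cmdline):
    """Return True if the last radeon.dpm= token on the command line is 0."""
    value = None
    for token in cmdline.split():
        if token.startswith("radeon.dpm="):
            value = token.split("=", 1)[1]  # later tokens override earlier ones
    return value == "0"


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    with open("/proc/cmdline") as f:
        print(dpm_disabled_on_cmdline(f.read()))
```

If the option is set through a modprobe configuration instead, it will not show up in /proc/cmdline; in that case the value should be visible under /sys/module/radeon/parameters/dpm while the module is loaded.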