Summary: | Latest radeon dri driver on HD6950 with GRUB set "GRUB_GFXPAYLOAD_LINUX=keep" put the display in a flickering state |
---|---|
Product: | DRI |
Component: | DRM/Radeon |
Status: | RESOLVED FIXED |
Severity: | critical |
Priority: | medium |
Version: | XOrg git |
Hardware: | x86-64 (AMD64) |
OS: | Linux (All) |
Reporter: | Alexandre Demers <alexandre.f.demers> |
Assignee: | Default DRI bug account <dri-devel> |
CC: | blinxwang, magist3r |
See Also: | https://bugs.freedesktop.org/show_bug.cgi?id=56139, https://bugs.freedesktop.org/show_bug.cgi?id=57567 |

Description
Alexandre Demers
2011-12-08 22:06:54 UTC
Can you bisect? Did you update any other components (mesa, xf86-video-ati) or just the kernel?

More info about this bug: I have both kernel 3.1.0 and 3.2.0-rc4 installed right now (compiled from kernel.org). I had 3.2.0-rc3 installed before moving to rc4 to test whether the bug had been solved.

Have I updated other components? Of course: I'm testing with the latest versions of both mesa and xf86-video-ati (I'll have to test today's versions, though). But that shouldn't be a problem, since I'm testing the same components with both kernels.

I'll bisect the kernel's commits in the next couple of days to find which one is breaking things.

OK, so after first testing all the RCs, I narrowed the problem down to between RC3 and RC4. Bisecting gave me the following culprit:

commit 9b5a4d4f65e260a109eaeea8bbc8062a7c58b55e
Merge: cb35999 67589c7
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon Nov 28 13:49:43 2011 -0800

    Merge branch 'for-3.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/gi

    * 'for-3.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
      percpu: explain why per_cpu_ptr_to_phys() is more complicated than necessa
      percpu: fix chunk range calculation
      percpu: rename pcpu_mem_alloc to pcpu_mem_zalloc

It has nothing to do with drm in itself, but it must be related at some point... I'll reset my tree tomorrow and retest to be sure, compiling from just before this commit.

Strangely, when rebisecting, I found commit a34815b96f9a21b3a2e2912dfd0d994acd2855e3 to be the bad one... It is really close to the first one. So, I'm retesting both to be sure.

It sounds like the problem may or may not happen, with a certain probability, on any given kernel. You should probably test each kernel a certain number of times before declaring it good, or the bisection may not work correctly.

I tested today's latest kernel version after fighting with the beast for the last couple of days. Just to be sure, I made a clean compilation and it now works properly without any problem.
I'll assume for the moment it was related to something stuck in the compilation. If anything goes wrong again, I'll reopen the bug.

This one is driving me crazy. You were right, it is not reproducible every time. I have to reboot a couple of times to trigger it or to fix it... Going back to bisection.

I think I've found a hint. Here's the thing: whatever kernel version is the first entry in my Grub's list, the problem will appear. If I select a different kernel manually, or if I change the default menu entry to a different one, everything is fine. The only exception is if my first menu entry is Windows. Then, there is never any problem.

Here is what I see when selecting the first entry. First, the Grub background stays for a moment and then it switches to the boot screen (using Ubuntu, it shows the Ubuntu loading screen). However, most of the time, it will flicker, usually showing only a couple of clear lines at the top of the screen.

If I select another entry, it switches to the kernel initialization (showing step by step what is being done) and then it switches to the boot screen only after having correctly initialized the screen.

The only difference I can see between the first entry and the others is the following line in my grub.cfg:

set gfxpayload=$linux_gfx_mode

I suspect a bad interaction between Grub and the rest of the initialization process. Does my suspicion make sense?

(In reply to comment #8)
> I suspect a bad interaction between Grub and the rest of the initialization
> process. Does my suspicion make sense?

Quite possibly. Can you test your hypothesis by moving this line between entries?

(In reply to comment #9)
> (In reply to comment #8)
> > I suspect a bad interaction between Grub and the rest of the initialization
> > process. Does my suspicion make sense?
>
> Quite possibly. Can you test your hypothesis by moving this line between
> entries?

Tested and confirmed.
Whenever I added "set gfxpayload=$linux_gfx_mode", there was a really high chance of hitting this bug (near 90% of the time). Without it, I booted flawlessly.

Should I try to bisect the drm driver to see if there is a version without that problem? I've had this new 6950 for less than 2 weeks, so I don't even know if it worked correctly at some point.

*** Bug 49262 has been marked as a duplicate of this bug. ***

It's been some time now. Since my initial report, I moved from Ubuntu to Arch. Today, it was officially announced that Arch is moving to Grub2, so I updated my setup (I had been using Grub legacy since my move to Arch). Surprise, this bug is still valid.

So I played around with the grub default options. GRUB_GFXMODE=auto works fine, but GRUB_GFXPAYLOAD_LINUX=keep triggers the bug. Removing the latter option makes everything run smoothly.

I read bug 49262 and two things are common with my setup: we are both using a 69XX radeon card and we are both using a DVI-to-VGA adaptor. I'm wondering if the combination of card AND adaptor is the root of the problem. Before having this 6950 card, I was using a Radeon HD 3200 IGP without any adaptor and I had no problem. I don't have any other monitor here and I don't have a DVI or HDMI input on my monitor, so I can't tell yet.

But still, what would you suggest I try to help figure out what's going on? Any comment from Alex or Michel would be appreciated. I could have access to a different monitor if I ask for it.

May well be the same as bug 42373.

I'll try to find a way to dig into this following the repro steps from bug 42373.

If it's the same as https://bugs.freedesktop.org/show_bug.cgi?id=42373 then the patch there should fix your issue.

(In reply to comment #15)
> If it's same as https://bugs.freedesktop.org/show_bug.cgi?id=42373 then patch
> there should fix your issue.

I'll try it as soon as I have time. Thank you Jerome for your follow-up.
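
For readers who want to try the workaround reported above (dropping GRUB_GFXPAYLOAD_LINUX=keep from the GRUB defaults), here is a minimal Python sketch. It only rewrites text that is passed to it. The function name and the sample contents are illustrative assumptions and are not part of GRUB or of this report; it also assumes the option is set in the usual KEY=value form, as in a file such as /etc/default/grub.

```python
import re

# Matches an active (uncommented) GRUB_GFXPAYLOAD_LINUX=keep assignment,
# with or without double quotes around the value.
_KEEP_LINE = re.compile(
    r'^([ \t]*)(GRUB_GFXPAYLOAD_LINUX="?keep"?[ \t]*)$',
    re.MULTILINE,
)


def disable_gfxpayload_keep(grub_defaults: str) -> str:
    """Return the text with GRUB_GFXPAYLOAD_LINUX=keep commented out.

    Hypothetical helper for illustration only: it does not touch any file,
    and lines that are already commented out are left unchanged.
    """
    return _KEEP_LINE.sub(r"\1#\2", grub_defaults)


if __name__ == "__main__":
    # Made-up sample contents, mirroring the two options named in the thread.
    sample = "GRUB_GFXMODE=auto\nGRUB_GFXPAYLOAD_LINUX=keep\n"
    print(disable_gfxpayload_keep(sample))
```

After changing the real defaults file, grub.cfg has to be regenerated (for example with grub-mkconfig) before the change takes effect on the next boot.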
(In reply to comment #16)
> (In reply to comment #15)
> > If it's same as https://bugs.freedesktop.org/show_bug.cgi?id=42373 then patch
> > there should fix your issue.
>
> I'll try it as soon as I'll have time. Thank you Jerome for your follow-up.

(In reply to comment #15)
> If it's same as https://bugs.freedesktop.org/show_bug.cgi?id=42373 then patch
> there should fix your issue.

It fixes it. Applied the patch and rebooted 3 times without a problem; went back to 3.6-rc1 (no patch) and the problem appeared; went back to the patched kernel and still no problem.

*** This bug has been marked as a duplicate of bug 42373 ***

Fixed by attachment 64759 (proposed in bug 42373, which is similar to this bug but is not the same, since it is not fixed by the attachment).

I'm reopening this bug for two reasons:
- It is still happening with kernel 3.9.0-rc4, because attachment 64759 from bug 42373 seems never to have been pushed.
- It is not a duplicate of bug 42373, since attachment 64759 fixes the current bug but not 42373.

It would be nice to have a revised version of attachment 64759 that applies cleanly on the latest kernel, then to have it tested and pushed to the kernel's git.

So I'm trying to narrow down what is going on. Kernel 3.5 + patch 64759 works OK. I'm now testing kernel commit 81ee8fb6b52ec69eeed37fe7943446af1dccecc5, which was supposed to supersede patch 64759 in kernel 3.6. I'll see what I get. My feeling is that we are not saving/restoring an address (VM, VRAM, TTM, whatever) correctly somewhere along the path.

The current code should do the right thing with respect to disabling display access to vram when we reconfigure the memory controller. The current code disables memory reads but leaves the display controllers enabled while we change the MC setup. Turning off the crtcs, as the patch you mentioned does, has two problems:
1. it breaks some systems which the current method fixes
2. it defeats the purpose of GRUB_GFXPAYLOAD_LINUX=keep, which is to avoid turning off the displays for flickerless boot up. If you turn off the crtcs you have to re-init the entire display pipeline.

The problem seems to be that disabling the crtc memory reads takes longer than expected on some systems, which leads to invalid reads while the MC is being reprogrammed. One possible solution may be to leave the MC as configured by the vbios and try to put the gart aperture either before or after the location of vram in the GPU's address space.

(In reply to comment #22)
> The current code should do the right thing with respect to disabling display
> access to vram when we reconfigure the memory controller. The current code
> disables memory reads but leaves the display controllers enabled while we
> change the MC setup. Turning off the crtcs as the patch you mentioned does
> has two problems:
> 1. it breaks some systems which the current method fixes
> 2. it defeats the purpose of GRUB_GFXPAYLOAD_LINUX=keep which is to avoid
> turning off the displays for flickerless boot up. If you turn off the crtcs
> you have to re-init the entire display pipeline.
> The problem seems to be that disabling the crtc memory reads seems to take
> longer than expected on some systems which leads to invalid reads while the
> MC is being reprogrammed. One possible solution may be to leave the MC as
> configured by the vbios and try and put the gart aperture either before or
> after the location of vram in the GPU's address space.

I understand what you are explaining. Meanwhile, I'm bisecting to find out where it was broken again, since commit 81ee8fb6b52ec69eeed37fe7943446af1dccecc5 does indeed do what it is supposed to do (no problem when using GRUB_GFXPAYLOAD_LINUX=keep). So, somewhere between commit 81ee8fb6b52ec69eeed37fe7943446af1dccecc5 and 3.9.0-rcx, something went wrong. I'll keep in touch.
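
The idea suggested in comment #22, leaving VRAM where the vbios put it and fitting the GART aperture before or after it, can be illustrated with a small address-range calculation. This is only a sketch in Python and is not the radeon driver's code: the function name, the 1 MiB alignment, and the example addresses are all assumptions made up for the illustration.

```python
def place_gart(vram_start: int, vram_end: int, gart_size: int,
               addr_space_end: int, align: int = 1 << 20) -> tuple[int, int]:
    """Pick an inclusive (start, end) range for the GART outside VRAM.

    Hypothetical helper: VRAM occupies [vram_start, vram_end] and is left
    untouched; the GART is placed after it if there is room, else before it.
    """
    # First aligned address above the end of VRAM.
    after_start = (vram_end + align) // align * align
    if addr_space_end + 1 - after_start >= gart_size:
        return after_start, after_start + gart_size - 1

    # Otherwise try the space below VRAM, aligning the start downwards.
    if vram_start >= gart_size:
        before_start = (vram_start - gart_size) // align * align
        return before_start, before_start + gart_size - 1

    raise ValueError("no room for the GART before or after VRAM")


if __name__ == "__main__":
    # Made-up layout: 2 GiB of VRAM at address 0, a 512 MiB GART,
    # and a 4 GiB address space.
    start, end = place_gart(0x0, 0x7FFFFFFF, 0x20000000, 0xFFFFFFFF)
    print(hex(start), hex(end))
```

The point of such a layout is that the VRAM location never changes, so the display controllers can keep scanning out from it while the GART is set up.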
62444b7462a2b98bc78d68736c03a7c4e66ba7e2 is the first bad commit
commit 62444b7462a2b98bc78d68736c03a7c4e66ba7e2
Author: Alex Deucher <alexander.deucher@amd.com>
Date: Wed Aug 15 17:18:42 2012 -0400

    drm/radeon: properly handle mc_stop/mc_resume on evergreen+ (v2)

    - Stop the displays from accessing the FB
    - Block CPU access
    - Turn off MC client access

    This should fix issues some users have seen, especially with UEFI, when changing the MC FB location that result in hangs or display corruption.

    v2: fix crtc enabled check noticed by Luca Tettamanti

    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

:040000 040000 3e0d33c9b4eda29ced814fe9a863efe63e53f14c 4932561607b160734ec1eade927a9fe18c9f3f1b M drivers

So in other words, your explanation, Alex, seems to be right. I'll be waiting in case anything has to be tested.

Created attachment 77332
possible fix

Does this patch help?

(In reply to comment #25)
> Created attachment 77332
> possible fix
>
> Does this patch help?

Applied on 3.9-rc5 and it doesn't help.

(In reply to comment #26)
> Applied on 3.9-rc5 and it doesn't help.

Can you attach your dmesg output with the patch applied?

Created attachment 77348
dmesg from 3.9-rc5 with patch
Et voilà, as asked

Created attachment 77350
3.9-rc5 with patch and drm.debug=14
With more debug info

Does attachment 77441 help?

(In reply to comment #30)
> does attachment 77441 help?

Still the same. 1 boot out of 4 was OK; the other three showed the same kind of corruption as before.

Closing this bug, since it has been fixed for a few releases.
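
A note on testing intermittent failures like this one: the thread reports the flicker on roughly 90% of boots in one setup and on 3 boots out of 4 in another, and an earlier comment advises testing each kernel several times before calling it good. The Python sketch below estimates how many clean boots are needed before that call is reasonably safe. It assumes each boot fails independently with the same probability, which is a simplification; the function name and the 5% threshold are illustrative choices, not something taken from the report.

```python
def boots_needed(failure_rate: float, max_false_good: float) -> int:
    """Smallest number of consecutive clean boots n such that a bad kernel
    would pass all of them with probability at most max_false_good.

    Assumes independent boots, each failing with probability failure_rate.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < max_false_good < 1.0:
        raise ValueError("max_false_good must be strictly between 0 and 1")

    boots = 0
    pass_probability = 1.0  # chance that a bad kernel has looked fine so far
    while pass_probability > max_false_good:
        pass_probability *= 1.0 - failure_rate
        boots += 1
    return boots


if __name__ == "__main__":
    # Failure rates taken from the figures quoted in the thread.
    for rate in (0.9, 0.75):
        print(rate, boots_needed(rate, max_false_good=0.05))
```

The lower the failure rate, the more reboots each bisection step needs, which is consistent with the early bisection attempts in this report landing on unrelated commits.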