Bug 43655 - Latest radeon dri driver on HD6950 with GRUB set "GRUB_GFXPAYLOAD_LINUX=keep" puts the display in a flickering state
Summary: Latest radeon dri driver on HD6950 with GRUB set "GRUB_GFXPAYLOAD_LINUX=keep"...
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium critical
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Duplicates: 49262
Depends on:
Blocks:
 
Reported: 2011-12-08 22:06 UTC by Alexandre Demers
Modified: 2013-08-09 18:25 UTC

Attachments
possible fix (9.66 KB, patch)
2013-04-02 17:29 UTC, Alex Deucher
dmesg from 3.9-rc5 with patch (67.46 KB, text/plain)
2013-04-03 00:08 UTC, Alexandre Demers
3.9-rc5 with patch and drm.debug=14 (111.49 KB, text/plain)
2013-04-03 01:56 UTC, Alexandre Demers

Description Alexandre Demers 2011-12-08 22:06:54 UTC
My new HD6950 flickers like hell right from initialization. Kernel 3.1.0 is fine, but kernels 3.2.0-rc3 and later make the screen flicker (I can't speak for the versions in between yet). Also, the picture seems to be shifted by a bit more than half my monitor's width.

I also have an integrated HD3200 and I have no problem at all when selecting this integrated chipset over my PCI-E 6950 with the same driver and kernel combinations.
Comment 1 Alex Deucher 2011-12-09 06:00:10 UTC
Can you bisect?  Did you update any other components (mesa, xf86-video-ati) or just the kernel?
Comment 2 Alexandre Demers 2011-12-11 09:00:57 UTC
More info about this bug: I have both kernel 3.1.0 and 3.2.0-rc4 installed right now (compiled from kernel.org). I had 3.2.0-rc3 installed, before moving to rc4 to test if the bug had been solved.

Have I updated other components? Of course, I'm testing with latest versions of both mesa and xf86-video-ati (I'll have to test today's versions though). But then, it shouldn't be a problem since I'm testing the same components with both kernels.

I'll bisect the kernel commits in the next couple of days to find which one is breaking things.
Comment 3 Alexandre Demers 2011-12-12 01:03:23 UTC
OK, so after first testing all the RCs, I narrowed the problem down to somewhere between RC3 and RC4. Bisecting then gave me the following culprit:

commit 9b5a4d4f65e260a109eaeea8bbc8062a7c58b55e
Merge: cb35999 67589c7
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon Nov 28 13:49:43 2011 -0800

    Merge branch 'for-3.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/gi
    
    * 'for-3.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
      percpu: explain why per_cpu_ptr_to_phys() is more complicated than necessa
      percpu: fix chunk range calculation
      percpu: rename pcpu_mem_alloc to pcpu_mem_zalloc

It has nothing to do with drm in itself. But it must be related at some point... I'll reset my tree tomorrow and retest to be sure by compiling just before this commit.
Comment 4 Alexandre Demers 2011-12-13 17:48:47 UTC
Strangely, when rebisecting, I found commit a34815b96f9a21b3a2e2912dfd0d994acd2855e3 to be the bad one... It is really close to the first one, so I'm retesting both to be sure.
Comment 5 Michel Dänzer 2011-12-15 10:18:01 UTC
It sounds like the problem may happen or not with a certain probability with any given kernel. You should probably test each kernel a certain number of times before declaring it as good, or the bisection may not work correctly.
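
To put a rough number on "a certain number of times": if a bad kernel shows the problem on any given boot with probability p, and boots are assumed to be independent, then n clean boots in a row still leave a (1 - p)^n chance of wrongly marking that kernel as good. The sketch below is only an illustration of that arithmetic; the function names and the example probabilities are assumptions, not measurements from this report.

```python
import math


def false_good_probability(p_fail: float, clean_boots: int) -> float:
    """Chance that a bad kernel boots cleanly `clean_boots` times in a row.

    Assumes each boot fails independently with probability `p_fail`.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if clean_boots < 0:
        raise ValueError("clean_boots must be non-negative")
    return (1.0 - p_fail) ** clean_boots


def boots_needed(p_fail: float, max_false_good: float) -> int:
    """Smallest number of clean boots keeping the false-good chance <= max_false_good."""
    if not 0.0 < p_fail <= 1.0:
        raise ValueError("p_fail must be in (0, 1]")
    if not 0.0 < max_false_good < 1.0:
        raise ValueError("max_false_good must be in (0, 1)")
    if p_fail == 1.0:
        # A kernel that always fails is exposed by a single boot.
        return 1
    return math.ceil(math.log(max_false_good) / math.log(1.0 - p_fail))


if __name__ == "__main__":
    # Hypothetical failure rates, chosen only for illustration.
    for p in (0.25, 0.5, 0.75):
        print(p, boots_needed(p, 0.01))
```

The lower the failure rate, the more boots each bisection step needs before "good" can be trusted, which is why a single clean boot per step can send the bisection to an unrelated commit.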
Comment 6 Alexandre Demers 2011-12-15 14:03:15 UTC
I tested today's latest kernel version after fighting with the beast for the last couple of days. Just to be sure, I made a clean compilation and it now works properly without any problem. I'll assume for the moment it was related to something stuck in the compilation.

If anything goes wrong again, I'll reopen the bug.
Comment 7 Alexandre Demers 2011-12-15 16:31:19 UTC
This one is driving me crazy. You were right, it is not reproducible every time. I have to reboot a couple of times to trigger it or to fix it... Going back to bisection.
Comment 8 Alexandre Demers 2011-12-15 17:50:38 UTC
I think I've found a hint. Here's the thing:

Whatever kernel version is the first entry in my Grub's list, the problem will appear. If I select a different kernel manually or if I change the default menu entry to a different one, everything is fine. The only exception is if my first menu entry is Windows. Then, there is never any problem.

Here is what I see when selecting the first entry. First, the Grub's background stays for a moment and then it switches to the boot screen (using Ubuntu, it shows the Ubuntu loading screen). However, most of the time, it will flicker, usually showing only a couple of clear lines at the top of the screen.

If I select another entry, it switches to the kernel initialization (showing step by step what is being done) and then switches to the boot screen only after the screen has been initialized correctly. The only difference I can see between the first entries and the others is the following line in my grub.cfg:
	set gfxpayload=$linux_gfx_mode

I suspect a bad interaction between Grub and the rest of the initialization process. Does my suspicion make sense?
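
A quick way to see which menu entries carry that line is to scan grub.cfg entry by entry. The sketch below is a simplified illustration, not a full GRUB script parser: it assumes each entry starts with a `menuentry` line and ends at a line containing only `}`, and the function name is hypothetical.

```python
import re
from typing import List, Tuple

# Matches the title of a `menuentry` line, quoted with ' or ".
_TITLE_RE = re.compile(r"""^\s*menuentry\s+(['"])(.*?)\1""")
# Matches a `set gfxpayload=...` line inside an entry.
_GFX_RE = re.compile(r"^\s*set\s+gfxpayload=(\S+)")


def gfxpayload_by_entry(cfg_text: str) -> List[Tuple[str, str]]:
    """Return (entry title, gfxpayload value) for each menuentry.

    The value is an empty string when the entry has no `set gfxpayload=` line.
    """
    results: List[Tuple[str, str]] = []
    title = None
    value = ""
    for line in cfg_text.splitlines():
        start = _TITLE_RE.match(line)
        if start:
            # A new entry begins; reset the value seen so far.
            title, value = start.group(2), ""
            continue
        if title is None:
            continue  # Outside any menuentry block.
        gfx = _GFX_RE.match(line)
        if gfx:
            value = gfx.group(1)
        elif line.strip() == "}":
            # Closing brace ends the current entry.
            results.append((title, value))
            title = None
    return results
```

Feeding it the text of grub.cfg (commonly /boot/grub/grub.cfg, which is an assumption here) lists each entry together with its gfxpayload setting, if any.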
Comment 9 Michel Dänzer 2011-12-16 01:38:38 UTC
(In reply to comment #8)
> I suspect a bad interaction between Grub and the rest of the initialization
> process. Does my suspicion make sense?

Quite possibly. Can you test your hypothesis by moving this line between entries?
Comment 10 Alexandre Demers 2011-12-16 06:51:47 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > I suspect a bad interaction between Grub and the rest of the initialization
> > process. Does my suspicion make sense?
> 
> Quite possibly. Can you test your hypothesis by moving this line between
> entries?

Tested and confirmed. Whenever I added "set gfxpayload=$linux_gfx_mode", there was a really high chance of hitting this bug (near 90% of the time). Without it, I booted flawlessly.
Comment 11 Alexandre Demers 2011-12-17 10:23:06 UTC
Should I try to bisect the drm driver to see if there is a version without this problem? I've had this new 6950 for less than 2 weeks, so I don't even know if it worked correctly at some point.
Comment 12 Peter Wang 2012-04-29 12:45:29 UTC
*** Bug 49262 has been marked as a duplicate of this bug. ***
Comment 13 Alexandre Demers 2012-07-21 03:49:15 UTC
It's been some time now. Since my initial report, I moved from Ubuntu to Arch. Today, it was officially announced that Arch is moving to Grub2, so I updated my setup (I had been using Grub legacy since my move to Arch). Surprise, this bug is still valid.

So I played around with the grub default options. GRUB_GFXMODE=auto works fine, but GRUB_GFXPAYLOAD_LINUX=keep triggers the bug. Removing the latter option makes everything run smoothly.

I read bug 49262 and two things are common with my setup: we are both using a 69XX radeon card and we are both using a DVI-to-VGA adaptor. I'm wondering if the combination of card AND adaptor is the root of the problem. Before getting this 6950 card, I was using a Radeon HD 3200 IGP without any adaptor and I had no problem.

I don't have any other monitor here and I don't have a DVI or HDMI input on my monitor, so I can't tell yet. But still, what would you suggest to try to help figure out what's going on? Any comment from Alex or Michel would be appreciated. I could have access to a different monitor if I ask for it.
Comment 14 Alexandre Demers 2012-07-26 05:33:08 UTC
May well be the same as bug 42373. I'll try to dig into this by following the repro steps from bug 42373.
Comment 15 Jerome Glisse 2012-07-30 14:59:56 UTC
If it's same as https://bugs.freedesktop.org/show_bug.cgi?id=42373 then patch there should fix your issue.
Comment 16 Alexandre Demers 2012-07-30 15:12:24 UTC
(In reply to comment #15)
> If it's same as https://bugs.freedesktop.org/show_bug.cgi?id=42373 then patch
> there should fix your issue.

I'll try it as soon as I'll have time. Thank you Jerome for your follow-up.
Comment 17 Alexandre Demers 2012-08-04 04:58:03 UTC
(In reply to comment #16)
> (In reply to comment #15)
> > If it's same as https://bugs.freedesktop.org/show_bug.cgi?id=42373 then patch
> > there should fix your issue.
> 
> I'll try it as soon as I'll have time. Thank you Jerome for your follow-up.

It fixes it. I applied the patch and rebooted 3 times without any problem, went back to 3.6-rc1 (no patch) and the problem appeared, then went back to the patched kernel and still had no problem.
Comment 18 Alex Deucher 2012-08-04 13:26:36 UTC

*** This bug has been marked as a duplicate of bug 42373 ***
Comment 19 Alexandre Demers 2012-08-20 03:00:22 UTC
Fixed by attachment 64759 (proposed in bug 42373, which is similar to this bug but not the same, since that bug is not fixed by the attachment)
Comment 20 Alexandre Demers 2013-03-29 03:07:53 UTC
I'm reopening this bug for two reasons:
 -It is still happening with kernel 3.9.0-rc4 because attachment 64759 from bug 42373 seems never to have been pushed
 -It is not a duplicate of bug 42373 since attachment 64759 fixes the current bug but not 42373

It would be nice to have a revised version of attachment 64759 that applies cleanly on the latest kernel, and then to have it tested and pushed to the kernel's git.
Comment 21 Alexandre Demers 2013-03-29 22:23:59 UTC
So I'm trying to narrow down what is going on. Kernel 3.5 + patch 64759 works OK. I'm now testing kernel commit 81ee8fb6b52ec69eeed37fe7943446af1dccecc5, which was supposed to supersede patch 64759 in kernel 3.6. I'll see what I get.

My feeling is we are not saving/restoring an address (VM, VRAM, TTM, whatever) correctly somewhere along the path.
Comment 22 Alex Deucher 2013-03-30 00:38:44 UTC
The current code should do the right thing with respect to disabling display access to vram when we reconfigure the memory controller.  The current code disables memory reads but leaves the display controllers enabled while we change the MC setup.  Turning off the crtcs as the patch you mentioned does has two problems:
1. it breaks some systems which the current method fixes
2. it defeats the purpose of GRUB_GFXPAYLOAD_LINUX=keep which is to avoid turning off the displays for flickerless boot up.  If you turn off the crtcs you have to re-init the entire display pipeline.
The problem seems to be that disabling the crtc memory reads takes longer than expected on some systems, which leads to invalid reads while the MC is being reprogrammed.  One possible solution may be to leave the MC as configured by the vbios and try to put the gart aperture either before or after the location of vram in the GPU's address space.
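
To picture that last suggestion: treat the GPU's address space as a range from 0 to some maximum address, leave vram where the vbios put it, and look for room for the gart aperture in the gap below or above it. The sketch below is a simplified illustration with hypothetical names and parameters; it is not the radeon driver's code and it ignores hardware-specific placement rules.

```python
from typing import Optional, Tuple


def place_gart(vram_start: int, vram_end: int, gart_size: int,
               addr_max: int, align: int) -> Optional[Tuple[int, int]]:
    """Pick an inclusive (start, end) range for the gart outside vram.

    vram occupies [vram_start, vram_end] and is left untouched. The larger
    of the two gaps (below or above vram) is preferred. Returns None if
    the aperture fits in neither gap.
    """
    if gart_size <= 0 or align <= 0:
        raise ValueError("gart_size and align must be positive")
    if not 0 <= vram_start <= vram_end <= addr_max:
        raise ValueError("vram range must lie inside the address space")

    # Gap below vram: the aperture would start at address 0.
    size_before = vram_start
    # Gap above vram: start at the first aligned address past vram_end.
    after_start = (vram_end + align) // align * align
    size_after = max(0, addr_max + 1 - after_start)

    if size_before > size_after:
        start, room = 0, size_before
    else:
        start, room = after_start, size_after

    if gart_size > room:
        return None  # The larger gap is too small, so the smaller one is too.
    return start, start + gart_size - 1
```

For example, with 1 GiB of vram at the bottom of a 32-bit address space, a 512 MiB aperture would land directly above vram; with vram at the top of the space, it would land at address 0.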
Comment 23 Alexandre Demers 2013-03-30 02:44:08 UTC
(In reply to comment #22)
> The current code should do the right thing with respect to disabling display
> access to vram when we reconfigure the memory controller.  The current code
> disables memory reads but leaves the display controllers enabled while we
> change the MC setup.  Turning off the crtcs as the patch you mentioned does
> has two problems:
> 1. it breaks some systems which the current method fixes
> 2. it defeats the purpose of GRUB_GFXPAYLOAD_LINUX=keep which is to avoid
> turning off the displays for flickerless boot up.  If you turn off the crtcs
> you have to re-init the entire display pipeline.
> The problem seems to be that disabling the crtc memory reads seems to take
> longer than expected on some systems which leads to invalid reads while the
> MC is being reprogrammed.  One possible solution may be to leave the MC as
> configured by the vbios and try and put the gart aperture either before or
> after the location of vram in the GPU's address space.

I understand what you are explaining. Meanwhile, I'm bisecting to find out where it was broken again, since commit 81ee8fb6b52ec69eeed37fe7943446af1dccecc5 does indeed do what it is supposed to do (no problem when using GRUB_GFXPAYLOAD_LINUX=keep). So, somewhere between commit 81ee8fb6b52ec69eeed37fe7943446af1dccecc5 and 3.9.0-rcx, something went wrong. I'll keep in touch.
Comment 24 Alexandre Demers 2013-03-30 23:32:25 UTC
62444b7462a2b98bc78d68736c03a7c4e66ba7e2 is the first bad commit
commit 62444b7462a2b98bc78d68736c03a7c4e66ba7e2
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Wed Aug 15 17:18:42 2012 -0400

    drm/radeon: properly handle mc_stop/mc_resume on evergreen+ (v2)
    
    - Stop the displays from accessing the FB
    - Block CPU access
    - Turn off MC client access
    
    This should fix issues some users have seen, especially
    with UEFI, when changing the MC FB location that result
    in hangs or display corruption.
    
    v2: fix crtc enabled check noticed by Luca Tettamanti
    
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

:040000 040000 3e0d33c9b4eda29ced814fe9a863efe63e53f14c 4932561607b160734ec1eade927a9fe18c9f3f1b M	drivers

So in other words, Alex, your explanation seems to be right. I'll wait in case anything needs to be tested.
Comment 25 Alex Deucher 2013-04-02 17:29:59 UTC
Created attachment 77332
possible fix

Does this patch help?
Comment 26 Alexandre Demers 2013-04-02 23:50:29 UTC
(In reply to comment #25)
> Created attachment 77332
> possible fix
> 
> Does this patch help?

Applied on 3.9-rc5 and it doesn't help.
Comment 27 Alex Deucher 2013-04-02 23:54:06 UTC
(In reply to comment #26)
> Applied on 3.9-rc5 and it doesn't help.

Can you attach your dmesg output with the patch applied?
Comment 28 Alexandre Demers 2013-04-03 00:08:44 UTC
Created attachment 77348
dmesg from 3.9-rc5 with patch

Et voilà, as asked
Comment 29 Alexandre Demers 2013-04-03 01:56:39 UTC
Created attachment 77350
3.9-rc5 with patch and drm.debug=14

With more debug info
Comment 30 Alex Deucher 2013-04-04 19:31:59 UTC
does attachment 77441 help?
Comment 31 Alexandre Demers 2013-04-05 02:38:35 UTC
(In reply to comment #30)
> does attachment 77441 help?

Still the same. 1 boot out of 4 was OK; the other three showed the same kind of corruption as before.
Comment 32 Alexandre Demers 2013-08-09 18:25:22 UTC
Closing this bug since it has been fixed for a few releases.

