Bug 56139 - [bisected] kernel 3.7.0-rc1 breaks 6950 (boot/grub2 and suspend/resume) (CAYMAN)
[bisected] kernel 3.7.0-rc1 breaks 6950 (boot/grub2 and suspend/resume) (CAYMAN)
Status: RESOLVED DUPLICATE of bug 60439
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
XOrg git
All All
: medium major
Assigned To: Default DRI bug account
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-10-18 13:48 UTC by Alexandre Demers
Modified: 2013-03-11 13:38 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments
possible fix (1.78 KB, patch)
2012-10-18 14:16 UTC, Alex Deucher
no flags Details | Splinter Review
possible fix (3.24 KB, patch)
2012-10-26 14:08 UTC, Alex Deucher
no flags Details | Splinter Review
possible fix (5.01 KB, patch)
2012-10-31 19:13 UTC, Alex Deucher
no flags Details | Splinter Review
possible fix (3.22 KB, patch)
2012-11-05 14:28 UTC, Alex Deucher
no flags Details | Splinter Review
possible fix (1.12 KB, patch)
2012-11-05 16:34 UTC, Alex Deucher
no flags Details | Splinter Review
Partial kernel.log (18.58 KB, text/plain)
2012-11-18 17:05 UTC, Alexandre Demers
no flags Details
Partial kernel.log - fixed string (23.46 KB, text/plain)
2012-11-18 19:40 UTC, Alexandre Demers
no flags Details
patch applied on 3.7.0-rc6 for debug output (8.95 KB, text/plain)
2012-11-18 19:41 UTC, Alexandre Demers
no flags Details
fix save for real (952 bytes, patch)
2012-11-19 14:19 UTC, Alex Deucher
no flags Details | Splinter Review

Note You need to log in before you can comment on or make changes to this bug.
Description Alexandre Demers 2012-10-18 13:48:51 UTC
Using ArchLinux 64, own built kernel (vanilla) with an HD 6950 (CAYMAN).

When testing kernel 3.7-rc1, the screen becomes unreadable. It's either completely black or in a flickering state making the display unusable. It is very similar to what was observed in bug 43655.

At the time of bug 43655, the problem had been fixed by http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=81ee8fb6b52ec69eeed37fe7943446af1dccecc5. So it was working OK with kernel 3.6.

I've bisected between kernel 3.6 and 3.7-rc1 and the faulty identified commit is the following:
62444b7462a2b98bc78d68736c03a7c4e66ba7e2 is the first bad commit
commit 62444b7462a2b98bc78d68736c03a7c4e66ba7e2
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Wed Aug 15 17:18:42 2012 -0400

    drm/radeon: properly handle mc_stop/mc_resume on evergreen+ (v2)
    
    - Stop the displays from accessing the FB
    - Block CPU access
    - Turn off MC client access
    
    This should fix issues some users have seen, especially
    with UEFI, when changing the MC FB location that result
    in hangs or display corruption.
    
    v2: fix crtc enabled check noticed by Luca Tettamanti
    
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

:040000 040000 3e0d33c9b4eda29ced814fe9a863efe63e53f14c 4932561607b160734ec1eade927a9fe18c9f3f1b M	drivers
Comment 1 Alex Deucher 2012-10-18 14:16:24 UTC
Created attachment 68760 [details] [review]
possible fix

Does this patch help?
Comment 2 Alexandre Demers 2012-10-18 23:46:29 UTC
(In reply to comment #1)
> Created attachment 68760 [details] [review] [review]
> possible fix
> 
> Does this patch help?

Applied suggested patch on top of 62444b7462a2b98bc78d68736c03a7c4e66ba7e2 and it doesn't fix it.

By the way, I confirmed it is exactly as bug 43655 (same symptoms, same workaround).
Comment 3 Alexandre Demers 2012-10-22 05:57:10 UTC
So, I went further and I split commit 62444b in 3 patches. On for the register values, one for the stop function and one for the resume function. I applied the first and the second patches for now and it seems to work well. I suspended my computer, resumed and everything is going normal (for now at least). I'll test it a bit more, but the problem seems to be somewhere in the resume function.
Comment 4 Alexandre Demers 2012-10-26 03:17:36 UTC
First, I made a mistake last time I wrote: the evergreen_mc_resume() function patch was fine. The bad code was in the evergreen_mc_stop() function. Sorry for the wrong lead, I realized it today when I continued my tests.

Now, I split the evergreen_mc_stop() code in two parts. The following ran fine:
-	WREG32(D1VGA_CONTROL, 0);
-	WREG32(D2VGA_CONTROL, 0);
-	if (rdev->num_crtc >= 4) {
-		WREG32(EVERGREEN_D3VGA_CONTROL, 0);
-		WREG32(EVERGREEN_D4VGA_CONTROL, 0);
-	}
-	if (rdev->num_crtc >= 6) {
-		WREG32(EVERGREEN_D5VGA_CONTROL, 0);
-		WREG32(EVERGREEN_D6VGA_CONTROL, 0);
+	radeon_mc_wait_for_idle(rdev);
+
+	blackout = RREG32(MC_SHARED_BLACKOUT_CNTL);
+	if ((blackout & BLACKOUT_MODE_MASK) != 1) {
+		/* Block CPU access */
+		WREG32(BIF_FB_EN, 0);
+		/* blackout the MC */
+		blackout &= ~BLACKOUT_MODE_MASK;
+		WREG32(MC_SHARED_BLACKOUT_CNTL, blackout | 1);

and I was able to boot, stop, restart and suspend correctly.

However, when I applied:
void evergreen_mc_stop(struct radeon_device *rdev, struct evergreen_mc_save *save)
 {
-	u32 blackout;
+	u32 crtc_enabled, tmp, frame_count, blackout;
+	int i, j;

 	save->vga_render_control = RREG32(VGA_RENDER_CONTROL);
 	save->vga_hdp_control = RREG32(VGA_HDP_CONTROL);
 
-	/* Stop all video */
+	/* disable VGA render */
 	WREG32(VGA_RENDER_CONTROL, 0);
-	WREG32(EVERGREEN_CRTC_UPDATE_LOCK + EVERGREEN_CRTC0_REGISTER_OFFSET, 1);
-	WREG32(EVERGREEN_CRTC_UPDATE_LOCK + EVERGREEN_CRTC1_REGISTER_OFFSET, 1);
-	if (rdev->num_crtc >= 4) {
-		WREG32(EVERGREEN_CRTC_UPDATE_LOCK + EVERGREEN_CRTC2_REGISTER_OFFSET, 1);
-		WREG32(EVERGREEN_CRTC_UPDATE_LOCK + EVERGREEN_CRTC3_REGISTER_OFFSET, 1);
-	}
-	if (rdev->num_crtc >= 6) {
-		WREG32(EVERGREEN_CRTC_UPDATE_LOCK + EVERGREEN_CRTC4_REGISTER_OFFSET, 1);
-		WREG32(EVERGREEN_CRTC_UPDATE_LOCK + EVERGREEN_CRTC5_REGISTER_OFFSET, 1);
-	}
-	WREG32(EVERGREEN_CRTC_CONTROL + EVERGREEN_CRTC0_REGISTER_OFFSET, 0);
-	WREG32(EVERGREEN_CRTC_CONTROL + EVERGREEN_CRTC1_REGISTER_OFFSET, 0);
-	if (rdev->num_crtc >= 4) {
-		WREG32(EVERGREEN_CRTC_CONTROL + EVERGREEN_CRTC2_REGISTER_OFFSET, 0);
-		WREG32(EVERGREEN_CRTC_CONTROL + EVERGREEN_CRTC3_REGISTER_OFFSET, 0);
-	}
-	if (rdev->num_crtc >= 6) {
-		WREG32(EVERGREEN_CRTC_CONTROL + EVERGREEN_CRTC4_REGISTER_OFFSET, 0);
-		WREG32(EVERGREEN_CRTC_CONTROL + EVERGREEN_CRTC5_REGISTER_OFFSET, 0);
-	}
-	WREG32(EVERGREEN_CRTC_UPDATE_LOCK + EVERGREEN_CRTC0_REGISTER_OFFSET, 0);
-	WREG32(EVERGREEN_CRTC_UPDATE_LOCK + EVERGREEN_CRTC1_REGISTER_OFFSET, 0);
-	if (rdev->num_crtc >= 4) {
-		WREG32(EVERGREEN_CRTC_UPDATE_LOCK + EVERGREEN_CRTC2_REGISTER_OFFSET, 0);
-		WREG32(EVERGREEN_CRTC_UPDATE_LOCK + EVERGREEN_CRTC3_REGISTER_OFFSET, 0);
-	}
-	if (rdev->num_crtc >= 6) {
-		WREG32(EVERGREEN_CRTC_UPDATE_LOCK + EVERGREEN_CRTC4_REGISTER_OFFSET, 0);
-		WREG32(EVERGREEN_CRTC_UPDATE_LOCK + EVERGREEN_CRTC5_REGISTER_OFFSET, 0);
+	/* blank the display controllers */
+	for (i = 0; i < rdev->num_crtc; i++) {
+		crtc_enabled = RREG32(EVERGREEN_CRTC_CONTROL + crtc_offsets[i]) & EVERGREEN_CRTC_MASTER_EN;
+		if (crtc_enabled) {
+			save->crtc_enabled[i] = true;
+			if (ASIC_IS_DCE6(rdev)) {
+				tmp = RREG32(EVERGREEN_CRTC_BLANK_CONTROL + crtc_offsets[i]);
+				if (!(tmp & EVERGREEN_CRTC_BLANK_DATA_EN)) {
+					radeon_wait_for_vblank(rdev, i);
+					tmp |= EVERGREEN_CRTC_BLANK_DATA_EN;
+					WREG32(EVERGREEN_CRTC_BLANK_CONTROL + crtc_offsets[i], tmp);
+				}
+			} else {
+				tmp = RREG32(EVERGREEN_CRTC_CONTROL + crtc_offsets[i]);
+				if (!(tmp & EVERGREEN_CRTC_DISP_READ_REQUEST_DISABLE)) {
+					radeon_wait_for_vblank(rdev, i);
+					tmp |= EVERGREEN_CRTC_DISP_READ_REQUEST_DISABLE;
+					WREG32(EVERGREEN_CRTC_CONTROL + crtc_offsets[i], tmp);
+				}
+			}
+			/* wait for the next frame */
+			frame_count = radeon_get_vblank_counter(rdev, i);
+			for (j = 0; j < rdev->usec_timeout; j++) {
+				if (radeon_get_vblank_counter(rdev, i) != frame_count)
+					break;
+				udelay(1);
+			}
+		}
 	}

the bug appeared. So it seems blanking the display controllers with for(i = 0; i < rdev->num_crtc; i++) is not equivalent to the code that it replaces. The original code first wrote in the EVERGREEN_CRTC_UPDATE_LOCK registers, before setting EVERGREEN_CRTC_CONTROL registers and writing again in the EVERGREEN_CRTC_UPDATE_LOCK registers. On the other hand, the new code doesn't write in the EVERGREEN_CRTC_UPDATE_LOCK neither before nor after setting EVERGREEN_CRTC_CONTROL.

Could this be a clue?
Comment 5 Alex Deucher 2012-10-26 14:08:26 UTC
Created attachment 69113 [details] [review]
possible fix

(In reply to comment #4)
> the bug appeared. So it seems blanking the display controllers with for(i =
> 0; i < rdev->num_crtc; i++) is not equivalent to the code that it replaces.
> The original code first wrote in the EVERGREEN_CRTC_UPDATE_LOCK registers,
> before setting EVERGREEN_CRTC_CONTROL registers and writing again in the
> EVERGREEN_CRTC_UPDATE_LOCK registers. On the other hand, the new code
> doesn't write in the EVERGREEN_CRTC_UPDATE_LOCK neither before nor after
> setting EVERGREEN_CRTC_CONTROL.

It should be equivalent.  CRTC_UPDATE_LOCK turns off double buffering in the crtc which makes register updates atomic.  The new code waits for the frame count to increase (the double buffered updates happen at vblank) so it should be equivalent.  That said, it shouldn't hurt to take the lock.  Does this patch help?
Comment 6 Alexandre Demers 2012-10-27 04:51:49 UTC
(In reply to comment #5)
> Created attachment 69113 [details] [review] [review]
> possible fix
> 
> (In reply to comment #4)
> > the bug appeared. So it seems blanking the display controllers with for(i =
> > 0; i < rdev->num_crtc; i++) is not equivalent to the code that it replaces.
> > The original code first wrote in the EVERGREEN_CRTC_UPDATE_LOCK registers,
> > before setting EVERGREEN_CRTC_CONTROL registers and writing again in the
> > EVERGREEN_CRTC_UPDATE_LOCK registers. On the other hand, the new code
> > doesn't write in the EVERGREEN_CRTC_UPDATE_LOCK neither before nor after
> > setting EVERGREEN_CRTC_CONTROL.
> 
> It should be equivalent.  CRTC_UPDATE_LOCK turns off double buffering in the
> crtc which makes register updates atomic.  The new code waits for the frame
> count to increase (the double buffered updates happen at vblank) so it
> should be equivalent.  That said, it shouldn't hurt to take the lock.  Does
> this patch help?

Sadly, it didn't help.
Comment 7 Alexandre Demers 2012-10-31 18:29:13 UTC
Alex, would it be possible to print what is going on or if an error occurred in evergreen_mc_stop()?

I see four things that could be going on:
1- we are not using the right path for CAYMAN -> (ASIC_IS_DCE6(rdev));
2- lock mechanism synced with vblank is not working properly;
3- all the registers should be locked at the same time, then all modified and finally unlocked together, which is not done with the for loop where we move through each at a time;
4- we are not setting the right registers.

What do you think?

I can test and investigate 1, 3 and 4. But what about 2?
Comment 8 Alex Deucher 2012-10-31 19:13:52 UTC
Created attachment 69370 [details] [review]
possible fix

(In reply to comment #7)
> Alex, would it be possible to print what is going on or if an error occurred
> in evergreen_mc_stop()?
> 
> I see four things that could be going on:
> 1- we are not using the right path for CAYMAN -> (ASIC_IS_DCE6(rdev));

cayman is DCE5.  It is using the correct code path.

> 2- lock mechanism synced with vblank is not working properly;

Locking makes updates atomic rather than double buffered.

> 3- all the registers should be locked at the same time, then all modified
> and finally unlocked together, which is not done with the for loop where we
> move through each at a time;

doesn't matter.

> 4- we are not setting the right registers.

The existing sequence should be correct.  It's the same sequence our hw team recommends.  I can't reproduce this on my cayman boards unfortunately and this patch fixes the exact same problem you are having for a number of other people :/

Maybe an issue with the icon or cursor, but I think those should be disabled when we disable mem requests in the crtc.  Does this patch help?
Comment 9 Alexandre Demers 2012-11-01 00:38:45 UTC
(In reply to comment #8)
> Created attachment 69370 [details] [review] [review]
> possible fix
> 
> (In reply to comment #7)
> > Alex, would it be possible to print what is going on or if an error occurred
> > in evergreen_mc_stop()?
> > 
> > I see four things that could be going on:
> > 1- we are not using the right path for CAYMAN -> (ASIC_IS_DCE6(rdev));
> 
> cayman is DCE5.  It is using the correct code path.
> 
> > 2- lock mechanism synced with vblank is not working properly;
> 
> Locking makes updates atomic rather than double buffered.
> 
> > 3- all the registers should be locked at the same time, then all modified
> > and finally unlocked together, which is not done with the for loop where we
> > move through each at a time;
> 
> doesn't matter.
> 
> > 4- we are not setting the right registers.
> 
> The existing sequence should be correct.  It's the same sequence our hw team
> recommends.  I can't reproduce this on my cayman boards unfortunately and
> this patch fixes the exact same problem you are having for a number of other
> people :/
> 
> Maybe an issue with the icon or cursor, but I think those should be disabled
> when we disable mem requests in the crtc.  Does this patch help?

Not working either. Also, with 3.7.0-rc3 + this patch, the computer freezes when coming back from suspend. I see Gnome-Shell for a couple of seconds, then some garbage and it stops. Kernel 3.6 works fine on that matter. I'll test with my split patches to see if it is caused by the same commit.

So, pretty much back to square 1, wherever that is.
Comment 10 Alexandre Demers 2012-11-01 05:30:53 UTC
The good news is: the code under resume() is not to be blamed for the lock after coming back from suspend.
Comment 11 Alexandre Demers 2012-11-05 06:58:17 UTC
Found what is wrong with the help of a few printk and by comparing to the code being replaced. All the logic is good (going through crtc, disabling them, waiting for vblank) BUT setting "tmp |= EVERGREEN_CRTC_DISP_READ_REQUEST_DISABLE;" is wrong.

If I do as in the previous code by setting tmp = 0 and then continuing with:
radeon_wait_for_vblank(rdev, i);
WREG32(EVERGREEN_CRTC_CONTROL + crtc_offsets[i], tmp);
everything works fine as before.

What is expected from "tmp |= EVERGREEN_CRTC_DISP_READ_REQUEST_DISABLE;"? From what I read with printk, it is far from a 0 or a 1. Is this normal?
Comment 12 Alex Deucher 2012-11-05 14:03:53 UTC
(In reply to comment #11)
> Found what is wrong with the help of a few printk and by comparing to the
> code being replaced. All the logic is good (going through crtc, disabling
> them, waiting for vblank) BUT setting "tmp |=
> EVERGREEN_CRTC_DISP_READ_REQUEST_DISABLE;" is wrong.
> 
> If I do as in the previous code by setting tmp = 0 and then continuing with:
> radeon_wait_for_vblank(rdev, i);
> WREG32(EVERGREEN_CRTC_CONTROL + crtc_offsets[i], tmp);
> everything works fine as before.
> 
> What is expected from "tmp |= EVERGREEN_CRTC_DISP_READ_REQUEST_DISABLE;"?
> From what I read with printk, it is far from a 0 or a 1. Is this normal?

That's the most important bit in the entire sequence.  It's a bit field in a register (bit 24 to be exact).  That bit is the bit that actually disables the requests from the display controller in the memory controller.  The whole point of this code is to disable all clients of the memory controller (mc_stop()) so that we can change the location of vram within the GPU's address space.  Once we've moved vram, we can re-enable the clients (mc_resume()) so that they point to the new vram location.
Comment 13 Alex Deucher 2012-11-05 14:28:10 UTC
Created attachment 69564 [details] [review]
possible fix

Slightly adjusted wait for vblank function.  Maybe this will help?
Comment 14 Alexandre Demers 2012-11-05 14:55:16 UTC
(In reply to comment #12)
> (In reply to comment #11)
> > Found what is wrong with the help of a few printk and by comparing to the
> > code being replaced. All the logic is good (going through crtc, disabling
> > them, waiting for vblank) BUT setting "tmp |=
> > EVERGREEN_CRTC_DISP_READ_REQUEST_DISABLE;" is wrong.
> > 
> > If I do as in the previous code by setting tmp = 0 and then continuing with:
> > radeon_wait_for_vblank(rdev, i);
> > WREG32(EVERGREEN_CRTC_CONTROL + crtc_offsets[i], tmp);
> > everything works fine as before.
> > 
> > What is expected from "tmp |= EVERGREEN_CRTC_DISP_READ_REQUEST_DISABLE;"?
> > From what I read with printk, it is far from a 0 or a 1. Is this normal?
> 
> That's the most important bit in the entire sequence.  It's a bit field in a
> register (bit 24 to be exact).  That bit is the bit that actually disables
> the requests from the display controller in the memory controller.  The
> whole point of this code is to disable all clients of the memory controller
> (mc_stop()) so that we can change the location of vram within the GPU's
> address space.  Once we've moved vram, we can re-enable the clients
> (mc_resume()) so that they point to the new vram location.

Thank you, you confirmed what I had assumed. I had already understood that it was the most important part in the sequence since it is associated to a "write" instruction. I don't have the (decimal/binary) values with me right now, but I'll be able to give you what was read from the register and from the result returned from the bitwise OR assignment we are pushing in the register. I'll confirm which bit(s) are changing. I'm sure the assignment was setting one (or more) bit in the register to "1". Is bit 24 the only one expected to change in the register? Should it be put to "1" or "0"?

Another question: why were we setting "0" in the register before? By doing so, we were setting the whole register to "0" (the whole 32 bits), right? If I remember correctly, from what I saw yesterday with the help of printk, it seems we are setting at least one bit to "1" in this register, which would be the opposite of what was being done before and therefore of what was working correctly with my video card. I'm just trying to understand why we were doing something in the first place that was working correctly and that was introduce to fix this problem in the first place, both at boot time for grub (set gfxpayload=keep) and when suspending/resuming, and we are now doing the opposite, thus breaking the code for some setups. Is it possible that the bit/register logic is not the same for all Radeon GPUs? After all, we already introduced a different path for DCE6.

I'll also try your patch when I'll get home tonight.
Comment 15 Alex Deucher 2012-11-05 16:20:59 UTC
(In reply to comment #14)
> (In reply to comment #12)
> > (In reply to comment #11)
> > > Found what is wrong with the help of a few printk and by comparing to the
> > > code being replaced. All the logic is good (going through crtc, disabling
> > > them, waiting for vblank) BUT setting "tmp |=
> > > EVERGREEN_CRTC_DISP_READ_REQUEST_DISABLE;" is wrong.
> > > 
> > > If I do as in the previous code by setting tmp = 0 and then continuing with:
> > > radeon_wait_for_vblank(rdev, i);
> > > WREG32(EVERGREEN_CRTC_CONTROL + crtc_offsets[i], tmp);
> > > everything works fine as before.
> > > 
> > > What is expected from "tmp |= EVERGREEN_CRTC_DISP_READ_REQUEST_DISABLE;"?
> > > From what I read with printk, it is far from a 0 or a 1. Is this normal?
> > 
> > That's the most important bit in the entire sequence.  It's a bit field in a
> > register (bit 24 to be exact).  That bit is the bit that actually disables
> > the requests from the display controller in the memory controller.  The
> > whole point of this code is to disable all clients of the memory controller
> > (mc_stop()) so that we can change the location of vram within the GPU's
> > address space.  Once we've moved vram, we can re-enable the clients
> > (mc_resume()) so that they point to the new vram location.
> 
> Thank you, you confirmed what I had assumed. I had already understood that
> it was the most important part in the sequence since it is associated to a
> "write" instruction. I don't have the (decimal/binary) values with me right
> now, but I'll be able to give you what was read from the register and from
> the result returned from the bitwise OR assignment we are pushing in the
> register. I'll confirm which bit(s) are changing. I'm sure the assignment
> was setting one (or more) bit in the register to "1". Is bit 24 the only one
> expected to change in the register? Should it be put to "1" or "0"?
> 

Setting bit 24 to 1 disables memory requests from the display controller.  Setting bit 24 to 0, enables memory requests from the display controller.

> Another question: why were we setting "0" in the register before? By doing
> so, we were setting the whole register to "0" (the whole 32 bits), right? If
> I remember correctly, from what I saw yesterday with the help of printk, it
> seems we are setting at least one bit to "1" in this register, which would
> be the opposite of what was being done before and therefore of what was
> working correctly with my video card. I'm just trying to understand why we
> were doing something in the first place that was working correctly and that
> was introduce to fix this problem in the first place, both at boot time for
> grub (set gfxpayload=keep) and when suspending/resuming, and we are now
> doing the opposite, thus breaking the code for some setups. Is it possible
> that the bit/register logic is not the same for all Radeon GPUs? After all,
> we already introduced a different path for DCE6.

Bit 0 for CRTC_CONTROL turns on/off the entire crtc.  If bit 0 is 0 the crtc is disabled.  If bit 0 is 1, the crtc is enabled.  If the crtc is disabled (bit 0 = 0) bit 24 is irrelevant.  There are separate bits to enable the crtc and disable memory requests since there are cases (like this one) where we want to keep the crtc timing running so that the monitor stays synced, but disable reads from memory so we can reconfigure the memory controller.  If we disable the crtc entirely, the monitor would lose sync and you would get additional flicker during boot up.  Ideally, eventually we'd like to just hand over control from the firmware without touching the display hardware so the user gets a flicker free boot experience.

DCE4 and 5 have the same logic and bit layout for these registers.  The logic is different on DCE6 chips.  On DCE6, the the memory controller request bit is now tied in with the crtc blanking bit.  When the crtc is blanked, memory requests are also disabled.
Comment 16 Alex Deucher 2012-11-05 16:34:12 UTC
Created attachment 69573 [details] [review]
possible fix

Actually, I think I found the problem.  Missing index in mc_resume().
Comment 17 Alexandre Demers 2012-11-05 22:46:53 UTC
I'm about to test patches. But before, as promised, here are the values retrieved read and written to the registers. By the way, I have only a single monitor.
tmp = RREG32(EVERGREEN_CRTC_CONTROL + crtc_offsets[i]); -> 272696081 (0001 0000 0100 0001 0000 0011 0001 0001)
tmp |= EVERGREEN_CRTC_DISP_READ_REQUEST_DISABLE; -> 289473297 (0001 0001 0100 0001 0000 0011 0001 0001)

So, it is set as you explained me earlier. I'll be back soon with some news from the patches.
Comment 18 Alexandre Demers 2012-11-06 00:33:01 UTC
(In reply to comment #16)
> Created attachment 69573 [details] [review] [review]
> possible fix
> 
> Actually, I think I found the problem.  Missing index in mc_resume().

This seems to fix my resume problem I was experiencing where the display would come up, but then it would crash. However, it doesn't solve the boot/grub2 bug.

So, this patch should be kept (but not for the current bug).
Comment 19 Alexandre Demers 2012-11-06 01:02:44 UTC
(In reply to comment #15)
> (In reply to comment #14)
> > (In reply to comment #12)
> > > (In reply to comment #11)
> > > > Found what is wrong with the help of a few printk and by comparing to the
> > > > code being replaced. All the logic is good (going through crtc, disabling
> > > > them, waiting for vblank) BUT setting "tmp |=
> > > > EVERGREEN_CRTC_DISP_READ_REQUEST_DISABLE;" is wrong.
> > > > 
> > > > If I do as in the previous code by setting tmp = 0 and then continuing with:
> > > > radeon_wait_for_vblank(rdev, i);
> > > > WREG32(EVERGREEN_CRTC_CONTROL + crtc_offsets[i], tmp);
> > > > everything works fine as before.
> > > > 
> > > > What is expected from "tmp |= EVERGREEN_CRTC_DISP_READ_REQUEST_DISABLE;"?
> > > > From what I read with printk, it is far from a 0 or a 1. Is this normal?
> > > 
> > > That's the most important bit in the entire sequence.  It's a bit field in a
> > > register (bit 24 to be exact).  That bit is the bit that actually disables
> > > the requests from the display controller in the memory controller.  The
> > > whole point of this code is to disable all clients of the memory controller
> > > (mc_stop()) so that we can change the location of vram within the GPU's
> > > address space.  Once we've moved vram, we can re-enable the clients
> > > (mc_resume()) so that they point to the new vram location.
> > 
> > Thank you, you confirmed what I had assumed. I had already understood that
> > it was the most important part in the sequence since it is associated to a
> > "write" instruction. I don't have the (decimal/binary) values with me right
> > now, but I'll be able to give you what was read from the register and from
> > the result returned from the bitwise OR assignment we are pushing in the
> > register. I'll confirm which bit(s) are changing. I'm sure the assignment
> > was setting one (or more) bit in the register to "1". Is bit 24 the only one
> > expected to change in the register? Should it be put to "1" or "0"?
> > 
> 
> Setting bit 24 to 1 disables memory requests from the display controller. 
> Setting bit 24 to 0, enables memory requests from the display controller.
> 
> > Another question: why were we setting "0" in the register before? By doing
> > so, we were setting the whole register to "0" (the whole 32 bits), right? If
> > I remember correctly, from what I saw yesterday with the help of printk, it
> > seems we are setting at least one bit to "1" in this register, which would
> > be the opposite of what was being done before and therefore of what was
> > working correctly with my video card. I'm just trying to understand why we
> > were doing something in the first place that was working correctly and that
> > was introduce to fix this problem in the first place, both at boot time for
> > grub (set gfxpayload=keep) and when suspending/resuming, and we are now
> > doing the opposite, thus breaking the code for some setups. Is it possible
> > that the bit/register logic is not the same for all Radeon GPUs? After all,
> > we already introduced a different path for DCE6.
> 
> Bit 0 for CRTC_CONTROL turns on/off the entire crtc.  If bit 0 is 0 the crtc
> is disabled.  If bit 0 is 1, the crtc is enabled.  If the crtc is disabled
> (bit 0 = 0) bit 24 is irrelevant.  There are separate bits to enable the
> crtc and disable memory requests since there are cases (like this one) where
> we want to keep the crtc timing running so that the monitor stays synced,
> but disable reads from memory so we can reconfigure the memory controller. 
> If we disable the crtc entirely, the monitor would lose sync and you would
> get additional flicker during boot up.  Ideally, eventually we'd like to
> just hand over control from the firmware without touching the display
> hardware so the user gets a flicker free boot experience.
> 
> DCE4 and 5 have the same logic and bit layout for these registers.  The
> logic is different on DCE6 chips.  On DCE6, the the memory controller
> request bit is now tied in with the crtc blanking bit.  When the crtc is
> blanked, memory requests are also disabled.

If I followed you correctly, setting bit 24 to "1" disables memory but keeps the CRTC state as it is (hopefully in sync with the monitor). However, what I see when grub2 kicks in with the "gfxpayload=keep" is an unsynced monitor. Sometimes the display will be black, other times it will only appear in the first couple of vertical lines, in others it will be vertically synced but shifted to the right at half or at two third of the screen. In other words, this really seems to be a sync problem. Should I try to combine patch 69113 and patch 69370 with the others?
Comment 20 Alexandre Demers 2012-11-06 22:32:02 UTC
Should I have a look at how grub2 handles and talks to the kernel when using gfxpayload=keep? A quick search shows this issue has been lurking for some time (2010) with various radeon card (https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/605614, http://frugalware.org/pipermail/frugalware-devel/2012-February/011500.html, http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=567245, http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=567393).

Could something be wrong when handling the framebuffer?
Comment 21 Alex Deucher 2012-11-06 22:50:08 UTC
(In reply to comment #19)
> If I followed you correctly, setting bit 24 to "1" disables memory but keeps
> the CRTC state as it is (hopefully in sync with the monitor). However, what
> I see when grub2 kicks in with the "gfxpayload=keep" is an unsynced monitor.
> Sometimes the display will be black, other times it will only appear in the
> first couple of vertical lines, in others it will be vertically synced but
> shifted to the right at half or at two third of the screen. In other words,
> this really seems to be a sync problem. Should I try to combine patch 69113
> and patch 69370 with the others?

Are you saying the monitor gets messed up when grub kicks in?  I.e., before the OS loads?  If so, that sounds like a grub issue.
Comment 22 Alexandre Demers 2012-11-06 22:55:08 UTC
(In reply to comment #21)
> (In reply to comment #19)
> > If I followed you correctly, setting bit 24 to "1" disables memory but keeps
> > the CRTC state as it is (hopefully in sync with the monitor). However, what
> > I see when grub2 kicks in with the "gfxpayload=keep" is an unsynced monitor.
> > Sometimes the display will be black, other times it will only appear in the
> > first couple of vertical lines, in others it will be vertically synced but
> > shifted to the right at half or at two third of the screen. In other words,
> > this really seems to be a sync problem. Should I try to combine patch 69113
> > and patch 69370 with the others?
> 
> Are you saying the monitor gets messed up when grub kicks in?  I.e., before
> the OS loads?  If so, that sounds like a grub issue.

I'll take some pictures or record a small video of what's going on. That will be easier to understand. But yes, I'm beginning to think Grub2 is playing a bad trick somewhere in there and is not playing nicely when handling the framebuffer.
Comment 23 Alex Deucher 2012-11-06 22:56:20 UTC
(In reply to comment #19)
> Should I try to combine patch 69113
> and patch 69370 with the others?

Worth a shot.
Comment 24 Alexandre Demers 2012-11-06 22:59:04 UTC
I'll retest it with the latest two patches, but I think setting gfxpayload=text prevents the bug. I know for sure that with the index patch, suspend/resume cycle is fixed. Also, if I remove gfxpayload=keep completely, my monitor works fine.
Comment 25 Florian Mickler 2012-11-11 18:55:03 UTC
A patch referencing this bug report has been merged in Linux v3.7-rc5:

commit 695ddeb457584a602f2ba117d08ce37cf6ec1589
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Mon Nov 5 16:34:58 2012 +0000

    drm/radeon: fix typo in evergreen_mc_resume()
Comment 26 Alexandre Demers 2012-11-12 04:27:15 UTC
I've been playing with different options in Grub2 and the workaround for the boot sequence bug, beside reverting commit 62444b7462a2b98bc78d68736c03a7c4e66ba7e2, is to force gfxpayload=text.

I tried many things by changing gfxmode, forcing it to known good values, in combination with gfxpayload. But whatever I tried, if gfxpayload=keep, it was not synced correctly (as described, sometimes I had a black screen, some other times, it is vertically synced but shifted, etc.) As already explained, if I either revert commit 62444b7462a2b98bc78d68736c03a7c4e66ba7e2 or if I set tmp=0 before WREG32(EVERGREEN_CRTC_CONTROL + crtc_offsets[i], tmp), which is disabling the whole crtc, the boot process is fine.

Also, I tested kernel 3.7.0-rc5 (which contains attachment 69573 [details] [review]). Since it also contains modified code in evergreen_mc_stop() from commit 62444b7462a2b98bc78d68736c03a7c4e66ba7e2, it crashes on resume. In other words, attachment 69573 [details] [review] fixes a part of the suspend/resume problem, but doesn't the part coming from evergreen_mc_stop().

While I can live with working around the boot process/grub2 problem, the suspend part is really annoying.
Comment 27 Alexandre Demers 2012-11-12 05:32:15 UTC
pictures/videos to come tomorrow
Comment 28 Alexandre Demers 2012-11-12 23:31:20 UTC
Alex, a simple question: you said bit 0 in EVERGREEN_CRTC_CONTROL stops the CRTC sync. With the culprit commit, when is it set? I mean, I had a quick look in the driver's code and I couldn't find it. When going in suspend state, shouldn't it be set to 0? Then, on resume, shouldn't it be set to 1? I may just have missed it, but could this be something missing? Same questions goes for GPU soft reset.

Before the culprit commit, we were setting bit 0 to 0 on stop and setting it back to 1 on resume, which was working great. Why aren't we doing it anymore when suspending and resuming?
Comment 29 Alex Deucher 2012-11-13 15:15:10 UTC
(In reply to comment #28)
> Alex, a simple question: you said bit 0 in EVERGREEN_CRTC_CONTROL stops the
> CRTC sync. With the culprit commit, when is it set? I mean, I had a quick
> look in the driver's code and I couldn't find it. When going in suspend
> state, shouldn't it be set to 0? Then, on resume, shouldn't it be set to 1?
> I may just have missed it, but could this be something missing? Same
> questions goes for GPU soft reset.

It's set by the vbios at boot before the OS loads.  Once the driver loads, it's then handled by the modesetting code.  crtcs are enabled or disabled based on what's connected and what displays the user has chosen to enable.

These functions have nothing to do with system suspend/hibernate and resume directly.  evergreen_mc_stop() and evergreen_mc_resume() are for disabling MC clients when we make changes to the GPU memory controller.  System suspend/hibernate are handled by radeon_suspend_kms() and radeon_resume_kms().  radeon_suspend_kms() explicitly disables all of the display hardware:

list_for_each_entry(connector, &dev->mode_config.connector_list, head) {
       drm_helper_connector_dpms(connector, DRM_MODE_DPMS_OFF);
}

And then the displays are enabled again in radeon_resume_kms():

/* blat the mode back in */
drm_helper_resume_force_mode(dev);
/* turn on display hw */
list_for_each_entry(connector, &dev->mode_config.connector_list, head) {
	drm_helper_connector_dpms(connector, DRM_MODE_DPMS_ON);
}

> 
> Before the culprit commit, we were setting bit 0 to 0 on stop and setting it
> back to 1 on resume, which was working great. Why aren't we doing it anymore
> when suspending and resuming?

The previous code[1] didn't do that.  It set bit 0 to 0 in evergreen_mc_stop() and then left it disabled in evergreen_mc_resume().  The crtc was not subsequently re-enabled until the displays were set up later by the modesetting code.  Prior to that patch[1], we just saved and restored the value of the register which is basically what you are proposing, but that didn't work otherwise we wouldn't have needed the subsequent patches.

We'd like to avoid disabling the crtc during driver load or GPU reset to avoid the visible flicker it causes.

[1]: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=81ee8fb6b52ec69eeed37fe7943446af1dccecc5
Comment 30 Alexandre Demers 2012-11-13 15:45:32 UTC
(In reply to comment #29)
> (In reply to comment #28)
> > Alex, a simple question: you said bit 0 in EVERGREEN_CRTC_CONTROL stops the
> > CRTC sync. With the culprit commit, when is it set? I mean, I had a quick
> > look in the driver's code and I couldn't find it. When going in suspend
> > state, shouldn't it be set to 0? Then, on resume, shouldn't it be set to 1?
> > I may just have missed it, but could this be something missing? Same
> > questions goes for GPU soft reset.
> 
> It's set by the vbios at boot before the OS loads.  Once the driver loads,
> it's then handled by the modesetting code.  crtcs are enabled or disabled
> based on what's connected and what displays the user has chosen to enable.
> 
> These functions have nothing to do with system suspend/hibernate and resume
> directly.  evergreen_mc_stop() and evergreen_mc_resume() are for disabling
> MC clients when we make changes to the GPU memory controller.  System
> suspend/hibernate are handled by radeon_suspend_kms() and
> radeon_resume_kms().  radeon_suspend_kms() explicitly disables all of the
> display hardware:
> 
> list_for_each_entry(connector, &dev->mode_config.connector_list, head) {
>        drm_helper_connector_dpms(connector, DRM_MODE_DPMS_OFF);
> }
> 
> And then the displays are enabled again in radeon_resume_kms():
> 
> /* blat the mode back in */
> drm_helper_resume_force_mode(dev);
> /* turn on display hw */
> list_for_each_entry(connector, &dev->mode_config.connector_list, head) {
> 	drm_helper_connector_dpms(connector, DRM_MODE_DPMS_ON);
> }
> 
> > 
> > Before the culprit commit, we were setting bit 0 to 0 on stop and setting it
> > back to 1 on resume, which was working great. Why aren't we doing it anymore
> > when suspending and resuming?
> 
> The previous code[1] didn't do that.  It set bit 0 to 0 in
> evergreen_mc_stop() and then left it disabled in evergreen_mc_resume().  The
> crtc was not subsequently re-enabled until the displays were set up later by
> the modesetting code.  Prior to that patch[1], we just saved and restored
> the value of the register which is basically what you are proposing, but
> that didn't work otherwise we wouldn't have needed the subsequent patches.
> 
> We'd like to avoid disabling the crtc during driver load or GPU reset to
> avoid the visible flicker it causes.
> 
> [1]:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;
> h=81ee8fb6b52ec69eeed37fe7943446af1dccecc5

Your explanations are appreciated, but that kills another idea. By the way, pictures and videos are coming, my camera had no power yesterday and I had to charge it during last night.
Comment 31 Alexandre Demers 2012-11-14 16:14:52 UTC
Here are the videos. Sorry, they were recorded with my iPod and they are inverted.

Normal boot sequence before the culprit commit (kernel 3.6)
http://www.filedropper.com/k36
A normal boot sequence with kernel 3.6 from grub 2 to booting console. Resolution is kept from grub to console, there is no sync problem and no purple flicker.

A problematic boot with kernel 3.7-rc5 using gfxpayload=keep in Grub 2.
http://www.filedropper.com/k37keep
It starts normally, the booting console uses the same resolution as Grub 2. However, possibly when radeon driver kicks in, there is a purple flicker followed by a wrong (badly synced) display (shifted by about half a screen to the right). The result could have been a black screen, a couple of interlaced vertical lines or a different shifted display.

A non problematic boot with kernel 3.7-rc5 using gfxpayload=text in Grub 2.
http://www.filedropper.com/k37text
It doesn't show the purple flicker we see in the previous video, it just flickers as a normal boot would do, but it does show the purple color from time to time. However, the display is OK.

A problematic suspend/resume sequence with 3.7-rc5 (gfxpayload doesn't change anything)
http://www.filedropper.com/k37suspend
Going in suspend mode just after booting up. When resuming, X/gdm briefly appears, then it crashes. The computer continue to run, but the display stays black and no input can be used (tried to switch console and reboot but it doesn't work). Often, it will reboot by itself after some time. Suspend and resume work great before the culprit commit or if I remove the changes in "...mc_stop()"
Comment 32 Alexandre Demers 2012-11-18 17:05:05 UTC
Created attachment 70223 [details]
Partial kernel.log

Here is a partial kernel.log tracking what is going on at boot with some debug printk in evergreen_mc_stop() and evergreen_mc_resume(). Strangely, only one crtc is saved as enabled in the _stop function, but four are restored in the _resume function. Pretty sure it's not the problem we are looking for, but it is abnormal since only one crtc should be restored on resume. This is a bug by itself showing a problem in the save(stop)/restore(resume) mechanism.
Comment 33 Alexandre Demers 2012-11-18 19:04:24 UTC
(In reply to comment #32)
> Created attachment 70223 [details]
> Partial kernel.log
> 
> Here is a partial kernel.log tracking what is going on at boot with some
> debug printk in evergreen_mc_stop() and evergreen_mc_resume(). Strangely,
> only one crtc is saved as enabled in the _stop function, but four are
> restored in the _resume function. Pretty sure it's not the problem we are
> looking for, but it is abnormal since only one crtc should be restored on
> resume. This is a bug by itself showing a problem in the
> save(stop)/restore(resume) mechanism.

I should have said "... but six are restored..." instead of "... but four are restored..." I also made a small error in a debug string, but it doesn't change the read value. If you have any question on my patch, let me know.
Comment 34 Alexandre Demers 2012-11-18 19:40:06 UTC
Created attachment 70232 [details]
Partial kernel.log - fixed string

Same partial kernel.log with fixed string.
Comment 35 Alexandre Demers 2012-11-18 19:41:44 UTC
Created attachment 70233 [details]
patch applied on 3.7.0-rc6 for debug output

Debug patch applied on kernel 3.7.0-rc6 to log debug info.
Comment 36 Alex Deucher 2012-11-19 14:05:23 UTC
(In reply to comment #33)
> > Here is a partial kernel.log tracking what is going on at boot with some
> > debug printk in evergreen_mc_stop() and evergreen_mc_resume(). Strangely,
> > only one crtc is saved as enabled in the _stop function, but four are
> > restored in the _resume function. Pretty sure it's not the problem we are
> > looking for, but it is abnormal since only one crtc should be restored on
> > resume. This is a bug by itself showing a problem in the
> > save(stop)/restore(resume) mechanism.
> 
> I should have said "... but six are restored..." instead of "... but four
> are restored..." I also made a small error in a debug string, but it doesn't
> change the read value. If you have any question on my patch, let me know.

That issue is fixed in attachment 69573 [details] [review] which is already upstream.
Comment 37 Alexandre Demers 2012-11-19 14:18:13 UTC
(In reply to comment #36)
> (In reply to comment #33)
> > > Here is a partial kernel.log tracking what is going on at boot with some
> > > debug printk in evergreen_mc_stop() and evergreen_mc_resume(). Strangely,
> > > only one crtc is saved as enabled in the _stop function, but four are
> > > restored in the _resume function. Pretty sure it's not the problem we are
> > > looking for, but it is abnormal since only one crtc should be restored on
> > > resume. This is a bug by itself showing a problem in the
> > > save(stop)/restore(resume) mechanism.
> > 
> > I should have said "... but six are restored..." instead of "... but four
> > are restored..." I also made a small error in a debug string, but it doesn't
> > change the read value. If you have any question on my patch, let me know.
> 
> That issue is fixed in attachment 69573 [details] [review] [review] which is already
> upstream.

I know it should have been fixed and that's why I added a printk(KERN_INFO "Restoring crtc[%d] since it was enabled.\n", i) just after it to check it. For some reason it seems to not be working properly. Have a look at my patch (attachment 70233 [details]) which was written on a 3.7.0-rc6 kernel. I'll add some other printk when I'll get home (like a printk just before this check to output its value).
Comment 38 Alex Deucher 2012-11-19 14:19:51 UTC
Created attachment 70255 [details] [review]
fix save for real

Should be fixed with this patch.
Comment 39 Alexandre Demers 2012-11-19 23:31:22 UTC
Comment on attachment 70255 [details] [review]
fix save for real

Review of attachment 70255 [details] [review]:
-----------------------------------------------------------------

Tested. It fixes the last reported problem. Only enabled crtc are restored.
Comment 40 Alexandre Demers 2012-11-20 06:08:54 UTC
In evergreen_reg.h, /* CRTC blocks at 0x6df0, 0x79f0, 0x105f0, 0x111f0, 0x11df0, 0x129f0 */ is not used to define the registers following this comment it. It seems to correspond to display controller offsets.

Is the comment at the wrong place or are the registers not defined with the right addresses?
Comment 41 Alex Deucher 2012-11-20 15:23:15 UTC
(In reply to comment #40)
> In evergreen_reg.h, /* CRTC blocks at 0x6df0, 0x79f0, 0x105f0, 0x111f0,
> 0x11df0, 0x129f0 */ is not used to define the registers following this
> comment it. It seems to correspond to display controller offsets.
> 
> Is the comment at the wrong place or are the registers not defined with the
> right addresses?

They are correct.  The crtc blocks are repeated at different offsets within the register aperture.  So if you wanted to access EVERGREEN_CRTC_CONTROL for crtc 0, you access: EVERGREEN_CRTC_CONTROL + EVERGREEN_CRTC0_REGISTER_OFFSET, for crtc 1:
EVERGREEN_CRTC_CONTROL + EVERGREEN_CRTC1_REGISTER_OFFSET, etc.
Comment 42 Alexandre Demers 2012-11-28 06:44:02 UTC
Going fishing: Alex, you said you were unable to reproduce this bug with your cayman device. Could it be related to the display connector? I'm using the DVI connector from my card and it is connected to a VGA connector on my monitor through a DVI -> VGA adaptor. I may be able to test with a different monitor/connector if you think this could be related.
Comment 43 Alex Deucher 2012-11-28 14:29:28 UTC
(In reply to comment #42)
> Going fishing: Alex, you said you were unable to reproduce this bug with
> your cayman device. Could it be related to the display connector? I'm using
> the DVI connector from my card and it is connected to a VGA connector on my
> monitor through a DVI -> VGA adaptor. I may be able to test with a different
> monitor/connector if you think this could be related.

You can try , but I doubt it will make a difference.  It seem to be related to the display controllers and access to memory when we make changes to the MC, so the connector type shouldn't matter.
Comment 44 Alexandre Demers 2012-11-28 17:08:37 UTC
Could this new commit 0980633afd9c7eecc0c75ef3bea4d3c6b7aa1898 help (drm/radeon: track global bo name and always return the same)?
Comment 45 Alexandre Demers 2013-01-05 05:57:18 UTC
If I had a virtual machine, would it help debug?
Comment 46 John Lenz 2013-01-31 00:52:25 UTC
Hi, I recently started having the same problem. I am running Arch Linux x86_64 with the 3.7.5 kernel.

After bisecting to 7c6b6eb9c4dbf21b3358bd1e68b9f1880e364a4b a google search found me this bug.  Should maybe have searched for the bug ahead of time to save me some work :)

I can confirm that changing grub.cfg to use "set gfxpayload=text" fixes the problem, I haven't tested any of the proposed patches yet, although would be available to test some.  Just a little about my setup.

lspci

00:00.0 Host bridge: Advanced Micro Devices [AMD] RS780 Host Bridge
00:01.0 PCI bridge: Advanced Micro Devices [AMD] RS780/RS880 PCI to PCI bridge (int gfx)
00:02.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (ext gfx port 0)
00:0a.0 PCI bridge: Advanced Micro Devices [AMD] RS780/RS880 PCI to PCI bridge (PCIE port 5)
00:11.0 SATA controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 SATA Controller [IDE mode]
00:12.0 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:12.1 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0 USB OHCI1 Controller
00:12.2 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:13.0 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:13.1 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0 USB OHCI1 Controller
00:13.2 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:14.0 SMBus: Advanced Micro Devices [AMD] nee ATI SBx00 SMBus Controller (rev 3a)
00:14.1 IDE interface: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 IDE Controller
00:14.2 Audio device: Advanced Micro Devices [AMD] nee ATI SBx00 Azalia (Intel HDA)
00:14.3 ISA bridge: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 LPC host controller
00:14.4 PCI bridge: Advanced Micro Devices [AMD] nee ATI SBx00 PCI to PCI Bridge
00:14.5 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
01:05.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI RS780D [Radeon HD 3300]
01:05.1 Audio device: Advanced Micro Devices [AMD] nee ATI RS780 HDMI Audio [Radeon HD 3000-3300 Series]
02:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Turks [Radeon HD 6670]
02:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Turks/Whistler HDMI Audio [Radeon HD 6000 Series]
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
04:0e.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)


I have three monitors connected: two monitors connected to the onboard HD 3300 and one monitor connected to the pci card HD 6670.  During the bios and grub, the two monitors connected to the onboard card are blank (ctrc off) while the monitor for the 6670 is showing the bios and grub.  Then I see the boot messages for the kernel and then systemd scroll by for about one second before the monitor connected to the 6670 goes black and starts kind of flashing.  Strangely enough, even though I can't see anything, I can log in on the console and run startx.  This starts X and X appears normally on the two monitors connected to the onboard card, just the monitor connected to the 6670 is still flashing and not displaying X.
Comment 47 Florian Mickler 2013-02-23 10:28:06 UTC
A patch referencing this bug report has been merged in Linux v3.8-rc7:

commit ed39fadd6df01095378e499fac3674883f16b853
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Thu Jan 31 09:00:52 2013 -0500

    drm/radeon/evergreen+: wait for the MC to settle after MC blackout
Comment 48 Alexandre Demers 2013-02-24 03:21:51 UTC
(In reply to comment #47)
> A patch referencing this bug report has been merged in Linux v3.8-rc7:
> 
> commit ed39fadd6df01095378e499fac3674883f16b853
> Author: Alex Deucher <alexander.deucher@amd.com>
> Date:   Thu Jan 31 09:00:52 2013 -0500
> 
>     drm/radeon/evergreen+: wait for the MC to settle after MC blackout

Still not fixed with 3.8.0
Comment 49 Alexandre Demers 2013-02-25 01:56:17 UTC
What if the problem is not from this code, but underneath? I'll try to suspend and resume without having Xorg running. Would that help in any way?
Comment 50 Alex Deucher 2013-02-25 13:57:16 UTC
(In reply to comment #49)
> What if the problem is not from this code, but underneath? I'll try to
> suspend and resume without having Xorg running. Would that help in any way?

If the commit you bisected is actually the culprit, this is an issue with the memory controller and the displays on the GPU.  When we reprogram the memory controller we need to stop all the GPU memory clients.  We used to disable the display controllers but this caused additional flicker on boot up and caused hangs on some cards.  So we switched to the current method which was recommended by the hardware team.  This avoids the flicker by just stopping the MC interface in the displays but leaving them enabled and also fixes hangs related to this on a number of chips.  Unfortunately, it seems to cause other problems in certain causes.
Comment 51 Alexandre Demers 2013-03-11 04:41:03 UTC
(In reply to comment #50)
> (In reply to comment #49)
> > What if the problem is not from this code, but underneath? I'll try to
> > suspend and resume without having Xorg running. Would that help in any way?
> 
> If the commit you bisected is actually the culprit, this is an issue with
> the memory controller and the displays on the GPU.  When we reprogram the
> memory controller we need to stop all the GPU memory clients.  We used to
> disable the display controllers but this caused additional flicker on boot
> up and caused hangs on some cards.  So we switched to the current method
> which was recommended by the hardware team.  This avoids the flicker by just
> stopping the MC interface in the displays but leaving them enabled and also
> fixes hangs related to this on a number of chips.  Unfortunately, it seems
> to cause other problems in certain causes.

Since I had just installed kernel 3.9.0-rc2, I tried a suspend and resume cycle. As expected, it end up frozen (I saw some text from the console, Xorg/Gnome tried to display something, couldn't, reset, couldn't, reset and froze). I then rebooted and limited the runlevel to 2 (no Xorg in the way). From the command line, I launched a pm-suspend. Once I woke up my computer, everything was responsive as expected. Does it help in anyway?
Comment 52 Alexandre Demers 2013-03-11 04:51:20 UTC
Also, nice to know: a suspend/resume cycle under Gnome-Shell or KDE will end up hung. However, a suspend/resume cycle under XFCE will work correctly.

In other words, the problem seems not to be with the memory controller, but with something that as to do with how the OpenGL/3D/else state is being restored.
Comment 53 Alex Deucher 2013-03-11 13:38:27 UTC
(In reply to comment #52)
> Also, nice to know: a suspend/resume cycle under Gnome-Shell or KDE will end
> up hung. However, a suspend/resume cycle under XFCE will work correctly.
> 
> In other words, the problem seems not to be with the memory controller, but
> with something that as to do with how the OpenGL/3D/else state is being
> restored.

Both use the 3D engine.  The 3D driver uses VM however, so it's probably a duplicate of bug 60439.

*** This bug has been marked as a duplicate of bug 60439 ***