Bug 68235

Summary: Display freezes after login with kernel 3.11.0-rc5 on Cayman with dpm=1
Product: DRI
Reporter: Alexandre Demers <alexandre.f.demers>
Component: DRM/Radeon
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: medium
CC: frederic.romagne, vmerlet
Version: XOrg git
Hardware: Other
OS: All
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=69721
https://bugs.freedesktop.org/show_bug.cgi?id=69723
Whiteboard:
Attachments:
Description | Flags
errors when freeze happens | none
Cayman 6950 XFX vbios | none
disable various dpm features | none
dpm=1 with partial patch applied on 3.11.0 | none
testing patch | none
testing patch - force mclk to high | none
testing patch - force mclk to high | none
mclk debugging pll debugging output | none
dmesg with 86147 | none
patch 1/2 | none
patch 2/2 | none

Description Alexandre Demers 2013-08-18 01:33:30 UTC
I was testing kernel 3.11.0-rc5 and ended up with my display freezing after login (2 tests: one this morning, one tonight). It always freezes when dpm=1, but not when dpm is disabled.

The result on my screen looks like the screenshot posted in bug 66963 (https://bugs.freedesktop.org/attachment.cgi?id=83470)

So I connected through ssh and got some errors in dmesg just after the display froze. I'll be attaching my errors.log file in a moment. You'll see 

VM is also enabled. I could try without it.

Also, it doesn't freeze with commit 69e0b57, which I've been using and testing regularly for the last week and a half. So I could bisect if we don't have enough info.
Comment 1 Alexandre Demers 2013-08-18 01:35:29 UTC
Created attachment 84186 [details]
errors when freeze happens

Errors logged from my last two attempts at booting and logging in with kernel 3.11.0-rc5 when dpm=1 with RADEON_va=1
Comment 2 Alexandre Demers 2013-08-22 04:42:20 UTC
I began bisecting tonight. rc2 already had this bug. More news to come before the weekend.
Comment 3 Alexandre Demers 2013-08-23 02:33:58 UTC
Kernel 3.11.0-rc1 was experiencing a bug, but not the one seen in rc2 and beyond. I'll dig into the "fix" that brought us to the state seen since rc2. If nothing can be found, I'll go up the drm-next branch that was included in rc1.
Comment 4 Alexandre Demers 2013-08-25 03:50:07 UTC
After bisecting in one direction, I've ended up with the following commit:
f90555cbe629e14c6af1dcec1933a3833ecd321f is the first bad commit
commit f90555cbe629e14c6af1dcec1933a3833ecd321f
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Wed Jul 17 16:34:12 2013 -0400

    drm/radeon/dpm/atom: fix broken gcc harder
    
    See bugs:
    https://bugs.freedesktop.org/show_bug.cgi?id=66932
    https://bugs.freedesktop.org/show_bug.cgi?id=66972
    https://bugs.freedesktop.org/show_bug.cgi?id=66945
    
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

:040000 040000 c32ad9a80c5356236e935eeb5198683727b9d00d eb5aa1083eb33e7b9aebebdb310dda0399152e87 M	drivers

Now, I must say this commit actually fixes a visual problem after commit 69e0b57 (which is a good commit over here without any known problem). So, I'll dig in the other direction to find which commit broke the known good state.
Comment 5 Alex Deucher 2013-08-26 15:13:10 UTC
You might try this branch in case gcc is having problems with the variable sized arrays used in the driver:
http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.12-wip-gcc-fixes
Comment 6 Alexandre Demers 2013-08-26 15:53:47 UTC
(In reply to comment #5)
> You might try this branch in case gcc is having problems with the variable
> sized arrays used in the driver:
> http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.12-wip-gcc-fixes

Ok, I'll try it tonight.

About the bisection I'm doing in the other direction (to find what broke the display), I should also be able to narrow it down tonight.
Comment 7 Alexandre Demers 2013-08-27 05:10:24 UTC
Hi Alex. I'm about to test your suggestion. Meanwhile, I identified the original commit that broke the driver, before it was fixed by f90555cbe629e14c6af1dcec1933a3833ecd321f (after which the display still ends up hanging, even though I can connect through ssh).

So the first bad commit was:
7ad8d0687bb5030c3328bc7229a3183ce179ab25 is the first bad commit
commit 7ad8d0687bb5030c3328bc7229a3183ce179ab25
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Mon Jul 1 16:07:18 2013 -0400

    drm/radeon/dpm: re-enable state transitions for Cayman
    
    Was disabled due to stability issues on certain boards
    caused by the a bug in the parsing of the atom mc reg tables.
    That's fixed now so re-enable.
    
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

:040000 040000 de8dfc2a15d5114e81636811d7e3b39c15fc515b d0e1ee828f10456d39e2ab30cc6598203e50fa6e M	drivers

Heading for your suggestion right away with http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.12-wip-gcc-fixes.
Comment 8 Alexandre Demers 2013-08-27 06:32:58 UTC
Tested with http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.12-wip-gcc-fixes and it does exactly the same thing: it boots fine and shows the login screen. I can even log in if it doesn't hang right away. Then it hangs at some point (either at the login screen or after loading the desktop). It generally displays grey vertical bars.
Comment 9 Alexandre Demers 2013-09-03 03:57:58 UTC
Still the same with kernel 3.11.0. I tried VM=0 and aspm=0, disconnected my UPS (just in case it was something with a "battery" state or similar), and tried both Gnome 3 and XFCE: all the same. The only things that work for now are setting dpm=0 or forcing ret=1 in ni_dpm_set_power_state() where it checks the return value of ni_restrict_performance_levels_before_switch().

However, I don't know if the problem is with ni_dpm_set_power_state or with something executed after, so I'll play in there.
Comment 10 Alexandre Demers 2013-09-03 04:47:36 UTC
If ret=1 just after ni_restrict_performance_levels_before_switch(), ni_dpm_set_power_state() doesn't go any further and there is no hang. So it seems the problem is not with ni_restrict_performance_levels_before_switch() itself but with some combination of things.
Comment 11 Alexandre Demers 2013-09-03 23:42:15 UTC
So, after getting out at different points from ni_dpm_set_power_state(), it seems I can go down to ni_power_control_set_level() without problem. However, if I move to the next call which is ret = ni_dpm_force_performance_level(rdev, RADEON_DPM_FORCED_LEVEL_AUTO), it hangs.

Could it be because we are setting something wrong in auto performance level? I'll be attaching my vbios just in case.
Comment 12 Alexandre Demers 2013-09-03 23:43:02 UTC
Created attachment 85157 [details]
Cayman 6950 XFX vbios
Comment 13 Alexandre Demers 2013-09-07 00:12:03 UTC
Is there anything else I can do to give a better idea of what is happening and why it crashes?

If this can be of any value, my 6950 is of the following model: XFX HD-695X-ZNDC (1GB DDR5, 830MHz Core Clock and 5200MHz Memory Clock)
Comment 14 Alex Deucher 2013-09-10 17:31:00 UTC
Created attachment 85578 [details] [review]
disable various dpm features

I would suggest disabling various dpm features and see if you can narrow down which, if any, help.  This patch disables just about everything.

ni_dpm_force_performance_level(rdev, RADEON_DPM_FORCED_LEVEL_AUTO) is what actually sets the dynamic performance switching into motion.  Prior to that the hw is locked into the low performance level.  It sounds like there is some bad parameter that is causing a lock up when the smc enables state switching.

Separate from the patch can you also try changing the ni_dpm_force_performance_level(rdev, RADEON_DPM_FORCED_LEVEL_AUTO) call in ni_dpm_set_power_state() to low (RADEON_DPM_FORCED_LEVEL_LOW) or high (RADEON_DPM_FORCED_LEVEL_HIGH) rather than auto?  See if you still get a lock up.
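
As a rough illustration of the three forced levels discussed above, here is a toy model in plain C. It is not the driver's code: the enum, struct level_range and allowed_levels() are hypothetical names, and the model only assumes what this comment describes, namely that low and high pin the hardware to a single performance level while auto lets it switch between all of them.

```c
#include <stddef.h>

/* Hypothetical stand-ins for RADEON_DPM_FORCED_LEVEL_{AUTO,LOW,HIGH}. */
enum forced_level {
	FORCED_LEVEL_AUTO,
	FORCED_LEVEL_LOW,
	FORCED_LEVEL_HIGH
};

/* Inclusive range of performance-level indices the hardware may pick from. */
struct level_range {
	size_t min;
	size_t max;
};

/*
 * Toy model (an assumption, not driver logic): index 0 is the lowest
 * performance level and num_levels - 1 is the highest.
 */
struct level_range allowed_levels(enum forced_level level, size_t num_levels)
{
	struct level_range r = { 0, 0 };

	if (num_levels == 0)
		return r;

	switch (level) {
	case FORCED_LEVEL_LOW:
		/* Locked to the lowest level. */
		r.min = r.max = 0;
		break;
	case FORCED_LEVEL_HIGH:
		/* Locked to the highest level. */
		r.min = r.max = num_levels - 1;
		break;
	case FORCED_LEVEL_AUTO:
		/* Free to switch between all levels. */
		r.min = 0;
		r.max = num_levels - 1;
		break;
	}
	return r;
}
```

In this model, a hang that appears with auto and high but not with low would point at something that only the upper levels touch.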
Comment 15 Alex Deucher 2013-09-10 18:15:26 UTC
Another thing worth checking, what is the value of module_index passed to radeon_atom_init_mc_reg_table() in ni_initialize_mc_reg_table() in ni_dpm.c on your system?
Comment 16 Alexandre Demers 2013-09-10 19:55:00 UTC
(In reply to comment #14)
> Created attachment 85578 [details] [review]
> disable various dpm features
> 
> I would suggest disabling various dpm features and see if you can narrow
> down which, if any, help.  This patch disables just about everything.
> 
> ni_dpm_force_performance_level(rdev, RADEON_DPM_FORCED_LEVEL_AUTO) is what
> actually sets the dynamic performance switching into motion.  Prior to that
> the hw is locked into the low performance level.  It sounds like there is
> some bad parameter that is causing a lock up when the smc enables state
> switching.
> 
> Separate from the patch can you also try changing the
> ni_dpm_force_performance_level(rdev, RADEON_DPM_FORCED_LEVEL_AUTO) call in
> ni_dpm_set_power_state() to low (RADEON_DPM_FORCED_LEVEL_LOW) or high
> (RADEON_DPM_FORCED_LEVEL_HIGH) rather than auto?  See if you still get a
> lock up.

I'll try it later today.
Comment 17 Alexandre Demers 2013-09-10 19:55:37 UTC
(In reply to comment #15)
> Another thing worth checking, what is the value of module_index passed to
> radeon_atom_init_mc_reg_table() in ni_initialize_mc_reg_table() in ni_dpm.c
> on your system?

How can I get it? Should I print it in dmesg?
Comment 18 Alex Deucher 2013-09-10 20:40:55 UTC
(In reply to comment #17)
> (In reply to comment #15)
> > Another thing worth checking, what is the value of module_index passed to
> > radeon_atom_init_mc_reg_table() in ni_initialize_mc_reg_table() in ni_dpm.c
> > on your system?
> 
> How can I get it? Should I print it in dmesg?

yes, that would be great.
Comment 19 Alexandre Demers 2013-09-11 01:46:35 UTC
(In reply to comment #16)
> (In reply to comment #14)
> > Created attachment 85578 [details] [review]
> > disable various dpm features
> > 
> > I would suggest disabling various dpm features and see if you can narrow
> > down which, if any, help.  This patch disables just about everything.
> > 
> > ni_dpm_force_performance_level(rdev, RADEON_DPM_FORCED_LEVEL_AUTO) is what
> > actually sets the dynamic performance switching into motion.  Prior to that
> > the hw is locked into the low performance level.  It sounds like there is
> > some bad parameter that is causing a lock up when the smc enables state
> > switching.
> > 
> > Separate from the patch can you also try changing the
> > ni_dpm_force_performance_level(rdev, RADEON_DPM_FORCED_LEVEL_AUTO) call in
> > ni_dpm_set_power_state() to low (RADEON_DPM_FORCED_LEVEL_LOW) or high
> > (RADEON_DPM_FORCED_LEVEL_HIGH) rather than auto?  See if you still get a
> > lock up.
> 
> I'll try it later today.

I had time for now to play with forcing RADEON_DPM_FORCED_LEVEL_LOW and RADEON_DPM_FORCED_LEVEL_HIGH. The first one works fine, the second triggers the problem.

I'm about to play with the suggested patch.
Comment 20 Alexandre Demers 2013-09-12 04:00:25 UTC
Ok, if I apply the whole suggested patch except the following hunk, it hangs:
@@ -4152,14 +4152,14 @@ int ni_dpm_init(struct radeon_device *rdev)
 	}
 	ni_pi->mclk_rtt_mode_threshold = eg_pi->mclk_edc_wr_enable_threshold;
 
-	pi->voltage_control =
-		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_VDDC, 0);
+	pi->voltage_control = false;
+//		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_VDDC, 0);
 
-	pi->mvdd_control =
-		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_MVDDC, 0);
+	pi->mvdd_control = false;
+//		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_MVDDC, 0);
 
-	eg_pi->vddci_control =
-		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_VDDCI, 0);
+	eg_pi->vddci_control = false;
+//		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_VDDCI, 0);
 
 	rv770_get_engine_memory_ss(rdev);
 
I'll try to play with that a bit and I'll come back. I also still have to give you the module_index.
Comment 21 Alexandre Demers 2013-09-12 05:36:24 UTC
Adding printk(KERN_DEBUG "DEBUG: about to pass the following value of module_index to radeon_atom_init_mc_reg_table(): %d", module_index); just before the call to radeon_atom_init_mc_reg_table() prints 2 in dmesg.
Comment 22 Alex Deucher 2013-09-12 13:26:43 UTC
(In reply to comment #20)
> Ok, if I apply the whole suggested patch but the following, it hangs:
> @@ -4152,14 +4152,14 @@ int ni_dpm_init(struct radeon_device *rdev)
>  	}
>  	ni_pi->mclk_rtt_mode_threshold = eg_pi->mclk_edc_wr_enable_threshold;
>  
> -	pi->voltage_control =
> -		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_VDDC, 0);
> +	pi->voltage_control = false;
> +//		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_VDDC, 0);
>  
> -	pi->mvdd_control =
> -		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_MVDDC, 0);
> +	pi->mvdd_control = false;
> +//		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_MVDDC, 0);
>  
> -	eg_pi->vddci_control =
> -		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_VDDCI, 0);
> +	eg_pi->vddci_control = false;
> +//		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_VDDCI, 0);
>  
>  	rv770_get_engine_memory_ss(rdev);

So does just applying this portion of the patch by itself fix the hang?
Comment 23 Alex Deucher 2013-09-12 13:31:33 UTC
(In reply to comment #21)
> Adding printk(KERN_DEBUG "DEBUG: about to pass the following value of
> module_index to radeon_atom_init_mc_reg_table(): %d", module_index); just
> before calling radeon_atom_init_mc_reg_table() returns 2.

Ok, that looks good.
Comment 24 Alexandre Demers 2013-09-12 14:53:31 UTC
(In reply to comment #22)
> (In reply to comment #20)
> > Ok, if I apply the whole suggested patch but the following, it hangs:
> > @@ -4152,14 +4152,14 @@ int ni_dpm_init(struct radeon_device *rdev)
> >  	}
> >  	ni_pi->mclk_rtt_mode_threshold = eg_pi->mclk_edc_wr_enable_threshold;
> >  
> > -	pi->voltage_control =
> > -		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_VDDC, 0);
> > +	pi->voltage_control = false;
> > +//		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_VDDC, 0);
> >  
> > -	pi->mvdd_control =
> > -		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_MVDDC, 0);
> > +	pi->mvdd_control = false;
> > +//		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_MVDDC, 0);
> >  
> > -	eg_pi->vddci_control =
> > -		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_VDDCI, 0);
> > +	eg_pi->vddci_control = false;
> > +//		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_VDDCI, 0);
> >  
> >  	rv770_get_engine_memory_ss(rdev);
> 
> So does just applying this portion of the patch by itself fix the hang?

Applying just this returns an error when booting: ni_upload_sw_state failed, but obviously the system doesn't hang after that (though it can't change its performance state)
Comment 25 Alexandre Demers 2013-09-13 06:04:04 UTC
(In reply to comment #22)
> (In reply to comment #20)
> > Ok, if I apply the whole suggested patch but the following, it hangs:
> > @@ -4152,14 +4152,14 @@ int ni_dpm_init(struct radeon_device *rdev)
> >  	}
> >  	ni_pi->mclk_rtt_mode_threshold = eg_pi->mclk_edc_wr_enable_threshold;
> >  
> > -	pi->voltage_control =
> > -		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_VDDC, 0);
> > +	pi->voltage_control = false;
> > +//		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_VDDC, 0);
> >  
> > -	pi->mvdd_control =
> > -		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_MVDDC, 0);
> > +	pi->mvdd_control = false;
> > +//		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_MVDDC, 0);
> >  
> > -	eg_pi->vddci_control =
> > -		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_VDDCI, 0);
> > +	eg_pi->vddci_control = false;
> > +//		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_VDDCI, 0);
> >  
> >  	rv770_get_engine_memory_ss(rdev);
> 
> So does just applying this portion of the patch by itself fix the hang?

The only way I don't get "ni_upload_sw_state failed" is by leaving pi->voltage_control = radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_VDDC, 0); untouched. However, I inevitably end up with a hang either at login or while my session is loading (although switching to a terminal before it hangs prevents any hang for as long as I stay in the terminal).

If I patch that part of the code, I always get the "ni_upload_sw_state failed" error, so it doesn't hang but dpm doesn't work at all.

I can patch everything else or nothing at all (I tried different combinations) and they don't seem to change a thing about the hang.
Comment 26 Alex Deucher 2013-09-13 19:13:25 UTC
Can you attach your dmesg with dpm enabled?
Comment 27 Alexandre Demers 2013-09-13 22:25:12 UTC
(In reply to comment #26)
> Can you attach your dmesg with dpm enabled?

Do you mean with the patch applied (total and/or problematic part left alone)?
Comment 28 Alex Deucher 2013-09-13 22:32:17 UTC
(In reply to comment #27)
> (In reply to comment #26)
> > Can you attach your dmesg with dpm enabled?
> 
> Do you mean with the patch applied (total and/or problematic part left
> alone)?

Doesn't matter.  I just want to see the basic driver output and power state list.
Comment 29 Alexandre Demers 2013-09-14 01:19:00 UTC
Created attachment 85798 [details]
dpm=1 with partial patch applied on 3.11.0

dmesg output when dpm=1 with partial patch applied (deactivation of pretty much everything but one to pass ni_upload_sw_state)
Comment 30 Alexandre Demers 2013-09-17 14:19:14 UTC
If there were any fixes pushed in kernel 3.12-rc1, none changed anything.
Comment 31 Alex Deucher 2013-09-17 14:34:25 UTC
Created attachment 85989 [details] [review]
testing patch

Try this patch independent from any other patches.  It forces the engine and memory clocks of all performance levels within a power state to the lowest level.  If it works, then try and comment out either the sclk part or the mclk part and see if either helps.  That should help us narrow down whether it's a mclk problem or an sclk problem.
Comment 32 Alexandre Demers 2013-09-17 22:54:25 UTC
(In reply to comment #31)
> Created attachment 85989 [details] [review]
> testing patch
> 
> Try this patch independent from any other patches.  It forces the engine and
> memory clocks of all performance levels within a power state to the lowest
> level.  If it works, then try and comment out either the sclk part or the
> mclk part and see if either helps.  That should help us narrow down whether
> it's a mclk problem or an sclk problem.

Running with the patch works fine on top of a vanilla kernel 3.12-rc1. The following also works fine:
//	if (pl->sclk > 25000)
//		pl->sclk = 25000;
	if (pl->mclk > 15000)
		pl->mclk = 15000;
Which means sclk is working properly.

However, the opposite results in a blank screen before I can even get to the login screen. It seems mclk is the problematic part.
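
The experiment above can be sketched as a small standalone C function. This is only an illustration: struct perf_level and clamp_perf_level() are hypothetical names, the caps 25000 and 15000 are the ones from the snippet in this comment, and the clocks are assumed to be in 10 kHz units.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down performance level (clocks assumed in 10 kHz units). */
struct perf_level {
	uint32_t sclk; /* engine clock */
	uint32_t mclk; /* memory clock */
};

/* Caps taken from the snippet quoted in this comment. */
#define SCLK_CAP 25000u
#define MCLK_CAP 15000u

/*
 * Clamp the engine clock, the memory clock, or both, so that each half
 * of the testing patch can be switched off independently.
 */
void clamp_perf_level(struct perf_level *pl, int clamp_sclk, int clamp_mclk)
{
	if (clamp_sclk && pl->sclk > SCLK_CAP)
		pl->sclk = SCLK_CAP;
	if (clamp_mclk && pl->mclk > MCLK_CAP)
		pl->mclk = MCLK_CAP;
}
```

Calling it with clamp_sclk = 0 and clamp_mclk = 1 corresponds to the variant that works; the reverse corresponds to the one that gives a blank screen.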
Comment 33 Alex Deucher 2013-09-18 21:44:15 UTC
Created attachment 86111 [details] [review]
testing patch - force mclk to high

Try this patch by itself.  This patch will force the mclk to the highest for all performance levels. If it works, the issue is probably related to the changing of mclks, if not, then we are probably programming one of the mclk parameters wrong.
Comment 34 Alex Deucher 2013-09-18 21:46:46 UTC
Created attachment 86112 [details] [review]
testing patch - force mclk to high

Sorry, had some garbage in my tree.  Use this one instead.
Comment 35 Alexandre Demers 2013-09-19 00:37:48 UTC
(In reply to comment #34)
> Created attachment 86112 [details] [review]
> testing patch - force mclk to high
> 
> Sorry, had some garbage in my tree.  use this one instead.

Tested and the screen ended up blank or frozen somewhere near when Xorg and gdm are being launched (tried twice). Before that, the console was being displayed OK.
Comment 36 Alexandre Demers 2013-09-19 05:32:35 UTC
A test of my own:
diff --git a/drivers/gpu/drm/radeon/ni_dpm.c b/drivers/gpu/drm/radeon/ni_dpm.c
index f7b625c..c1875d2 100644
--- a/drivers/gpu/drm/radeon/ni_dpm.c
+++ b/drivers/gpu/drm/radeon/ni_dpm.c
@@ -3952,10 +3952,14 @@ static void ni_parse_pplib_clock_info(struct radeon_device *rdev,
        pl->mclk = le16_to_cpu(clock_info->evergreen.usMemoryClockLow);
        pl->mclk |= clock_info->evergreen.ucMemoryClockHigh << 16;
 
+       pl->mclk = 100000;
+
        pl->vddc = le16_to_cpu(clock_info->evergreen.usVDDC);
        pl->vddci = le16_to_cpu(clock_info->evergreen.usVDDCI);
        pl->flags = le32_to_cpu(clock_info->evergreen.ulFlags);
 
+       pl->vddci = 1150;
+
        /* patch up vddc if necessary */
        if (pl->vddc == 0xff01) {
                if (radeon_atom_get_max_vddc(rdev, 0, 0, &vddc) == 0)

This works. I haven't pushed higher yet.
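
For readers following the context lines of the diff above: the memory clock is assembled from a 16-bit low part and an 8-bit high part. A minimal sketch of that step in plain C, assuming both parts are already in host byte order (the driver uses le16_to_cpu() for the low word); combine_clock() is a hypothetical helper name.

```c
#include <stdint.h>

/*
 * Combine a 16-bit low word and an 8-bit high byte into one clock value,
 * mirroring "pl->mclk = low; pl->mclk |= high << 16;" from the diff.
 */
uint32_t combine_clock(uint16_t low, uint8_t high)
{
	return (uint32_t)low | ((uint32_t)high << 16);
}
```

The high byte is needed because a value such as 100000 does not fit in 16 bits: combine_clock(0x86A0, 0x01) gives 100000.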
Comment 37 Alexandre Demers 2013-09-19 05:44:17 UTC
Went to pl->mclk = 115000, runs fine.
Comment 38 Alexandre Demers 2013-09-19 06:15:08 UTC
Running with mclk at 120000.

I booted into Windows and launched GPU-Z. We should be able to reach 1300MHz.

I've read that some Cayman cards were made to use a VDDCI between 1.15 and 1.16. I'm pretty sure I can reach stability at 130000 by raising VDDCI a bit.
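
The numbers in this comment line up if the clock values are in units of 10 kHz, so that 130000 corresponds to 1300MHz. A one-line conversion helper makes that explicit; clock_10khz_to_mhz() is a hypothetical name used only for this sketch.

```c
#include <stdint.h>

/* Convert a clock expressed in 10 kHz units to MHz (integer division). */
uint32_t clock_10khz_to_mhz(uint32_t clock_10khz)
{
	return clock_10khz / 100;
}
```

With this, the values tried so far (100000, 115000, 120000, 125000) correspond to 1000, 1150, 1200 and 1250 MHz.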
Comment 39 Alexandre Demers 2013-09-19 06:30:40 UTC
Running with mclk at 125000
Comment 40 Alexandre Demers 2013-09-19 13:38:05 UTC
Should I continue to see what value I can reach?
Comment 41 Alex Deucher 2013-09-19 14:46:13 UTC
Created attachment 86147 [details] [review]
mclk debugging pll debugging output

Can you attach the dmesg output with this patch applied?  I want to make sure the mclk parameters are being properly calculated for the 130000 mclk.
Comment 42 Alexandre Demers 2013-09-19 15:34:13 UTC
(In reply to comment #41)
> Created attachment 86147 [details] [review]
> mclk debugging pll debugging output
> 
> Can you attach the dmesg output with this patch applied?  I want to make
> sure the mclk parameters are being properly calculated for the 130000 mclk.

I'll try it at home later today.
Comment 43 Alexandre Demers 2013-09-19 21:49:04 UTC
Created attachment 86168 [details]
dmesg with 86147
Comment 44 Alex Deucher 2013-09-21 19:45:06 UTC
Created attachment 86296 [details] [review]
patch 1/2

This patch set works around the issue by limiting the sclk and mclk to the highest levels listed in the clk/voltage dependency tables.  I'll need to dig a bit more internally to try and figure out how to handle these clks properly.
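
The idea of the workaround can be sketched in standalone C as follows. This is not the content of the attached patches: struct clk_voltage_entry, max_clk_in_table() and limit_clk() are hypothetical names, and leaving the clock unchanged when the table is empty is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry of a clock/voltage dependency table. */
struct clk_voltage_entry {
	uint32_t clk; /* clock, assumed in 10 kHz units */
	uint16_t v;   /* voltage needed for that clock */
};

/* Return the highest clock listed in the table, or 0 if it is empty. */
uint32_t max_clk_in_table(const struct clk_voltage_entry *entries, size_t count)
{
	uint32_t max = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		if (entries[i].clk > max)
			max = entries[i].clk;
	}
	return max;
}

/* Limit a requested clock to the highest clock the table knows about. */
uint32_t limit_clk(uint32_t requested,
		   const struct clk_voltage_entry *entries, size_t count)
{
	uint32_t max = max_clk_in_table(entries, count);

	/* Assumption: with an empty table there is nothing to limit against. */
	if (max == 0)
		return requested;
	return requested > max ? max : requested;
}
```

For example, with a table whose highest entry is 125000, a requested mclk of 130000 would be limited to 125000, while 100000 would pass through unchanged.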
Comment 45 Alex Deucher 2013-09-21 19:45:58 UTC
Created attachment 86297 [details] [review]
patch 2/2

apply these two patches independent of any others.
Comment 46 Alexandre Demers 2013-09-22 16:51:48 UTC
It seems to allow the system to work properly. No crash with the patches on 3.11.0 (but another problem with 3.12-rc1, probably a new bug). I added a printk to show what the max values are. Here is what I get:
[    3.088984] : Hitting max values... max_sclk_vddc->80000, max_mclk_vddci->125000, max_mclk_vddc->125000

So, as it is, I'm unable to run at top speed (mem) if I understand correctly, right?
Comment 47 Alex Deucher 2013-09-22 16:55:44 UTC
(In reply to comment #46)
> It seems to allow the system to work properly. No crash with patches on
> 3.11.0 (but another problem with 3.12-rc1, probably a new bug). I added a
> printk to show what are the max values. Here is what I get:
> [    3.088984] : Hitting max values... max_sclk_vddc->80000,
> max_mclk_vddci->125000, max_mclk_vddc->125000
> 
> So, as it is, I'm unable to run at top speed (mem) if I understand
> correctly, right?

Right, it will limit you to the fastest clock in the voltage dependency tables until I sort out how I'm supposed to properly handle faster clocks.
Comment 48 Alexandre Demers 2013-09-23 02:50:31 UTC
OK, then with the two last patches on top of kernel 3.11.0, it works fine and I'm closing this bug. Should I open a new "bug" for the part about the faster clock and vddci?
Comment 49 Alexandre Demers 2013-09-23 03:04:08 UTC
Also, the bug I saw when testing the patches with kernel 3.12-rc1 just happened with 3.11.0. The screen turns white and everything is frozen. I can't connect through ssh (without the patches, when the screen hung, I was able to connect through ssh).

I can't find anything in the logs that could help identify what is going on. I wasn't doing anything special, and I can start a game under Steam, where the GPU's fan will speed up (which is a sign the card is now running faster), without any problem. The computer can just sit idle and then freeze (with a white screen).

I'm tempted to open a different bug. What do you think, Alex?
Comment 50 Alex Deucher 2013-09-23 14:07:27 UTC
Go ahead and open new bugs for those issues.
