Bug 66963 - Rv6xx dpm problems
Summary: Rv6xx dpm problems
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: high normal
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Duplicates: 67085 70189 72905 74420 89196 89262 89294 92662
Depends on:
Blocks:
 
Reported: 2013-07-16 14:11 UTC by Eugene
Modified: 2019-11-19 08:36 UTC
CC List: 32 users

See Also:
i915 platform:
i915 features:


Attachments
log after "successful" load without hanging (62.42 KB, text/plain)
2013-07-27 15:45 UTC, Sergey
successfully booting (6.72 KB, text/plain)
2013-07-27 17:50 UTC, Daniel
hacks to test (1.79 KB, patch)
2013-07-30 00:08 UTC, Alex Deucher
disable lvtma resync (549 bytes, patch)
2013-07-30 03:41 UTC, Alex Deucher
use alternate fb_div scale (514 bytes, patch)
2013-07-30 03:42 UTC, Alex Deucher
dmesg for patches from comment 23 and 24 (20.62 KB, application/octet-stream)
2013-07-30 19:04 UTC, Sergey
disable voltage control (530 bytes, patch)
2013-07-30 19:34 UTC, Alex Deucher
disable clockgating (573 bytes, patch)
2013-07-30 19:35 UTC, Alex Deucher
disable dynamic spread spectrum (631 bytes, patch)
2013-07-30 19:35 UTC, Alex Deucher
disable dynamic pcie gen2 (459 bytes, patch)
2013-07-30 19:36 UTC, Alex Deucher
possible frosting fix (1.55 KB, patch)
2013-07-31 20:08 UTC, Alex Deucher
possible fix (8.79 KB, patch)
2013-07-31 22:49 UTC, Alex Deucher
vbios used for ATI RV620/M82 [Mobility Radeon HD 3450/3470] (62.50 KB, application/octet-stream)
2013-08-01 12:46 UTC, Sergey
vbios for RV635/M86 [Mobility Radeon HD 3650] (64.00 KB, application/octet-stream)
2013-08-01 13:06 UTC, Scias
[AMD/ATI] RV635/M86 [Mobility Radeon HD 3650] vbios (63.00 KB, application/octet-stream)
2013-08-01 14:49 UTC, Francisco Pina Martins
Advanced Micro Devices [AMD] nee ATI RV635 [Mobility Radeon HD 3650] (63.00 KB, application/octet-stream)
2013-08-01 15:11 UTC, Hrvoje Senjan
vbios for RV620/M82 [Mobility Radeon HD 3450/3470] (63.50 KB, application/octet-stream)
2013-08-01 17:12 UTC, Daniel
Corrupt screen 'frosting' after DPM enabled #1 (143.58 KB, image/jpeg)
2013-08-01 18:13 UTC, Shawn Starr
Corrupt screen 'frosting' after DPM enabled #2 both screens (78.50 KB, image/jpeg)
2013-08-01 18:14 UTC, Shawn Starr
dmesg snippet with latest patches/successful dpm enabled boot (7.03 KB, text/plain)
2013-08-01 19:27 UTC, Hrvoje Senjan
successfully booting and waking up from suspend to ram dmesg with dpm=1, RV620/M82 [Mobility Radeon HD 3450/3470] (84.94 KB, text/plain)
2013-08-01 20:25 UTC, Daniel
wake up from suspend with "battery" radeon_pm_state (1.29 MB, image/jpeg)
2013-08-01 22:52 UTC, Daniel
could see and move the cursor, also waking up from suspend with "battery" state (1.34 MB, image/jpeg)
2013-08-01 22:53 UTC, Daniel
journalctl relevant output (4.50 KB, text/plain)
2013-08-13 08:28 UTC, Francisco Pina Martins
Radeon HD2600XT vbios (59.00 KB, application/octet-stream)
2013-09-02 16:31 UTC, Eugene
3870/RV670 - dmesg manually loading radeon (65.93 KB, text/plain)
2013-09-05 03:23 UTC, Bryan Quigley
3870/RV670 - kern.log dpm on boot (104.87 KB, text/plain)
2013-09-05 03:23 UTC, Bryan Quigley
3870/RV670 - vbios.rom (64.00 KB, text/plain)
2013-09-05 03:24 UTC, Bryan Quigley
dmesg file (64.82 KB, text/plain)
2013-09-05 13:23 UTC, Eugene
add callback for UVD (2.71 KB, patch)
2013-09-05 14:00 UTC, Alex Deucher
Xorg log (40.80 KB, text/plain)
2013-09-11 16:04 UTC, Sergey
Dmesg for Xorg freeze during video playing. (68.23 KB, text/plain)
2013-09-16 19:51 UTC, Sergey
dmesg when under works due to setting .debug=1 (59.28 KB, text/plain)
2013-09-30 04:20 UTC, Bryan Quigley
boot good 3.12 rv620 (64.10 KB, text/plain)
2013-11-06 13:37 UTC, Paul Bodenbenner
boot slow 3.12 rv620 (63.95 KB, text/plain)
2013-11-06 13:38 UTC, Paul Bodenbenner
Disable the DMA ring in R6xx (3.39 KB, patch)
2014-01-10 18:38 UTC, Jaime Velasco Juan
dmesg with dpm-reorder patches, DMA ring test failed (88.40 KB, text/plain)
2014-01-10 18:40 UTC, Jaime Velasco Juan
dmesg with dpm-reorder patches plus DMA ring deactivation patch (working so far) (78.73 KB, text/plain)
2014-01-10 18:41 UTC, Jaime Velasco Juan
patch from comment 94 (1.07 KB, patch)
2014-01-13 14:05 UTC, Alex Deucher
journalctl crash log (2.96 KB, text/plain)
2014-01-18 14:54 UTC, Francisco Pina Martins
(Disable the DMA ring in R6xx) Adapted to current git state (3.56 KB, text/plain)
2014-01-29 18:48 UTC, Paul Bodenbenner
Xorg.0.log for failed start of X server (19.15 KB, text/plain)
2014-03-29 18:51 UTC, Nicola Mori
workaround for basic enablement (2.31 KB, patch)
2014-09-10 20:13 UTC, Alex Deucher
fixes GPU freeze by reverting 02376d8282b88f07d0716da6155094c8760b1a13 on 4.6.3, tested with r9 290 (33.41 KB, patch)
2016-09-01 20:52 UTC, Zetok
dmesg without reverted 02376d8282b88f07d0716da6155094c8760b1a13 on 4.6.3 (4.70 MB, text/plain)
2016-09-01 20:57 UTC, Zetok
dmesg with reverted 02376d8282b88f07d0716da6155094c8760b1a13 on 4.6.3 (87.90 KB, text/plain)
2016-09-01 21:01 UTC, Zetok

Description Eugene 2013-07-16 14:11:41 UTC
Linux 3.11-rc1 doesn't boot with the radeon.dpm=1 option in GRUB: the screen goes blank after GRUB tries to boot it. Without radeon.dpm=1 the system boots fine. So it seems DPM isn't working.

Graphics: Radeon HD2600 XT
Linux: 3.11RC1
OS: Kubuntu 13.04
KDE: 4.11 beta2

Please ask me for any additional info needed to help fix this issue.
Comment 1 Sergey 2013-07-18 07:28:14 UTC
I see the same behavior on:

Graphics: RV620/M82 Mobility Radeon HD 3450/3470
Kernel: 3.11-rc1
Mesa: 9.1.2-rc1, 9.2_pre20130619
OS: Gentoo
ARCH: x86_64
Other: fluxbox-1.3.2, xdm-1.1.11-r1, slim-1.3.5-r2

The system works fine without 'radeon.dpm=1', but with this option it completely dies. It is not only the graphics that fail: the system does not respond to ping and doesn't continue booting. This happens after the system starts booting and before it switches to the radeon driver and higher resolutions.
Comment 2 Alex Deucher 2013-07-19 12:55:16 UTC
*** Bug 67085 has been marked as a duplicate of this bug. ***
Comment 3 Sergey 2013-07-20 07:37:21 UTC
Tried the patches from bug 66932 (attachment 82563, attachment 82560), but they didn't help.
Comment 4 Alex Deucher 2013-07-20 16:29:21 UTC
Those patches aren't relevant to r6xx ASICs. To help debug it, you might try disabling options in rv6xx_dpm_init() in rv6xx_dpm.c and see if any of those help.
Comment 5 Sergey 2013-07-21 10:44:48 UTC
Tried playing with rv6xx_dpm_init(), but no luck at all.
Toggling the init conditions leads to the same system hang.

Though if I put 'return -1' there, the system seems to boot (the screen is still off) and I can connect via ssh, but dmesg shows a NULL pointer dereference in rv6xx_dpm_enable(). That is probably expected if init fails.
Comment 6 Scias 2013-07-22 17:18:40 UTC
Same situation here on an RV635/HD3650 with an rc2 kernel. The new radeon.aspm=0 parameter doesn't help either.
Comment 7 Sergey 2013-07-24 23:07:11 UTC
Some findings:

The actual hang happens in
drivers/gpu/drm/radeon/rv6xx_dpm.c:
int rv6xx_dpm_enable(struct radeon_device *rdev)
on 'r600_start_dpm(rdev)' call.
It makes sense that the configuration steps are harmless before DPM is actually enabled.

From this function it looks like we disable sclk and mclk, do the PLL configuration, then re-enable the clocks. But even if I leave only:
void r600_start_dpm(struct radeon_device *rdev)
{
        r600_enable_sclk_control(rdev, false);
        r600_enable_mclk_control(rdev, false);

        r600_enable_sclk_control(rdev, true);
        r600_enable_mclk_control(rdev, true);
}
the system still fails with a black screen.

There is also a piece of code in r600_start_dpm() that is duplicated:
         r600_enable_spll_bypass(rdev, true);
         r600_wait_for_spll_change(rdev);
         r600_enable_spll_bypass(rdev, false);
         r600_wait_for_spll_change(rdev);

         r600_enable_spll_bypass(rdev, true);
         r600_wait_for_spll_change(rdev);
         r600_enable_spll_bypass(rdev, false);
         r600_wait_for_spll_change(rdev);
(probably this is intentional, just checking)

Sorry if this doesn't make any sense; I'm new to the radeon driver and don't know how DPM is supposed to work on this hardware.
Comment 8 John 2013-07-25 16:14:25 UTC
Same issue with 3.11-rc2 and radeon.dpm=1 on a Lenovo T500 with an AMD 3650, though my screen goes black and then gradually turns white when it is enabled. Without the option the PC boots normally.

Graphics: Switchable (disabled) between AMD 3650 and Intel 4500
Linux: 3.11 rc2
OS: Kubuntu 13.04
KDE: 4.11 beta 2
xorg: xorg-edgers ppa enabled
Comment 9 Alex Deucher 2013-07-26 00:20:12 UTC
I pushed a few bug fixes here:
http://cgit.freedesktop.org/~agd5f/linux/log/?h=rv6xx-dpm-fixes
Comment 10 Sergey 2013-07-26 06:05:48 UTC
Tried the last 4 patches from Alex. They didn't work for me; the issue remains.
Comment 11 Alex Deucher 2013-07-26 14:08:35 UTC
(In reply to comment #10)
> Tried last 4 patches from Alex. Didn't work for me. The issue remains.

Does disabling mclk switching help?

comment out:

        r600_enable_mclk_control(rdev, true);

in r600_start_dpm()
Comment 12 Sergey 2013-07-26 14:27:41 UTC
> Does disabling mclk switching help?
No. Still the same issue.
Comment 13 Scias 2013-07-27 06:16:50 UTC
Unexpectedly, the unpatched rc2 kernel just booted for me with the dpm=1 option, even though I hadn't changed anything since the last time I tried. I initially just wanted to try the new patches but forgot to remove the dpm=1 kernel argument from GRUB, and surprisingly it booted after the screen briefly "frosted up"!

Here's the drm/radeon dmesg log : http://pastebin.com/DxbddVMW

Changing power states works; however, the power level doesn't seem to change and always stays at 1 (tried Unigine Sanctuary and Minecraft), and it can't be forced either.

The GPU temperature was stable around 60 °C: still hotter than Catalyst in low-performance mode, but much better than before, when it would rise endlessly until a hard shutdown due to overheating.

The problem now is that after rebooting, without changing anything, it hung again like before. I tried several times and it still hangs, so I really have no idea why it suddenly worked once...

I will try the new patches now.
Comment 14 Scias 2013-07-27 08:08:44 UTC
Okay, with the latest patches it seems to boot successfully with dpm=1 much more often (I would say 1 in 4 times), while before it almost never did. Again, that's without changing anything between boots and without aspm=0.

Same remarks as in my previous post, however: it can't seem to reach power level 2 even when forcing the high power level and performance mode. Another issue is that the display gets completely garbled (http://wstaw.org/m/2013/07/27/IMG_20130727_090107.jpg) when I set the battery power mode, until the GPU soft-resets or I set it back to "balanced" or "performance". Here is what is written to dmesg when this happens (sorry, I couldn't paste it since X was totally unresponsive): http://wstaw.org/m/2013/07/27/IMG_20130727_095805.jpg
Comment 15 Sergey 2013-07-27 15:45:47 UTC
Created attachment 83091
log after "successful" load without hanging

It looks like I can reproduce behavior similar to Scias's, but in my case it is very difficult to get the system booted. Sometimes there are more than 10 unsuccessful attempts in a row.

The system behaves a bit differently, though:
 - Xorg output is very delayed: there is about a 5-10 s delay between an image change and the sound, which plays normally. 'dmesg' output in xterm is printed line by line at about 1 s intervals.
 - In a 'tty' there are no such delays and everything looks fine.

These results are with 3.11-rc2 plus the 4 patches. It also booted once with 'r600_enable_mclk_control(rdev, true);' removed (same issues seen there).
Comment 16 Daniel 2013-07-27 17:50:21 UTC
Created attachment 83096
successfully booting
Comment 17 Daniel 2013-07-27 17:51:20 UTC
Graphics: RV620/M82 Mobility Radeon HD 3450/3470
Kernel: 3.11.0-12-agd5f from mesa-git repo 
Mesa: ati-dri-git-57713.f2be639 from mesa-git repo
OS: Arch
ARCH: x86_64
Other: KDE 4.11 rc1

I installed the mesa and linux packages from the mesa-git repo; linux-agd5f contains the latest patches. The system booted successfully 3 out of 4 times, and the GPU temperature is actually a little lower than before, by about 3-4 °C (ambient temperature 35 °C). But, as Sergey posted, X responds slowly, with about a 1-2 s delay.

Here is my drm and radeon dmesg output.
Comment 18 Alex Deucher 2013-07-27 20:46:45 UTC
Make sure you power off the system completely before trying the new kernel, as the DPM hardware may be in a bad state if you just did a warm reboot.
Comment 19 Sergey 2013-07-27 21:12:40 UTC
I've done a complete power off for most of the tests (actually, after the system hangs it is the only way to reboot it), but it doesn't seem to influence the behavior.
Comment 20 Daniel 2013-07-28 13:49:58 UTC
No luck today, all boots failed (the system hung and then the screen faded to white).
Comment 21 Sergey 2013-07-28 16:36:22 UTC
Here is '/sys/kernel/debug/dri/0/radeon_pm_info' output I get:

1. with dpm=1
    uvd    vclk: 0 dclk: 0
    power level 1    sclk: 30000 mclk: 40500 vddc: 900

2. without dpm=1
    default engine clock: 500000 kHz
    current engine clock: 499500 kHz
    default memory clock: 500000 kHz
    current memory clock: 495000 kHz
    voltage: 1000 mV
    PCIE lanes: 16

The output formats differ, but if the clock values use the same units, then the values in (1) are about 10 times lower than in (2).
Comment 22 Alex Deucher 2013-07-28 20:14:35 UTC
(In reply to comment #21)
> Here is '/sys/kernel/debug/dri/0/radeon_pm_info' output I get:
> 
> 1. with dpm=1
>     uvd    vclk: 0 dclk: 0
>     power level 1    sclk: 30000 mclk: 40500 vddc: 900
> 
> 2. without dpm=1
>     default engine clock: 500000 kHz
>     current engine clock: 499500 kHz
>     default memory clock: 500000 kHz
>     current memory clock: 495000 kHz
>     voltage: 1000 mV
>     PCIE lanes: 16
> 
> The output format differs, but if clocks values dimensions are the same,
> then it is about 10 times less the in (2)

The driver uses 10 kHz units internally. Without DPM, the output adds the extra 0.

30000 = 300 MHz
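A minimal C sketch of the unit conversion above, assuming the one-line "power level" format shown in comment 21. struct pm_level, parse_pm_level() and to_mhz() are hypothetical names for illustration, not driver functions. With DPM enabled, radeon_pm_info reports clocks in 10 kHz units, so dividing by 100 gives MHz:

```c
#include <stdio.h>

/* Hypothetical struct for one "power level" line from radeon_pm_info.
 * Clocks are in the driver's internal 10 kHz units, vddc is in mV. */
struct pm_level {
    int level;
    unsigned sclk_10khz;
    unsigned mclk_10khz;
    unsigned vddc_mv;
};

/* Parses e.g. "power level 1    sclk: 30000 mclk: 40500 vddc: 900".
 * Returns 1 on success, 0 if the line has a different format. */
int parse_pm_level(const char *line, struct pm_level *out)
{
    return sscanf(line, " power level %d sclk: %u mclk: %u vddc: %u",
                  &out->level, &out->sclk_10khz, &out->mclk_10khz,
                  &out->vddc_mv) == 4;
}

/* 10 kHz units -> MHz: 30000 * 10 kHz = 300000 kHz = 300 MHz. */
unsigned to_mhz(unsigned clk_10khz)
{
    return clk_10khz / 100;
}
```

For the line from comment 21 this yields sclk 300 MHz, mclk 405 MHz and vddc 900 mV, which lines up with the non-DPM output once the kHz vs. 10 kHz difference is accounted for.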
Comment 23 Alex Deucher 2013-07-30 00:08:10 UTC
Created attachment 83246
hacks to test

The attached patch disables some options in the DPM driver. See if it helps at all. Make sure you are using a kernel that contains the fixes mentioned in comment 9 or my drm-fixes-3.11 branch.

If the patch doesn't work as is, try changing the:
#if 1
in the patch to:
#if 0
and see if that helps.

Please attach your dmesg output with the patch applied.
Comment 24 Alex Deucher 2013-07-30 03:41:29 UTC
Created attachment 83257
disable lvtma resync

Another patch to test.
Comment 25 Alex Deucher 2013-07-30 03:42:14 UTC
Created attachment 83258
use alternate fb_div scale

Another patch to test.
Comment 26 Sergey 2013-07-30 19:04:41 UTC
Created attachment 83321
dmesg for patches from comment 23 and 24

Here are results:

With the patch from comment 23: 1 hang in 5 boots, otherwise works well (no delays in Xorg). The hang was on the first try after a hot reset and was never seen afterwards.

With the patch from comment 24: 2 hangs in 4 boots, huge delays in Xorg (same as before).

With the patch from comment 25: never booted in 10 attempts (the behavior may be the same as without this patch, just with less luck during testing).
Comment 27 Alex Deucher 2013-07-30 19:34:43 UTC
Created attachment 83324
disable voltage control

The following 4 patches disable specific dpm features:

bug66963-no-voltage.diff - disables automatic voltage control
bug66963-no-cg.diff - disables clockgating
bug66963-no-ss.diff - disables dynamic spread spectrum
bug66963-no-gen2.diff - disables dynamic pcie gen2 selection

Please let me know what patch or combination of patches fixes the issues.
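With four independent disable patches there are 2^4 - 1 = 15 non-empty combinations to try. A small C sketch that enumerates them, one bit per patch; list_patch_combinations() is a hypothetical helper and only the file names come from this comment:

```c
#include <stdio.h>

/* The four test patches listed above, indexed by bit position. */
static const char *const patches[] = {
    "bug66963-no-voltage.diff",
    "bug66963-no-cg.diff",
    "bug66963-no-ss.diff",
    "bug66963-no-gen2.diff",
};
#define NUM_PATCHES (sizeof(patches) / sizeof(patches[0]))

/* Prints every non-empty combination (bit i of mask set means patch i is
 * applied) and returns how many combinations were printed. */
unsigned list_patch_combinations(void)
{
    unsigned count = 0;
    for (unsigned mask = 1; mask < (1u << NUM_PATCHES); mask++) {
        printf("combination %2u:", mask);
        for (unsigned i = 0; i < NUM_PATCHES; i++)
            if (mask & (1u << i))
                printf(" %s", patches[i]);
        printf("\n");
        count++;
    }
    return count;
}
```

Each combination still needs its own build and a cold power cycle before testing, since a warm reboot can leave the DPM hardware in a bad state.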
Comment 28 Alex Deucher 2013-07-30 19:35:18 UTC
Created attachment 83325
disable clockgating
Comment 29 Alex Deucher 2013-07-30 19:35:54 UTC
Created attachment 83326
disable dynamic spread spectrum
Comment 30 Alex Deucher 2013-07-30 19:36:32 UTC
Created attachment 83327
disable dynamic pcie gen2
Comment 31 Alex Deucher 2013-07-30 19:37:13 UTC
Please make sure to cold reset between attempts.
Comment 32 Sergey 2013-07-30 21:20:44 UTC
Here are some testing results:
bug66963-no-voltage.diff - hangs 5 of 5 times
bug66963-no-cg.diff - hangs 5 of 5 times
bug66963-no-gen2.diff - hangs 5 of 5 times

bug66963-no-ss.diff:
hung 4 times, then 2 normal boots, 2 hangs, 1 boot with Xorg delays, 1 boot with garbage in Xorg (tty console was fine), 1 hang.
Very inconsistent.

Also retried all 4 together:
Boots OK most of the time, though I got 2 hangs and 1 boot with big Xorg delays among ~10 successful tries, especially if reset is used instead of shutdown, but sometimes it happens after shutdown too.

Will continue trying other combinations.
Comment 33 Scias 2013-07-31 16:09:14 UTC
Sorry I'm a bit late.
My tests are a little bit different from Sergey's.

Unpatched : 
Hangs 3/4 times - Unresponsive X - Setting battery state = corruption - "Frost" flicker at modeset

No clockgating :
Same as unpatched

No voltage control :
Hangs every time

No PCIE2 :
Hangs every time

No dynamic spectrum :
Boots every time - X is okay - Setting battery state is okay - "Frost" flicker at modeset.

So disabling dynamic spread spectrum clearly helped for me.
Another finding is that loading radeon from the initramfs (for early KMS) results in a guaranteed hang, except with spread spectrum disabled, where it still boots every time.
Comment 34 Alex Deucher 2013-07-31 16:25:44 UTC
(In reply to comment #33)
> 
> No dynamic spectrum :
> Boots everytime - X is okay - Setting battery state is okay - "Frost"
> flicker at modeset.

What is ""Frost" flicker at modeset."? temporary distortion? permanent distortion?

Does combining multiple patches help any of the remaining issues?

Finally, does this more limited disable ss patch help?

diff --git a/drivers/gpu/drm/radeon/rv6xx_dpm.c b/drivers/gpu/drm/radeon/rv6xx_dpm.c
index 363018c..03b402d 100644
--- a/drivers/gpu/drm/radeon/rv6xx_dpm.c
+++ b/drivers/gpu/drm/radeon/rv6xx_dpm.c
@@ -1993,7 +1993,7 @@ int rv6xx_dpm_init(struct radeon_device *rdev)
                                    &frev, &crev, &data_offset)) {
                pi->sclk_ss = true;
                pi->mclk_ss = true;
-               pi->dynamic_ss = true;
+               pi->dynamic_ss = false;//true;
        } else {
                pi->sclk_ss = false;
                pi->mclk_ss = false;
Comment 35 Scias 2013-07-31 19:08:08 UTC
By frost flicker I mean that when the radeon kernel framebuffer replaces vesafb during boot, the display flickers and "frosts up" (as it does when it's hanging), then flickers back to normal. It's not a big issue, though.

Anyway, after some hours of testing, the only real issue I've spotted so far is that the screen gets corrupted and the laptop hangs when waking up from sleep. I didn't try hibernation as I have no swap partition.
Otherwise pretty much everything works: the temperature is stable at 50°C when idle (which is awesome), and forcing power modes and levels works.

Moving on to try your new patch now. Do I have to test it in combination with the other no-ss patch, or alone?
Comment 36 Alex Deucher 2013-07-31 19:50:35 UTC
(In reply to comment #35)
> By frost flicker I mean that when the radeon kernel framebuffer replaces
> vesafb during boot the display flickers and "frosts up" (as it does when
> it's hanging) then flickers back to normal. It's not a big issue tho.
> 
> Anyways, after some hours of testing, the only real issue I spotted so far
> is that the screen gets corrupted and the laptop hangs when waking up from
> sleep. Didn't try hibernation as I have no swap partition. 

By sleep you mean system suspend or dpms off (only displays off)?

> Else pretty much everything works, the temperature is stable at 50°C when
> idle (which is awesome), forcing power modes and levels works.
> 
> Going forward to try your new patch now. Do I have to test it in combinaison
> with the other no-ss one or alone ?

Please try the patch in comment 34 by itself.
Comment 37 Alex Deucher 2013-07-31 20:08:47 UTC
Created attachment 83392
possible frosting fix

Does this patch fix the frosting issue?  Use in conjunction with whatever other patches you need for a stable boot.
Comment 38 Scias 2013-07-31 20:42:15 UTC
The patch from comment 34 works (just like the one from comment 29).
The frosting fix patch doesn't help; nothing changed. (Used it with the comment 34 patch.)

And yes I meant suspend to ram.
Comment 39 Scias 2013-07-31 21:02:58 UTC
Another finding :
Waking up from suspend to ram only hangs if the power state is set to "battery". Else it works fine.
Comment 40 Alex Deucher 2013-07-31 22:49:29 UTC
Created attachment 83397
possible fix

I think I may have found the issue.  Can you guys try this patch by itself?
Comment 41 Sergey 2013-08-01 06:24:52 UTC
Unfortunately patch from comment 40 didn't help in my case. I've seen 6 hangs and 2 boots with Xorg delays out of 8 tries.
Comment 42 Scias 2013-08-01 09:13:00 UTC
The patch from comment 40 works here.
Comment 43 Alex Deucher 2013-08-01 12:39:50 UTC
Can you guys attach a copy of your vbios?

(as root)
(use lspci to get the bus id)
cd /sys/bus/pci/devices/<pci bus id>
echo 1 > rom
cat rom > /tmp/vbios.rom
echo 0 > rom
Comment 44 Sergey 2013-08-01 12:46:47 UTC
Created attachment 83431
vbios used for ATI RV620/M82 [Mobility Radeon HD 3450/3470]
Comment 45 Scias 2013-08-01 13:06:10 UTC
Created attachment 83432
vbios for RV635/M86 [Mobility Radeon HD 3650]
Comment 46 Francisco Pina Martins 2013-08-01 13:13:47 UTC
I am trying to do some testing too, but I cannot seem to apply the patch from comment 40 to the mainline kernel.
Should I be getting the changes from a specific branch in git://people.freedesktop.org/~agd5f/linux ?
Thanks and sorry about the noise.
Comment 47 Alex Deucher 2013-08-01 13:19:05 UTC
I've pushed my latest -fixes branch:
http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-fixes-3.11
It already includes the fix from comment 40.
Comment 48 Alex Deucher 2013-08-01 13:24:45 UTC
Sergey,

On a kernel that has the patch in comment 40, does booting with radeon.aspm=0 help?
Comment 49 Francisco Pina Martins 2013-08-01 14:49:59 UTC
Created attachment 83443
[AMD/ATI] RV635/M86 [Mobility Radeon HD 3650] vbios

Booting a kernel compiled from http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-fixes-3.11 with radeon.dpm=1 results in the mentioned "frosting", and the machine becomes totally unresponsive. This happens during modesetting.
Tried to boot 3 times, with the same problem every time.
Booting with radeon.dpm=1 and radeon.aspm=0 gives exactly the same result: 3 out of 3 times I get the crash.
I am attaching my vbios too.
Comment 50 Hrvoje Senjan 2013-08-01 15:11:47 UTC
Created attachment 83444
Advanced Micro Devices [AMD] nee ATI RV635 [Mobility Radeon HD 3650]

Pretty much the same results as Francisco. The screen remained black after KMS tried to kick in. Also, no difference with ASPM off.
Comment 51 Scias 2013-08-01 16:30:30 UTC
I just compiled the latest drm-fixes-3.11 and I'm back to the starting point (3/4 hangs, messed up/slow X...).
Trying to figure out why... I was using a 3-day-old drm-fixes-3.11; maybe one of the latest pulls broke it?
Comment 52 Daniel 2013-08-01 17:12:19 UTC
Created attachment 83462
vbios for RV620/M82 [Mobility Radeon HD 3450/3470]

Just compiled the latest kernel from drm-fixes-3.11: all boots failed, 13 out of 13 times. Booting with or without radeon.aspm=0 makes no difference.
Comment 53 Sergey 2013-08-01 17:22:43 UTC
The patch from comment 40 plus the 'radeon.aspm=0' boot option doesn't change anything.
I got 1 successful boot, 8 hangs, and 1 boot with slow Xorg.
Comment 54 Scias 2013-08-01 17:54:30 UTC
I reverted to the "old" kernel plus the comment 40 patch and it works now.
I can't retrieve the git revision as I cleaned the sources, but I cloned drm-fixes-3.11 on 30/7 around 11:30 GMT.

I'll try different versions in between, with the comment 40 patch, to find the change that caused it to stop working.
Comment 55 Shawn Starr 2013-08-01 18:13:56 UTC
Created attachment 83469
Corrupt screen 'frosting' after DPM enabled #1
Comment 56 Shawn Starr 2013-08-01 18:14:27 UTC
Created attachment 83470
Corrupt screen 'frosting' after DPM enabled #2 both screens
Comment 57 Shawn Starr 2013-08-01 18:16:31 UTC
I have an HD3650 Mobile RV635 and also a non-mobile HD3650 (didn't try DPM on that machine yet).

I've attached some screenshots. Is this what people are calling 'frosting', or do you refer to another kind of effect (attach pictures if you can)?

These pictures came from 3.11-rc2+ without the latest changes from Alex.

Thanks
Shawn
Comment 58 Alex Deucher 2013-08-01 18:34:45 UTC
I finally found a system I could reproduce it on and I think I've figured it out.  Patches momentarily...
Comment 59 Alex Deucher 2013-08-01 18:49:57 UTC
Please try this branch:
http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-fixes-3.11
after a cold shutdown.
Comment 60 Hrvoje Senjan 2013-08-01 19:23:40 UTC
(In reply to comment #59)
> please try this branch:
> http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-fixes-3.11
> after a cold shutdown.

This seems to have fixed the issues here :-)

/sys/kernel/debug/dri/0/radeon_pm_info
uvd    vclk: 0 dclk: 0
power level 0    sclk: 11000 mclk: 40000 vddc: 900
Comment 61 Hrvoje Senjan 2013-08-01 19:27:11 UTC
Created attachment 83472
dmesg snippet with latest patches/successful dpm enabled boot
Comment 62 Scias 2013-08-01 20:13:00 UTC
(In reply to comment #59)
> please try this branch:
> http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-fixes-3.11
> after a cold shutdown.

Working here.

I still have the rather minor frost flickering issue, and it still hangs when trying to wake up from suspend in the battery state. Actually, now it doesn't hang, but it takes ages to wake up (the GPU keeps soft-resetting) and leaves X in an unusable state. I still managed to kill X and get the dmesg output from when the system was waking up from suspend: http://pastebin.com/UphthyFc
Restarting X afterwards results in an unusable X (and GPU reset loops again), so I have to restart completely.
Comment 63 Daniel 2013-08-01 20:25:27 UTC
Created attachment 83484
successfully booting and waking up from suspend to ram dmesg with dpm=1, RV620/M82 [Mobility Radeon HD 3450/3470]

Latest commit works fine here.
Comment 64 Scias 2013-08-01 21:01:29 UTC
(In reply to comment #63)
> Created attachment 83484
> successfully booting and waking up from suspend to ram dmesg with dpm=1,
> RV620/M82 [Mobility Radeon HD 3450/3470
> 
> Latest commit works fine here.

Yes, suspend works fine if the power profile/state isn't set to battery. Try setting the battery state first.
Comment 65 Daniel 2013-08-01 22:52:12 UTC
Created attachment 83491
wake up from suspend with "battery" radeon_pm_state

OK, I know what you mean now, Scias.
A few tests here:
1. Setting the "balanced" state: waking up is fast and OK, 5/5 times.
2. Setting the "battery" state: waking up is fast, but with a white screen and a hung GPU. Sometimes I can see and move the cursor, but nothing else can be done, e.g. switching to a tty. 4/4 times. I have to force a shutdown.
3. Sometimes when the system booted into X (KDE splash screen), the GPU hung and I had to force a shutdown; not very often.
4. Whether powered by AC or battery, the state is always "balanced".

For the last test, I have a question:
Should it switch from "balanced" to "battery" automatically when I unplug the power cable, or just stay "balanced"?
Comment 66 Daniel 2013-08-01 22:53:40 UTC
Created attachment 83492
could see and move the cursor, also waking up from suspend with "battery" state
Comment 67 Alex Deucher 2013-08-01 22:59:35 UTC
(In reply to comment #65)
> For the last test, I have a question here:
> Should it switch from "balanced" to "battery" automatically when I unplug
> the power cabble or just stay "balanced"?

There is no automatic switching, you need to select the state you want via sysfs:
/sys/class/drm/card0/device/power_dpm_state
Since it's a policy decision, I leave it to userspace (some users want to switch to battery state others want to stay in performance/balanced).  You can have a script listen for ac plug events and select a different state depending on what you want it to do.
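A minimal C sketch of such a userspace policy, assuming the AC adapter exposes a 0/1 "online" file under /sys/class/power_supply/ (the adapter directory name, e.g. AC or ADP1, varies between machines). Mapping "unplugged" to "battery" is only an example policy, and is_valid_dpm_state(), pick_dpm_state(), read_ac_online() and write_dpm_state() are hypothetical helpers; only the power_dpm_state path and the battery/balanced/performance state names come from this thread:

```c
#include <stdio.h>
#include <string.h>

/* State names accepted by power_dpm_state, as used in this thread. */
int is_valid_dpm_state(const char *state)
{
    return strcmp(state, "battery") == 0 ||
           strcmp(state, "balanced") == 0 ||
           strcmp(state, "performance") == 0;
}

/* Example policy only (an assumption): "battery" when unplugged,
 * "balanced" on AC. */
const char *pick_dpm_state(int on_ac)
{
    return on_ac ? "balanced" : "battery";
}

/* Reads a 0/1 "online" flag, e.g. /sys/class/power_supply/AC/online.
 * Returns -1 if the file can't be read or parsed. */
int read_ac_online(const char *path)
{
    int online = -1;
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%d", &online) != 1)
        online = -1;
    fclose(f);
    return online;
}

/* Writes a state name to a power_dpm_state file, e.g.
 * /sys/class/drm/card0/device/power_dpm_state. Returns 0 on success,
 * -1 for an unknown state name or an I/O error. */
int write_dpm_state(const char *path, const char *state)
{
    if (!is_valid_dpm_state(state))
        return -1;
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs(state, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

An AC plug/unplug hook (a udev rule or acpid handler, whichever the distribution uses) could run, as root, read_ac_online() on the adapter's online file and then write_dpm_state() with pick_dpm_state() of the result. Keeping the policy in one small function makes it easy to swap for users who prefer to stay in balanced or performance.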
Comment 68 Alex Deucher 2013-08-01 23:04:31 UTC
In the short term until we sort out why the battery state causes resume problems, you can select balanced or performance state in your suspend script.
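A sketch of that suspend-script workaround in C, assuming it runs as root just before suspend and is given the power_dpm_state path (e.g. /sys/class/drm/card0/device/power_dpm_state). avoid_battery_before_suspend() is a hypothetical helper: it only switches away from "battery" and hands back the previous state so a matching resume hook could restore it after wake-up:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical suspend-hook helper: if the current DPM state is "battery",
 * switch to "balanced" before suspending. The previous state is copied into
 * prev so a resume hook can restore it.
 * Returns 1 if the state was changed, 0 if not, -1 on I/O error. */
int avoid_battery_before_suspend(const char *path, char *prev, size_t prev_len)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(prev, (int)prev_len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    prev[strcspn(prev, "\n")] = '\0';   /* sysfs values typically end in '\n' */

    if (strcmp(prev, "battery") != 0)
        return 0;                        /* balanced/performance: leave as is */

    f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs("balanced", f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 1 : -1;
}
```

Only the battery case is touched because that is the one reported to break resume here; balanced and performance are left alone.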
Comment 69 Hrvoje Senjan 2013-08-01 23:16:29 UTC
(In reply to comment #60)
> (In reply to comment #59)
> > please try this branch:
> > http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-fixes-3.11
> > after a cold shutdown.
> 
> This seems to have fixed the issues here :-)
> 

Works correctly only on a cold boot. With reboots I get:

[   52.630102] radeon 0000:01:00.0: GPU lockup CP stall for more than 27055msec
[   52.630115] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000058 last fence id 0x0000000000000001)
[   52.993495] radeon 0000:01:00.0: Saved 2809 dwords of commands on ring 0.
[   52.993508] radeon 0000:01:00.0: GPU softreset: 0x00000348
[   52.993511] radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA0003030
[   52.993514] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000003
[   52.993517] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200291C0
[   52.993520] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[   52.993523] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[   52.993526] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000802
[   52.993528] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x800000C1
[   52.993531] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[   53.369474] radeon 0000:01:00.0: Wait for MC idle timedout !
[   53.369478] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00004001
[   53.369532] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00022500
[   53.371674] radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA0003030
[   53.371677] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000003
[   53.371680] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200210C0
[   53.371683] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[   53.371685] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[   53.371688] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[   53.371691] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80100000
[   53.371694] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[   53.371700] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[   53.536342] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[   53.536389] radeon 0000:01:00.0: WB enabled
[   53.536393] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff880137055c00
[   53.536396] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff880137055c0c
[   53.734069] [drm:r600_ring_test] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)
[   53.734073] [drm:r600_resume] *ERROR* r600 startup failed on resume
[   53.736171] switching from power state:
[   53.736176]  ui class: none
[   53.736180]  internal class: boot 
[   53.736184]  caps: video 
[   53.736189]  uvd    vclk: 0 dclk: 0
[   53.736193]          power level 0    sclk: 60000 mclk: 50000 vddc: 1100
[   53.736196]          power level 1    sclk: 60000 mclk: 50000 vddc: 1100
[   53.736199]          power level 2    sclk: 60000 mclk: 50000 vddc: 1100
[   53.736201]  status: c b 
[   53.736206] switching to power state:
[   53.736208]  ui class: performance
[   53.736210]  internal class: none
[   53.736214]  caps: single_disp video 
[   53.736219]  uvd    vclk: 0 dclk: 0
[   53.736223]          power level 0    sclk: 11000 mclk: 40000 vddc: 900
[   53.736226]          power level 1    sclk: 30000 mclk: 40000 vddc: 900
[   53.736229]          power level 2    sclk: 60000 mclk: 50000 vddc: 1100
[   53.736231]  status: r 
[   60.620870] SysRq : Keyboard mode set to system default

SysRq + R, I brings back the normal display.
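A note on reading the power-state dumps above: radeon keeps clocks in units of 10 kHz, so `sclk: 60000` is 600 MHz and `mclk: 40000` is 400 MHz, and `vddc` is in millivolts. The minimal C sketch below pulls the three fields out of one `power level` line and converts them. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct power_level {
    unsigned sclk_mhz;   /* engine clock */
    unsigned mclk_mhz;   /* memory clock */
    unsigned vddc_mv;    /* core voltage in millivolts */
};

/* Parse a dmesg line such as
 * "[   53.736193]          power level 0    sclk: 60000 mclk: 50000 vddc: 1100".
 * Clocks are printed in 10 kHz units, so dividing by 100 gives MHz.
 * Returns 0 on success, -1 if the line has no sclk/mclk/vddc fields. */
int parse_power_level(const char *line, struct power_level *out)
{
    assert(line != NULL && out != NULL);
    unsigned sclk, mclk, vddc;
    const char *p = strstr(line, "sclk:");
    if (p == NULL)
        return -1;
    if (sscanf(p, "sclk: %u mclk: %u vddc: %u", &sclk, &mclk, &vddc) != 3)
        return -1;
    out->sclk_mhz = sclk / 100;
    out->mclk_mhz = mclk / 100;
    out->vddc_mv = vddc;
    return 0;
}
```

Read this way, the switch above goes from the boot state (600 MHz engine, 500 MHz memory, 1.1 V at every level) to a performance state whose lowest level is 110 MHz engine and 400 MHz memory at 0.9 V.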
Comment 70 Sergey 2013-08-01 23:25:06 UTC
Works well for me:

Major:
5 of 5 boots after shutdown.
5 of 5 boots after reboot.
The first boot after a shutdown hung, though.

Suspend:
Always works for me (10 of 10). Checked different states.

Hibernate:
Works most of the time, but sometimes hangs.
Have seen a lot of hangs in the 'battery' state, but when I booted again to confirm, it worked fine twice.
2 hangs in the default 'balanced' mode. After these hangs, got 2-3 hangs in a row after shutdowns.

Minor:
If I run 'glxgears', hibernate resumes with a slow Xorg, which sometimes dies after resume.
The same scenario works with suspend.
Comment 71 Francisco Pina Martins 2013-08-02 08:14:11 UTC
Confirming that the latest drm-fixes-3.11 also works fine for me. (4/4) successful boots.
System boots OK, suspend works fine too. (5/5)
I cannot try hibernate as my swap partition is too small for that.
Thank you for all your hard work Alex!
Comment 72 Shawn Starr 2013-08-02 11:33:51 UTC
Comment on attachment 83469 [details]
Corrupt screen 'frosting' after DPM enabled #1

This is caused by missing firmware.
Comment 73 Shawn Starr 2013-08-02 11:34:20 UTC
Comment on attachment 83470 [details]
Corrupt screen 'frosting' after DPM enabled #2 both screens

This is caused by missing firmware.
Comment 74 Shawn Starr 2013-08-02 11:35:33 UTC
RV635 works for me. 

Make sure you have all of the firmware: if you compile this kernel yourself, make sure the kernel firmware is installed.
Comment 75 Shawn Starr 2013-08-02 14:06:52 UTC
(In reply to comment #74)
> RV635 works for me. 
> 
> Make sure you have all of the firmware: if you compile this kernel yourself,
> make sure the kernel firmware is installed.

Well, it works but the clocking isn't adjusting properly. Working on IRC with Alex, but no crashes.
Comment 76 720 2013-08-03 06:49:18 UTC
(In reply to comment #68)
> In the short term until we sort out why the battery state causes resume
> problems, you can select balanced or performance state in your suspend
> script.

A bit of a git noob question: How do you check out only the -next, or the -fixes branch?

Another question regarding this workaround fix: will you fix and re-enable the feature later on (in the foreseeable future, that is)?
Comment 77 720 2013-08-03 09:36:01 UTC
(In reply to comment #76)
> (In reply to comment #68)
> > In the short term until we sort out why the battery state causes resume
> > problems, you can select balanced or performance state in your suspend
> > script.
> 
> A bit of a git noob question: How do you check out only the -next, or the
> -fixes branch?
> 
> Another question regarding this workaround fix: will you fix and re-enable
> the feature later on (in the foreseeable future, that is)?

Managed to check out the branch, compile the kernel, and run it flawlessly.
But my question about the feature is still valid:
Will this feature be fixed by the 3.11 release, or later?
Comment 78 Alex Deucher 2013-08-03 13:46:51 UTC
(In reply to comment #77)
> Will this feature be fixed by the 3.11 release, or later?

Depends on whether the sclk ss (engine clock spread spectrum) actually worked in the first place.  It's possible that it's a hardware bug.  Anyway, I wouldn't be too concerned about it.  Lots of rv6xx cards don't even have it enabled to begin with.  It won't affect power usage; spread spectrum is basically just there to help mitigate EMI.
Comment 79 720 2013-08-04 09:59:20 UTC
(In reply to comment #78)
> (In reply to comment #77)
> > Will this feature be fixed by the 3.11 release, or later?
> 
> Depends on whether the sclk ss actually worked in the first place.  It's
> possible that it's a hardware bug.  Anyway, I wouldn't be too concerned
> about it.  Lots of rv6xx cards don't even have it enabled to begin with.  It
> won't affect power usage.  ss is basically just for helping to mitigate EMI.

Thanks for the fast answer.
Finally I can use my laptop with Linux every day. Neat.
(I hope you can iron out the rest of the problems people reported here.)
Comment 80 Sergey 2013-08-05 09:13:18 UTC
Looks like there are still some issues left in my case, though not as severe as I thought.
Over the weekend the system hung twice after a cold start, and once after hibernation. In the hibernate case, the system resumed and worked for a few seconds; then the window manager stopped updating the image, a few seconds later the mouse stopped, and the system did not respond at all.
Comment 81 Hrvoje Senjan 2013-08-08 00:10:38 UTC
Similar to Sergey. It's pretty stable now and boots correctly every time, but sometimes it can hang during usage.  Found the following in the log:


Aug 07 23:35:49 shumarija kernel: radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
Aug 07 23:35:49 shumarija kernel: radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000015f962 last fence id 0x000000000015f960)
Aug 07 23:35:49 shumarija kernel: radeon 0000:01:00.0: Saved 601 dwords of commands on ring 0.
Aug 07 23:35:49 shumarija kernel: radeon 0000:01:00.0: GPU softreset: 0x00000008
Aug 07 23:35:49 shumarija kernel: radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA0003030
Aug 07 23:35:49 shumarija kernel: radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000003
Aug 07 23:35:49 shumarija kernel: radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200010C0
Aug 07 23:35:49 shumarija kernel: radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
Aug 07 23:35:49 shumarija kernel: radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
Aug 07 23:35:49 shumarija kernel: radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000002
Aug 07 23:35:49 shumarija kernel: radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80000645
Aug 07 23:35:49 shumarija kernel: radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
Aug 07 23:35:49 shumarija kernel: radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00004001
Aug 07 23:35:49 shumarija kernel: radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
Aug 07 23:35:49 shumarija kernel: radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA0003030
Aug 07 23:35:49 shumarija kernel: radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000003
Aug 07 23:35:49 shumarija kernel: radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200080C0
Aug 07 23:35:49 shumarija kernel: radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
Aug 07 23:35:49 shumarija kernel: radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
Aug 07 23:35:49 shumarija kernel: radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
Aug 07 23:35:49 shumarija kernel: radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80100000
Aug 07 23:35:49 shumarija kernel: radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
Aug 07 23:35:49 shumarija kernel: radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Aug 07 23:35:49 shumarija kernel: [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
Aug 07 23:35:49 shumarija kernel: radeon 0000:01:00.0: WB enabled
Aug 07 23:35:49 shumarija kernel: radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff8801359cac00
Aug 07 23:35:49 shumarija kernel: radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff8801359cac0c
Aug 07 23:35:49 shumarija kernel: [drm] ring test on 0 succeeded in 1 usecs
Aug 07 23:35:49 shumarija kernel: [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD)
Aug 07 23:35:49 shumarija kernel: [drm:r600_resume] *ERROR* r600 startup failed on resume
Comment 82 Alex Deucher 2013-08-08 00:32:42 UTC
(In reply to comment #81)
> Similar to Sergey. It's pretty stable now, and boots correctly 100%, but
> sometimes it can hang during usage.  Found the following in the log:

Do you also get GPU hangs with dpm disabled?
Comment 83 Hrvoje Senjan 2013-08-08 00:34:53 UTC
(In reply to comment #82)
> (In reply to comment #81)
> > Similar to Sergey. It's pretty stable now, and boots correctly 100%, but
> > sometimes it can hang during usage.  Found the following in the log:
> 
> Do you also get GPU hangs with dpm disabled?

Nope. I was running with the old dynpm method for a day and did not get a hang. With dpm I've had one once or twice already today.
Comment 84 Alex Deucher 2013-08-08 12:28:42 UTC
Does disabling clockgating avoid the hangs?

diff --git a/drivers/gpu/drm/radeon/rv6xx_dpm.c b/drivers/gpu/drm/radeon/rv6xx_dpm.c
index bdd888b..85f2ab8 100644
--- a/drivers/gpu/drm/radeon/rv6xx_dpm.c
+++ b/drivers/gpu/drm/radeon/rv6xx_dpm.c
@@ -1985,7 +1985,7 @@ int rv6xx_dpm_init(struct radeon_device *rdev)
        pi->voltage_control =
                radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_VDDC, 0);
 
-       pi->gfx_clock_gating = true;
+       pi->gfx_clock_gating = false;
 
        pi->sclk_ss = radeon_atombios_get_asic_ss_info(rdev, &ss,
                                                       ASIC_INTERNAL_ENGINE_SS, 0);


Make sure you do a cold shutdown after changing that.
Comment 85 Sergey 2013-08-08 19:30:48 UTC
> Does disabling clockgating avoid the hangs?
Caught a hang after hibernate.
Comment 86 Alex Deucher 2013-08-08 19:44:36 UTC
Ignoring suspend/hibernate, is anyone getting hangs or GPU resets with dpm enabled under normal operation?  If so, can you try disabling clockgating as per comment 84?  Let's address suspend/hibernate issues separately.
Comment 87 Hrvoje Senjan 2013-08-08 19:50:52 UTC
(In reply to comment #86)
> Ignoring suspend/hibernate, is anyone getting hangs or gpu resets with dpm
> enabled under normal operation?  If so, can you try disabling clockgating as
> per comment 84?  Let's address suspend/hibernate issues separately.

Using it for some 6h, so far so good. Will be able to tell with more accuracy after a day or two without hangs.
Comment 88 Francisco Pina Martins 2013-08-08 20:52:07 UTC
I've been using DPM on my rv635 for a few days now (intensive use, as this is my main machine) and have had no issues yet.
I can, however, still tempt fate and try some nasty things, such as quick zoom in/out cycles on big .svg files in Inkscape, and see if that causes issues. I know it used to, a long time ago.
Comment 89 Alex Deucher 2013-08-08 21:26:57 UTC
(In reply to comment #88)
> I've been using DPM in my rv635 for a few days now (intensive use, as this
> is my main machine) and have had no issues yet.
> I can, however, still tempt fate and try some nasty things such as quick zoom
> in/out cycles in big .svg files in inkscape, and see if that causes issues.
> I know it used to a long time ago.

If you do run into a problem make sure it's specific to dpm.  If you had a particular problem in the past and you get it again, it's probably not dpm related.
Comment 90 Sergey 2013-08-10 14:03:32 UTC
> Does disabling clockgating avoid the hangs?
No, it behaves the same. Today about 40% of boots after a shutdown hung.
Comment 91 Hrvoje Senjan 2013-08-10 17:52:43 UTC
(In reply to comment #87)
> (In reply to comment #86)
> > Ignoring suspend/hibernate, is anyone getting hangs or gpu resets with dpm
> > enabled under normal operation?  If so, can you try disabling clockgating as
> > per comment 84?  Let's address suspend/hibernate issues separately.
> 
> Using it for some 6h, so far so good. Will be able to tell with more
> accuracy after a day or two without hangs.

After almost 2 days, also got a hang. Did not reboot, suspended twice.
I ran:
kwin_gles --replace &
kwin --replace &
MESA_DEBUG=1
kwin_gles --replace &
and the GPU hung. Note that this *never* happened without dpm, but it was also the first time I tried this with dpm.
Comment 92 720 2013-08-13 08:08:15 UTC
The bug is still not fixed.
I've been using a Radeon HD 3650 with dpm, now running rc5.

Everything went well until I had a reboot due to a NOD32 install.
That time I encountered the white screen of death once again.

Can I gather any more debug info somehow Alex?
(I use a non-debug kernel that I compiled myself, but I can prepare a full debug build if you would like.)

P.S.: I'm not too lucky with this DPM thing. The old card dies randomly (WSOD), and the new one is not supported (SI). Such is life with Radeon on Linux. :(
Comment 93 Francisco Pina Martins 2013-08-13 08:28:33 UTC
Created attachment 83998 [details]
journalctl relevant output

I had a crash with the white screen yestreday.
It occurred while playing Civilization IV under wine, during the replay part, at full speed, after finishing a game.
All that was logged by systemd in journalctl is attached. The last line (radeon: The kernel rejected CS, see dmesg for more information.) Was repeated thousands of times after this.
After the crash, everything tried to recover (the game kept running, as well as thunderbird, etc...) - I could hear sounds and **sometimes** X would be displayed for a fraction of a second, and then everything would get garbled.
I switched to a tty and rebooted from there.
I did not get dmesg output, since I only read the log after rebooting. I will try to reproduce the crash and report back.
Comment 94 Alex Deucher 2013-08-13 12:58:20 UTC
(In reply to comment #92)
> The bug is still not fixed.
> Been using Radeon 3650 with dpm, and now running rc5.
> 
> Everything went well until I had a reboot due to a NOD32 install.
> That time I encountered the white screen of death once again.
> 
> Can I gather any more debug info somehow Alex?
> (I use a total non-debug kernel that I compiled but I can prepare one full
> debug version if you would like.)

Can you try disabling additional dpm features as per previous comments?  E.g.,

diff --git a/drivers/gpu/drm/radeon/rv6xx_dpm.c b/drivers/gpu/drm/radeon/rv6xx_dpm.c
index bdd888b..ad17ae8 100644
--- a/drivers/gpu/drm/radeon/rv6xx_dpm.c
+++ b/drivers/gpu/drm/radeon/rv6xx_dpm.c
@@ -1982,10 +1982,10 @@ int rv6xx_dpm_init(struct radeon_device *rdev)
        else
                pi->fb_div_scale = 0;
 
-       pi->voltage_control =
-               radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_VDDC, 0);
+       pi->voltage_control = false;
+//             radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_VDDC, 0);
 
-       pi->gfx_clock_gating = true;
+       pi->gfx_clock_gating = false;
 
        pi->sclk_ss = radeon_atombios_get_asic_ss_info(rdev, &ss,
                                                       ASIC_INTERNAL_ENGINE_SS, 0);
@@ -1994,13 +1994,14 @@ int rv6xx_dpm_init(struct radeon_device *rdev)
 
        /* Disable sclk ss, causes hangs on a lot of systems */
        pi->sclk_ss = false;
+       pi->mclk_ss = false;
 
        if (pi->sclk_ss || pi->mclk_ss)
                pi->dynamic_ss = true;
        else
                pi->dynamic_ss = false;
 
-       pi->dynamic_pcie_gen2 = true;
+       pi->dynamic_pcie_gen2 = false;
 
        if (pi->gfx_clock_gating &&
            (rdev->pm.int_thermal_type != THERMAL_TYPE_NONE))

See if disabling any additional features helps stability on your system.

> 
> ps.: I'm not too lucky with this DPM thing. The old card dies randomly
> (WSOD), and the new one is not supported (SI). 

SI cards are supported just fine, including dpm.
Comment 95 Francisco Pina Martins 2013-09-02 08:07:15 UTC
After using DPM for a while, I can report that it did cause some random crashes. I never figured out what caused them and was never able to reproduce them.
However, since updating mesa to 9.2, I have had no crashes at all, so that might have solved whatever was causing the issues.
Just tossing in my 0,02€:
"It works!"

Thanks for all the hard work on this one.
Comment 96 Eugene 2013-09-02 15:40:11 UTC
Not so fast, fellas. I recently checked 3.11RC7 and DRM-Next (current). The result is the same: it still can't boot with my Radeon HD 2600 XT :(
Comment 97 Alex Deucher 2013-09-02 15:48:02 UTC
(In reply to comment #96)
> Not so fast fellas. Recently checked 3.11RC7 and DRM-Next (current). The
> result is the same: it still can't boot with my Radeon HD 2600 XT :(

Can you attach a copy of your vbios?

(as root)
(use lspci to get the bus id)
cd /sys/bus/pci/devices/<pci bus id>
echo 1 > rom
cat rom > /tmp/vbios.rom
echo 0 > rom

Also, can you get a copy of the dmesg output from the driver after it loads?  Try booting into a non-X runlevel without loading radeon (either blacklist it, or set radeon.modeset=0 on the kernel command line in grub) and then manually load it.  E.g.,
modprobe -r radeon
modprobe radeon modeset=1 dpm=1

If possible, try and do it over ssh from a second machine so you can still access it if you lose the display.
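For anyone scripting the vbios dump above, here is the same sysfs sequence as a C sketch. The function names are hypothetical. `<pci bus id>` remains the placeholder for the device directory found with lspci, and the code must run as root. It writes `"1\n"` and `"0\n"` exactly as `echo` does. This assumes the kernel only turns ROM reads back off when it receives exactly `0` plus a newline, which is what `echo 0` sends.

```c
#include <assert.h>
#include <stdio.h>

/* Write a short string to a sysfs attribute; returns 0 on success. */
static int write_attr(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int rc = (fputs(val, f) >= 0) ? 0 : -1;
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}

/* Hypothetical helper mirroring:
 *   echo 1 > rom; cat rom > out; echo 0 > rom
 * rom_path is e.g. "/sys/bus/pci/devices/<pci bus id>/rom".
 * Returns the number of bytes copied, or -1 on any error. */
long dump_vbios(const char *rom_path, const char *out_path)
{
    assert(rom_path != NULL && out_path != NULL);
    if (write_attr(rom_path, "1\n") != 0)   /* enable ROM reads */
        return -1;

    long total = -1;
    FILE *in = fopen(rom_path, "rb");
    FILE *out = fopen(out_path, "wb");
    if (in != NULL && out != NULL) {
        char buf[4096];
        size_t n;
        total = 0;
        while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
            if (fwrite(buf, 1, n, out) != n) {
                total = -1;
                break;
            }
            total += (long)n;
        }
        if (ferror(in))
            total = -1;
    }
    if (in != NULL)
        fclose(in);
    if (out != NULL && fclose(out) != 0)
        total = -1;

    /* Always disable ROM reads again, even if the copy failed. */
    if (write_attr(rom_path, "0\n") != 0)
        total = -1;
    return total;
}
```

Writing the `0` in the cleanup path, not only on success, keeps the ROM from being left readable after a failed copy.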
Comment 98 Eugene 2013-09-02 16:31:34 UTC
Created attachment 85079 [details]
Radeon HD2600XT vbios
Comment 99 Eugene 2013-09-02 17:10:33 UTC
> Can you attach a copy of your vbios?
> 
> (as root)
> (use lspci to get the bus id)
> cd /sys/bus/pci/devices/<pci bus id>
> echo 1 > rom
> cat rom > /tmp/vbios.rom
> echo 0 > rom
Yes, here it is, in the attachment.
 
> Also, can you get a copy of the dmesg output from the driver after it loads?
> Try booting into a non-X runlevel without loading radeon (either blacklist
> it, or set radeon.modeset=0 on the kernel command line in grub) and then
> manually load it.  E.g.,
> modprobe -r radeon
> modprobe radeon modeset=1 dpm=1

I'm sorry, I don't fully understand how to do this.

> If possible, try and do it over ssh from a second machine so you can still
> access it if you lose the display.
Also, I have no way to connect to my PC through ssh.
Comment 100 Alex Deucher 2013-09-03 21:09:19 UTC
(In reply to comment #99)
> > Also, can you get a copy of the dmesg output from the driver after it loads?
> > Try booting into a non-X runlevel without loading radeon (either blacklist
> > it, or set radeon.modeset=0 on the kernel command line in grub) and then
> > manually load it.  E.g.,
> > modprobe -r radeon
> > modprobe radeon modeset=1 dpm=1
> 
> I'm sorry I don't understand fully how to do this.

Boot with `radeon.modeset=0 1` on the kernel command line in grub; then, when the kernel boots to single user mode, reload the radeon module with dpm=1.  E.g.,

modprobe -r radeon
modprobe radeon modeset=1 dpm=1
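To confirm which value the reloaded module actually took, the parameter can be read back from sysfs. The minimal C sketch below assumes radeon exposes `dpm` as a readable file at `/sys/module/radeon/parameters/dpm`. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Read an integer module parameter such as
 * /sys/module/radeon/parameters/dpm (an assumed path).
 * Returns 0 and stores the value on success, or -1 if the file
 * is missing or does not start with an integer. */
int read_int_param(const char *path, int *value)
{
    assert(path != NULL && value != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int rc = (fscanf(f, "%d", value) == 1) ? 0 : -1;
    fclose(f);
    return rc;
}
```

If `read_int_param("/sys/module/radeon/parameters/dpm", &v)` returns 0 with `v == 1`, then `dpm=1` reached the driver. A missing file means radeon is not loaded.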
Comment 101 Alex Deucher 2013-09-03 21:13:09 UTC
(In reply to comment #96)
> Not so fast fellas. Recently checked 3.11RC7 and DRM-Next (current). The
> result is the same: it still can't boot with my Radeon HD 2600 XT :(

Does disabling aspm help?  Try `radeon.aspm=0 radeon.dpm=1` on the kernel command line in grub.
Comment 102 Eugene 2013-09-04 19:19:24 UTC
If you could explain to me where the kernel command line is (when I see the GRUB menu, do I press "e" or "c"? And if "c", what do I do after entering radeon.modeset=0?), I would try it. I'm just a newbie who has been using Kubuntu for a few months.
Comment 103 Alex Deucher 2013-09-04 19:25:54 UTC
(In reply to comment #102)
> If you could explain me where is kernel command line (when I'm seeing GRUB
> menu push "e" or "c"; and if "c" what to do after I entered radeon.modeset=0
> in there?), I would try it. I'm just a newbie using kubuntu a few months.

In the grub menu, select the kernel you want to boot and press 'e', then move to the end of the line that starts:
        linux   /boot/<blah blah blah>
and append the options to the end of that line, e.g.,
       linux   /boot/<blah blah blah> radeon.modeset=0 1
The '1' means boot into single user mode rather than X.

Also, if you haven't already, try disabling aspm:
       linux   /boot/<blah blah blah> radeon.aspm=0 radeon.dpm=1
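The kernel ignores a mistyped option, at most logging a line such as `radeon: unknown parameter 'debug' ignored`. So it can help to check what the running kernel was actually booted with. The minimal C sketch below scans a command line, such as the contents of `/proc/cmdline`, for `radeon.` options. The function name is hypothetical, and the list of known names covers only the options used in this thread, not every radeon parameter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* radeon options used in this thread; the driver accepts others too. */
static const char *const known[] = { "modeset", "dpm", "aspm" };

/* Scan a kernel command line for "radeon.<name>=<value>" tokens.
 * Prints each one and returns how many have a <name> outside the list
 * above (a likely typo, or simply an option not discussed here). */
int check_radeon_opts(const char *cmdline)
{
    assert(cmdline != NULL);
    char buf[4096];
    int unknown = 0;

    /* strtok modifies its input, so work on a (possibly truncated) copy. */
    snprintf(buf, sizeof buf, "%s", cmdline);
    for (char *tok = strtok(buf, " \t\n"); tok != NULL; tok = strtok(NULL, " \t\n")) {
        if (strncmp(tok, "radeon.", 7) != 0)
            continue;
        const char *name = tok + 7;
        size_t len = strcspn(name, "=");
        int found = 0;
        for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
            if (strlen(known[i]) == len && strncmp(name, known[i], len) == 0)
                found = 1;
        printf("%s%s\n", tok, found ? "" : "  <- not an option used in this thread");
        unknown += !found;
    }
    return unknown;
}
```

Read `/proc/cmdline` into a buffer and pass it to `check_radeon_opts()`. It prints every `radeon.` option the running kernel was booted with, and a nonzero return flags names like `debug` or `dmp`.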
Comment 104 Eugene 2013-09-04 20:27:37 UTC
(In reply to comment #103)
> (In reply to comment #102)
> > If you could explain me where is kernel command line (when I'm seeing GRUB
> > menu push "e" or "c"; and if "c" what to do after I entered radeon.modeset=0
> > in there?), I would try it. I'm just a newbie using kubuntu a few months.
> 
> in the grub menu select the kernel you want to boot and press 'e' then move
> to the end of the line that starts:
>         linux   /boot/<blah blah blah>
> and append the options to the end of that line.  e.g.,
>        linux   /boot/<blah blah blah> radeon.modeset=0 1
> The '1' means boot into single user mode rather than X.
> 
> Also, if you haven't already, try disabling aspm:
>        linux   /boot/<blah blah blah> radeon.aspm=0 radeon.dpm=1

Thanks for your explanations. I just tried it. But:

with
> linux   /boot/<blah blah blah> radeon.aspm=0 radeon.dpm=1
still a blank screen once the kernel starts loading

with
> linux   /boot/<blah blah blah> radeon.modeset=0 1
it seems to boot, but I can't see a command line: only a black screen or the text line "kernel booting... bla-bla-bla", and no prompt. Typing anything shows nothing. But if I enter "reboot" and press Enter, it restarts. So it receives the commands, but I can't see any display output.
Comment 105 Alex Deucher 2013-09-04 21:09:49 UTC
(In reply to comment #104)
> with
> > linux   /boot/<blah blah blah> radeon.modeset=0 1
> it seems it's booting but I can't see command line - only black screen or
> text line: "kernel booting... bla-bla-bla" and also no command line. Trying
> to enter anything gives nothing. But if a enter "reboot" and press Enter it
> restarts. So it gets the commands but I can't see any display output.

Can you blindly type:
modprobe -r radeon
modprobe radeon modeset=1 dpm=1
dmesg > dmesg.log
reboot
Comment 106 Alex Deucher 2013-09-04 21:10:58 UTC
Assuming that works, you should have dmesg.log in the root user's home directory, and you can attach it here.
Comment 107 Bryan Quigley 2013-09-05 03:23:17 UTC
Created attachment 85216 [details]
3870/RV670 - dmesg manually loading radeon

I'm also having the issue on a 3870/RV670 using Sept4 drm-next (d30645ae from Ubuntu's mainline builds) and previous builds.

Disabling ASPM with radeon.aspm=0 also didn't help.  My machine is accessible via SSH, so I can do further debugging.
Comment 108 Bryan Quigley 2013-09-05 03:23:58 UTC
Created attachment 85217 [details]
3870/RV670 - kern.log dpm on boot
Comment 109 Bryan Quigley 2013-09-05 03:24:24 UTC
Created attachment 85218 [details]
3870/RV670 - vbios.rom
Comment 110 Bryan Quigley 2013-09-05 03:27:27 UTC
Also, the motherboard has a built-in HD4290, which is disabled in the BIOS.
Comment 111 Eugene 2013-09-05 13:20:52 UTC
(In reply to comment #106)
> Assuming that works you should have the dmesg.log in the root user directory
> that you can attach here.

Here is my blind result (in the attachment).
Comment 112 Eugene 2013-09-05 13:23:56 UTC
Created attachment 85253 [details]
dmesg file

Loading the radeon driver (for my HD2600XT card) blindly in single user mode; this is the dmesg output.
Comment 113 Alex Deucher 2013-09-05 14:00:07 UTC
Created attachment 85256 [details] [review]
add callback for UVD

Hi Eugene,

This patch should fix the crash you are seeing.
Comment 114 Alex Deucher 2013-09-05 14:04:57 UTC
(In reply to comment #107)

> I'm also having the issue on a 3870/RV670 using Sept4 drm-next (d30645ae
> from Ubuntu's mainline builds) and previous builds.

What sort of issue are you having?  Blank screen?  Corrupt image?  GPU hang?
Comment 115 Bryan Quigley 2013-09-05 15:35:11 UTC
(In reply to comment #114)
> 
> What sort of issue are you having?  blank screen?  corrupt image?  GPU hang?

Screen goes to powersave when booted with dpm=1.  Still able to ssh in, but seems frozen from keyboard.
Comment 116 Alex Deucher 2013-09-05 16:10:31 UTC
(In reply to comment #115)
> (In reply to comment #114)
> > 
> > What sort of issue are you having?  blank screen?  corrupt image?  GPU hang?
> 
> Screen goes to powersave when booted with dpm=1.  Still able to ssh in, but
> seems frozen from keyboard.

Does disabling any of the dpm features as per comment 94 help?
Comment 117 Eugene 2013-09-05 18:12:01 UTC
(In reply to comment #113)
> Created attachment 85256 [details] [review]
> add callback for UVD
> 
> Hi Eugene,
> 
> This patch should fix the crash you are seeing.

Will it be in 3.11.1, or in 3.12 next Monday?
Comment 118 Alex Deucher 2013-09-05 18:20:01 UTC
(In reply to comment #117)
> 
> Will it be in 3.11.1, or in 3.12 next Monday?

Not likely.  I haven't sent the patch upstream yet.
Comment 119 Eugene 2013-09-05 18:24:14 UTC
Then 3.11.2?
Comment 120 lucky_beta 2013-09-06 05:26:40 UTC
I've read the previous comments carefully. I'm using an ATI HD 3470, the same card as Sergey, and the problem is similar: when trying to boot with "radeon.dpm=1", the screen gradually goes white and just hangs there; sometimes it boots, but the windows respond very slowly. I want to know whether the problem has been fixed; I can't find a definitive answer in the comments above.
Comment 121 Sergey 2013-09-06 06:02:48 UTC
Hi lucky_beta,

For me it is not 100% fixed. At the moment, with kernel 3.11, there is a ~50% chance that the system hangs during boot or the windows are slow.
It's better than it was initially, but the issue is not fully resolved.
(I don't have the white screen issue, even without patches.)
Comment 122 Bryan Quigley 2013-09-06 20:43:46 UTC
(In reply to comment #116)
> Does disabling any of the dpm features as per comment 94 help?

Nope, still broken.  (Although I did get distracted by a hang that was fixed by the latest drm pull, and at one point I thought this was fixed too, but it seems I may have mistyped radeon.dpm.)
Comment 123 lucky_beta 2013-09-07 00:43:14 UTC
(In reply to comment #121)
> Hi lucky_beta,
> 
> For me it was not 100% fixed. At the moment with Kernel 3.11 it is ~50%
> chance that system hangs  during boot or the windows are slow.
> It better than it was initially, but the issue is not fully resolved.
> (Don't have white screen issue even without patches)
Thank you for your reply. I will keep an eye on this topic. Although I'm a Linux beginner, if I can help in any way other than waiting, I'll gladly do so.
Comment 124 Sergey 2013-09-11 16:01:42 UTC
I've noticed that while watching YouTube videos, sometimes (actually rather often) everything freezes. It looks like an Xorg crash. It tries to recover, but the screen freezes, though the mouse and sound still work (probably everything else is alive).

Dmesg is flooded with messages:
[drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
radeon 0000:01:00.0: couldn't schedule ib
Comment 125 Sergey 2013-09-11 16:04:09 UTC
Created attachment 85644 [details]
Xorg log

Here is the Xorg log, but according to the timestamps it only covers the Xorg instance that tried to restart. The initial error is not visible.
Comment 126 Alex Deucher 2013-09-11 16:08:27 UTC
(In reply to comment #124)
> I've noticed, that while watching youtube videos, sometimes (actually rather
> often) everything freezes. Looks like Xorg crash. It tries to recover but
> screen freezes, though mouse is working and sound (probably everything else
> is alive).
> 
> Dmesg is flooded with messages:
> [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
> radeon 0000:01:00.0: couldn't schedule ib

That's a GPU lockup, which may not be related to dpm.  I recently fixed an alignment issue with command buffers that may fix these hangs for you.  You'll need this patch for libdrm:
http://cgit.freedesktop.org/mesa/drm/commit/?id=58d008883165ba35c83041fa9ed84937163d5f76
and this patch for mesa:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=a81beee37e0dd7b75422448420e8e8b0b4b76c1e
Comment 127 Sergey 2013-09-11 16:10:48 UTC
Thanks, I'll try to check these patches.
Comment 128 Eugene 2013-09-15 19:24:07 UTC
I recently tried kernel 3.11.1. It still doesn't boot with radeon.dpm=1. Is the patch for the HD2600 still not in?
Comment 129 Alex Deucher 2013-09-15 23:19:55 UTC
(In reply to comment #128)
> Recently tried kernel 3.11.1. Still not booting with radeon.dpm=1. Patch
> still isn't there for HD2600 ?

I sent the pull request that contained attachment 85256 [details] [review] to Dave last week, so it will probably be merged into 3.12 at some point this week and then end up in 3.11 sometime later.
Comment 130 Eugene 2013-09-16 15:30:15 UTC
Thanks for info. I'll wait to test it.
Comment 131 Sergey 2013-09-16 16:38:08 UTC
(In reply to comment #126)
> That's a GPU lock up which may not be related to dpm.  I recently fixed an
> alignment issue with command buffers that may fix these hangs for you. 
> you'll need this patch for libdrm:

With the latest mesa and libdrm from git it looks much better. (I haven't hit the same issue since updating, though maybe I've just been lucky so far.)
Comment 132 Sergey 2013-09-16 19:51:35 UTC
Created attachment 85936 [details]
Dmesg for Xorg freeze during video playing.

(In reply to comment #131)
> With latest mesa and libdrm from git it looks much better. (never got same
> issue since update; though maybe just lucky so far)
I was just lucky. Caught the same issue again. Maybe the dmesg output will help a bit.
Comment 133 Eugene 2013-09-18 15:43:20 UTC
It seems 3.12 RC1 still doesn't have the patch for my HD2600.
Comment 134 Francisco Pina Martins 2013-09-18 15:53:15 UTC
I just want to add that after upgrading to kernel 3.11.1 (from the -RC2 version I was using), I have not experienced any more crashes on my RV635. Nothing I have tried so far has been able to trigger it.
Once again, thank you for all the hard work.
Comment 135 Bryan Quigley 2013-09-30 04:20:34 UTC
Created attachment 86825 [details]
dmesg when it works due to setting .debug=1

I tried again with the latest rc3 build and it still doesn't work; I had left the changes from comment 94 intact.  Trying to get more information, I booted with radeon.modeset=0 radeon.debug=1.  Once I reloaded the module with dpm, it worked!  So it works correctly if you set it to debug mode; otherwise I can't get any logs of the event.  

This is the dmesg from when it worked (with debug on).  I noticed there are some HDMI errors; I only have DVI actually hooked up.
Comment 136 Alex Deucher 2013-09-30 05:44:17 UTC
(In reply to comment #135)
> Created attachment 86825 [details]
> dmesg when it works due to setting .debug=1
> 
> I tried again with the latest rc3 build and it still doesn't work; I had
> left changes from comment 94 intact.  Trying to get more information I tried
> booting with radeon.modeset=0 radeon.debug=1.  Once I re-loaded the module
> with dpm, now it works!  So it works correctly if you set it to debug mode,
> otherwise I can't get any logs of the event.  
> 

There is no radeon.debug parameter:
[   34.991462] radeon: unknown parameter 'debug' ignored

Seems like you just got lucky this time. Does it work reliably if you disable radeon and boot into a non-X runlevel, then manually load radeon?  E.g., boot with:
radeon.modeset=0 1
on the kernel command line in grub to boot into single-user mode. Then, at the command prompt:
modprobe -r radeon
modprobe radeon modeset=1 dpm=1

> This is the dmesg from when it worked (with debug on).  I noticed there are
> some HDMI errors; I only have DVI actually hooked up.

You can ignore those.
Comment 137 Bryan Quigley 2013-09-30 18:39:39 UTC
> (In reply to comment #135)
> Seems like you just got lucky this time. Does it work reliably if you
> disable radeon and boot into a non-X runlevel, then manually load radeon? 

Nope, definitely not reliably, but I did get it to work one more time doing the above (booting with modeset=0) after about 10 reboots. When it fails, I've never been able to get any debug information.

Any suggestions of other ways to get more debug information?
Comment 138 Alex Deucher 2013-09-30 21:08:19 UTC
(In reply to comment #137)
> Nope definitely not reliably, but I did have it work one more time doing the
> above (booting with modeset=0) after about 10 or so reboots.  When it fails
> I've never been able to get any debug information..
> 
> Any suggestions of other ways to get more debug information?

Does it hang the entire system as soon as you load the driver, or only when you start X or something like that?

As for debugging, you can try disabling rv6xx_dpm_set_power_state() by returning early (see the patch below). If that works, move the return statement further and further down in the function until you can identify at which point in rv6xx_dpm_set_power_state() the hang occurs. Once we pinpoint that, we can debug further.


diff --git a/drivers/gpu/drm/radeon/rv6xx_dpm.c b/drivers/gpu/drm/radeon/rv6xx_dpm.c
index 5811d27..bfa2922 100644
--- a/drivers/gpu/drm/radeon/rv6xx_dpm.c
+++ b/drivers/gpu/drm/radeon/rv6xx_dpm.c
@@ -1670,6 +1670,8 @@ int rv6xx_dpm_set_power_state(struct radeon_device *rdev)
        struct radeon_ps *old_ps = rdev->pm.dpm.current_ps;
        int ret;
 
+       return 0;
+
        pi->restricted_levels = 0;
 
        rv6xx_set_uvd_clock_before_set_eng_clock(rdev, new_ps, old_ps);
@@ -2094,6 +2096,8 @@ int rv6xx_dpm_force_performance_level(struct radeon_device *rdev,
 {
        struct rv6xx_power_info *pi = rv6xx_get_pi(rdev);
 
+       return 0;
+
        if (level == RADEON_DPM_FORCED_LEVEL_HIGH) {
                pi->restricted_levels = 3;
        } else if (level == RADEON_DPM_FORCED_LEVEL_LOW) {
Comment 139 Bryan Quigley 2013-10-01 05:04:37 UTC
> Does it hang the entire system as soon as you load the driver, or only when
> you start X or something like that?
Hangs on driver load.  

> As for debugging, you can try disabling rv6xx_dpm_set_power_state() by
> returning early (see the patch below).  
This doesn't work. Actually, it seems to fail a bit faster now: previously it would display kernel messages for a bit, now it goes off right after extracting the kernel.

I tried exiting early from a few other functions like rv6xx_dpm_init, but haven't had any better results. I put a printk statement in _init, which never got printed. Could it be that we never get there? I couldn't find anything that comes before _init...
Comment 140 Alex Deucher 2013-10-01 13:03:15 UTC
(In reply to comment #139)
> 
> I tried exiting early out of a few other functions like rv6xx_dpm_init, but
> haven't had any better results.  I put a printk statement in _init, which
> never got printed.. Could we never make it there?  I couldn't find anything
> that comes before _init...

rv6xx_dpm_init() doesn't actually touch the hw, it just initializes the driver structures used by dpm.  Try returning early in rv6xx_setup_asic().

The order at module load time looks like:

dpm_init()
dpm_setup_asic()
dpm_enable()
dpm_set_power_state()
Comment 141 Bryan Quigley 2013-10-01 20:08:32 UTC
In /r600_dpm.c - void r600_start_dpm(struct radeon_device *rdev)

+       //return; //returning here works

        r600_enable_sclk_control(rdev, true);

+       return; //returning here doesn't.

I'll try setting it to false next.
Comment 142 Bryan Quigley 2013-10-01 21:22:54 UTC
-       r600_enable_sclk_control(rdev, true);
+       r600_enable_sclk_control(rdev, false);
Does indeed fix it.
Comment 143 Alex Deucher 2013-10-01 22:05:12 UTC
(In reply to comment #142)
> -       r600_enable_sclk_control(rdev, true);
> +       r600_enable_sclk_control(rdev, false);
> Does indeed fix it.

Unfortunately, that disables dynamic engine scaling which on your particular board pretty much disables dpm since the voltage and mclk are static :(
Comment 144 Bryan Quigley 2013-10-05 04:47:21 UTC
(In reply to comment #143)
> Unfortunately, that disables dynamic engine scaling which on your particular
> board pretty much disables dpm since the voltage and mclk are static :(

Well, damn. Anything else to try? (I think I've exhausted all the other places to put return 0; in the discussed functions.)
Comment 145 Paul Bodenbenner 2013-10-06 21:51:35 UTC
I also have an HD 3470 (RV620/M82), and my experience using dpm is really great, but the following problems still occur with kernels 3.12rc3-1 and 3.11.4:
1. Sometimes it doesn't boot. Only a black screen is shown.
2. Most of the time, after suspending, the system doesn't wake up properly.
3. Rarely, the system is extremely slow, probably because the clocks are not set correctly.
4. Sometimes the GUI crashes when attaching or removing a monitor over HDMI.

I am using the default power state settings. One strange thing I have also noticed: at boot I can see the raw EDID matrix, and it sometimes changes.
I have pasted some logs at Bug 69729 already. If you need some further information, please let me know.
Comment 146 Alex Deucher 2013-10-07 16:08:52 UTC
*** Bug 70189 has been marked as a duplicate of this bug. ***
Comment 147 Bryan Quigley 2013-11-01 04:20:11 UTC
Tried again with latest git.  Issue still exists.  Any other troubleshooting to try?
Comment 148 720 2013-11-02 10:30:25 UTC
I read that DPM is coming in 3.13.
However, this issue is still not fixed.

Any info on this? Is there a chance it will ever get fixed?
(I followed the conversation, but so far it seems Alex wasn't able to track down the issue and could only apply workarounds that literally disable power management altogether.)
Comment 149 Paul Bodenbenner 2013-11-02 13:18:13 UTC
I think dpm won't be enabled by default for 3.13 on HD3XXX...

The boot bug is a really big problem. Sometimes it works two or three times in a row; sometimes it takes three tries to succeed. Also, waking up from suspend doesn't always work.

Catalyst is broken with GNOME Shell 3.10... What a surprise.

Really good experience with ATI/AMD at the moment! :-)
Comment 150 720 2013-11-02 13:30:34 UTC
Too bad there is no quirk or dirty workaround we could use to fix the boot issue, because, damn, it's not cheap to replace a laptop
(which works just perfectly on Windows 7).
Comment 151 Paul Bodenbenner 2013-11-02 13:50:53 UTC
I respect all the good work that has been done on the AMD driver in the past.
But they probably don't have an HD3XXX card to test the driver with. I would pay 20€ to get them a card...
Comment 152 720 2013-11-02 14:02:58 UTC
(In reply to comment #151)
> I respect all the good work what has been done in the past for the AMD
> driver.
> But probably they have no hd3XXX card to test the driver. I would pay 20€
> for getting them a card...

Fine with me. If Alex is willing to set up a PayPal address or something, I'll throw in a few bucks too. These cards are old as hell; he can buy them in bulk.
Comment 153 Alex Deucher 2013-11-04 14:04:16 UTC
(In reply to comment #151)
> I respect all the good work what has been done in the past for the AMD
> driver.
> But probably they have no hd3XXX card to test the driver. I would pay 20€
> for getting them a card...

I've got quite a few r6xx cards; however, all of the cards I have access to work fine with dpm. Unfortunately, r6xx cards are five generations old, so it's much harder to find people who remember, or documents that describe, the details and quirks of dpm on these ASICs. So even if I managed to get a problematic card, I'm not sure how much luck I'd have.
Comment 154 Shawn Starr 2013-11-04 19:55:10 UTC
For those of you getting hangs and GPU resets with DPM on: are you using KDE or GNOME?

I've noticed problems with KDE on the rv6xx, and sometimes doing a soft power cycle of the laptop results in the display hanging when it attempts to set up kernel mode setting (KMS).
Comment 155 Paul Bodenbenner 2013-11-04 20:13:42 UTC
(In reply to comment #153)
> I've got quite a few r6xx cards, however, all of the cards I have access to
> work fine with dpm.  Unfortunately, r6xx cards are five generations old, so
> it's much harder to find people that remember or documents describing the
> details and quirks of dpm on these asics so even if I managed to get a
> problematic card, I'm not sure how much luck I'd have.

Ok, thanks for the clarification!

(In reply to comment #154)
> With the people getting hangs and GPU resets, with DPM on: are you using KDE
> or GNOME?
> 
> I've noticed problems with KDE and the rv6xx and sometimes doing a soft
> power cycle of laptop will result in the display hanging when it attempts to
> set up kernel mode setting (KMS).

I am not sure whether I also get hangs, but I was using GNOME Shell 3.10 when I hit the failures I described previously.
Also, I have to use "acpi_sleep=nonvs" as a kernel parameter; otherwise the system reboots immediately instead of "trying" to wake up. With the Catalyst driver this worked well, though.
Comment 156 Sergey 2013-11-04 21:47:36 UTC
(In reply to comment #154)
> With the people getting hangs and GPU resets, with DPM on: are you using KDE
> or GNOME?
> 
> I've noticed problems with KDE and the rv6xx and sometimes doing a soft
> power cycle of laptop will result in the display hanging when it attempts to
> set up kernel mode setting (KMS).

I use Fluxbox and still see the hang issue. The boot-time hang is unlikely to be related to the DE, since it happens during KMS.

Sometimes it hangs during ordinary work (video playback etc.); it might then recover, but usually it doesn't. You can see in dmesg that the GPU was stuck.
Comment 157 Paul Bodenbenner 2013-11-06 13:37:52 UTC
Created attachment 88753
boot good 3.12 rv620
Comment 158 Paul Bodenbenner 2013-11-06 13:38:35 UTC
Created attachment 88754
boot slow 3.12 rv620
Comment 159 Paul Bodenbenner 2013-11-06 13:42:35 UTC
The last two attachments show the dmesg output.
As I described previously, besides the general boot problem, I sometimes also have trouble after boot with the GUI being really slow.
Hope that helps a bit!
Comment 160 Paul Bodenbenner 2013-11-06 13:44:18 UTC
I forgot to mention that this happens sometimes, nearly as often as the general boot problem.
Comment 161 Sergey 2013-11-26 01:22:17 UTC
Hi Alex,

Is there anything we can do to help fix the remaining issues?
Comment 162 Anonymous Helper 2013-11-26 13:34:36 UTC
I had a similar problem with my HD 5730M, but I'm not sure it belongs to this bug. For me, it's possible to work around it by delaying the boot process manually,
e.g. by waiting ~20 seconds in the GRUB menu or pausing the BIOS boot with the "Break" key.

Can somebody confirm that?
Comment 163 Bryan Quigley 2013-11-27 14:32:45 UTC
(In reply to comment #162)
> Can somebody confirm that?

Doesn't help with my boot-time freeze on a 3870.
Comment 164 Alex Deucher 2013-12-20 13:49:49 UTC
*** Bug 72905 has been marked as a duplicate of this bug. ***
Comment 165 Kajzer 2013-12-21 01:38:14 UTC
Hello, I had problems with just radeon.dpm=1: as soon as I hit Enter, the screen would go black and the monitor would blink as if there were no connection; either that, or it would just reboot.
Anyway, after adding this line:
radeon.audio=0 radeon.aspm=0 radeon.dpm=1
everything works GREAT!
I rebooted several times (to be sure) and it worked every time.
I played a few games on Steam and watched a couple of videos, and everything works fine without any issues.
 
Some info:
sensors:
radeon-pci-0400
Adapter: PCI adapter
temp1:        +49.0°C  (crit = +120.0°C, hyst = +90.0°C)

/sys/kernel/debug/dri/64/radeon_pm_info:
uvd    vclk: 0 dclk: 0
power level 0    sclk: 11000 mclk: 80000 vddc: 900

Using this kernel: 3.13.0-rc3-7.gfbe0eb5-desktop

Video card: Radeon HD 3650

I don't know what else to say about it; if you need more info, I will be more than happy to help :)
Comment 166 Nicola Mori 2013-12-23 10:31:01 UTC
(In reply to comment #165)
> radeon.audio=0 radeon.aspm=0 radeon.dpm=1

It didn't work with my Mobility HD3470. Two hangs out of 5 reboots.
Comment 167 Nicola Mori 2013-12-23 10:33:50 UTC
(In reply to comment #166)
> (In reply to comment #165)
> > radeon.audio=0 radeon.aspm=0 radeon.dpm=1
> 
> It didn't work with my Mobility HD3470. Two hangs out 5 reboots.

I forgot to mention that I'm using kernel 3.12.6 (with ck patches). Maybe it's kernel 3.13 that does the magic, together with the kernel command line options?
Comment 168 Kajzer 2013-12-23 13:50:32 UTC
I don't think it's the kernel (3.12 or 3.13): when I managed to make it work, I also tried booting kernel 3.11 and it worked.

Anyhow, today it doesn't work anymore, no matter what I do.
I didn't touch a thing; it was working fine for 2-3 days and now it's not.

It boots fine with radeon.dpm=0.
Comment 169 Francisco Pina Martins 2013-12-23 23:41:16 UTC
I would like to add something to this.
On my Mobility 3650 (rv635/M86), I have normally been able to boot with DPM without any issue at all since 3.11.
However, sometimes (sporadically) I get the frosting screen when playing FreeCol.
When this happens, I can no longer boot with DPM enabled. If I don't disable DPM in my bootloader, I always get the frosting screen at the point where modesetting should happen.
To be able to boot again, I have to disable DPM, boot, and leave the computer on for a while (I'm not sure exactly how long: less than 10 minutes is insufficient, and over 1 hour is enough).
If I do this, DPM works fine again.
I have no idea why this happens, and I am not sure what to log when it does (actually, nothing gets logged on the failed boots).
Comment 170 Thierry Vignaud 2013-12-30 06:07:16 UTC
I don't know if it helps you (Alex), but as of kernel-3.12.6/mesa-10.0.1, my RV670 ([Radeon HD 3690/3850]) behaves quite consistently when booting with dpm (usually for resuming from hibernation):
- it consistently fails on the first power-on
  (the screen goes black, "no signal")
- it consistently works after powering off and then
  powering on again a couple of seconds later: it resumes smoothly

Does this help you figure out what could be wrong?
Comment 171 Thierry Vignaud 2014-01-09 17:16:55 UTC
Is there any patch I can try against 3.12.6?
Comment 172 Alex Deucher 2014-01-09 17:54:45 UTC
(In reply to comment #171)
> Is there any patch I can try against 3.12.6?

You can try disabling additional dpm features as per comment 94.
Comment 173 Eugene 2014-01-10 10:06:08 UTC
lspci | grep VGA
02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV670 [Radeon HD 3870]

It boots if I boot with the power button, but with the 'sudo reboot' command I get a black screen and, it seems, it freezes completely.
Comment 174 Eugene 2014-01-10 10:07:46 UTC
Oh, sorry, I forgot:
uname -a
Linux admin 3.13.0-031300rc7-generic #201401041835 SMP Sat Jan 4 23:36:50 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Comment 175 Jaime Velasco Juan 2014-01-10 18:38:43 UTC
Created attachment 91835
Disable the DMA ring in R6xx

Hi, I've got an RV620 (HD3450/3470, pci id 1002:95c4). It usually worked with dpm, but hung on boot maybe 1 in 10 times. I've been testing the dpm-reorder branch (now drm-next-3.14) and I haven't seen it hang since then.

However, with this branch I got the following:
[drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD)
and acceleration got disabled. This starts happening with "drm/radeon/pm: move pm handling into the asic specific code".

I think the DMA ring is used by neither mesa nor xf86-video-ati, so I changed the kernel to skip initialisation of the DMA ring and keep acceleration enabled. With that, it has worked every time (suspend/resume works as well).

I can't be sure this is the cause as the hangs are random, but the results seem promising so far.
Comment 176 Jaime Velasco Juan 2014-01-10 18:40:38 UTC
Created attachment 91836
dmesg with dpm-reorder patches, DMA ring test failed
Comment 177 Jaime Velasco Juan 2014-01-10 18:41:57 UTC
Created attachment 91837
dmesg with dpm-reorder patches plus DMA ring deactivation patch (working so far)
Comment 178 Thierry Vignaud 2014-01-11 10:42:57 UTC
(In reply to comment #172)
> > Is there any patch I can try against 3.12.6?
> 
> You can try disabling additional dpm features as per comment 94.

The patch seems to fix the issue.
Booting with only the second half (the voltage_control and gfx_clock_gating changes) is not enough:
I got a black-screen freeze on boot, and another time, disabled acceleration.
Comment 179 Alex Deucher 2014-01-11 16:05:24 UTC
(In reply to comment #178)
> (In reply to comment #172)
> > > Is there any patch I can try against 3.12.6?
> > 
> > You can try disabling additional dpm features as per comment 94.
> 
> The patch seems to fix the issue.
> Booting with only the second half is not enough (the voltage_control & the
> gfx_clock_gating changes):
> I experienced black screen freeze on boot & disabled accel another time.

To be clear all of the changes are necessary?

pi->voltage_control = false;
pi->gfx_clock_gating = false;
pi->mclk_ss = false;
pi->dynamic_pcie_gen2 = false;

Are things any better with my 3.14 branch (which contains some dpm rework):
http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.14
Comment 180 Thierry Vignaud 2014-01-12 15:21:19 UTC
Care to provide a patch against 3.13rc8?
Comment 181 Alex Deucher 2014-01-12 23:17:29 UTC
(In reply to comment #180)
> Care to provide a patch against 3.13rc8?

I'm not sure I follow. I thought you said you tried the patch in comment 94? I'm trying to clarify which parts of the patch are necessary for stability on your system.
Comment 182 Thierry Vignaud 2014-01-13 06:03:25 UTC
I'm having trouble with that; no single part seems to be enough on its own.
So I would like to test your latest work.
Comment 183 Alex Deucher 2014-01-13 14:05:45 UTC
Created attachment 91947
patch from comment 94

(In reply to comment #182)
> I've issues doing it, like there's no part enough.
> So I would like to test your latest work.

It's the same as the patch in comment 94.

You might also try my 3.14 branch:
http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.14
Comment 184 Paul Bodenbenner 2014-01-13 23:18:16 UTC
I tried the "drm-next-3.14" branch to see whether the boot and suspend problems have been solved.
Unfortunately, I got the same problem Jaime Velasco Juan described, so I skipped further testing.
Comment 185 Thierry Vignaud 2014-01-17 06:19:57 UTC
(In reply to comment #183)
> > I've issues doing it, like there's no part enough.
> > So I would like to test your latest work.
> 
> It's the same as the patch in comment 94.

I meant your drm branch. I would have loved to have a patch against
3.13~rc8. git.fo delivers at 30kb/s :-(


Anyway, I had quite a lot of successes with 3.13~rc8 until yesterday evening, when it failed several times in a row (disabled accel, then a freeze after resuming, several boot failures).
I then tested your 3.14 branch, which doesn't help.
Comment 186 Francisco Pina Martins 2014-01-18 14:54:41 UTC
Created attachment 92344
journalctl crash log

Hello, I am back with something to report for my RV635/M86.
Like I said in comment 93, the crashes still occur when using DPM.
I have now found a way to reproduce it (or rather, it found me).
I have just switched from e17 to e18 as my DM.
Now, to reproduce the crash, I just have to use my desktop for a couple of minutes and it crashes.
However, unlike before, the system can now recover from the crash. After about a minute of white screen, everything goes back to normal, usually only for a few seconds, and then I get the white screen again.
The experience is very stable when booting with "radeon.dpm=0".

I am using ArchLinux with kernel version 3.12

I have attached the journalctl output and I hope this can help you debug things.
Comment 187 Sergey 2014-01-18 22:07:14 UTC
(In reply to comment #183)
> You might also try my 3.14 branch:
> http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.14

I've been trying this branch for a week. So far I haven't seen any hangs, though I haven't been turning off my machine very often; I mostly suspend it. I will try to run more shutdown tests.
Comment 188 Sergey 2014-01-20 11:42:14 UTC
(In reply to comment #183)
> You might also try my 3.14 branch:
> http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.14

I'm not sure whether there are any important changes in this branch, but so far it looks more stable for me. I haven't seen hangs during video playback, though the week before I got them almost daily. And I've seen a hang during startup only once in the last week.

I might just be very lucky this week.
Comment 189 Sergey 2014-01-26 10:33:49 UTC
(In reply to comment #188)
> (In reply to comment #183)
> > You might also try my 3.14 branch:
> > http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.14
> 
> Not sure if there are any important changes in this branch, but so far it
> looks more stable for me. I haven't seen hangs while video was played,
> though the week before i got them almost daily. And have seen hand during
> start up only once for last week.
> 
> I might be just very lucky this week.

I still don't see hangs, but I noticed that FPS is very low on this branch. Half-Life is like a slideshow, though it works fine with the vanilla kernel.
Comment 190 Jaime Velasco Juan 2014-01-26 18:11:27 UTC
(In reply to comment #189)
> (In reply to comment #188)
> > (In reply to comment #183)
> > > You might also try my 3.14 branch:
> > > http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.14
> > 
> > Not sure if there are any important changes in this branch, but so far it
> > looks more stable for me. I haven't seen hangs while video was played,
> > though the week before i got them almost daily. And have seen hand during
> > start up only once for last week.
> > 
> > I might be just very lucky this week.
> 
> Still don't see hangs, but noticed but FPS is very low on this branch. Half
> Life is like a slideshow, though works fine with vanilla kernel.

Are you sure you have acceleration enabled? The behaviour you describe matches what I found when I started testing that branch (grep dmesg for "*ERROR* radeon: ring 3 test failed (0xCAFEDEAD)").

If you have that problem, you could test the patch I sent in comment 175 (attachment 91835). It skips initialization of the DMA ring and enables acceleration again. I've been using it for several weeks with no hangs yet; it works like a charm.
Comment 191 Sergey 2014-01-26 21:43:16 UTC
(In reply to comment #190)
> Are you sure you have acceleration enabled, the behaviour you describe
> matches what I found when started testing that branch (grep dmesg for
> "*ERROR* radeon: ring 3 test failed (0xCAFEDEAD)").
Looks like you are right. I have the same error. Thanks.
Will try your patch.
Comment 192 Paul Bodenbenner 2014-01-29 18:46:37 UTC
(In reply to comment #190)
> (In reply to comment #189)
> > (In reply to comment #188)
> > > (In reply to comment #183)
> > > > You might also try my 3.14 branch:
> > > > http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.14
> > > 
> > > Not sure if there are any important changes in this branch, but so far it
> > > looks more stable for me. I haven't seen hangs while video was played,
> > > though the week before i got them almost daily. And have seen hand during
> > > start up only once for last week.
> > > 
> > > I might be just very lucky this week.
> > 
> > Still don't see hangs, but noticed but FPS is very low on this branch. Half
> > Life is like a slideshow, though works fine with vanilla kernel.
> 
> Are you sure you have acceleration enabled, the behaviour you describe
> matches what I found when started testing that branch (grep dmesg for
> "*ERROR* radeon: ring 3 test failed (0xCAFEDEAD)").
> 
> If you have that problem you could test the patch I sent in comment 175 (
> attachment 91835), It will skip initialization of the DMA
> ring and enable acceleration again. I've been using it for several weeks, no
> hangs yet, works like a charm.

Thanks for your patch! I adapted it to the current git state and I will attach it here (hope I made no mistake ;-)).
So with that patch the drm-next-3.14 branch works very well:
I booted about 25 times with only 3 hangs. Suspend worked every time. :-)

@Alex Deucher:
Very good job! What's wrong with the ring stuff? Do we need a new version of xf86-video-ati? Also, can I do anything about the occasional hangs?

Best Regards,
Paul
Comment 193 Paul Bodenbenner 2014-01-29 18:48:16 UTC
Created attachment 93010
(Disable the DMA ring in R6xx) Adapted to current git state
Comment 194 Shawn Starr 2014-01-30 02:54:17 UTC
I wonder if something isn't being reset in the VBIOS when it fails to boot after POST. I see the BIOS boot up and the GRUB display, but after the kernel starts booting (when radeon is loaded), we just die.

Alex, would it be possible for the radeon driver to dump the GPU register state to the console before enabling DPM, and again after (so that netconsole or a serial cable could capture the output)?

So, 

1) Before rebooting the machine, use a tool to dump the current register state of the GPU/hw.

2) Upon loading radeon.ko, but before DPM is switched on, dump the GPU/hw registers to the screen, then switch DPM on and do the dump again (assuming this doesn't lock up the system).

Maybe we can narrow down something that sometimes isn't being cleared at boot?

Thanks,
Shawn
Comment 195 Sergey 2014-02-08 06:17:15 UTC
I still see hangs with the patch from comment 193,
though not during boot, only during normal use.
Comment 196 Paul Bodenbenner 2014-02-16 22:19:13 UTC
I don't need the patch anymore, probably because I switched to the git packages of mesa and xorg. I'm also using the 3.14rc2 kernel now.
Booting and suspending work great with these up-to-date packages!
Comment 197 Shawn Starr 2014-02-17 13:11:29 UTC
With 3.14-rcX, are people still seeing lockups on reboot with the HD2000-HD4000 series? I'm going to switch my W500 to it; right now it's hung while I'm away *sigh*, so I can't reset it remotely.
Comment 198 Thierry Vignaud 2014-02-17 17:22:14 UTC
AFAIC, I still see the black-screen hang with 3.14-rc2 + attachment #91947 from comment 183.
Comment 199 Bryan Quigley 2014-02-18 05:06:41 UTC
The latest kernel (daily build from the 12th) gets me a bit closer:
instead of booting to a video hang (monitor switching to powersave mode), it now hangs with the monitor on.

> > Does it hang the entire system as soon as you load the driver, or only when
> > you start X or something like that?
> Hangs on driver load.  
It now hangs on startx. I can boot the system in single-user mode and then switch to dpm=1 fine; it's when starting X that it hangs now.

What is relevant to try now?
Comment 200 Alex Deucher 2014-02-24 15:09:46 UTC
*** Bug 74420 has been marked as a duplicate of this bug. ***
Comment 201 Michel Dänzer 2014-03-10 06:32:04 UTC
*** Bug 74420 has been marked as a duplicate of this bug. ***
Comment 202 Nicola Mori 2014-03-29 14:33:28 UTC
With Linux 3.14-RC8, things are much better for my RV620 (Mobility HD3470). I booted about ten times without any hang; only once was I dumped to a tty with the X server refusing to start (unfortunately I lost the log, thanks to X's foolish log retention policy...). But no black-screen hangs as before, yet.
Comment 203 Thierry Vignaud 2014-03-29 18:31:51 UTC
(In reply to comment #202)
> With linux 3.14-RC8 things are much better for my RV620 (Mobility HD3470). I
> booted about ten times without any hang, only once I was dumped to tty with

That's just a streak.
I can boot smoothly ten times and then fail ten times, too.
Comment 204 Nicola Mori 2014-03-29 18:51:47 UTC
Created attachment 96597
Xorg.0.log for failed start of X server

@Thierry: understood. I still haven't experienced any hangs, but I'll keep my eyes open. Meanwhile, the X server failed to start again, and this time I saved the X log. The problem seems to be:

[     2.501] (II) [KMS] drm report modesetting isn't supported.

while for successful boots I get:

[     4.203] (II) [KMS] Kernel modesetting enabled.

Also, for successful boots I see in the log:

[     2.510] (II) xfree86: Adding drm device (/dev/dri/card0)

which is missing in the log for failed X start.
Comment 205 Bryan Quigley 2014-04-01 04:27:09 UTC
(In reply to comment #199)

I've reproduced the hang without X, using kmscon instead.
This causes the graphics hang (I can still switch back to the original VT and kill it, though):  /usr/local/bin/kmscon --vt 5 --hwaccel
Booting with dpm off, or running kmscon without --hwaccel, doesn't trigger the issue.
Comment 206 Kajzer 2014-04-07 21:27:42 UTC
With kernel 3.14, things are much better. Sometimes it doesn't boot, but that's very rare; once booted, everything works fine, and I've had no crashes at all.
The only problem for me is that I cannot run any game from Steam except two (Penumbra: Overture and Half-Life); the games that won't start do work with Catalyst.
I tried the latest mesa and X11; it makes no difference.
Comment 207 lockheed 2014-04-07 21:30:31 UTC
Well, for me nothing has changed on the RV635. The computer boots perfectly fine, but video still crashes on random dpm state changes, sometimes 10 minutes and sometimes 10 hours after boot. But always when there is unsaved work on the screen.
Comment 208 Kajzer 2014-04-07 23:38:03 UTC
(In reply to comment #207)
> Well, for me nothing changed on RV635. Computer boots perfectly fine, but
> video still crashes on random state dpm changes, sometimes in 10 minutes,
> sometimes in 10 hours after boot. But always when there is unsaved work on
> the screen.

Hm, that's interesting. I'm using the same chip, an RV635; the card is an HD 3650.
I had the system up for 3 days with no crash, and only had to reboot for other reasons.
I watched many videos (XBMC, mostly HD movies) and played a few games.
It really works fine, except that, as I said, I had black screens on boot a few times; when that happens, the very next reboot succeeds.
Maybe there are other things in play here; I really can't tell.
BTW, I'm running openSUSE 13.1 x64, stock mesa 9.2, kernel 3.14.0-5.

Can you reproduce it every time?
If so, what exactly do you do? I would like to try that.
Comment 209 lockheed 2014-04-08 07:59:36 UTC
> (In reply to comment #207)
> Hm, that's interesting, I'm using the same driver RV635, card is HD 3650
> Had the system up for 3 days and had no crash, had to reboot for other
> reasons.
> Watched many videos (XBMC, mostly HD movies) , played few games.
> Working fine really, except like I said, I had black screens on boot few
> times, when that happens the very next reboot is success.
> Maybe there are other things in play here, really can't tell.
> btw I'm running OpenSuse 13.1 x64 , stock mesa 9.2 , kernel 3.14.0-5
> 
> Can you reproduce it every time ? 
> If yes what do you do exactly ? I would like to try that.

It's a Radeon HD 3650 in a ThinkPad W500.
Arch Linux x64 
mesa 10.1, 10.2, all git, updated daily
Kernel 3.12-3.14 (not tested earlier)

Yes, I can reproduce it every time (i.e. I can't say WHEN it will happen, only that it WILL happen sometime between 20 minutes and 20 hours in), which is why I have to stay on Catalyst.

The one thing I noticed that is most likely to trigger it almost immediately is moving windows inside a VNC client session connected to a server. In that case, the crash occurs within seconds to minutes.

My logs are here: https://bugs.freedesktop.org/show_bug.cgi?id=74420
Comment 210 Shawn Starr 2014-04-08 13:44:25 UTC
It's the same for me: I can have several days of stability on my W500 with an RV636 (HD 3650), and then a GPU reset.

Note that the GPU never recovers. When it attempts to, Xorg is basically frozen; I can VT-switch to a console, but sometimes switching back to X deadlocks completely.

I do recall that earlier in 3.x, GPU reset worked and it would recover. Now it's a lost cause.
Comment 211 lockheed 2014-04-08 13:50:39 UTC
(In reply to comment #210)
> I note, the GPU never recovers, when it attempts to Xorg is basically
> frozen, I can VT switch to console but sometimes switching back to X it
> deadlocks completely.
>

You are lucky. I can't recall a single instance where I could switch to VT when EQ starts overflowing.
Comment 212 Nicola Mori 2014-04-14 07:38:17 UTC
During the last two weeks I had no problems with dpm on my RV620 with kernel 3.14. I booted every day, sometimes multiple times, and the system never hung on boot or during normal usage. Do other users with an RV620 see the same? As far as my experience goes, dpm can be considered stable on RV620.
Comment 213 Sergey 2014-04-14 09:05:56 UTC
(In reply to comment #212)
> Do other users with RV620 experience
> the same? For what concerns my experience, dpm can be deemed as stable for
> RV620.

I still see the issue (I use an RV620).

Actually, for me it is quite the opposite. I've been seeing more hangs since I switched to 3.14, especially while playing YouTube videos, and after a hang it usually fails to boot normally on the first attempt. 3.13 looked more stable, but that might just be luck, since I haven't done any objective testing, and the number of hangs simply correlates with how often I play videos.
Comment 214 Kajzer 2014-04-15 12:47:14 UTC
On my RV635 things are really great: still not a single crash, and DPM is working without any issues.
It has been running 24/7 for weeks.

However, I still sometimes see a dark screen on boot; eventually it boots, and then it's golden.
Not a big issue, though I would love to see that finally fixed.
Comment 215 lockheed 2014-04-15 12:50:45 UTC
(In reply to comment #214)
> On my RV635 things are really great, still not a single crash, DPM working
> without any issues.
> Running 24/7 for weeks.
> 

Try Mesa 10.1 or 10.2.
I thought that was the whole point of using the open drivers on Radeon: to use the superior 10.1+ drivers instead of Catalyst or the feature-, performance- and power-saving-gutted Mesa 9.2.
Comment 216 Kajzer 2014-04-15 14:25:32 UTC
(In reply to comment #215)
> (In reply to comment #214)
> > On my RV635 things are really great, still not a single crash, DPM working
> > without any issues.
> > Running 24/7 for weeks.
> > 
> 
> Try mesa 10.1 or 10.2
> I thought that is the whole point of using open drivers on Radeon - to use
> the superior 10.1+ drivers over Catalyst and feature-, performance, and
> powersaving- gutted Mesa 9.2

I did, but had to go back to 9.2: there were some strange artifacts (textures were sometimes bizarre) in a few games and in XBMC, though not on the desktop. However, dpm was still working fine, and that's the point of this topic, I think :)
Performance and Mesa features will come in time, I guess.
The main thing for me is dpm, i.e. power saving.
I mean, apart from Debian Wheezy I can't even use Catalyst; it's possible with a painful downgrade of Xorg, but I don't like that.
With the open drivers I can now use any distro without overheating.
Yeah, OpenGL (Mesa) sucks at the moment, but things might change soon.
Comment 217 lockheed 2014-04-15 15:55:35 UTC
> However, dpm was still working fine and that's the point of this topic I
> think :)
> Performance and mesa features will come in time I guess. Main thing for me is dpm, powersaving.

Yeah, that's the whole point. Performance and features are already here. What is not here is working DPM. As it is now, it crashes profusely on our chips.


> I mean, beside debian wheezy I can't even use catalyst, it's possible with a
> painful downgrading of xorg but I don't like that.
> With open drivers I can now use any distro without overheating.
> Yeah, opengl sucks (mesa) at the moment but things might change soon.

Try Arch.
Comment 218 Kajzer 2014-04-15 17:36:36 UTC
(In reply to comment #217)
> Yeah, that's the whole point. Performance and features are already here.
> What is not here is working DPM. As it is now, if crashes profusely on our
> chips.

Yeah, well, it's a weird problem: it crashes for you and not for me, you can boot every time and I can't, you see performance and features and I don't.

Although, regarding performance and features, I was only curious to see whether some Steam games that worked under Catalyst would work with the open driver, and they didn't.
So for me, I guess the features are not there yet; performance, maybe.
Comment 219 lockheed 2014-04-15 17:48:47 UTC
> Yeah, well it's a weird thing this problem, it crashes for you and it
> doesn't crash for me, you can boot every time and I can't boot every time,
> you see performance and features and I don't.
> 
> Although, regarding performance and features, I was only curious to see if
> some games from Steam which worked under catalyst will work with open
> driver, and they didn't.
> So for me I guess features are not there yet, performance maybe.

I don't game, so I'm not sure about performance. What I do know is that power management got way better, and with current drivers I get temperatures comparable to (or even slightly lower than) those with Catalyst.

What I think is key here is that you are using 9.2, and I get my issues on 10.x.

Also, my understanding was that this bug list is for DPM problems in the newest Mesa. If so, then I don't even know why you are in this thread in the first place, since you are using 9.2.
Comment 220 Alex Deucher 2014-04-15 18:47:23 UTC
(In reply to comment #219)
> 
> Also, my understanding was that this bug list is for DPM problems in the
> newest mesa.  If this is so, then I don't even know why are you in this
> thread in the first place, since you are using 9.2.

Mesa and dpm are unrelated.  This bug is about dpm stability on rv6xx asics regardless of what version of the userspace drivers you are using.
Comment 221 lockheed 2014-04-15 18:50:27 UTC
> Mesa and dpm are unrelated.  This bug is about dpm stability on rv6xx asics
> regardless of what version of the userspace drivers you are using.

So what is DPM dependent on, or part of? xf86-video-ati?
Comment 222 Alex Deucher 2014-04-15 18:52:12 UTC
(In reply to comment #221)
> > Mesa and dpm are unrelated.  This bug is about dpm stability on rv6xx asics
> > regardless of what version of the userspace drivers you are using.
> 
> So what is DPM dependant on/part of? xf86-video-ati?

It's part of the radeon kernel driver.  It doesn't matter what userspace drivers (mesa, xf86-video-ati) you use.
Comment 223 lockheed 2014-04-15 18:54:17 UTC
> It's part of the radeon kernel driver.  It doesn't matter what userspace
> drivers (mesa, xf86-video-ati) you use.

OK, thanks for enlightening me. I had things confused. So this is a kernel bug list? I thought it was X-related.
Comment 224 Alex Deucher 2014-04-15 18:58:36 UTC
(In reply to comment #223)
> Ok, thanks for enlightening me. I had things confused. So this is a kernel
> bug list? I thought it is X-related.

https://bugs.freedesktop.org covers lots of projects including the drm kernel drivers.  The product and component fields target different parts of the stack.  In this case Product = DRI and Component = DRM/Radeon is the radeon kernel driver.  Product = Mesa and Component = Drivers/Gallium/r600 would be the mesa driver.
Comment 225 Kajzer 2014-04-15 19:49:05 UTC
(In reply to comment #219)
> Also, my understanding was that this bug list is for DPM problems in the
> newest mesa.  If this is so, then I don't even know why are you in this
> thread in the first place, since you are using 9.2.

I'm here because the title of this thread says it all.

Maybe you should test your dpm issues with something more stable: just install kernel 3.14 and don't go crazy with git versions of X and Mesa. Maybe your current problems will go away.
Comment 226 Kajzer 2014-04-17 18:49:10 UTC
I have to correct what I said about Steam and some games not working with the open drivers. I just found out what the problem was: the patented S3TC texture compression. Installing the very small libtxc_dxtn library solved all those problems; every game works now and performance is great, even with Mesa 9.2 :)
Comment 227 Nicola Mori 2014-04-22 07:39:26 UTC
This morning I got my first black screen on boot since the update to kernel 3.14. The subsequent boot went fine. On the journal boot log I noticed that the first boot hangs just before this line:

apr 22 09:08:27 elric kernel: [Firmware Bug]: ACPI(VGA) defines _DOD but not _DOS

i.e., the above line is missing from the log of the failed boot but present in the log of the subsequent successful boot.
Sorry if this is just noise; maybe it doesn't help at all, but I'm trying to provide any potentially relevant information.
Comment 228 sleepforlife 2014-05-10 17:58:12 UTC
I think I have the same problem.

kernel 3.14.2-1 on arch linux.

lspci | egrep "VGA|3D|Display"
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV635/M86 [Mobility Radeon HD 3650]


$cat /etc/X11/xorg.conf.d/20-radeon.conf
Section "Device"
       Identifier  "My Graphics Card"
       Option "AccelMethod" "EXA"
       Option "EXAVSync" "yes"
EndSection

/var/log/Xorg.0.log = http://pastebin.com/dR2Vayr7

When I enable DPM, the computer sometimes freezes, and Alt+F2 does not respond.

Is there a solution to this, or will updates solve the problem?
I want to use DPM because of the improved performance.
Hopefully it gets resolved.
Comment 229 Francisco Pina Martins 2014-05-17 23:47:37 UTC
Just wanted to add that after I started using linux-3.14.x my issues with DPM are gone when using my Mobility Radeon 3650 (rv635/M86). Everything is just working fine.
Once again, thanks for all the hard work.
Comment 230 lockheed 2014-05-18 08:05:04 UTC
(In reply to comment #229)
> Just wanted to add that after I started using linux-3.14.x my issues with
> DPM are gone when using my Mobility Radeon 3650 (rv635/M86). Everything is
> just working fine.
> Once again, thanks for all the hard work.

Francisco, were your problems related to random video crashes while using your system, or related to boot?

What laptop are you using, if I may ask?
Comment 231 Francisco Pina Martins 2014-05-18 14:39:44 UTC
@lockheed:
During initial testing (linux 3.11-rcX - 3.11.X) I used to have booting problems.
During 3.12 - 3.13, I had the "frosting" screen problem during usage.
Since 3.14 I am not experiencing any of them.
Also note that this is no longer my main machine, and as such I don't use it as often nowadays. However, I could very easily reproduce the frosting screen by using enlightenment18 for a few minutes, and this frosting no longer happens even after hours of usage.

My laptop is a Compal IFL90 (http://www.notebookcheck.net/Review-Compal-FL90-Notebook.4209.0.html) with a T7500 CPU, and I have replaced the stock (broken) nVidia 8600M GT with a Mobility Radeon HD 3650 (256MB DDR3 RAM; there is a slower variant of this card with 512MB DDR2 RAM).

The card I am now using comes with the ACER VBios.

On the software side, I have a fully up-to-date (as of today) Arch Linux x86_64 (testing repositories disabled - only "stable" stuff).
Comment 232 lockheed 2014-05-18 16:32:21 UTC
@Francisco,
Thanks for the clarification.
Does this “frosting” screen problem freeze the screen in addition to scrambling it, and force you to reboot?

I found that on 3.13 I was sometimes able to use my laptop for hours, or even a day, without this happening. But it always happened eventually.
I am on a ThinkPad W500 now with two GPUs. I switched to Intel for the moment and everything is fine. I will try switching back to the RV635 to try the open drivers again, maybe soon, since you report no problems.

However, I have come to appreciate the temperature drop, battery life increase, and better Flash performance on the Intel 4500 chip.

And I am on Arch, too. Which Mesa version are you using? 
Have you tried this? https://aur.archlinux.org/packages/mesa-r300-r600-radeonsi-git/
Comment 233 Francisco Pina Martins 2014-05-19 09:47:42 UTC
@lockheed:

The frosting screen forced me to do a reboot, since after it happened, X became unusably slow.

I am using the stable version of mesa: 10.1.3-1
I have used mesa-git in the past (well before DPM), to squeeze a bit of extra performance from my card.
But since this is no longer my main machine I have reverted to stable packages in order to lighten up the maintenance burden.
Let's hope DPM starts working fine on your card too.
Comment 234 Paul Bodenbenner 2014-07-10 12:43:12 UTC
Just want to note:
Roughly every tenth boot or resume from suspend, the hard freeze still occurs on an up-to-date Arch Linux installation.
Comment 235 Harald Judt 2014-07-27 10:35:27 UTC
With the latest kernel 3.15.6 my RV635 boots fine and no longer hangs. The only remaining problems occur when resuming from hibernation. Even there things are much better now, because most of the time there is actually something to read on the screen instead of blankness.

It has nothing to do with the hibernation process itself, because I can resume from the same image after a few tries. I'll see if I can get more info by using an initrd.

Also, I will test whether this occurs with suspend/resume too or only with hibernate/resume.
Comment 236 Harald Judt 2014-07-27 12:02:29 UTC
Ok, unfortunately I only thought I was lucky. My X server froze again, but the machine remained accessible via ssh, so here is the last part of the dmesg:

radeon 0000:01:00.0: ring 0 stalled for more than 10017msec
radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000000c72a7 last fence id 0x00000000000c7299 on ring 0)
radeon 0000:01:00.0: Saved 3289 dwords of commands on ring 0.
radeon 0000:01:00.0: GPU softreset: 0x00000008
radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA0003030
radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000003
radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200000C0
radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000002
radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80000645
radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00004001
radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA0003030
radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000003
radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200080C0
radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80100000
radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[drm] PCIE gen 2 link speeds already enabled
[drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
radeon 0000:01:00.0: WB enabled
radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff88007fd6ec00
[drm] ring test on 0 succeeded in 1 usecs
radeon 0000:01:00.0: ring 0 stalled for more than 10000msec
radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000000c7301 last fence id 0x00000000000c729a on ring 0)
[drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).
[drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).
radeon 0000:01:00.0: ib ring test failed (-35).
radeon 0000:01:00.0: GPU softreset: 0x00000019
radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA20034E0
radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000003
radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200000C0
radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x01000000
radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00001002
radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00028486
radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80838645
radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEF
radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA0003030
radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000003
radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200080C0
radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80100000
radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[drm] PCIE gen 2 link speeds already enabled
[drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
radeon 0000:01:00.0: WB enabled
radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff88007fd6ec00
[drm] ring test on 0 succeeded in 1 usecs
[drm] ib test on ring 0 succeeded in 0 usecs
[drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
switching from power state:
	ui class: none
	internal class: boot 
	caps: video 
	uvd    vclk: 0 dclk: 0
		power level 0    sclk: 72500 mclk: 40000 vddc: 1250
		power level 1    sclk: 72500 mclk: 40000 vddc: 1250
		power level 2    sclk: 72500 mclk: 40000 vddc: 1250
	status: c b 
switching to power state:
	ui class: performance
	internal class: none
	caps: single_disp video 
	uvd    vclk: 0 dclk: 0
		power level 0    sclk: 11000 mclk: 25200 vddc: 900
		power level 1    sclk: 30000 mclk: 35000 vddc: 1000
		power level 2    sclk: 72500 mclk: 40000 vddc: 1250
	status: r
Comment 237 Shawn Starr 2014-08-03 03:47:37 UTC
Can people try the following options:

Boot your kernel with:

radeon.runpm=1 radeon.dpm=1 [ radeon.hard_reset=1 optional one]
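If the machine boots through GRUB 2, one way to make these options persistent is the kernel command line in /etc/default/grub. This is a sketch, assuming GRUB 2 and that "quiet" stands in for whatever options the line already carries. Afterwards, regenerate the configuration with grub-mkconfig -o /boot/grub/grub.cfg (Arch) or update-grub (Debian/Ubuntu), and check /proc/cmdline after rebooting.

```sh
# /etc/default/grub (sketch, assuming GRUB 2): append the radeon options to
# whatever is already on this line; "quiet" stands in for existing options.
# radeon.hard_reset=1 is the optional one.
GRUB_CMDLINE_LINUX_DEFAULT="quiet radeon.runpm=1 radeon.dpm=1 radeon.hard_reset=1"
```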

(Get the correct PCI address from sysfs.)

echo high > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/power_dpm_force_performance_level
echo performance > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/power_dpm_state

It will run hotter, but is your GPU stable during use?

I've been running it this way now and no resets have occurred, rebooting works w/o issue also.
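The workaround above can be scripted so that the radeon card is found automatically instead of through a hard-coded PCI path. This is a minimal sketch, assuming the card's sysfs device directory has a driver symlink whose target ends in radeon (the usual sysfs layout). SYSFS_DRM, find_radeon_device and apply_high_performance are hypothetical names; SYSFS_DRM exists only so the script can be pointed at a test tree. Writing the real files requires root, and as noted later in the thread, forcing high this way is effectively the same as disabling dpm.

```sh
#!/bin/sh
# Sketch: find the radeon card and force its dpm level/state to high/performance.
# SYSFS_DRM is a hypothetical override (defaults to the real sysfs location).
SYSFS_DRM=${SYSFS_DRM:-/sys/class/drm}

# Print the device directory of the first DRM card bound to the radeon driver.
find_radeon_device() {
    for card in "$SYSFS_DRM"/card[0-9]*; do
        # Skip connector entries such as card0-LVDS-1.
        case "${card##*/}" in *-*) continue ;; esac
        # The driver symlink points at .../drivers/<name>; skip cards without one.
        driver=$(readlink "$card/device/driver") || continue
        if [ "${driver##*/}" = radeon ]; then
            echo "$card/device"
            return 0
        fi
    done
    return 1
}

# Apply the workaround: force level "high" and state "performance".
apply_high_performance() {
    dev=$(find_radeon_device) || { echo "no radeon card found" >&2; return 1; }
    echo high > "$dev/power_dpm_force_performance_level"
    echo performance > "$dev/power_dpm_state"
    # Read the values back so the change can be confirmed.
    cat "$dev/power_dpm_force_performance_level" "$dev/power_dpm_state"
}
```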
Comment 238 Paul Bodenbenner 2014-08-04 21:32:06 UTC
I am pretty sure that the problem which I encounter isn't related to a thermal problem. At the next hang I will check if I still can connect through ssh for checking logs.
Comment 239 Kajzer 2014-08-31 00:06:20 UTC
@Shawn Starr, 

I can confirm 100% that there are no freezes anymore with auto->high and balanced->performance.
I can tell that for sure because I was able to reproduce freeze every time.
With this there were no freezes, not once.

As for booting, I didn't try your boot options because I rarely reboot, and since applying these changes (no freezes) it doesn't crash on boot either.
Before those changes it used to crash on boot.
I guess one is related to the other, but I'm not sure about that, and I'm really not in the mood to reboot hundreds of times to see whether it's going to hang.
If it does, I might try the additional radeon kernel boot options.

BTW, there's no need to complicate the commands; I do it like this:

echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level

echo performance > /sys/class/drm/card0/device/power_dpm_state
Comment 240 Kajzer 2014-09-02 01:16:42 UTC
Update: I just had a crash on boot. I guess I'll try your boot options and see whether it happens again.
Comment 241 Alex Deucher 2014-09-08 02:11:10 UTC
(In reply to comment #239)
> echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level
> 
> echo performance > /sys/class/drm/card0/device/power_dpm_state


This is effectively the same as disabling dpm.
Comment 242 Kajzer 2014-09-09 17:32:21 UTC
(In reply to comment #241)
> (In reply to comment #239)
> > echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level
> > 
> > echo performance > /sys/class/drm/card0/device/power_dpm_state
> 
> 
> This is effectively the same as disabling dpm.

Hm, it doesn't appear so.
The only difference I can see from the default dpm values (auto, balanced) is that it runs 2 degrees Celsius hotter and there are no crashes.

Without dpm it's 25 degrees Celsius hotter, and that's just at idle.
Comment 243 Kajzer 2014-09-09 23:51:20 UTC
I mean, sure, it's not dynamic and it runs at max, but I'll take it.
It is what it is: on auto it freezes, and there's not much I can do about it.
This way I can switch manually from high to low when needed. Yeah, it's a pain, but at least it's working without crashes.
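The manual switching described above can be wrapped in a small helper. This is a sketch, assuming card0 is the radeon card (the path from comment 239) and that power_dpm_force_performance_level accepts only auto, low and high; DEVICE_DIR and set_dpm_level are hypothetical names. Only auto lets dpm move between power levels; low and high pin the card to one end.

```sh
#!/bin/sh
# Sketch: switch the forced dpm performance level by hand.
# DEVICE_DIR is a hypothetical override; card0 is assumed to be the radeon card.
DEVICE_DIR=${DEVICE_DIR:-/sys/class/drm/card0/device}

set_dpm_level() {
    case "$1" in
        auto|low|high) ;;  # the values the radeon driver accepts here
        *) echo "usage: set_dpm_level auto|low|high" >&2; return 2 ;;
    esac
    echo "$1" > "$DEVICE_DIR/power_dpm_force_performance_level" || return 1
    # Report the level now in effect.
    cat "$DEVICE_DIR/power_dpm_force_performance_level"
}
```

For example, run set_dpm_level high before starting a game and set_dpm_level low when idle; set_dpm_level auto hands control back to dpm, which is the mode that freezes on the affected systems in this thread.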
Comment 244 Alex Deucher 2014-09-10 20:13:22 UTC
Created attachment 106085
workaround for basic enablement

As per feedback from the last few comments, the attached patch forces the performance level to high rather than auto, which should fix the stability issues while still lowering power usage thanks to clockgating, etc., and enables dpm by default for rv6xx.
Comment 245 Kajzer 2014-09-11 13:21:36 UTC
(In reply to comment #244)
> Created attachment 106085
> workaround for basic enablement
> 
> As per feedback from the last few comments the attached patch forces the
> performance level to high rather than auto which should fix the stability
> issues and lower power usage due to clockgating, etc. and enables dpm by
> default for rv6xx.

That's great, thanks.
Comment 246 Mateusz Jończyk 2014-10-25 09:30:37 UTC
(In reply to Alex Deucher from comment #244)
> Created attachment 106085
> workaround for basic enablement
> 
> As per feedback from the last few comments the attached patch forces the
> performance level to high rather than auto which should fix the stability
> issues and lower power usage due to clockgating, etc. and enables dpm by
> default for rv6xx.

Is this patch going to be mainlined? (or was it mainlined already?)
Comment 247 Alex Deucher 2014-10-27 16:04:41 UTC
(In reply to Mateusz Jończyk from comment #246)
> (In reply to Alex Deucher from comment #244)
> > Created attachment 106085
> > workaround for basic enablement
> > 
> > As per feedback from the last few comments the attached patch forces the
> > performance level to high rather than auto which should fix the stability
> > issues and lower power usage due to clockgating, etc. and enables dpm by
> > default for rv6xx.
> 
> Is this patch going to be mainlined? (or was it mainlined already?)

Not until it's tested and it proves to work reliably on all the problematic systems here.
Comment 248 Laurento Frittella 2014-10-27 20:58:55 UTC
(In reply to Alex Deucher from comment #247)
> Not until it's tested and it proves to work reliably on all the problematic
> systems here.

Unfortunately I tried your patch on kernel 3.17.1 and it doesn't work, my system still hangs resuming after hibernate. Everything is working well with DPM disabled.

$ lspci | grep VGA
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV620/M82 [Mobility Radeon HD 3450/3470]

Please let me know if I can provide any other useful information.
Comment 249 Kajzer 2014-11-05 23:44:56 UTC
The patch is working fine; I've had zero issues.
However, it still sometimes hangs on boot. It doesn't actually hang: it looks like it has crashed for a couple of seconds, the monitor flashes on and off, and then it boots, but without dpm.
Comment 250 Kajzer 2014-12-30 00:58:59 UTC
With kernels 3.17 and 3.18 I get freezes while playing some games; they can happen anywhere between 10 minutes and 1 hour in.
The freezes occur only in games.
With kernel 3.16 everything is normal and there are no freezes.
Latest patch (auto->high and radeon.dpm=1) is used in each of those kernels.
Everything else is exactly the same, just booting with different kernel makes a difference.
Comment 251 Alex Deucher 2014-12-31 03:04:55 UTC
(In reply to Kajzer from comment #250)
> With kernels 3.17 and 3.18 I have freezes when playing some games, those
> freezes can happen somewhere between 10 minutes and 1 hour.
> Freeze occur only in games.
> With kernel 3.16 everything is normal and there are no freezes.
> Latest patch (auto->high and radeon.dpm=1) is used in each of those kernels.
> Everything else is exactly the same, just booting with different kernel
> makes a difference.

Can you bisect?
Comment 252 Kajzer 2015-01-01 01:27:44 UTC
(In reply to Alex Deucher from comment #251)
> (In reply to Kajzer from comment #250)
> > With kernels 3.17 and 3.18 I have freezes when playing some games, those
> > freezes can happen somewhere between 10 minutes and 1 hour.
> > Freeze occur only in games.
> > With kernel 3.16 everything is normal and there are no freezes.
> > Latest patch (auto->high and radeon.dpm=1) is used in each of those kernels.
> > Everything else is exactly the same, just booting with different kernel
> > makes a difference.
> 
> Can you bisect?

I can try; I've never done that.
I'm on Gentoo, and I guess I'll follow this guide for bisecting:
http://wiki.gentoo.org/wiki/Kernel_git-bisect

I'm using these kernel versions:
  [1]   linux-3.16.7-gentoo *
  [2]   linux-3.17.7-gentoo
  [3]   linux-3.18.1-gentoo

I'll mark 3.16.7 as good and 3.17.7 as bad. It might take some time, though, because there are lots of patches between those two versions. My guess is that 3.18 inherited whatever causes this bug from 3.17.
I'll try to do this ASAP, but it might take some time.
This line from that bisect guide scared me:
"Try to narrow the versions down as much as possible before starting the bisect, you might need to recompile the kernel a lot of times otherwise."

If you know a faster way or have any suggestions, please tell me.
It might be easier (for me) if I could see just the radeon changes (3.16 -> 3.17) somewhere and then compile the kernel with or without those changes until I find the culprit.
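One way to do what is described above is to restrict git bisect to the radeon directory, since git bisect start accepts a pathspec after "--". This is a sketch, assuming a mainline kernel git checkout and that the regression also reproduces between the mainline tags v3.16 and v3.17 (the stable tags v3.16.7 and v3.17.7 sit on separate stable branches). GOOD, BAD, list_radeon_changes and start_radeon_bisect are hypothetical names. Only commits touching drivers/gpu/drm/radeon are visited, so if the culprit lies elsewhere (for example in the DRM core), a path-limited bisect cannot find it and a full bisect is needed.

```sh
#!/bin/sh
# Sketch: bisect only the radeon driver changes between two mainline tags.
# Run from the top of a mainline kernel git checkout.
GOOD=${GOOD:-v3.16}
BAD=${BAD:-v3.17}
RADEON_PATH=drivers/gpu/drm/radeon

# List the commits in GOOD..BAD that touch the radeon driver.
list_radeon_changes() {
    git log --oneline "$GOOD..$BAD" -- "$RADEON_PATH"
}

# Start a bisect that only visits commits touching the radeon driver.
start_radeon_bisect() {
    git bisect start "$BAD" "$GOOD" -- "$RADEON_PATH"
}
```

After start_radeon_bisect, build and boot the kernel that git checks out, test for the freeze, then run git bisect good or git bisect bad; repeat until git names the first bad commit, and finish with git bisect reset.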
Comment 253 Alex Deucher 2015-02-21 15:13:41 UTC
*** Bug 89262 has been marked as a duplicate of this bug. ***
Comment 254 Alex Deucher 2015-02-24 14:42:00 UTC
*** Bug 89294 has been marked as a duplicate of this bug. ***
Comment 255 Maciej Gluszek 2015-02-28 15:28:25 UTC
*** Bug 89196 has been marked as a duplicate of this bug. ***
Comment 256 Guram Savinov 2015-04-04 17:57:58 UTC
What is the progress of this issue?

I have had this bug for a long time, ever since I turned DPM on for my HD3650 (RV635 chip).
My current kernel is 3.13.0-48-generic from Ubuntu.

A few days ago I set these two kernel parameters: radeon.hard_reset=1 radeon.lockup_timeout=20000, but I'm not sure that it helps.
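One way to confirm that such parameters actually took effect is to read them back from the module's parameter directory. This is a sketch, assuming the radeon module exposes dpm, hard_reset and lockup_timeout as readable files under /sys/module/radeon/parameters; PARAM_DIR and show_radeon_params are hypothetical names. It only shows which values are in use, not whether they help; a dpm value of -1 would mean the driver's automatic default.

```sh
#!/bin/sh
# Sketch: print the radeon module parameters discussed in this bug.
# PARAM_DIR is a hypothetical override (defaults to the real location).
PARAM_DIR=${PARAM_DIR:-/sys/module/radeon/parameters}

show_radeon_params() {
    for name in dpm hard_reset lockup_timeout; do
        if [ -r "$PARAM_DIR/$name" ]; then
            printf '%s=%s\n' "$name" "$(cat "$PARAM_DIR/$name")"
        else
            # Module not loaded, or this kernel lacks the parameter.
            printf '%s=<not available>\n' "$name"
        fi
    done
}
```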

I think the workaround of setting the high performance level is not a solution, because it is the same as turning off DPM entirely. People say this workaround doesn't make the GPU much hotter, but I don't think that's true for every graphics card.
The performance profile (I used profiles before DPM was developed) makes my HD3650 very hot, and the cooler buzzes loudly trying to bring the GPU temperature down.
Comment 257 Maciej Gluszek 2015-04-30 09:04:53 UTC
@Guram Savinov: I also tried setting DPM + high performance for my HD3650 and it worked but the fan was still running too loud (not overheating just loud).

Then I installed kernel 4.0, removed the "high performance" settings, and it works great now. I still enable DPM at boot, but no longer force the card to run at high performance. A couple of days and no lockups.

I'm on Ubuntu.
Comment 258 Guram Savinov 2015-04-30 22:34:38 UTC
(In reply to Maciej Gluszek from comment #257)
> @Guram Savinov: I also tried setting DPM + high performance for my HD3650
> and it worked but the fan was still running too loud (not overheating just
> loud).
> 
> Then i installed Kernel 4.0, removed "high performance" settings and it
> works great now. I still have DPM set during boot but no more setting the
> card to be running on high performance. Couple of days and no lockups.
> 
> I'm on Ubuntu.

Maybe turning off DPM (do not set radeon.dpm=1 in kernel parameters) is better than using DPM with high performance?

What Ubuntu release do you use? How did you install kernel 4.0?
Comment 259 Maciej Gluszek 2015-04-30 22:39:47 UTC
Sorry, I was wrong after all. After 2 days I got a lockup again and went back to the old method.

When I don't enable DPM at boot, the GPU overheats and the fan goes crazy. When the performance level is set to anything other than "high", lockups happen.

I'm on Ubuntu 14.04, using kernels from http://kernel.ubuntu.com/~kernel-ppa/mainline/
Comment 260 Guram Savinov 2015-05-01 18:35:52 UTC
I mean that we should try the two old power-saving methods, profiles and dynamic frequency switching: https://wiki.archlinux.org/index.php/ATI#Powersaving
I used to use profiles and didn't have any problems with the GUI, but the GPU was hotter than with DPM.
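For reference, the older profile-based method mentioned above is driven through two other sysfs files, power_method and power_profile, as described on the linked Arch wiki page. This is a sketch, assuming card0 is the radeon card and that the driver refuses the legacy methods while dpm is active (power_method then reads dpm), so it needs a boot with radeon.dpm=0. DEVICE_DIR and use_legacy_profile are hypothetical names.

```sh
#!/bin/sh
# Sketch: use the legacy "profile" power method instead of dpm.
# Assumes booting with radeon.dpm=0; DEVICE_DIR is a hypothetical override.
DEVICE_DIR=${DEVICE_DIR:-/sys/class/drm/card0/device}

use_legacy_profile() {
    profile=${1:-low}
    case "$profile" in
        default|auto|low|mid|high) ;;
        *) echo "unknown profile: $profile" >&2; return 2 ;;
    esac
    # With dpm active, power_method reads "dpm" and legacy methods are refused.
    if [ "$(cat "$DEVICE_DIR/power_method")" = dpm ]; then
        echo "dpm is active; boot with radeon.dpm=0 first" >&2
        return 1
    fi
    echo profile > "$DEVICE_DIR/power_method" || return 1
    echo "$profile" > "$DEVICE_DIR/power_profile"
}
```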
Comment 261 Armin Wehrfritz 2015-05-24 17:07:06 UTC
I just tested dpm for the radeon RV635 card in my ThinkPad T500 with the kernel 4.0.4-2.g4f5e0d5-desktop x86_64 under openSUSE 13.2. 
I should probably mention that I have not tried changing 'power_dpm_force_performance_level' or 'power_dpm_state', so both of them are at their defaults.

1) With radeon.dpm=1 the GPU is not initialised correctly due to some issue with UVD; here is the log:

[drm:uvd_v1_0_start [radeon]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[drm:uvd_v1_0_start [radeon]] *ERROR* UVD not responding, giving up!!!
[drm:r600_startup [radeon]] *ERROR* radeon: failed initializing UVD (-1).
radeon 0000:01:00.0: ring 0 stalled for more than 10020msec
radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000000 last fence id 0x0000000000000001 on ring 0)
[drm:r600_ib_test [radeon]] *ERROR* radeon: fence wait failed (-35).
[drm:radeon_ib_ring_tests [radeon]] *ERROR* radeon: failed testing IB on GFX ring (-35).
[drm:radeon_device_init [radeon]] *ERROR* ib ring test failed (-35).



2) With radeon.dpm=1 and radeon.hard_reset=1 the GPU is initialised correctly and the system boots up properly.
However, when I open google-chrome, the GPU locks up and the X server crashes; here is the log:

radeon 0000:01:00.0: ring 0 stalled for more than 10465msec
radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000009ca0 last fence id 0x0000000000009cc3 on ring 0)
radeon 0000:01:00.0: Saved 1113 dwords of commands on ring 0.
radeon 0000:01:00.0: GPU softreset: 0x00000008
radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA0003030
radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000003
radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200000C0
radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00020186
radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80028645
radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00004001
radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA0003030
radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000003
radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200080C0
radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80100000
radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
radeon 0000:01:00.0: GPU pci config reset
radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[drm:radeon_pm_resume [radeon]] *ERROR* radeon: dpm resume failed
radeon 0000:01:00.0: Wait for MC idle timedout !
radeon 0000:01:00.0: Wait for MC idle timedout !
[drm] PCIE GART of 512M enabled (table at 0x0000000000254000).
radeon 0000:01:00.0: WB enabled
radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000010000c00 and cpu addr 0xffff8800b4c0cc00
radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x00000000000521d0 and cpu addr 0xffffc900058921d0
[drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xFFFFFFFF)
[drm:r600_resume [radeon]] *ERROR* r600 startup failed on resume
[drm:radeon_pm_resume [radeon]] *ERROR* radeon: dpm resume failed
May 24 19:20:32 linux.site kernel: radeon 0000:01:00.0: ring 0 stalled for more than 10034msec
May 24 19:20:32 linux.site kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000009ca0 last fence id 0x0000000000009cc3 on ring 0)
...

I hope this helps to track down and fix this issue. If there is anything else I can test or help with, please let me know.

Cheers,
Armin
Comment 262 Armin Wehrfritz 2015-05-24 17:23:22 UTC
I just stumbled across this bug report, where similar lockups were reported for different GPUs:
https://bugzilla.kernel.org/show_bug.cgi?id=85421

From there I understood that the Mesa version may have an effect on this. Currently I use Mesa 10.3.7, so my question:
Could an upgrade to the latest stable Mesa help with these issues?
Comment 263 Kajzer 2015-07-02 18:39:26 UTC
(In reply to Alex Deucher from comment #251)
> (In reply to Kajzer from comment #250)
> > With kernels 3.17 and 3.18 I have freezes when playing some games, those
> > freezes can happen somewhere between 10 minutes and 1 hour.
> > Freeze occur only in games.
> > With kernel 3.16 everything is normal and there are no freezes.
> > Latest patch (auto->high and radeon.dpm=1) is used in each of those kernels.
> > Everything else is exactly the same, just booting with different kernel
> > makes a difference.
> 
> Can you bisect?

Sorry for the delay.
In the meantime I tried every later kernel version and this bug happened in each one.
The last known good kernel was 3.16.7.
Here's how I started the bisect:

git bisect start -- drivers/gpu/drm/radeon
git bisect good v3.16
git bisect bad v3.17

Results:

$ git bisect log
git bisect start '--' 'drivers/gpu/drm/radeon/'
# bad: [bfe01a5ba2490f299e1d2d5508cbbbadd897bbe9] Linux 3.17
git bisect bad bfe01a5ba2490f299e1d2d5508cbbbadd897bbe9
# good: [19583ca584d6f574384e17fe7613dfaeadcdc4a6] Linux 3.16
git bisect good 19583ca584d6f574384e17fe7613dfaeadcdc4a6
# good: [03f62abd112d5150b6ce8957fa85d4f6e85e357f] drm/radeon: split PT setup in more functions
git bisect good 03f62abd112d5150b6ce8957fa85d4f6e85e357f
# bad: [52da51f0f9ea9d213adfc99223630707b26d1d38] drm/radeon: fix active_cu mask on SI and CIK after re-init (v3)
git bisect bad 52da51f0f9ea9d213adfc99223630707b26d1d38
# good: [6e909f74db2aa9c5b5606b81efcbe18f2749b008] drm/radeon: add bapm module parameter
git bisect good 6e909f74db2aa9c5b5606b81efcbe18f2749b008
# good: [c8ad8b563c7e724e2fedc3aee5bcbd401668474c] drm/radeon: Remove duplicate include from Makefile
git bisect good c8ad8b563c7e724e2fedc3aee5bcbd401668474c
# good: [73ef0e0d62de4a8d40d34a6f645faee2f6e1ac33] drm/radeon: fix display handling in radeon_gpu_reset
git bisect good 73ef0e0d62de4a8d40d34a6f645faee2f6e1ac33
# good: [cd1c9c1a4b06d3bc264e774ad84c410ce02e124e] drm/radeon: re-enable selective GPUVM flushing
git bisect good cd1c9c1a4b06d3bc264e774ad84c410ce02e124e
# bad: [6101b3ae94b4f266456308824e9ca4eab1235d1a] drm/radeon: fix active cu count for SI and CIK
git bisect bad 6101b3ae94b4f266456308824e9ca4eab1235d1a
# first bad commit: [6101b3ae94b4f266456308824e9ca4eab1235d1a] drm/radeon: fix active cu count for SI and CIK
# bad: [6101b3ae94b4f266456308824e9ca4eab1235d1a] drm/radeon: fix active cu count for SI and CIK

git bisect visualize 
commit 6101b3ae94b4f266456308824e9ca4eab1235d1a
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Tue Aug 19 11:54:15 2014 -0400

    drm/radeon: fix active cu count for SI and CIK
    
    This fixes the CU count reported to userspace for
    OpenCL.
    
    bug:
    https://bugzilla.kernel.org/show_bug.cgi?id=82581
    
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
    Cc: stable@vger.kernel.org
Comment 264 Alex Deucher 2015-07-02 19:06:11 UTC
(In reply to Kajzer from comment #263)
> git bisect visualize 
> commit 6101b3ae94b4f266456308824e9ca4eab1235d1a
> Author: Alex Deucher <alexander.deucher@amd.com>
> Date:   Tue Aug 19 11:54:15 2014 -0400
> 
>     drm/radeon: fix active cu count for SI and CIK
>     
>     This fixes the CU count reported to userspace for
>     OpenCL.
>     
>     bug:
>     https://bugzilla.kernel.org/show_bug.cgi?id=82581
>     
>     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>     Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
>     Cc: stable@vger.kernel.org

Something must have gone wrong in your bisect.  That commit does not affect your chip.  Perhaps the bug is somewhat hard to reproduce which caused you to mis-mark some commits as good.
Comment 265 Kajzer 2015-07-02 20:46:12 UTC
(In reply to Alex Deucher from comment #264)
> (In reply to Kajzer from comment #263)
> > git bisect visualize 
> > commit 6101b3ae94b4f266456308824e9ca4eab1235d1a
> > Author: Alex Deucher <alexander.deucher@amd.com>
> > Date:   Tue Aug 19 11:54:15 2014 -0400
> > 
> >     drm/radeon: fix active cu count for SI and CIK
> >     
> >     This fixes the CU count reported to userspace for
> >     OpenCL.
> >     
> >     bug:
> >     https://bugzilla.kernel.org/show_bug.cgi?id=82581
> >     
> >     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> >     Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
> >     Cc: stable@vger.kernel.org
> 
> Something must have gone wrong in your bisect.  That commit does not affect
> your chip.  Perhaps the bug is somewhat hard to reproduce which caused you
> to mis-mark some commits as good.

That could be the case. The bug is hard to reproduce because it happens only in games and at random intervals, and you can't trigger it while idle. Sadly, I don't have much time for gaming now. I'll try it one more time, more thoroughly.
One question, though: if I'm 100% positive that v3.16 is good, should I skip bisect steps (mark them as good) if they end up as a 3.16.0+ image?
That way I would only test the v3.17 steps.
Out of 7 steps, 3 were from 3.17, so skipping the 3.16 steps could save a lot of time.
Comment 266 Alex Deucher 2015-07-02 20:48:32 UTC
(In reply to Kajzer from comment #265)
> That could be the case. The bug is hard to reproduce because it happens only
> in games and at random intervals, and you can't trigger it while idle.
> Sadly, I don't have much time for gaming now. I'll try it one more time, more
> thoroughly.
> One question, though: if I'm 100% positive that v3.16 is good, should I skip
> bisect steps (mark them as good) if they end up as a 3.16.0+ image?
> That way I would only test the v3.17 steps.
> Out of 7 steps, 3 were from 3.17, so skipping the 3.16 steps could save a
> lot of time.

No, you can't skip them.  3.16.0+ just means 3.16.0 plus additional commits on top of it.  One of those commits may be the problematic one.
Comment 267 Ilia Mirkin 2015-07-02 20:53:15 UTC
(In reply to Kajzer from comment #265)
> One question, though: if I'm 100% positive that v3.16 is good, should I skip
> bisect steps (mark them as good) if they end up as a 3.16.0+ image?

On the bright side, you can be sure that all the bad commits are bad, so you can just re-mark all of them as bad up front.
Comment 268 Kajzer 2015-07-02 21:29:27 UTC
(In reply to Alex Deucher from comment #266)
> No, you can't skip them.  3.16.0+ just means 3.16.0 plus additional commits
> on top of it.  One of those commits may be the problematic one.

Thanks, I get it now.
I'll do it right this time and it won't be long before I finish it.
Comment 269 Kajzer 2015-07-05 22:11:54 UTC
Done:

git bisect start '--' 'drivers/gpu/drm/radeon'
# good: [19583ca584d6f574384e17fe7613dfaeadcdc4a6] Linux 3.16
git bisect good 19583ca584d6f574384e17fe7613dfaeadcdc4a6
# bad: [bfe01a5ba2490f299e1d2d5508cbbbadd897bbe9] Linux 3.17
git bisect bad bfe01a5ba2490f299e1d2d5508cbbbadd897bbe9
# bad: [03f62abd112d5150b6ce8957fa85d4f6e85e357f] drm/radeon: split PT setup in more functions
git bisect bad 03f62abd112d5150b6ce8957fa85d4f6e85e357f
# bad: [391bfec33cd4e103274f197924d41ef648b849de] drm/radeon: remove visible vram size limit on bo allocation (v4)
git bisect bad 391bfec33cd4e103274f197924d41ef648b849de
# good: [da9976206c15178eeae1b4445c9266125bf35b0a] drm/radeon: enable display scaling on all connectors (v2)
git bisect good da9976206c15178eeae1b4445c9266125bf35b0a
# good: [380670aebfca998bb67b9cf05fc7f28ebeac4b18] drm/radeon: Demote 'BO allocation size too large' message to debug only
git bisect good 380670aebfca998bb67b9cf05fc7f28ebeac4b18
# bad: [02376d8282b88f07d0716da6155094c8760b1a13] drm/radeon: Allow write-combined CPU mappings of BOs in GTT (v2)
git bisect bad 02376d8282b88f07d0716da6155094c8760b1a13
# good: [77497f2735ad6e29c55475e15e9790dbfa2c2ef8] drm/radeon: Pass GART page flags to radeon_gart_set_page() explicitly
git bisect good 77497f2735ad6e29c55475e15e9790dbfa2c2ef8
# first bad commit: [02376d8282b88f07d0716da6155094c8760b1a13] drm/radeon: Allow write-combined CPU mappings of BOs in GTT (v2)

commit 02376d8282b88f07d0716da6155094c8760b1a13
Author: Michel Dänzer <michel.daenzer@amd.com>
Date:   Thu Jul 17 19:01:08 2014 +0900

    drm/radeon: Allow write-combined CPU mappings of BOs in GTT (v2)
    
    v2: fix rebase onto drm-fixes
    
    Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
    Reviewed-by: Christian König <christian.koenig@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Comment 270 Michel Dänzer 2015-07-06 03:09:38 UTC
Kajzer, I don't think this has anything to do with the original problem reported here anymore but should be tracked in a separate report.

(In reply to Kajzer from comment #269)
> # first bad commit: [02376d8282b88f07d0716da6155094c8760b1a13] drm/radeon:
> Allow write-combined CPU mappings of BOs in GTT (v2)
> 
> commit 02376d8282b88f07d0716da6155094c8760b1a13
> Author: Michel Dänzer <michel.daenzer@amd.com>
> Date:   Thu Jul 17 19:01:08 2014 +0900
> 
>     drm/radeon: Allow write-combined CPU mappings of BOs in GTT (v2)

Are you 100% (or at least 99.99...% :) sure that the problem doesn't happen without this commit?
Comment 271 Kajzer 2015-07-06 11:05:46 UTC
(In reply to Michel Dänzer from comment #270)
> Are you 100% (or at least 99.99...% :) sure that the problem doesn't happen
> without this commit?

No, I'm not. That's what the bisect gave me; I don't know which one it is.
But I'm 100% sure the problem doesn't happen with kernel 3.16.
Out of 7 bisect steps, 4 were bad and 3 were good.
Each bad step froze within an hour of gameplay; I marked the good ones as good only after 4+ hours of active play. In my experience with this bug, that's more than enough to call it good.
Comment 272 Kajzer 2015-07-06 11:15:03 UTC
(In reply to Michel Dänzer from comment #270)
> Kajzer, I don't think this has anything to do with the original problem
> reported here anymore but should be tracked in a separate report.

I missed this in my previous reply.
The problem manifests itself exactly like this dpm problem.
I suspect your commit has nothing to do with dpm, but it is what it is.
I don't know what else I can do to help with this.
Comment 273 Michel Dänzer 2015-07-07 02:33:49 UTC
(In reply to Kajzer from comment #272)
> The problem manifests itself exactly like this dpm problem.

The original description of this report says: "screen becomes blank after grub trying to boot it". Please file your own report.


> I don't know what else I can do to help with this.

Please run a kernel built from commit 77497f2735ad6e29c55475e15e9790dbfa2c2ef8 (the commit before 02376d8282b88f07d0716da6155094c8760b1a13) for at least a few days to make sure it doesn't happen with that.
Comment 274 Kajzer 2015-07-07 14:47:39 UTC
(In reply to Michel Dänzer from comment #273)
> The original description of this report says: "screen becomes blank after
> grub trying to boot it". Please file your own report.
 
When it happens, the screen freezes for a few seconds, then goes blank for a few seconds, then comes back with strange artifacts on it. The system is basically unresponsive; the only thing you can do is a hard reset.


> Please run a kernel built from commit
> 77497f2735ad6e29c55475e15e9790dbfa2c2ef8 (the commit before
> 02376d8282b88f07d0716da6155094c8760b1a13) for at least a few days to make
> sure it doesn't happen with that.

That's going to be a problem; I don't have that image anymore.

Anyway, it doesn't matter; it seems I'm the only one with this issue, or at least the only one reporting it. The main thing is that dpm finally works without problems with the profile set to high, except that (in my case) the problem comes back, but only with kernels 3.17 and above and only when playing games. It could be Mesa-related somehow, in combination with something introduced in kernel 3.17.
One thing I can say for sure: with kernels 3.16 and below the Mesa version doesn't matter. I tried every Mesa from 10.0 to 10.6 and this problem never happened.
It's something in the kernel, but I guess it's very tricky to determine exactly which commit. There are 57 commits between 3.16 and 3.17.
Comment 275 Nicola Mori 2015-07-07 16:12:55 UTC
(In reply to Kajzer from comment #274)
> Anyway, it doesn't matter; it seems I'm the only one with this issue, or
> at least the only one reporting it.

You are not the only one, Kajzer. I have exactly the same issue on my RV620, but I'm not able to bisect, nor do I know how to do it. I appreciate your efforts very much and I'm closely following your conversation with Michel hoping that in the end this nasty bug will be finally squashed. I was silent because comments like "me too" are useless, but this does not mean that nobody is interested in this issue (I'd say rather the opposite given the number of CC'ers).
Comment 276 Kajzer 2015-07-07 17:22:19 UTC
(In reply to Nicola Mori from comment #275)
> You are not the only one, Kajzer. I have exactly the same issue on my RV620,
> but I'm not able to bisect, nor do I know how to do it. I
> appreciate your efforts very much and I'm closely following your
> conversation with Michel hoping that in the end this nasty bug will be
> finally squashed. I was silent because comments like "me too" are useless,
> but this does not mean that nobody is interested in this issue (I'd say
> rather the opposite given the number of CC'ers).

I understand. Well, I'm glad I'm not alone; it was starting to feel weird, so I had basically decided to give up today. You gave me the strength to continue :)

@Michel Dänzer, I can do a bisect one more time, but I'm 99% sure I did it right.
If it's not a big problem, can you please tell me how I can either exclude your commit from 3.17 or add it to 3.16? Either way works, and then I can say definitively whether your commit was the one or not.
Comment 277 Rafał Miłecki 2015-07-07 17:47:54 UTC
(In reply to Kajzer from comment #276)
> I understand. Well, I'm glad I'm not alone; it was starting to feel weird, so
> I had basically decided to give up today. You gave me the strength to continue :)

I can assure you there are many more followers of your fight :) I wish you luck with it!


> @Michel Dänzer, I can do a bisect one more time, but I'm 99% sure I did it
> right.
> If it's not a big problem, can you please tell me how I can either exclude
> your commit from 3.17 or add it to 3.16? Either way works, and then I can
> say definitively whether your commit was the one or not.

Don't start the bisect again; just try the commit Michel mentioned.

git reset --hard 77497f2735ad6e29c55475e15e9790dbfa2c2ef8

Then compile the kernel, install it & test for a few days.
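Kajzer's other question in comment #276, how to exclude the suspect commit from 3.17 or add it alone to 3.16, is usually handled with `git revert` on top of the later release or `git cherry-pick` onto the earlier one. The sketch below is not from the thread: the function names are hypothetical, and on a real kernel tree either command may stop with conflicts that have to be resolved by hand before building.

```shell
#!/bin/sh
# Hypothetical helpers (not from the thread) for testing one suspect commit
# in isolation.

# without_commit BRANCH BASE COMMIT
# Creates BRANCH at BASE (for example v3.17) and reverts COMMIT on top of it.
without_commit() {
    git checkout -b "$1" "$2" || return 1
    # --no-edit keeps git's default "Revert ..." message. On conflicts git
    # stops with a non-zero status and leaves the tree for manual resolution.
    git revert --no-edit "$3"
}

# with_commit BRANCH BASE COMMIT
# Creates BRANCH at BASE (for example v3.16) and applies only COMMIT on top.
with_commit() {
    git checkout -b "$1" "$2" || return 1
    # Like revert, cherry-pick stops on conflicts and returns non-zero.
    git cherry-pick "$3"
}
```

For example, `without_commit test-revert v3.17 02376d8282b88f07d0716da6155094c8760b1a13` or `with_commit test-pick v3.16 02376d8282b88f07d0716da6155094c8760b1a13` in a kernel checkout. Whether the commit applies or reverts cleanly against those tags has not been checked; it may depend on neighbouring commits.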
Comment 278 Alex Deucher 2015-07-07 18:08:20 UTC
Be careful.  There are two issues here:
1. The general instability of dpm on r6xx (what this bug is about)
2. A potential additional dpm stability issue perhaps introduced by the bisected commit

Solving the second one won't necessarily help fix the first one.
Comment 279 Kajzer 2015-07-07 19:26:17 UTC
(In reply to Rafał Miłecki from comment #277) 
> Don't start the bisect again; just try the commit Michel mentioned.
> 
> git reset --hard 77497f2735ad6e29c55475e15e9790dbfa2c2ef8
> 
> Then compile the kernel, install it & test for a few days.

Will this work even if I no longer have the git directory from that bisect?
I mean, I'm on a clean system, starting all over again with:
$ git bisect start -- drivers/gpu/drm/radeon
$ git bisect good v3.16
$ git bisect bad v3.17
Bisecting: 57 revisions left to test after this (roughly 6 steps)
[03f62abd112d5150b6ce8957fa85d4f6e85e357f] drm/radeon: split PT setup in more functions
$ git reset --hard 77497f2735ad6e29c55475e15e9790dbfa2c2ef8
HEAD is now at 77497f2 drm/radeon: Pass GART page flags to radeon_gart_set_page() explicitly
Comment 280 Alex Deucher 2015-07-07 20:25:35 UTC
(In reply to Kajzer from comment #279)
> (In reply to Rafał Miłecki from comment #277) 
> > Don't start the bisect again; just try the commit Michel mentioned.
> > 
> > git reset --hard 77497f2735ad6e29c55475e15e9790dbfa2c2ef8
> > 
> > Then compile the kernel, install it & test for a few days.
> 
> Will this work even if I no longer have the git directory from that bisect?

yes.

> I mean, I'm on a clean system, starting all over again with:
> $ git bisect start -- drivers/gpu/drm/radeon
> $ git bisect good v3.16
> $ git bisect bad v3.17
> Bisecting: 57 revisions left to test after this (roughly 6 steps)
> [03f62abd112d5150b6ce8957fa85d4f6e85e357f] drm/radeon: split PT setup in
> more functions
> $ git reset --hard 77497f2735ad6e29c55475e15e9790dbfa2c2ef8
> HEAD is now at 77497f2 drm/radeon: Pass GART page flags to
> radeon_gart_set_page() explicitly

You don't need to start the bisect again.  `git bisect reset` will clean up the bisect and reset your current HEAD to where it was when you started the bisect.  At that point just run `git reset --hard 77497f2735ad6e29c55475e15e9790dbfa2c2ef8` or `git checkout -b testing 77497f2735ad6e29c55475e15e9790dbfa2c2ef8` to check out the specific commit you want to test.  The second method creates a new branch called testing with HEAD set to the specified commit.  The reset command resets the HEAD of the current tree to the specified commit.
Comment 281 Kajzer 2015-07-07 20:45:19 UTC
Nice, I'll start compiling now and will run that image for a few days.
Comment 282 Kajzer 2015-07-07 21:45:28 UTC
(In reply to Alex Deucher from comment #280)
> You don't need to start the bisect again.  `git bisect reset` will clean up
> the bisect and reset your current HEAD to where it was when you started the
> bisect.  At that point just run `git reset --hard
> 77497f2735ad6e29c55475e15e9790dbfa2c2ef8` or `git checkout -b testing
> 77497f2735ad6e29c55475e15e9790dbfa2c2ef8` to check out the specific commit
> you want to test.  The second method creates a new branch called testing
> with HEAD set to the specified commit.  The reset command resets the HEAD of
> the current tree to the specified commit.

I'm now running a kernel compiled with the first method. If the previous bisect was indeed done right, I don't expect a dpm freeze.

Anyway, thanks for the second method. If I understood you correctly, I can test whether Michel's commit is bad by running:
$ git bisect reset
$ git checkout -b testing 02376d8282b88f07d0716da6155094c8760b1a13

Of course, after I'm done with the current test.
Please correct me if I'm wrong.
Comment 283 Alex Deucher 2015-07-07 22:40:36 UTC
(In reply to Kajzer from comment #282)
> I'm now running a kernel compiled with the first method. If the previous
> bisect was indeed done right, I don't expect a dpm freeze.
> 
> Anyway, thanks for the second method. If I understood you correctly, I can
> test whether Michel's commit is bad by running:
> $ git bisect reset

You only need this if you had previously started a bisect (git bisect start).

> $ git checkout -b testing 02376d8282b88f07d0716da6155094c8760b1a13

That will create a new branch named testing.  If you already did that, you will need to delete the existing testing branch before you create another branch with the same name.  So either delete the branch when you are done with it (git branch -D testing, run from some other branch, since git won't delete the branch you are on) or use a new branch name (git checkout -b testing2 02376d8282b88f07d0716da6155094c8760b1a13).
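The workflow from the last few comments (end any bisect in progress, put the commit under test on a throwaway branch, delete that branch when done) can be wrapped in two small shell functions. This is a minimal sketch, not something posted in the thread: the function names are hypothetical, and building, installing and testing the kernel are left out.

```shell
#!/bin/sh
# Hypothetical helpers (not from the thread) wrapping the git steps described
# in comments 280 and 283 for testing one commit on a throwaway branch.

# start_test_branch BRANCH COMMIT
# Ends a bisect if one is in progress, then creates BRANCH at COMMIT and
# checks it out.
start_test_branch() {
    branch=$1
    commit=$2
    # `git bisect reset` is only needed after `git bisect start`; git keeps
    # a BISECT_START file in its directory while a bisect is running.
    if [ -f "$(git rev-parse --git-dir)/BISECT_START" ]; then
        git bisect reset || return 1
    fi
    # Fails if BRANCH already exists, as comment 283 points out.
    git checkout -b "$branch" "$commit"
}

# drop_test_branch BRANCH RETURN_TO
# git refuses to delete the branch that is currently checked out, so switch
# to RETURN_TO first, then force-delete BRANCH with -D.
drop_test_branch() {
    branch=$1
    return_to=$2
    git checkout "$return_to" || return 1
    git branch -D "$branch"
}
```

For example, `start_test_branch testing 02376d8282b88f07d0716da6155094c8760b1a13` in the kernel tree, and later `drop_test_branch testing master`, where `master` stands in for whichever branch you normally work on.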
Comment 284 Michel Dänzer 2015-07-08 01:54:27 UTC
(In reply to Kajzer from comment #274)
> (In reply to Michel Dänzer from comment #273)
> > The original description of this report says: "screen becomes blank after
> > grub trying to boot it". Please file your own report.
>  
> When it happens, the screen freezes for a few seconds, then goes blank for a
> few seconds, then comes back with strange artifacts on it. The system is
> basically unresponsive; the only thing you can do is a hard reset.

Those are standard symptoms of a GPU hang and failed reset. Those symptoms can be caused by an unlimited number of different problems. Since you're getting those symptoms under different circumstances than those described in this report, we have to assume it's a different problem.

Please don't make me ask you one more time to file your own report. Every comment we add here about your problem is cluttering up this report even more than it already is. Thanks for your understanding.
Comment 285 Kajzer 2015-07-08 12:20:15 UTC
(In reply to Michel Dänzer from comment #284)
> Those are standard symptoms of a GPU hang and failed reset. Those symptoms
> can be caused by an unlimited number of different problems. Since you're
> getting those symptoms under different circumstances than those described in
> this report, we have to assume it's a different problem.
> 
> Please don't make me ask you one more time to file your own report. Every
> comment we add here about your problem is cluttering up this report even
> more than it already is. Thanks for your understanding.

Sure, no problem.
I thought it was the same thing because it behaved like this problem.

Anyway, I made a new report; it's here: https://bugs.freedesktop.org/show_bug.cgi?id=91268
Comment 286 Alex Deucher 2015-10-26 20:00:56 UTC
*** Bug 92662 has been marked as a duplicate of this bug. ***
Comment 287 bugs.freedesktop.org 2015-11-06 02:18:29 UTC
The following commands from comment #239 seem to have corrected the issue on my Radeon HD3650 Mobility:

>> echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level
>> echo performance > /sys/class/drm/card0/device/power_dpm_state

This seems to be equivalent to the patch in comment #244. Any chance of getting this patch into the kernel?

If anyone is willing to write a fix, as opposed to a workaround, for this issue, I would be happy to test it on my device.
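The two commands quoted above can be wrapped in a small function that checks the sysfs files first instead of failing halfway. This is a hypothetical helper, not part of the kernel or any distribution; it assumes the radeon GPU is `card0` (on multi-GPU systems it may be `card1` or higher) and must be run as root to write the real sysfs files. Like the original commands, it is a workaround rather than a fix, and the setting does not survive a reboot.

```shell
#!/bin/sh
# Hypothetical helper (not part of the kernel or any distribution) applying
# the workaround from comment #239: force the highest dpm performance level.
# Assumption: the device directory defaults to card0; pass another one as $1.
force_dpm_high() {
    dev_dir=${1:-/sys/class/drm/card0/device}
    # Check both attributes up front so we never apply only half the workaround.
    for attr in power_dpm_force_performance_level power_dpm_state; do
        if [ ! -w "$dev_dir/$attr" ]; then
            echo "force_dpm_high: $dev_dir/$attr is missing or not writable" >&2
            return 1
        fi
    done
    # Same values as the commands in comment #239.
    echo high > "$dev_dir/power_dpm_force_performance_level" || return 1
    echo performance > "$dev_dir/power_dpm_state" || return 1
}
```

For example, as root: `force_dpm_high /sys/class/drm/card0/device`.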
Comment 288 Zetok 2016-09-01 20:52:37 UTC
Created attachment 126164 [details] [review]
fixes GPU freeze by reverting 02376d8282b88f07d0716da6155094c8760b1a13 on 4.6.3, tested with r9 290

(In reply to Kajzer from comment #269)
> Done :
> 
> git bisect start '--' 'drivers/gpu/drm/radeon'
> # good: [19583ca584d6f574384e17fe7613dfaeadcdc4a6] Linux 3.16
> git bisect good 19583ca584d6f574384e17fe7613dfaeadcdc4a6
> # bad: [bfe01a5ba2490f299e1d2d5508cbbbadd897bbe9] Linux 3.17
> git bisect bad bfe01a5ba2490f299e1d2d5508cbbbadd897bbe9
> # bad: [03f62abd112d5150b6ce8957fa85d4f6e85e357f] drm/radeon: split PT setup
> in more functions
> git bisect bad 03f62abd112d5150b6ce8957fa85d4f6e85e357f
> # bad: [391bfec33cd4e103274f197924d41ef648b849de] drm/radeon: remove visible
> vram size limit on bo allocation (v4)
> git bisect bad 391bfec33cd4e103274f197924d41ef648b849de
> # good: [da9976206c15178eeae1b4445c9266125bf35b0a] drm/radeon: enable
> display scaling on all connectors (v2)
> git bisect good da9976206c15178eeae1b4445c9266125bf35b0a
> # good: [380670aebfca998bb67b9cf05fc7f28ebeac4b18] drm/radeon: Demote 'BO
> allocation size too large' message to debug only
> git bisect good 380670aebfca998bb67b9cf05fc7f28ebeac4b18
> # bad: [02376d8282b88f07d0716da6155094c8760b1a13] drm/radeon: Allow
> write-combined CPU mappings of BOs in GTT (v2)
> git bisect bad 02376d8282b88f07d0716da6155094c8760b1a13
> # good: [77497f2735ad6e29c55475e15e9790dbfa2c2ef8] drm/radeon: Pass GART
> page flags to radeon_gart_set_page() explicitly
> git bisect good 77497f2735ad6e29c55475e15e9790dbfa2c2ef8
> # first bad commit: [02376d8282b88f07d0716da6155094c8760b1a13] drm/radeon:
> Allow write-combined CPU mappings of BOs in GTT (v2)
> 
> commit 02376d8282b88f07d0716da6155094c8760b1a13
> Author: Michel Dänzer <michel.daenzer@amd.com>
> Date:   Thu Jul 17 19:01:08 2014 +0900
> 
>     drm/radeon: Allow write-combined CPU mappings of BOs in GTT (v2)
>     
>     v2: fix rebase onto drm-fixes
>     
>     Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
>     Reviewed-by: Christian König <christian.koenig@amd.com>
>     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Thank you for the great work with bisecting!

My box has been having near-constant hangs while playing games on all the kernels I've used with my new GPU, an R9 290. The behavior seems to have gotten worse with newer Mesa/kernel versions, to the point where playing for just a few minutes could result in a "hard" system hang: black screen, no response to `reisub`. Or it could "just" result in a radeon driver crash… Well, from that it was at least possible to reboot with `reisub`. Not that it was a nice thing.

Anyway, the kernel on which it was usually reproducible within 30 minutes of play (quite often within 10 minutes) was 4.6.3.

I reverted 02376d8282b88f07d0716da6155094c8760b1a13 on a checkout of 4.6.3 and copied in my Gentoo kernel config. To my surprise, my resolution of the revert conflicts not only compiled but booted, and after a few (>5) hours of playing, I'm fairly sure that the revert makes the hangs disappear. Of course I'll keep "testing", but that's it for me today.

A slight note regarding performance: with the patch reverted, I've noticed occasional slight microstutters while playing, at most 0.2-0.3 s long. They are barely noticeable and not a problem, given that they happen rarely and the box finally isn't freezing into oblivion when playing games.


Anyway, there are 2 problems I have with the GPU: one is that resuming DPM fails, which causes a slight (10 s) freeze; the other, introduced by the reverted patch, is a freeze/crash once resuming DPM fails. The freeze/crash is gone after the revert, but resuming DPM still fails. Given that the freeze/crash is gone, I don't really care, though.

I'm attaching the patch that reverts 02376d8282b88f07d0716da6155094c8760b1a13 for 4.6.3.

I'll also attach the dmesg output without the commit reverted, where it crashed(?), and the dmesg output with the commit reverted, where resuming DPM still fails but the freeze/crash is gone.

Note that I don't know C, and I have no idea what reverting the patch actually does (aside from fixing stuff for me).



Btw,

(In reply to bugs.freedesktop.org from comment #287)
> The following commands from comment #239 seem to have corrected the issue
> on my Radeon HD3650 Mobility:
> 
> >> echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level
> >> echo performance > /sys/class/drm/card0/device/power_dpm_state
> 
> This seems to be equivalent to the patch in comment #244. Any chance of
> getting this patch into the kernel?
> 
> If anyone is willing to write a fix, as opposed to a workaround, for this
> issue, I would be happy to test it on my device.

Yeah, no, that didn't help a bit.
Comment 289 Zetok 2016-09-01 20:57:42 UTC
Created attachment 126165 [details]
dmesg without reverted 02376d8282b88f07d0716da6155094c8760b1a13 on 4.6.3
Comment 290 Zetok 2016-09-01 21:01:06 UTC
Created attachment 126166 [details]
dmesg with reverted 02376d8282b88f07d0716da6155094c8760b1a13 on 4.6.3
Comment 291 Weber K. 2016-09-20 01:29:49 UTC
Hi!

I have an HD 6850 and kernel 4.4.14.

This problem appeared for me when I changed rootflags.
Solved with rootflags=relatime,lazytime,commit=60 in kernel parameters.

HTH

Best regards
Weber Kai
Comment 292 Weber K. 2016-09-20 01:45:55 UTC
(In reply to Weber K. from comment #291)
> Hi!
> 
> I have an HD 6850 and kernel 4.4.14.
> 
> This problem appeared for me when I changed rootflags.
> Solved with rootflags=relatime,lazytime,commit=60 in kernel parameters.
> 
> HTH
> 
> Best regards
> Weber Kai

Forgot to mention: I also set relatime,lazytime,commit=60 in fstab.
I believe dpm may need some fs information to work well.
Comment 293 Armin Wehrfritz 2016-12-27 15:54:34 UTC
I have been running an AMD/ATI RV635 with dpm enabled for over a day now without any issues. I also suspended (to memory) and resumed the system several times without problems.

My system informations:
System:        Lenovo Thinkpad T500
Graphics card: Mobility Radeon HD 3650 (RV635/M86)
OS:            openSUSE Leap 42.2 
Kernel:        4.4.36-8-default
OpenGL renderer string: Gallium 0.4 on AMD RV635 (DRM 2.43.0, LLVM 3.8.0)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 11.2.2

As you can see from my earlier comment (https://bugs.freedesktop.org/show_bug.cgi?id=66963#c261), I had been testing various dpm configurations on the same system without much success. However, with the updated stack this issue apparently doesn't appear anymore. I will do some more testing to further verify this.

Please let me know if you need any further information.

Cheers,
Armin
Comment 294 Kyle K 2017-04-21 07:23:18 UTC
Hello,

I have been having a problem with my HD6870 for some time. I experienced it on 4.7/4.8/4.9 kernels (haven't tested earlier). My HD6870 works great out of the box with DPM enabled; I do not have any issues in general use.

I do have a problem with applications that put a heavier load on the GPU, for example MPV with hw acceleration enabled. My R600 tries to upclock from its 300 MHz default base clock but fails with:
Apr 21 00:34:22 xenon kernel: [drm:btc_dpm_set_power_state [radeon]] *ERROR* rv770_restrict_performance_levels_before_switch failed
Apr 21 00:34:33 xenon kernel: [drm:btc_dpm_set_power_state [radeon]] *ERROR* rv770_restrict_performance_levels_before_switch failed

I can reproduce this every time with:
$ mpv --hwdec=vdpau --vo=vdpau <foo.mkv>

Video playback continues, but as you would imagine, it is very choppy.

Attempting to use the VAAPI pipeline for hw decoding results in a hard system crash with a blank screen after just a few seconds of playback. I can always reproduce it with:
$ mpv --hwdec=vaapi --vo=vaapi <foo.mkv>


I'm booting with radeon.dpm=0 as a workaround: vdpau hw decoding is butter smooth and playback in mpv is very reliable, but temps and noise are not as great :(


I used to run Arch on the latest 4.10 kernel, but recently I've been playing around with Solus, so I'm on 4.9.22. Here are the stock settings with DPM enabled on the Solus system where I have this problem:

root@xenon /home/kyle # cat /proc/version 
Linux version 4.9.22-17.lts (root@solus-build-server) (gcc version 6.3.0 (Solus) ) #1 SMP Sat Apr 15 06:05:30 UTC 2017

root@xenon /home/kyle # grep . /sys/module/radeon/parameters/*
/sys/module/radeon/parameters/agpmode:0
/sys/module/radeon/parameters/aspm:-1
/sys/module/radeon/parameters/audio:-1
/sys/module/radeon/parameters/auxch:-1
/sys/module/radeon/parameters/backlight:-1
/sys/module/radeon/parameters/bapm:-1
/sys/module/radeon/parameters/benchmark:0
/sys/module/radeon/parameters/connector_table:0
/sys/module/radeon/parameters/deep_color:0
/sys/module/radeon/parameters/disp_priority:0
/sys/module/radeon/parameters/dpm:-1
/sys/module/radeon/parameters/dynclks:-1
/sys/module/radeon/parameters/fastfb:0
/sys/module/radeon/parameters/gartsize:1024
/sys/module/radeon/parameters/hard_reset:0
/sys/module/radeon/parameters/hw_i2c:0
/sys/module/radeon/parameters/lockup_timeout:10000
/sys/module/radeon/parameters/modeset:1
/sys/module/radeon/parameters/msi:-1
/sys/module/radeon/parameters/mst:0
/sys/module/radeon/parameters/no_wb:0
/sys/module/radeon/parameters/pcie_gen2:-1
/sys/module/radeon/parameters/r4xx_atom:0
/sys/module/radeon/parameters/runpm:-1
/sys/module/radeon/parameters/test:0
/sys/module/radeon/parameters/tv:1
/sys/module/radeon/parameters/use_pflipirq:2
/sys/module/radeon/parameters/uvd:1
/sys/module/radeon/parameters/vce:1
/sys/module/radeon/parameters/vm_block_size:12
/sys/module/radeon/parameters/vm_size:8
/sys/module/radeon/parameters/vramlimit:0

root@xenon /home/kyle # md5sum /lib/firmware/radeon/R600_*
f2432caf487c4b586a2c391435f3749c  /lib/firmware/radeon/R600_me.bin
448dbf1df580c31a0e55de22bb076be3  /lib/firmware/radeon/R600_pfp.bin
f74a5163948bde215be6b689ca24afde  /lib/firmware/radeon/R600_rlc.bin
9bc76ae83f9326debf728f98803a7e11  /lib/firmware/radeon/R600_uvd.bin

libdrm, version: 2.4.76, release: 16



Alex, I do not know if you're still keeping an eye on this bug, but your attention and effort would be hugely appreciated!

Thank you
Comment 295 Alex Deucher 2017-04-21 13:18:48 UTC
(In reply to Kyle K from comment #294)
> Hello,
> 
> I have been having problem with my HD6870 for some time. I experienced it on
> 4.7/4.8/4.9 kernels (haven't tested earlier). My HD6870 works great out of
> the box with DPM enabled, I do not have any issues for general use.

Please file your own bug.  This bug is for dpm on r6xx hardware.  Your hardware is NI.
Comment 296 Mihai Coman 2017-11-08 00:32:43 UTC
Hello,

On a whim I've decided to try radeon.dpm=1 again. I've been running it for the past couple of days and I haven't gotten any lockups, whereas I used to get them within the first 30 minutes. The temperature drop is considerable, from 10 to 20 degrees C.

Could anyone else still running this old hardware try it again to confirm?

I'm running:
Linux icxbox 4.13.11-1-ARCH
Mobility Radeon HD 3650 on HP EliteBook 8530w.
Comment 297 Nicola Mori 2017-11-08 17:50:36 UTC
@Mihai I've been using dpm for years on my Mobility 3470 and only very rarely experience random freezes. The only thing that's really annoying for me is this:

  https://bugs.freedesktop.org/show_bug.cgi?id=94933

but it seems that it's not going to be fixed. In fact, all the remaining r6xx dpm bugs will likely stay here forever, given that AMD developers' focus has shifted toward newer architectures, and perhaps given the lack of hardware for testing. Many dpm bug report threads end when the reporter says he's using dpm, with developers suddenly ceasing to reply. This sounds a lot like a "WONTFIX" to me, and I perfectly understand the devs.
Comment 298 Martin Peres 2019-11-19 08:36:07 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/334.

