Bug 38488 - AMD Radeon HD 6950 (Cayman): Power profile has no effect after resume from hibernation
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
 
Reported: 2011-06-20 01:32 UTC by Harald Judt
Modified: 2011-11-11 13:53 UTC (History)

Attachments
Sapphire Radeon HD6950 2GiB Video BIOS (63.00 KB, application/octet-stream)
2011-06-23 11:18 UTC, Harald Judt

Description Harald Judt 2011-06-20 01:32:26 UTC
Power profile does not work after resuming.

Steps to reproduce:

1) Boot system, set power profile to low.

$ cat /sys/kernel/debug/dri/64/radeon_pm_info
default engine clock: 800000 kHz
current engine clock: 249990 kHz
default memory clock: 1250000 kHz
current memory clock: 150000 kHz
voltage: 900 mV

2) Hibernate system and resume.

Expected results:

* Power profile still set to low, clocks stay at low level.

* Power profile can be changed, and changes have desired effects.

Actual results:

* Power profile still shows "low", but according to debugfs clocks are at default levels:

    default engine clock: 800000 kHz
    current engine clock: 799940 kHz
    default memory clock: 1250000 kHz
    current memory clock: 1250000 kHz
    voltage: 65281 mV

The voltage reading also looks wrong compared with the 900 mV above.

* Power profile can be changed, but clocks will not change.

* Fan is spinning louder due to the higher (default instead of low level) clock speeds.
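
The comparison between the two radeon_pm_info dumps can also be scripted. The following is a minimal sketch and not part of the original report: a hypothetical helper, clock_state, reads radeon_pm_info text on standard input and prints "default" when both current clocks are within an assumed tolerance of 1000 kHz of the default clocks, and "reduced" otherwise. The function name and the tolerance are illustrative assumptions; the debugfs path in the usage note is the one quoted in this report and may differ on other systems.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): classify radeon_pm_info output.
# Reads the text on stdin; prints "default", "reduced", or "unknown".
clock_state() {
    awk '
        function absdiff(a, b) { return a > b ? a - b : b - a }
        BEGIN { tol = 1000 }   # assumed tolerance in kHz
        /default engine clock:/ { de = $4 + 0; seen++ }
        /current engine clock:/ { ce = $4 + 0; seen++ }
        /default memory clock:/ { dm = $4 + 0; seen++ }
        /current memory clock:/ { cm = $4 + 0; seen++ }
        END {
            # All four clock lines are required for a verdict.
            if (seen < 4) { print "unknown"; exit 2 }
            if (absdiff(de, ce) <= tol && absdiff(dm, cm) <= tol) {
                print "default"
            } else {
                print "reduced"
            }
        }
    '
}
```

Usage, as root with debugfs mounted: clock_state < /sys/kernel/debug/dri/64/radeon_pm_info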

I have to admit that I applied a self-modified tuxonice patch to linux-3.0-rc3. However, I use exactly the same setup on a laptop at work (ATI Mobility Radeon HD 3400) and everything works correctly there, so I don't believe I messed up. Cayman support is still quite new...

Is there another way to change the clock speeds?
Comment 1 Roland Scheidegger 2011-06-22 09:24:37 UTC
Are you sure you can't change clocks after suspend?
I know that after a GPU crash the clocks will be at default while the profile still incorrectly shows "low"; the same could happen after resume. In that case, to actually get it to switch to low again you first need to switch it to mid/high.
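
The profile cycling described above can be sketched as follows. This is a minimal, hedged example rather than a confirmed fix: it assumes the radeon driver's power_profile attribute lives at /sys/class/drm/card0/device/power_profile (the card index is an assumption) and that power_method is already set to "profile". The function name cycle_profile is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: force a profile change by going through a
# different profile first, then print what the attribute reports.
# $1: path to the power_profile attribute (assumed default below)
# $2: target profile (low, mid, high); defaults to low
cycle_profile() {
    profile_file=${1:-/sys/class/drm/card0/device/power_profile}
    target=${2:-low}

    # Pick an intermediate profile that differs from the target.
    if [ "$target" = "mid" ]; then
        via=high
    else
        via=mid
    fi

    echo "$via" > "$profile_file" || return 1
    echo "$target" > "$profile_file" || return 1
    cat "$profile_file"
}
```

Usage, as root: cycle_profile /sys/class/drm/card0/device/power_profile low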
Comment 2 Harald Judt 2011-06-22 15:19:54 UTC
Thanks for your response. Yes, I'm sure. I've already tried what you suggested; it makes no difference. Additionally, the display does not flicker when changing the power profile after resume, whereas it did before hibernation.

low:
default engine clock: 800000 kHz
current engine clock: 249990 kHz
default memory clock: 1250000 kHz
current memory clock: 150000 kHz
voltage: 900 mV

medium:
default engine clock: 800000 kHz
current engine clock: 499990 kHz
default memory clock: 1250000 kHz
current memory clock: 1250000 kHz
voltage: 1000 mV

default:
default engine clock: 800000 kHz
current engine clock: 799940 kHz
default memory clock: 1250000 kHz
current memory clock: 1250000 kHz
voltage: 1000 mV

high:
Not possible, machine freezes completely. Not even magic sysreq keys work anymore, hard reset required.

Is there any other way to change the clock speeds, without using power_profile?

As explained before, something seems to get messed up by standby/resume (see the strange voltage readings in my initial report). This *could* be caused by my self-modified tuxonice patch. I will have to test hibernation with a vanilla kernel to rule out this possibility.

On the other hand, power_profile 'high' doesn't work at all, and I don't think this has anything to do with suspend/hibernation or the tuxonice patch. Shall I open another bug report for this?

I understand that cayman support is still in development.
Comment 3 Alex Deucher 2011-06-22 15:26:04 UTC
The funny voltage value isn't a real value; it's a flag for the driver. This patch should fix that issue:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a377e187df725fe7e62d2cec59ec290c5a605d93
Comment 4 Harald Judt 2011-06-23 02:36:33 UTC
> The funny voltage value isn't a real value it's a flag for the driver.
> This patch should fix up that issue:
[...]

I had this patch already applied, but this seems to fix a different issue not related to resuming.

I reproduced the problem on git vanilla kernel now (bccaeafd7c117acee36e90d37c7e05c19be9e7bf) using in-kernel suspend. I didn't check whether this has your patch already applied, but I don't think it would change anything here.

So there are two things to conclude:
* The strange voltage reading and the power_profile malfunction are both caused by doing hibernate/resume.
* The self-modified tuxonice patch is not the culprit, as the symptoms are the same with in-kernel suspend.

Therefore, something is not right with the cayman power management code.

Would it help if I provide drm.debug information? Any special parameters required for drm.debug?
Comment 5 Alex Deucher 2011-06-23 08:00:34 UTC
Please attach a copy of your vbios.
(as root)
(use lspci to get the bus id)
cd /sys/bus/pci/devices/<pci bus id>
echo 1 > rom
cat rom > /tmp/vbios.rom
echo 0 > rom
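
The same steps can be wrapped in a small function. This is a minimal sketch: the function name dump_vbios is hypothetical, and the example device directory assumes PCI domain 0000, which may not match every system. It must be run as root against a real device directory.

```shell
#!/bin/sh
# Hypothetical helper: dump the video BIOS through the PCI "rom" attribute.
# $1: device directory, e.g. /sys/bus/pci/devices/0000:01:00.0 (assumed)
# $2: output file; defaults to /tmp/vbios.rom
dump_vbios() {
    dev_dir=$1
    rom_out=${2:-/tmp/vbios.rom}

    [ -e "$dev_dir/rom" ] || {
        echo "no rom attribute in $dev_dir" >&2
        return 1
    }

    echo 1 > "$dev_dir/rom" || return 1   # enable reading the ROM
    cat "$dev_dir/rom" > "$rom_out"
    status=$?
    echo 0 > "$dev_dir/rom"               # disable it again in any case
    return "$status"
}
```

Usage: dump_vbios /sys/bus/pci/devices/0000:01:00.0 /tmp/vbios.rom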
Comment 6 Harald Judt 2011-06-23 11:18:38 UTC
Created attachment 48350
Sapphire Radeon HD6950 2GiB Video BIOS

lspci:

01:00.0 VGA compatible controller: ATI Technologies Inc Device 6719 (prog-if 00 [VGA controller])
	Subsystem: ATI Technologies Inc Device 0b00
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 54
	Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M]
	Region 2: Memory at fe620000 (64-bit, non-prefetchable) [size=128K]
	Region 4: I/O ports at e000 [size=256]
	Expansion ROM at fe600000 [disabled] [size=128K]
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <64ns, L1 <1us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee0100c  Data: 41c9
	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Kernel driver in use: radeon
Comment 7 Harald Judt 2011-06-23 11:21:19 UTC
A strange thing: Accidentally, I put the computer into suspend mode instead of hibernation. Resume was successful, and the power profile and clock readings were ok this time.

So suspend-to-ram seems to work correctly, while suspend-to-disk does not.
Comment 8 Alex Deucher 2011-06-23 12:03:35 UTC
(In reply to comment #7)
> So suspend-to-ram seems to work correctly, while suspend-to-disk does not.

Strange.  The driver doesn't differentiate between the two.
Comment 9 Harald Judt 2011-06-23 12:55:28 UTC
Yes, but it gets even stranger:

Do hibernate & resume --> power_profile does not work, radeon_pm_info messed up. *Now* do suspend & resume --> power_profile works again, radeon_pm_info too! So what can be wrong here?
Comment 10 Harald Judt 2011-06-28 11:24:32 UTC
I've updated the kernel to current tuxonice-head which is somewhere in between linux-3.0-rc4 and linux-3.0-rc5, and the strange voltage issue has been fixed. Resuming from hibernation still produces the same problem with not being able to change the clock speeds, though.

Furthermore, the machine will not awake correctly after the second or third suspend attempt; the screen stays black.
Comment 11 Harald Judt 2011-07-30 04:27:57 UTC
Ok, another update, including some positive results:

Compiled updated kernel based on 3.0 final:
http://git.kernel.org/?p=linux/kernel/git/nigelc/tuxonice-head.git;a=summary
merged branch git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6.git drm-core-next @5a96a899bbdee86024ab9ea6d02b9e242faacbed

* Hibernation & resume still does not work correctly, leaving the power profiles non-functional

* Suspend & resume now works reliably and can be performed successfully multiple times after hibernation & resume, fixing the power profile issue caused by resume from hibernation

> Strange.  The driver doesn't differentiate between the two.

Do you think it's a bug in the general hibernation / resume code, meaning I should ask somewhere else? The argument against this is that the same config works on a laptop (ATI Mobility Radeon HD 3400) where there are no such issues, hence I concluded that this *might* be specific to HD6950/Cayman.
Comment 12 Harald Judt 2011-09-18 06:02:30 UTC
I'm still using 3.1-rc2, but this issue seems solved after applying some patches I grabbed from the DRI mailing list, though I don't know exactly which one(s).

So everything looks fine now, but if the problem returns when 3.1 gets released, I will report back here. For the time being, consider it solved.

Thank you.
Comment 13 Harald Judt 2011-11-11 13:53:00 UTC
Definitely solved in kernel-3.2-rc1, therefore setting this to RESOLVED FIXED.