Bug 110865 - RX 480 consumes 20 W more power at idle on Linux than under Windows
Summary: RX 480 consumes 20 W more power at idle on Linux than under Windows
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: DRI git
Hardware: Other All
Importance: medium enhancement
Assignee: Default DRI bug account
Reported: 2019-06-09 13:22 UTC by Martin
Modified: 2019-08-20 13:41 UTC

Attachments
logfiles as requested in the amd bugreport guide (372.83 KB, text/x-log)
2019-06-09 13:22 UTC, Martin
possible fix (1.48 KB, patch)
2019-08-08 05:49 UTC, Alex Deucher
fix DC code (2.31 KB, patch)
2019-08-08 14:08 UTC, Alex Deucher

Description Martin 2019-06-09 13:22:01 UTC
Created attachment 144485
logfiles as requested in the amd bugreport guide

First, I am not sure where to file this bug, so please be gentle with me if I selected the wrong component.

For a while I have noticed higher temperatures on my video card when my PC is just idling in GNOME. Digging deeper, I found that my "zero fan" video card does not stop its fan when I run Linux.

So I ran this command:
watch -n 0.5 cat /sys/kernel/debug/dri/0/amdgpu_pm_info
It showed that the MCLK does not clock down to 300 MHz as it does under Windows 10:
GFX Clocks and Power:
	2000 MHz (MCLK)
	300 MHz (SCLK)
	300 MHz (PSTATE_SCLK)
	300 MHz (PSTATE_MCLK)
	1000 mV (VDDGFX)
	24.75 W (average GPU)

GPU Temperature: 45 C
GPU Load: 0 %

I have a multi-monitor setup with two 1920x1200 screens. Under Windows 10, the MCLK does not go above 300 MHz when the desktop is idle (measured with HWMonitor).
When I power off one screen under Linux, the average GPU power drops to 8-10 W and the MCLK drops to 300 MHz. So the card can clock down, but something in the driver or configuration seems to prevent it?
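
The readings above can also be checked by a script rather than by eye. A minimal sketch in Python, assuming the amdgpu_pm_info layout shown in this report; the function names parse_pm_info and mclk_stuck_high are made up for illustration, and the 300 MHz idle value is the one observed under Windows 10:

```python
import re

# Sample copied from the report; on a live system the same text comes from
# /sys/kernel/debug/dri/0/amdgpu_pm_info (debugfs, normally readable by root only).
SAMPLE = """\
GFX Clocks and Power:
    2000 MHz (MCLK)
    300 MHz (SCLK)
    300 MHz (PSTATE_SCLK)
    300 MHz (PSTATE_MCLK)
    1000 mV (VDDGFX)
    24.75 W (average GPU)

GPU Temperature: 45 C
GPU Load: 0 %
"""

# Matches lines such as "2000 MHz (MCLK)" or "24.75 W (average GPU)".
READING = re.compile(r"([\d.]+)\s+(MHz|mV|W)\s+\(([^)]+)\)")


def parse_pm_info(text):
    """Return {label: (value, unit)} for every "<value> <unit> (<label>)" line."""
    return {label: (float(value), unit)
            for value, unit, label in READING.findall(text)}


def mclk_stuck_high(readings, idle_mclk_mhz=300.0):
    """True if the memory clock is above the expected idle clock (an assumption)."""
    return readings["MCLK"][0] > idle_mclk_mhz


readings = parse_pm_info(SAMPLE)
print(readings["MCLK"], readings["average GPU"], mclk_stuck_high(readings))
```

To use it on a live system, the text passed to parse_pm_info would be read from the debugfs file instead of SAMPLE.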

I followed AMD's bug report guide:
https://www.amd.com/en/support/kb/faq/amdgpu-installation#faq-Reporting-Bugs
and attached several log files.
Comment 1 Martin 2019-08-07 09:53:21 UTC
My bug is now two months old. Do you need more information, or is there something else I can do to get your attention?

I think this is a serious issue because it seems to affect many, maybe even all, Polaris cards (I tested two more in the last few weeks).

Shouldn't it be a priority to stop wasting so much energy?
Comment 2 Alex Deucher 2019-08-07 14:35:00 UTC
This is the expected behavior for multiple monitors on Linux.  mclk switching must happen during the monitors' blanking periods.  Since the blanking periods of the monitors likely don't align, especially if the monitors have different timings, we have to use a fixed mclk.  The DC modesetting code can lock the timing of multiple monitors if they are using the exact same timing so that the blanking periods align, but I don't think the Linux power management code takes this into account at the moment.
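
The rule described above can be written down as a small decision function. This is only a toy model in Python of the explanation in this comment, not the kernel logic; ModeTiming, mclk_switching_allowed, and the displays_synced parameter are hypothetical names, and the timing numbers are illustrative:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeTiming:
    """The parts of a display mode that determine when blanking occurs."""
    pixel_clock_khz: int
    htotal: int
    vtotal: int


def mclk_switching_allowed(timings, displays_synced):
    """Toy model: may the memory clock change while these displays are lit?

    timings         -- one ModeTiming per active display
    displays_synced -- whether the modesetting code locked the displays
                       together and the power management code knows about it
    """
    if len(timings) <= 1:
        # Zero or one display: there is only one blanking period to hit.
        return True
    identical = all(t == timings[0] for t in timings[1:])
    # Several displays: blanking only lines up if the timings are exactly
    # the same and the displays are actually locked together.
    return identical and displays_synced


# Illustrative timings (assumed, not taken from the reporter's monitors).
a = ModeTiming(pixel_clock_khz=154000, htotal=2080, vtotal=1235)
b = ModeTiming(pixel_clock_khz=154000, htotal=2080, vtotal=1235)
c = ModeTiming(pixel_clock_khz=193250, htotal=2592, vtotal=1245)

print(mclk_switching_allowed([a], displays_synced=False))
print(mclk_switching_allowed([a, b], displays_synced=True))
print(mclk_switching_allowed([a, b], displays_synced=False))
print(mclk_switching_allowed([a, c], displays_synced=True))
```

In this model the third call corresponds to the situation described in the comment: identical timings, but nothing tells the power management side that the displays are in sync, so the memory clock stays fixed.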
Comment 3 Martin 2019-08-07 15:16:01 UTC
Thank you for your explanation.
How do I find out the blanking periods?
Comment 4 Alex Deucher 2019-08-08 05:46:59 UTC
(In reply to Martin from comment #3)
> Thank you for your explanation.
> How do I find out the blanking periods?

They are based on the timing for the mode on the display.  As for the relevant driver code, take a look at smu7_apply_state_adjust_rules().
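
As a worked example of how a blanking period follows from the mode timing: the vertical blanking interval is the number of non-visible lines multiplied by the time per line. A minimal sketch in Python; the function names are made up, and the numbers are the commonly used reduced-blanking timing for 1920x1200 at 60 Hz, assumed here because the thread does not list the actual modes of the two monitors:

```python
def vblank_microseconds(pixel_clock_khz, htotal, vdisplay, vtotal):
    """Length of the vertical blanking interval in microseconds.

    One scanline takes htotal / pixel_clock seconds, and
    (vtotal - vdisplay) scanlines per frame carry no visible pixels.
    """
    blank_lines = vtotal - vdisplay
    line_time_us = htotal / pixel_clock_khz * 1000.0
    return blank_lines * line_time_us


def refresh_rate_hz(pixel_clock_khz, htotal, vtotal):
    """Frames per second implied by the same timing."""
    return pixel_clock_khz * 1000.0 / (htotal * vtotal)


# Assumed timing: 154 MHz pixel clock, 2080 x 1235 total, 1200 visible lines.
vblank = vblank_microseconds(154000, htotal=2080, vdisplay=1200, vtotal=1235)
rate = refresh_rate_hz(154000, htotal=2080, vtotal=1235)
print(f"vblank = {vblank:.1f} us, refresh = {rate:.2f} Hz")
```

Two monitors whose pixel clock or totals differ even slightly produce slightly different refresh rates, so their blanking intervals drift relative to each other over time.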
Comment 5 Alex Deucher 2019-08-08 05:49:59 UTC
Created attachment 144978
possible fix

Does this patch fix the issue?
Comment 6 Martin 2019-08-08 10:29:11 UTC
Sadly, it did not help.
The MCLK is still fixed at 2000 MHz.

How can I verify that I did everything correctly?
I rebuilt kernel 5.2.6 from Fedora's SRPM and added the patch in the spec file.

Or could the problem be that I have two different 1920x1200 screens, one from HP and one from Dell?
Comment 7 Alex Deucher 2019-08-08 13:31:17 UTC
(In reply to Martin from comment #6)
> Sadly, it did not help.
> The MCLK is still fixed at 2000 MHz.
> 
> How can I verify that I did everything correctly?

You can add a printk to the patch to verify that it's being applied.  Maybe print the value of hwmgr->display_config->multi_monitor_in_sync to see if the monitors are synced or not.

> I rebuilt kernel 5.2.6 from Fedora's SRPM and added the patch in the
> spec file.
> 
> Or could the problem be that I have two different 1920x1200 screens, one
> from HP and one from Dell?

That is likely the issue.  If the timings for the displays are slightly different, they won't be synced.  It could also be that the DC code doesn't set the multi_monitor_in_sync flag properly.
Comment 8 Alex Deucher 2019-08-08 13:37:57 UTC
Looks like the DC code does not set up the multi_monitor_in_sync flag properly.
Comment 9 Alex Deucher 2019-08-08 14:08:48 UTC
Created attachment 144983
fix DC code

Can you try applying both of these patches?  Assuming both of your monitors have the same timing this might work.
Comment 10 Martin 2019-08-08 14:45:53 UTC
Sorry for the delay. I had to figure out which kernel to use, because only kernel 5.3.0-rc3 accepts your second patch. Stable 5.2.6 and 5.2.7 generate errors at :25
I will have it built in about 2 hours.
You don't happen to have a spare Ryzen 9 3900X for me to speed things up? Building the kernel shows me quite drastically that my Haswell i7 is out of date ;)
Comment 11 Martin 2019-08-08 15:42:03 UTC
Kernel 5.3.0-rc3 does not boot on my system.
It hangs while detecting the disks.
Comment 12 Dieter Nützel 2019-08-08 23:39:03 UTC
(In reply to Alex Deucher from comment #9)
> Created attachment 144983
> fix DC code
> 
> Can you try applying both of these patches?  Assuming both of your monitors
> have the same timing this might work.

It didn't apply on amd-staging-drm-next either.
Comment 13 Dieter Nützel 2019-08-08 23:56:06 UTC
(In reply to Dieter Nützel from comment #12)
> (In reply to Alex Deucher from comment #9)
> > Created attachment 144983
> > fix DC code
> > 
> > Can you try applying both of these patches?  Assuming both of your monitors
> > have the same timing this might work.
> 
> It didn't apply on amd-staging-drm-next either.

BTW

Alex, is this the same problem?
My card has never been below ~32 W, even with a single monitor
(though I have two identical HDMI 1920x1080 monitors).
The PSTATE_xxxx values are much higher than Martin's.
I have never seen "zero fan" / zero core (fans not spinning).

Polaris 20 / 8 GB Sapphire Radeon RX 580 Nitro+
Single monitor:

GFX Clocks and Power:
        300 MHz (MCLK)
        300 MHz (SCLK)
        600 MHz (PSTATE_SCLK)
        1000 MHz (PSTATE_MCLK)
        750 mV (VDDGFX)
        32.17 W (average GPU)

GPU Temperature: 31 C
GPU Load: 0 %

amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:       +0.75 V  
fan1:         909 RPM  (min =    0 RPM, max = 3200 RPM)
temp1:        +30.0°C  (crit = +94.0°C, hyst = -273.1°C)
power1:       32.09 W  (cap = 175.00 W)
Comment 14 Alex Deucher 2019-08-09 14:59:12 UTC
(In reply to Dieter Nützel from comment #13)
> 
> Alex, is this the same problem?

No.

> 
> GFX Clocks and Power:
>         300 MHz (MCLK)
>         300 MHz (SCLK)

Your mclk is going to a lower state when it's idle.
Comment 15 Martin 2019-08-20 13:41:45 UTC
With rc5 of kernel 5.3 I was finally able to boot the kernel. Sadly, your two patches did not lower the power consumption.