Bug 110865 - Rx480 consumes 20w more power in idle than under Windows
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: DRI git
Hardware: Other All
Importance: medium enhancement
Assignee: Default DRI bug account
Reported: 2019-06-09 13:22 UTC by Martin
Modified: 2019-10-08 00:57 UTC

Attachments
logfiles as requested in the amd bugreport guide (372.83 KB, text/x-log)
2019-06-09 13:22 UTC, Martin
possible fix (1.48 KB, patch)
2019-08-08 05:49 UTC, Alex Deucher
fix DC code (2.31 KB, patch)
2019-08-08 14:08 UTC, Alex Deucher
fix DC code (2.63 KB, patch)
2019-08-22 16:17 UTC, Alex Deucher
fix DC code (2.18 KB, patch)
2019-08-26 03:00 UTC, Alex Deucher
xrandr -q (1.36 KB, text/x-log)
2019-08-27 18:09 UTC, Dieter Nützel

Description Martin 2019-06-09 13:22:01 UTC
Created attachment 144485
logfiles as requested in the amd bugreport guide

First, I am not sure where to file this bug, so please be gentle with me if I selected the wrong component.

For a while I have noticed higher temperatures on my video card when my PC was just idling in GNOME. I then dug deeper and found out that my "zero fan" video card does not stop its fan when I run Linux.

So I ran this command:
watch -n 0.5 cat /sys/kernel/debug/dri/0/amdgpu_pm_info
and it showed me that the MCLK does not clock down to 300 MHz as it does under Windows 10.
GFX Clocks and Power:
	2000 MHz (MCLK)
	300 MHz (SCLK)
	300 MHz (PSTATE_SCLK)
	300 MHz (PSTATE_MCLK)
	1000 mV (VDDGFX)
	24.75 W (average GPU)

GPU Temperature: 45 C
GPU Load: 0 %

I have a multi-monitor setup with two 1920x1200 pixel screens. Under Windows 10, the MCLK does not go above 300 MHz while the desktop is idling (measured with HWMonitor).
When I power off one screen under Linux, the average GPU power goes down to 8-10 W and the MCLK drops to 300 MHz, so the card can clock down. Is it somehow being prevented from doing so by the driver or the configuration?
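
For anyone who wants to log these readings instead of watching them by eye, here is a minimal Python sketch that parses text in the format shown above and flags a memory clock that stays above idle. The sample is copied from this report. The function names and the 300 MHz idle threshold are assumptions made for illustration (300 MHz is the idle value reported for Windows above), and the debugfs output format may differ between kernel versions.

```python
import re

# Sample copied from the report; on a real system this text would be read
# from /sys/kernel/debug/dri/0/amdgpu_pm_info (root is usually required).
SAMPLE = """GFX Clocks and Power:
\t2000 MHz (MCLK)
\t300 MHz (SCLK)
\t300 MHz (PSTATE_SCLK)
\t300 MHz (PSTATE_MCLK)
\t1000 mV (VDDGFX)
\t24.75 W (average GPU)

GPU Temperature: 45 C
GPU Load: 0 %
"""

# Matches lines such as "2000 MHz (MCLK)" or "24.75 W (average GPU)".
LINE_RE = re.compile(r"^\s*([\d.]+)\s+(MHz|mV|W)\s+\((.+?)\)\s*$")


def parse_pm_info(text):
    """Return {label: (value, unit)} for the clock, voltage and power lines."""
    readings = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            value, unit, label = match.groups()
            readings[label] = (float(value), unit)
    return readings


def mclk_stuck_high(readings, idle_mclk_mhz=300.0):
    """True if the memory clock is above the assumed idle value."""
    return readings["MCLK"][0] > idle_mclk_mhz


readings = parse_pm_info(SAMPLE)
print(readings["MCLK"], readings["average GPU"], mclk_stuck_high(readings))
```

Run in a loop at idle, a check like this would show whether MCLK ever leaves 2000 MHz while both screens are on.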

I followed this bug report guide from AMD:
https://www.amd.com/en/support/kb/faq/amdgpu-installation#faq-Reporting-Bugs
and attached several logfiles.
Comment 1 Martin 2019-08-07 09:53:21 UTC
My bug is now two months old. Do you need more information, or is there anything else I can do to get your attention?

I think this is a serious issue, because it seems to affect many, maybe even all, Polaris cards (I tested two more in the last few weeks).

Shouldn't it be a priority to stop wasting so much energy?
Comment 2 Alex Deucher 2019-08-07 14:35:00 UTC
This is the expected behavior for multiple monitors on Linux.  mclk switching must happen in the monitors' blanking period.  Since they likely don't align, especially if the monitors have different timing, we have to use a fixed mclk.  The DC modesetting code can lock the timing of multiple monitors if they are using the exact same timing so that the blanking periods align, but I don't think the Linux power management code takes this into account at the moment.
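
To make the timing argument concrete, here is a rough Python sketch (not the driver's logic) that derives the refresh rate and vertical blanking duration from mode timing numbers, and estimates how quickly two free-running displays with different timings slip relative to each other. The two example timings are assumed, CVT-style values for 1920x1200 chosen for illustration; they are not taken from the reporter's monitors, and the `Mode` class and `drift_period_s` helper are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    """Subset of a display mode's timing (illustrative, not a driver struct)."""
    pixel_clock_khz: int
    htotal: int    # pixels per scanline, including horizontal blanking
    vdisplay: int  # visible scanlines
    vtotal: int    # scanlines per frame, including vertical blanking

    @property
    def refresh_hz(self):
        return self.pixel_clock_khz * 1000 / (self.htotal * self.vtotal)

    @property
    def vblank_us(self):
        # Time for one scanline, times the number of blanking scanlines.
        line_time_us = self.htotal / self.pixel_clock_khz * 1000
        return (self.vtotal - self.vdisplay) * line_time_us


def drift_period_s(a, b):
    """Seconds until two free-running modes slip by one full frame."""
    delta = abs(a.refresh_hz - b.refresh_hz)
    return math.inf if delta == 0 else 1 / delta


# Assumed example timings for 1920x1200 (reduced and normal blanking).
reduced = Mode(154000, 2080, 1200, 1235)
normal = Mode(193250, 2592, 1200, 1245)

for mode in (reduced, normal):
    print(f"{mode.refresh_hz:.3f} Hz, vblank {mode.vblank_us:.0f} us")
print(f"relative phase wraps every {drift_period_s(reduced, normal):.1f} s")
```

The point of the sketch is only that the blanking window is a small fraction of each frame, and that even a small difference in refresh rate makes the two windows drift apart continuously, so there is no stable moment at which both displays are blanking.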
Comment 3 Martin 2019-08-07 15:16:01 UTC
Thank you for your explanation.
How do I find out the blanking periods?
Comment 4 Alex Deucher 2019-08-08 05:46:59 UTC
(In reply to Martin from comment #3)
> Thank you for your explanation.
> How do I find out the blanking periods?

They are based on the timing for the mode on the display.  As for the relevant driver code, take a look at smu7_apply_state_adjust_rules().
Comment 5 Alex Deucher 2019-08-08 05:49:59 UTC
Created attachment 144978
possible fix

Does this patch fix the issue?
Comment 6 Martin 2019-08-08 10:29:11 UTC
Sadly, it did not help.
The MCLK is still fixed at 2000 MHz.

How can I verify that I did everything correctly?
I just rebuilt kernel 5.2.6 from Fedora's SRPM and added the patch to the spec file.

Or could it be because I have two different 1920x1200 screens, one from HP and one from Dell?
Comment 7 Alex Deucher 2019-08-08 13:31:17 UTC
(In reply to Martin from comment #6)
> Sadly, it did not help.
> The MCLK is still fixed at 2000 MHz.
> 
> How can I verify that I did everything correctly?

You can add a printk to the patch to verify that it's being applied.  Maybe print the value of hwmgr->display_config->multi_monitor_in_sync to see if the monitors are synced or not.

> I just rebuilt kernel 5.2.6 from Fedora's SRPM and added the patch to the
> spec file.
> 
> Or could it be because I have two different 1920x1200 screens, one from HP
> and one from Dell?

That is likely the issue.  If the timings for the displays are slightly different, they won't be synced.  It could also be that the DC code doesn't set the multi_monitor_in_sync flag properly.
Comment 8 Alex Deucher 2019-08-08 13:37:57 UTC
It looks like the DC code does not set up the multi_monitor_in_sync flag properly.
Comment 9 Alex Deucher 2019-08-08 14:08:48 UTC
Created attachment 144983
fix DC code

Can you try applying both of these patches?  Assuming both of your monitors have the same timing this might work.
Comment 10 Martin 2019-08-08 14:45:53 UTC
Sorry for the delay. I had to figure out which kernel to use, because only kernel 5.3.0-rc3 accepts your second patch; stable 5.2.6 and 5.2.7 generate errors at :25
I will have it built in about 2 hours.
You don't have a spare Ryzen 9 3900X for me to speed it up? Kernel building shows me quite drastically that my Haswell i7 is out of date ;)
Comment 11 Martin 2019-08-08 15:42:03 UTC
Kernel 5.3.0-rc3 does not boot on my system.
It hangs while detecting the disks.
Comment 12 Dieter Nützel 2019-08-08 23:39:03 UTC
(In reply to Alex Deucher from comment #9)
> Created attachment 144983
> fix DC code
> 
> Can you try applying both of these patches?  Assuming both of your monitors
> have the same timing this might work.

It didn't apply on amd-staging-drm-next either.
Comment 13 Dieter Nützel 2019-08-08 23:56:06 UTC
(In reply to Dieter Nützel from comment #12)
> (In reply to Alex Deucher from comment #9)
> > Created attachment 144983
> > fix DC code
> > 
> > Can you try applying both of these patches?  Assuming both of your monitors
> > have the same timing this might work.
> 
> It didn't apply on amd-staging-drm-next either.

BTW

Alex, is this the same problem?
My card has never gone below ~32 W, even with a single monitor
(I have two identical HDMI 1920x1080 displays).
PSTATE_xxxx is much higher than Martin's.
I have never seen "zero fan" / zero core (fans not spinning).

Polaris 20 / 8GB Sapphire Radeon RX 580 Nitro+
single monitor

GFX Clocks and Power:
        300 MHz (MCLK)
        300 MHz (SCLK)
        600 MHz (PSTATE_SCLK)
        1000 MHz (PSTATE_MCLK)
        750 mV (VDDGFX)
        32.17 W (average GPU)

GPU Temperature: 31 C
GPU Load: 0 %

amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:       +0.75 V  
fan1:         909 RPM  (min =    0 RPM, max = 3200 RPM)
temp1:        +30.0°C  (crit = +94.0°C, hyst = -273.1°C)
power1:       32.09 W  (cap = 175.00 W)
Comment 14 Alex Deucher 2019-08-09 14:59:12 UTC
(In reply to Dieter Nützel from comment #13)
> 
> Alex, is this the same problem?

No.

> 
> GFX Clocks and Power:
>         300 MHz (MCLK)
>         300 MHz (SCLK)

Your mclk is going to a lower state when it's idle.
Comment 15 Martin 2019-08-20 13:41:45 UTC
With rc5 of kernel 5.3 I was finally able to boot. Sadly, your two patches did not lower the power consumption.
Comment 16 Alex Deucher 2019-08-22 16:17:08 UTC
Created attachment 145136
fix DC code

Can you try this patch along with attachment 144978?
Comment 17 Alex Deucher 2019-08-22 16:17:29 UTC
Note that it will only work if your monitors have identical timing.
Comment 18 Martin 2019-08-25 20:53:59 UTC
Hello,

Sorry that it took me so long; I was at a historic cycling event in Germany.

Your patch did indeed do something.
The power consumption now sometimes drops to 70 W, but both screens flicker and show artifacts similar to those of dying video memory.

GFX Clocks and Power:
	300 MHz (MCLK)
	308 MHz (SCLK)
	300 MHz (PSTATE_SCLK)
	300 MHz (PSTATE_MCLK)
	800 mV (VDDGFX)
	12.222 W (average GPU)
Comment 19 Martin 2019-08-25 21:00:24 UTC
By 70 W I mean total system power consumption, of course.
This is roughly the same as, or a little more than, under Windows, so I think we are on a good path.

If I run
"echo low > /sys/class/drm/card0/device/power_dpm_force_performance_level"
the flickering stops.
So the flickering is caused by the automatic power management / reclocking.
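
For switching levels from a script, the same workaround could be wrapped roughly as follows. This is a sketch only: the sysfs path is the one from the command above (card0 may be a different index on other systems), writing to it normally requires root, and the helper names are hypothetical. Only the "auto", "low" and "high" levels are accepted here to keep the example small; the driver supports further values.

```python
from pathlib import Path

# Path taken from the command above; the card index is an assumption.
LEVEL_FILE = Path(
    "/sys/class/drm/card0/device/power_dpm_force_performance_level"
)

# Subset of the levels the amdgpu driver accepts, kept short on purpose.
ALLOWED_LEVELS = {"auto", "low", "high"}


def set_performance_level(level, path=LEVEL_FILE):
    """Write a performance level to the sysfs file (needs root on real hardware)."""
    if level not in ALLOWED_LEVELS:
        raise ValueError(f"unsupported level: {level!r}")
    Path(path).write_text(level + "\n")


def get_performance_level(path=LEVEL_FILE):
    """Read the currently selected performance level."""
    return Path(path).read_text().strip()
```

Calling set_performance_level("low") pins the clocks so that no reclocking happens, and set_performance_level("auto") hands control back to the automatic power management.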
Comment 20 Alex Deucher 2019-08-26 03:00:16 UTC
Created attachment 145157
fix DC code

Updated patch.
Comment 21 Martin 2019-08-26 09:37:01 UTC
Sadly, the screen still flickers.
Comment 22 Dieter Nützel 2019-08-26 21:45:45 UTC
Hello Alex and Martin,

I've tried both on my

Polaris 20, RX580 8 GB Sapphire Technology Limited Nitro+ Radeon RX 580

- v2 patched into amd-staging-drm-next (before inclusion of v3)
- v3 with amd-staging-drm-next
https://cgit.freedesktop.org/~agd5f/linux/commit/?h=amd-staging-drm-next&id=ca82748783d8189a54a85f2ea1c2710182ba6138

Both flicker with green/black (?) horizontal lines across both screens.
This happens mostly during power level switches, for example during mouse movement/interaction (wheel) and when the mouse pointer moves from Konsole etc. to the desktop (KDE5 Plasma 5.xx here).

UVD load (mplayer etc.) is not enough to stop it.
Neither is radv (vkcube), for example.

But other gfx load (vkmark/glmark2, etc.) is.
When gfx demand drops during the above tests (glmark2 -b buffer), the flicker comes back.

Martin's observation

[-]
If I run
"echo low > /sys/class/drm/card0/device/power_dpm_force_performance_level"
the flickering stops.
So the flickering is caused by the automatic power management / reclocking.
[-]

Works here, too (tested with v3).

But I could never get below ~32 W!
Tested with both Nitro+ BIOS modes.

The PSTATE_xxxx values don't change on my card. They stay at 600/1000 all the time!?

GFX Clocks and Power:
        300 MHz (MCLK)
        300 MHz (SCLK)
        600 MHz (PSTATE_SCLK)
        1000 MHz (PSTATE_MCLK)
        750 mV (VDDGFX)
        32.76 W (average GPU)

GPU Temperature: 31 C
GPU Load: 0 %
MEM Load: 3 %

Any hints?

And sorry for my bad English this time - my best friend since the beginning of German Gymnasium died after a 6-year fight against cancer. He was only 52, and leaves a wife and two little girls...
Comment 23 Dieter Nützel 2019-08-26 22:07:35 UTC
Oh, BTW Martin, how are your 2 identical displays connected? HDMI (like mine) or DisplayPort?

I use sound over HDMI, too, and only one display presents it.
Comment 24 Martin 2019-08-26 22:45:35 UTC
I am so sorry to hear that, Dieter. This is really terrible.
But I can assure you your English is fine.

I also think your problem lies somewhere else. You mentioned yourself, and I think Alex Deucher did as well, that your P-states are not changing.

I have two different screens: an HP ZR2440W connected via DVI and a Dell U2412M connected via DisplayPort. Both run at 1920x1200 @ 59.95 Hz.
Comment 25 Alex Deucher 2019-08-27 14:37:02 UTC
The patches I posted only affect multiple monitors with identical timing.  That means identical modelines, not just the same resolution and refresh rate.  In practice this generally means you need to use identical monitors.  If you are using a single monitor or multiple different monitors, the patches are not relevant for you.
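
As an illustration of "identical modelines" versus "same resolution and refresh rate", here is a small Python sketch that compares two xorg.conf-style Modeline strings field by field. Both modelines are assumed examples: the first is a CVT reduced-blanking style timing for 1920x1200, and the second is a made-up variant that rounds to the same refresh rate. The helper names are hypothetical, and the parser assumes the mode name contains no spaces.

```python
def parse_modeline(line):
    """Split a Modeline string into (clock_mhz, timings, flags)."""
    parts = line.split()
    if parts[0].lower() == "modeline":
        parts = parts[1:]
    # parts[0] is the quoted mode name, which does not affect the timing.
    clock_mhz = float(parts[1])
    # hdisplay, hsync_start, hsync_end, htotal,
    # vdisplay, vsync_start, vsync_end, vtotal
    timings = tuple(int(p) for p in parts[2:10])
    flags = tuple(sorted(p.lower() for p in parts[10:]))
    return clock_mhz, timings, flags


def refresh_hz(line):
    """Refresh rate implied by the pixel clock, htotal and vtotal."""
    clock_mhz, timings, _ = parse_modeline(line)
    return clock_mhz * 1e6 / (timings[3] * timings[7])


def same_timing(a, b):
    """True only if clock, all eight timing fields and the flags match."""
    return parse_modeline(a) == parse_modeline(b)


# Assumed example modelines, not taken from the monitors in this report.
MODE_A = ('Modeline "1920x1200R" 154.00 1920 1968 2000 2080 '
          '1200 1203 1209 1235 +hsync -vsync')
MODE_B = ('Modeline "variant" 154.13 1920 1968 2000 2080 '
          '1200 1203 1209 1236 +hsync -vsync')

print(round(refresh_hz(MODE_A), 2), round(refresh_hz(MODE_B), 2))
print(same_timing(MODE_A, MODE_B))
```

Both example modes would be listed as 1920x1200 at 59.95 Hz, yet they differ in pixel clock and vtotal, so by the criterion above they do not count as identical timing.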
Comment 26 Dieter Nützel 2019-08-27 18:09:33 UTC
Created attachment 145180
xrandr -q
Comment 27 Dieter Nützel 2019-08-27 18:11:14 UTC
This is going nuts...

Martin has

2 different displays (both 59.95Hz@1920x1200),
RX 480 and _very nice_ numbers (only 12.222 W), now

GFX Clocks and Power:
	300 MHz (MCLK)                     <= !!!
	308 MHz (SCLK)
	300 MHz (PSTATE_SCLK)
	300 MHz (PSTATE_MCLK)
	800 mV (VDDGFX)                    <= !!!
	12.222 W (average GPU)

=> working (?!) but flicker

I have

2 identical displays BenQ GL2440H  (both 60.00 Hz @ 1920x1080),
RX580 and 'normal' numbers (~32 W - but too high?!), now

GFX Clocks and Power:
        300 MHz (MCLK)                     <= !!!
        300 MHz (SCLK)
        600 MHz (PSTATE_SCLK)
        1000 MHz (PSTATE_MCLK)
        750 mV (VDDGFX)                    <= !!! mine is better, but...
        32.76 W (average GPU)

=> working (?!) but flicker, too.


This
        600 MHz (PSTATE_SCLK)
        1000 MHz (PSTATE_MCLK)

must be a different problem (compare with Martin's RX 480).
I will open another ticket for it.
Comment 28 Dieter Nützel 2019-08-27 18:22:59 UTC
I've tried solving the flicker with both fixes (sent by magist3r) from this bug

Bug 102646 - Screen flickering under amdgpu-experimental [buggy auto power profile]
https://bugs.freedesktop.org/show_bug.cgi?id=102646

But no success.
Comment 29 Martin 2019-08-29 10:50:52 UTC
@Alex Deucher
Is there a fix for the graphical glitches I experience? 
They seem to be similar to the glitches I get when I enable overclocking with amdgpu.ppfeaturemask=0xffffffff
Comment 30 Alex Deucher 2019-08-29 13:10:57 UTC
(In reply to Martin from comment #29)
> @Alex Deucher
> Is there a fix for the graphical glitches I experience? 
> They seem to be similar to the glitches I get when I enable overclocking
> with amdgpu.ppfeaturemask=0xffffffff

It would appear that the monitors don't quite sync up in your case; otherwise you wouldn't see the flicker.
Comment 31 Martin 2019-08-29 13:44:45 UTC
Well, the flickering goes away if I lock the clocks to "low" or "high".
Comment 32 Alex Deucher 2019-08-29 13:54:22 UTC
(In reply to Martin from comment #31)
> Well, the flickering goes away if I lock the clocks to "low" or "high".

Exactly.  In that case the mclk never changes so there is no flicker.  The mclk has to change during the vblank period otherwise you see flickering.  If the vblank periods are not synced up across monitors, you see flickering.
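
A back-of-the-envelope way to picture this: if two displays share the same frame period but their vblanks start at different times, the window in which both are blanking shrinks as the offset grows. The Python sketch below computes that shared window. It is a simplified model, not the driver's logic; it assumes both displays have the same period and vblank length, and the numbers in the example are illustrative.

```python
def common_vblank_us(period_us, vblank_us, offset_us):
    """Time per frame during which both displays are in vblank.

    Assumes both displays share the same frame period and vblank length.
    offset_us is how far the second display's vblank start lags the first.
    """
    d = offset_us % period_us
    # Overlap with the first display's current vblank, plus overlap that
    # wraps around into its next vblank.
    return max(0.0, vblank_us - d) + max(0.0, d + vblank_us - period_us)


# Illustrative numbers: roughly a 60 Hz frame with a ~0.47 ms vblank.
PERIOD_US = 16680.0
VBLANK_US = 472.0

for offset in (0.0, 200.0, 8000.0, 16580.0):
    print(offset, common_vblank_us(PERIOD_US, VBLANK_US, offset))
```

With zero offset the whole vblank is shared; once the offset exceeds the vblank length there is no shared window at all, so in this model a clock change would always land in the visible part of at least one display's frame.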
Comment 33 Martin 2019-08-29 14:28:37 UTC
Thank you for the clarification.
Right now I switch manually between low and high when necessary, so I can work around the glitches.
Do you think it will be possible to achieve feature parity with Windows soon?
Comment 34 Alex Deucher 2019-08-29 14:33:00 UTC
(In reply to Martin from comment #33)
> Thank you for the clarification.
> Right now I switch manually between low and high when necessary, so I can
> work around the glitches.
> Do you think it will be possible to achieve feature parity with Windows soon?

I don't think Windows enables mclk switching with multiple monitors either.  Unfortunately, it's not clear what's different between Windows and Linux on your board.
Comment 35 Martin 2019-08-30 15:44:52 UTC
Thank you for the clarification. Do you think there is a solution for the problem on the Linux side, since it works absolutely fine on Windows?
Comment 36 tempel.julian 2019-10-03 20:43:50 UTC
(In reply to Dieter Nützel from comment #28)
> I've tried solving the flicker with both fixes (sent by magist3r) from this
> bug
> 
> Bug 102646 - Screen flickering under amdgpu-experimental [buggy auto power
> profile]
> https://bugs.freedesktop.org/show_bug.cgi?id=102646
> 
> But no success.

Have you also applied Ahzo's patch, just in case?
Comment 37 Dieter Nützel 2019-10-08 00:57:02 UTC
(In reply to tempel.julian from comment #36)
> (In reply to Dieter Nützel from comment #28)
> > I've tried solving the flicker with both fixes (sent by magist3r) from this
> > bug
> > 
> > Bug 102646 - Screen flickering under amdgpu-experimental [buggy auto power
> > profile]
> > https://bugs.freedesktop.org/show_bug.cgi?id=102646
> > 
> > But no success.
> 
> Have you also applied Ahzo's patch, just in case?

Thanks for the hint.

v2 is already in 'amd-staging-drm-next'
f659bb6dae58c113805f92822e4c16ddd3156b79
drm/amd/powerplay/smu7: enforce minimal VBITimeout (v2)

