Created attachment 144485
Log files as requested in the AMD bug report guide
First, I am not sure where to file this bug, so please be gentle with me if I selected the wrong component.
For a while I have noticed higher temperatures on my video card when my PC was just idling in GNOME. When I dug deeper, I found that my "zero fan" video card does not stop its fan when I run Linux.
So I ran this command:
watch -n 0.5 cat /sys/kernel/debug/dri/0/amdgpu_pm_info
It showed me that the MCLK does not clock down to 300 MHz as it does under Windows 10.
GFX Clocks and Power:
2000 MHz (MCLK)
300 MHz (SCLK)
300 MHz (PSTATE_SCLK)
300 MHz (PSTATE_MCLK)
1000 mV (VDDGFX)
24.75 W (average GPU)
GPU Temperature: 45 C
GPU Load: 0 %
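To log the memory clock over time instead of watching it by eye, the "<value> MHz (MCLK)" line can be pulled out of the debugfs text. This is a minimal sketch that assumes the line format shown above (it may change between kernel versions); the function name parse_mclk_mhz() is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan the text of amdgpu_pm_info for a line of the form
 * "<value> MHz (MCLK)" and return <value>, or -1 if no such line exists.
 * Lines such as "300 MHz (PSTATE_MCLK)" are skipped because the label differs. */
long parse_mclk_mhz(const char *text)
{
    assert(text != NULL);
    const char *line = text;
    while (*line != '\0') {
        long mhz;
        char label[32];
        /* Leading whitespace (e.g. a tab) is skipped by " %ld". */
        if (sscanf(line, " %ld MHz (%31[^)])", &mhz, label) == 2 &&
            strcmp(label, "MCLK") == 0)
            return mhz;
        line = strchr(line, '\n');
        if (line == NULL)
            break;
        line++; /* move to the start of the next line */
    }
    return -1;
}
```

Feeding it the output above would return 2000 while both monitors are on.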
I have a multi-monitor setup with two 1920x1200 screens. Under Windows 10, the MCLK does not go above 300 MHz when the desktop is idle (measured with HWMonitor).
When I power off one screen under Linux, the average GPU power drops to 8-10 W and the MCLK drops to 300 MHz. So the card can clock down, but something in the driver or configuration seems to prevent it.
I followed this bug report guide from AMD:
and attached several log files.
My bug is now two months old. Do you need more information, or what can I do to get your attention?
I think this is a serious issue because it seems to affect many, maybe even all, Polaris cards (I tested two more in the last few weeks).
Shouldn't it be a priority to stop the waste of so much energy?
This is the expected behavior for multiple monitors on Linux. MCLK switching must happen during the monitors' blanking period. Since the blanking periods of different monitors are unlikely to align, especially if the monitors have different timings, we have to use a fixed MCLK. The DC modesetting code can lock the timing of multiple monitors if they are using the exact same timing so that the blanking periods align, but I don't think the Linux power management code takes this into account at the moment.
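The constraint described above can be modeled in a few lines. This is a simplified sketch, not the driver's code: struct display_cfg, mclk_switching_allowed() and the mclk_switch_us threshold are hypothetical names chosen for illustration. It only captures the idea that a single display, or several displays locked to identical timing, leave a common blanking window for the switch, while unsynchronized displays force a fixed MCLK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the active display configuration. */
struct display_cfg {
    int num_displays;      /* number of active displays */
    bool blanking_in_sync; /* true if all displays are locked to one timing */
    double min_vblank_us;  /* shortest vertical blanking period, in microseconds */
};

/* Simplified model: the memory clock may only change while every display is
 * blanking. mclk_switch_us is an assumed time the switch needs. */
bool mclk_switching_allowed(const struct display_cfg *cfg, double mclk_switch_us)
{
    assert(cfg != NULL);
    if (cfg->num_displays == 0)
        return true;  /* nothing is being scanned out */
    if (cfg->num_displays > 1 && !cfg->blanking_in_sync)
        return false; /* blanking windows do not line up: pin MCLK */
    return cfg->min_vblank_us >= mclk_switch_us;
}
```

With two unsynchronized monitors this returns false no matter how long each blanking period is, which matches the MCLK staying at 2000 MHz in the original report.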
Thank you for your explanation.
How do I find out the blanking periods?
(In reply to Martin from comment #3)
> Thank you for your explanation.
> How do I find out the blanking periods?
They are based on the timing for the mode on the display. As for the relevant driver code, take a look at smu7_apply_state_adjust_rules().
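As a worked example of how a blanking period follows from the mode timing: the vertical blanking time is the number of blank lines (vtotal - vdisplay) multiplied by the time per line (htotal / pixel clock). The sketch assumes the standard 1920x1200@60 reduced-blanking mode (154 MHz pixel clock, htotal 2080, vtotal 1235). The real timing of each monitor can be read with xrandr --verbose. The struct and function names are illustrative only.

```c
#include <assert.h>

/* Mode timing fields as listed by e.g. xrandr --verbose (illustrative struct). */
struct mode_timing {
    unsigned pixel_clock_khz;
    unsigned htotal;
    unsigned vdisplay;
    unsigned vtotal;
};

/* Length of the vertical blanking period in microseconds:
 * blank lines * (htotal pixels per line / pixel clock). */
double vblank_us(const struct mode_timing *m)
{
    assert(m->pixel_clock_khz > 0 && m->vtotal > m->vdisplay);
    double line_us = (double)m->htotal * 1000.0 / m->pixel_clock_khz;
    return (m->vtotal - m->vdisplay) * line_us;
}
```

For the assumed mode this gives 35 blank lines of about 13.5 µs each, roughly 473 µs per frame.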
Created attachment 144978
Does this patch fix the issue?
Sadly it did not help.
The MCLK is still fixed at 2000 MHz.
How can I verify that I did everything correctly?
I rebuilt kernel 5.2.6 from Fedora's SRPM and added the patch to the spec file.
Or could it be because I have two different 1920x1200 screens, one from HP and one from Dell?
(In reply to Martin from comment #6)
> Sadly it did not help.
> the MCLK is still fixed at 2000MHz.
> How can I verify that I did everything correctly?
You can add a printk to the patch to verify that it's being applied. Maybe print the value of hwmgr->display_config->multi_monitor_in_sync to see if the monitors are synced or not.
> I just rebuilt Kernel 5.2.6 from Fedoras srpm and added the patch in the
> spec file.
> Or could it be that I have two different 1920x1200 screens? one from HP and
> one from Dell?
That is likely the issue. If the timings for the displays are slightly different, they won't be synced. It could also be that the DC code doesn't set the multi_monitor_in_sync flag properly.
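Whether two monitors can be synchronized depends on the full timing, not just the resolution: two 1920x1200 panels from different vendors may use different pixel clocks or blanking intervals. Below is a hedged sketch of such a comparison. The field list is an assumption, and the DC code's actual criteria may differ or include more fields (porches, sync polarities and so on).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative subset of a display mode's timing. */
struct mode_timing {
    unsigned pixel_clock_khz;
    unsigned hdisplay, htotal;
    unsigned vdisplay, vtotal;
};

/* Two displays can only have their blanking periods locked together if they
 * scan out with the same timing; an equal resolution alone is not enough. */
bool timings_match(const struct mode_timing *a, const struct mode_timing *b)
{
    assert(a != NULL && b != NULL);
    return a->pixel_clock_khz == b->pixel_clock_khz &&
           a->hdisplay == b->hdisplay && a->htotal == b->htotal &&
           a->vdisplay == b->vdisplay && a->vtotal == b->vtotal;
}
```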
It looks like the DC code does not set the multi_monitor_in_sync flag properly.
Created attachment 144983
fix DC code
Can you try applying both of these patches? Assuming both of your monitors have the same timing, this might work.
Sorry for the delay; I had to figure out which kernel to use, because only kernel 5.3.0-rc3 accepts your second patch. Stable 5.2.6 and 5.2.7 generate errors at :25
In about 2 hours I will have it built.
You don't have a spare Ryzen 9 3900X for me to speed it up? Building the kernel shows me quite drastically that my Haswell i7 is out of date ;)
Kernel 5.3.0-rc3 does not boot on my system.
It hangs while detecting the disks.
(In reply to Alex Deucher from comment #9)
> Created attachment 144983
> fix DC code
> Can you try applying both of these patches? Assuming both of your monitors
> have the same timing this might work.
They didn't apply on amd-staging-drm-next either.
(In reply to Dieter Nützel from comment #12)
> (In reply to Alex Deucher from comment #9)
> > Created attachment 144983
> > fix DC code
> > Can you try applying both of these patches? Assuming both of your monitors
> > have the same timing this might work.
> Didn't apply on amd-staging-drm-next, too.
Alex, is this the same problem?
My card was never below ~32 W (even with a single monitor,
but I have two identical 1920x1080 HDMI monitors).
The PSTATE_xxxx values are much higher than Martin's.
I never saw "zero fan" / zero core (fans not spinning).
Polaris 20 / 8 GB Sapphire Radeon RX 580 Nitro+
GFX Clocks and Power:
300 MHz (MCLK)
300 MHz (SCLK)
600 MHz (PSTATE_SCLK)
1000 MHz (PSTATE_MCLK)
750 mV (VDDGFX)
32.17 W (average GPU)
GPU Temperature: 31 C
GPU Load: 0 %
Adapter: PCI adapter
vddgfx: +0.75 V
fan1: 909 RPM (min = 0 RPM, max = 3200 RPM)
temp1: +30.0°C (crit = +94.0°C, hyst = -273.1°C)
power1: 32.09 W (cap = 175.00 W)
(In reply to Dieter Nützel from comment #13)
> Alex, is this the same problem?
> GFX Clocks and Power:
> 300 MHz (MCLK)
> 300 MHz (SCLK)
Your mclk is going to a lower state when it's idle.
Finally, with kernel 5.3-rc5 I was able to boot, but sadly your two patches did not lower the power consumption.