Created attachment 144485 [details]
logfiles as requested in the amd bugreport guide

First, I am not sure where to file this bug, so please be gentle with me if I selected the wrong component.

For a while I noticed higher temperatures on my video card when my PC was just idling in GNOME. Then I dug deeper and found out that my "zero fan" video card does not stop the fan when I run Linux.

So I ran this line:

watch -n 0.5 cat /sys/kernel/debug/dri/0/amdgpu_pm_info

and it showed me that the MCLK does not clock down to 300MHz as it does with Windows 10.

GFX Clocks and Power:
        2000 MHz (MCLK)
        300 MHz (SCLK)
        300 MHz (PSTATE_SCLK)
        300 MHz (PSTATE_MCLK)
        1000 mV (VDDGFX)
        24.75 W (average GPU)

GPU Temperature: 45 C
GPU Load: 0 %

I have a multi-monitor setup with two 1920x1200 pixel screens. When I use Windows 10, the MCLK does not go above 300MHz when the desktop is idling (measured with hwmonitor).

When I power off one screen under Linux, the (average GPU) reading goes down to 8-10W and the MCLK drops to 300MHz. So the card can clock down, but is somehow prevented from doing so by the driver or configuration?

I followed this bug report guide from AMD:
https://www.amd.com/en/support/kb/faq/amdgpu-installation#faq-Reporting-Bugs
and attached several logfiles.
My bug is now two months old. Do you need more information, or is there anything I can do to get your attention? I think this is a serious issue, because it seems to affect a lot of Polaris cards, maybe even all of them (I tested two more in the last few weeks). Shouldn't it be a priority to stop wasting so much energy?
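For anyone who wants to log these readings rather than watch them, here is a minimal sketch of a parser for the amdgpu_pm_info text shown above. The function names (`parse_pm_info`, `mclk_is_idle`) and the 300 MHz idle threshold are assumptions chosen for illustration; the sample text is copied from this report, and the format of the debugfs file may differ between kernel versions.

```python
import re

# Sample text copied from the report above. On a live system it would be read
# from /sys/kernel/debug/dri/0/amdgpu_pm_info (root is usually required).
SAMPLE = """\
GFX Clocks and Power:
        2000 MHz (MCLK)
        300 MHz (SCLK)
        300 MHz (PSTATE_SCLK)
        300 MHz (PSTATE_MCLK)
        1000 mV (VDDGFX)
        24.75 W (average GPU)

GPU Temperature: 45 C
GPU Load: 0 %
"""

# Matches lines such as "2000 MHz (MCLK)" or "24.75 W (average GPU)".
_READING = re.compile(r"^\s*([\d.]+)\s+(\S+)\s+\(([^)]+)\)\s*$")


def parse_pm_info(text):
    """Return {label: (value, unit)} for every 'value unit (label)' line."""
    readings = {}
    for line in text.splitlines():
        match = _READING.match(line)
        if match:
            value, unit, label = match.groups()
            readings[label] = (float(value), unit)
    return readings


def mclk_is_idle(readings, idle_mhz=300.0):
    """True when the memory clock is at or below the assumed idle level."""
    return readings["MCLK"][0] <= idle_mhz


if __name__ == "__main__":
    info = parse_pm_info(SAMPLE)
    print(info["MCLK"], info["average GPU"], mclk_is_idle(info))
```

Lines such as "GPU Temperature: 45 C" use a different layout and are deliberately ignored by this sketch.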
This is the expected behavior for multiple monitors on Linux. mclk switching must happen in the monitors' blanking period. Since they likely don't align, especially if the monitors have different timing, we have to use a fixed mclk. The DC modesetting code can lock the timing of multiple monitors if they are using the exact same timing so that the blanking periods align, but I don't think the Linux power management code takes this into account at the moment.
Thank you for your explanation. How do I find out the blanking periods?
(In reply to Martin from comment #3)
> Thank you for your explanation.
> How do I find out the blanking periods?

They are based on the timing for the mode on the display. As for the relevant driver code, take a look at smu7_apply_state_adjust_rules().
Created attachment 144978 [details] [review]
possible fix

Does this patch fix the issue?
Sadly it did not help. The MCLK is still fixed at 2000MHz.

How can I verify that I did everything correctly? I just rebuilt kernel 5.2.6 from Fedora's SRPM and added the patch in the spec file.

Or could it be that I have two different 1920x1200 screens? One is from HP and one from Dell.
(In reply to Martin from comment #6)
> Sadly it did not help.
> the MCLK is still fixed at 2000MHz.
>
> How can I verify that I did everything correctly?

You can add a printk to the patch to verify that it's being applied. Maybe print the value of hwmgr->display_config->multi_monitor_in_sync to see if the monitors are synced or not.

> I just rebuilt Kernel 5.2.6 from Fedoras srpm and added the patch in the
> spec file.
>
> Or could it be that I have two different 1920x1200 screens? one from HP and
> one from Dell?

That is likely the issue. If the timings for the displays are slightly different, they won't be synced. It could also be that the DC code doesn't set the multi_monitor_in_sync flag properly.
Looks like the DC code does not set up the multi_monitor_in_sync flag properly.
Created attachment 144983 [details] [review]
fix DC code

Can you try applying both of these patches? Assuming both of your monitors have the same timing this might work.
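As a rough illustration of "based on the timing for the mode", the vertical blanking time can be estimated from a mode's pixel clock and total/active line counts. This is a minimal sketch: the `ModeTiming` class is hypothetical, and the example numbers are the standard CVT reduced-blanking 1920x1200 mode (154.00 MHz, 2080 x 1235 total), assumed here for illustration and not taken from the xrandr output of the monitors in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeTiming:
    """Subset of a modeline needed to estimate blanking (hypothetical helper)."""
    pixel_clock_hz: float
    htotal: int      # pixels per scanline, including horizontal blanking
    vdisplay: int    # visible scanlines
    vtotal: int      # scanlines per frame, including vertical blanking

    @property
    def refresh_hz(self):
        # One frame takes htotal * vtotal pixel clocks.
        return self.pixel_clock_hz / (self.htotal * self.vtotal)

    @property
    def vblank_seconds(self):
        # Blanking lines times the duration of one scanline.
        blank_lines = self.vtotal - self.vdisplay
        return blank_lines * self.htotal / self.pixel_clock_hz


# Assumed example: CVT reduced-blanking 1920x1200.
CVT_RB_1920X1200 = ModeTiming(154.00e6, 2080, 1200, 1235)

if __name__ == "__main__":
    mode = CVT_RB_1920X1200
    print(f"refresh: {mode.refresh_hz:.2f} Hz")
    print(f"vblank:  {mode.vblank_seconds * 1e6:.1f} us per frame")
```

For that assumed mode the blanking window is a little under half a millisecond per frame, which is the window a memory clock switch has to fit into.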
Sorry for the delay, I had to figure out which kernel to use, because only Kernel 5.3.0-rc3 accepts your second patch. Stable 5.2.6 and .7 generate errors at :25 In about 2 hours I will have it built. You don't have a spare Ryzen 9 3900X for me to speed it up? Kernel building shows me quite drastically that my Haswell I7 is out of date ;)
Kernel 5.3.0-rc3 does not boot on my system. It hangs at detecting the discs.
(In reply to Alex Deucher from comment #9)
> Created attachment 144983 [details] [review]
> fix DC code
>
> Can you try applying both of these patches? Assuming both of your monitors
> have the same timing this might work.

It didn't apply on amd-staging-drm-next either.
(In reply to Dieter Nützel from comment #12)
> (In reply to Alex Deucher from comment #9)
> > Created attachment 144983 [details] [review]
> > fix DC code
> >
> > Can you try applying both of these patches? Assuming both of your monitors
> > have the same timing this might work.
>
> Didn't apply on amd-staging-drm-next, too.

BTW

Alex, is this the same problem?

My card has never been below ~32 W (even with a single monitor, but I have two identical HDMI 1920x1080 displays).
PSTATE_xxxx is much higher than Martin's.
I haven't seen "zero fan" / zero core (no spinning fans).

Polaris 20 / 8GB
Sapphire Radeon RX 580 Nitro+

single monitor

GFX Clocks and Power:
        300 MHz (MCLK)
        300 MHz (SCLK)
        600 MHz (PSTATE_SCLK)
        1000 MHz (PSTATE_MCLK)
        750 mV (VDDGFX)
        32.17 W (average GPU)

GPU Temperature: 31 C
GPU Load: 0 %

amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:       +0.75 V
fan1:         909 RPM  (min =    0 RPM, max = 3200 RPM)
temp1:        +30.0°C  (crit = +94.0°C, hyst = -273.1°C)
power1:       32.09 W  (cap = 175.00 W)
(In reply to Dieter Nützel from comment #13)
>
> Alex, is this the same problem?

No.

>
> GFX Clocks and Power:
>         300 MHz (MCLK)
>         300 MHz (SCLK)

Your mclk is going to a lower state when it's idle.
With rc5 of kernel 5.3 I was finally able to boot the kernel. Sadly, your two patches did not lower the power consumption.
Created attachment 145136 [details] [review]
fix DC code

Can you try this patch along with attachment 144978 [details] [review]?
Note that it will only work if your monitors have identical timing.
Hello, sorry that it took me so long. I was at a historic cycling event in Germany.

Your patch indeed did something. The power consumption sometimes drops to 70W, but now both screens flicker and produce errors similar to those of dying video memory.

GFX Clocks and Power:
        300 MHz (MCLK)
        308 MHz (SCLK)
        300 MHz (PSTATE_SCLK)
        300 MHz (PSTATE_MCLK)
        800 mV (VDDGFX)
        12.222 W (average GPU)
By 70W I mean total system power consumption, of course. This is roughly the same as, or a little more than, with Windows. So we are on a good path, I think.

If I do
echo low > /sys/class/drm/card0/device/power_dpm_force_performance_level
the flickering stops. So the flickering is caused by the automatic power management / reclocking.
Created attachment 145157 [details] [review]
fix DC code

Updated patch.
Sadly, the screen still flickers.
Hello Alex and Martin,

I've tried both on my Polaris 20, RX580 8 GB
Sapphire Technology Limited Nitro+ Radeon RX 580

- v2 patched into amd-staging-drm-next (before inclusion of v3)
- v3 with amd-staging-drm-next
https://cgit.freedesktop.org/~agd5f/linux/commit/?h=amd-staging-drm-next&id=ca82748783d8189a54a85f2ea1c2710182ba6138

Both flicker with green/black (?) horizontal lines over both screens, mostly during power level switches. For example, during mouse movement/interaction (wheel) and when the mouse pointer moves from konsole/etc. to the desktop (KDE5 Plasma 5.xx, here).

UVD load (mplayer etc.) is not enough to stop it, and neither is radv (vkcube), but other gfx load (vkmark/glmark2, etc.) is. When there is lower gfx demand during the above tests (glmark2 -b buffer), the flicker comes up again.

Martin's observation
[-]
If i do "echo low > /sys/class/drm/card0/device/power_dpm_force_performance_level" the flickering stops. So the flickering is caused by the automatic powermanagement / reclocking.
[-]
works here, too (tested with v3).

But I never could go below ~32 W !!!
Tested with both Nitro+ BIOS modes.
The PSTATE_xxxx values don't change on my card. They stay @ 600/1000 all the time!?

GFX Clocks and Power:
        300 MHz (MCLK)
        300 MHz (SCLK)
        600 MHz (PSTATE_SCLK)
        1000 MHz (PSTATE_MCLK)
        750 mV (VDDGFX)
        32.76 W (average GPU)

GPU Temperature: 31 C
GPU Load: 0 %
MEM Load: 3 %

Any hints?

And sorry for my bad English this time - my best friend since the beginning of German Gymnasium died after a 6-year fight against cancer. He was only 52, and leaves a wife and two little girls...
Oh, BTW Martin, which type are your 2 identical displays? HDMI (like mine) or DisplayPort? I use sound over HDMI, too, and only one display presents it.
I am so sorry to hear that, Dieter, this is really terrible. But I can assure you your English is fine.

I also think your problem is somewhere else. You mentioned it yourself, and I think Alex Deucher did as well, that your P-states are not changing.

I have two different screens: one HP ZR2440W connected via DVI and a Dell U2412M connected via DisplayPort. Both run at 59.95Hz@1920x1200.
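The workaround above can also be scripted. The following is a minimal sketch, not an official tool: the helper names are hypothetical, the sysfs path is the one quoted in this comment (card0 may be a different card on other systems), and the list of accepted values is deliberately limited to `auto` plus the two levels mentioned in this thread. Writing to the real file requires root.

```python
from pathlib import Path

# Path quoted in the comment above; the card index is an assumption.
SYSFS_LEVEL = Path("/sys/class/drm/card0/device/power_dpm_force_performance_level")

# Restricted on purpose to the levels discussed in this thread plus "auto".
VALID_LEVELS = ("auto", "low", "high")


def set_performance_level(level, path=SYSFS_LEVEL):
    """Write a DPM performance level, rejecting anything unexpected."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unsupported level {level!r}, expected one of {VALID_LEVELS}")
    Path(path).write_text(level + "\n")


def get_performance_level(path=SYSFS_LEVEL):
    """Read back the currently forced performance level."""
    return Path(path).read_text().strip()
```

Calling `set_performance_level("low")` pins the clocks at their lowest state, and `set_performance_level("auto")` hands control back to the automatic power management.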
The patches I posted only affect multiple monitors with identical timing. That means identical modelines, not just the same resolution and refresh rate. In practice this generally means you need to use identical monitors. If you are using a single monitor or multiple different monitors, the patches are not relevant for you.
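To make "identical modelines" concrete, here is a minimal sketch that compares two xorg-style modelines field by field while ignoring the mode name. The helper names are hypothetical, and the two example modelines are the commonly generated CVT reduced-blanking and standard CVT modes for 1920x1200, assumed for illustration only; they are not the modes of the monitors discussed in this report.

```python
import shlex


def parse_modeline(line):
    """Split an xorg-style modeline into (clock_mhz, eight timings, flags)."""
    fields = shlex.split(line)
    if fields and fields[0].lower() == "modeline":
        fields = fields[1:]
    _name, clock, *rest = fields            # the mode name is ignored
    timings = tuple(int(value) for value in rest[:8])
    flags = frozenset(flag.lower() for flag in rest[8:])
    return float(clock), timings, flags


def identical_timing(modeline_a, modeline_b):
    """True only if clock, all eight timing values and sync flags match."""
    return parse_modeline(modeline_a) == parse_modeline(modeline_b)


# Assumed examples: same resolution, nearly the same refresh rate,
# but different blanking and pixel clock.
REDUCED = 'Modeline "1920x1200R" 154.00 1920 1968 2000 2080 1200 1203 1209 1235 +hsync -vsync'
FULL = 'Modeline "1920x1200_60.00" 193.25 1920 2056 2256 2592 1200 1203 1209 1245 -hsync +vsync'

if __name__ == "__main__":
    print(identical_timing(REDUCED, REDUCED))
    print(identical_timing(REDUCED, FULL))
```

Both example modes are 1920x1200 at roughly 60 Hz, yet they do not count as identical timing in the sense described above.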
Created attachment 145180 [details] xrandr -q
This is going nuts...

Martin has 2 different displays (both 59.95Hz@1920x1200), an RX 480 and _very nice_ numbers (only 12.222 W), now

GFX Clocks and Power:
        300 MHz (MCLK) <= !!!
        308 MHz (SCLK)
        300 MHz (PSTATE_SCLK)
        300 MHz (PSTATE_MCLK)
        800 mV (VDDGFX) <= !!!
        12.222 W (average GPU)

=> working (?!) but flicker

I have 2 identical displays, BenQ GL2440H (both 60.00 Hz @ 1920x1080), an RX580 and 'normal' numbers (~32 W - but too high?!), now

GFX Clocks and Power:
        300 MHz (MCLK) <= !!!
        300 MHz (SCLK)
        600 MHz (PSTATE_SCLK)
        1000 MHz (PSTATE_MCLK)
        750 mV (VDDGFX) <= !!! mine is better, but...
        32.76 W (average GPU)

=> working (?!) but flicker, too.

This
        600 MHz (PSTATE_SCLK)
        1000 MHz (PSTATE_MCLK)
must be a different problem (compare with Martin's RX 480). I'll open another ticket for it.
I've tried solving the flicker with both fixes (sent by magist3r) from this bug

Bug 102646 - Screen flickering under amdgpu-experimental [buggy auto power profile]
https://bugs.freedesktop.org/show_bug.cgi?id=102646

But no success.
@Alex Deucher Is there a fix for the graphical glitches I experience? They seem to be similar to the glitches I get when I enable overclocking with amdgpu.ppfeaturemask=0xffffffff
(In reply to Martin from comment #29)
> @Alex Deucher
> Is there a fix for the graphical glitches I experience?
> They seem to be similar to the glitches I get when I enable overclocking
> with amdgpu.ppfeaturemask=0xffffffff

It would appear that the monitors don't actually quite sync up in your case, otherwise you wouldn't see the flicker.
Well, the flickering goes away if I lock the clocks to "low" or "high".
(In reply to Martin from comment #31)
> well the flickering goes away, if I lock the clocks to "low" or "high"

Exactly. In that case the mclk never changes so there is no flicker. The mclk has to change during the vblank period otherwise you see flickering. If the vblank periods are not synced up across monitors, you see flickering.
Thank you for the clarification. Right now I switch manually between low and high when necessary, so I can work around the glitches. Do you think it will be possible to achieve feature parity with Windows soon?
(In reply to Martin from comment #33)
> thank you for the clarification.
> Right now I switch manually between low and high when necessary, so I can
> work around the glitches.
> Do you think it will be possible to achieve feature parity with windows soon?

I don't think Windows enables mclk switching with multiple monitors either. It's not clear what's different between Windows and Linux on your board, unfortunately.
Thank you for the clarification. Do you think there is a solution for the problem on the Linux side, since it works absolutely fine on Windows?
(In reply to Dieter Nützel from comment #28)
> I've tried solving the flicker with both fixes (sent by magist3r) from this
> bug
>
> Bug 102646 - Screen flickering under amdgpu-experimental [buggy auto power
> profile]
> https://bugs.freedesktop.org/show_bug.cgi?id=102646
>
> But no success.

Have you also applied Ahzo's patch, just in case?
(In reply to tempel.julian from comment #36)
> (In reply to Dieter Nützel from comment #28)
> > I've tried solving the flicker with both fixes (sent by magist3r) from this
> > bug
> >
> > Bug 102646 - Screen flickering under amdgpu-experimental [buggy auto power
> > profile]
> > https://bugs.freedesktop.org/show_bug.cgi?id=102646
> >
> > But no success.
>
> Have you also applied Ahzo's patch, just in case?

Thanks for the hint.

v2 is already in 'amd-staging-drm-next'
f659bb6dae58c113805f92822e4c16ddd3156b79
drm/amd/powerplay/smu7: enforce minimal VBITimeout (v2)
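The point about unsynced vblank periods can be illustrated with a deliberately simplified model, which is an assumption for illustration and not how the driver decides anything: each display blanks for a fixed time at the end of every frame, and a memory clock switch is only invisible if it falls inside a stretch where both displays are blanking. The function names and the numbers in the example are hypothetical.

```python
def vblank_window(period_s, vblank_s, frame, phase_s=0.0):
    """(start, end) of the blanking interval that closes frame number `frame`."""
    end = phase_s + (frame + 1) * period_s
    return end - vblank_s, end


def overlap(window_a, window_b):
    """Length in seconds of the time both displays are blanking together."""
    return max(0.0, min(window_a[1], window_b[1]) - max(window_a[0], window_b[0]))


if __name__ == "__main__":
    vblank = 470e-6                     # assumed blanking time per frame
    a_period, b_period = 1 / 59.95, 1 / 60.00
    for frame in (0, 10, 40):
        shared = overlap(
            vblank_window(a_period, vblank, frame),
            vblank_window(b_period, vblank, frame),
        )
        print(f"frame {frame:3d}: shared blanking {shared * 1e6:6.1f} us")
```

With two refresh rates that differ only slightly, the shared blanking time shrinks from frame to frame until the windows no longer overlap, whereas displays with identical, locked timing keep the full window in common.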
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/817.