Bug 107141

Summary: Manual setting of pp_dpm_sclk resets after monitor off/on (rx 480)
Product: DRI Reporter: Krystian <krystian.zajdel>
Component: DRM/AMDgpu Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED QA Contact:
Severity: major    
Priority: high CC: taijian, tempel.julian
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Attachments:
Description Flags
dmesg none

Description Krystian 2018-07-06 14:04:29 UTC
Created attachment 140483 [details]
dmesg

Manual setting of pp_dpm_sclk resets after monitor off/on. 

1. setting:
echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level
echo 2 > /sys/class/drm/card0/device/pp_dpm_sclk
echo 95 > /sys/class/drm/card0/device/hwmon/hwmon1/pwm1

2. checking:
cat /sys/class/drm/card0/device/pp_dpm_sclk 
0: 300Mhz 
1: 608Mhz 
2: 910Mhz *
3: 1077Mhz 
4: 1145Mhz 
5: 1191Mhz 
6: 1236Mhz 
7: 1303Mhz 

cat /sys/class/drm/card0/device/power_dpm_force_performance_level
manual

cat /sys/class/drm/card0/device/hwmon/hwmon1/pwm1
91

3. switching monitor off/on

4. checking again:
cat /sys/class/drm/card0/device/pp_dpm_sclk 
0: 300Mhz 
1: 608Mhz 
2: 910Mhz 
3: 1077Mhz 
4: 1145Mhz 
5: 1191Mhz 
6: 1236Mhz 
7: 1303Mhz *

cat /sys/class/drm/card0/device/power_dpm_force_performance_level
manual

cat /sys/class/drm/card0/device/hwmon/hwmon1/pwm1
91




The card is under OpenCL load (Blender), but the problem occurs regardless of the load.
So power_dpm_force_performance_level still reports manual, but the sclk level I selected is lost and the clock behaves as in auto mode again.
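
The check in steps 2 and 4 can be scripted so that a reset is easy to spot. The following is a minimal sketch, not part of the original report: the function names and the DEV variable are hypothetical, and the sysfs path is the one used above (card0 may differ on other systems). It only reads pp_dpm_sclk and reports which level carries the "*" marker.

```shell
#!/bin/sh
# Print the index of the active sclk level, i.e. the line marked with '*',
# from pp_dpm_sclk-style text read on stdin.
active_sclk_level() {
    awk -F: '/\*/ { print $1; exit }'
}

# Hypothetical helper: compare the active level with the expected one.
# DEV is an illustrative variable; it defaults to the path from this report.
check_sclk() {
    expected=$1
    dev=${DEV:-/sys/class/drm/card0/device}
    actual=$(active_sclk_level < "$dev/pp_dpm_sclk")
    if [ "$actual" = "$expected" ]; then
        echo "ok: level $actual"
    else
        echo "reset: expected $expected, got $actual"
        return 1
    fi
}
```

Running "check_sclk 2" right after step 1 and again after switching the monitor off and on should show whether the manual level survived.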

GPU: rx480 (MSI) connected via DisplayPort (I didn't try HDMI) 
OS: Slackware current
Affected kernels (that I've tried): 4.17.1, 4.17.2, 4.17.3, 4.18rc3, 4.18.0-rc1-custom-g6becad35ec8a-dirty

Kernels that work properly (the ones I've tried): 4.16.14, 4.16.15, 4.16.17.


It's my first bug report so I apologize in advance for any mistakes.
Comment 1 dwagner 2018-08-12 20:41:42 UTC
I can confirm this bug - and it is worse than reported. And due to other bugs, it is not just a cosmetic annoyance.

The report only talks of "monitor off/on", but the power_dpm_force_performance_level setting is also disregarded after any output mode change with "xrandr --output XXX --mode YYY", and after any switch between the console display and X11 (if the console uses another mode, which is likely).

Also, changing power_dpm_force_performance_level to "manual" is currently the only way to work around the long-standing (one year old) crash bug reported in https://bugs.freedesktop.org/show_bug.cgi?id=102322

But with amdgpu assuming "automatic" behaviour instead of "manual" after each output mode change, it is difficult to keep a system stable using this work-around.

BTW: This bug is still present in current amd-staging-drm-next.
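
Until the driver is fixed, one stopgap is to poll the sysfs files and write the manual setting again whenever it has been lost. The following is only a sketch under assumptions: DEV, LEVEL, reapply_sclk and watch_sclk are illustrative names, not anything amdgpu provides, and the path and level are taken from the original report. Polling leaves a window in which the clock runs at the automatic level, so this narrows the problem rather than removing it.

```shell
#!/bin/sh
# Sketch only: DEV and LEVEL are illustrative variables, not amdgpu settings.
DEV=${DEV:-/sys/class/drm/card0/device}
LEVEL=${LEVEL:-2}

# Write the manual sclk level again if the '*' marker is not on LEVEL.
# Prints "reapplied" when it had to write, "unchanged" otherwise.
reapply_sclk() {
    current=$(awk -F: '/\*/ { print $1; exit }' "$DEV/pp_dpm_sclk")
    if [ "$current" != "$LEVEL" ]; then
        echo manual > "$DEV/power_dpm_force_performance_level"
        echo "$LEVEL" > "$DEV/pp_dpm_sclk"
        echo reapplied
    else
        echo unchanged
    fi
}

# Poll forever; the optional argument is the interval in seconds (default 5).
watch_sclk() {
    while :; do
        reapply_sclk >/dev/null
        sleep "${1:-5}"
    done
}
```

Run as root, for example "watch_sclk 5" from a script started at login. A shorter interval shrinks the window after a mode change at the cost of more frequent sysfs reads.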
Comment 2 dwagner 2018-08-12 20:43:06 UTC
(I should mention that also screen blanking with DPMS being activated triggers this bug.)
Comment 3 Michel Dänzer 2018-11-02 14:06:13 UTC
*** Bug 108613 has been marked as a duplicate of this bug. ***
Comment 4 Mihai Preda 2019-03-17 17:19:59 UTC
Please see my original report and discussion on ROCm:
https://github.com/RadeonOpenCompute/ROCm/issues/605

Basically, the manually configured sclk is reset (lost) on monitor state change ("monitor on").

As per @kentrussell's feedback there, I'm reporting this as a suspected amdgpu issue related to power management.

I can reproduce with a recent 5.0 kernel and most recent ROCm (2.2).

~$ uname -a
Linux x2 5.0.2-050002-generic #201903131832 SMP Wed Mar 13 22:35:19 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

~$ apt list --installed | grep rocm

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

rocm-opencl/Ubuntu 16.04,now 1.2.0-2019030702 amd64 [installed]
~$ apt list --installed | grep hsa 

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

hsa-ext-rocr-dev/Ubuntu 16.04,now 1.1.9-55-gbac2a9b amd64 [installed,automatic]
hsa-rocr-dev/Ubuntu 16.04,now 1.1.9-55-gbac2a9b amd64 [installed,automatic]
hsakmt-roct-dev/Ubuntu 16.04,now 1.0.9-121-g876627e amd64 [installed,automatic]
hsakmt-roct/Ubuntu 16.04,now 1.0.9-121-g876627e amd64 [installed,automatic]

I have reproduced this bug on a wide range of AMD GPUs. Most recently on AMD Vega64.
Comment 5 Martin Peres 2019-11-19 08:42:59 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/441.
