Bug 101528

Summary: RX460 Memory clock stays high until card / display is "used"
Product: DRI Reporter: Sverd Johnsen <sverd.johnsen>
Component: DRM/AMDgpu Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium CC: alexander
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
See Also: https://bugzilla.kernel.org/show_bug.cgi?id=196615
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
possible fix none

Description Sverd Johnsen 2017-06-20 18:58:28 UTC
linux 4.11.4

I use the iGPU on SKL and an RX460 as a dedicated card. In Xorg I turned AutoBindGPU and AutoAddGPU off and SingleCard on. With the fbcon=map kernel parameter I map the Linux ttys to different cards. To switch to the AMD card I do a VT switch and then change the input on my monitor. This works fine and lets me log in via agetty and launch a new X server. One thing I noticed is that until I actually do that, the memory clock of the card stays high (17xx MHz?), which makes it 5-8°C hotter and probably uses more power than it needs.
Comment 1 Alexander Tsoy 2017-06-29 17:36:48 UTC
Same problem with TONGA. When the GPU is idle, mclk goes to its maximum. This is easily reproducible: just turn off the monitor. I noticed this issue after upgrading from 4.9.x kernels to 4.11.7. Maybe I'll check the mainline kernel later and/or bisect.

# cat /sys/class/drm/card0/device/pp_dpm_mclk
0: 150Mhz
1: 300Mhz
2: 700Mhz
3: 1450Mhz *
# sensors amdgpu-\*
amdgpu-pci-0100
Adapter: PCI adapter
fan1:        2025 RPM
temp1:        +47.0°C  (crit =  +0.0°C, hyst =  +0.0°C)
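For anyone scripting a check for this condition, the following is a minimal sketch of reading the active level out of pp_dpm_mclk text, assuming the "<level>: <clock>Mhz" layout with a trailing "*" on the active line, as in the output above. The helper name active_mclk_mhz is hypothetical and is not part of the driver or any tool.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not driver code.
 * Takes text in the pp_dpm_mclk layout shown above, for example
 *   "0: 150Mhz\n1: 300Mhz\n2: 700Mhz\n3: 1450Mhz *\n"
 * and returns the clock in MHz of the line marked with '*',
 * or -1 if no line is marked.
 */
static int active_mclk_mhz(const char *text)
{
    const char *line = text;

    while (line && *line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        int level, mhz;

        /* Parse the "<level>: <clock>" prefix, then look for '*'
         * only within the current line. */
        if (sscanf(line, "%d: %d", &level, &mhz) == 2 &&
            memchr(line, '*', len) != NULL)
            return mhz;

        line = end ? end + 1 : NULL;
    }
    return -1;
}
```

Feeding it the output above would report the 1450 MHz level as active, which is the symptom described in this comment.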
Comment 2 Alexander Tsoy 2017-06-29 19:48:40 UTC
I've added printing of some debug info into smu7_hwmgr.c and here is what I get before the GPU enters that state:

[  778.701843] AMDGPU: vblank_time_us: 630, switch_limit_us: 450          
[  778.707608] AMDGPU: vblank_time_us: 630, switch_limit_us: 450                                              
[  778.713379] AMDGPU: disable_mclk_switching: 0, disable_mclk_switching_for_frame_lock: 0, info.display_count: 1, smu7_vblank_too_short: 0, mode_info.refresh_rate: 60
[  778.748777] AMDGPU: vblank_time_us: 0, switch_limit_us: 450       
[  778.754361] AMDGPU: vblank_time_us: 0, switch_limit_us: 450  
[  778.759951] AMDGPU: disable_mclk_switching: 1, disable_mclk_switching_for_frame_lock: 0, info.display_count: 1, smu7_vblank_too_short: 1, mode_info.refresh_rate: 0

For some reason if refresh_rate = 0 then vblank_time_us = 0. Shouldn't the latter be 0xffffffff instead? So I guess the following commit is the culprit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=09be4a5219610a6fae3215d4f51f948d6f5d2609

and the following patch should fix (or work around?) this issue:

- if (vblank_time_us < switch_limit_us)
+ if (vblank_time_us && (vblank_time_us < switch_limit_us))

After applying it:

[  409.588673] AMDGPU: vblank_time_us: 630, switch_limit_us: 450
[  409.594427] AMDGPU: vblank_time_us: 630, switch_limit_us: 450
[  409.600182] AMDGPU: disable_mclk_switching: 0, disable_mclk_switching_for_frame_lock: 0, info.display_count: 1, smu7_vblank_too_short: 0, mode_info.refresh_rate: 60
[  409.639750] AMDGPU: vblank_time_us: 0, switch_limit_us: 450
[  409.645321] AMDGPU: vblank_time_us: 0, switch_limit_us: 450
[  409.650917] AMDGPU: disable_mclk_switching: 0, disable_mclk_switching_for_frame_lock: 0, info.display_count: 1, smu7_vblank_too_short: 0, mode_info.refresh_rate: 0

$ cat /sys/class/drm/card0/device/pp_dpm_mclk
0: 150Mhz *
1: 300Mhz
2: 700Mhz
3: 1450Mhz
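To make the effect of the one-line change concrete, here is a standalone sketch of the comparison before and after the guard. It only models the two conditions quoted in the diff above. The function names are hypothetical and the surrounding driver logic in smu7_hwmgr.c is not reproduced.

```c
#include <stdbool.h>
#include <stdint.h>

/* Original condition: a vblank time of 0 compares as less than the
 * switch limit, so it is treated as "too short". */
static bool vblank_too_short_original(uint32_t vblank_time_us,
                                      uint32_t switch_limit_us)
{
    return vblank_time_us < switch_limit_us;
}

/* Guarded condition from the proposed patch: a vblank time of 0 is
 * no longer treated as "too short". */
static bool vblank_too_short_guarded(uint32_t vblank_time_us,
                                     uint32_t switch_limit_us)
{
    return vblank_time_us && (vblank_time_us < switch_limit_us);
}
```

With the logged values, vblank_time_us = 0 and switch_limit_us = 450 give true for the original condition and false for the guarded one, matching the smu7_vblank_too_short values of 1 and 0 in the two logs. A nonzero vblank time shorter than the limit is still reported as too short by both.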
Comment 3 Alex Deucher 2017-06-29 20:14:55 UTC
Created attachment 132358 [details] [review]
possible fix
Comment 4 Alexander Tsoy 2017-06-29 22:14:03 UTC
(In reply to Alex Deucher from comment #3)
> Created attachment 132358 [details] [review]
> possible fix

This patch fixes this bug for me. Thank you!

[  359.229187] AMDGPU: vblank_time_us: 630, switch_limit_us: 450
[  359.234933] AMDGPU: vblank_time_us: 630, switch_limit_us: 450
[  359.240684] AMDGPU: disable_mclk_switching: 0, disable_mclk_switching_for_frame_lock: 0, info.display_count: 1, smu7_vblank_too_short: 0, mode_info.refresh_rate: 60
[  359.283987] AMDGPU: vblank_time_us: 4294967295, switch_limit_us: 450
[  359.290342] AMDGPU: vblank_time_us: 4294967295, switch_limit_us: 450
[  359.296703] AMDGPU: disable_mclk_switching: 0, disable_mclk_switching_for_frame_lock: 0, info.display_count: 1, smu7_vblank_too_short: 0, mode_info.refresh_rate: 0
...
...
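The value 4294967295 in the log above is 0xffffffff, the largest 32-bit unsigned value. The attachment itself is not reproduced in this report, so the following is only a sketch of the sentinel idea the log is consistent with, assuming the vblank time is reported as the maximum value when no refresh rate is known. The helper vblank_time_or_max and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: when the refresh rate is 0 (for example, the
 * display is off), report the largest possible vblank time instead
 * of 0. Otherwise pass the computed value through unchanged. */
static uint32_t vblank_time_or_max(uint32_t refresh_rate,
                                   uint32_t computed_vblank_time_us)
{
    return refresh_rate == 0 ? UINT32_MAX : computed_vblank_time_us;
}

/* Unmodified comparison against the switch limit. */
static bool vblank_too_short(uint32_t vblank_time_us,
                             uint32_t switch_limit_us)
{
    return vblank_time_us < switch_limit_us;
}
```

Under that assumption the comparison itself needs no special case: UINT32_MAX is never below the 450 us limit, so mclk switching stays enabled while the display is off.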
Comment 5 Sverd Johnsen 2017-07-14 11:52:52 UTC
Works for me on 4.11.10. Display off, MCLK is low and card temperature is 27°C as expected. Thanks.
