Bug 109650 - [amd-staging-drm-next] - Polaris 20 dc - idle power regression 3x [bisected]
Summary: [amd-staging-drm-next] - Polaris 20 dc - idle power regression 3x [bisected]
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
Reported: 2019-02-15 23:42 UTC by Dieter Nützel
Modified: 2019-02-20 02:18 UTC

Description Dieter Nützel 2019-02-15 23:42:36 UTC
Polaris 20

Idle power went up from ~32 W to ~96 W.

With broken commits:

amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:       +1.20 V  
fan1:         888 RPM  (min =    0 RPM, max = 3200 RPM)
temp1:        +55.0°C  (crit = +94.0°C, hyst = -273.1°C)
power1:       96.04 W  (cap = 175.00 W)

Bisected to:

764c85fef41722db0f21558c6c2fb38bee172d19 is the first bad commit
commit 764c85fef41722db0f21558c6c2fb38bee172d19
Author: Yong Zhao <Yong.Zhao@amd.com>
Date:   Tue Feb 5 15:17:40 2019 -0500

    drm/amdgpu: Fix bugs in setting CP RB/MEC DOORBELL_RANGE registers
    
    CP_RB_DOORBELL_RANGE_LOWER/UPPER and CP_MEC_DOORBELL_RANGE_LOWER/UPPER
    are used for waking up an idle scheduler and for power gating support.
    Usually the first few doorbells in pci doorbell bar are used for RB
    and all leftover for MEC. This patch fixes the incorrect settings.
    
    Theoretically, gfx ring doorbells should come before all MEC doorbells
    to be consistent with the design. However, since the doorbell
    allocations are agreed by all and we are not free to change them, also
    considering the kernel MEC ring doorbells which are before gfx ring
    doorbells are not used often, we compromise by leaving the doorbell
    allocations unchanged.
    
    Change-Id: I402a56ce9a80e6c2ed2f96be431ae71ca88e73a4
    Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
    Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>

:040000 040000 a5747a6be3d388ae851855eebe7ebbf20488ba22 7b516291deb849c593199a4c8df3ad08c5b7a769 M drivers

After reverting both related commits from current
amd-staging-drm-next (256445aee13f)

9affde0e44af (HEAD -> amd-staging-drm-next) Revert "drm/amdgpu: Fix bugs in setting CP RB/MEC DOORBELL_RANGE registers"
8e73059158d8 Revert "drm/amdgpu: Delete user queue doorbell variables"
256445aee13f (origin/amd-staging-drm-next) drm/amdgpu: remove some old unused dpm helpers

I get these numbers again (somewhat higher than Win... as some others pointed out):

amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:       +0.75 V  
fan1:         900 RPM  (min =    0 RPM, max = 3200 RPM)
temp1:        +30.0°C  (crit = +94.0°C, hyst = -273.1°C)
power1:       32.16 W  (cap = 175.00 W)
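
For anyone who wants to compare the two readings above mechanically, here is a minimal Python sketch. It assumes the `sensors` text format shown in this report; the helper name `read_power_watts` is made up for illustration and is not part of lm-sensors or the driver.

```python
import re

def read_power_watts(sensors_text):
    """Return the power1 reading in watts from `sensors`-style text, or None."""
    match = re.search(r"^power1:\s+([0-9.]+)\s*W", sensors_text, re.MULTILINE)
    return float(match.group(1)) if match else None

# Readings copied from this report (with the bad commits, and after the reverts).
bad = """amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:       +1.20 V
fan1:         888 RPM  (min =    0 RPM, max = 3200 RPM)
temp1:        +55.0°C  (crit = +94.0°C, hyst = -273.1°C)
power1:       96.04 W  (cap = 175.00 W)
"""
good = """amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:       +0.75 V
fan1:         900 RPM  (min =    0 RPM, max = 3200 RPM)
temp1:        +30.0°C  (crit = +94.0°C, hyst = -273.1°C)
power1:       32.16 W  (cap = 175.00 W)
"""

ratio = read_power_watts(bad) / read_power_watts(good)
print(f"idle power ratio: {ratio:.2f}x")  # prints "idle power ratio: 2.99x"
```

The ratio of roughly 2.99 is where the "3x" in the summary comes from.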
Comment 1 tempel.julian 2019-02-19 12:53:25 UTC
I can confirm this with RX 580, /sys/kernel/debug/dri/0/amdgpu_pm_info also shows a constant GPU usage of 100%.
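
A rough Python sketch for checking the load figure from a script. It assumes the file contains a line of the form `GPU Load: <n> %`; that format, the sample string, and the helper name `parse_gpu_load` are assumptions for illustration, so verify them against the actual file on your kernel. Reading debugfs normally requires root.

```python
import re

def parse_gpu_load(pm_info_text):
    """Return the GPU load percentage from amdgpu_pm_info-style text, or None."""
    match = re.search(r"GPU Load:\s*(\d+)\s*%", pm_info_text)
    return int(match.group(1)) if match else None

# Hypothetical sample line; on a real system read the text from
# /sys/kernel/debug/dri/0/amdgpu_pm_info instead.
sample = "GPU Load: 100 %\n"
load = parse_gpu_load(sample)
print(f"GPU load: {load}%")
```

A value that stays at 100 while the desktop is idle would match the behaviour described here.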
Comment 2 Alex Deucher 2019-02-20 01:16:03 UTC
Patch reverted.
Comment 3 Dieter Nützel 2019-02-20 02:17:30 UTC
The relevant 2 commits are reverted.

