Bug 97362

Summary: Low performance after suspend on RX 480
Product: DRI
Reporter: Christoph Haag <haagch>
Component: DRM/AMDgpu
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED
QA Contact:
Severity: normal
Priority: medium
CC: gwhite, vedran
Version: unspecified
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:
Attachments:
  dmesg (flags: none)
  dmesg drm-next-4.9-wip (flags: none)

Description Christoph Haag 2016-08-16 10:32:53 UTC
Created attachment 125811
dmesg

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Polaris10] (rev c7)

Tried on vanilla Linux 4.7 and 4.8-rc2 with the async pageflip commit reverted.

I'm testing with a very simple DirectX 9 application running in Wine (without Gallium Nine), because the impact is extremely obvious there: http://www.codesampler.com/dx9src/dx9src_1.htm#dx9_initialization

Before suspending it runs at 5000+ FPS; after suspending it runs at <1000. Also, if you keep moving the mouse pointer over the window, there is a very small performance drop before suspending and a huge performance drop after suspending.

I looked at the powerplay values in sysfs while that application is running in wine and before suspend,
pcie is
0: 2.5GB, x8 
1: 8.0GB, x16 *
and sclk is
0: 300Mhz 
1: 608Mhz 
2: 910Mhz 
3: 1077Mhz 
4: 1145Mhz 
5: 1191Mhz 
6: 1236Mhz 
7: 1288Mhz *

after suspend it's
pcie
0: 2.5GB, x8 *
1: 8.0GB, x16 
and sclk
0: 300Mhz 
1: 608Mhz *
2: 910Mhz 
3: 1077Mhz 
4: 1145Mhz 
5: 1191Mhz 
6: 1236Mhz 
7: 1288Mhz 


mclk is
0: 300Mhz 
1: 2000Mhz *
in both cases.
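
For anyone who wants to compare these states before and after suspend without reading the listings by eye, here is a minimal Python sketch that picks out the entry marked with `*`. The file names `pp_dpm_sclk`, `pp_dpm_mclk` and `pp_dpm_pcie` and the `card0` path are assumptions (the report only names `power_dpm_force_performance_level` in that directory), and `active_level` and `read_levels` are hypothetical helper names.

```python
from pathlib import Path

# Assumed location; the report only names power_dpm_force_performance_level
# in this directory, so the pp_dpm_* file names below are an assumption.
DEVICE_DIR = Path("/sys/class/drm/card0/device")


def active_level(listing):
    """Return (index, value) of the entry marked with '*', or None."""
    for line in listing.splitlines():
        line = line.strip()
        if not line.endswith("*"):
            continue
        # "1: 608Mhz *" -> index "1", value "608Mhz"
        index, _, value = line.rstrip("*").partition(":")
        return int(index), value.strip()
    return None


def read_levels(names=("pp_dpm_sclk", "pp_dpm_mclk", "pp_dpm_pcie")):
    """Read each listed file that exists and report its active level."""
    result = {}
    for name in names:
        path = DEVICE_DIR / name
        if path.exists():
            result[name] = active_level(path.read_text())
    return result


if __name__ == "__main__":
    for name, level in read_levels().items():
        print(name, level)
```

Running it once before suspend and once after resume, with the test application active both times, should make the difference in the selected sclk and pcie levels easy to diff.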

I then tried
echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level
and sclk goes to the maximum and pcie goes back to x16, but the performance of the application does NOT increase, so it looks like the low clocks are a symptom of whatever causes the low performance rather than the cause itself.

dmesg from 4.8-rc2 attached, shows some errors:

[  574.369317] failed to send message 5e ret is 0
[  574.369317] [drm:amdgpu_vce_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 12 test failed
[  574.369317] [drm:amdgpu_resume [amdgpu]] *ERROR* resume of IP block <vce_v3_0> failed -110
[  574.369317] [drm:amdgpu_resume_kms [amdgpu]] *ERROR* amdgpu_resume failed (-110).
Comment 1 Christoph Haag 2016-08-17 23:50:07 UTC
Created attachment 125857
dmesg drm-next-4.9-wip

Just tried with drm-next-4.9-wip.

The vce error message is gone but the performance issue remains. So I can at least say that this error was unrelated.
Comment 2 Grigori Goronzy 2016-08-23 21:10:47 UTC
I can confirm, I also see this. The performance reduction also depends on the application for me. I think this issue has existed from the very start, so there's no point in trying to bisect.
Comment 3 Grigori Goronzy 2016-08-30 12:50:53 UTC
FWIW, the Metro 2033 games experience a particularly bad reduction in performance after suspend: basically half the FPS. Other demanding games, e.g. Tomb Raider, are not affected quite as much.

Still no idea what could be causing this.
Comment 4 Gašper Sedej 2017-02-12 22:03:29 UTC
The issue is still present. After resume, I can force the GPU to its maximum clock, but the memory stays at 300MHz.

Ubuntu 16.04 + kernel 4.10-rc7 + oibaf PPA (Mesa git)
Comment 5 Martin Peres 2019-11-19 08:09:22 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/89.
