Bug 77394 - Radeon R9 270X GPU lockup on power profile change
Summary: Radeon R9 270X GPU lockup on power profile change
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Default DRI bug account
 
Reported: 2014-04-13 11:35 UTC by nine
Modified: 2019-11-19 08:48 UTC


Attachments
dmesg taken after desktop freeze (248.69 KB, text/plain)
2014-04-13 11:37 UTC, nine
Xorg.0.log of a session after a freeze (67.77 KB, text/plain)
2014-04-13 11:38 UTC, nine
dmesg after GPU lockup on setting power profile to high (137.55 KB, text/plain)
2014-04-22 21:15 UTC, nine
dmesg after echo auto > power_profile (110.33 KB, text/plain)
2014-07-06 19:14 UTC, nine

Description nine 2014-04-13 11:35:27 UTC
When I turn on KDE compositing, the desktop freezes more often than not. The freezes can also be provoked by configuring mplayer for GL output and opening a video file. If that alone is not enough, switching repeatedly between windowed and fullscreen mode does it. Attaching Xorg.0.log and dmesg of such a crash.

I can reboot the system cleanly using alt+sysrq+r and ctrl+alt+del, but every attempt to get a nice backtrace of the X server with gdb failed because the process hangs in kernel space.

Hardware:
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Curacao XT [Radeon R9 270X]

Running:
kernel 3.14,
libdrm2-2.4.99~git20140411-1.1,
xorg-x11-server-7.6_1.15.99.902-306.1
Mesa-10.2~git20140411-5.1
llvm-r600-3.5~svn20140408-1.1.x86_64
openSUSE 13.1

Downgrading components to their openSUSE 13.1 stock versions did not help, with one exception: downgrading the kernel to 3.11.
Comment 1 nine 2014-04-13 11:37:52 UTC
Created attachment 97304
dmesg taken after desktop freeze
Comment 2 nine 2014-04-13 11:38:33 UTC
Created attachment 97305
Xorg.0.log of a session after a freeze
Comment 3 Alex Deucher 2014-04-14 14:40:21 UTC
Does this patch set help:
http://lists.freedesktop.org/archives/dri-devel/2014-April/057512.html
and the new *mc2.bin ucode here:
http://people.freedesktop.org/~agd5f/radeon_ucode/
Comment 4 nine 2014-04-15 08:06:12 UTC
(In reply to comment #3)
> Does this patch set help:
> http://lists.freedesktop.org/archives/dri-devel/2014-April/057512.html

Which tree does this apply to?

I applied it to 3.14 but the third patch was rejected and the resulting kernel crashed on any attempt to open a GL window.

On Linus' master the third patch was rejected as well but at least I was able to enable desktop effects and start mplayer. Still got problems with the screen suddenly going blank and the system freezing (had those before as well but much less frequently, maybe due to lower GL usage).
Comment 5 Alex Deucher 2014-04-15 13:20:42 UTC
It might be easier to just test Christian's fixes branch here:
http://cgit.freedesktop.org/~deathsimple/linux/log/?h=drm-fixes-3.15-wip

Does disabling dpm help?  Append radeon.dpm=0 to the kernel command line in grub.  If that helps, the patches I mentioned may help.
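
One way to confirm that radeon.dpm=0 actually reached the running kernel is to look for it on the kernel command line. The helper below is only a sketch: the function name radeon_dpm_from_cmdline is made up for illustration, and it assumes the parameter was passed on the command line rather than through a modprobe configuration file.

```shell
#!/bin/sh
# Print the value of radeon.dpm found in a kernel command line string,
# or "unset" if the parameter does not appear in it.
# The function body runs in a subshell so that "set -f" does not leak out.
radeon_dpm_from_cmdline() (
    set -f                      # no globbing while splitting on whitespace
    value=unset
    for arg in $1; do
        case $arg in
            # If the parameter appears more than once, the last one is reported.
            radeon.dpm=*) value=${arg#radeon.dpm=} ;;
        esac
    done
    printf '%s\n' "$value"
)
```

On the affected machine it would be called as radeon_dpm_from_cmdline "$(cat /proc/cmdline)" after rebooting with the modified grub entry.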
Comment 6 nine 2014-04-16 08:14:12 UTC
With the drm-fixes-3.15-wip branch, updated firmware and radeon.dpm=0 I can now run KDE with desktop effects enabled, and mplayer's GL output seems stable. But I can still provoke kernel panics (hard lockup on CPU x) quite easily by pressing and holding 'f' in mplayer, so that it quickly and repeatedly switches between fullscreen and windowed mode. I also get those blank screens and lockups on resume from disk.

I guess there are multiple issues here working together to spoil my user experience ;)
Comment 7 nine 2014-04-22 21:14:52 UTC
I can easily reproduce GPU lockups by switching the power profile to high. I bought a new power supply that has power to spare, but it did not change anything.

Should I file a separate bug report for these lockups?
Comment 8 nine 2014-04-22 21:15:35 UTC
Created attachment 97776
dmesg after GPU lockup on setting power profile to high
Comment 9 Alex Deucher 2014-04-22 21:36:46 UTC
Do you still get lockups without forcing the power level to high?  You shouldn't need to force it.  On auto the hw will automatically switch between levels based on load.
Comment 10 nine 2014-04-24 11:32:17 UTC
(In reply to comment #9)
> Do you still get lockups without forcing the power level to high? 

Normal desktop use seems stable as long as I don't use mplayer.

> You
> shouldn't need to force it.  On auto the hw will automatically switch
> between levels based on load.

I thought automatic switching is what dpm is about? Anyway setting power_profile to auto gave me an immediate lockup just like the high profile.
Comment 11 Alex Deucher 2014-04-24 15:32:01 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > Do you still get lockups without forcing the power level to high? 
> 
> Normal desktop use seems stable as long as I don't use mplayer.
> 
> > You
> > shouldn't need to force it.  On auto the hw will automatically switch
> > between levels based on load.
> 
> I thought automatic switching is what dpm is about? Anyway setting
> power_profile to auto gave me an immediate lockup just like the high profile.

Correct.  Auto is the default power profile.  So it appears that changing the state causes a problem.
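
To see which power management method is in effect before changing anything, the relevant sysfs attributes can be read back. The sketch below makes several assumptions: the function name is hypothetical, the card is card0, and only power_profile and power_dpm_force_performance_level are file names taken from this report. power_method and power_dpm_state are assumed from the radeon driver's sysfs interface and may be absent depending on kernel version and on whether dpm is enabled, so missing files are simply skipped.

```shell
#!/bin/sh
# Print the radeon power management attributes that are readable.
# $1: sysfs device directory (optional, defaults to card0).
show_radeon_power_state() {
    dev=${1:-/sys/class/drm/card0/device}
    for f in power_method power_profile power_dpm_state \
             power_dpm_force_performance_level; do
        # Skip attributes this kernel/driver configuration does not expose.
        if [ -r "$dev/$f" ]; then
            printf '%s: %s\n' "$f" "$(cat "$dev/$f")"
        fi
    done
}
```

Running show_radeon_power_state once before and once after a profile change would show whether the write was accepted, provided the system survives the change.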
Comment 12 nine 2014-05-03 10:24:57 UTC
Today I gave in and installed fglrx. Despite considerable torture the system is stable, so I can at least rule out some hardware problem.

Do you have any suggestions on what I can do to help fix this problem? Since I can reliably reproduce these GPU lockups, have the kernel source to play with, and can ssh into the system, I may be able to gather more information, unless the problem is likely to be in the firmware.
Comment 13 nine 2014-07-06 19:13:23 UTC
After two months with a stable system using fglrx, I gave radeon another try. I can still easily reproduce the GPU lockups using kernel 3.16.0-rc2-5.g19015a0-desktop and radeon_ucode fetched today. The dmesg output changed a little. Maybe this helps?
Comment 14 nine 2014-07-06 19:14:30 UTC
Created attachment 102331
dmesg after echo auto > power_profile

Problem starts at 131.600354
Comment 15 nine 2014-09-27 16:29:18 UTC
News: I can actually switch power levels as much as I like as long as X has not been started yet. Starting X afterwards gives me a stable system with the changed power level. Once X has been started, however, the lockups happen whenever I try to change the power level, even after I shut down the X server.

Downgrading xorg-x11-server from 7.6_1.16.99.9-2.2 to 7.6_1.16.1-350.4 fixes this problem but then I get random lockups and crashes which is not much better :/
Comment 16 nine 2015-05-23 13:41:27 UTC
Today I finally made decent progress on this issue. I can confirm that the GPU lockups are not the X server's fault, since they also appear when using weston and, indeed, when running the eglkms Mesa demo on the pure framebuffer.

Trying to find the exact OpenGL call that causes the lockup, I discovered that when I change the power profile before starting any GL program (Mesa demo or display server), changing the profile continues to work until the next reboot. So all that was needed was a
echo low > /sys/class/drm/card0/device/power_profile
executed before starting the X server, and I could change the power profile at runtime.

I then turned on dpm again and indeed:
echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level
echo auto > /sys/class/drm/card0/device/power_dpm_force_performance_level
called by an ExecStartPre script from the display-server service file fixes the issue completely for me, and I now have a fully working dpm :)

So it seems like the first change of performance level/profile initializes something that must be initialized before the first GL call or CRTC change for power management to be stable. Any idea what this something might be?
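
The workaround described above could be wrapped in a small script along these lines. This is a sketch, not the script actually used: the function name prime_dpm_level and the optional device-directory argument are assumptions added for illustration, and the card0 path has to match the GPU in question.

```shell
#!/bin/sh
# Force the DPM performance level to "high" and then back to "auto" once,
# before any GL client or display server starts.
# $1: sysfs device directory (optional, defaults to card0).
prime_dpm_level() {
    dev=${1:-/sys/class/drm/card0/device}
    level_file=$dev/power_dpm_force_performance_level

    # The attribute must exist and be writable (dpm enabled, running as root).
    if [ ! -w "$level_file" ]; then
        echo "cannot write $level_file (dpm disabled or not root?)" >&2
        return 1
    fi

    echo high > "$level_file" || return 1
    echo auto > "$level_file" || return 1
}
```

A script that ends with a call to prime_dpm_level could then be referenced from an ExecStartPre= line in the display-server service file; where that script is installed is up to the system.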
Comment 17 Martin Peres 2019-11-19 08:48:21 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/488.