When I turn on KDE compositing, the desktop freezes more often than not. The freezes can also be provoked by configuring mplayer for GL output and opening a video file. If that alone is not enough, switch repeatedly between windowed and fullscreen mode. Attaching Xorg.0.log and dmesg of such a crash.
I can reboot the system cleanly using alt+sysrq+r and ctrl+alt+del, but every attempt to get a usable backtrace of the X server with gdb failed because the process hangs in kernel space.
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Curacao XT [Radeon R9 270X]
Downgrading components to their openSUSE 13.1 stock versions did not help, with one exception: downgrading the kernel to 3.11.
Created attachment 97304 [details]
dmesg taken after desktop freeze
Created attachment 97305 [details]
Xorg.0.log of a session after a freeze
Does this patch set help:
and the new *mc2.bin ucode here:
(In reply to comment #3)
> Does this patch set help:
Which tree does this apply to?
I applied it to 3.14 but the third patch was rejected and the resulting kernel crashed on any attempt to open a GL window.
On Linus' master the third patch was rejected as well but at least I was able to enable desktop effects and start mplayer. Still got problems with the screen suddenly going blank and the system freezing (had those before as well but much less frequently, maybe due to lower GL usage).
It might be easier to just test Christian's fixes branch here:
Does disabling dpm help? Append radeon.dpm=0 to the kernel command line in grub. If that helps, the patches I mentioned may help.
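After rebooting, it can be worth confirming that the parameter actually reached the kernel. A minimal sketch, assuming a POSIX shell; the function name `dpm_cmdline_value` is made up for illustration and is not part of any tool:

```shell
#!/bin/sh
# Hypothetical helper: print the value of the last radeon.dpm= token on the
# kernel command line, or nothing if the parameter is absent.
# The file to read defaults to /proc/cmdline and can be overridden for testing.
dpm_cmdline_value() {
    cmdline_file="${1:-/proc/cmdline}"
    # Split the command line into one token per line, then strip the prefix.
    tr ' ' '\n' < "$cmdline_file" | sed -n 's/^radeon\.dpm=//p' | tail -n 1
}
```

Calling `dpm_cmdline_value` with no argument should print `0` if the system was booted with radeon.dpm=0.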
With the drm-fixes-3.15-wip branch, updated firmware and radeon.dpm=0, I can now run KDE with desktop effects enabled, and mplayer's GL output seems stable. But I can still provoke kernel panics (hard lockup on CPU x) quite easily by pressing and holding 'f' in mplayer, which switches quickly and repeatedly between fullscreen and windowed mode. I also get those blank screens and lockups on resume from disk.
I guess there are multiple issues here working together to spoil my user experience ;)
I can easily reproduce GPU lockups by switching the power profile to high. I bought a new power supply that has power to spare, but it did not change anything.
Should I file a separate bug report for these lockups?
Created attachment 97776 [details]
dmesg after GPU lockup on setting power profile to high
Do you still get lockups without forcing the power level to high? You shouldn't need to force it. On auto the hw will automatically switch between levels based on load.
(In reply to comment #9)
> Do you still get lockups without forcing the power level to high?
Normal desktop use seems stable as long as I don't use mplayer.
> You shouldn't need to force it. On auto the hw will automatically switch
> between levels based on load.
I thought automatic switching is what dpm is about? Anyway setting power_profile to auto gave me an immediate lockup just like the high profile.
(In reply to comment #10)
> (In reply to comment #9)
> > Do you still get lockups without forcing the power level to high?
> Normal desktop use seems stable as long as I don't use mplayer.
> > You
> > shouldn't need to force it. On auto the hw will automatically switch
> > between levels based on load.
> I thought automatic switching is what dpm is about? Anyway setting
> power_profile to auto gave me an immediate lockup just like the high profile.
Correct. Auto is the default power profile. So it appears that changing the state causes a problem.
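To see which power management method and state are active before and after a change, the relevant sysfs files can be dumped. A rough sketch, assuming the card is card0 as elsewhere in this report; the function name `show_radeon_power_state` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the radeon power management files for one device.
# The device directory defaults to card0 and can be overridden.
show_radeon_power_state() {
    dev_dir="${1:-/sys/class/drm/card0/device}"
    for name in power_method power_profile power_dpm_force_performance_level; do
        if [ -r "$dev_dir/$name" ]; then
            printf '%s: %s\n' "$name" "$(cat "$dev_dir/$name")"
        else
            # Which files are present depends on kernel version and on
            # whether dpm is enabled.
            printf '%s: (not available)\n' "$name"
        fi
    done
}
```

Running it once before and once after writing to power_profile shows whether the write was accepted.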
Today I gave in and installed fglrx. Despite considerable torture the system is stable, so I can at least rule out some hardware problem.
Do you have any suggestions on what I can do to help fix this problem? Considering that I have a way to reliably reproduce these GPU lockups, that I have the kernel source to play with and that I can ssh into the system, I may be able to gather more information. Unless the problem is likely to be in the firmware.
After two months with a stable system using fglrx, I gave radeon a try again. I can still easily reproduce the GPU lockups using kernel 3.16.0-rc2-5.g19015a0-desktop and radeon_ucode fetched today. The dmesg output changed a little. Maybe this helps?
Created attachment 102331 [details]
dmesg after echo auto > power_profile
Problem starts at 131.600354
News: I can actually switch power levels as much as I like as long as X has not been started yet. Starting X afterwards gives me a stable system with the changed power level. Once X has been started, however, the lockups happen whenever I try to change the power level, even if I shut down the X server first.
Downgrading xorg-x11-server from 7.6_220.127.116.11-2.2 to 7.6_1.16.1-350.4 fixes this problem, but then I get random lockups and crashes, which is not much better :/
Today I finally made decent progress on this issue. I can confirm that the GPU lockups are not the X server's fault, since they also appear when using weston and even when running the eglkms Mesa demo on the bare framebuffer.
While trying to find the exact OpenGL call that causes the lockup, I discovered that when I change the power profile before starting any GL program (Mesa demo or display server), power management continues to work until the next reboot. So all that was needed was a
echo low > /sys/class/drm/card0/device/power_profile
executed before starting the X server and I could change power profile at runtime.
I then turned on dpm again and indeed:
echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level
echo auto > /sys/class/drm/card0/device/power_dpm_force_performance_level
called by an ExecStartPre script from the display-server service file fixes the issue completely for me, and I now have fully working dpm :)
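For anyone who wants to try the same workaround, here is roughly what such a script could look like. This is only a sketch: the function name `prime_dpm` is made up, and the device path assumes the card is card0 as in the commands above.

```shell
#!/bin/sh
# Hypothetical workaround helper: change the dpm performance level once,
# then return it to auto, before any GL client or display server starts.
# The device directory defaults to card0 and can be overridden.
prime_dpm() {
    dev_dir="${1:-/sys/class/drm/card0/device}"
    level_file="$dev_dir/power_dpm_force_performance_level"
    if [ ! -w "$level_file" ]; then
        echo "prime_dpm: $level_file is not writable" >&2
        return 1
    fi
    # Same two writes as in the report: force high, then back to auto.
    echo high > "$level_file" || return 1
    echo auto > "$level_file" || return 1
}
```

To use it, add a `prime_dpm` call at the end of the script, make it executable, and point an ExecStartPre= line in the display-server service file at it. It has to run as root, since the sysfs file is normally writable only by root.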
So it seems like the first change of performance level/profile initializes something that must be initialized before the first GL call or CRTC change for power management to be stable. Any idea what this something might be?