Created attachment 128327
Kernel bisect log
I have a 2GB Radeon R7 260X (BONAIRE).
With kernel 4.7 and above, I experience extremely slow performance. Even desktop animations on Ubuntu 16.04 with the Unity desktop are extremely choppy, probably around 10 fps.
dmesg produces several instances of the following error message:
[drm:ci_dpm_set_power_state [radeon]] *ERROR* ci_upload_dpm_level_enable_mask failed
I did a kernel bisect, and narrowed the problem to the following commit: http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=7050c6ef5f0e9bc5e6bf9eb035320b70f731b919
The bisect log is attached.
It seems that the commit adds support for a new firmware file, "bonaire_uvd.bin". If the driver fails to load the new firmware file, it falls back to the legacy file, "BONAIRE_uvd.bin".
To confirm that the issue is caused by the new firmware, I deleted bonaire_uvd.bin, and performance returned to normal with the latest kernel (4.9.0-rc7).
For what it's worth, here are the contents of /sys/kernel/debug/dri/64/radeon_pm_info while idling on the Ubuntu desktop with the new firmware:
power level avg sclk: 115774 mclk: 15000
And the old firmware:
power level avg sclk: 30248 mclk: 165000
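For anyone who wants to compare their own readings, the two clock values can be pulled out of radeon_pm_info with a short script. This is only a sketch: it assumes the line format quoted above, the function name parse_power_level is made up for illustration, the DRI number in the path (64 here) may differ on other systems, and reading debugfs usually requires root.

```python
import re

# Matches lines such as "power level avg sclk: 115774 mclk: 15000".
# The format is assumed from the output quoted in this report.
_PM_LINE = re.compile(r"sclk:\s*(\d+)\s+mclk:\s*(\d+)")


def parse_power_level(text):
    """Return (sclk, mclk) as integers, or None if no such line is found."""
    match = _PM_LINE.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    # Path taken from this report; the DRI number may differ per system.
    path = "/sys/kernel/debug/dri/64/radeon_pm_info"
    try:
        with open(path) as f:
            print(parse_power_level(f.read()))
    except OSError as exc:
        print(f"could not read {path}: {exc}")
```

A very low mclk while idling, as in the reading with the new firmware above, is the symptom to look for.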
Created attachment 128536
I just bisected the regression that has been affecting my R9 290 for a long time. I ended up at the same commit as Furkan: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7050c6ef5f0e9bc5e6bf9eb035320b70f731b919
I also observed the same radeon_pm_info debug output:
On good commits:
power level avg sclk: 100000 mclk: 126000
On bad commits:
power level avg sclk: 100000 mclk: 15000
Please let me know if you need any more information, or if you want me to test something for you.
Commenting out the line at http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/radeon/radeon_uvd.c?id=7050c6ef5f0e9bc5e6bf9eb035320b70f731b919#n130
or removing /lib/firmware/radeon/bonaire_uvd.bin fixes the problem on my system.
Created attachment 128558
Bonaire UVD firmware
Please try the attached Bonaire UVD firmware.
(In reply to leoxsliu from comment #3)
> Created attachment 128558
> Bonaire UVD firmware
> Please try attached Bonaire UVD firmware.
This appears to fix the problem for me.
The firmware from comment #3 was distributed in linux-firmware starting with commit 5e6165a8705613646c9a5a282f0a7243fe5dafdc (https://git.kernel.org/cgit/linux/kernel/git/firmware/linux-firmware.git/commit/?id=5e6165a8705613646c9a5a282f0a7243fe5dafdc), which corresponds to Ubuntu's linux-firmware package version 1.158, released on May 6, 2016.
People who already had this firmware would not experience the regression, which could explain the mixed reports of reproducibility.
What is the md5sum of the bonaire_uvd.bin file on your system? For the latest file in git (http://git.kernel.org/cgit/linux/kernel/git/firmware/linux-firmware.git/plain/radeon/bonaire_uvd.bin) I get 3106157934a8feb55145c4f5de3128e2, which matches the md5sum of the firmware Leo attached in attachment 128558.
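If that is easier than running md5sum by hand, something like the following does the same comparison. It is a sketch only: the helper name file_md5 is made up for illustration, and the path is the one mentioned earlier in this report, so adjust it if your distribution installs firmware elsewhere.

```python
import hashlib

# md5sum reported in this thread for the current linux-firmware.git file.
KNOWN_GOOD_MD5 = "3106157934a8feb55145c4f5de3128e2"


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so the whole file is never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Path mentioned earlier in this report; it may differ on other systems.
    path = "/lib/firmware/radeon/bonaire_uvd.bin"
    try:
        actual = file_md5(path)
    except OSError as exc:
        print(f"could not read {path}: {exc}")
    else:
        status = "matches" if actual == KNOWN_GOOD_MD5 else "differs from"
        print(f"{actual} {status} the linux-firmware.git checksum")
```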
(In reply to John Brooks from comment #5)
> The firmware from comment #3 was distributed in linux-firmware starting with
> commit 5e6165a8705613646c9a5a282f0a7243fe5dafdc
> (https://git.kernel.org/cgit/linux/kernel/git/firmware/linux-firmware.git/commit/?id=5e6165a8705613646c9a5a282f0a7243fe5dafdc),
> which corresponds to Ubuntu's linux-firmware package version 1.158,
> released on May 6, 2016.
> People who already had this firmware would not experience the regression,
> which could explain the mixed reports of reproducibility.
The latest firmware from upstream linux-firmware.git matches what Leo posted. I think you just need to update your firmware from linux-firmware.git, and Ubuntu needs to update as well if they haven't already.
(In reply to Alex Deucher from comment #6)
> What is the md5sum of the bonaire_uvd.bin file on your system? For the
> latest file in git
> (http://git.kernel.org/cgit/linux/kernel/git/firmware/linux-firmware.git/plain/radeon/bonaire_uvd.bin)
> I get 3106157934a8feb55145c4f5de3128e2 which
> matches the md5sum in the firmware Leo attached in attachment 128558
I clobbered my backup by accident, but it was different from that. The one that apt-get gives me if I reinstall linux-firmware (version 1.157.6 from xenial-updates; this is Mint 18) has an md5sum of 9f2ba7e720e2af4d7605a9a4fd903513.
I think the fix is to make sure Ubuntu has the latest firmware from the linux-firmware git tree.
I can also confirm that the new firmware image solves the issue for me.
Should we add code to the driver to avoid the bad firmware? Or can we just resolve this report as NOTOURBUG?
(In reply to Alex Deucher from comment #9)
> I think the fix is to make sure Ubuntu has the latest firmware from the
> linux firmware git tree.
Ubuntu only started shipping an affected kernel in 16.10 (yakkety), and that release also ships the updated linux-firmware (I downloaded the package from http://packages.ubuntu.com/yakkety/linux-firmware and ran md5sum on the file). I think the Ubuntu users affected by this issue are those who installed a newer kernel on an older Ubuntu release. Those users will just have to make sure they install the newer firmware too.
Anyone using kernel 4.7+ should make sure that their bonaire_uvd.bin is up to date.
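To tell quickly whether a machine falls into that group, the running kernel release can be compared against 4.7. This is a rough sketch: kernel_at_least is a made-up helper name, and it only looks at the major and minor numbers of a release string such as 4.9.0-rc7.

```python
import platform
import re


def kernel_at_least(release, minimum=(4, 7)):
    """Return True if a release string such as '4.9.0-rc7' is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # Tuples compare element by element, so (4, 9) >= (4, 7) is True.
    return (int(match.group(1)), int(match.group(2))) >= minimum


if __name__ == "__main__":
    release = platform.release()
    try:
        affected = kernel_at_least(release)
    except ValueError as exc:
        print(exc)
    else:
        if affected:
            print(f"{release}: make sure bonaire_uvd.bin is up to date")
        else:
            print(f"{release}: older than 4.7")
```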