Bug 98988 - [Regression, bisected] New BONAIRE UVD firmware causes DPM problems and extremely slow performance
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
 
Reported: 2016-12-04 03:25 UTC by Furkan
Modified: 2016-12-30 04:33 UTC


Attachments
Kernel bisect log (2.18 KB, text/plain)
2016-12-04 03:25 UTC, Furkan
Bisect log (2.59 KB, text/plain)
2016-12-19 03:22 UTC, John Brooks
Bonaire UVD firmware (227.30 KB, application/octet-stream)
2016-12-19 16:36 UTC, leoxsliu

Description Furkan 2016-12-04 03:25:43 UTC
Created attachment 128327 [details]
Kernel bisect log

I have a 2GB Radeon R7 260X (BONAIRE).

With kernel 4.7 and above, I experience extremely slow performance. Even desktop animations on Ubuntu 16.04 with the Unity desktop are extremely choppy, at roughly 10 fps.

dmesg produces several instances of the following error message:
[drm:ci_dpm_set_power_state [radeon]] *ERROR* ci_upload_dpm_level_enable_mask failed

I did a kernel bisect, and narrowed the problem to the following commit: http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=7050c6ef5f0e9bc5e6bf9eb035320b70f731b919

The bisect log is attached.

It seems that the commit adds support for a new firmware file, "bonaire_uvd.bin". If the driver fails to load the new firmware file, it falls back to the legacy file, "BONAIRE_uvd.bin".

To confirm that the issue is caused by the new firmware, I deleted bonaire_uvd.bin, and performance is restored to normal with the latest stable kernel (4.9.0-rc7).
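
The lookup order described above can be sketched as follows. This is only an illustration of the behavior as described in this report, not a copy of the kernel code: the helper name pick_uvd_firmware and the directory argument are hypothetical, and only the two file names come from the report.

```python
import os

# Both file names come from the report. The lookup order follows the
# reporter's description of the commit, not the kernel source itself.
NEW_NAME = "bonaire_uvd.bin"
LEGACY_NAME = "BONAIRE_uvd.bin"


def pick_uvd_firmware(firmware_dir):
    """Return the path of the firmware file that would be chosen, or None.

    Hypothetical helper: prefers the new lowercase file and falls back to
    the legacy uppercase file when the new one is missing.
    """
    for name in (NEW_NAME, LEGACY_NAME):
        path = os.path.join(firmware_dir, name)
        if os.path.isfile(path):
            return path
    return None
```

On a case-sensitive filesystem, removing bonaire_uvd.bin from the directory makes this helper return the BONAIRE_uvd.bin path, which mirrors the workaround above.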

For what it's worth, here are the contents of /sys/kernel/debug/dri/64/radeon_pm_info while idling on the Ubuntu desktop with the new firmware:

uvd    disabled
vce    disabled
power level avg    sclk: 115774 mclk: 15000

And the old firmware:
uvd    disabled
vce    disabled
power level avg    sclk: 30248 mclk: 165000
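
For anyone comparing these readings across kernels or firmware files, a small parser along these lines may help. It is a rough sketch that assumes the three-line layout shown above; the function name parse_pm_info is hypothetical, and the real radeon_pm_info file may contain other lines on other hardware, which this sketch ignores.

```python
import re

# Matches the "power level avg" line in the layout shown in this report.
_POWER_LEVEL_RE = re.compile(
    r"power level avg\s+sclk:\s*(\d+)\s+mclk:\s*(\d+)"
)


def parse_pm_info(text):
    """Parse radeon_pm_info text into a dict (hypothetical helper).

    Keys: "uvd" and "vce" hold the state strings, "sclk" and "mclk" hold
    the raw integers exactly as printed, with no unit conversion.
    """
    info = {}
    for line in text.splitlines():
        line = line.strip()
        match = _POWER_LEVEL_RE.search(line)
        if match:
            info["sclk"] = int(match.group(1))
            info["mclk"] = int(match.group(2))
            continue
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("uvd", "vce"):
            info[parts[0]] = parts[1]
    return info
```

Feeding it the two dumps above gives mclk values of 15000 and 165000, which makes the memory clock difference easy to spot in a script.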
Comment 1 John Brooks 2016-12-19 03:22:23 UTC
Created attachment 128536 [details]
Bisect log

I just bisected the regression that has been affecting my R9 290 for a long time. I ended up at the same commit as Furkan: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7050c6ef5f0e9bc5e6bf9eb035320b70f731b919

I also observed the same radeon_pm_info debug output:

On good commits:
uvd    disabled
vce    disabled
power level avg    sclk: 100000 mclk: 126000

On bad commits:
uvd    disabled
vce    disabled
power level avg    sclk: 100000 mclk: 15000

Please let me know if you need any more information, or if you want me to test something for you.
Comment 2 John Brooks 2016-12-19 04:51:12 UTC
Commenting out http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/radeon/radeon_uvd.c?id=7050c6ef5f0e9bc5e6bf9eb035320b70f731b919#n130

or removing /lib/firmware/radeon/bonaire_uvd.bin fixes the problem on my system.
Comment 3 leoxsliu 2016-12-19 16:36:32 UTC
Created attachment 128558 [details]
Bonaire UVD firmware

Please try attached Bonaire UVD firmware.
Comment 4 John Brooks 2016-12-19 17:08:39 UTC
(In reply to leoxsliu from comment #3)
> Created attachment 128558 [details]
> Bonaire UVD firmware
> 
> Please try attached Bonaire UVD firmware.

This appears to fix the problem for me.
Comment 5 John Brooks 2016-12-19 23:15:20 UTC
The firmware from comment #3 was distributed in linux-firmware starting with commit 5e6165a8705613646c9a5a282f0a7243fe5dafdc (https://git.kernel.org/cgit/linux/kernel/git/firmware/linux-firmware.git/commit/?id=5e6165a8705613646c9a5a282f0a7243fe5dafdc), which corresponds to Ubuntu's linux-firmware package version 1.158, released on May 6, 2016.

People who already had this firmware would not experience the regression, which could explain the mixed reports of reproducibility.
Comment 6 Alex Deucher 2016-12-19 23:57:43 UTC
What is the md5sum of the bonaire_uvd.bin file on your system?  For the latest file in git (http://git.kernel.org/cgit/linux/kernel/git/firmware/linux-firmware.git/plain/radeon/bonaire_uvd.bin) I get 3106157934a8feb55145c4f5de3128e2 which matches the md5sum in the firmware Leo attached in attachment 128558 [details].
Comment 7 Alex Deucher 2016-12-20 00:03:32 UTC
(In reply to John Brooks from comment #5)
> The firmware from comment #3 was distributed in linux-firmware starting with
> commit 5e6165a8705613646c9a5a282f0a7243fe5dafdc
> (https://git.kernel.org/cgit/linux/kernel/git/firmware/linux-firmware.git/
> commit/?id=5e6165a8705613646c9a5a282f0a7243fe5dafdc). Which corresponds to
> Ubuntu's linux-firmware package version 1.158, released on May 6, 2016.
> 
> People who already had this firmware would not experience the regression,
> which could explain the mixed reports of reproducibility.

The latest firmware from upstream linux-firmware.git matches what Leo posted.  I think you just need to update your firmware from linux-firmware.git and Ubuntu needs to update as well if they haven't already.
Comment 8 John Brooks 2016-12-20 00:08:02 UTC
(In reply to Alex Deucher from comment #6)
> What is the md5sum of the bonaire_uvd.bin file on your system?  For the
> latest file in git
> (http://git.kernel.org/cgit/linux/kernel/git/firmware/linux-firmware.git/
> plain/radeon/bonaire_uvd.bin) I get 3106157934a8feb55145c4f5de3128e2 which
> matches the md5sum in the firmware Leo attached in attachment 128558 [details].

I clobbered my backup by accident, but its md5sum was different from that. The file that apt-get gives me if I reinstall linux-firmware (version 1.157.6 from xenial-updates; this is Mint 18) has an md5sum of 9f2ba7e720e2af4d7605a9a4fd903513.
Comment 9 Alex Deucher 2016-12-20 00:10:52 UTC
I think the fix is to make sure Ubuntu has the latest firmware from the linux firmware git tree.
Comment 10 Furkan 2016-12-20 01:13:20 UTC
I can also confirm that the new firmware image solves the issue for me.
Comment 11 Michel Dänzer 2016-12-20 02:02:24 UTC
Should we add code to the driver to avoid the bad firmware? Or can we just resolve this report as NOTOURBUG?
Comment 12 John Brooks 2016-12-30 04:33:09 UTC
(In reply to Alex Deucher from comment #9)
> I think the fix is to make sure Ubuntu has the latest firmware from the
> linux firmware git tree.

Ubuntu only started shipping an affected kernel in 16.10 (yakkety), and in that release they are also shipping the updated linux-firmware (I downloaded the package from http://packages.ubuntu.com/yakkety/linux-firmware and ran md5sum on the file). I think that the Ubuntu users affected by this issue are those who installed a newer kernel on an older Ubuntu release. Those users will just have to make sure they install the newer firmware too.

Anyone using kernel 4.7+ should make sure that their bonaire_uvd.bin is up to date.
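
One possible way to check this is to compare the file's md5sum against the two values quoted in this thread. The sketch below is not an official tool: the function names and labels are hypothetical, the first hash is the one from comment 6 for the upstream file, and the second is the one from comment 8 for the linux-firmware 1.157.6 package. Any other hash is reported as unknown rather than guessed at.

```python
import hashlib

# md5sum reported in comment 6 for the upstream linux-firmware.git file.
UPSTREAM_MD5 = "3106157934a8feb55145c4f5de3128e2"
# md5sum reported in comment 8 for the linux-firmware 1.157.6 package file.
OLD_PACKAGE_MD5 = "9f2ba7e720e2af4d7605a9a4fd903513"


def file_md5(path, chunk_size=65536):
    """Return the hex md5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify_md5(hex_digest):
    """Map a digest to a label based only on the hashes from this thread."""
    hex_digest = hex_digest.lower()
    if hex_digest == UPSTREAM_MD5:
        return "matches upstream"
    if hex_digest == OLD_PACKAGE_MD5:
        return "matches 1.157.6 package"
    return "unknown"
```

For example, classify_md5(file_md5("/lib/firmware/radeon/bonaire_uvd.bin")) would report whether the installed file matches the upstream hash, assuming the firmware lives at the path mentioned in comment 2.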

