Bug 98988 - [Regression, bisected] New BONAIRE UVD firmware causes DPM problems and extremely slow performance
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
 
Reported: 2016-12-04 03:25 UTC by Furkan
Modified: 2019-11-19 09:20 UTC
CC List: 3 users



Attachments
Kernel bisect log (2.18 KB, text/plain)
2016-12-04 03:25 UTC, Furkan
Bisect log (2.59 KB, text/plain)
2016-12-19 03:22 UTC, John Brooks
Bonaire UVD firmware (227.30 KB, application/octet-stream)
2016-12-19 16:36 UTC, leoxsliu

Description Furkan 2016-12-04 03:25:43 UTC
Created attachment 128327
Kernel bisect log

I have a 2GB Radeon R7 260X (BONAIRE).

With kernel 4.7 and later, I have been experiencing extremely slow performance. Even desktop animations on Ubuntu 16.04 with the Unity desktop are extremely choppy, at roughly 10 fps.

dmesg produces several instances of the following error message:
[drm:ci_dpm_set_power_state [radeon]] *ERROR* ci_upload_dpm_level_enable_mask failed

I did a kernel bisect and narrowed the problem down to the following commit: http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=7050c6ef5f0e9bc5e6bf9eb035320b70f731b919

The bisect log is attached.

It seems that the commit adds support for a new firmware file, "bonaire_uvd.bin". If the driver fails to load the new firmware file, it falls back to the legacy file, "BONAIRE_uvd.bin".

To confirm that the new firmware causes the issue, I deleted bonaire_uvd.bin, and performance returned to normal with the latest mainline release candidate (4.9.0-rc7).
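The fallback order described above can be sketched in Python to see which file a given system would pick up. This is a minimal illustration, not the driver's code: pick_uvd_firmware is a hypothetical helper, /lib/firmware/radeon is an assumed firmware directory, and the sketch only checks whether each file exists, whereas the driver may also reject a new-format file that fails validation.

```python
import os
from typing import Optional

# File names from this report. The directory passed in is an assumption
# (many distributions install radeon firmware under /lib/firmware/radeon).
NEW_FW = "bonaire_uvd.bin"     # new-format firmware, tried first
LEGACY_FW = "BONAIRE_uvd.bin"  # legacy firmware, used as the fallback


def pick_uvd_firmware(fw_dir: str) -> Optional[str]:
    """Return the path the described fallback order would select.

    Simplification: only checks that each file exists; it does not model
    any validation the real driver performs on the file contents.
    """
    for name in (NEW_FW, LEGACY_FW):
        path = os.path.join(fw_dir, name)
        if os.path.isfile(path):
            return path
    return None  # neither file present


if __name__ == "__main__":
    print(pick_uvd_firmware("/lib/firmware/radeon"))
```

With bonaire_uvd.bin deleted, the sketch returns the legacy path, which matches the workaround in this comment.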

For what it's worth, here are the contents of /sys/kernel/debug/dri/64/radeon_pm_info while idling on the Ubuntu desktop with the new firmware:

uvd    disabled
vce    disabled
power level avg    sclk: 115774 mclk: 15000

And with the old firmware:
uvd    disabled
vce    disabled
power level avg    sclk: 30248 mclk: 165000
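To compare these readings, the clock values can be converted to MHz. A minimal sketch, assuming the radeon driver reports sclk and mclk in units of 10 kHz (so 165000 means 1650 MHz); parse_pm_info is a hypothetical helper. Under that assumption, the new firmware leaves the memory clock at about 150 MHz instead of 1650 MHz, which would be consistent with the slowdown.

```python
import re
from typing import Dict

# Matches the "power level avg" line quoted above.
# Assumption: clocks are reported in 10 kHz units, so dividing by 100 gives MHz.
_LEVEL_RE = re.compile(r"sclk:\s*(\d+)\s+mclk:\s*(\d+)")


def parse_pm_info(text: str) -> Dict[str, float]:
    """Extract sclk/mclk from radeon_pm_info output and convert them to MHz."""
    m = _LEVEL_RE.search(text)
    if m is None:
        raise ValueError("no 'sclk: ... mclk: ...' line found")
    sclk, mclk = (int(v) for v in m.groups())
    return {"sclk_mhz": sclk / 100, "mclk_mhz": mclk / 100}


new_fw = "power level avg    sclk: 115774 mclk: 15000"
old_fw = "power level avg    sclk: 30248 mclk: 165000"
print(parse_pm_info(new_fw))  # {'sclk_mhz': 1157.74, 'mclk_mhz': 150.0}
print(parse_pm_info(old_fw))  # {'sclk_mhz': 302.48, 'mclk_mhz': 1650.0}
```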
Comment 1 John Brooks 2016-12-19 03:22:23 UTC
Created attachment 128536
Bisect log

I just bisected the regression that has been affecting my R9 290 for a long time. I ended up at the same commit as Furkan: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7050c6ef5f0e9bc5e6bf9eb035320b70f731b919

I also observed the same radeon_pm_info debug output:

On good commits:
uvd    disabled
vce    disabled
power level avg    sclk: 100000 mclk: 126000

On bad commits:
uvd    disabled
vce    disabled
power level avg    sclk: 100000 mclk: 15000

Please let me know if you need any more information, or if you want me to test something for you.
Comment 2 John Brooks 2016-12-19 04:51:12 UTC
Commenting out http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/radeon/radeon_uvd.c?id=7050c6ef5f0e9bc5e6bf9eb035320b70f731b919#n130

or removing /lib/firmware/radeon/bonaire_uvd.bin fixes the problem on my system.
Comment 3 leoxsliu 2016-12-19 16:36:32 UTC
Created attachment 128558
Bonaire UVD firmware

Please try attached Bonaire UVD firmware.
Comment 4 John Brooks 2016-12-19 17:08:39 UTC
(In reply to leoxsliu from comment #3)
> Created attachment 128558
> Bonaire UVD firmware
> 
> Please try attached Bonaire UVD firmware.

This appears to fix the problem for me.
Comment 5 John Brooks 2016-12-19 23:15:20 UTC
The firmware from comment #3 was distributed in linux-firmware starting with commit 5e6165a8705613646c9a5a282f0a7243fe5dafdc (https://git.kernel.org/cgit/linux/kernel/git/firmware/linux-firmware.git/commit/?id=5e6165a8705613646c9a5a282f0a7243fe5dafdc), which corresponds to Ubuntu's linux-firmware package version 1.158, released on May 6, 2016.
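One way to tell whether an installed Ubuntu package predates this firmware is to compare its version against 1.158. The sketch below is a simplification that only handles purely numeric dotted versions such as 1.157.6; real Debian version strings can carry epochs, letters and ~ suffixes, for which dpkg --compare-versions is the reliable tool. It also assumes the fixed file was not backported into a 1.157.x update, and has_fixed_firmware is a hypothetical helper.

```python
from typing import Tuple

# First Ubuntu linux-firmware version carrying the fixed file, per this comment.
FIXED_IN = "1.158"


def numeric_version(v: str) -> Tuple[int, ...]:
    """Split a purely numeric dotted version such as '1.157.6' into integers.

    Simplification: does not implement full Debian version ordering.
    """
    return tuple(int(part) for part in v.split("."))


def has_fixed_firmware(pkg_version: str) -> bool:
    # Tuples compare element by element, so (1, 157, 6) < (1, 158).
    return numeric_version(pkg_version) >= numeric_version(FIXED_IN)


print(has_fixed_firmware("1.157.6"))  # False
print(has_fixed_firmware("1.158"))    # True
```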

People who already had this firmware would not experience the regression, which could explain the mixed reports of reproducibility.
Comment 6 Alex Deucher 2016-12-19 23:57:43 UTC
What is the md5sum of the bonaire_uvd.bin file on your system?  For the latest file in git (http://git.kernel.org/cgit/linux/kernel/git/firmware/linux-firmware.git/plain/radeon/bonaire_uvd.bin) I get 3106157934a8feb55145c4f5de3128e2 which matches the md5sum in the firmware Leo attached in attachment 128558.
Comment 7 Alex Deucher 2016-12-20 00:03:32 UTC
(In reply to John Brooks from comment #5)
> The firmware from comment #3 was distributed in linux-firmware starting with
> commit 5e6165a8705613646c9a5a282f0a7243fe5dafdc
> (https://git.kernel.org/cgit/linux/kernel/git/firmware/linux-firmware.git/
> commit/?id=5e6165a8705613646c9a5a282f0a7243fe5dafdc). Which corresponds to
> Ubuntu's linux-firmware package version 1.158, released on May 6, 2016.
> 
> People who already had this firmware would not experience the regression,
> which could explain the mixed reports of reproducibility.

The latest firmware from upstream linux-firmware.git matches what Leo posted.  I think you just need to update your firmware from linux-firmware.git and Ubuntu needs to update as well if they haven't already.
Comment 8 John Brooks 2016-12-20 00:08:02 UTC
(In reply to Alex Deucher from comment #6)
> What is the md5sum of the bonaire_uvd.bin file on your system?  For the
> latest file in git
> (http://git.kernel.org/cgit/linux/kernel/git/firmware/linux-firmware.git/
> plain/radeon/bonaire_uvd.bin) I get 3106157934a8feb55145c4f5de3128e2 which
> matches the md5sum in the firmware Leo attached in attachment 128558.

I accidentally clobbered my backup, but its md5sum was different from that. The file apt-get installs when I reinstall linux-firmware (version 1.157.6 from xenial-updates; this is Mint 18) has an md5sum of 9f2ba7e720e2af4d7605a9a4fd903513.
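The two checksums quoted in this thread can be used to check a local bonaire_uvd.bin. A minimal sketch: md5_of, classify and check are hypothetical helpers, the path /lib/firmware/radeon/bonaire_uvd.bin is an assumed default, and any checksum other than the two reported here is treated as unknown rather than bad.

```python
import hashlib

# md5sums reported in this thread.
GOOD_MD5 = "3106157934a8feb55145c4f5de3128e2"  # linux-firmware.git / attachment 128558
BAD_MD5 = "9f2ba7e720e2af4d7605a9a4fd903513"   # linux-firmware 1.157.6 (xenial-updates)


def md5_of(path: str) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def classify(digest: str) -> str:
    if digest == GOOD_MD5:
        return "up to date"
    if digest == BAD_MD5:
        return "known bad (1.157.6)"
    return "unknown"  # not one of the two checksums reported here


def check(path: str) -> str:
    try:
        return classify(md5_of(path))
    except FileNotFoundError:
        return "missing (driver falls back to BONAIRE_uvd.bin)"


if __name__ == "__main__":
    print(check("/lib/firmware/radeon/bonaire_uvd.bin"))
```

On a system like the one in this comment, the sketch would report the file as known bad; after updating from linux-firmware.git it should report it as up to date.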
Comment 9 Alex Deucher 2016-12-20 00:10:52 UTC
I think the fix is to make sure Ubuntu has the latest firmware from the linux firmware git tree.
Comment 10 Furkan 2016-12-20 01:13:20 UTC
I can also confirm that the new firmware image solves the issue for me.
Comment 11 Michel Dänzer 2016-12-20 02:02:24 UTC
Should we add code to the driver to avoid the bad firmware? Or can we just resolve this report as NOTOURBUG?
Comment 12 John Brooks 2016-12-30 04:33:09 UTC
(In reply to Alex Deucher from comment #9)
> I think the fix is to make sure Ubuntu has the latest firmware from the
> linux firmware git tree.

Ubuntu only started shipping an affected kernel in 16.10 (yakkety), and in that release they are also shipping the updated linux-firmware (I downloaded the package from http://packages.ubuntu.com/yakkety/linux-firmware and ran md5sum on the file). I think that the Ubuntu users affected by this issue are those that installed a newer kernel on an older Ubuntu release. Those users will just have to make sure they install the newer firmware too.

Anyone using kernel 4.7+ should make sure that their bonaire_uvd.bin is up to date.
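The advice above applies to kernels 4.7 and later. A minimal sketch of that version test, assuming the release string starts with major.minor as platform.release() returns on Linux (for example 4.9.0-rc7); kernel_major_minor and needs_new_firmware are hypothetical helpers, and the check says nothing about the firmware file itself.

```python
import platform
import re
from typing import Tuple

# Kernels from 4.7 on try bonaire_uvd.bin first, per this report.
FIRST_AFFECTED = (4, 7)


def kernel_major_minor(release: str) -> Tuple[int, int]:
    """Parse '4.9.0-rc7' or '4.8.0-30-generic' into (major, minor)."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def needs_new_firmware(release: str) -> bool:
    # Tuple comparison handles 4.10 > 4.7 correctly, unlike string comparison.
    return kernel_major_minor(release) >= FIRST_AFFECTED


if __name__ == "__main__":
    rel = platform.release()
    try:
        affected = needs_new_firmware(rel)
    except ValueError:
        affected = False  # not a Linux-style release string
    print(rel, "-> check bonaire_uvd.bin" if affected else "-> not affected")
```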
Comment 13 Martin Peres 2019-11-19 09:20:34 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new issue on our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/763.

