Bug 91221 - UVD: GPU lockup with BARTS
Summary: UVD: GPU lockup with BARTS
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: XOrg git
Hardware: All All
Importance: medium normal
Assignee: Default DRI bug account
Reported: 2015-07-03 21:26 UTC by Chí-Thanh Christopher Nguyễn
Modified: 2019-11-19 09:07 UTC


Attachments
dmesg with drm.debug=14 (134.84 KB, text/plain)
2015-07-03 21:26 UTC, Chí-Thanh Christopher Nguyễn
dmesg from kernel 4.7.0 (230.67 KB, text/plain)
2016-07-28 03:03 UTC, Chí-Thanh Christopher Nguyễn
dmesg/journalctl output for two GPU lockups (106.16 KB, text/plain)
2018-06-19 21:37 UTC, J Mueller

Description Chí-Thanh Christopher Nguyễn 2015-07-03 21:26:07 UTC
Created attachment 116935
dmesg with drm.debug=14

When playing back a video using VDPAU and UVD, it plays fine at first, but the X server reproducibly hangs after a few minutes. Audio continues to play, but the screen no longer updates and there is no reaction to keyboard input. I have to use Magic SysRq to switch to the console and kill the X server. From that point on, no acceleration works, and the video card fan gets audibly louder until the system is rebooted.

When using Xv and no hardware decode acceleration, the video plays back fine.

This is using Linux kernel 4.1.1 and Mesa 10.3.7.
Hardware is 

In dmesg, messages like 

[ 1406.489288] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000031828 last fence id 0x000000000003182a on ring 0)
[ 1406.919692] [drm:rv770_stop_dpm] *ERROR* Could not force DPM to low.
[ 1407.091325] radeon 0000:01:00.0: couldn't schedule ib
[ 1407.091329] [drm:radeon_uvd_suspend] *ERROR* Error destroying UVD (-22)!

are shown. Nothing unusual in the Xorg.0.log.
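Each "GPU lockup" line names the stalled ring and two fence sequence numbers. Assuming, as in the radeon driver's fence code, that "current fence id" is the last fence the GPU signalled and "last fence id" is the last fence the driver emitted, their difference is the number of submissions still outstanding when the hang was detected (for the line above, 0x3182a - 0x31828 = 2 on ring 0). The following is a minimal sketch for pulling these values out of a dmesg dump; the names `parse_lockups` and `Lockup` are hypothetical helpers, not part of any existing tool.

```python
import re
from typing import List, NamedTuple

# Matches the radeon lockup message as it appears in the dmesg excerpts, e.g.
# "GPU lockup (current fence id 0x...31828 last fence id 0x...3182a on ring 0)".
LOCKUP_RE = re.compile(
    r"GPU lockup \(current fence id (0x[0-9a-fA-F]+) "
    r"last fence id (0x[0-9a-fA-F]+) on ring (\d+)\)"
)


class Lockup(NamedTuple):
    ring: int     # ring index as printed (in these logs, 0 = GFX, 5 = UVD)
    current: int  # assumption: last fence the GPU signalled
    last: int     # assumption: last fence the driver emitted

    @property
    def pending(self) -> int:
        # Emitted fences that had not signalled when the lockup was reported.
        return self.last - self.current


def parse_lockups(dmesg: str) -> List[Lockup]:
    """Return one Lockup per 'GPU lockup' message found in the dmesg text."""
    lockups = []
    for m in LOCKUP_RE.finditer(dmesg):
        lockups.append(Lockup(ring=int(m.group(3)),
                              current=int(m.group(1), 16),
                              last=int(m.group(2), 16)))
    return lockups


# Example input: two lockup lines copied from the logs in this report.
sample = """
[ 1406.489288] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000031828 last fence id 0x000000000003182a on ring 0)
[  158.412291] radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000000007cd last fence id 0x00000000000007d1 on ring 5)
"""

for lk in parse_lockups(sample):
    print(f"ring {lk.ring}: {lk.pending} fence(s) outstanding")
```

Running it on the two sample lines reports 2 outstanding fences on ring 0 and 4 on ring 5, which makes it easy to see which ring stalled in each incident.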
Comment 1 Chí-Thanh Christopher Nguyễn 2015-07-03 21:29:05 UTC
> Hardware is 
That should have been:
Hardware is an ASUS EAH6870 DC/2DI2S/1GD5
Comment 2 Chí-Thanh Christopher Nguyễn 2016-07-28 03:03:08 UTC
Created attachment 125358
dmesg from kernel 4.7.0

The problem still happens with kernel 4.7.0.

Sometimes, but not always, the driver seems able to recover on its own. Even then, the video hangs until I stop and restart it, and after a few minutes the problem shows up again.
Comment 3 jancoow 2017-10-23 21:15:06 UTC
I'm facing the same issue:

AMD XFX HD6850 Black Edition
Intel I3 3550

If I remove the VDPAU packages, playback is smooth. But when I try to use video acceleration, playback is choppy and eventually crashes my desktop.
Comment 4 jancoow 2017-10-23 21:17:09 UTC
dmesg output:

[   52.191082] [drm:btc_dpm_set_power_state [radeon]] *ERROR* rv770_restrict_performance_levels_before_switch failed
[  112.173163] radeon 0000:01:00.0: ring 0 stalled for more than 10283msec
[  112.173171] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000004009 last fence id 0x000000000000401e on ring 0)
[  112.468609] radeon 0000:01:00.0: couldn't schedule ib
[  112.468627] [drm:radeon_uvd_suspend [radeon]] *ERROR* Error destroying UVD (-22)!
[  112.469664] radeon 0000:01:00.0: Saved 695 dwords of commands on ring 0.
[  112.469674] radeon 0000:01:00.0: GPU softreset: 0x00000088
[  112.469675] radeon 0000:01:00.0:   GRBM_STATUS               = 0xA0003828
[  112.469676] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
[  112.469677] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[  112.469678] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200840C0
[  112.469679] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[  112.469680] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  112.469681] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00010000
[  112.469682] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000002
[  112.469683] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80010243
[  112.469684] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  112.501078] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00004001
[  112.501130] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00008100
[  112.502283] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
[  112.502284] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
[  112.502285] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[  112.502286] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200800C0
[  112.502287] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[  112.502288] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  112.502289] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[  112.502290] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[  112.502291] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[  112.502292] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  112.502302] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  112.526119] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[  112.528665] [drm] PCIE GART of 1024M enabled (table at 0x0000000000162000).
[  112.528756] radeon 0000:01:00.0: WB enabled
[  112.528757] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff97f5155a1c00
[  112.528758] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff97f5155a1c0c
[  112.529503] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000072118 and cpu addr 0xffffbb7702c32118
[  112.545920] [drm] ring test on 0 succeeded in 3 usecs
[  112.545930] [drm] ring test on 3 succeeded in 7 usecs
[  112.723049] [drm] ring test on 5 succeeded in 2 usecs
[  112.723057] [drm] UVD initialized successfully.
[  113.879790] [drm:r600_ib_test [radeon]] *ERROR* radeon: fence wait timed out.
[  113.879818] [drm:radeon_ib_ring_tests [radeon]] *ERROR* radeon: failed testing IB on GFX ring (-110).
[  114.346545] radeon 0000:01:00.0: couldn't schedule ib
[  114.346562] [drm:radeon_uvd_suspend [radeon]] *ERROR* Error destroying UVD (-22)!
[  114.347615] radeon 0000:01:00.0: GPU softreset: 0x00000088
[  114.347616] radeon 0000:01:00.0:   GRBM_STATUS               = 0xA0003828
[  114.347617] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
[  114.347618] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[  114.347619] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200040C0
[  114.347620] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[  114.347621] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  114.347622] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00010000
[  114.347623] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000002
[  114.347624] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80010243
[  114.347625] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  114.361570] radeon_dp_aux_transfer_native: 190 callbacks suppressed
[  114.362412] rfkill: input handler enabled
[  114.373793] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00004001
[  114.373845] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00008100
[  114.374998] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
[  114.374999] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
[  114.375000] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[  114.375001] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[  114.375002] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[  114.375003] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  114.375003] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[  114.375004] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[  114.375005] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[  114.375006] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  114.375017] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  114.399015] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[  114.414077] [drm] PCIE GART of 1024M enabled (table at 0x0000000000162000).
[  114.414168] radeon 0000:01:00.0: WB enabled
[  114.414170] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff97f5155a1c00
[  114.414171] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff97f5155a1c0c
[  114.414916] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000072118 and cpu addr 0xffffbb7702c32118
[  114.523388] [drm] ring test on 0 succeeded in 0 usecs
[  114.523393] [drm] ring test on 3 succeeded in 3 usecs
[  114.700492] [drm] ring test on 5 succeeded in 1 usecs
[  114.700496] [drm] UVD initialized successfully.
[  114.818963] [drm] ib test on ring 0 succeeded in 0 usecs
[  114.819007] [drm] ib test on ring 3 succeeded in 0 usecs
[  115.186420] [drm] ib test on ring 5 succeeded
[  124.939359] rfkill: input handler disabled
[  147.571358] [drm:btc_dpm_set_power_state [radeon]] *ERROR* rv770_restrict_performance_levels_before_switch failed
[  148.131379] [drm:btc_dpm_set_power_state [radeon]] *ERROR* rv770_restrict_performance_levels_before_switch failed
[  158.412285] radeon 0000:01:00.0: ring 5 stalled for more than 10136msec
[  158.412291] radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000000007cd last fence id 0x00000000000007d1 on ring 5)
[  158.710095] radeon 0000:01:00.0: couldn't schedule ib
[  158.710112] [drm:radeon_uvd_suspend [radeon]] *ERROR* Error destroying UVD (-22)!
[  158.711171] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  158.746555] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[  158.749128] [drm] PCIE GART of 1024M enabled (table at 0x0000000000162000).
[  158.749219] radeon 0000:01:00.0: WB enabled
[  158.749220] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff97f5155a1c00
[  158.749221] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff97f5155a1c0c
[  158.749967] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000072118 and cpu addr 0xffffbb7702c32118
[  158.766410] [drm] ring test on 0 succeeded in 3 usecs
[  158.766419] [drm] ring test on 3 succeeded in 6 usecs
[  158.943537] [drm] ring test on 5 succeeded in 2 usecs
[  158.943544] [drm] UVD initialized successfully.
[  159.093895] [drm] ib test on ring 0 succeeded in 0 usecs
[  159.093943] [drm] ib test on ring 3 succeeded in 0 usecs
[  159.718981] [drm] ib test on ring 5 succeeded
[  165.273859] [drm:btc_dpm_set_power_state [radeon]] *ERROR* rv770_restrict_performance_levels_before_switch failed
[  178.196931] [drm:btc_dpm_set_power_state [radeon]] *ERROR* rv770_restrict_performance_levels_before_switch failed
[ 2144.033023] [drm:btc_dpm_set_power_state [radeon]] *ERROR* rv770_restrict_performance_levels_before_switch failed
[ 2179.085875] [drm:btc_dpm_set_power_state [radeon]] *ERROR* rv770_restrict_performance_levels_before_switch failed
[ 2302.171795] [drm:btc_dpm_set_power_state [radeon]] *ERROR* rv770_restrict_performance_levels_before_switch failed
[ 2303.189937] [drm:btc_dpm_set_power_state [radeon]] *ERROR* rv770_restrict_performance_levels_before_switch failed
Comment 5 Chí-Thanh Christopher Nguyễn 2017-10-24 00:21:39 UTC
As suggested by iive on #radeon, I tried setting R600_DEBUG=nodma. The problem still happens, but the time until it triggers increases considerably, from a few minutes to 15 minutes or more.
Comment 6 Roman Elshin 2017-10-25 08:03:17 UTC
You could try VDPAU decoding with OpenGL video output (if you are not already using it). For me it is much more stable than VDPAU decoding with VDPAU video output (at least with an RV730 AGP card).
Comment 7 Chí-Thanh Christopher Nguyễn 2017-10-25 15:11:48 UTC
Thanks for the suggestion. However,
mpv -vo opengl --hwdec=vdpau
triggers the problem in the same way as -vo vdpau
Comment 8 J Mueller 2018-06-19 21:37:15 UTC
Created attachment 140238
dmesg/journalctl output for two GPU lockups

### ATTACHMENT
[Lines 854-971]: A recovered GPU lockup, leaving the system in a stable state with some pixel artifacts all over the desktop.
[Lines 972-end]: A non-recoverable GPU lockup, which required a reboot via the power button.

#### SYSTEM
System:    Host: iotem-pc Kernel: 4.14.48-2-MANJARO x86_64 bits: 64 compiler: gcc v: 8.1.1 Desktop: N/A 
           Distro: Manjaro Linux 17.1.10 Hakoila 
Machine:   Type: Desktop System: Gigabyte product: N/A v: N/A serial: <root required> 
           Mobo: Gigabyte model: 970A-DS3P v: x.x serial: <root required> BIOS: American Megatrends v: F2j 
           date: 12/29/2014 
CPU:       Topology: 8-Core model: AMD FX-8350 bits: 64 type: MCP arch: Bulldozer L2 cache: 2048 KiB 
           flags: lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 64313 
           Speed: 1403 MHz min/max: 1400/4000 MHz Core speeds (MHz): 1: 1401 2: 1398 3: 1404 4: 1398 5: 1401 
           6: 1406 7: 1403 8: 1401 
Graphics:  Card-1: Advanced Micro Devices [AMD/ATI] Barts XT [Radeon HD 6870] driver: radeon v: kernel 
           bus ID: 01:00.0 
           Display: x11 server: N/A driver: ati,radeon unloaded: fbdev,modesetting,vesa 
           resolution: <xdpyinfo missing> 
           OpenGL: renderer: AMD BARTS (DRM 2.50.0 / 4.14.48-2-MANJARO LLVM 6.0.0) v: 3.3 Mesa 18.1.1 
           direct render: Yes 
Audio:     Card-1: Advanced Micro Devices [AMD/ATI] SBx00 Azalia driver: snd_hda_intel v: kernel 
           bus ID: 00:14.2 
           Card-2: AMD Barts HDMI Audio [Radeon HD 6790/6850/6870 / 7720 OEM] driver: snd_hda_intel v: kernel 
           bus ID: 01:00.1 
           Sound Server: ALSA v: k4.14.48-2-MANJARO 
Network:   Card-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet driver: r8169 v: 2.3LK-NAPI 
           port: d000 bus ID: 03:00.0 
           IF: enp3s0 state: up speed: 1000 Mbps duplex: full mac: fc:aa:14:74:f2:39 
Drives:    HDD Total Size: 1.14 TiB used: 691.30 GiB (59.4%) 
           ID-1: /dev/sda vendor: Samsung model: SSD 850 EVO 250GB size: 232.89 GiB 
           ID-2: /dev/sdb vendor: Western Digital model: WD10EARS-00Y5B1 size: 931.51 GiB 
Partition: ID-1: / size: 213.51 GiB used: 116.63 GiB (54.6%) fs: ext4 dev: /dev/sda1 
           ID-2: swap-1 size: 14.96 GiB used: 0 KiB (0.0%) fs: swap dev: /dev/sda2 
Sensors:   System Temperatures: cpu: 35.0 C mobo: 21.2 C gpu: radeon temp: 44 C 
           Fan Speeds (RPM): cpu: 0 fan-1: 580 fan-3: 0 fan-4: 0 fan-5: 0 
           Voltages: 12v: N/A 5v: N/A 3.3v: N/A vbat: 3.14 
Info:      Processes: 189 Uptime: 55m Memory: 7.82 GiB used: 1.37 GiB (17.5%) Init: systemd Compilers: 
           gcc: 8.1.1 Shell: bash v: 4.4.19 inxi: 3.0.10 

### DESCRIPTION

I also encountered this bug on my system when playing videos in VLC. Changing VLC's FFmpeg hardware decoder setting from "automatic" (or "VDPAU") to "VA-API video decoder" worked around the problem in my case.

The dmesg output might help track down this bug: one time the lockup was recovered (I pressed Ctrl-Q fast enough to close VLC), and the other time it rendered the system unusable (GPU artifacts "reacted" to keyboard input by changing pattern and colors, but the system never recovered from this state).
Comment 9 Martin Peres 2019-11-19 09:07:08 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new bug via this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/630.