Bug 81620 - radeon: fence wait failed (-35) after hybrid suspend on 3.15
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: unspecified
Hardware: x86 (IA32) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
 
Reported: 2014-07-21 20:02 UTC by i.kalvachev
Modified: 2018-05-30 17:38 UTC

Attachments
dmesg of suspend/resume session with radeon.ko dpm debug turned on (54.89 KB, text/plain)
2014-07-21 20:02 UTC, i.kalvachev

Description i.kalvachev 2014-07-21 20:02:39 UTC
Created attachment 103217: dmesg of suspend/resume session with radeon.ko dpm debug turned on

My hardware is a Radeon HD 5670 (Redwood).
To reproduce the problem, boot a vanilla 3.15.x kernel and run in KMS mode (no Xorg server needed). Then do a hybrid suspend (suspend to both RAM and disk) with the following command:

`echo suspend  > /sys/power/disk; echo disk > /sys/power/state`
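For scripted reproduction, the same two sysfs writes can be sketched in Python. This is only a sketch: the function name `hybrid_suspend` and its `power_dir` parameter are made up for illustration, and writing to the real `/sys/power` files requires root and will actually suspend the machine.

```python
from pathlib import Path


def hybrid_suspend(power_dir="/sys/power"):
    """Perform the same two writes as the shell one-liner, in the same order.

    `power_dir` is a hypothetical parameter so the function can be pointed
    at a scratch directory instead of the real sysfs tree.
    """
    power = Path(power_dir)
    # Select the "suspend" hibernation mode: the image is written to disk
    # and the machine then suspends to RAM instead of powering off.
    (power / "disk").write_text("suspend\n")
    # Trigger hibernation; with the mode above this is a hybrid suspend.
    (power / "state").write_text("disk\n")
```

Calling `hybrid_suspend()` with no argument targets the real sysfs files, so it is best tried against a temporary directory first.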

Resume with the power button. `dmesg` then shows:

[   83.997399] [drm] ring test on 5 succeeded in 1 usecs
[   83.997403] [drm] UVD initialized successfully.
[   83.997450] [drm] ib test on ring 0 succeeded in 0 usecs
[   83.997494] [drm] ib test on ring 3 succeeded in 1 usecs
[   94.137259] radeon 0000:01:00.0: ring 5 stalled for more than 10000msec
[   94.137263] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000004 last fence id 0x0000000000000002 on ring 5)
[   94.137265] [drm:uvd_v1_0_ib_test] *ERROR* radeon: fence wait failed (-35).
[   94.137268] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on ring 5 (-35).
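To check a saved log for this failure, the stall and fence-wait lines can be picked out with two regular expressions. The sketch below is an illustration only: the names `scan_dmesg`, `STALL_RE`, `FENCE_RE` and `LINUX_ERRNO_NAMES` are made up, and the patterns are derived solely from the message formats in the excerpt above. It assumes Linux errno numbering, where 35 is EDEADLK, which would be consistent with the "GPU lockup" message.

```python
import re

# Patterns taken from the message formats in the excerpt above.
STALL_RE = re.compile(r"ring (\d+) stalled for more than (\d+)msec")
FENCE_RE = re.compile(r"fence wait failed \((-?\d+)\)")

# Assumption: Linux errno numbering, where 35 is EDEADLK.
LINUX_ERRNO_NAMES = {35: "EDEADLK"}


def scan_dmesg(text):
    """Return (ring, msec) stall tuples and the fence-wait error codes."""
    stalls = [(int(ring), int(ms)) for ring, ms in STALL_RE.findall(text)]
    errors = [int(code) for code in FENCE_RE.findall(text)]
    return stalls, errors


SAMPLE = """\
[   94.137259] radeon 0000:01:00.0: ring 5 stalled for more than 10000msec
[   94.137265] [drm:uvd_v1_0_ib_test] *ERROR* radeon: fence wait failed (-35).
"""

stalls, errors = scan_dmesg(SAMPLE)
for ring, ms in stalls:
    print(f"ring {ring} stalled for more than {ms} ms")
for code in errors:
    name = LINUX_ERRNO_NAMES.get(-code, "unknown")
    print(f"fence wait failed: {code} ({name})")
```

In practice the text would come from a saved `dmesg` dump rather than the inline `SAMPLE` string.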

At this point, if an Xorg server is started, any attempt at VDPAU hardware decoding fails.
If the computer is left running without a reboot, a timeout eventually triggers and a GPU reset may be attempted, which usually hangs the system. (The following log is from an earlier GPU reset that was probably successful.)

[    0.000000] Linux version 3.15.2 (root) (gcc version 4.8.3 (GCC) ) #2 SMP
[12398.387691] radeon 0000:01:00.0: ring 5 stalled for more than 242796msec
[12398.387699] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000005 last fence id 0x0000000000000004 on ring 5)
[12398.425151] radeon 0000:01:00.0: Saved 23 dwords of commands on ring 0.
[12398.425167] radeon 0000:01:00.0: GPU softreset: 0x00000009
[12398.425169] radeon 0000:01:00.0:   GRBM_STATUS               = 0xF5703828
[12398.425171] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0xFC000007
[12398.425173] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[12398.425175] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200800C0
[12398.425177] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[12398.425179] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[12398.425181] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x40000000
[12398.425183] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00008004
[12398.425185] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80228647
[12398.425187] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[12398.441572] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00007F6B
[12398.441626] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[12398.442782] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
[12398.442783] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
[12398.442785] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[12398.442787] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200800C0
[12398.442789] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[12398.442791] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[12398.442793] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[12398.442795] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[12398.442796] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[12398.442798] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[12398.442813] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[12398.512161] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[12398.516583] [drm] PCIE GART of 1024M enabled (table at 0x000000000025D000).
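The two register dumps in this log are easier to compare side by side. A minimal sketch follows, assuming the dump before the reset and the dump after it are separated by the `GRBM_SOFT_RESET=` line, as in the log above. The helper names `split_dumps` and `changed` are made up for illustration, and the sketch does not try to interpret individual register bits.

```python
import re

# Matches dump lines such as "radeon 0000:01:00.0:   GRBM_STATUS   = 0xF5703828".
REG_RE = re.compile(r"radeon \S+:\s+(\w+)\s+= (0x[0-9A-Fa-f]+)")


def split_dumps(lines):
    """Collect register values before and after the soft reset marker."""
    before, after = {}, {}
    current = before
    for line in lines:
        if "GRBM_SOFT_RESET=" in line:
            # Assumption: everything after this line is the post-reset dump.
            current = after
            continue
        match = REG_RE.search(line)
        if match:
            current[match.group(1)] = int(match.group(2), 16)
    return before, after


def changed(before, after):
    """Return {register: (old, new)} for registers whose value differs."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }


SAMPLE = """\
[12398.425169] radeon 0000:01:00.0:   GRBM_STATUS               = 0xF5703828
[12398.425173] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[12398.425185] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80228647
[12398.441572] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00007F6B
[12398.442782] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
[12398.442785] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[12398.442796] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
""".splitlines()

before, after = split_dumps(SAMPLE)
for name, (old, new) in changed(before, after).items():
    print(f"{name}: 0x{old:08X} -> 0x{new:08X}")
```

Run over the full log, this lists only the registers whose values differ across the reset, which leaves out the ones that did not change, such as `SRBM_STATUS` and `R_00D034_DMA_STATUS_REG`.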


The bug does not occur if only suspend to RAM or only suspend to disk is used.

I tried to bisect, but a number of rc kernels hang on me long before the radeon module is loaded. At this point bisect

I suspected that the new async suspend/resume code might be at fault, as I was seeing the video card being turned off (i.e. the monitor going off) and then turned back on a moment before the machine shut down completely.

So I found the async suspend commits (from an article) and reverted them.
Reverting 200421a80f6e0a9e39d698944cc35cba103eb6ce and 3c31b52f96f7b559d950b16113c0f68c72a1985e seems to avoid the effect described above, where the monitor turns off, on, then off again. But it does not fix the bug.

Reverting
7cd0602d7836c0056fe9bdab014d5ac5ec5cb291, 92858c476ec4e99cf0425f05dee109b6a55eb6f8 and
9e5e7910df824ba02aedd2b5d2ca556426ea6d0b, 76569faa62c46382e080c3e190c66e19515aae1c, de377b3972729f00ee236ae4a97393e282ffe391, 28b6fd6e37792b16a56d324841bdb20ab78e4522, a59ffb2062df3a5c346dbed931fa1e587fd0f0f3
doesn't affect the bug either, so I assume that this bug is not related to the async suspend/resume changes.

If you cannot reproduce the problem, please advise me which commits to revert.
Comment 1 Paul Menzel 2018-04-10 10:49:45 UTC
Sorry that there was no response. In my experience, these issues have been fixed in the meantime. Could you please retry with Linux 4.16?
Comment 2 i.kalvachev 2018-05-30 17:38:00 UTC
Yes, the bug was fixed long ago.

I'm using VDPAU all the time, so I'm sure it is working.
At least on 64-bit kernels.

