Created attachment 96233 [details]
dmesg 3.12

I am seeing a GPU lockup from any v3.13 up to 3.14-rc7, which basically renders my computer unusable under recent kernels :-(

[ 55.762710] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[ 55.762715] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000004 last fence id 0x000000000000000 on ring 5)
[ 55.762717] [drm:uvd_v1_0_ib_test] *ERROR* radeon: fence wait failed (-35).
[ 55.762720] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on ring 5 (-35).

Hardware is an iMac 11,2 with a Radeon 4670 M96XT (RV730), 256MB GDDR3. Working up to 3.12, broken as of 3.13.

Xorg comes up after some delays with a mostly black screen, some colored rectangular artifacts where the login fields are, and a working mouse cursor. Console fb still works.

Bisected to this commit:

commit f9eaf9ae782d6480f179850e27e6f4911ac10227
Author: Christian König <christian.koenig@amd.com>
Date: Tue Oct 29 20:14:47 2013 +0100

    drm/radeon: rework and fix reset detection v2

    Stop fiddling with jiffies, always wait for RADEON_FENCE_JIFFIES_TIMEOUT.
    Consolidate the two wait sequence implementations into just one function.
    Activate all waiters and remember if the reset was already done instead of trying to reset from only one thread.

    v2: clear reset flag earlier to avoid timeout in IB test
Created attachment 96234 [details] dmesg 3.14
NB: the UVD init does not occur each time. But the "GPU lockup" message does.
Please provide a dmesg from commit f9eaf9ae782d6480f179850e27e6f4911ac10227 and 1dac28eb726109e7ac256051b157baf60b21a5f7 as well.

Thanks in advance,
Christian.
Created attachment 96315 [details] last good commit
Created attachment 96316 [details] first bad commit
Interestingly, the last good commit also produces the following log:

[ 7.573975] [drm] UVD initialized successfully.
[ 7.574210] [drm] Enabling audio 0 support
[ 7.574240] [drm] ib test on ring 0 succeeded in 0 usecs
[ 7.574263] [drm] ib test on ring 3 succeeded in 0 usecs
[ 17.730386] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[ 17.730390] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000002 last fence id 0x0000000000000000)
[ 17.730393] [drm:uvd_v1_0_ib_test] *ERROR* radeon: fence wait failed (-35).
[ 17.730397] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on ring 5 (-35).

So that seems unrelated to the issue at hand.
Created attachment 96360 [details] [review] Possible fix
Thanks, I will test the patch tonight. Also I will bisect the first commit that produces the GPU lockup (without visible artifacts), as that seems to me the real problem. Probably f9eaf9 only exposes that bug visibly.
(In reply to comment #6)
> interestingly also the last good commit produces the following log:
> [ 7.573975] [drm] UVD initialized successfully.
> [ 7.574210] [drm] Enabling audio 0 support
> [ 7.574240] [drm] ib test on ring 0 succeeded in 0 usecs
> [ 7.574263] [drm] ib test on ring 3 succeeded in 0 usecs
> [ 17.730386] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
> [ 17.730390] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000002 last fence id 0x0000000000000000)
> [ 17.730393] [drm:uvd_v1_0_ib_test] *ERROR* radeon: fence wait failed (-35).
> [ 17.730397] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on ring 5 (-35).
>
> So that seems unrelated to the issue at hand.

Actually it is related, and now the behaviour makes perfect sense.

Somewhere between 3.12 and your "last good" commit we have a patch that breaks UVD IB testing. That isn't critical (3D still works fine) until the reset detection rework, because after that one we try to get the UVD ring working again with each new IOCTL made to the card.

Please give the attached patch a try. It clears the "needs_reset" flag if the IB test failed for some reason, so that if the initial bringup fails we won't try to get it working over and over again.

In addition, please bisect which commit breaks UVD IB testing between 3.12 and the "last good" commit and open a new bug report for that issue.

Thanks for the help,
Christian.
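For readers following along, the retry loop described in the comment above can be modelled in a few lines of C. This is only a toy sketch under stated assumptions, not the attached patch and not the radeon driver's code: `struct model_dev`, `model_ib_test`, `model_init`, `model_ioctl` and `model_run` are all hypothetical names invented for illustration. The only details taken from this report are the "needs_reset" flag, the -35 error code from the log, and the idea that a failed IB test during bringup should not leave a reset pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for the driver's device state. */
struct model_dev {
    bool needs_reset;    /* set when a fence wait times out */
    bool uvd_ring_ok;    /* whether the UVD ring can pass its IB test */
    int  reset_attempts; /* resets triggered from the ioctl path */
};

/* Models an IB test on the UVD ring: a timed-out fence wait flags a reset. */
static int model_ib_test(struct model_dev *dev)
{
    if (!dev->uvd_ring_ok) {
        dev->needs_reset = true;
        return -35; /* same error code as in the dmesg output */
    }
    return 0;
}

/* Models initial bringup. With clear_on_failure set, a failed IB test
 * drops the pending-reset flag, which is the idea behind the fix. */
static int model_init(struct model_dev *dev, bool clear_on_failure)
{
    int r = model_ib_test(dev);

    if (r != 0 && clear_on_failure)
        dev->needs_reset = false;
    return r;
}

/* Models one ioctl: a pending reset is attempted, which reruns the IB
 * test; on a ring that stays broken, the test re-arms the flag. */
static void model_ioctl(struct model_dev *dev)
{
    assert(dev != NULL);
    if (dev->needs_reset) {
        dev->needs_reset = false;
        dev->reset_attempts++;
        model_ib_test(dev);
    }
}

/* Runs bringup plus n_ioctls ioctls on a device whose UVD ring never
 * works, and returns how many resets were attempted. */
static int model_run(bool clear_on_failure, int n_ioctls)
{
    struct model_dev dev = { false, false, 0 };

    model_init(&dev, clear_on_failure);
    for (int i = 0; i < n_ioctls; i++)
        model_ioctl(&dev);
    return dev.reset_attempts;
}
```

In this model, calling `model_run(false, n)` attempts one reset per ioctl, while `model_run(true, n)` attempts none, which mirrors why clearing the flag after a failed bringup test stops the repeated reset attempts even though the lockup message from the IB test itself remains.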
I confirm that the patch fixes the screen output (3D). The GPU lockup is still present in dmesg, as expected. Bisecting now and will open a new bug report for it.
Perfect, thanks for the help. The patch is on its way upstream, so any objections to closing this bug then?
OK to close. The other problem has resolved itself, by the way. For convenience I had always booted these kernels via kexec, which was the reason for the GPU lockup. After a normal warm boot the problem went away.
Ah, ok. That makes sense, because kexec and UVD are known to not work together by design. Closing this.
Does this fix issues where, when the GPU locks up, X is able to resume? When X resumes for me, I cannot continue using the display server; most of the time the GPU just wedges and then I need to do a reset.