Created attachment 116789 [details]
This is with the latest kernel from Linus's git tree on a CAPE VERDE card.
When the errors appear, I get screen corruption when scrolling in a browser/file-manager and missing/changed letters in a terminal.
A bisect led to commit 161ab658a611df14fb0365b7b70a8c5fed3e4870, and reverting it on master makes everything work normally again.
Created attachment 116790 [details]
Fix is already in Alex's drm-fixes-4.2 tree and should appear in -rc1.
If for some reason you need it sooner, just cherry-pick "drm/radeon: fix adding all VAs to the freed list on remove v2".
Found the fix in his amdgpu branch and it fixes it, thanks!
(In reply to hadack from comment #3)
> Found the fix in his amdgpu branch and it fixes it, thanks!
Some users still report some issues even after this fix, so please keep an eye open for additional issues.
If you find some then please reopen this bug report.
Hmm, seems you are right, desktop usage is fine on xfce with compton but starting a game like KSP leads to a non-refreshing screen. Reverting both commits makes it work again.
(In reply to hadack from comment #5)
> Hmm, seems you are right, desktop usage is fine on xfce with compton but
> starting a game like KSP leads to a non-refreshing screen. Reverting both
> commits makes it work again.
I can verify the same observations on my HD 7850 (PITCAIRN 0x1002:0x6819 0x1787:0x2320) card. I use Linux stable kernels with Radeon DRM (and core DRM) cherry-picked in from drm-next and drm-fixes. With my last local update -- from kernel 4.0.4 + DRM 4.1 cherry-picks, to 4.0.6 + DRM 4.1 + DRM 4.2 -- running 'alien-arena' as a test program causes the DE (also Xfce, as is the case with hadack) to stop responding once I exit the game; also, the DE itself seems to trigger the bug after a while, or when resuming from suspend.
I tried the patch mentioned in comment 2 ("drm/radeon: fix adding all VAs to the freed list on remove v2"), but the symptoms described above continued.
Reverting 161ab658, and not applying the "fix ... VAs ... v2" patch, gives me a working kernel. (And one I am very happy with! My current combination of LLVM 3.7, Mesa, libdrm, xf86-video-ati, and xorg-server is the fastest, most responsive system I've ever had with open source drivers.)
I confirm that with both of the mentioned patches removed (from dri-next-4.2), no issue occurs for me.
Created attachment 116918 [details] [review]
I unfortunately can't reproduce the issue.
So could somebody please apply the attached patch and try to get me the resulting stack dump? I need to know who is calling this function.
Thanks in advance,
Created attachment 116924 [details]
output with debugging patch
Here is the output with the debugging patch applied.
Created attachment 116933 [details] [review]
Thanks. Does the attached patch fix the issue?
Created attachment 116936 [details]
dmesg with possible fix applied
Still not working with the possible fix applied.
Created attachment 116973 [details]
Possible fix part 2
Please apply this one on top of the first fix and see if the problem still happens.
Sorry that I can't find it offhand and need to check each possible cause separately, but as noted before I can't reproduce the issue here.
No problem, seems the second try was it. With both patches applied it works fine. Tested standard desktop usage and KSP.
(In reply to hadack from comment #13)
> No problem, seems the second try was it. With both patches applied it works
> fine. Tested standard desktop usage and KSP.
Thanks for testing. Are you convinced enough that it works so that I can add a "Tested-by: email@example.com" to the patches while pushing them towards 4.2?
(In reply to Christian König from comment #12)
> Created attachment 116973 [details]
> Possible fix part 2
> Please apply this one on top of the first fix and see if the problem still
Works well on my machine. The programs that triggered the bug before no longer cause any problems.
Sanity check: I had previously dropped 161ab658 and b13e22ae from my list of cherry-picks in order to have a working kernel. After adding those back and applying the two proposed fixes, everything works great again. I have not yet tested suspend-to-RAM, but after the testing I've done so far I doubt there will be problems.
Still working fine here. I tested all the ways to trigger it and it's fine. Feel free to add the tested-by.
I can also confirm that a suspend/resume cycle no longer floods my kernel log with Linus's kernel + the two patches.
(In reply to Dave Witbrodt from comment #15)
> I have not yet tested suspend-to-RAM, but
> after the testing I've done so far I doubt there will be problems.
I tried suspend-to-RAM before leaving for work, and it resumed fine after work. No problems at all with the code in question.
I think we can close this one now.