Bug 104919 - R9285 4.17-wip locks/vmfaults since drm/amdgpu: revert "drm/amdgpu: use AMDGPU_GEM_CREATE_VRAM_CLEARED for VM PD/PTs" v2
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-02-02 19:38 UTC by Andy Furniss
Modified: 2018-02-08 20:42 UTC
CC List: 2 users



Attachments
error logging from corruption/locks (13.31 KB, text/plain)
2018-02-02 19:38 UTC, Andy Furniss
Possible fix (1.15 KB, patch)
2018-02-04 18:38 UTC, Christian König

Description Andy Furniss 2018-02-02 19:38:01 UTC
The last couple of agd5f 4.17-wip branches have locked up with the Unreal Tournament alpha for me on an R9 285.

Bisecting seems to point to:

first bad commit: [d712b817ceb9311cffad47867da26311c06a812b] drm/amdgpu: revert "drm/amdgpu: use AMDGPU_GEM_CREATE_VRAM_CLEARED for VM PD/PTs" v2

It takes a while to lock up, though, sometimes only after restarting the game, so there is a slight chance that a commit I marked good is a false good.

This game requests slightly more than the 2 GB of VRAM I have, which may be relevant if others with more VRAM can't reproduce it.

I've attached examples of logging retrieved after a lock. SysRq is usually enough to get them; once I needed a hard reset. The first chunk was captured before a lock by quickly quitting the game after seeing some new artifacts.
Comment 1 Andy Furniss 2018-02-02 19:38:50 UTC
Created attachment 137138
error logging from corruption/locks
Comment 2 Christian König 2018-02-02 19:42:31 UTC
At least I now know that the PASID handling is working fine.

Does it work if you disable the new clear method? E.g. just add a "return 0;" to the beginning of amdgpu_vm_clear_bo().
Comment 3 Andy Furniss 2018-02-02 21:43:11 UTC
(In reply to Christian König from comment #2)
> At least I now know that the PASID handling is working fine.
> 
> Does it work if you disable the new clear method? E.g. just add a "return
> 0;" to the beginning of amdgpu_vm_clear_bo().

Seems good with that.
Just a quick test, as I have to go AFK; will try more later.
Comment 4 Andy Furniss 2018-02-03 17:44:00 UTC
(In reply to Andy Furniss from comment #3)

> Seems good with that.
> Just a quick test, as I have to go AFK; will try more later.

Still good after testing for a while longer.
Comment 5 Christian König 2018-02-04 18:38:55 UTC
Created attachment 137166
Possible fix

This addresses one problem I found while looking at the code.

Please test whether it fixes the issue.
Comment 6 Andy Furniss 2018-02-04 21:36:01 UTC
That fixes it, thanks.
Comment 7 Andy Furniss 2018-02-08 20:42:07 UTC
The fix has landed in the affected kernel branches.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.