Created attachment 134171
dmesg

I'm getting a

[  404.518419] BUG: unable to handle kernel NULL pointer dereference at 0000000000000220
[  404.518445] IP: amdgpu_vm_bo_invalidate+0x71/0x150 [amdgpu]

when running the Vulkan CTS with 32 processes (with the tests that cause OOM removed).

Current linux tip:

commit 2dd9dc59c1419c090b084461165bd8b0adf1fecb (HEAD -> amd-staging-drm-next, origin/amd-staging-drm-next)
Author: Harry Wentland <harry.wentland@amd.com>
Date:   Thu Aug 31 21:17:05 2017 -0400

    drm/amdgpu: Remove unused flip_flags from amdgpu_crtc

There doesn't seem to be a correlated GPU hang: the card is clocked down and /sys/kernel/debug/dri/0/amdgpu_fence_info shows no pending fences. However, some of the CTS processes eventually get stuck, and I can't kill them, attach gdb to them, etc. It is probably a page fault that gets stuck, since fence waits don't seem to get stuck easily. Either way, I'm not sure yet whether that is related.

AFAICT the issue is that vm->root.base.bo is NULL in

    if (evicted && bo->tbo.resv == vm->root.base.bo->tbo.resv) {
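To illustrate the suspected failure mode, here is a minimal user-space sketch. It is not kernel code: the struct layouts are simplified, hypothetical stand-ins for the amdgpu structures, and `shares_root_resv` is a made-up helper name. The NULL guard shows one possible way to avoid the dereference, assuming a missing root BO can simply be treated as "no match"; it is not necessarily how the driver was actually fixed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (hypothetical layouts). */
struct resv { int dummy; };
struct ttm_bo { struct resv *resv; };
struct bo { struct ttm_bo tbo; };
struct vm_base { struct bo *bo; };
struct vm_root { struct vm_base base; };
struct vm { struct vm_root root; };

/*
 * Returns true when `bo` shares the reservation object of the VM's root BO.
 * Returns false, instead of dereferencing, when the root BO is NULL.
 */
static bool shares_root_resv(const struct vm *vm, const struct bo *bo)
{
    const struct bo *root = vm->root.base.bo;

    if (root == NULL)
        return false;

    return bo->tbo.resv == root->tbo.resv;
}
```

Without the guard, reading `root->tbo.resv` through a NULL `root` accesses a small non-zero address (the field's offset within the struct), which would be consistent with a fault at an address like 0x220 rather than at 0.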
I haven't hit this in a long while; it seems to have been fixed for some time.
OK, in that case let's close this.