Bug 102666 - amdgpu_vm_bo_invalidate NULL reference in amd-staging-drm-next
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Default DRI bug account
 
Reported: 2017-09-11 22:22 UTC by Bas Nieuwenhuizen
Modified: 2018-06-21 09:24 UTC


Attachments
dmesg (100.47 KB, text/plain)
2017-09-11 22:22 UTC, Bas Nieuwenhuizen

Description Bas Nieuwenhuizen 2017-09-11 22:22:57 UTC
Created attachment 134171
dmesg

I'm getting a 

[  404.518419] BUG: unable to handle kernel NULL pointer dereference at 0000000000000220
[  404.518445] IP: amdgpu_vm_bo_invalidate+0x71/0x150 [amdgpu]


when running the Vulkan CTS with 32 processes (with the tests that cause OOM removed).

Current linux tip:

commit 2dd9dc59c1419c090b084461165bd8b0adf1fecb (HEAD -> amd-staging-drm-next, origin/amd-staging-drm-next)
Author: Harry Wentland <harry.wentland@amd.com>
Date:   Thu Aug 31 21:17:05 2017 -0400

    drm/amdgpu: Remove unused flip_flags from amdgpu_crtc


There doesn't seem to be a correlated GPU hang: the card is clocked down and /sys/kernel/debug/dri/0/amdgpu_fence_info shows no pending fences. However, eventually some of the CTS processes get stuck, and I can't kill them, attach gdb to them, etc. Probably a page fault that gets stuck, since fence waits don't seem to get stuck easily? Either way, I'm not sure yet whether that is related.

AFAICT the issue is that vm->root.base.bo is NULL in

if (evicted && bo->tbo.resv == vm->root.base.bo->tbo.resv) {
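
To make the suspected failure mode concrete, here is a minimal userspace sketch, not the kernel code and not a proposed fix. The mock_* types are hypothetical stand-ins that only mirror the member path vm->root.base.bo->tbo.resv, and shares_root_resv() is a made-up helper name. The fault address 0x220 would be consistent with reading a field at a small offset from a NULL base pointer, but the real struct layout is not reproduced here. Whether a NULL root BO is a legitimate state at this point or a symptom of an earlier bug is not established.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structs.
 * Only the members on the path vm->root.base.bo->tbo.resv are modelled. */
struct mock_resv { int unused; };
struct mock_ttm_bo { struct mock_resv *resv; };
struct mock_amdgpu_bo { struct mock_ttm_bo tbo; };
struct mock_vm_bo_base { struct mock_amdgpu_bo *bo; };
struct mock_vm_pt { struct mock_vm_bo_base base; };
struct mock_amdgpu_vm { struct mock_vm_pt root; };

/* Returns true if bo shares its reservation object with the VM root BO.
 * Unlike the unguarded comparison quoted above, a NULL root BO is
 * treated as "not shared" instead of being dereferenced. */
static bool shares_root_resv(const struct mock_amdgpu_vm *vm,
                             const struct mock_amdgpu_bo *bo)
{
    assert(vm != NULL && bo != NULL);

    const struct mock_amdgpu_bo *root = vm->root.base.bo;

    if (root == NULL)
        return false; /* the unguarded version would fault here */

    return bo->tbo.resv == root->tbo.resv;
}
```

With a guard like this the comparison simply evaluates to false when the root BO is missing, which would only hide the oops; it says nothing about why the pointer is NULL in the first place.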
Comment 1 Bas Nieuwenhuizen 2018-06-20 22:41:50 UTC
I haven't hit this in a long while; it seems to have been fixed for some time.
Comment 2 Christian König 2018-06-21 09:24:44 UTC
OK, in that case let's close this.