Created attachment 145345
When using amdgpu.vm_update_mode=3 the following error appears after some time (ranging from a few minutes to a few hours):
BUG: KASAN: use-after-free in amdgpu_vm_update_directories
I attached the relevant dmesg part.
- happens on Navi10 and gfx9 (probably also on other cards but I didn't try)
- reproduced on 865b4ca43816e113996c3be571d4998b6daf5f1 and 20d6b9c3b7f40ec427af912d140f2be0de098d2d
Which kernel branch are you using? I couldn't find amdgpu_vm_update_directories in the latest amd-staging-drm-next code. It turns out it was renamed to amdgpu_vm_update_pdes in 78b20c2ee6788ba0df8b36b1369bc7e264262d3b back in March, so this looks like very outdated code.
(In reply to Andrey Grodzovsky from comment #1)
> Which kernel branch are you using ? I couldn't find
> amdgpu_vm_update_directories in latest code in amd-staging-drm-next and
> turns out it was renamed to amdgpu_vm_update_pdes in
> 78b20c2ee6788ba0df8b36b1369bc7e264262d3b back in March so seems like this is
> very outdated code.
I'm using amd-staging-drm-next from a few days ago.
But 78b20c2ee6788ba0df8b36b1369bc7e264262d3b (drm/amdgpu: allow direct submission of PDE updates v2) was pushed to this branch only recently, and it did indeed rename the function.
I'll rebuild a kernel and test if the issue is still there.
Created attachment 145387
dmesg when using cfdabd064b2d (drm/amdgpu: remove the redundant null checks)
Using the latest commit from amd-staging-drm-next (cfdabd064b2d58f "drm/amdgpu: remove the redundant null checks"), the use-after-free bug is still there.
-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe to and participate in the new bug via this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/905.