Created attachment 129346
It looks like we are hitting a use-after-free in gen8_ppgtt_alloc_page_directories involving some pdp state. One possible theory from looking at the log is that the shrinker kicks in and starts swinging its axe, evicting one or more VMAs, which results in said pdp being freed, presumably because we didn't have anything else inserted in that range. But all of this could have happened while we were in the middle of allocating a va range for another vma which just so happens to touch the same pdp. With a little bad timing, the free could have happened just after we checked whether we need to allocate a new pdp, resulting in all kinds of brokenness. It looks like something similar could also happen with a pd.
Shrinker doing unbind + clear_range vs bind + va_allocate is protected by struct_mutex. But what if the shrinker is triggered by va_allocate or insert-entries? Insert-entries should not be an issue, as it should never allocate. But there is a window of opportunity for the shrinker to run while we allocate, and to reap a level after we have already checked its presence.
Please see the patches on the list as to how we could fix this by moving the accounting into the allocation phase - that will prevent us from reaping levels we have already processed.
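To make the ordering problem concrete, here is a minimal single-threaded sketch in plain C. It is a toy model and not the i915 code or the actual patches: every name (toy_pd, toy_pdp, toy_shrink, toy_alloc_pd, toy_demo) is hypothetical, the "shrinker" is simply called from inside the allocator to mimic direct reclaim, and the use count stands in for whatever accounting the real page-table levels carry. Under those assumptions it shows why accounting after the allocation phase lets a level be reaped after it was checked, while accounting during the allocation phase does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_ENTRIES 4

/* Hypothetical stand-ins for a page directory and its parent level. */
struct toy_pd { unsigned int used; };              /* entries accounted to this level */
struct toy_pdp { struct toy_pd *pd[TOY_ENTRIES]; };

/* Toy shrinker: reap every level that nobody has accounted for yet. */
static void toy_shrink(struct toy_pdp *pdp)
{
    for (size_t i = 0; i < TOY_ENTRIES; i++) {
        if (pdp->pd[i] && pdp->pd[i]->used == 0) {
            free(pdp->pd[i]);
            pdp->pd[i] = NULL;
        }
    }
}

/* Toy allocator: assume every allocation may enter reclaim first. */
static struct toy_pd *toy_alloc_pd(struct toy_pdp *pdp)
{
    toy_shrink(pdp);
    return calloc(1, sizeof(struct toy_pd));
}

/* Late accounting: check/allocate all levels, then bump the counts.
 * Returns false if a level we already checked has been reaped. */
static bool toy_alloc_range_late(struct toy_pdp *pdp, size_t first, size_t count)
{
    for (size_t i = first; i < first + count; i++) {
        if (!pdp->pd[i]) {
            pdp->pd[i] = toy_alloc_pd(pdp);
            if (!pdp->pd[i])
                return false;
        }
    }
    for (size_t i = first; i < first + count; i++) {
        if (!pdp->pd[i])
            return false; /* in real code this would be a stale pointer */
        pdp->pd[i]->used++;
    }
    return true;
}

/* Early accounting: bump the count as soon as each level is checked or
 * created, so a later reclaim pass sees it as in use. */
static bool toy_alloc_range_early(struct toy_pdp *pdp, size_t first, size_t count)
{
    for (size_t i = first; i < first + count; i++) {
        if (!pdp->pd[i]) {
            struct toy_pd *pd = toy_alloc_pd(pdp);
            if (!pd)
                return false;
            pdp->pd[i] = pd;
        }
        pdp->pd[i]->used++;
    }
    return true;
}

/* Scenario: slot 0 exists but is unaccounted, slot 1 must be allocated. */
bool toy_demo(bool early_accounting)
{
    struct toy_pdp pdp = { { NULL } };
    bool ok;

    pdp.pd[0] = calloc(1, sizeof(struct toy_pd));
    assert(pdp.pd[0]);

    ok = early_accounting ? toy_alloc_range_early(&pdp, 0, 2)
                          : toy_alloc_range_late(&pdp, 0, 2);

    for (size_t i = 0; i < TOY_ENTRIES; i++)
        free(pdp.pd[i]);
    return ok;
}
```

In this model toy_demo(false) reports failure because slot 0 is reaped while slot 1 is being allocated, whereas toy_demo(true) succeeds because slot 0 was already accounted for when the reclaim pass ran. The toy returns an error where the real driver would dereference freed memory.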
*** This bug has been marked as a duplicate of bug 99295 ***