Bug 93460 - [amdgpu] Ooops during shutdown - amdgpu_vm_grab_id
Summary: [amdgpu] Ooops during shutdown - amdgpu_vm_grab_id
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-12-20 14:54 UTC by Mike Lothian
Modified: 2016-04-15 17:50 UTC

See Also:
i915 platform:
i915 features:


Attachments
Screenshot of oops (731.83 KB, image/jpeg)
2015-12-20 15:09 UTC, Mike Lothian

Description Mike Lothian 2015-12-20 14:54:42 UTC
This doesn't happen on the powerplay branch, but it does happen on Linus's tree at 4.4-rc5.

As this appears to be related to the scheduler, I can go back to kernel 4.3 and test that. If it doesn't happen there, I can try to bisect, if you think it's worthwhile.
Comment 1 Mike Lothian 2015-12-20 15:09:56 UTC
Created attachment 120605
Screenshot of oops
Comment 2 Mike Lothian 2015-12-20 15:35:15 UTC
I tried to bisect between v4.3 and HEAD, but too many other issues (i915, ath10k) got in the way and made remoting in too difficult when the screen wasn't showing anything.
Comment 3 david1.zhou@amd.com 2015-12-21 04:08:20 UTC
0x24e7e is in amdgpu_vm_grab_id (include/linux/fence.h:292).
287	 * Returns true if f1 is chronologically later than f2. Both fences must be
288	 * from the same context, since a seqno is not re-used across contexts.
289	 */
290	static inline bool fence_is_later(struct fence *f1, struct fence *f2)
291	{
292		if (WARN_ON(f1->context != f2->context))
293			return false;

This should just be a normal warning; it isn't a bug.
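
For illustration, the check quoted above can be mimicked in user space. This is only a minimal sketch under stated assumptions: `toy_fence` and `toy_fence_is_later` are hypothetical names, not kernel API, the `fprintf` stands in for `WARN_ON`, and the wraparound-safe signed comparison is an assumption about how seqnos are ordered rather than a copy of the kernel source.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space stand-in for the kernel's struct fence. */
struct toy_fence {
	unsigned int context; /* timeline the fence belongs to */
	unsigned int seqno;   /* position on that timeline */
};

/*
 * Same shape as the quoted check: seqnos are only comparable within one
 * context, so a mismatch is reported and treated as "not later".
 */
static bool toy_fence_is_later(const struct toy_fence *f1,
			       const struct toy_fence *f2)
{
	if (f1->context != f2->context) {
		fprintf(stderr, "warning: fences from contexts %u and %u\n",
			f1->context, f2->context);
		return false;
	}
	/* Assumed ordering rule: a signed difference tolerates seqno wraparound. */
	return (int)(f1->seqno - f2->seqno) > 0;
}
```

With two fences on context 1 (seqno 10 and 7) the helper reports the first as later; comparing against a fence from context 2 only emits the warning and returns false, whatever the seqnos are.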
Comment 4 Mike Lothian 2015-12-21 07:21:42 UTC
When this happens my machine just freezes, the only way to continue is to press and hold the power button but that doesn't cleanly unmount the disks
Comment 5 Christian König 2015-12-21 08:55:37 UTC
(In reply to Mike Lothian from comment #4)
> When this happens my machine just freezes, the only way to continue is to
> press and hold the power button but that doesn't cleanly unmount the disks

Yeah, that is clearly a bug when the driver unloads.

Probably rather hard to reproduce, we should add a test case which loads and unloads the driver multiple times while there is load.
Comment 6 david1.zhou@amd.com 2015-12-21 09:23:22 UTC
(In reply to Christian König from comment #5)
> (In reply to Mike Lothian from comment #4)
> > When this happens my machine just freezes, the only way to continue is to
> > press and hold the power button but that doesn't cleanly unmount the disks
> 
> Yeah, that is clearly a bug when the driver unloads.
> 
> Probably rather hard to reproduce, we should add a test case which loads and
> unloads the driver multiple times while there is load.

Maybe we should avoid using fences for the VMID and use an LRU list instead.
Comment 7 Christian König 2015-12-21 14:57:32 UTC
(In reply to david1.zhou@amd.com from comment #6)
> Maybe we should avoid using fences for the VMID and use an LRU list instead.

Yeah, thought about that as well. The problem is that we used to have an LRU list and I switched to fences because they had less overhead.

We still need to keep the fences around for synchronization, so I'm not sure if that would really help.

The real question is: what is going wrong here?
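
To make the LRU idea concrete, here is a rough user-space sketch of picking a VMID by least recent use. Everything in it is hypothetical (`toy_vmid`, `toy_vmid_mgr`, `toy_vmid_grab`, the id count of 4, and the logical clock); it is not the amdgpu implementation, and it deliberately leaves out the fence-based synchronization that would still be needed.

```c
#define TOY_NUM_VMIDS 4 /* hypothetical; real hardware has its own count */

struct toy_vmid {
	unsigned long last_used; /* logical timestamp of the last grab */
	const void *owner;       /* VM currently holding this id, or NULL */
};

struct toy_vmid_mgr {
	struct toy_vmid ids[TOY_NUM_VMIDS];
	unsigned long clock;     /* incremented on every grab */
};

/*
 * Reuse the id already owned by vm if there is one; otherwise hand out
 * the least recently used id, evicting its previous owner.
 */
static int toy_vmid_grab(struct toy_vmid_mgr *mgr, const void *vm)
{
	int victim = 0;

	for (int i = 0; i < TOY_NUM_VMIDS; i++) {
		if (mgr->ids[i].owner == vm) {
			victim = i;
			break;
		}
		if (mgr->ids[i].last_used < mgr->ids[victim].last_used)
			victim = i;
	}
	mgr->ids[victim].owner = vm;
	mgr->ids[victim].last_used = ++mgr->clock;
	return victim;
}
```

The sketch tracks recency with a counter instead of a linked list to stay short; a real LRU list would move the grabbed entry to the tail rather than scanning all ids.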
Comment 8 david1.zhou@amd.com 2015-12-22 02:49:27 UTC
(In reply to Christian König from comment #7)
> (In reply to david1.zhou@amd.com from comment #6)
> > Maybe we should avoid using fences for the VMID and use an LRU list instead.
> 
> Yeah, thought about that as well. The problem is that we used to have an LRU
> list and I switched to fences because they had less overhead.
> 
> We still need to keep the fences around for synchronization, so I'm not sure
> if that would really help.
> 
> The real question is: what is going wrong here?

Yes, we need to identify why the contexts of the two fences differ, where the two fences come from, what kind of fences they are, and which ring each fence belongs to.
Comment 9 Mike Lothian 2016-04-15 17:50:00 UTC
Not seen this in a while now

