Summary: | Tonga faults and oopses on agd5f drm-fixes-4.4 since fix incorrect mutex usage v3 | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | DRI | Reporter: | Andy Furniss <adf.lists> | ||||||||||
Component: | DRM/AMDgpu | Assignee: | Default DRI bug account <dri-devel> | ||||||||||
Status: | RESOLVED FIXED | QA Contact: | |||||||||||
Severity: | normal | ||||||||||||
Priority: | medium | CC: | ckoenig.leichtzumerken, nhaehnle | ||||||||||
Version: | DRI git | ||||||||||||
Hardware: | Other | ||||||||||||
OS: | All | ||||||||||||
Whiteboard: | |||||||||||||
i915 platform: | i915 features: | ||||||||||||
Created attachment 120051
dmesg
Unless I eventually manage to lock up the commit before it (and so far I am failing to), it looks like this starts with:

    commit e284022163716ecf11c37fd1057c35d689ef2c11
    Author: Christian König <christian.koenig@amd.com>
    Date:   Thu Nov 5 19:49:48 2015 +0100

        drm/amdgpu: fix incorrect mutex usage v3

It seems this is the correct bad commit. I ran all day yesterday without issue on the one before. This morning I booted into the kernel at the "bad" commit and it locked up within 10 minutes of browsing. Slightly different this time: no Oops, a similar trace, and "rcu_preempt detected stalls on CPUs/tasks".

Created attachment 120106
Another trace
Well, the good news first: it looks like I can reproduce the issue. The bad news is that I don't have the slightest idea what's causing it. The patch you mentioned changes the timing around those functions, but I can't see how it would cause something like this.

Created attachment 120274
0001-drm-amdgpu-fix-race-condition-in-amd_sched_entity_pu.patch

Please try the attached patch. Thanks to Christian for pointing me at this bug (I got lucky by running into a similar but cleaner stack trace...)

(In reply to Nicolai Hähnle from comment #6)
> Created attachment 120274
> 0001-drm-amdgpu-fix-race-condition-in-amd_sched_entity_pu.patch
>
> Please try the attached patch. Thanks to Christian for pointing me at this
> bug (I got lucky by running into a similar but cleaner stack trace...)

Seems good so far. I'll stay on this kernel for a few days to be sure.

"Sure" may be a bit strong, though. I said above that powerplay was OK, but despite being OK on it for several days in the past, I managed to hit this bug on it yesterday.

Still seems good.
Created attachment 120050
some backtraces

I hadn't tried agd5f drm-fixes-4.4 before today. It seems to be unstable on the R9 285 (Tonga). Twice I've been close to submitting a bisect, but false goods have spoilt things. Varying backtraces are attached. I will try to bisect again, but it's not going to be quick. I am stable on powerplay and was OK on 4.4-next-wip.
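Why "false goods" spoil a bisect of an intermittent lockup can be sketched with a toy binary search over a linear history. This is a hypothetical illustration, not the real kernel history: commits are plain integers, commit 9 is assumed to be the first bad one, and `is_bad` stands in for "boot this kernel and see whether it locks up". A single test run that happens not to hit the bug marks a bad commit as good, and the search then confidently converges on the wrong commit.

```python
def bisect_first_bad(commits, is_bad):
    """Binary search for the first bad commit in a linear history.

    Assumes commits[0] is known good and commits[-1] is known bad,
    mirroring the good/bad endpoints given to `git bisect`.
    """
    lo, hi = 0, len(commits) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # bug seen: first bad commit is at or before mid
        else:
            lo = mid  # no bug seen: assume everything up to mid is good
    return commits[hi]


# Hypothetical history of 16 commits; commit 9 introduces the bug.
history = list(range(16))

# Every test run reliably reproduces the bug on bad commits.
honest = bisect_first_bad(history, lambda c: c >= 9)  # -> 9

# Commit 11 is bad, but the lockup did not happen during its test run,
# so it is wrongly marked good (a "false good").
flaky = bisect_first_bad(history, lambda c: c >= 9 and c != 11)  # -> 12

print(honest, flaky)
```

The search never revisits a verdict, so one false good silently discards the real culprit. That is why, for an intermittent bug, a "good" mark is only trustworthy after a long soak, as with running the parent commit all day before calling the next one bad.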