Summary: | agd5f drm-next-3.19-wip + Unreal Elemental sometimes = list_add corruption/hung task | |
---|---|---|---
Product: | DRI | Reporter: | Andy Furniss <adf.lists>
Component: | DRM/Radeon | Assignee: | Default DRI bug account <dri-devel>
Status: | CLOSED FIXED | QA Contact: |
Severity: | normal | |
Priority: | medium | CC: | ckoenig.leichtzumerken, commiethebeastie
Version: | XOrg git | |
Hardware: | Other | |
OS: | All | |
Whiteboard: | | |
i915 platform: | | i915 features: |
Also noticed in that dmesg, and when searching the kernel log, that with this kernel I sometimes get the following, apparently without effect -

kernel: [drm:radeon_gem_va_update_vm] *ERROR* Couldn't update BO_VA (-512)

(In reply to Andy Furniss from comment #0)
> Haven't seen on drm-next-3.18-wip

Can you bisect the kernel?

(In reply to Michel Dänzer from comment #2)
> (In reply to Andy Furniss from comment #0)
> > Haven't seen on drm-next-3.18-wip
>
> Can you bisect the kernel?

May be a bit early, but I will sit on the one before for a while to confirm. Looks like the head commit -

commit bb9a49819ed30f3f5782b2504066547a8507a591
Author: Christian König <christian.koenig@amd.com>
Date: Mon Oct 13 12:41:47 2014 +0200

    drm/radeon: update the VM after setting BO address

    This way the necessary VM update is kicked off immediately if all BOs involved are in GPU accessible memory.

I haven't managed to lock or get Valley to GPU fault on the one before so far.

FWIW I noticed even on head the Valley fault doesn't always happen - it seems that I need to have set my CPUs to perf (which I nearly always do when testing things like this). With cpufreq ondemand I didn't see the fault.

Created attachment 108165
Possible fix

Oops! I forgot to take the VM lock in radeon_gem_va_update_vm. Fix is attached.

Thanks for testing,
Christian.

(In reply to Christian König from comment #4)
> Created attachment 108165
> Possible fix
>
> Oops! I forgot to take the VM lock in radeon_gem_va_update_vm. Fix is
> attached.
>
> Thanks for testing,
> Christian.

I don't know about Elemental as it's far harder to trigger, but first try with Valley produced -

[ 156.617954] radeon 0000:01:00.0: GPU fault detected: 146 0x02e83504
[ 156.617960] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00010F17
[ 156.617961] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08035004
[ 156.617963] VM fault (0x04, vmid 4) at page 69399, read from VGT (53)

(In reply to Andy Furniss from comment #5)
> I don't know about Elemental as it's far harder to trigger, but first try
> with Valley produced -
>
> [ 156.617954] radeon 0000:01:00.0: GPU fault detected: 146 0x02e83504
> [ 156.617960] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00010F17
> [ 156.617961] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08035004
> [ 156.617963] VM fault (0x04, vmid 4) at page 69399, read from VGT (53)

Sounds like a different problem triggered by the same patchset to me.

But first things first, is the original issue with the list corruption fixed? If yes we can start to look into this one as well.

(In reply to Christian König from comment #6)
> (In reply to Andy Furniss from comment #5)
> > I don't know about Elemental as it's far harder to trigger, but first try
> > with Valley produced -
> >
> > [ 156.617954] radeon 0000:01:00.0: GPU fault detected: 146 0x02e83504
> > [ 156.617960] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00010F17
> > [ 156.617961] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08035004
> > [ 156.617963] VM fault (0x04, vmid 4) at page 69399, read from VGT (53)
>
> Sounds like a different problem triggered by the same patchset to me.
>
> But first things first, is the original issue with the list corruption
> fixed? If yes we can start to look into this one as well.

It's OK so far, but I need more time, as I don't really know how to trigger it, and the last time I called something OK (in another bug) it wasn't.
(In reply to Andy Furniss from comment #7)
> (In reply to Christian König from comment #6)
> > (In reply to Andy Furniss from comment #5)
> > > I don't know about Elemental as it's far harder to trigger, but first try
> > > with valley produced -
> > >
> > > [ 156.617954] radeon 0000:01:00.0: GPU fault detected: 146 0x02e83504
> > > [ 156.617960] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00010F17
> > > [ 156.617961] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08035004
> > > [ 156.617963] VM fault (0x04, vmid 4) at page 69399, read from VGT (53)
> >
> > Sounds like a different problem triggered by the same patchset to me.
> >
> > But first things first, is the original issue with the list corruption
> > fixed? If yes we can start to look into this one as well.
>
> It's OK so far, but then I need more time as I don't really know how to
> trigger it and last time I called it as OK (in another bug) it wasn't.

Still haven't crashed Elemental but have got -

[29066.333908] [drm:radeon_gem_va_update_vm] *ERROR* Couldn't update BO_VA (-512)
[29066.335653] [drm:radeon_gem_va_update_vm] *ERROR* Couldn't update BO_VA (-512)

(In reply to Christian König from comment #6)
> But first things first, is the original issue with the list corruption
> fixed? If yes we can start to look into this one as well.

Enough time has passed now, so I do think that the patch fixed the list corruption.

I found same issues here.

[ 1384.901951] [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-512)
[ 1453.198866] [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-512)
[ 2215.773607] [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-512)
[ 2351.238014] [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-512)
[ 3877.903397] [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-512)

Self compiled kernel from Linus git. 3.19-rc2+ right now.
(In reply to Lorenzo Bona from comment #10)
> [ 1384.901951] [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update
> BO_VA (-512)

Christian, any ideas for these? Various people including myself are still hitting them occasionally.

Created attachment 111961
Fix for printing the error message

(In reply to Michel Dänzer from comment #11)
> (In reply to Lorenzo Bona from comment #10)
> > [ 1384.901951] [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update
> > BO_VA (-512)
>
> Christian, any ideas for these? Various people including myself are still
> hitting them occasionally.

Oops, yeah, trivial to fix.

Should have been closed some time ago.

*** Bug 88211 has been marked as a duplicate of this bug. ***

Let's close this.
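
A note on the recurring `Couldn't update BO_VA (-512)` lines: in the Linux kernel, 512 is the value of `ERESTARTSYS`, an internal code meaning a system call was interrupted by a signal and should be restarted. That would fit the earlier observation that the message appears "apparently without effect", though reading the thread this way is an interpretation, not something the attached patch (not reproduced here) confirms. The sketch below is a small log-triage helper under that assumption; the function name `classify_bo_va_error` and the idea of flagging restart codes as benign are hypothetical, not part of the radeon driver.

```python
import re

# Kernel-internal restart codes (values from include/linux/errno.h).
# These are meant to be handled inside the kernel and never seen by user space.
KERNEL_RESTART_CODES = {
    512: "ERESTARTSYS",
    513: "ERESTARTNOINTR",
    514: "ERESTARTNOHAND",
    516: "ERESTART_RESTARTBLOCK",
}

# Matches the message text quoted in this thread, regardless of the
# function name printed in the [drm:...] prefix.
_BO_VA_RE = re.compile(r"\*ERROR\* Couldn't update BO_VA \((-?\d+)\)")


def classify_bo_va_error(line):
    """Return (code, name, is_restart) for a BO_VA error line, else None.

    is_restart is True when the code is a kernel restart code, i.e. the
    operation was interrupted and is expected to be retried (assumption:
    such lines can be treated as noise during triage).
    """
    match = _BO_VA_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    name = KERNEL_RESTART_CODES.get(abs(code))
    return (code, name if name else "UNKNOWN", name is not None)


if __name__ == "__main__":
    sample = ("[29066.333908] [drm:radeon_gem_va_update_vm] *ERROR* "
              "Couldn't update BO_VA (-512)")
    print(classify_bo_va_error(sample))
```

Run against a saved dmesg, a helper like this separates interrupted-call noise from BO_VA failures that carry a real error code.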
Created attachment 108075
dmesg when Unreal Elemental hangs on start

R9 270X

Sometimes when running the Unreal Elemental demo it hangs at startup, with the errors in the attached dmesg. This doesn't always happen.

Mesa is currently on "winsys/radeon: Use a single buffer cache manager again"; previously produced with slightly older.

Haven't seen on drm-next-3.18-wip (but really need to test more with current mesa)

Possibly unrelated, but new for drm-next-3.19-wip I get below when running Unigine Valley - it runs OK.

Oct 17 11:15:35 ph4 kernel: radeon 0000:01:00.0: GPU fault detected: 146 0x0af03504
Oct 17 11:15:35 ph4 kernel: radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00010E57
Oct 17 11:15:35 ph4 kernel: radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x10035004
Oct 17 11:15:35 ph4 kernel: VM fault (0x04, vmid 8) at page 69207, read from VGT (53)
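
The last log line is a decoded form of the two register values printed just before it. The sketch below reproduces that decoding from the raw numbers. The bit layout (protections in bits 0-3, memory client id in bits 12-19, read/write flag in bit 24, VMID in bits 25-28) is an assumption chosen because it matches both fault reports quoted in this bug; it is not taken from the driver source. The function names are hypothetical, and the client name (VGT) is not looked up, only its numeric id.

```python
def decode_vm_fault(status, addr):
    """Split a VM_CONTEXT1_PROTECTION_FAULT_STATUS value into fields.

    Assumed layout (consistent with the log lines in this report):
      bits  0-3   protection bits
      bits 12-19  memory client id
      bit  24     1 = write, 0 = read
      bits 25-28  VMID
    The FAULT_ADDR register is reported directly as the page number.
    """
    return {
        "protections": status & 0xF,
        "client_id": (status >> 12) & 0xFF,
        "is_write": bool((status >> 24) & 0x1),
        "vmid": (status >> 25) & 0xF,
        "page": addr,
    }


def format_vm_fault(status, addr):
    """Render the decoded fields in the style of the kernel message."""
    f = decode_vm_fault(status, addr)
    direction = "write" if f["is_write"] else "read"
    return ("VM fault (0x%02x, vmid %d) at page %d, %s from client %d"
            % (f["protections"], f["vmid"], f["page"], direction,
               f["client_id"]))


if __name__ == "__main__":
    # Values from the Unigine Valley fault in the report above.
    print(format_vm_fault(0x10035004, 0x00010E57))
    # VM fault (0x04, vmid 8) at page 69207, read from client 53
```

The same function applied to the later report (status 0x08035004, address 0x00010F17) yields vmid 4 and page 69399, matching that log as well.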