Summary: | [amdgpu] New kernel warning during shutdown | |
---|---|---|---
Product: | DRI | Reporter: | Mike Lothian <mike>
Component: | DRM/AMDgpu | Assignee: | Default DRI bug account <dri-devel>
Status: | RESOLVED FIXED | QA Contact: |
Severity: | blocker | |
Priority: | lowest | CC: | ernstp, mike, pankaj.baware1
Version: | XOrg git | Keywords: | have-backtrace
Hardware: | IA64 (Itanium) | |
OS: | NetBSD | |
URL: | https://bugs.freedesktop.org | |
See Also: | https://bugs.freedesktop.org/show_bug.cgi?id=98638 | |

Does cherry-picking this patch over help?
https://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes&id=a951ed85abd4615e98e36b536e3b3b07b22a88ac

Yes, that fixes it.

I've been having a more and more difficult time testing stuff of late, there's been quite a few regressions and I've been carrying more and more patches amongst various branches - let's hope the next cycle will be better.

What's your handle on IRC?

(In reply to Mike Lothian from comment #2)
> Yes that fixes it
>
> I've been having a more and more difficult time testing stuff of late,
> there's been quite a few regressions and I've been carrying more and more
> patches amongst various branches - lets hope the next cycle will be better
>

Well, bug fixes go to -fixes and new features go to -next. If you want everything, you'd need to merge -fixes into -next.

> What's your handle on IRC?

agd5f

Sorry, I spoke too soon. The issue is still there, it's just more difficult to see as the reboot is so quick now.

Maybe a different issue, but I've just started getting shutdown issues with agd5f drm-next-4.9-wip. It seems the monitor blanks early so I don't get to see anything, just that with halt it doesn't power off.

On the current kernel, reverting

0ea8cba5ef7b783f11cb1a0b900b7c18d2ce0b6 drm/amdgpu: always apply pci shutdown callbacks (v2)

apparently fixes it, but it's not that simple. I first saw the issue on the 25th, but it went away with the next update the branch got, so I thought it was fixed. It re-appeared with more recent updates. Unfortunately it seems my recent working kernel (26th) has the above commit, so maybe there is some interaction/timing issue with something else.

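The advice about merging -fixes into -next can be tried out on a throwaway repository before touching a real kernel tree. The sketch below is only an illustration: the branch names `fixes` and `next` and the file names are hypothetical stand-ins, not the actual drm trees or their contents.

```shell
#!/bin/sh
# Sketch: merge a "fixes" branch into a "next" branch in a scratch repo.
# Branch and file names are hypothetical stand-ins for the real trees.
set -e

demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"
git config user.email "tester@example.com"
git config user.name "Tester"
git config commit.gpgsign false

# Common base that both branches start from.
echo "base" > base.txt
git add base.txt
git commit -q -m "common base"
base=$(git rev-parse HEAD)

# Bug fixes land on the fixes branch.
git checkout -q -b fixes "$base"
echo "bug fix" > fix.txt
git add fix.txt
git commit -q -m "bug fix"

# New features land on the next branch.
git checkout -q -b next "$base"
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "new feature"

# To test everything at once, merge fixes into next.
git merge -q --no-edit fixes
```

After the merge, the `next` working tree contains both `fix.txt` and `feature.txt`. On a real tree the merge can of course conflict, which this sketch does not cover.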
I'm still seeing this issue on the 4.9-wip branch and that has this patch included:

    --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
    +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
    @@ -1708,11 +1708,11 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
     	DRM_INFO("amdgpu: finishing device.\n");
     	adev->shutdown = true;
    +	drm_crtc_force_disable_all(adev->ddev);
     	/* evict vram memory */
     	amdgpu_bo_evict_vram(adev);
     	amdgpu_ib_pool_fini(adev);
     	amdgpu_fence_driver_fini(adev);
    -	drm_crtc_force_disable_all(adev->ddev);
     	amdgpu_fbdev_fini(adev);
     	r = amdgpu_fini(adev);
     	kfree(adev->ip_block_status);

Created attachment 127331 [details]
Updated screenshot

OK, I followed the advice you gave in the other bug about compiling amdgpu as a module and got the following dmesg using:

    modprobe -r amdgpu && dmesg > dmesg && sync

Created attachment 127340 [details]
Dmesg

After I issue the modprobe -r amdgpu command, the system freezes up entirely. I took a screenshot of the final messages. Could this be TTM related?

Created attachment 127341 [details]
Updated screenshot
This captures the BUG that freezes up the system
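
The capture command used above chains everything with `&&`, so the log is only written if `modprobe -r` returns successfully. A slightly more defensive variant is sketched below. The function name `capture_unload_log` and the output file names are hypothetical. It saves the kernel log before and after the unload regardless of the exit status. If the machine freezes hard during the unload, the second log is still lost, which is why a screenshot or a remote console may remain necessary.

```shell
#!/bin/sh
# Hypothetical helper: save the kernel log before and after unloading a module.
# Needs root when run for real; file names are illustrative only.
capture_unload_log() {
    module="$1"
    outdir="$2"
    mkdir -p "$outdir"

    # Snapshot the log first and flush it to disk before the risky step.
    dmesg > "$outdir/before.txt"
    sync

    # Attempt the unload and remember its exit status either way.
    modprobe -r "$module"
    status=$?

    # Snapshot again, even if the unload reported a failure.
    dmesg > "$outdir/after.txt"
    sync

    return "$status"
}
```

It would be called as `capture_unload_log amdgpu ./unload-logs`, and the two files can then be compared with `diff`.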
Created attachment 127565 [details]
New Screenshot
The first stack trace in the dmesg is the same; the one captured after the system freezes up is slightly different.
Created attachment 127566 [details]
Updated dmesg

I've tested this again with the latest drm-next-4.10-wip branch and I still get the same errors.

Created attachment 128355 [details] [review]
possible fix

Does this patch help?

It helps the original issue where I saw a panic / stack trace on shutdown and shutdown took a while, so that's great news.

I've retested compiling amdgpu as a module and modprobe -r(ing) it. This still kills my machine. Would you be interested in me taking more diagnostics? Or can that now be considered a separate bug?

(In reply to Mike Lothian from comment #16)
> It helps the original issue where a saw a panic / stack trace on shutdown
> and shutdown took a while - so that's great news
>
> I've retested compiling amdgpu as a module and modprobe -r(ing) it - this
> still kills my machine, would you be interested in me taking more
> diagnostics? Or can that now be considered a separate bug?

Separate bug. With this patch, the two code paths (module unload and shutdown) are now separate.

*** Bug 98638 has been marked as a duplicate of this bug. ***

Created attachment 128372 [details] [review]
alternative patch

Does this patch also work?

So I removed your previous patch and applied the new one, and I get a panic in shutdown again.

Created attachment 126886 [details]
Screenshot

I might have spoken too soon with the memory manager patches: I'm seeing a stack trace just as the machine is about to switch off.

Also, it takes about 30 seconds to switch off my laptop now. I think it's amdgpu related: it seems to wait, then fire up the card, then switch off. It could also be hard disk or even systemd related, though.

I'm attaching the screenshot, but it looks like an issue with ttm_bo_force_list_clean.

Sorry about the bad quality, but I had to record a video in slowmo to capture it, then screenshot that.