Created attachment 135997 (Dmesg)
[ 9087.801615] WARNING: CPU: 6 PID: 1002 at dm_suspend+0x49/0x50
[ 9087.801617] Modules linked in:
[ 9087.801620] CPU: 6 PID: 1002 Comm: kworker/6:0 Tainted: G W 4.15.0-rc2-agd5f+ #300
[ 9087.801621] Hardware name: Alienware Alienware 15 R2/0H6J09, BIOS 1.3.12 07/28/2017
[ 9087.801622] Workqueue: pm pm_runtime_work
[ 9087.801623] task: 00000000fc3d3872 task.stack: 00000000a1a771ba
[ 9087.801624] RIP: 0010:dm_suspend+0x49/0x50
[ 9087.801625] RSP: 0018:ffffc900000fbca0 EFLAGS: 00010282
[ 9087.801626] RAX: 0000000000000000 RBX: ffff88089c9e0000 RCX: 0000000000000000
[ 9087.801627] RDX: 0000000000000001 RSI: 0000000000000282 RDI: ffff88089c9ea8f0
[ 9087.801627] RBP: 0000000000000003 R08: 00000000c0000000 R09: ffffffff824e7648
[ 9087.801628] R10: ffffea001e7c8020 R11: ffff8808c1d19480 R12: ffff88089c9e0000
[ 9087.801628] R13: ffffffff82241e98 R14: 0000000000000004 R15: ffffffff816b7670
[ 9087.801629] FS: 0000000000000000(0000) GS:ffff8808c1d80000(0000) knlGS:0000000000000000
[ 9087.801630] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9087.801631] CR2: 00007f43c544dbb8 CR3: 000000000240a004 CR4: 00000000001606e0
[ 9087.801631] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 9087.801632] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 9087.801632] Call Trace:
[ 9087.801636] amdgpu_suspend+0x61/0x170
[ 9087.801637] amdgpu_device_suspend+0x195/0x390
[ 9087.801639] ? vga_switcheroo_runtime_resume+0x50/0x50
[ 9087.801640] amdgpu_pmops_runtime_suspend+0x4d/0xc0
[ 9087.801642] pci_pm_runtime_suspend+0x4d/0x120
[ 9087.801644] vga_switcheroo_runtime_suspend+0x19/0x90
[ 9087.801645] __rpm_callback+0xb5/0x1e0
[ 9087.801647] ? vga_switcheroo_runtime_resume+0x50/0x50
[ 9087.801648] rpm_callback+0x1a/0x70
[ 9087.801649] ? vga_switcheroo_runtime_resume+0x50/0x50
[ 9087.801650] rpm_suspend+0x124/0x650
[ 9087.801652] pm_runtime_work+0x58/0x80
[ 9087.801653] process_one_work+0x1d5/0x3d0
[ 9087.801655] worker_thread+0x42/0x3e0
[ 9087.801656] kthread+0xf0/0x130
[ 9087.801657] ? cancel_delayed_work+0x10/0x10
[ 9087.801658] ? kthread_create_worker_on_cpu+0x70/0x70
[ 9087.801660] ret_from_fork+0x1f/0x30
[ 9087.801661] Code: a9 00 00 00 75 25 48 8b 7b 08 e8 93 4f f0 ff 48 8b bb 00 92 00 00 be 08 00 00 00 48 89 83 68 a9 00 00 e8 bb 4f 04 00 31 c0 5b c3 <0f> ff eb d7 0f 1f 00 53 48 89 fb 48 8b bf 20 92 00 00 e8 20 8d
[ 9087.801675] ---[ end trace 8e3cd942fb9ca189 ]---
[ 9087.996944] amdgpu 0000:01:00.0: GPU pci config reset
I'm seeing these a lot, also in Linus's tree. I'll try to bisect later, though I have a funny feeling it might be DC related.
We do a WARN_ON(adev->dm.cached_state). We shouldn't have a cached state when doing suspend. Not sure right now why this is happening. Is this with an Intel iGPU + AMD dGPU laptop?
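The invariant behind that WARN_ON can be illustrated with a toy model. This is only a sketch, not the amdgpu code: it assumes that suspend stores a state in `cached_state` and that a matching resume clears it again, and the class and helper names (`DisplayManagerModel`, `suspend_and_count_warnings`) are hypothetical.

```python
import warnings


class DisplayManagerModel:
    """Toy stand-in for the suspend/resume bookkeeping (not driver code)."""

    def __init__(self):
        self.cached_state = None  # models adev->dm.cached_state

    def suspend(self, current_state):
        # Models WARN_ON(adev->dm.cached_state): a leftover state means an
        # earlier suspend was never paired with a resume that cleared it.
        if self.cached_state is not None:
            warnings.warn("cached_state still set on suspend", RuntimeWarning)
        self.cached_state = current_state  # assumed: suspend caches the state

    def resume(self):
        # Assumed: resume hands the cached state back and clears the slot.
        state, self.cached_state = self.cached_state, None
        return state


def suspend_and_count_warnings(dm, state):
    """Call dm.suspend(state) and return how many warnings it raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        dm.suspend(state)
    return len(caught)


dm = DisplayManagerModel()
print(suspend_and_count_warnings(dm, "state-A"))  # first suspend, slot empty
dm.resume()
print(suspend_and_count_warnings(dm, "state-B"))  # paired with a resume
print(suspend_and_count_warnings(dm, "state-C"))  # no resume in between
```

In this model only the last call warns, because it is the only suspend that finds a state left over from a previous suspend. A resume path that skips the clearing step would be one way to reach that situation.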
Yes, it's Intel Skylake and AMD Tonga. I just tested it with Alex's 4.16-wip branch:
[ 8476.275162] [drm] PCIE GART of 1024M enabled (table at 0x000000F400040000).
[ 8476.472088] [drm] UVD initialized successfully.
[ 8476.684149] [drm] VCE initialized successfully.
[ 8481.834519] WARNING: CPU: 2 PID: 62 at dm_suspend+0x49/0x50
[ 8481.834521] Modules linked in:
[ 8481.834523] CPU: 2 PID: 62 Comm: kworker/2:1 Not tainted 4.15.0-rc2-agd5f+ #303
[ 8481.834524] Hardware name: Alienware Alienware 15 R2/0H6J09, BIOS 1.3.12 07/28/2017
[ 8481.834525] Workqueue: pm pm_runtime_work
[ 8481.834527] task: 000000002f4790b3 task.stack: 000000003f1bbe2b
[ 8481.834528] RIP: 0010:dm_suspend+0x49/0x50
[ 8481.834529] RSP: 0018:ffffc90000273ca0 EFLAGS: 00010286
[ 8481.834530] RAX: 0000000000000000 RBX: ffff88089c9f0000 RCX: 0000000000000000
[ 8481.834530] RDX: 0000000000000001 RSI: 0000000000000282 RDI: ffff88089c9fa8f0
[ 8481.834531] RBP: 0000000000000003 R08: 00000000c0000000 R09: ffffffff824e80c8
[ 8481.834532] R10: ffffea0021940020 R11: ffff8808c1c99480 R12: ffff88089c9f0000
[ 8481.834532] R13: ffffffff82243fc0 R14: 0000000000000004 R15: ffffffff816b7630
[ 8481.834533] FS: 0000000000000000(0000) GS:ffff8808c1c80000(0000) knlGS:0000000000000000
[ 8481.834534] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8481.834535] CR2: 00007fd9e466f000 CR3: 000000000240a005 CR4: 00000000001606e0
[ 8481.834535] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8481.834536] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8481.834536] Call Trace:
[ 8481.834540] amdgpu_suspend+0x61/0x170
[ 8481.834541] amdgpu_device_suspend+0x195/0x390
[ 8481.834543] ? vga_switcheroo_runtime_resume+0x50/0x50
[ 8481.834544] amdgpu_pmops_runtime_suspend+0x4d/0xc0
[ 8481.834547] pci_pm_runtime_suspend+0x4d/0x120
[ 8481.834548] vga_switcheroo_runtime_suspend+0x19/0x90
[ 8481.834550] __rpm_callback+0xb5/0x1e0
[ 8481.834551] ? vga_switcheroo_runtime_resume+0x50/0x50
[ 8481.834552] rpm_callback+0x1a/0x70
[ 8481.834554] ? vga_switcheroo_runtime_resume+0x50/0x50
[ 8481.834555] rpm_suspend+0x124/0x650
[ 8481.834556] pm_runtime_work+0x58/0x80
[ 8481.834558] process_one_work+0x1d5/0x3d0
[ 8481.834559] worker_thread+0x42/0x3e0
[ 8481.834560] kthread+0xf0/0x130
[ 8481.834562] ? cancel_delayed_work+0x10/0x10
[ 8481.834562] ? kthread_create_worker_on_cpu+0x70/0x70
[ 8481.834564] ret_from_fork+0x1f/0x30
[ 8481.834565] Code: a9 00 00 00 75 25 48 8b 7b 08 e8 d3 4f f0 ff 48 8b bb 00 92 00 00 be 08 00 00 00 48 89 83 68 a9 00 00 e8 bb 4f 04 00 31 c0 5b c3 <0f> ff eb d7 0f 1f 00 53 48 89 fb 48 8b bf 20 92 00 00 e8 60 8d
[ 8481.834579] ---[ end trace 86b596a21b2ff6ee ]---
[ 8482.018625] amdgpu 0000:01:00.0: GPU pci config reset
Is there anything else you need from me? Or debugging I could turn on?
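Because the same warning recurs across boots and kernels, it can help to tally WARNING sites from saved dmesg output. The following is a minimal sketch: the function name `count_warning_sites` is made up for illustration, and the regular expression assumes the exact `WARNING: CPU: N PID: N at func+0xOFF/0xSIZE` layout seen in this thread (kernels that also print a file path and line number before the function would need a different pattern).

```python
import re
from collections import Counter

# Matches lines such as:
# [ 9087.801615] WARNING: CPU: 6 PID: 1002 at dm_suspend+0x49/0x50
WARNING_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+WARNING: CPU: \d+ PID: \d+ at "
    r"(?P<func>[\w.]+)\+(?P<off>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
)


def count_warning_sites(dmesg_text):
    """Return a Counter mapping 'func+offset' to the number of WARNING lines."""
    sites = Counter()
    for match in WARNING_RE.finditer(dmesg_text):
        sites[f"{match['func']}+{match['off']}"] += 1
    return sites


# The two WARNING lines reported so far, plus one unrelated line.
sample = """\
[ 9087.801615] WARNING: CPU: 6 PID: 1002 at dm_suspend+0x49/0x50
[ 9087.801617] Modules linked in:
[ 8481.834519] WARNING: CPU: 2 PID: 62 at dm_suspend+0x49/0x50
"""
print(count_warning_sites(sample))
```

Both reports land on the same site, `dm_suspend+0x49`, which is consistent with a single check firing on every runtime suspend.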
I bisected this back to:
d21becbe0225de0e2582d17d4fbc73fbd103b1f7 is the first bad commit
commit d21becbe0225de0e2582d17d4fbc73fbd103b1f7
Author: Tony Cheng <tony.cheng@amd.com>
Date: Wed Jul 12 11:54:10 2017 -0400

    drm/amd/display: avoid disabling opp clk before hubp is blanked.

    Signed-off-by: Tony Cheng <tony.cheng@amd.com>
    Reviewed-by: Eric Yang <eric.yang2@amd.com>
    Acked-by: Harry Wentland <Harry.Wentland@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

:040000 040000 61debba3cf73670d29975bc136d01862c2a54576 3d2315a1843d6276655b1550cb9f18fab47c5ce4 M drivers
Tried again and it looks a little more promising:
0a214e2fb6b0a56519b6d5efab4b21475c233ee0 is the first bad commit
commit 0a214e2fb6b0a56519b6d5efab4b21475c233ee0
Author: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Date: Thu Jul 13 10:56:48 2017 -0400

    drm/amd/display: Release cached atomic state in S3.

    Fixes memory leak.

    Signed-off-by: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
    Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
    Acked-by: Harry Wentland <Harry.Wentland@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

:040000 040000 494f25ce4ad407678f88d6c85128905762c9fbfb fb36845ef2ccca7bf823c9fec4d13d0a6e71ea2b M drivers
Which makes sense, as that's the commit that adds the WARN_ON. I guess that takes us back to the question of why there is a cached state.
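Two bisect runs landing on different commits is what a single wrong good/bad verdict would produce, for example if the warning did not fire during one test boot. The sketch below shows the effect with a plain binary search over a made-up, oldest-to-newest history; the commit names `c0`..`c7` and the helper names are hypothetical, and this is not how `git bisect` is implemented internally.

```python
def first_bad_commit(commits, is_bad):
    """Binary-search an oldest-to-newest commit list for the first bad one.

    Assumes every commit before the culprit is good and every commit from
    the culprit onward is bad, and that the newest commit is bad.
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid          # culprit is at mid or earlier
        else:
            lo = mid + 1      # culprit is after mid
    return commits[lo]


history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
CULPRIT = "c5"  # hypothetical commit that introduces the warning


def reliable_test(commit):
    """Every commit from the culprit onward reports the warning."""
    return history.index(commit) >= history.index(CULPRIT)


def flaky_test(commit):
    """Same as reliable_test, but the warning is missed once, on c5."""
    return reliable_test(commit) and commit != "c5"


print(first_bad_commit(history, reliable_test))  # c5
print(first_bad_commit(history, flaky_test))     # c6
```

With the one missed warning the search blames the neighbouring commit instead, so repeating the bisect, as done here, is a reasonable sanity check when a result looks unrelated to the symptom.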
I ran into this bug when upgrading to 4.15.1. Anything I can do to help?
We have a few new patches in our staging trees relating to suspend and driver unload. Would you be able to try amd-staging-drm-next or drm-next-4.17-wip from https://cgit.freedesktop.org/~agd5f/linux/?h=drm-next-4.17-wip and see if the issue is fixed there?
I'm running agd5f's drm-next-4.17-wip branch with https://patchwork.freedesktop.org/series/38985/ applied on top, along with https://github.com/FireBurn/KernelStuff/blob/master/05-remove-warn.patch, which removes the WARN_ON(adev->dm.cached_state). I've now reverted the removal of the WARN_ON and the issue seems to be fixed, thanks.
Thanks for the update.