| Field | Value |
|---|---|
| Summary | [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding |
| Product | DRI |
| Component | DRM/AMDgpu |
| Status | RESOLVED FIXED |
| Severity | normal |
| Priority | medium |
| Version | XOrg git |
| Hardware | x86-64 (AMD64) |
| OS | Linux (All) |
| Reporter | mlen <mlen> |
| Assignee | Default DRI bug account <dri-devel> |
| QA Contact | |
| CC | andrey.grodzovsky, harry.wentland |
| Whiteboard | |
| i915 platform | |
| i915 features | |
Description
mlen
2018-01-29 06:57:03 UTC
This patch https://patchwork.freedesktop.org/patch/198719/ should fix it, but there could be some other issues as well. amd-staging-drm-next has fixes for a whole bunch of driver unload issues, including what you're seeing. It's hosted at https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next

Can you try the patch and/or amd-staging-drm-next?

I'll try running amd-staging-drm-next later today.

I tested amd-staging-drm-next with HEAD at f1367d12f5fabb04789c7772594887434c8d9e8b. This time the unbind succeeded, but there are still some errors logged and the kernel reports a locking problem in amdgpu:

[ 77.098923] [drm] amdgpu: finishing device.
[ 77.458614] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000)
[ 77.481247] IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
[ 77.653815] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000)
[ 77.845085] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000)
[ 77.855055] IPv6: ADDRCONF(NETDEV_CHANGE): virbr10: link becomes ready
[ 78.036695] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000)
[ 78.233244] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000)
[ 78.425058] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000)
[ 78.616635] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000)
[ 78.808323] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000)
[ 78.810659] amdgpu 0000:03:00.0: 00000000a667dd57 unpin not necessary
[ 78.810672] amdgpu 0000:03:00.0: 00000000a7594a2b unpin not necessary
[ 78.811733] =====================================
[ 78.813109] WARNING: bad unlock balance detected!
[ 78.813947] 4.15.0-rc4+ #2 Not tainted
[ 78.814835] -------------------------------------
[ 78.815731] openrc-run.sh/3931 is trying to release lock (&(&mgr->lock)->rlock) at:
[ 78.816646] [<000000006fd39549>] amdgpu_gtt_mgr_fini+0x22/0x37
[ 78.817531] but there are no more locks to release!
[ 78.818446] other info that might help us debug this:
[ 78.820208] 5 locks held by openrc-run.sh/3931:
[ 78.821127] #0: (sb_writers#6){....}, at: [<00000000322e5044>] vfs_write+0x87/0xe2
[ 78.822051] #1: (&of->mutex){....}, at: [<00000000660270c4>] kernfs_fop_write+0xca/0x156
[ 78.823007] #2: (kn->count#211){....}, at: [<000000000634dafb>] kernfs_fop_write+0xd2/0x156
[ 78.823936] #3: (&dev->mutex){....}, at: [<00000000c386f49f>] unbind_store+0x58/0x90
[ 78.824912] #4: (&dev->mutex){....}, at: [<00000000eefcc37f>] device_release_driver_internal+0x2f/0x1f3
[ 78.825861] stack backtrace:
[ 78.827764] CPU: 7 PID: 3931 Comm: openrc-run.sh Not tainted 4.15.0-rc4+ #2
[ 78.828747] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16 WS, BIOS 3407 03/10/2017
[ 78.829718] Call Trace:
[ 78.830717] dump_stack+0x67/0x8e
[ 78.831689] ? amdgpu_gtt_mgr_fini+0x22/0x37
[ 78.832687] print_unlock_imbalance_bug+0xcc/0xd3
[ 78.833657] lock_release+0x134/0x267
[ 78.834646] ? _raw_spin_unlock+0x2e/0x40
[ 78.835605] _raw_spin_unlock+0x1c/0x40
[ 78.836586] amdgpu_gtt_mgr_fini+0x22/0x37
[ 78.837549] ttm_bo_clean_mm+0x79/0xab
[ 78.838544] amdgpu_ttm_fini+0x75/0x11c
[ 78.839507] amdgpu_bo_fini+0xe/0x2d
[ 78.840495] gmc_v8_0_sw_fini+0x2e/0x49
[ 78.841454] amdgpu_device_ip_fini+0x21f/0x2d3
[ 78.842439] amdgpu_device_fini+0x4c/0x125
[ 78.843394] amdgpu_driver_unload_kms+0x63/0x76
[ 78.844373] drm_dev_unregister+0x49/0xc3
[ 78.845318] amdgpu_pci_remove+0x19/0x37
[ 78.846244] pci_device_remove+0x36/0x86
[ 78.847190] device_release_driver_internal+0x122/0x1f3
[ 78.848120] unbind_store+0x60/0x90
[ 78.849069] kernfs_fop_write+0x10e/0x156
[ 78.849997] __vfs_write+0x31/0xcc
[ 78.850937] ? preempt_count_sub+0x8b/0x94
[ 78.851871] ? __sb_start_write+0xc0/0x180
[ 78.852828] vfs_write+0xa5/0xe2
[ 78.853755] SyS_write+0x5f/0xa3
[ 78.854708] do_syscall_64+0x6c/0x7b
[ 78.855630] entry_SYSCALL64_slow_path+0x25/0x25
[ 78.856583] RIP: 0033:0x7fc5804b6408
[ 78.857511] RSP: 002b:00007ffd95228060 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 78.858484] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fc5804b6408
[ 78.859438] RDX: 000000000000000d RSI: 000055b229e9e890 RDI: 0000000000000001
[ 78.860419] RBP: 000055b229e9e890 R08: 000000000000000a R09: 000055b229ea45f0
[ 78.861379] R10: 000000000000009b R11: 0000000000000246 R12: 000000000000000d
[ 78.862364] R13: 0000000000000001 R14: 00007fc580783740 R15: 000000000000000d
[ 78.863411] [drm] amdgpu: ttm finalized

Good to hear DC issues are gone. Not sure about the unlock balance myself.

Hi, can you please provide the full dmesg for the unbind sequence? So the issue happens when you unbind the card from the host driver, before binding it to VFIO?

Andrey

Currently I don't have access to that machine; I should be able to test next week. I'll extend the script to be more verbose and to write some delimiters to /dev/kmsg.

On Carrizo, unbinding the driver does indeed show
[drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disabled failed (scratch(0xC040)=0xCAFEDEAD)
I don't have the LOCKDEP-related kflags enabled in my kernel for some reason, so that's probably why I don't see the locking imbalance warning; I will rebuild and check. What card model are you using?

Andrey

I'm using two RX 480 cards, but I'm rebinding only one of them.

Created attachment 137652 [details]
DAL warnings during driver unbind
Harry, with DAL enabled I observe warnings and the unbind can't complete.
To reproduce, run as root:
cd /sys/bus/pci/drivers/amdgpu/
echo 'pci_id' > unbind
where pci_id is the PCI address that ls -l shows in this folder as a symlink like this:
0000:2a:00.0 -> ../../../../devices/pci0000:00/0000:00:03.1/0000:2a:00.0/
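For anyone who wants to script the reproduction, here is a minimal Python sketch of the same steps. It also brackets the unbind with marker lines in /dev/kmsg, as suggested earlier in this thread, so the relevant dmesg window is easy to find. The function names (bound_devices, mark, unbind) and the "unbind-test" marker text are made up for illustration; only the sysfs path, the unbind file, and /dev/kmsg come from the comments above. It has to run as root, and it is untested against the failing kernels discussed here.

```python
import os
import re

DRIVER_DIR = "/sys/bus/pci/drivers/amdgpu"  # directory named in the steps above

# Full PCI address such as 0000:2a:00.0 (domain:bus:device.function)
PCI_ADDR = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def bound_devices(driver_dir=DRIVER_DIR):
    """List PCI addresses bound to the driver (the symlinks that ls -l shows)."""
    return sorted(
        name for name in os.listdir(driver_dir)
        if PCI_ADDR.fullmatch(name)
        and os.path.islink(os.path.join(driver_dir, name))
    )


def mark(text, kmsg="/dev/kmsg"):
    """Write one delimiter line into the kernel log."""
    with open(kmsg, "w") as f:
        f.write(text + "\n")


def unbind(pci_id, driver_dir=DRIVER_DIR, kmsg="/dev/kmsg"):
    """Equivalent of: echo 'pci_id' > unbind, with kmsg markers around it."""
    if pci_id not in bound_devices(driver_dir):
        raise ValueError(f"{pci_id} is not bound under {driver_dir}")
    mark(f"unbind-test: begin {pci_id}", kmsg)
    with open(os.path.join(driver_dir, "unbind"), "w") as f:
        f.write(pci_id)
    mark(f"unbind-test: end {pci_id}", kmsg)
```

Calling unbind("0000:2a:00.0") as root should behave like the two shell commands above; everything between the two "unbind-test" lines in dmesg then belongs to that attempt.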
P.S. I am using CZ or Ellesmere with the tip of amd-staging-drm-next. Going to disable DAL for now to debug the KCQ ring failure.

P.S. After enabling lockdep I still don't see any locking-related warnings.

Created attachment 137662 [details] [review]
[PATCH] drm/amd/display: Use atomic crtc_disable for DC on shutdown

Andrey, can you see if this fixes the warning for you?

(In reply to Harry Wentland from comment #11)
> Created attachment 137662 [details] [review]
> [PATCH] drm/amd/display: Use atomic crtc_disable for DC on shutdown
>
> Andrey, can you see if this fixes the warning for you?

I get a use-after-free now:

[ 82.400097 < 0.000387>] BUG: KASAN: use-after-free in amdgpu_dm_set_pflip_irq_state+0x3d/0xa0 [amdgpu]
[ 82.400185 < 0.000088>] Read of size 4 at addr ffff88008f53ee94 by task bash/1178
[ 82.400302 < 0.000117>] CPU: 0 PID: 1178 Comm: bash Tainted: G W OE 4.16.0-rc1.main+ #14
[ 82.400308 < 0.000006>] Hardware name: AMD Gardenia/Gardenia, BIOS RGA1101C 07/20/2015
[ 82.400312 < 0.000004>] Call Trace:
[ 82.400329 < 0.000017>] dump_stack+0x5c/0x78
[ 82.400342 < 0.000013>] print_address_description+0xd1/0x270
[ 82.400618 < 0.000276>] ? amdgpu_dm_set_pflip_irq_state+0x3d/0xa0 [amdgpu]
[ 82.400627 < 0.000009>] kasan_report+0x260/0x360
[ 82.400913 < 0.000286>] amdgpu_dm_set_pflip_irq_state+0x3d/0xa0 [amdgpu]
[ 82.401189 < 0.000276>] amdgpu_irq_disable_all+0x111/0x190 [amdgpu]
[ 82.401452 < 0.000263>] amdgpu_device_ip_fini+0x1b7/0x610 [amdgpu]
[ 82.401718 < 0.000266>] amdgpu_device_fini+0xa1/0x320 [amdgpu]
[ 82.401973 < 0.000255>] amdgpu_driver_unload_kms+0x6a/0xd0 [amdgpu]
[ 82.402059 < 0.000086>] drm_dev_unregister+0x79/0x180 [drm]
[ 82.402315 < 0.000256>] amdgpu_pci_remove+0x2a/0x60 [amdgpu]
[ 82.402331 < 0.000016>] pci_device_remove+0x5b/0x100
[ 82.402347 < 0.000016>] device_release_driver_internal+0x1da/0x300
[ 82.402363 < 0.000016>] unbind_store+0x143/0x190
[ 82.402376 < 0.000013>] ? sysfs_file_ops+0xa0/0xa0
[ 82.402386 < 0.000010>] kernfs_fop_write+0x186/0x220
[ 82.402402 < 0.000016>] __vfs_write+0xb9/0x2e0
[ 82.402412 < 0.000010>] ? locks_remove_posix+0x87/0x220
[ 82.402421 < 0.000009>] ? kernel_read+0xa0/0xa0
[ 82.402430 < 0.000009>] ? find_held_lock+0xfb/0x130
[ 82.402441 < 0.000011>] ? __lock_acquire.isra.30+0x414/0xb00
[ 82.402465 < 0.000024>] ? vfs_write+0x227/0x250
[ 82.402485 < 0.000020>] ? __sb_start_write+0xc3/0x1a0
[ 82.402492 < 0.000007>] ? vfs_write+0x227/0x250
[ 82.402506 < 0.000014>] vfs_write+0xe6/0x250
[ 82.402522 < 0.000016>] SyS_write+0xa1/0x120
[ 82.402532 < 0.000010>] ? SyS_read+0x120/0x120
[ 82.402547 < 0.000015>] ? vtime_user_exit+0xc8/0xe0
[ 82.402558 < 0.000011>] ? SyS_read+0x120/0x120
[ 82.402570 < 0.000012>] do_syscall_64+0xf0/0x270
[ 82.402588 < 0.000018>] entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 82.402597 < 0.000009>] RIP: 0033:0x7fd050bc32c0
[ 82.402603 < 0.000006>] RSP: 002b:00007ffc06b7f3b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 82.402615 < 0.000012>] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fd050bc32c0
[ 82.402620 < 0.000005>] RDX: 000000000000000d RSI: 0000000001a6e408 RDI: 0000000000000001
[ 82.402626 < 0.000006>] RBP: 0000000001a6e408 R08: 00007fd050e92780 R09: 00007fd0514d9700
[ 82.402632 < 0.000006>] R10: 000000000000000c R11: 0000000000000246 R12: 000000000000000d
[ 82.402637 < 0.000005>] R13: 0000000000000001 R14: 00007fd050e91620 R15: 0000000000000000
[ 82.402711 < 0.000074>] Allocated by task 1084:
[ 82.402771 < 0.000060>] kasan_kmalloc+0xa6/0xd0
[ 82.402780 < 0.000009>] kmem_cache_alloc_trace+0x13a/0x270
[ 82.403079 < 0.000299>] dm_hw_init+0x898/0x1660 [amdgpu]
[ 82.403338 < 0.000259>] amdgpu_device_init+0x1a97/0x2100 [amdgpu]
[ 82.403596 < 0.000258>] amdgpu_driver_load_kms+0xa8/0x3a0 [amdgpu]
[ 82.403673 < 0.000077>] drm_dev_register+0x1d5/0x2f0 [drm]
[ 82.403931 < 0.000258>] amdgpu_pci_probe+0x1bf/0x290 [amdgpu]
[ 82.403941 < 0.000010>] local_pci_probe+0x74/0xe0
[ 82.403951 < 0.000010>] pci_device_probe+0x1dc/0x2d0
[ 82.403970 < 0.000019>] driver_probe_device+0x40e/0x6b0
[ 82.403977 < 0.000007>] __driver_attach+0x11d/0x130
[ 82.403984 < 0.000007>] bus_for_each_dev+0xd8/0x140
[ 82.403990 < 0.000006>] bus_add_driver+0x31d/0x3a0
[ 82.403998 < 0.000008>] driver_register+0xc6/0x170
[ 82.404006 < 0.000008>] do_one_initcall+0x82/0x1d0
[ 82.404012 < 0.000006>] do_init_module+0xe7/0x333
[ 82.404020 < 0.000008>] load_module+0x41b3/0x4c40
[ 82.404028 < 0.000008>] SYSC_finit_module+0x14d/0x180
[ 82.404036 < 0.000008>] do_syscall_64+0xf0/0x270
[ 82.404044 < 0.000008>] entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 82.404095 < 0.000051>] Freed by task 1178:
[ 82.404152 < 0.000057>] __kasan_slab_free+0x124/0x170
[ 82.404159 < 0.000007>] kfree+0xd4/0x200
[ 82.404239 < 0.000080>] drm_mode_config_cleanup+0x241/0x450 [drm]
[ 82.404536 < 0.000297>] amdgpu_dm_fini+0x29/0xb0 [amdgpu]
[ 82.404834 < 0.000298>] dm_hw_fini+0x1e/0x30 [amdgpu]
[ 82.405091 < 0.000257>] amdgpu_device_ip_fini+0x157/0x610 [amdgpu]
[ 82.405349 < 0.000258>] amdgpu_device_fini+0xa1/0x320 [amdgpu]
[ 82.405607 < 0.000258>] amdgpu_driver_unload_kms+0x6a/0xd0 [amdgpu]
[ 82.405684 < 0.000077>] drm_dev_unregister+0x79/0x180 [drm]
[ 82.405941 < 0.000257>] amdgpu_pci_remove+0x2a/0x60 [amdgpu]
[ 82.405949 < 0.000008>] pci_device_remove+0x5b/0x100
[ 82.405957 < 0.000008>] device_release_driver_internal+0x1da/0x300
[ 82.405963 < 0.000006>] unbind_store+0x143/0x190
[ 82.405971 < 0.000008>] kernfs_fop_write+0x186/0x220
[ 82.405978 < 0.000007>] __vfs_write+0xb9/0x2e0
[ 82.405985 < 0.000007>] vfs_write+0xe6/0x250
[ 82.405991 < 0.000006>] SyS_write+0xa1/0x120
[ 82.405998 < 0.000007>] do_syscall_64+0xf0/0x270
[ 82.406007 < 0.000009>] entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 82.406057 < 0.000050>] The buggy address belongs to the object at ffff88008f53e600 which belongs to the cache kmalloc-4096 of size 4096
[ 82.406163 < 0.000106>] The buggy address is located 2196 bytes inside of 4096-byte region [ffff88008f53e600, ffff88008f53f600)
[ 82.406262 < 0.000099>] The buggy address belongs to the page:
[ 82.406326 < 0.000064>] page:ffffea00023d4e00 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0
[ 82.406424 < 0.000098>] flags: 0x1ffff0000008100(slab|head)
[ 82.406488 < 0.000064>] raw: 01ffff0000008100 0000000000000000 0000000000000000 0000000100070007
[ 82.406571 < 0.000083>] raw: dead000000000100 dead000000000200 ffff880102802600 0000000000000000
[ 82.406649 < 0.000078>] page dumped because: kasan: bad access detected
[ 82.406754 < 0.000105>] Memory state around the buggy address:
[ 82.406816 < 0.000062>] ffff88008f53ed80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 82.406893 < 0.000077>] ffff88008f53ee00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 82.406968 < 0.000075>] >ffff88008f53ee80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 82.407036 < 0.000068>] ^
[ 82.407087 < 0.000051>] ffff88008f53ef00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 82.407157 < 0.000070>] ffff88008f53ef80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 82.407226 < 0.000069>] ==================================================================

This should be fixed in drm-next-4.17-wip and amd-staging-drm-next. Can someone test if this resolves this ticket satisfactorily?

Both branches can be found on https://cgit.freedesktop.org/~agd5f/linux

(In reply to Harry Wentland from comment #13)
> This should be fixed in drm-next-4.17-wip and amd-staging-drm-next. Can
> someone test if this resolves this ticket satisfactorily?
>
> Both branches can be found on https://cgit.freedesktop.org/~agd5f/linux

Yes, the DAL issues are gone, and so is the KIQ ring error on unbind. There is still a KIQ ring error on rebind, which I am investigating now, plus an SDMA ring IB test failure, which I will get to after the KIQ issue is resolved.

Andrey

(In reply to mlen from comment #3)
> I tested amd-staging-drm-next with HEAD at
> f1367d12f5fabb04789c7772594887434c8d9e8b.
> This time the unbind succeeded,
> but there are still some errors logged and kernel reports locking problem in
> amdgpu:
>
> [ 77.098923] [drm] amdgpu: finishing device.
> [ 77.458614] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
> (scratch(0xC040)=0x00000000)
> [ 77.481247] IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
> [ 77.653815] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
> (scratch(0xC040)=0x00000000)
> [ 77.845085] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
> (scratch(0xC040)=0x00000000)
> [ 77.855055] IPv6: ADDRCONF(NETDEV_CHANGE): virbr10: link becomes ready
> [ 78.036695] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
> (scratch(0xC040)=0x00000000)
> [ 78.233244] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
> (scratch(0xC040)=0x00000000)
> [ 78.425058] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
> (scratch(0xC040)=0x00000000)
> [ 78.616635] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
> (scratch(0xC040)=0x00000000)
> [ 78.808323] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
> (scratch(0xC040)=0x00000000)

Can you retest with the latest https://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-4.17-wip to see if the KCQ-related errors are gone?

I'll test it on Saturday; I don't have access to that machine at the moment.

No issues on drm-next-4.17-wip; unbinding works without any lockdep issues. Do you plan to backport these changes to 4.16 or maybe even 4.15?

For the record, suspend fails on drm-next-4.17-wip. This could possibly be related, but I don't know how to debug this, and most likely it is out of scope for this bug.

[ 96.222095] PM: suspend entry (deep)
[ 96.222099] PM: Syncing filesystems ... done.
[ 96.230020] INFO: trying to register non-static key.
[ 96.230024] the code is fine but needs lockdep annotation.
[ 96.230026] turning off the locking correctness validator.
[ 96.230029] CPU: 39 PID: 4506 Comm: pm-suspend Not tainted 4.16.0-rc7+ #2
[ 96.230031] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16 WS, BIOS 3407 03/10/2017
[ 96.230033] Call Trace:
[ 96.230040] dump_stack+0x46/0x59
[ 96.230046] register_lock_class+0x192/0x361
[ 96.230050] ? cycles_2_ns+0x55/0x75
[ 96.230054] __lock_acquire.isra.30+0x97/0x595
[ 96.230057] lock_acquire+0x105/0x12e
[ 96.230060] ? devres_for_each_res+0x41/0xc4
[ 96.230064] ? __fw_entry_found+0x3b/0x3b
[ 96.230068] _raw_spin_lock_irqsave+0x3d/0x74
[ 96.230071] ? devres_for_each_res+0x41/0xc4
[ 96.230073] ? kref_get+0xa/0xa
[ 96.230075] ? alloc_fw_cache_entry+0x4e/0x4e
[ 96.230077] devres_for_each_res+0x41/0xc4
[ 96.230096] dev_cache_fw_image+0x59/0x11d
[ 96.230098] ? fw_pm_notify+0xd1/0xd1
[ 96.230102] dpm_for_each_dev+0x41/0x58
[ 96.230104] fw_pm_notify+0xac/0xd1
[ 96.230108] notifier_call_chain+0x39/0x5a
[ 96.230127] __blocking_notifier_call_chain+0x4e/0x65
[ 96.230130] __pm_notifier_call_chain+0x1b/0x2f
[ 96.230133] pm_suspend+0x15b/0x2c1
[ 96.230135] state_store+0x4b/0x7e
[ 96.230140] kernfs_fop_write+0x114/0x15c
[ 96.230145] __vfs_write+0x33/0xd7
[ 96.230148] ? __sb_start_write+0x94/0x180
[ 96.230150] ? __sb_start_write+0xc0/0x180
[ 96.230153] vfs_write+0xa5/0xe2
[ 96.230156] SyS_write+0x5f/0xa3
[ 96.230160] do_syscall_64+0x79/0x88
[ 96.230164] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 96.230166] RIP: 0033:0x7fac7a914468
[ 96.230168] RSP: 002b:00007ffea25554e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 96.230171] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fac7a914468
[ 96.230172] RDX: 0000000000000004 RSI: 0000558679eb4500 RDI: 0000000000000001
[ 96.230174] RBP: 0000558679eb4500 R08: 000000000000000a R09: 0000558679eeb6b0
[ 96.230175] R10: 00007fac7a9a5b20 R11: 0000000000000246 R12: 0000000000000004
[ 96.230177] R13: 0000000000000001 R14: 00007fac7abe1740 R15: 0000000000000004
[ 96.230183] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 96.230189] IP: devres_for_each_res+0x59/0xc4
[ 96.230191] PGD 0 P4D 0
[ 96.230196] Oops: 0000 [#1] PREEMPT SMP PTI
[ 96.230199] Modules linked in:
[ 96.230203] CPU: 39 PID: 4506 Comm: pm-suspend Not tainted 4.16.0-rc7+ #2
[ 96.230205] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16 WS, BIOS 3407 03/10/2017
[ 96.230208] RIP: 0010:devres_for_each_res+0x59/0xc4
[ 96.230210] RSP: 0018:ffffa63689fabc70 EFLAGS: 00010086
[ 96.230214] RAX: 0000000000000000 RBX: ffff9fd574d4ad98 RCX: ffff9fd574d4b1f0
[ 96.230216] RDX: ffff9fd576ea0918 RSI: ffff9fd574d4b1c0 RDI: ffff9fd576ea0918
[ 96.230218] RBP: ffffffffa8951b3a R08: 0000000000000000 R09: 0000000000000000
[ 96.230221] R10: ffff9fd576ea0000 R11: ffffffffab22aa07 R12: ffffffffa8951f44
[ 96.230223] R13: dead000000000100 R14: ffffffffa8951c08 R15: ffff9fd574d4b1a8
[ 96.230226] FS: 00007fac7b021740(0000) GS:ffff9fe57f400000(0000) knlGS:0000000000000000
[ 96.230229] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 96.230231] CR2: 0000000000000008 CR3: 000000202a4a8002 CR4: 00000000003606e0
[ 96.230233] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 96.230236] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 96.230238] Call Trace:
[ 96.230243] dev_cache_fw_image+0x59/0x11d
[ 96.230247] ? fw_pm_notify+0xd1/0xd1
[ 96.230250] dpm_for_each_dev+0x41/0x58
[ 96.230253] fw_pm_notify+0xac/0xd1
[ 96.230256] notifier_call_chain+0x39/0x5a
[ 96.230261] __blocking_notifier_call_chain+0x4e/0x65
[ 96.230264] __pm_notifier_call_chain+0x1b/0x2f
[ 96.230267] pm_suspend+0x15b/0x2c1
[ 96.230271] state_store+0x4b/0x7e
[ 96.230275] kernfs_fop_write+0x114/0x15c
[ 96.230279] __vfs_write+0x33/0xd7
[ 96.230284] ? __sb_start_write+0x94/0x180
[ 96.230286] ? __sb_start_write+0xc0/0x180
[ 96.230290] vfs_write+0xa5/0xe2
[ 96.230295] SyS_write+0x5f/0xa3
[ 96.230299] do_syscall_64+0x79/0x88
[ 96.230303] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 96.230305] RIP: 0033:0x7fac7a914468
[ 96.230308] RSP: 002b:00007ffea25554e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 96.230311] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fac7a914468
[ 96.230313] RDX: 0000000000000004 RSI: 0000558679eb4500 RDI: 0000000000000001
[ 96.230316] RBP: 0000558679eb4500 R08: 000000000000000a R09: 0000558679eeb6b0
[ 96.230318] R10: 00007fac7a9a5b20 R11: 0000000000000246 R12: 0000000000000004
[ 96.230320] R13: 0000000000000001 R14: 00007fac7abe1740 R15: 0000000000000004
[ 96.230326] Code: 48 83 ec 28 48 89 4c 24 18 4c 89 4c 24 20 e8 ca 7d 4b 00 48 8d 8b 58 04 00 00 48 89 44 24 08 48 8b 83 60 04 00 00 48 89 4c 24 10 <4c> 8b 68 08 48 3b 44 24 10 74 44 4c 3b 70 10 75 35 48 83 c0 28
[ 96.230390] RIP: devres_for_each_res+0x59/0xc4 RSP: ffffa63689fabc70
[ 96.230392] CR2: 0000000000000008
[ 96.230395] ---[ end trace 43f33fa700a0efa9 ]---
[ 96.233392] note: pm-suspend[4506] exited with preempt_count 1

Thanks for your bug report, and for the constant testing and feedback.

Since the original issue is fixed in drm-next-4.17-wip, I'll mark this resolved. If I'm jumping the gun here and you still notice problems, feel free to reopen.

The suspend/resume issue you mention seems to be something else and might be best tracked with a new bug report, possibly on kernel.org, as I don't see an indication that it is graphics-driver related.

As for back-porting the changes, AFAIK this bug report uncovered quite a few issues, which were all fixed with separate commits. I don't think all of them have been pulled into 4.16.
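
When retesting a branch, it can help to check a saved dmesg capture for the failure signatures that came up in this thread instead of reading it by eye. The following is a rough Python sketch under that assumption: the signature strings are copied from the logs above, while the SIGNATURES table, its key names, and the summarize function are hypothetical helpers, not part of any existing tool.

```python
from collections import Counter

# Substrings taken verbatim from the logs quoted in this report.
SIGNATURES = {
    "kcq_disable_failed": "*ERROR* KCQ disabled failed",
    "lockdep_unlock_balance": "WARNING: bad unlock balance detected!",
    "kasan_use_after_free": "BUG: KASAN: use-after-free",
    "null_pointer_oops": "BUG: unable to handle kernel NULL pointer dereference",
}


def summarize(log_text):
    """Count how many log lines contain each known failure signature."""
    counts = Counter()
    for line in log_text.splitlines():
        for name, needle in SIGNATURES.items():
            if needle in line:
                counts[name] += 1
    return dict(counts)
```

Feeding it the output of dmesg (for example the text captured between two /dev/kmsg delimiters) gives a per-signature count; an empty dict means none of these particular messages appeared, which is not the same as the unbind being clean.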