Summary: | regression: deadlock-freeze due to kernel commit aa56a292ce623734ddd30f52d73f527d1f3529b5 | ||
---|---|---|---|
Product: | DRI | Reporter: | Dq8CokMHloQZw <howaboutsynergy> |
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Status: | RESOLVED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | critical | ||
Priority: | highest | CC: | albeec13, chris, howaboutsynergy, intel-gfx-bugs, leho, massimiliano.torromeo |
Version: | unspecified | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | Triaged, ReadyForDev | ||
i915 platform: | CFL | i915 features: | GEM/Other |
Attachments: |
Description
Dq8CokMHloQZw
2019-09-09 09:49:13 UTC
Created attachment 145306 [details]
cloned here what I've referenced: memfreeze, le9i.patch, le9h.patch
just in case gist.github goes away for whatever reason.
(In reply to howaboutsynergy from comment #0) > In kernel 5.3.0-rc8 with commit aa56a292ce623734ddd30f52d73f527d1f3529b5 > present(bisected) That would be commit aa56a292ce623734ddd30f52d73f527d1f3529b5 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Jul 8 15:03:27 2019 +0100 drm/i915/userptr: Acquire the page lock around set_page_dirty() Cc: Chris. (In reply to howaboutsynergy from comment #0) > In kernel 5.3.0-rc8 with commit aa56a292ce623734ddd30f52d73f527d1f3529b5 > present(bisected), only on Intel/i915 computer (not on AMD/Radeon laptop) I > can easily reproduce a deadlock-like freeze of the entire system which > apparently lasts indefinitely. Sysrq still works but the CPU/fans go low > speed and there's no disk activity. > > Apparently this freeze happens during low memory / memory pressure > situations. > I've encountered the freeze during chromium compilation (possibly due to > jumbo build being in use which means it uses much RAM, I've 32G RAM total) > What's the platform under use? (In reply to Lakshmi from comment #3) > What's the platform under use? Intel Core i7-8700K CPU, 12 cores ASUS Prime Z370-A motherboard [ 0.373549] smpboot: CPU0: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz (family: 0x6, model: 0x9e, stepping: 0xa) microcode : 0xb4 [ 0.000000] DMI: System manufacturer System Product Name/PRIME Z370-A, BIOS 2201 05/27/2019 32G RAM, no swap(during memfreeze test, but swap in zram during chromium compilation) 64G ext4 zram of /tmp and /var/tmp /dev/zram1 63G 53M 60G 1% /tmp /dev/zram2 63G 53M 60G 1% /var/tmp $ swapon NAME TYPE SIZE USED PRIO /dev/zram0 partition 64G 0B -2 $ cat /proc/cmdline BOOT_IMAGE=/boot/vmlinuz-linux-git root=UUID=2b8b9ab8-7ac5-4586-aa42-d7ffb12de92a rw root_trim=yes rd.luks.allow-discards rd.luks.options=discard ipv6.disable=1 ipv6.disable_ipv6=1 ipv6.autoconf=0 loglevel=15 log_buf_len=16M ignore_loglevel printk.always_kmsg_dump=y printk.time=y printk.devkmsg=on mminit_loglevel=4 memory_corruption_check=1 fbcon=scrollback:4096k fbcon=font:ProFont6x11 net.ifnames=0 nolvm dobtrfs console=tty1 earlyprintk=vga audit=0 systemd.log_target=kmsg systemd.journald.forward_to_console=1 enforcing=0 udev.children-max=1256 rd.udev.children-max=1256 nohz=on oops=panic crashkernel=256M panic=0 page_poison=1 psi=1 sysrq_always_enabled random.trust_cpu=off logo.nologo lpj=0 mce=bootlog reboot=force,cold noexec=on nohibernate scsi_mod.use_blk_mq=1 consoleblank=120 mitigations=off nospectre_v1 nospectre_v2 spectre_v2=off nospec_store_bypass_disable kvm-intel.vmentry_l1d_flush=never l1tf=off nopti pti=off no_stf_barrier noibrs noibpb ssbd=force-off spectre_v2_user=off noretpoline mds=off rd.log=all noefi cpuidle.governor=menu zram.num_devices=3 zswap.enabled=0 zswap.same_filled_pages_enabled=1 zswap.compressor=zstd zswap.max_pool_percent=40 zswap.zpool=z3fold vsyscall=none i915.enable_fbc=1 i915.verbose_state_checks=1 i915.enable_hangcheck=1 i915.error_capture=1 I've just discovered why reading the kdump with `crash` didn't work, because crashkernel=256M was too low and it would OOM during `makedumpfile` so the dump would be incomplete/garbage(tested this only on the AMD/Radeon laptop which would do this with crashkernel=128M), so I will thus increase that to 512M on the Intel computer and have something next... ok here's what I got, the last part of the dmesg: ``` [ 299.780876] sysrq: Emergency Sync [ 299.802758] Emergency Sync complete [ 300.777865] sudo[2070]: user : TTY=pts/1 ; PWD=/home/user ; USER=root ; COMMAND=/usr/bin/swapoff -a [ 300.778020] sudo[2070]: pam_unix(sudo:session): session opened for user root by (uid=0) [ 300.778907] sudo[2070]: pam_unix(sudo:session): session closed for user root [ 300.842945] sudo[2230]: user : TTY=pts/1 ; PWD=/home/user ; USER=root ; COMMAND=/usr/bin/sysctl -w vm.oom_dump_tasks=0 [ 300.843213] sudo[2230]: pam_unix(sudo:session): session opened for user root by (uid=0) [ 300.843947] sudo[2230]: pam_unix(sudo:session): session closed for user root [ 302.213705] stress invoked oom-killer: gfp_mask=0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0 [ 302.213708] CPU: 7 PID: 2237 Comm: stress Kdump: loaded Tainted: G U 5.3.0-rc8-gf74c2bb98776 #55 [ 302.213709] Hardware name: System manufacturer System Product Name/PRIME Z370-A, BIOS 2201 05/27/2019 [ 302.213709] Call Trace: [ 302.213713] dump_stack+0x46/0x60 [ 302.213715] dump_header+0x45/0x17a [ 302.213716] oom_kill_process.cold+0xb/0x10 [ 302.213718] out_of_memory+0x1aa/0x450 [ 302.213719] __alloc_pages_slowpath+0x9d9/0xc10 [ 302.213721] __alloc_pages_nodemask+0x268/0x2b0 [ 302.213722] alloc_pages_vma+0xc1/0x160 [ 302.213724] __handle_mm_fault+0xe13/0x12f0 [ 302.213725] handle_mm_fault+0xa9/0x1d0 [ 302.213726] __do_page_fault+0x23a/0x480 [ 302.213728] do_page_fault+0x1a/0x64 [ 302.213729] page_fault+0x39/0x40 [ 302.213731] RIP: 0033:0x5ea9a660ec10 [ 302.213732] Code: c0 0f 84 53 02 00 00 8b 54 24 0c 31 c0 85 d2 0f 94 c0 89 04 24 41 83 fd 02 0f 8f fa 00 00 00 31 c0 4d 85 ff 7e 10 0f 1f 40 00 <c6> 04 03 5a 4c 01 f0 49 39 c7 7f f4 4d 85 e4 0f 84 f4 01 00 00 7e [ 302.213733] RSP: 002b:00007ffc1d98c090 EFLAGS: 00010206 [ 302.213733] RAX: 0000000375063000 RBX: 000078ba8d6cf010 RCX: 000078c1719b76fb [ 302.213734] RDX: 0000000000000001 RSI: 00000006e41c3000 RDI: 000078ba8d6cf000 [ 302.213735] RBP: 00005ea9a660fa54 R08: 000078ba8d6cf010 R09: 0000000000000000 [ 302.213735] R10: 0000000000000022 R11: 00000006e41c2000 R12: ffffffffffffffff [ 302.213736] R13: 0000000000000002 R14: 0000000000001000 R15: 00000006e41c2000 [ 302.213736] Mem-Info: [ 302.213739] active_anon:7334737 inactive_anon:2479 isolated_anon:276 active_file:18211 inactive_file:14386 isolated_file:71 unevictable:94204 dirty:16 writeback:0 unstable:0 slab_reclaimable:5767 slab_unreclaimable:15583 mapped:28927 shmem:97461 pagetables:15786 bounce:0 free:85636 free_pcp:1127 free_cma:0 [ 302.213740] Node 0 active_anon:29338948kB inactive_anon:9916kB active_file:72844kB inactive_file:57544kB unevictable:376816kB isolated(anon):1104kB isolated(file):284kB mapped:115708kB dirty:64kB writeback:0kB shmem:389844kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 28256256kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes [ 302.213741] Node 0 DMA free:15892kB min:104kB low:128kB high:152kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15980kB managed:15892kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB [ 302.213742] lowmem_reserve[]: 0 1578 30909 30909 [ 302.213744] Node 0 DMA32 free:127904kB min:10800kB low:13500kB high:16200kB active_anon:1500668kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:2744176kB managed:1630064kB mlocked:0kB kernel_stack:0kB pagetables:1232kB bounce:0kB free_pcp:248kB local_pcp:248kB free_cma:0kB [ 302.213745] lowmem_reserve[]: 0 0 29331 29331 [ 302.213746] Node 0 Normal free:198748kB min:199088kB low:248860kB high:298632kB active_anon:27837920kB inactive_anon:10104kB active_file:72956kB inactive_file:57616kB unevictable:376416kB writepending:64kB present:30654464kB managed:30039288kB mlocked:32kB kernel_stack:5696kB pagetables:61912kB bounce:0kB free_pcp:4260kB local_pcp:388kB free_cma:0kB [ 302.213747] lowmem_reserve[]: 0 0 0 0 [ 302.213748] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB [ 302.213751] Node 0 DMA32: 6*4kB (UM) 5*8kB (U) 4*16kB (UM) 7*32kB (UM) 5*64kB (UM) 2*128kB (U) 4*256kB (UM) 4*512kB (UM) 3*1024kB (UM) 1*2048kB (M) 29*4096kB (M) = 127904kB [ 302.213754] Node 0 Normal: 2322*4kB (UME) 1861*8kB (UE) 776*16kB (UME) 350*32kB (UME) 272*64kB (UE) 162*128kB (UME) 27*256kB (UE) 93*512kB (UM) 57*1024kB (UME) 0*2048kB 0*4096kB = 198832kB [ 302.213757] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [ 302.213758] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 302.213758] 130209 total pagecache pages [ 302.213759] 0 pages in swap cache [ 302.213759] Swap cache stats: add 0, delete 0, find 0/0 [ 302.213760] Free swap = 0kB [ 302.213760] Total swap = 0kB [ 302.213761] 8353655 pages RAM [ 302.213761] 0 pages HighMem/MovableOnly [ 302.213761] 432344 pages reserved [ 302.213762] 0 pages cma reserved [ 302.213762] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-1.scope,task=stress,pid=2238,uid=1000 [ 302.213767] Out of memory: Killed process 2238 (stress) total-vm:28906972kB, anon-rss:14587072kB, file-rss:276kB, shmem-rss:0kB [ 302.245057] oom_reaper: reaped process 2238 (stress), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [ 310.858232] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.858335] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.858408] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.858477] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.858624] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.858692] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.858759] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.858825] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.858890] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.858955] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.859020] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.859085] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.859150] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.859248] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.859321] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.859394] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.859461] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.859527] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.859593] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.859660] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.859726] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.859792] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.859858] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.859924] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.859990] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.860056] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.860122] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.860195] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.860273] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.860340] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.860406] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.860472] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.860538] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.860604] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.860669] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.860735] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.860802] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.860868] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.860934] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 310.861000] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. [ 316.311671] gpg-agent[1077]: handler 0x7da757fff700 for fd 10 started [ 316.402397] gpg-agent[1077]: handler 0x7da757fff700 for fd 10 terminated [ 326.028889] sysrq: Emergency Sync [ 326.031866] Emergency Sync complete [ 327.359884] sysrq: Emergency Remount R/O [ 327.359938] Emergency Remount complete [ 328.452885] sysrq: Emergency Sync [ 328.454829] Emergency Sync complete [ 329.821886] sysrq: Trigger a crash [ 329.821892] Kernel panic - not syncing: sysrq triggered crash [ 329.821899] CPU: 8 PID: 0 Comm: swapper/8 Kdump: loaded Tainted: G U 5.3.0-rc8-gf74c2bb98776 #55 [ 329.821902] Hardware name: System manufacturer System Product Name/PRIME Z370-A, BIOS 2201 05/27/2019 [ 329.821905] Call Trace: [ 329.821910] <IRQ> [ 329.821919] dump_stack+0x46/0x60 [ 329.821928] panic+0xf6/0x2e1 [ 329.821935] ? printk+0x53/0x6a [ 329.821944] sysrq_handle_crash+0xc/0x10 [ 329.821951] __handle_sysrq.cold+0x43/0xfd [ 329.821957] sysrq_filter+0x2ea/0x3b0 [ 329.821965] input_to_handler+0x4a/0xe0 [ 329.821973] input_pass_values.part.0+0x109/0x130 [ 329.821978] input_handle_event+0x92/0x560 [ 329.821982] input_event+0x4d/0x70 [ 329.821987] hidinput_report_event+0x2e/0x40 [ 329.821991] hid_report_raw_event+0x25a/0x420 [ 329.821996] hid_input_report+0xf8/0x150 [ 329.822002] hid_irq_in+0x169/0x1a0 [ 329.822007] __usb_hcd_giveback_urb+0x6b/0xd0 [ 329.822024] xhci_giveback_urb_in_irq.isra.0+0x62/0x90 [xhci_hcd] [ 329.822041] xhci_td_cleanup+0xf7/0x140 [xhci_hcd] [ 329.822055] xhci_irq+0xe6b/0x1a90 [xhci_hcd] [ 329.822063] __handle_irq_event_percpu+0x29/0xc0 [ 329.822070] handle_irq_event_percpu+0x2c/0x80 [ 329.822076] handle_irq_event+0x23/0x43 [ 329.822082] handle_edge_irq+0x72/0x180 [ 329.822088] handle_irq+0x11/0x20 [ 329.822096] do_IRQ+0x3e/0xc0 [ 329.822101] common_interrupt+0xf/0xf [ 329.822104] </IRQ> [ 329.822109] RIP: 0010:cpuidle_enter_state+0x108/0x280 [ 329.822115] Code: ff 56 48 41 89 c5 e8 e7 66 a1 ff 65 8b 3d e0 ee 95 48 e8 9b 65 a1 ff 49 89 c6 31 ff e8 a1 75 a1 ff 45 84 ff 75 1d fb 45 85 ed <79> 2b 41 c7 44 24 10 00 00 00 00 44 89 e8 5b 5d 41 5c 41 5d 41 5e [ 329.822119] RSP: 0018:ffffbd3e000fbe80 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffde [ 329.822124] RAX: ffff9dd62da20a40 RBX: ffffffffb8062040 RCX: 000000000000001f [ 329.822127] RDX: 0000000000000000 RSI: 0000000022a1ce40 RDI: 0000000000000000 [ 329.822130] RBP: 0000004cc9264b0a R08: 0000004ccaea38d8 R09: 000000007fffffff [ 329.822133] R10: ffff9dd62da1fb84 R11: ffff9dd62da1fb64 R12: ffff9dd62da29700 [ 329.822135] R13: 0000000000000008 R14: 0000004ccaea38d8 R15: 0000000000000000 [ 329.822142] cpuidle_enter+0x24/0x40 [ 329.822147] do_idle+0x1c1/0x230 [ 329.822151] cpu_startup_entry+0x14/0x20 [ 329.822158] start_secondary+0x141/0x190 [ 329.822163] secondary_startup_64+0xa4/0xb0 ``` [ 310.860735] userptr_mn_invalidate_range_start+0x0/0x210 [i915] callback failed with -4 in blockable context. mm/mmu_notifier.c:179: pr_info("%pS callback failed with %d in %sblockable context.\n", context: ```c int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range) { struct mmu_notifier *mn; int ret = 0; int id; id = srcu_read_lock(&srcu); hlist_for_each_entry_rcu(mn, &range->mm->mmu_notifier_mm->list, hlist) { if (mn->ops->invalidate_range_start) { int _ret = mn->ops->invalidate_range_start(mn, range); if (_ret) { pr_info("%pS callback failed with %d in %sblockable context.\n", mn->ops->invalidate_range_start, _ret, !mmu_notifier_range_blockable(range) ? "non-" : ""); ret = _ret; } } } srcu_read_unlock(&srcu, id); return ret; } EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_start); ``` the `-4` is: #define EINTR 4 /* Interrupted system call */ context: ```c static int userptr_mn_invalidate_range_start(struct mmu_notifier *_mn, const struct mmu_notifier_range *range) { struct i915_mmu_notifier *mn = container_of(_mn, struct i915_mmu_notifier, mn); struct interval_tree_node *it; struct mutex *unlock = NULL; unsigned long end; int ret = 0; if (RB_EMPTY_ROOT(&mn->objects.rb_root)) return 0; /* interval ranges are inclusive, but invalidate range is exclusive */ end = range->end - 1; spin_lock(&mn->lock); it = interval_tree_iter_first(&mn->objects, range->start, end); while (it) { struct drm_i915_gem_object *obj; if (!mmu_notifier_range_blockable(range)) { ret = -EAGAIN; break; } /* * The mmu_object is released late when destroying the * GEM object so it is entirely possible to gain a * reference on an object in the process of being freed * since our serialisation is via the spinlock and not * the struct_mutex - and consequently use it after it * is freed and then double free it. To prevent that * use-after-free we only acquire a reference on the * object if it is not in the process of being destroyed. */ obj = container_of(it, struct i915_mmu_object, it)->obj; if (!kref_get_unless_zero(&obj->base.refcount)) { it = interval_tree_iter_next(it, range->start, end); continue; } spin_unlock(&mn->lock); if (!unlock) { unlock = &mn->mm->i915->drm.struct_mutex; switch (mutex_trylock_recursive(unlock)) { default: case MUTEX_TRYLOCK_FAILED: if (mutex_lock_killable_nested(unlock, I915_MM_SHRINKER)) { i915_gem_object_put(obj); return -EINTR; } /* fall through */ case MUTEX_TRYLOCK_SUCCESS: break; case MUTEX_TRYLOCK_RECURSIVE: unlock = ERR_PTR(-EEXIST); break; } } ret = i915_gem_object_unbind(obj); if (ret == 0) ret = __i915_gem_object_put_pages(obj, I915_MM_SHRINKER); i915_gem_object_put(obj); if (ret) goto unlock; spin_lock(&mn->lock); /* * As we do not (yet) protect the mmu from concurrent insertion * over this range, there is no guarantee that this search will * terminate given a pathologic workload. */ it = interval_tree_iter_first(&mn->objects, range->start, end); } spin_unlock(&mn->lock); unlock: if (!IS_ERR_OR_NULL(unlock)) mutex_unlock(unlock); return ret; } ``` drivers/gpu/drm/i915/gem/i915_gem_userptr.c Setting the priority of the bug as highest considering it's regression. Created attachment 145318 [details]
semi-full log of stacktraces at the point of `return -EINTR`
I got stacktraces from this:
```c
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 2caa594322bc..5a9dd14d3bea 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -138,6 +138,7 @@ userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
case MUTEX_TRYLOCK_FAILED:
if (mutex_lock_killable_nested(unlock, I915_MM_SHRINKER)) {
i915_gem_object_put(obj);
+ WARN_ON(1);
return -EINTR;
}
/* fall through */
```
They look like this:
```
[ 154.134495] ------------[ cut here ]------------
[ 154.134498] WARN_ON(1)
[ 154.134642] WARNING: CPU: 7 PID: 1931 at drivers/gpu/drm/i915/gem/i915_gem_userptr.c:141 userptr_mn_invalidate_range_start+0x176/0x220 [i915]
[ 154.134684] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic xt_TCPMSS iptable_mangle iptable_security iptable_nat nf_nat iptable_raw nf_log_ipv4 nf_log_common xt_conntrack xt_LOG xt_connlimit nf_conncount nf_conntrack nf_defrag_ipv4 xt_hashlimit xt_multiport xt_owner xt_addrtype intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp i915 crct10dif_pclmul crc32_pclmul crc32c_intel i2c_algo_bit snd_hda_intel ghash_clmulni_intel drm_kms_helper snd_hda_codec syscopyarea snd_hwdep sysfillrect sysimgblt snd_hda_core fb_sys_fops snd_pcm iTCO_wdt intel_cstate drm snd_timer iTCO_vendor_support intel_uncore snd intel_rapl_perf e1000e drm_panel_orientation_quirks soundcore bfq i2c_i801 pcspkr xhci_pci xhci_hcd
[ 154.134720] CPU: 7 PID: 1931 Comm: stress Kdump: loaded Tainted: G U W 5.3.0-rc8-gf74c2bb98776 #56
[ 154.134723] Hardware name: System manufacturer System Product Name/PRIME Z370-A, BIOS 2201 05/27/2019
[ 154.134804] RIP: 0010:userptr_mn_invalidate_range_start+0x176/0x220 [i915]
[ 154.134809] Code: ff ff ff 48 89 ef e8 a9 37 cc cb 84 c0 74 08 48 89 ef e8 1d db f1 ff 48 c7 c6 81 00 89 c0 48 c7 c7 7e 00 89 c0 e8 63 ae 8e cb <0f> 0b 41 bf fc ff ff ff e9 97 fe ff ff be 01 00 00 00 48 89 ef e8
[ 154.134816] RSP: 0018:ffffaf0607e03768 EFLAGS: 00010286
[ 154.134820] RAX: 0000000000000000 RBX: ffffaf0607e03820 RCX: 0000000000000000
[ 154.134824] RDX: 000000000000000a RSI: ffffffff8d56c9ca RDI: ffffffff8d56d9ca
[ 154.134828] RBP: ffff94de3c88f900 R08: 00000023e321caf3 R09: 000000000000000a
[ 154.134832] R10: 0000000000000000 R11: 00000000fffffffe R12: ffff94de3c418068
[ 154.134837] R13: ffff94dea9dbfd98 R14: ffff94de47e70b90 R15: 0000000000000000
[ 154.134842] FS: 0000766ecc735740(0000) GS:ffff94dead9c0000(0000) knlGS:0000000000000000
[ 154.134846] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 154.134849] CR2: 00007ded1ef46740 CR3: 00000007bdffe006 CR4: 00000000003606e0
[ 154.134852] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 154.134859] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 154.134865] Call Trace:
[ 154.134877] __mmu_notifier_invalidate_range_start+0x4f/0x90
[ 154.134897] try_to_unmap_one+0x718/0x820
[ 154.134905] rmap_walk_file+0xe4/0x250
[ 154.134933] try_to_unmap+0xc1/0xf0
[ 154.134942] ? page_remove_rmap+0x2a0/0x2a0
[ 154.134952] ? page_not_mapped+0x10/0x10
[ 154.134974] ? page_get_anon_vma+0x70/0x70
[ 154.134986] migrate_pages+0x7aa/0x9a0
[ 154.134999] ? isolate_freepages_block+0x340/0x340
[ 154.135007] ? move_freelist_tail+0xd0/0xd0
[ 154.135019] compact_zone+0x656/0xa70
[ 154.135029] compact_zone_order+0xde/0x120
[ 154.135042] try_to_compact_pages+0x187/0x240
[ 154.135065] __alloc_pages_direct_compact+0x87/0x170
[ 154.135084] __alloc_pages_slowpath+0x1f8/0xc10
[ 154.135096] ? get_page_from_freelist+0xe80/0x1330
[ 154.135117] __alloc_pages_nodemask+0x268/0x2b0
[ 154.135141] alloc_pages_vma+0xc1/0x160
[ 154.135149] do_huge_pmd_anonymous_page+0x271/0x610
[ 154.135166] __handle_mm_fault+0xbfc/0x12f0
[ 154.135173] handle_mm_fault+0xa9/0x1d0
[ 154.135187] __do_page_fault+0x23a/0x480
[ 154.135199] do_page_fault+0x1a/0x64
[ 154.135221] page_fault+0x39/0x40
[ 154.135229] RIP: 0033:0x5bfeb4f67c10
[ 154.135244] Code: c0 0f 84 53 02 00 00 8b 54 24 0c 31 c0 85 d2 0f 94 c0 89 04 24 41 83 fd 02 0f 8f fa 00 00 00 31 c0 4d 85 ff 7e 10 0f 1f 40 00 <c6> 04 03 5a 4c 01 f0 49 39 c7 7f f4 4d 85 e4 0f 84 f4 01 00 00 7e
[ 154.135266] RSP: 002b:00007ffcc3719e40 EFLAGS: 00010206
[ 154.135272] RAX: 0000000374e52000 RBX: 00007667ebdae010 RCX: 0000766ecc85a6fb
[ 154.135281] RDX: 0000000000000001 RSI: 00000006e0987000 RDI: 00007667ebdae000
[ 154.135295] RBP: 00005bfeb4f68a54 R08: 00007667ebdae010 R09: 0000000000000000
[ 154.135305] R10: 0000000000000022 R11: 00000006e0986000 R12: ffffffffffffffff
[ 154.135315] R13: 0000000000000002 R14: 0000000000001000 R15: 00000006e0986000
[ 154.135331] ---[ end trace 28cc7c376d08f23b ]---
[ 154.135452] userptr_mn_invalidate_range_start+0x0/0x220 [i915] callback failed with -4 in blockable context.
```
semi-full log attached
Hope this is helpful.
Created attachment 145319 [details]
got stacktraces also at the point of `callback failed with -4 in blockable context` not just on `return -EINTR`
I just realized I could do better: get stacktraces also at this point
mm/mmu_notifier.c:179: pr_info("%pS callback failed with %d in %sblockable context.\n",
using this code:
```c
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index b5670620aea0..2ec61d700e72 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -179,6 +179,7 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
pr_info("%pS callback failed with %d in %sblockable context.\n",
mn->ops->invalidate_range_start, _ret,
!mmu_notifier_range_blockable(range) ? "non-" : "");
+ WARN_ON(1);
ret = _ret;
}
}
```
The first try yielded no stacktraces, not sure why, but the second try was a success and it looks like this:
```
[ 261.691955] ------------[ cut here ]------------
[ 261.691961] WARN_ON(1)
[ 261.692078] WARNING: CPU: 1 PID: 2240 at drivers/gpu/drm/i915/gem/i915_gem_userptr.c:141 userptr_mn_invalidate_range_start+0x176/0x220 [i915]
[ 261.692082] Modules linked in: xt_comment xt_TCPMSS iptable_mangle iptable_security iptable_nat nf_nat iptable_raw nf_log_ipv4 nf_log_common xt_conntrack xt_LOG xt_connlimit nf_conncount nf_conntrack nf_defrag_ipv4 xt_hashlimit xt_multiport xt_owner xt_addrtype snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp i915 i2c_algo_bit crct10dif_pclmul drm_kms_helper crc32_pclmul snd_hda_intel crc32c_intel snd_hda_codec syscopyarea sysfillrect snd_hwdep sysimgblt snd_hda_core ghash_clmulni_intel fb_sys_fops drm intel_cstate iTCO_wdt snd_pcm e1000e intel_uncore iTCO_vendor_support snd_timer intel_rapl_perf bfq snd soundcore pcspkr i2c_i801 drm_panel_orientation_quirks xhci_pci xhci_hcd
[ 261.692125] CPU: 1 PID: 2240 Comm: stress Kdump: loaded Tainted: G U 5.3.0-rc8-gf74c2bb98776 #57
[ 261.692128] Hardware name: System manufacturer System Product Name/PRIME Z370-A, BIOS 2201 05/27/2019
[ 261.692203] RIP: 0010:userptr_mn_invalidate_range_start+0x176/0x220 [i915]
[ 261.692209] Code: ff ff ff 48 89 ef e8 c9 77 f5 c6 84 c0 74 08 48 89 ef e8 1d 3b e2 ff 48 c7 c6 81 c0 5f c0 48 c7 c7 7e c0 5f c0 e8 63 ee b7 c6 <0f> 0b 41 bf fc ff ff ff e9 97 fe ff ff be 01 00 00 00 48 89 ef e8
[ 261.692213] RSP: 0018:ffffafc9c8f5f768 EFLAGS: 00010286
[ 261.692217] RAX: 0000000000000000 RBX: ffffafc9c8f5f820 RCX: 0000000000000000
[ 261.692220] RDX: 000000000000000a RSI: ffffffff8856c9ca RDI: ffffffff8856d9ca
[ 261.692223] RBP: ffffa42007915500 R08: 0000003cee0e71d7 R09: 000000000000000a
[ 261.692225] R10: 0000000000000000 R11: 00000000fffffffe R12: ffffa41ffd7a8068
[ 261.692228] R13: ffffa42069d80c18 R14: ffffa42007a9b990 R15: 0000000000000000
[ 261.692232] FS: 00007f7258f5a740(0000) GS:ffffa4206d840000(0000) knlGS:0000000000000000
[ 261.692235] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 261.692238] CR2: 0000737968747578 CR3: 00000007c7d1e006 CR4: 00000000003606e0
[ 261.692241] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 261.692244] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 261.692246] Call Trace:
[ 261.692261] __mmu_notifier_invalidate_range_start+0x4f/0x90
[ 261.692268] try_to_unmap_one+0x718/0x820
[ 261.692273] rmap_walk_file+0xe4/0x250
[ 261.692278] try_to_unmap+0xc1/0xf0
[ 261.692283] ? page_remove_rmap+0x2a0/0x2a0
[ 261.692287] ? page_not_mapped+0x10/0x10
[ 261.692291] ? page_get_anon_vma+0x70/0x70
[ 261.692296] migrate_pages+0x7aa/0x9a0
[ 261.692304] ? isolate_freepages_block+0x340/0x340
[ 261.692310] ? move_freelist_tail+0xd0/0xd0
[ 261.692314] compact_zone+0x656/0xa70
[ 261.692319] compact_zone_order+0xde/0x120
[ 261.692324] try_to_compact_pages+0x187/0x240
[ 261.692332] __alloc_pages_direct_compact+0x87/0x170
[ 261.692338] __alloc_pages_slowpath+0x1f8/0xc10
[ 261.692345] ? get_page_from_freelist+0xe80/0x1330
[ 261.692352] __alloc_pages_nodemask+0x268/0x2b0
[ 261.692357] alloc_pages_vma+0xc1/0x160
[ 261.692363] do_huge_pmd_anonymous_page+0x271/0x610
[ 261.692369] __handle_mm_fault+0xbfc/0x12f0
[ 261.692375] handle_mm_fault+0xa9/0x1d0
[ 261.692382] __do_page_fault+0x23a/0x480
[ 261.692388] do_page_fault+0x1a/0x64
[ 261.692395] page_fault+0x39/0x40
[ 261.692400] RIP: 0033:0x5afb2a4d7c10
[ 261.692406] Code: c0 0f 84 53 02 00 00 8b 54 24 0c 31 c0 85 d2 0f 94 c0 89 04 24 41 83 fd 02 0f 8f fa 00 00 00 31 c0 4d 85 ff 7e 10 0f 1f 40 00 <c6> 04 03 5a 4c 01 f0 49 39 c7 7f f4 4d 85 e4 0f 84 f4 01 00 00 7e
[ 261.692409] RSP: 002b:00007ffe9f9a6050 EFLAGS: 00010206
[ 261.692413] RAX: 000000036dd5a000 RBX: 00007f6b76aa6010 RCX: 00007f725907f6fb
[ 261.692415] RDX: 0000000000000001 RSI: 00000006e24b4000 RDI: 00007f6b76aa6000
[ 261.692418] RBP: 00005afb2a4d8a54 R08: 00007f6b76aa6010 R09: 0000000000000000
[ 261.692420] R10: 0000000000000022 R11: 00000006e24b3000 R12: ffffffffffffffff
[ 261.692423] R13: 0000000000000002 R14: 0000000000001000 R15: 00000006e24b3000
[ 261.692427] ---[ end trace 4cdaeb6ba05f3a75 ]---
[ 261.692502] userptr_mn_invalidate_range_start+0x0/0x220 [i915] callback failed with -4 in blockable context.
[ 261.692505] ------------[ cut here ]------------
[ 261.692518] WARNING: CPU: 1 PID: 2240 at mm/mmu_notifier.c:182 __mmu_notifier_invalidate_range_start.cold+0x33/0x43
[ 261.692520] Modules linked in: xt_comment xt_TCPMSS iptable_mangle iptable_security iptable_nat nf_nat iptable_raw nf_log_ipv4 nf_log_common xt_conntrack xt_LOG xt_connlimit nf_conncount nf_conntrack nf_defrag_ipv4 xt_hashlimit xt_multiport xt_owner xt_addrtype snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp i915 i2c_algo_bit crct10dif_pclmul drm_kms_helper crc32_pclmul snd_hda_intel crc32c_intel snd_hda_codec syscopyarea sysfillrect snd_hwdep sysimgblt snd_hda_core ghash_clmulni_intel fb_sys_fops drm intel_cstate iTCO_wdt snd_pcm e1000e intel_uncore iTCO_vendor_support snd_timer intel_rapl_perf bfq snd soundcore pcspkr i2c_i801 drm_panel_orientation_quirks xhci_pci xhci_hcd
[ 261.692554] CPU: 1 PID: 2240 Comm: stress Kdump: loaded Tainted: G U W 5.3.0-rc8-gf74c2bb98776 #57
[ 261.692557] Hardware name: System manufacturer System Product Name/PRIME Z370-A, BIOS 2201 05/27/2019
[ 261.692563] RIP: 0010:__mmu_notifier_invalidate_range_start.cold+0x33/0x43
[ 261.692567] Code: ec 9f ea 87 49 0f 44 cf 48 8b 43 10 48 8b 70 28 89 ea 48 c7 c7 78 9f e8 87 e8 55 f2 ef ff 48 c7 c7 c8 44 e6 87 e8 49 f2 ef ff <0f> 0b 41 89 ee e9 6f fc ff ff 90 90 90 90 90 90 53 48 83 ec 10 48
[ 261.692570] RSP: 0018:ffffafc9c8f5f7c0 EFLAGS: 00010246
[ 261.692574] RAX: 0000000000000024 RBX: ffffa42069d80c18 RCX: 0000000000000006
[ 261.692576] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffffa4206d8564c0
[ 261.692579] RBP: 00000000fffffffc R08: ffffafc9c8f5f675 R09: 000000000000167c
[ 261.692582] R10: ffffa4208e079cb4 R11: ffffafc9c8f5f675 R12: ffffafc9c8f5f820
[ 261.692584] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff87e89f71
[ 261.692588] FS: 00007f7258f5a740(0000) GS:ffffa4206d840000(0000) knlGS:0000000000000000
[ 261.692591] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 261.692593] CR2: 0000737968747578 CR3: 00000007c7d1e006 CR4: 00000000003606e0
[ 261.692596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 261.692599] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 261.692601] Call Trace:
[ 261.692607] try_to_unmap_one+0x718/0x820
[ 261.692612] rmap_walk_file+0xe4/0x250
[ 261.692617] try_to_unmap+0xc1/0xf0
[ 261.692621] ? page_remove_rmap+0x2a0/0x2a0
[ 261.692625] ? page_not_mapped+0x10/0x10
[ 261.692629] ? page_get_anon_vma+0x70/0x70
[ 261.692634] migrate_pages+0x7aa/0x9a0
[ 261.692640] ? isolate_freepages_block+0x340/0x340
[ 261.692646] ? move_freelist_tail+0xd0/0xd0
[ 261.692650] compact_zone+0x656/0xa70
[ 261.692655] compact_zone_order+0xde/0x120
[ 261.692660] try_to_compact_pages+0x187/0x240
[ 261.692667] __alloc_pages_direct_compact+0x87/0x170
[ 261.692673] __alloc_pages_slowpath+0x1f8/0xc10
[ 261.692680] ? get_page_from_freelist+0xe80/0x1330
[ 261.692687] __alloc_pages_nodemask+0x268/0x2b0
[ 261.692692] alloc_pages_vma+0xc1/0x160
[ 261.692698] do_huge_pmd_anonymous_page+0x271/0x610
[ 261.692703] __handle_mm_fault+0xbfc/0x12f0
[ 261.692709] handle_mm_fault+0xa9/0x1d0
[ 261.692715] __do_page_fault+0x23a/0x480
[ 261.692721] do_page_fault+0x1a/0x64
[ 261.692727] page_fault+0x39/0x40
[ 261.692730] RIP: 0033:0x5afb2a4d7c10
[ 261.692734] Code: c0 0f 84 53 02 00 00 8b 54 24 0c 31 c0 85 d2 0f 94 c0 89 04 24 41 83 fd 02 0f 8f fa 00 00 00 31 c0 4d 85 ff 7e 10 0f 1f 40 00 <c6> 04 03 5a 4c 01 f0 49 39 c7 7f f4 4d 85 e4 0f 84 f4 01 00 00 7e
[ 261.692737] RSP: 002b:00007ffe9f9a6050 EFLAGS: 00010206
[ 261.692740] RAX: 000000036dd5a000 RBX: 00007f6b76aa6010 RCX: 00007f725907f6fb
[ 261.692743] RDX: 0000000000000001 RSI: 00000006e24b4000 RDI: 00007f6b76aa6000
[ 261.692745] RBP: 00005afb2a4d8a54 R08: 00007f6b76aa6010 R09: 0000000000000000
[ 261.692748] R10: 0000000000000022 R11: 00000006e24b3000 R12: ffffffffffffffff
[ 261.692750] R13: 0000000000000002 R14: 0000000000001000 R15: 00000006e24b3000
[ 261.692754] ---[ end trace 4cdaeb6ba05f3a76 ]---
```
semi-full log (of all stacktraces encountered due to the WARN_ON(1) that I added) is attached.
Thanks for filing this @howaboutsynergy. I wonder if this is a duplicate of my i915 freeze struggles at https://bugs.freedesktop.org/show_bug.cgi?id=111500 or vice versa? (In reply to Leho Kraav (:macmaN :lkraav) from comment #10) Ha! That's very interesting! I can actually ssh to it while frozen!! (just like you) I assumed that since during `stress` the CPU/fans are high speed, and when freeze happens they go low/silent/off and not even the mouse moves in X, that it's completely frozen, even though sysrq stuff would work. So I can then see hung tasks via ssh, dmesg like: ``` [ 2062.553078] glibc64:getnameinfo.c:559/getnameinfo[6929]: sshd[6929](full:'sshd: user@pts/7') for user user(1000(eff:user(1000))) 2of2 successfully reverse-resolved requested IP address: 192.168.0.78 192.168.0.78 [ 2098.511916] INFO: task kworker/10:0H:63 blocked for more than 36 seconds. [ 2098.511923] Tainted: G U 5.3.0-rc8-gf74c2bb98776 #62 [ 2098.511926] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 2098.511929] kworker/10:0H D 0 63 2 0x80004000 [ 2098.511944] Workqueue: events_highpri intel_atomic_cleanup_work [ 2098.511948] Call Trace: [ 2098.511960] ? __schedule+0x2c6/0x580 [ 2098.511966] schedule+0x34/0xa0 [ 2098.511972] schedule_preempt_disabled+0x5/0x10 [ 2098.511978] __mutex_lock.isra.0+0x25f/0x4b0 [ 2098.511986] ? pick_next_task_fair+0x1eb/0x700 [ 2098.511993] intel_cleanup_plane_fb+0x28/0x70 [ 2098.512000] drm_atomic_helper_cleanup_planes+0x4a/0x60 [ 2098.512005] intel_atomic_cleanup_work+0x17/0x80 [ 2098.512011] process_one_work+0x16b/0x2a0 [ 2098.512016] worker_thread+0x48/0x390 [ 2098.512023] kthread+0xee/0x130 [ 2098.512028] ? process_one_work+0x2a0/0x2a0 [ 2098.512034] ? kthread_park+0x70/0x70 [ 2098.512038] ret_from_fork+0x1f/0x30 [ 2098.512076] INFO: task kworker/u24:1:5567 blocked for more than 36 seconds. [ 2098.512079] Tainted: G U 5.3.0-rc8-gf74c2bb98776 #62 [ 2098.512081] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 2098.512083] kworker/u24:1 D 0 5567 2 0x80004000 [ 2098.512093] Workqueue: i915 __i915_gem_free_work [ 2098.512095] Call Trace: [ 2098.512102] ? __schedule+0x2c6/0x580 [ 2098.512107] schedule+0x34/0xa0 [ 2098.512113] schedule_preempt_disabled+0x5/0x10 [ 2098.512119] __mutex_lock.isra.0+0x25f/0x4b0 [ 2098.512125] ? __pm_runtime_resume+0x53/0x70 [ 2098.512130] __i915_gem_free_objects+0x71/0x1d0 [ 2098.512135] __i915_gem_free_work+0x5f/0x90 [ 2098.512140] process_one_work+0x16b/0x2a0 [ 2098.512144] worker_thread+0x48/0x390 [ 2098.512151] kthread+0xee/0x130 [ 2098.512155] ? process_one_work+0x2a0/0x2a0 [ 2098.512160] ? kthread_park+0x70/0x70 [ 2098.512164] ret_from_fork+0x1f/0x30 [ 2098.512171] INFO: task stress:6620 blocked for more than 36 seconds. [ 2098.512174] Tainted: G U 5.3.0-rc8-gf74c2bb98776 #62 [ 2098.512176] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 2098.512178] stress D 0 6620 6618 0x00000004 [ 2098.512183] Call Trace: [ 2098.512189] ? __schedule+0x2c6/0x580 [ 2098.512194] schedule+0x34/0xa0 [ 2098.512199] io_schedule+0xd/0x40 [ 2098.512206] __lock_page+0x13a/0x230 [ 2098.512212] ? file_fdatawait_range+0x20/0x20 [ 2098.512217] set_page_dirty_lock+0x44/0x50 [ 2098.512224] i915_gem_userptr_put_pages+0x135/0x180 [ 2098.512229] __i915_gem_object_put_pages+0x50/0x90 [ 2098.512235] userptr_mn_invalidate_range_start+0x17f/0x210 [ 2098.512243] __mmu_notifier_invalidate_range_start+0x4f/0x90 [ 2098.512249] try_to_unmap_one+0x718/0x820 [ 2098.512254] rmap_walk_file+0xe4/0x250 [ 2098.512260] try_to_unmap+0xc1/0xf0 [ 2098.512264] ? page_remove_rmap+0x2a0/0x2a0 [ 2098.512268] ? page_not_mapped+0x10/0x10 [ 2098.512272] ? page_get_anon_vma+0x70/0x70 [ 2098.512277] migrate_pages+0x7aa/0x9a0 [ 2098.512285] ? isolate_freepages_block+0x340/0x340 [ 2098.512291] ? move_freelist_tail+0xd0/0xd0 [ 2098.512295] compact_zone+0x656/0xa70 [ 2098.512300] ? set_next_entity+0x9e/0x1a0 [ 2098.512304] compact_zone_order+0xde/0x120 [ 2098.512309] try_to_compact_pages+0x187/0x240 [ 2098.512316] __alloc_pages_direct_compact+0x87/0x170 [ 2098.512323] __alloc_pages_slowpath+0x1f8/0xc10 [ 2098.512329] ? get_page_from_freelist+0xe80/0x1330 [ 2098.512337] __alloc_pages_nodemask+0x268/0x2b0 [ 2098.512342] alloc_pages_vma+0xc1/0x160 [ 2098.512348] do_huge_pmd_anonymous_page+0x271/0x610 [ 2098.512354] __handle_mm_fault+0xbfc/0x12f0 [ 2098.512360] handle_mm_fault+0xa9/0x1d0 [ 2098.512367] __do_page_fault+0x23a/0x480 [ 2098.512373] do_page_fault+0x1a/0x64 [ 2098.512378] page_fault+0x39/0x40 [ 2098.512383] RIP: 0033:0x6453194e9c10 [ 2098.512391] Code: Bad RIP value. [ 2098.512394] RSP: 002b:00007ffdcdfbbbc0 EFLAGS: 00010206 [ 2098.512398] RAX: 00000006c5c79000 RBX: 00007a6cfe787010 RCX: 00007a73db2596fb [ 2098.512401] RDX: 0000000000000001 RSI: 00000006dc9ad000 RDI: 00007a6cfe787000 [ 2098.512404] RBP: 00006453194eaa54 R08: 00007a6cfe787010 R09: 0000000000000000 [ 2098.512406] R10: 0000000000000022 R11: 00000006dc9ac000 R12: ffffffffffffffff [ 2098.512409] R13: 0000000000000002 R14: 0000000000001000 R15: 00000006dc9ac000 [ 2117.676377] gpg-agent[1084]: handler 0x7c2858d5b700 for fd 10 started [ 2117.741468] gpg-agent[1084]: handler 0x7c2858d5b700 for fd 10 terminated [ 2135.375908] INFO: task kworker/10:0H:63 blocked for more than 73 seconds. [ 2135.375914] Tainted: G U 5.3.0-rc8-gf74c2bb98776 #62 [ 2135.375917] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 2135.375920] kworker/10:0H D 0 63 2 0x80004000 [ 2135.375935] Workqueue: events_highpri intel_atomic_cleanup_work [ 2135.375938] Call Trace: [ 2135.375950] ? __schedule+0x2c6/0x580 [ 2135.375957] schedule+0x34/0xa0 [ 2135.375963] schedule_preempt_disabled+0x5/0x10 [ 2135.375969] __mutex_lock.isra.0+0x25f/0x4b0 [ 2135.375977] ? pick_next_task_fair+0x1eb/0x700 [ 2135.375984] intel_cleanup_plane_fb+0x28/0x70 [ 2135.375991] drm_atomic_helper_cleanup_planes+0x4a/0x60 [ 2135.375996] intel_atomic_cleanup_work+0x17/0x80 [ 2135.376002] process_one_work+0x16b/0x2a0 [ 2135.376007] worker_thread+0x48/0x390 [ 2135.376014] kthread+0xee/0x130 [ 2135.376019] ? process_one_work+0x2a0/0x2a0 [ 2135.376025] ? kthread_park+0x70/0x70 [ 2135.376029] ret_from_fork+0x1f/0x30 [ 2135.376061] INFO: task kworker/u24:1:5567 blocked for more than 73 seconds. [ 2135.376064] Tainted: G U 5.3.0-rc8-gf74c2bb98776 #62 [ 2135.376066] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 2135.376069] kworker/u24:1 D 0 5567 2 0x80004000 [ 2135.376078] Workqueue: i915 __i915_gem_free_work [ 2135.376080] Call Trace: [ 2135.376087] ? __schedule+0x2c6/0x580 [ 2135.376093] schedule+0x34/0xa0 [ 2135.376098] schedule_preempt_disabled+0x5/0x10 [ 2135.376104] __mutex_lock.isra.0+0x25f/0x4b0 [ 2135.376110] ? __pm_runtime_resume+0x53/0x70 [ 2135.376114] __i915_gem_free_objects+0x71/0x1d0 [ 2135.376119] __i915_gem_free_work+0x5f/0x90 [ 2135.376124] process_one_work+0x16b/0x2a0 [ 2135.376129] worker_thread+0x48/0x390 [ 2135.376135] kthread+0xee/0x130 [ 2135.376139] ? process_one_work+0x2a0/0x2a0 [ 2135.376145] ? kthread_park+0x70/0x70 [ 2135.376149] ret_from_fork+0x1f/0x30 [ 2135.376155] INFO: task stress:6620 blocked for more than 73 seconds. [ 2135.376158] Tainted: G U 5.3.0-rc8-gf74c2bb98776 #62 [ 2135.376160] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 2135.376163] stress D 0 6620 6618 0x00000004 [ 2135.376166] Call Trace: [ 2135.376172] ? __schedule+0x2c6/0x580 [ 2135.376178] schedule+0x34/0xa0 [ 2135.376183] io_schedule+0xd/0x40 [ 2135.376190] __lock_page+0x13a/0x230 [ 2135.376197] ? file_fdatawait_range+0x20/0x20 [ 2135.376201] set_page_dirty_lock+0x44/0x50 [ 2135.376209] i915_gem_userptr_put_pages+0x135/0x180 [ 2135.376214] __i915_gem_object_put_pages+0x50/0x90 [ 2135.376220] userptr_mn_invalidate_range_start+0x17f/0x210 [ 2135.376228] __mmu_notifier_invalidate_range_start+0x4f/0x90 [ 2135.376234] try_to_unmap_one+0x718/0x820 [ 2135.376239] rmap_walk_file+0xe4/0x250 [ 2135.376244] try_to_unmap+0xc1/0xf0 [ 2135.376248] ? page_remove_rmap+0x2a0/0x2a0 [ 2135.376252] ? page_not_mapped+0x10/0x10 [ 2135.376257] ? page_get_anon_vma+0x70/0x70 [ 2135.376261] migrate_pages+0x7aa/0x9a0 [ 2135.376269] ? isolate_freepages_block+0x340/0x340 [ 2135.376275] ? move_freelist_tail+0xd0/0xd0 [ 2135.376279] compact_zone+0x656/0xa70 [ 2135.376284] ? set_next_entity+0x9e/0x1a0 [ 2135.376288] compact_zone_order+0xde/0x120 [ 2135.376293] try_to_compact_pages+0x187/0x240 [ 2135.376300] __alloc_pages_direct_compact+0x87/0x170 [ 2135.376307] __alloc_pages_slowpath+0x1f8/0xc10 [ 2135.376314] ? get_page_from_freelist+0xe80/0x1330 [ 2135.376321] __alloc_pages_nodemask+0x268/0x2b0 [ 2135.376327] alloc_pages_vma+0xc1/0x160 [ 2135.376332] do_huge_pmd_anonymous_page+0x271/0x610 [ 2135.376338] __handle_mm_fault+0xbfc/0x12f0 [ 2135.376344] handle_mm_fault+0xa9/0x1d0 [ 2135.376351] __do_page_fault+0x23a/0x480 [ 2135.376357] do_page_fault+0x1a/0x64 [ 2135.376362] page_fault+0x39/0x40 [ 2135.376367] RIP: 0033:0x6453194e9c10 [ 2135.376375] Code: Bad RIP value. [ 2135.376379] RSP: 002b:00007ffdcdfbbbc0 EFLAGS: 00010206 [ 2135.376383] RAX: 00000006c5c79000 RBX: 00007a6cfe787010 RCX: 00007a73db2596fb [ 2135.376386] RDX: 0000000000000001 RSI: 00000006dc9ad000 RDI: 00007a6cfe787000 [ 2135.376389] RBP: 00006453194eaa54 R08: 00007a6cfe787010 R09: 0000000000000000 [ 2135.376392] R10: 0000000000000022 R11: 00000006dc9ac000 R12: ffffffffffffffff [ 2135.376395] R13: 0000000000000002 R14: 0000000000001000 R15: 00000006dc9ac000 [ 2172.239905] INFO: task kworker/10:0H:63 blocked for more than 110 seconds. [ 2172.239912] Tainted: G U 5.3.0-rc8-gf74c2bb98776 #62 [ 2172.239914] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 2172.239918] kworker/10:0H D 0 63 2 0x80004000 [ 2172.239932] Workqueue: events_highpri intel_atomic_cleanup_work [ 2172.239935] Call Trace: [ 2172.239947] ? __schedule+0x2c6/0x580 [ 2172.239953] schedule+0x34/0xa0 [ 2172.239959] schedule_preempt_disabled+0x5/0x10 [ 2172.239965] __mutex_lock.isra.0+0x25f/0x4b0 [ 2172.239973] ? pick_next_task_fair+0x1eb/0x700 [ 2172.239980] intel_cleanup_plane_fb+0x28/0x70 [ 2172.239986] drm_atomic_helper_cleanup_planes+0x4a/0x60 [ 2172.239992] intel_atomic_cleanup_work+0x17/0x80 [ 2172.239997] process_one_work+0x16b/0x2a0 [ 2172.240003] worker_thread+0x48/0x390 [ 2172.240010] kthread+0xee/0x130 [ 2172.240014] ? process_one_work+0x2a0/0x2a0 [ 2172.240020] ? kthread_park+0x70/0x70 [ 2172.240024] ret_from_fork+0x1f/0x30 [ 2172.240054] INFO: task kworker/u24:1:5567 blocked for more than 110 seconds. [ 2172.240057] Tainted: G U 5.3.0-rc8-gf74c2bb98776 #62 [ 2172.240059] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 2172.240062] kworker/u24:1 D 0 5567 2 0x80004000 [ 2172.240071] Workqueue: i915 __i915_gem_free_work [ 2172.240073] Call Trace: [ 2172.240080] ? __schedule+0x2c6/0x580 [ 2172.240085] schedule+0x34/0xa0 [ 2172.240091] schedule_preempt_disabled+0x5/0x10 [ 2172.240097] __mutex_lock.isra.0+0x25f/0x4b0 [ 2172.240102] ? __pm_runtime_resume+0x53/0x70 [ 2172.240107] __i915_gem_free_objects+0x71/0x1d0 [ 2172.240112] __i915_gem_free_work+0x5f/0x90 [ 2172.240117] process_one_work+0x16b/0x2a0 [ 2172.240122] worker_thread+0x48/0x390 [ 2172.240128] kthread+0xee/0x130 [ 2172.240132] ? process_one_work+0x2a0/0x2a0 [ 2172.240137] ? kthread_park+0x70/0x70 [ 2172.240141] ret_from_fork+0x1f/0x30 [ 2172.240148] INFO: task stress:6620 blocked for more than 110 seconds. [ 2172.240151] Tainted: G U 5.3.0-rc8-gf74c2bb98776 #62 [ 2172.240153] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 2172.240155] stress D 0 6620 6618 0x00000004 [ 2172.240159] Call Trace: [ 2172.240165] ? __schedule+0x2c6/0x580 [ 2172.240170] schedule+0x34/0xa0 [ 2172.240176] io_schedule+0xd/0x40 [ 2172.240182] __lock_page+0x13a/0x230 [ 2172.240188] ? file_fdatawait_range+0x20/0x20 [ 2172.240193] set_page_dirty_lock+0x44/0x50 [ 2172.240200] i915_gem_userptr_put_pages+0x135/0x180 [ 2172.240206] __i915_gem_object_put_pages+0x50/0x90 [ 2172.240212] userptr_mn_invalidate_range_start+0x17f/0x210 [ 2172.240219] __mmu_notifier_invalidate_range_start+0x4f/0x90 [ 2172.240224] try_to_unmap_one+0x718/0x820 [ 2172.240229] rmap_walk_file+0xe4/0x250 [ 2172.240234] try_to_unmap+0xc1/0xf0 [ 2172.240239] ? page_remove_rmap+0x2a0/0x2a0 [ 2172.240243] ? page_not_mapped+0x10/0x10 [ 2172.240247] ? page_get_anon_vma+0x70/0x70 [ 2172.240251] migrate_pages+0x7aa/0x9a0 [ 2172.240259] ? isolate_freepages_block+0x340/0x340 [ 2172.240265] ? move_freelist_tail+0xd0/0xd0 [ 2172.240269] compact_zone+0x656/0xa70 [ 2172.240274] ? set_next_entity+0x9e/0x1a0 [ 2172.240278] compact_zone_order+0xde/0x120 [ 2172.240283] try_to_compact_pages+0x187/0x240 [ 2172.240290] __alloc_pages_direct_compact+0x87/0x170 [ 2172.240297] __alloc_pages_slowpath+0x1f8/0xc10 [ 2172.240304] ? get_page_from_freelist+0xe80/0x1330 [ 2172.240311] __alloc_pages_nodemask+0x268/0x2b0 [ 2172.240317] alloc_pages_vma+0xc1/0x160 [ 2172.240322] do_huge_pmd_anonymous_page+0x271/0x610 [ 2172.240328] __handle_mm_fault+0xbfc/0x12f0 [ 2172.240334] handle_mm_fault+0xa9/0x1d0 [ 2172.240341] __do_page_fault+0x23a/0x480 [ 2172.240347] do_page_fault+0x1a/0x64 [ 2172.240352] page_fault+0x39/0x40 [ 2172.240357] RIP: 0033:0x6453194e9c10 [ 2172.240365] Code: Bad RIP value. [ 2172.240369] RSP: 002b:00007ffdcdfbbbc0 EFLAGS: 00010206 [ 2172.240373] RAX: 00000006c5c79000 RBX: 00007a6cfe787010 RCX: 00007a73db2596fb [ 2172.240376] RDX: 0000000000000001 RSI: 00000006dc9ad000 RDI: 00007a6cfe787000 [ 2172.240379] RBP: 00006453194eaa54 R08: 00007a6cfe787010 R09: 0000000000000000 [ 2172.240382] R10: 0000000000000022 R11: 00000006dc9ac000 R12: ffffffffffffffff [ 2172.240384] R13: 0000000000000002 R14: 0000000000001000 R15: 00000006dc9ac000 [ 2177.737800] gpg-agent[1084]: handler 0x7c285955c700 for fd 10 started [ 2177.781519] gpg-agent[1084]: handler 0x7c285955c700 for fd 10 terminated ``` Looks like this was mentioned before: https://lkml.org/lkml/2019/8/9/433 Thanks to Leho Kraav (:macmaN :lkraav) for the pointer, also for the ssh idea! Cheers! commit 505a8ec7e11ae5236c4a154a1e24ef49a8349600 (linus/master) Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Sep 12 13:56:34 2019 +0100 Revert "drm/i915/userptr: Acquire the page lock around set_page_dirty()" |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.