Looks like if we dirty the pages from inside the invalidate-range we may trigger:

[ 5686.374234] ------------[ cut here ]------------
[ 5686.374255] WARNING: CPU: 0 PID: 13274 at mm/filemap.c:217 __delete_from_page_cache+0x274/0x280()
[ 5686.374259] Modules linked in: drbg ansi_cprng ctr ccm arc4 iwldvm mac80211 snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core intel_powerclamp snd_hwdep coretemp iwlwifi snd_pcm dm_multipath thinkpad_acpi kvm cfg80211 snd_seq_midi nvram snd_seq_midi_event irqbypass rfcomm snd_rawmidi crct10dif_pclmul crc32_pclmul btusb snd_seq aesni_intel btrtl aes_x86_64 btbcm bnep lrw btintel bluetooth gf128mul glue_helper intel_ips ablk_helper cryptd joydev snd_seq_device serio_raw snd_timer snd lpc_ich mei_me shpchp wmi mei soundcore mac_hid parport_pc binfmt_misc ppdev lp parport dm_mirror dm_region_hash dm_log e1000e psmouse ahci ptp libahci pps_core
[ 5686.374345] CPU: 0 PID: 13274 Comm: gem_concurrent_ Not tainted 4.4.0+ #36
[ 5686.374348] Hardware name: LENOVO 514328U/514328U, BIOS 6QET44WW (1.14 ) 04/20/2010
[ 5686.374351]  ffffffff81d008ee ffff8800ab763b00 ffffffff813970cd 0000000000000000
[ 5686.374355]  ffff8800ab763b38 ffffffff810762c6 ffff8800aae053b8 ffffea00009b6340
[ 5686.374360]  ffff8800aae053b0 0000000000000000 0000000000000003 ffff8800ab763b48
[ 5686.374364] Call Trace:
[ 5686.374376]  [<ffffffff813970cd>] dump_stack+0x44/0x57
[ 5686.374381]  [<ffffffff810762c6>] warn_slowpath_common+0x86/0xc0
[ 5686.374385]  [<ffffffff810763ba>] warn_slowpath_null+0x1a/0x20
[ 5686.374389]  [<ffffffff811723d4>] __delete_from_page_cache+0x274/0x280
[ 5686.374393]  [<ffffffff8117242d>] delete_from_page_cache+0x4d/0x80
[ 5686.374399]  [<ffffffff81180976>] truncate_inode_page+0x56/0x90
[ 5686.374406]  [<ffffffff8118b249>] shmem_undo_range+0x399/0x690
[ 5686.374412]  [<ffffffff8118b554>] shmem_truncate_range+0x14/0x40
[ 5686.374417]  [<ffffffff8118b630>] shmem_evict_inode+0xb0/0x130
[ 5686.374422]  [<ffffffff81203cee>] evict+0xbe/0x1a0
[ 5686.374426]  [<ffffffff81204985>] iput+0x175/0x1e0
[ 5686.374432]  [<ffffffff8120034c>] __dentry_kill+0x17c/0x1e0
[ 5686.374436]  [<ffffffff81200549>] dput+0x199/0x1f0
[ 5686.374441]  [<ffffffff811ebb58>] __fput+0x188/0x210
[ 5686.374445]  [<ffffffff811ebc1e>] ____fput+0xe/0x10
[ 5686.374452]  [<ffffffff81092407>] task_work_run+0x77/0x90
[ 5686.374458]  [<ffffffff81071427>] exit_to_usermode_loop+0x73/0xa2
[ 5686.374466]  [<ffffffff81003a7d>] syscall_return_slowpath+0x8d/0xa0
[ 5686.374472]  [<ffffffff8186c758>] int_ret_from_sys_call+0x25/0x8f
[ 5686.374476] ---[ end trace 329c2060913a2504 ]---
More strictly, it looks like a race between truncate_complete_page()/delete_from_page_cache() and the cancel_userptr worker.
I haven't seen this since around

commit 40313f0cd0b711a7a5905e5182422799e157d8aa
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Apr 5 15:00:00 2016 +0100

    drm/i915/userptr: Hold mmref whilst calling get-user-pages

so I'm going to assume that it helped! Or the changes in worker etc.