Bug 112012

Summary:	kernel BUG at fs/ext4/inode.c:2721!
Product:	DRI
Component:	DRM/Intel
Status:	RESOLVED MOVED
Severity:	major
Priority:	high
Version:	XOrg git
Hardware:	x86-64 (AMD64)
OS:	All
Whiteboard:	Triaged, ReadyForDev
i915 platform:	SKL
i915 features:	GEM/Other
Reporter:	Robert Holmes <robeholmes>
Assignee:	Intel GFX Bugs mailing list <intel-gfx-bugs>
QA Contact:	Intel GFX Bugs mailing list <intel-gfx-bugs>
CC:	alarrosa, intel-gfx-bugs, massimiliano.torromeo, robeholmes

Description Robert Holmes 2019-10-15 21:16:09 UTC

Starting with kernel 5.3, I've been seeing the following pattern. First, soon after booting, a kernel WARNING appears:

	Oct 03 04:39:51 laptop kernel: WARNING: CPU: 5 PID: 198 at fs/ext4/inode.c:3941 ext4_set_page_dirty+0x3e/0x50
	Oct 03 04:39:51 laptop kernel: Modules linked in: squashfs zstd_decompress loop ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables ppdev parport fuse sunrpc vfat fat snd_hda_codec_hdmi snd_hda_codec_realtek iwlmvm snd_hda_codec_generic ledtrig_audio x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_intel mac80211 snd_hda_codec kvm_intel snd_hda_core snd_hwdep libarc4 snd_seq iwlwifi kvm snd_seq_device irqbypass intel_cstate snd_pcm iTCO_wdt intel_uncore asus_nb_wmi mei_hdcp iTCO_vendor_support intel_rapl_msr snd_timer cfg80211 asus_wmi intel_rapl_perf processor_thermal_device rtsx_pci_ms snd sparse_keymap joydev mei_me memstick i2c_i801 soundcore idma64 rfkill intel_rapl_common mei
	Oct 03 04:39:51 laptop kernel:  intel_pch_thermal elan_i2c intel_lpss_pci intel_lpss intel_soc_dts_iosf int3403_thermal int340x_thermal_zone int3400_thermal acpi_thermal_rel acpi_pad dm_crypt i915 nouveau ttm crct10dif_pclmul i2c_algo_bit crc32_pclmul rtsx_pci_sdmmc drm_kms_helper crc32c_intel mmc_core nvme mxm_wmi drm nvme_core ghash_clmulni_intel serio_raw rtsx_pci r8169 hid_microsoft ff_memless i2c_hid wmi video [last unloaded: vmnet]
	Oct 03 04:39:51 laptop kernel: CPU: 5 PID: 198 Comm: kworker/u16:4 Tainted: G           O      5.3.2-300.fc30.x86_64 #1
	Oct 03 04:39:51 laptop kernel: Hardware name: ASUSTeK COMPUTER INC. N552VX/N552VX, BIOS N552VX.300 08/31/2016
	Oct 03 04:39:51 laptop kernel: Workqueue: i915 __i915_gem_free_work [i915]
	Oct 03 04:39:51 laptop kernel: RIP: 0010:ext4_set_page_dirty+0x3e/0x50
	Oct 03 04:39:51 laptop kernel: Code: 48 8b 00 a8 01 75 16 48 8b 57 08 48 8d 42 ff 83 e2 01 48 0f 44 c7 48 8b 00 a8 08 74 0d 48 8b 07 f6 c4 20 74 0f e9 a2 ef f7 ff <0f> 0b 48 8b 07 f6 c4 20 75 f1 0f 0b e9 91 ef f7 ff 90 0f 1f 44 00
	Oct 03 04:39:51 laptop kernel: RSP: 0018:ffffa617002d3d90 EFLAGS: 00010246
	Oct 03 04:39:51 laptop kernel: RAX: 0017fffe00002016 RBX: ffff99931606f800 RCX: 0000000000000000
	Oct 03 04:39:51 laptop kernel: RDX: 0000000000000000 RSI: 00000003d4800000 RDI: ffffd2e80fe496c0
	Oct 03 04:39:51 laptop kernel: RBP: ffffd2e80fe496c0 R08: 00000003d4800000 R09: ffff999429548188
	Oct 03 04:39:51 laptop kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 00000000003f925b
	Oct 03 04:39:51 laptop kernel: R13: ffff99931630a400 R14: ffff99942d4917a0 R15: 0000000000000000
	Oct 03 04:39:51 laptop kernel: FS:  0000000000000000(0000) GS:ffff999433b40000(0000) knlGS:0000000000000000
	Oct 03 04:39:51 laptop kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	Oct 03 04:39:51 laptop kernel: CR2: 00007f16bfc25688 CR3: 000000037a40a002 CR4: 00000000003606e0
	Oct 03 04:39:51 laptop kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
	Oct 03 04:39:51 laptop kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
	Oct 03 04:39:51 laptop kernel: Call Trace:
	Oct 03 04:39:51 laptop kernel:  i915_gem_userptr_put_pages+0x14a/0x1e0 [i915]
	Oct 03 04:39:51 laptop kernel:  __i915_gem_object_put_pages+0x5e/0xa0 [i915]
	Oct 03 04:39:51 laptop kernel:  __i915_gem_free_objects+0x123/0x220 [i915]
	Oct 03 04:39:51 laptop kernel:  __i915_gem_free_work+0x64/0x90 [i915]
	Oct 03 04:39:51 laptop kernel:  process_one_work+0x19d/0x340
	Oct 03 04:39:51 laptop kernel:  worker_thread+0x50/0x3b0
	Oct 03 04:39:51 laptop kernel:  kthread+0xfb/0x130
	Oct 03 04:39:51 laptop kernel:  ? process_one_work+0x340/0x340
	Oct 03 04:39:51 laptop kernel:  ? kthread_park+0x80/0x80
	Oct 03 04:39:51 laptop kernel:  ret_from_fork+0x35/0x40
	Oct 03 04:39:51 laptop kernel: ---[ end trace 8ee114643cf24b2e ]---

Later, perhaps after a few days, another WARNING appears, soon followed by a BUG:

	Oct 07 01:32:44 laptop kernel: WARNING: CPU: 6 PID: 14209 at fs/ext4/inode.c:3942 ext4_set_page_dirty+0x48/0x50
	Oct 07 01:32:44 laptop kernel: Modules linked in: xfs btrfs xor zstd_compress raid6_pq uas usb_storage squashfs zstd_decompress loop ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables ppdev parport fuse sunrpc vfat fat snd_hda_codec_hdmi snd_hda_codec_realtek iwlmvm snd_hda_codec_generic ledtrig_audio x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_intel mac80211 snd_hda_codec kvm_intel snd_hda_core snd_hwdep libarc4 snd_seq iwlwifi kvm snd_seq_device irqbypass intel_cstate snd_pcm iTCO_wdt intel_uncore asus_nb_wmi mei_hdcp iTCO_vendor_support intel_rapl_msr snd_timer cfg80211 asus_wmi intel_rapl_perf processor_thermal_device rtsx_pci_ms snd sparse_keymap joydev mei_me memstick i2c_i801
	Oct 07 01:32:44 laptop kernel:  soundcore idma64 rfkill intel_rapl_common mei intel_pch_thermal elan_i2c intel_lpss_pci intel_lpss intel_soc_dts_iosf int3403_thermal int340x_thermal_zone int3400_thermal acpi_thermal_rel acpi_pad dm_crypt i915 nouveau ttm crct10dif_pclmul i2c_algo_bit crc32_pclmul rtsx_pci_sdmmc drm_kms_helper crc32c_intel mmc_core nvme mxm_wmi drm nvme_core ghash_clmulni_intel serio_raw rtsx_pci r8169 hid_microsoft ff_memless i2c_hid wmi video [last unloaded: vmnet]
	Oct 07 01:32:44 laptop kernel: CPU: 6 PID: 14209 Comm: kworker/u16:1 Tainted: G        W  O      5.3.2-300.fc30.x86_64 #1
	Oct 07 01:32:44 laptop kernel: Hardware name: ASUSTeK COMPUTER INC. N552VX/N552VX, BIOS N552VX.300 08/31/2016
	Oct 07 01:32:44 laptop kernel: Workqueue: i915 __i915_gem_free_work [i915]
	Oct 07 01:32:44 laptop kernel: RIP: 0010:ext4_set_page_dirty+0x48/0x50
	Oct 07 01:32:44 laptop kernel: Code: 08 48 8d 42 ff 83 e2 01 48 0f 44 c7 48 8b 00 a8 08 74 0d 48 8b 07 f6 c4 20 74 0f e9 a2 ef f7 ff 0f 0b 48 8b 07 f6 c4 20 75 f1 <0f> 0b e9 91 ef f7 ff 90 0f 1f 44 00 00 41 54 49 89 fc 55 89 d5 53
	Oct 07 01:32:44 laptop kernel: RSP: 0018:ffffa617060dbd90 EFLAGS: 00010246
	Oct 07 01:32:44 laptop kernel: RAX: 0017fffe00020016 RBX: ffff9993a4991000 RCX: 0000000000000000
	Oct 07 01:32:44 laptop kernel: RDX: 0000000000000000 RSI: 000000037614f000 RDI: ffffd2e808eaac40
	Oct 07 01:32:44 laptop kernel: RBP: ffffd2e808eaac40 R08: 000000037614f000 R09: 0000000000000001
	Oct 07 01:32:44 laptop kernel: R10: ffff99940ceb6201 R11: 0000000000000000 R12: 000000000023aab1
	Oct 07 01:32:44 laptop kernel: R13: ffff9990316eb600 R14: ffff9992c0bc8240 R15: 0000000000000000
	Oct 07 01:32:44 laptop kernel: FS:  0000000000000000(0000) GS:ffff999433b80000(0000) knlGS:0000000000000000
	Oct 07 01:32:44 laptop kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	Oct 07 01:32:44 laptop kernel: CR2: 00003ae54cb46000 CR3: 000000037a40a001 CR4: 00000000003606e0
	Oct 07 01:32:44 laptop kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
	Oct 07 01:32:44 laptop kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
	Oct 07 01:32:44 laptop kernel: Call Trace:
	Oct 07 01:32:44 laptop kernel:  i915_gem_userptr_put_pages+0x14a/0x1e0 [i915]
	Oct 07 01:32:44 laptop kernel:  __i915_gem_object_put_pages+0x5e/0xa0 [i915]
	Oct 07 01:32:44 laptop kernel:  __i915_gem_free_objects+0x123/0x220 [i915]
	Oct 07 01:32:44 laptop kernel:  __i915_gem_free_work+0x64/0x90 [i915]
	Oct 07 01:32:44 laptop kernel:  process_one_work+0x19d/0x340
	Oct 07 01:32:44 laptop kernel:  worker_thread+0x50/0x3b0
	Oct 07 01:32:44 laptop kernel:  kthread+0xfb/0x130
	Oct 07 01:32:44 laptop kernel:  ? process_one_work+0x340/0x340
	Oct 07 01:32:44 laptop kernel:  ? kthread_park+0x80/0x80
	Oct 07 01:32:44 laptop kernel:  ret_from_fork+0x35/0x40
	Oct 07 01:32:44 laptop kernel: ---[ end trace 8ee114643cf24b2f ]---
	Oct 07 01:33:15 laptop kernel: ------------[ cut here ]------------
	Oct 07 01:33:15 laptop kernel: kernel BUG at fs/ext4/inode.c:2721!
	Oct 07 01:33:15 laptop kernel: invalid opcode: 0000 [#1] SMP PTI
	Oct 07 01:33:15 laptop kernel: CPU: 2 PID: 14209 Comm: kworker/u16:1 Tainted: G        W  O      5.3.2-300.fc30.x86_64 #1
	Oct 07 01:33:15 laptop kernel: Hardware name: ASUSTeK COMPUTER INC. N552VX/N552VX, BIOS N552VX.300 08/31/2016
	Oct 07 01:33:15 laptop kernel: Workqueue: writeback wb_workfn (flush-253:1)
	Oct 07 01:33:15 laptop kernel: RIP: 0010:mpage_prepare_extent_to_map+0x25d/0x290
	Oct 07 01:33:15 laptop kernel: Code: 00 75 3d e8 d5 3a 62 00 48 8b 04 24 48 39 44 24 10 0f 86 49 fe ff ff 31 c0 eb af 4c 89 ff e8 3a 2d e8 ff e9 b8 fe ff ff 0f 0b <0f> 0b 48 8d 7c 24 18 89 44 24 08 e8 43 3a e9 ff 8b 44 24 08 eb 8a
	Oct 07 01:33:15 laptop kernel: RSP: 0018:ffffa617060db988 EFLAGS: 00010246
	Oct 07 01:33:15 laptop kernel: RAX: 0017fffe0002003f RBX: ffffa617060db9b0 RCX: 000000000000074c
	Oct 07 01:33:15 laptop kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffd2e80c2d41c0
	Oct 07 01:33:15 laptop kernel: RBP: 0000000000002800 R08: 0000000000000000 R09: 0000000000000000
	Oct 07 01:33:15 laptop kernel: R10: 0000000000000228 R11: ffffffffffffffff R12: ffffa617060dba20
	Oct 07 01:33:15 laptop kernel: R13: ffff99940fc26798 R14: ffffa617060dbae0 R15: ffffd2e80c2d41c0
	Oct 07 01:33:15 laptop kernel: FS:  0000000000000000(0000) GS:ffff999433a80000(0000) knlGS:0000000000000000
	Oct 07 01:33:15 laptop kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	Oct 07 01:33:15 laptop kernel: CR2: 00002cf01cf1a000 CR3: 000000037a40a003 CR4: 00000000003606e0
	Oct 07 01:33:15 laptop kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
	Oct 07 01:33:15 laptop kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
	Oct 07 01:33:15 laptop kernel: Call Trace:
	Oct 07 01:33:15 laptop kernel:  ext4_writepages+0x3da/0xe20
	Oct 07 01:33:15 laptop kernel:  ? recalibrate_cpu_khz+0x10/0x10
	Oct 07 01:33:15 laptop kernel:  ? ktime_get+0x3c/0x90
	Oct 07 01:33:15 laptop kernel:  ? __switch_to_asm+0x40/0x70
	Oct 07 01:33:15 laptop kernel:  ? __switch_to_asm+0x40/0x70
	Oct 07 01:33:15 laptop kernel:  ? __switch_to_asm+0x40/0x70
	Oct 07 01:33:15 laptop kernel:  ? __switch_to_asm+0x34/0x70
	Oct 07 01:33:15 laptop kernel:  ? do_writepages+0x43/0xd0
	Oct 07 01:33:15 laptop kernel:  ? ext4_mark_inode_dirty+0x1d0/0x1d0
	Oct 07 01:33:15 laptop kernel:  do_writepages+0x43/0xd0
	Oct 07 01:33:15 laptop kernel:  ? __switch_to_asm+0x34/0x70
	Oct 07 01:33:15 laptop kernel:  __writeback_single_inode+0x3d/0x330
	Oct 07 01:33:15 laptop kernel:  writeback_sb_inodes+0x1fd/0x480
	Oct 07 01:33:15 laptop kernel:  __writeback_inodes_wb+0x4c/0xc0
	Oct 07 01:33:15 laptop kernel:  wb_writeback+0x255/0x2f0
	Oct 07 01:33:15 laptop kernel:  ? get_nr_inodes+0x32/0x50
	Oct 07 01:33:15 laptop kernel:  wb_workfn+0x38f/0x450
	Oct 07 01:33:15 laptop kernel:  ? __switch_to_asm+0x34/0x70
	Oct 07 01:33:15 laptop kernel:  process_one_work+0x19d/0x340
	Oct 07 01:33:15 laptop kernel:  worker_thread+0x50/0x3b0
	Oct 07 01:33:15 laptop kernel:  kthread+0xfb/0x130
	Oct 07 01:33:15 laptop kernel:  ? process_one_work+0x340/0x340
	Oct 07 01:33:15 laptop kernel:  ? kthread_park+0x80/0x80
	Oct 07 01:33:15 laptop kernel:  ret_from_fork+0x35/0x40
	Oct 07 01:33:15 laptop kernel: Modules linked in: xfs btrfs xor zstd_compress raid6_pq uas usb_storage squashfs zstd_decompress loop ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables ppdev parport fuse sunrpc vfat fat snd_hda_codec_hdmi snd_hda_codec_realtek iwlmvm snd_hda_codec_generic ledtrig_audio x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_intel mac80211 snd_hda_codec kvm_intel snd_hda_core snd_hwdep libarc4 snd_seq iwlwifi kvm snd_seq_device irqbypass intel_cstate snd_pcm iTCO_wdt intel_uncore asus_nb_wmi mei_hdcp iTCO_vendor_support intel_rapl_msr snd_timer cfg80211 asus_wmi intel_rapl_perf processor_thermal_device rtsx_pci_ms snd sparse_keymap joydev mei_me memstick i2c_i801
	Oct 07 01:33:15 laptop kernel:  soundcore idma64 rfkill intel_rapl_common mei intel_pch_thermal elan_i2c intel_lpss_pci intel_lpss intel_soc_dts_iosf int3403_thermal int340x_thermal_zone int3400_thermal acpi_thermal_rel acpi_pad dm_crypt i915 nouveau ttm crct10dif_pclmul i2c_algo_bit crc32_pclmul rtsx_pci_sdmmc drm_kms_helper crc32c_intel mmc_core nvme mxm_wmi drm nvme_core ghash_clmulni_intel serio_raw rtsx_pci r8169 hid_microsoft ff_memless i2c_hid wmi video [last unloaded: vmnet]
	Oct 07 01:33:15 laptop kernel: ---[ end trace 8ee114643cf24b30 ]---
	Oct 07 01:33:15 laptop kernel: RIP: 0010:mpage_prepare_extent_to_map+0x25d/0x290
	Oct 07 01:33:15 laptop kernel: Code: 00 75 3d e8 d5 3a 62 00 48 8b 04 24 48 39 44 24 10 0f 86 49 fe ff ff 31 c0 eb af 4c 89 ff e8 3a 2d e8 ff e9 b8 fe ff ff 0f 0b <0f> 0b 48 8d 7c 24 18 89 44 24 08 e8 43 3a e9 ff 8b 44 24 08 eb 8a
	Oct 07 01:33:15 laptop kernel: RSP: 0018:ffffa617060db988 EFLAGS: 00010246
	Oct 07 01:33:15 laptop kernel: RAX: 0017fffe0002003f RBX: ffffa617060db9b0 RCX: 000000000000074c
	Oct 07 01:33:15 laptop kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffd2e80c2d41c0
	Oct 07 01:33:15 laptop kernel: RBP: 0000000000002800 R08: 0000000000000000 R09: 0000000000000000
	Oct 07 01:33:15 laptop kernel: R10: 0000000000000228 R11: ffffffffffffffff R12: ffffa617060dba20
	Oct 07 01:33:15 laptop kernel: R13: ffff99940fc26798 R14: ffffa617060dbae0 R15: ffffd2e80c2d41c0
	Oct 07 01:33:15 laptop kernel: FS:  0000000000000000(0000) GS:ffff999433a80000(0000) knlGS:0000000000000000
	Oct 07 01:33:15 laptop kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	Oct 07 01:33:15 laptop kernel: CR2: 00002cf01cf1a000 CR3: 000000037a40a003 CR4: 00000000003606e0
	Oct 07 01:33:15 laptop kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
	Oct 07 01:33:15 laptop kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Now, the reason I am reporting this on this bug tracker is that I've bisected this issue (more precisely, the first WARNING right after boot) to the following commit: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.3.y&id=0bd6cb6b58f7332c61cef2e4ae48db1ca9910b6b

In fact, reverting this commit on the latest stable 5.3.6 release apparently stops these WARNINGs/BUGs from happening. However, I don't quite get why they're happening, so I don't know whether reverting is a suitable fix.

Comment 1 Chris Wilson 2019-10-16 08:48:42 UTC

(In reply to Robert Holmes from comment #0) 
> Now, the reason I am reporting this on this bug tracker is that I've
> bisected this issue (more precisely, the first WARNING right after boot) to
> the following commit:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/
> ?h=linux-5.3.y&id=0bd6cb6b58f7332c61cef2e4ae48db1ca9910b6b

Yikes. That shows that code was inherently more buggy than I thought, as it was causing us to drop writes to pages we didn't own (but thought we did).

The root cause of the warn and ext4 bug is the lack of lock_page around set_page_dirty in userptr_put_pages. We tried putting a lock there, but we recurse into userptr_put_pages from underneath locked pages...
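The recursion problem described above can be illustrated with a small userspace analogy. This is only a sketch, not kernel code: `page_lock`, `put_pages` and `reclaim_path` are hypothetical stand-ins, and a Python `threading.Lock` is assumed here to play the role of a non-reentrant page lock. The point is just that a release path which tries to take the lock cannot do so when it is entered from a caller that already holds it.

```python
import threading

# Stand-in for a page lock. threading.Lock is not reentrant, which is the
# property this analogy relies on. (Hypothetical name, not a kernel symbol.)
page_lock = threading.Lock()


def put_pages():
    """Hypothetical release path that wants the lock before marking a page dirty."""
    # Non-blocking attempt so the sketch reports the conflict instead of hanging.
    if not page_lock.acquire(blocking=False):
        return "would deadlock: lock already held up the call chain"
    try:
        return "marked dirty under lock"
    finally:
        page_lock.release()


def reclaim_path():
    """Hypothetical caller that already holds the lock when it reaches put_pages()."""
    with page_lock:
        return put_pages()


if __name__ == "__main__":
    print(put_pages())     # called directly: the lock can be taken
    print(reclaim_path())  # called with the lock held: it cannot
```

With a blocking acquire the second call would simply hang, which is the userspace counterpart of why adding the lock naively was not an option.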

There is a plan afoot to replace this interface with HMM in the hope that it makes the integration between the GPU and user pages much nicer and in the process resolve these mistakes.

Comment 2 Roman Tsisyk 2019-11-01 19:30:19 UTC

Hi Chris,

Could you please add a quick workaround for this problem? Kernel 5.3.x is completely unusable on the desktop because of this...

Comment 3 Öyvind Saether 2019-11-02 23:39:20 UTC

Acer laptop, Intel(R) Pentium(R) CPU N4200, kernel 5.3.8. I just compiled and tried the 5.3.x kernel. No hang (yet), but the following in dmesg is concerning:

[   50.138567] WARNING: CPU: 1 PID: 1330 at fs/ext4/inode.c:3941 ext4_set_page_dirty+0x3e/0x50
[   50.138569] Modules linked in: rfcomm nf_conntrack_irc nf_conntrack_sip iptable_raw xt_CT nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rt xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter ip6_tables sunrpc lz4 lz4_compress bnep vfat fat snd_soc_skl snd_soc_hdac_hda snd_hda_ext_core snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core iwlmvm snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_compress ac97_bus ledtrig_audio snd_pcm_dmaengine snd_hda_intel snd_hda_codec intel_telemetry_pltdrv intel_punit_ipc intel_telemetry_core x86_pkg_temp_thermal intel_powerclamp coretemp snd_hwdep kvm_intel snd_hda_core btusb btrtl btbcm kvm btintel iwlwifi uvcvideo bluetooth snd_seq videobuf2_vmalloc videobuf2_memops snd_seq_device videobuf2_v4l2 snd_pcm videobuf2_common videodev joydev mei_hdcp intel_rapl_msr hid_multitouch acer_wmi wmi_bmof sparse_keymap irqbypass snd_timer intel_cstate mei_me snd
[   50.138610]  intel_rapl_perf mei mc processor_thermal_device intel_rapl_common wdat_wdt ecdh_generic pcspkr idma64 i2c_i801 intel_lpss_pci lpc_ich ecc int340x_thermal_zone intel_xhci_usb_role_switch soundcore bfq intel_lpss intel_soc_dts_iosf roles int3400_thermal wmi acpi_thermal_rel acer_wireless int3406_thermal dm_crypt i915 i2c_algo_bit cec rc_core drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm mmc_block rtsx_usb_sdmmc sdhci_pci cqhci sdhci mmc_core crct10dif_pclmul crc32_pclmul serio_raw ghash_clmulni_intel rtsx_usb video i2c_hid pinctrl_broxton pinctrl_intel fuse
[   50.138638] CPU: 1 PID: 1330 Comm: kworker/u8:4 Not tainted 5.3.8-Seohyun #1
[   50.138639] Hardware name: Acer Swift SF113-31/ASAHI_AP_S, BIOS V1.12 03/30/2018
[   50.138700] Workqueue: i915 __i915_gem_free_work [i915]
[   50.138704] RIP: 0010:ext4_set_page_dirty+0x3e/0x50
[   50.138706] Code: 48 8b 00 a8 01 75 16 48 8b 57 08 48 8d 42 ff 83 e2 01 48 0f 44 c7 48 8b 00 a8 08 74 0d 48 8b 07 f6 c4 20 74 0f e9 92 e7 f7 ff <0f> 0b 48 8b 07 f6 c4 20 75 f1 0f 0b e9 81 e7 f7 ff 90 0f 1f 44 00
[   50.138707] RSP: 0018:ffffc1e60137fd90 EFLAGS: 00010246
[   50.138709] RAX: 0017ffe000002016 RBX: ffff9e337236a200 RCX: 0000000000000000
[   50.138710] RDX: 0000000000000000 RSI: 0000000121400000 RDI: fffff3ecc498ea40
[   50.138711] RBP: fffff3ecc498ea40 R08: 0000000121400000 R09: 0000000000000000
[   50.138712] R10: 0000000000000001 R11: 0000000000000000 R12: 00000000001263a9
[   50.138713] R13: ffff9e3322c11b00 R14: ffff9e33367f9ca0 R15: 0000000000000000
[   50.138714] FS:  0000000000000000(0000) GS:ffff9e337ba80000(0000) knlGS:0000000000000000
[   50.138715] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   50.138716] CR2: 000055fb74151f10 CR3: 000000013c60a000 CR4: 00000000003406e0
[   50.138717] Call Trace:
[   50.138767]  i915_gem_userptr_put_pages+0x14b/0x1e0 [i915]
[   50.138812]  __i915_gem_object_put_pages+0x5b/0xa0 [i915]
[   50.138854]  __i915_gem_free_objects+0x124/0x230 [i915]
[   50.138898]  __i915_gem_free_work+0x64/0x90 [i915]
[   50.138902]  process_one_work+0x199/0x340
[   50.138905]  worker_thread+0x4e/0x3b0
[   50.138907]  kthread+0xfc/0x130
[   50.138910]  ? process_one_work+0x340/0x340
[   50.138912]  ? kthread_park+0x80/0x80
[   50.138915]  ret_from_fork+0x35/0x40
[   50.138919] ---[ end trace ca5ea2ec07e00336 ]---

It looks like this is the same bug as originally reported by Holmes.

Please let me know if/how I can provide additional useful information.

Comment 4 David Noriega 2019-11-05 21:21:00 UTC

I'd like to report that I am seeing this message as well. I'm running Fedora 30 on a Dell Latitude 7490, current kernel: 5.3.8-200.fc30.x86_64. I thought the i915 module parameter enable_guc=2 was the trigger, but I've removed it and, at least on this new kernel, the message still appeared.

Comment 5 Antonio Larrosa 2019-11-19 11:09:04 UTC

Hi, I'm also seeing this bug. In my case it's happening on a desktop PC with an i7-6700K CPU and the i915 module (and no other special hardware), running openSUSE Tumbleweed with the 5.3.0, 5.3.7 and 5.3.8 kernels.

I'm currently running 5.2.14 without any issue since 5.3.x kernels are completely unusable. Once the BUG appears in dmesg, processes get stuck in D state and the system has to be rebooted, and in some cases, it has happened within an hour after booting.

I reported this to https://bugzilla.opensuse.org/show_bug.cgi?id=1156537 where I've been putting some information on my system before I was pointed to this bug report.

Comment 6 Robert Holmes 2019-11-19 15:51:37 UTC

There has been a patch submitted to -stable fixing (or at least working around) this issue: https://www.spinics.net/lists/stable/msg340095.html

All that is left, presumably, is for it to be actually pulled into a 5.3.x release.

Comment 7 Lakshmi 2019-11-20 09:26:12 UTC

(In reply to Robert Holmes from comment #6)
> There has been a patch submitted to -stable fixing (or at least working
> around) this issue: https://www.spinics.net/lists/stable/msg340095.html
> 
> All that is left, presumably, is for it to be actually pulled into a 5.3.x
> release.

What is the impact of this issue apart from the warning in the log?

Comment 8 Robert Holmes 2019-11-21 02:44:21 UTC

(In reply to Lakshmi from comment #7)
> (In reply to Robert Holmes from comment #6)
> > There has been a patch submitted to -stable fixing (or at least working
> > around) this issue: https://www.spinics.net/lists/stable/msg340095.html
> > 
> > All that is left, presumably, is for it to be actually pulled into a 5.3.x
> > release.
> 
> > What is the impact of this issue apart from the warning in the log?

If this were just the WARNING, I probably wouldn't have even noticed it in the first place. But the WARNING is just an early sign that the bug is present, before things end up in a BUG later on.

As I reported initially, after the first (mostly harmless) WARNING, things run fine for a while -- maybe even a few days -- before another WARNING followed by a BUG happens:

	Oct 07 01:33:15 laptop kernel: ------------[ cut here ]------------
	Oct 07 01:33:15 laptop kernel: kernel BUG at fs/ext4/inode.c:2721!
	Oct 07 01:33:15 laptop kernel: invalid opcode: 0000 [#1] SMP PTI
	...

This BUG, as the previous commenter notes, causes processes to get stuck in D state and makes it impossible to start new processes. The system quickly becomes unusable and requires a (hard) reboot.
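Since the WARNING shows up well before the BUG, it can be worth checking a saved kernel log for both signatures. The following is a minimal sketch, assuming the log has been captured as text (for example from `dmesg` or `journalctl -k`); the function names and regular expressions are my own and are modelled only on the message shapes quoted in this report.

```python
import re

# Patterns modelled on the two message shapes quoted in this report.
WARN_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+) (\S+)")
BUG_RE = re.compile(r"kernel BUG at ([^\s!]+)!")


def scan_kernel_log(lines):
    """Return (kind, source_location, symbol) tuples for WARNING/BUG lines."""
    events = []
    for line in lines:
        match = WARN_RE.search(line)
        if match:
            events.append(("WARNING", match.group(1), match.group(2)))
            continue
        match = BUG_RE.search(line)
        if match:
            # BUG lines carry only the source location, no symbol.
            events.append(("BUG", match.group(1), None))
    return events


def worst_severity(events):
    """Summarise a scan: 'BUG' outranks 'WARNING'; None if nothing matched."""
    kinds = {kind for kind, _, _ in events}
    if "BUG" in kinds:
        return "BUG"
    if "WARNING" in kinds:
        return "WARNING"
    return None


if __name__ == "__main__":
    sample = [
        "Oct 03 04:39:51 laptop kernel: WARNING: CPU: 5 PID: 198 at "
        "fs/ext4/inode.c:3941 ext4_set_page_dirty+0x3e/0x50",
        "Oct 07 01:33:15 laptop kernel: kernel BUG at fs/ext4/inode.c:2721!",
    ]
    for event in scan_kernel_log(sample):
        print(event)
```

A result of "WARNING" on an affected kernel would suggest rebooting into a fixed or older kernel before the BUG has a chance to occur.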

Comment 9 Antonio Larrosa 2019-11-21 10:56:05 UTC

Just for the record, I've been running a 5.3.11 kernel with the patch Robert Holmes mentioned in #c6 applied (with the diff context slightly modified to apply correctly) and so far (over 24 hours later), the system is still running fine and stable with no warning or bug in dmesg.

Comment 10 Robert Holmes 2019-11-29 09:19:38 UTC

Kernel 5.3.14 has now been released with the fix, and 5.4.x already carries it. I have been testing the patch for about a week now, and haven't been able to trigger any WARNING/BUG so far. So this issue can probably be closed soon.
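For anyone checking a machine against the versions mentioned in this thread, here is a rough sketch. It assumes only what was reported here: the problem appeared with 5.3, the fix landed in 5.3.14, and 5.4.x carries it. The function names are hypothetical, and distribution kernels may backport or revert patches, so the result is a hint rather than a guarantee.

```python
import re


def parse_release(release):
    """Extract (major, minor, patch) from a string such as '5.3.8-200.fc30.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def classify(release):
    """Classify a kernel release using only the versions named in this thread."""
    version = parse_release(release)
    if version < (5, 3, 0):
        return "predates"   # older than the first affected series reported here
    if version < (5, 3, 14):
        return "affected"   # 5.3.0 through 5.3.13
    return "fixed"          # 5.3.14 and later, including 5.4.x


if __name__ == "__main__":
    import platform
    running = platform.release()
    print(running, "->", classify(running))
```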

Comment 11 Martin Peres 2019-11-29 19:40:24 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/intel/issues/509.