Bug 112202 - Kernel bug in i915: list_del corruption in drm_mm_scan_add_block, i915_gem_evict_something, intel_atomic_commit with 5.3.8 on Fedora 31
Status: NEEDINFO
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged
Keywords:
Depends on:
Blocks:
 
Reported: 2019-11-04 02:05 UTC by Arcadiy Ivanov
Modified: 2019-11-05 20:59 UTC

See Also:
i915 platform: CFL
i915 features: display/atomic


Description Arcadiy Ivanov 2019-11-04 02:05:37 UTC
During a video conference call, X froze and became unresponsive to mouse and keyboard. An ACPI shutdown via the power button was attempted but did not succeed.

The following kernel message was found in the logs after hard power-off and reboot:


Nov 03 20:45:28 hostname kernel: list_del corruption. prev->next should be ffff92e80b0cbc20, but was ffff92f26ab956a0
Nov 03 20:45:28 hostname kernel: ------------[ cut here ]------------
Nov 03 20:45:28 hostname kernel: kernel BUG at lib/list_debug.c:51!
Nov 03 20:45:28 hostname kernel: invalid opcode: 0000 [#1] SMP NOPTI
Nov 03 20:45:28 hostname kernel: CPU: 2 PID: 1820 Comm: Xorg Tainted: P     U  W  OE     5.3.8-300.fc31.x86_64 #1
Nov 03 20:45:28 hostname kernel: Hardware name: Dell Inc. Precision 5540/0V030K, BIOS 1.3.3 09/25/2019
Nov 03 20:45:28 hostname kernel: RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
Nov 03 20:45:28 hostname kernel: Code: 5c 18 b0 e8 c4 db c5 ff 0f 0b 48 c7 c7 30 5d 18 b0 e8 b6 db c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 f0 5c 18 b0 e8 a2 db c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 b8 5c 18 b0 e8 8e db c5 ff 0f 0b
Nov 03 20:45:28 hostname kernel: RSP: 0018:ffffaf40412938f8 EFLAGS: 00010246
Nov 03 20:45:28 hostname kernel: RAX: 0000000000000054 RBX: ffffaf4041293978 RCX: 0000000000000000
Nov 03 20:45:28 hostname kernel: RDX: 0000000000000000 RSI: ffff92f27c297908 RDI: ffff92f27c297908
Nov 03 20:45:28 hostname kernel: RBP: ffff92e80b0cbc00 R08: ffff92f27c297908 R09: 0000000000000ca5
Nov 03 20:45:28 hostname kernel: R10: ffffaf40412937b0 R11: ffffaf40412937b6 R12: ffff92e299136320
Nov 03 20:45:28 hostname kernel: R13: ffff92f268df5360 R14: ffff92e80b0cbc00 R15: ffff92f268df5330
Nov 03 20:45:28 hostname kernel: FS:  00007f5ea565bf00(0000) GS:ffff92f27c280000(0000) knlGS:0000000000000000
Nov 03 20:45:28 hostname kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 03 20:45:28 hostname kernel: CR2: 0000165f53ec3000 CR3: 0000001014142002 CR4: 00000000003606e0
Nov 03 20:45:28 hostname kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 03 20:45:28 hostname kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 03 20:45:28 hostname kernel: Call Trace:
Nov 03 20:45:28 hostname kernel:  drm_mm_scan_add_block+0x44/0x160 [drm]
Nov 03 20:45:28 hostname kernel:  i915_gem_evict_something+0x214/0x4c0 [i915]
Nov 03 20:45:28 hostname kernel:  i915_gem_gtt_insert+0x16d/0x250 [i915]
Nov 03 20:45:28 hostname kernel:  __i915_vma_do_pin+0x31c/0x490 [i915]
Nov 03 20:45:28 hostname kernel:  i915_gem_object_ggtt_pin+0x126/0x170 [i915]
Nov 03 20:45:28 hostname kernel:  i915_gem_object_pin_to_display_plane+0xb8/0x100 [i915]
Nov 03 20:45:28 hostname kernel:  intel_pin_and_fence_fb_obj+0xac/0x180 [i915]
Nov 03 20:45:28 hostname kernel:  intel_plane_pin_fb+0x44/0xd0 [i915]
Nov 03 20:45:28 hostname kernel:  intel_prepare_plane_fb+0x16b/0x2b0 [i915]
Nov 03 20:45:28 hostname kernel:  drm_atomic_helper_prepare_planes+0x87/0x110 [drm_kms_helper]
Nov 03 20:45:28 hostname kernel:  intel_atomic_commit+0xb9/0x2d0 [i915]
Nov 03 20:45:28 hostname kernel:  drm_atomic_helper_page_flip+0x5d/0x90 [drm_kms_helper]
Nov 03 20:45:28 hostname kernel:  ? drm_event_reserve_init+0x4c/0x60 [drm]
Nov 03 20:45:28 hostname kernel:  drm_mode_page_flip_ioctl+0x54b/0x5d0 [drm]
Nov 03 20:45:28 hostname kernel:  ? drm_mode_cursor2_ioctl+0x10/0x10 [drm]
Nov 03 20:45:28 hostname kernel:  drm_ioctl_kernel+0xaa/0xf0 [drm]
Nov 03 20:45:28 hostname kernel:  drm_ioctl+0x208/0x390 [drm]
Nov 03 20:45:28 hostname kernel:  ? drm_mode_cursor2_ioctl+0x10/0x10 [drm]
Nov 03 20:45:28 hostname kernel:  ? __hrtimer_init+0xb0/0xb0
Nov 03 20:45:28 hostname kernel:  do_vfs_ioctl+0x405/0x660
Nov 03 20:45:28 hostname kernel:  ? do_setitimer+0xd8/0x220
Nov 03 20:45:28 hostname kernel:  ? wake_up_q+0x60/0x60
Nov 03 20:45:28 hostname kernel:  ksys_ioctl+0x5e/0x90
Nov 03 20:45:28 hostname kernel:  __x64_sys_ioctl+0x16/0x20
Nov 03 20:45:28 hostname kernel:  do_syscall_64+0x5f/0x1a0
Nov 03 20:45:28 hostname kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nov 03 20:45:28 hostname kernel: RIP: 0033:0x7f5ea5ad22fb
Nov 03 20:45:28 hostname kernel: Code: 0f 1e fa 48 8b 05 8d 9b 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 5d 9b 0c 00 f7 d8 64 89 01 48
Nov 03 20:45:28 hostname kernel: RSP: 002b:00007ffd90e21df8 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
Nov 03 20:45:28 hostname kernel: RAX: ffffffffffffffda RBX: 00007ffd90e21eb0 RCX: 00007f5ea5ad22fb
Nov 03 20:45:28 hostname kernel: RDX: 00007ffd90e21eb0 RSI: 00000000c01864b0 RDI: 000000000000000f
Nov 03 20:45:28 hostname kernel: RBP: 00000000c01864b0 R08: 0000000000000ac8 R09: 00007f5ea185b4d0
Nov 03 20:45:28 hostname kernel: R10: 0000000000000000 R11: 0000000000003246 R12: 0000000000000000
Nov 03 20:45:28 hostname kernel: R13: 000000000000000f R14: 000055a3d662d6b0 R15: 000055a3d52c7d90
Nov 03 20:45:28 hostname kernel: Modules linked in: tun ipmi_devintf ipmi_msghandler cdc_ether usbnet dm_crypt loop ccm rfcomm nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack>
Nov 03 20:45:28 hostname kernel:  snd_soc_sst_ipc snd_soc_sst_dsp snd_soc_acpi_intel_match snd_hda_codec_realtek snd_soc_acpi x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_generic snd_soc_core kvm_intel snd_compress ac97_bus snd_pcm_dmaengine i>
Nov 03 20:45:28 hostname kernel:  binfmt_misc ip_tables xfs libcrc32c uas usb_storage i915 rtsx_pci_sdmmc i2c_algo_bit mmc_core drm_kms_helper nvme drm mxm_wmi r8152 crc32c_intel nvme_core rtsx_pci serio_raw mii i2c_hid pinctrl_cannonlake video wmi pinctrl_in>
Nov 03 20:45:28 hostname kernel: ---[ end trace 7c48767ab7592c98 ]---
Nov 03 20:45:28 hostname kernel: RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
Nov 03 20:45:28 hostname kernel: Code: 5c 18 b0 e8 c4 db c5 ff 0f 0b 48 c7 c7 30 5d 18 b0 e8 b6 db c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 f0 5c 18 b0 e8 a2 db c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 b8 5c 18 b0 e8 8e db c5 ff 0f 0b
Nov 03 20:45:28 hostname kernel: RSP: 0018:ffffaf40412938f8 EFLAGS: 00010246
Nov 03 20:45:28 hostname kernel: RAX: 0000000000000054 RBX: ffffaf4041293978 RCX: 0000000000000000
Nov 03 20:45:28 hostname kernel: RDX: 0000000000000000 RSI: ffff92f27c297908 RDI: ffff92f27c297908
Nov 03 20:45:28 hostname kernel: RBP: ffff92e80b0cbc00 R08: ffff92f27c297908 R09: 0000000000000ca5
Nov 03 20:45:28 hostname kernel: R10: ffffaf40412937b0 R11: ffffaf40412937b6 R12: ffff92e299136320
Nov 03 20:45:28 hostname kernel: R13: ffff92f268df5360 R14: ffff92e80b0cbc00 R15: ffff92f268df5330
Nov 03 20:45:28 hostname kernel: FS:  00007f5ea565bf00(0000) GS:ffff92f27c280000(0000) knlGS:0000000000000000
Nov 03 20:45:28 hostname kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 03 20:45:28 hostname kernel: CR2: 0000165f53ec3000 CR3: 0000001014142002 CR4: 00000000003606e0
Nov 03 20:45:28 hostname kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 03 20:45:28 hostname kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Comment 1 Arcadiy Ivanov 2019-11-04 02:08:01 UTC
I will not attempt to reproduce this with drm-git, as it was a one-off that occurred randomly.

If a reproducible pattern emerges, I'll report it and retest with drm-git.
Comment 2 Lakshmi 2019-11-05 13:10:03 UTC
This issue could be a duplicate of Bug 111695.
I recommend verifying the issue with the latest kernel (drm-tip (https://cgit.freedesktop.org/drm-tip)).

Another option is to verify the issue with the patch at https://bugs.freedesktop.org/show_bug.cgi?id=111695#c2

If the issue persists with drm-tip or with the recommended patch, please attach the full dmesg from boot with the kernel parameters drm.debug=0x1e log_buf_len=4M.
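
For reference, drm.debug is a bitmask of debug categories. The sketch below decodes a value such as 0x1e into category names. The bit assignments are the ones I recall from include/drm/drm_print.h in kernels of this generation and should be checked against the tree actually in use; the table may not be complete, and the helper name drm_debug_describe is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One debug category bit and its short name (DRM_UT_* without the prefix). */
struct drm_debug_bit {
    unsigned int mask;
    const char *name;
};

/* Assumed bit layout; verify against include/drm/drm_print.h. */
static const struct drm_debug_bit drm_debug_bits[] = {
    { 0x01, "CORE" },
    { 0x02, "DRIVER" },
    { 0x04, "KMS" },
    { 0x08, "PRIME" },
    { 0x10, "ATOMIC" },
    { 0x20, "VBL" },
    { 0x40, "STATE" },
    { 0x80, "LEASE" },
};

/*
 * Hypothetical helper: writes the space-separated names of all categories
 * enabled in `value` into `buf` and returns how many names were written.
 * Stops early if `buf` is too small for the next name.
 */
static int drm_debug_describe(unsigned int value, char *buf, size_t len)
{
    size_t used = 0;
    int count = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(drm_debug_bits) / sizeof(drm_debug_bits[0]); i++) {
        if (!(value & drm_debug_bits[i].mask))
            continue;

        int n = snprintf(buf + used, len - used, "%s%s",
                         count ? " " : "", drm_debug_bits[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';   /* drop the partially written name */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Calling drm_debug_describe(0x1e, buf, sizeof(buf)) with a 64-byte buffer fills it with "DRIVER KMS PRIME ATOMIC", so under the assumed layout the requested value enables driver, KMS, PRIME and atomic messages but leaves the core category off.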
Comment 3 Arcadiy Ivanov 2019-11-05 20:59:19 UTC
Does it matter that the stack traces look completely different and that 

> list_del corruption. prev->next should be ffff92e80b0cbc20, but was ffff92f26ab956a0

in this issue vs 

> list_add corruption. prev->next should be next (ffff8883f931a1f8), but was dead000000000100. (prev=ffff888361ffa610)

in test?

Is the "dead" address an artifact of the test environment that would not occur in vivo?
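
As background to the question, here is a small userspace model of the list debugging checks, as far as I understand them. It is not the kernel's code, and all names in it are illustrative. The one fact it leans on is that list_del() overwrites the removed entry's pointers with poison values, and that on x86_64 LIST_POISON1 is 0xdead000000000100. A "dead..." address in such a message would therefore mean that a neighbouring entry had already been deleted, rather than being something specific to a test machine. The model assumes 64-bit pointers and reuses a single poison constant for both pointers, whereas the kernel uses a second, distinct value for prev.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal doubly linked list node, mirroring the shape of struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* x86_64 value of LIST_POISON1; never dereferenced in this model. */
#define POISON1 ((struct list_head *)(uintptr_t)0xdead000000000100ULL)

/* Outcome of the pre-delete sanity check (illustrative names). */
enum list_check {
    LIST_OK,
    LIST_ENTRY_POISONED,      /* entry was already deleted */
    LIST_PREV_NEXT_MISMATCH,  /* "prev->next should be X, but was Y" */
    LIST_NEXT_PREV_MISMATCH,  /* "next->prev should be X, but was Y" */
};

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert `entry` directly after `head`. */
static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Check that both neighbours still point back at `entry`. */
static enum list_check list_del_check(const struct list_head *entry)
{
    if (entry->next == POISON1)
        return LIST_ENTRY_POISONED;
    if (entry->prev->next != entry)
        return LIST_PREV_NEXT_MISMATCH;
    if (entry->next->prev != entry)
        return LIST_NEXT_PREV_MISMATCH;
    return LIST_OK;
}

/* Unlink `entry` and poison its pointers; returns -1 if the check fails. */
static int list_del(struct list_head *entry)
{
    if (list_del_check(entry) != LIST_OK)
        return -1;
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    entry->next = POISON1;
    entry->prev = POISON1;  /* simplification: kernel uses LIST_POISON2 here */
    return 0;
}
```

In this model, the message from this bug corresponds to LIST_PREV_NEXT_MISMATCH: the previous entry's next pointer holds an ordinary-looking address that is not the entry being removed. A poisoned pointer, by contrast, shows up only after list_del() has already run on the entry in question.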

