Bug 106352

Summary: [CI] igt@gem_exec_suspend@basic-s4-devices - dmesg-warn - list_del corruption, ffffea0004070420->next is LIST_POISON1 (dead000000000100)
Product: DRI
Reporter: Martin Peres <martin.peres>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED NOTOURBUG
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: intel-gfx-bugs
Version: XOrg git
Hardware: Other
OS: All
Whiteboard: ReadyForDev
i915 platform: IVB
i915 features: GEM/Other

Description Martin Peres 2018-05-02 12:13:49 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4120/fi-ivb-3520m/igt@gem_exec_suspend@basic-s4-devices.html

[  195.271754] ------------[ cut here ]------------
[  195.271755] list_del corruption, ffffea0004070420->next is LIST_POISON1 (dead000000000100)
[  195.271772] WARNING: CPU: 0 PID: 130 at lib/list_debug.c:47 __list_del_entry_valid+0x4e/0x90
[  195.271773] Modules linked in: vgem snd_hda_codec_hdmi btusb btrtl btbcm btintel snd_hda_codec_realtek snd_hda_codec_generic bluetooth ecdh_generic cdc_ncm usbnet mii x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i915 snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm e1000e lpc_ich mei_me prime_numbers mei
[  195.271796] CPU: 0 PID: 130 Comm: kworker/u16:5 Tainted: G    BU  W         4.17.0-rc3-CI-CI_DRM_4120+ #1
[  195.271797] Hardware name: LENOVO 2356GCG/2356GCG, BIOS G7ET31WW (1.13 ) 07/02/2012
[  195.271799] Workqueue: events_unbound async_run_entry_fn
[  195.271801] RIP: 0010:__list_del_entry_valid+0x4e/0x90
[  195.271802] RSP: 0018:ffffc900011d3b68 EFLAGS: 00010082
[  195.271803] RAX: 0000000000000000 RBX: ffffea0004070400 RCX: 0000000000000002
[  195.271804] RDX: 0000000080000002 RSI: 0000000000000001 RDI: 00000000ffffffff
[  195.271805] RBP: ffffc900011d3c70 R08: 0000000000000000 R09: 0000000000000000
[  195.271806] R10: ffffc900011d3af8 R11: ffffffff82243d38 R12: ffffea0004070400
[  195.271807] R13: 0000000000000000 R14: ffff8801394093c0 R15: ffff880139403440
[  195.271808] FS:  0000000000000000(0000) GS:ffff88013e200000(0000) knlGS:0000000000000000
[  195.271809] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  195.271810] CR2: 00007fd5a5673d18 CR3: 0000000005210004 CR4: 00000000001606f0
[  195.271810] Call Trace:
[  195.271815]  remove_full.isra.7.part.8+0x17/0x60
[  195.271817]  __slab_free+0x3fb/0x580
[  195.271822]  ? _raw_spin_unlock_irqrestore+0x39/0x60
[  195.271823]  ? debug_check_no_obj_freed+0x132/0x210
[  195.271828]  ? device_release+0x2b/0x80
[  195.271830]  ? pci_pm_thaw+0x80/0x80
[  195.271831]  ? pci_pm_thaw+0x80/0x80
[  195.271832]  device_release+0x2b/0x80
[  195.271835]  kobject_put+0x85/0x1a0
[  195.271841]  mei_cl_bus_remove_devices+0x30/0x50 [mei]
[  195.271845]  mei_stop+0x35/0xb0 [mei]
[  195.271847]  mei_me_pci_suspend+0x27/0x80 [mei_me]
[  195.271849]  pci_pm_freeze+0x50/0xc0
[  195.271852]  dpm_run_callback+0x5d/0x2f0
[  195.271854]  __device_suspend+0x11f/0x600
[  195.271856]  ? dpm_watchdog_set+0x60/0x60
[  195.271859]  async_suspend+0x15/0x90
[  195.271861]  async_run_entry_fn+0x34/0x160
[  195.271863]  process_one_work+0x229/0x6a0
[  195.271866]  worker_thread+0x35/0x380
[  195.271869]  ? process_one_work+0x6a0/0x6a0
[  195.271870]  kthread+0x119/0x130
[  195.271872]  ? _kthread_create_on_node+0x60/0x60
[  195.271874]  ret_from_fork+0x3a/0x50
[  195.271877] Code: 74 2e 48 8b 32 48 39 fe 75 3a 48 8b 50 08 48 39 f2 75 48 b8 01 00 00 00 c3 48 89 fe 48 89 c2 48 c7 c7 88 15 0d 82 e8 f2 0e bf ff <0f> 0b 31 c0 c3 48 89 fe 48 c7 c7 c0 15 0d 82 e8 de 0e bf ff 0f 
[  195.271912] irq event stamp: 2149676
[  195.271914] hardirqs last  enabled at (2149675): [<ffffffff8192f9b4>] _raw_spin_unlock_irq+0x24/0x50
[  195.271916] hardirqs last disabled at (2149676): [<ffffffff8192845a>] __schedule+0xaa/0xbe0
[  195.271918] softirqs last  enabled at (2149440): [<ffffffff81c0032b>] __do_softirq+0x32b/0x4e1
[  195.271920] softirqs last disabled at (2149419): [<ffffffff8108b904>] irq_exit+0xa4/0xb0
[  195.271922] WARNING: CPU: 0 PID: 130 at lib/list_debug.c:47 __list_del_entry_valid+0x4e/0x90
[  195.271923] ---[ end trace c8d0b1eee9a6bdbb ]---
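
For readers unfamiliar with the check that fired: the kernel's list_del() overwrites the removed entry's next and prev pointers with poison values, and with list debugging enabled a later deletion warns if it finds next already set to LIST_POISON1. The sketch below is a simplified userspace model of that idea, not the kernel's code. `struct node`, `node_del()` and the `FAKE_POISON*` constants are hypothetical names, and the poison values are small placeholders rather than the x86-64 value seen in the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical doubly linked list node, modelled on a circular list. */
struct node {
    struct node *next;
    struct node *prev;
};

/* Placeholder poison values (assumption: any never-valid address works). */
#define FAKE_POISON1 ((struct node *)(uintptr_t)0x100)
#define FAKE_POISON2 ((struct node *)(uintptr_t)0x200)

/* An empty list is a head that points at itself. */
static void node_init(struct node *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert n right after head. */
static void node_add(struct node *n, struct node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Returns 1 if n looks like a live list member, 0 otherwise.
 * The poison checks come first so poisoned pointers are never followed. */
static int node_del_valid(const struct node *n)
{
    if (n->next == FAKE_POISON1 || n->prev == FAKE_POISON2)
        return 0;               /* already deleted once */
    if (n->prev->next != n || n->next->prev != n)
        return 0;               /* neighbours do not point back at n */
    return 1;
}

/* Unlink n and poison it. Returns 1 on success, 0 if the check failed
 * (the point at which the kernel would print its warning). */
static int node_del(struct node *n)
{
    assert(n != NULL);
    if (!node_del_valid(n))
        return 0;
    n->next->prev = n->prev;
    n->prev->next = n->next;
    n->next = FAKE_POISON1;
    n->prev = FAKE_POISON2;
    return 1;
}
```

In this model, deleting the same node twice makes the second `node_del()` fail on the poison check. A warning of this kind is therefore consistent with an entry that was already removed, or whose memory holds stale contents.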
Comment 1 Chris Wilson 2018-05-02 12:17:54 UTC
Unless KASAN gives any reason to believe otherwise, this just looks like more fallout from the USB memory corruption.
Comment 2 Chris Wilson 2018-05-11 14:29:57 UTC
commit 44a182b9d17765514fa2b1cc911e4e65134eef93
Author: Mathias Nyman <mathias.nyman@linux.intel.com>
Date:   Thu May 3 17:30:07 2018 +0300

    xhci: Fix use-after-free in xhci_free_virt_device
    
    KASAN found a use-after-free in xhci_free_virt_device+0x33b/0x38e
    where xhci_free_virt_device() sets slot id to 0 if udev exists:
    if (dev->udev && dev->udev->slot_id)
            dev->udev->slot_id = 0;
    
    dev->udev will be true even if udev is freed because dev->udev is
    not set to NULL.
    
    set dev->udev pointer to NULL in xhci_free_dev()
    
    The original patch went to stable so this fix needs to be applied
    there as well.
    
    Fixes: a400efe455f7 ("xhci: zero usb device slot_id member when disabling and freeing a xhci slot")
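
The pattern described in the commit message can be illustrated with a small userspace sketch. This is not the xhci driver code: `struct fake_udev`, `struct fake_virt_dev`, `fake_free_dev()` and `fake_free_virt_device()` are hypothetical stand-ins, and the model assumes for simplicity that the same function both frees the device and clears the pointer. It only shows why a non-NULL test cannot detect a freed object, and how clearing the pointer makes the quoted check safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the USB device; not the real kernel type. */
struct fake_udev {
    int slot_id;
};

/* Hypothetical stand-in for the per-slot driver state. */
struct fake_virt_dev {
    struct fake_udev *udev;     /* back-pointer that can outlive its target */
};

/* Models releasing the device. The last line mirrors the idea of the fix:
 * without it, dev->udev would still be non-NULL but point at freed memory. */
static void fake_free_dev(struct fake_virt_dev *dev)
{
    assert(dev != NULL);
    free(dev->udev);
    dev->udev = NULL;
}

/* Models the check quoted in the commit message. It is only safe when
 * dev->udev is either valid or NULL. Returns 1 if slot_id was cleared. */
static int fake_free_virt_device(struct fake_virt_dev *dev)
{
    assert(dev != NULL);
    if (dev->udev && dev->udev->slot_id) {
        dev->udev->slot_id = 0;
        return 1;
    }
    return 0;
}
```

With the pointer cleared, a later call to `fake_free_virt_device()` skips the write instead of touching freed memory. If the `dev->udev = NULL` line were removed from the sketch, that later call would be a use-after-free of the kind KASAN reported.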
