Bug 111159 - [CI][DRMTIP] igt@gem_exec_balancer@nop - dmesg-warn - list_del corruption
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2019-07-17 10:35 UTC by Lakshmi
Modified: 2019-07-19 11:57 UTC
CC List: 1 user

See Also:
i915 platform: ICL
i915 features: GEM/Other


Description Lakshmi 2019-07-17 10:35:40 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_323/fi-icl-u3/igt@gem_exec_balancer@nop.html

<6> [865.857143] Console: switching to colour dummy device 80x25
<6> [865.857180] [IGT] gem_exec_balancer: executing
<5> [865.871583] Setting dangerous option reset - tainting kernel
<6> [865.874089] [IGT] gem_exec_balancer: starting subtest nop
<4> [893.973537] ------------[ cut here ]------------
<4> [893.973566] list_del corruption. prev->next should be ffff993d38943868, but was ffff993d38a9b868
<4> [893.973597] WARNING: CPU: 2 PID: 1171 at lib/list_debug.c:53 __list_del_entry_valid+0x79/0x90
<4> [893.973599] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp i915 x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul btusb btrtl snd_hda_intel ghash_clmulni_intel btbcm snd_hda_codec btintel snd_hwdep snd_hda_core bluetooth e1000e cdc_ether usbnet snd_pcm mii ptp ecdh_generic pps_core ecc mei_me mei prime_numbers
<4> [893.973629] CPU: 2 PID: 1171 Comm: kworker/u16:0 Tainted: G     U            5.2.0-gf870335815ab-drmtip_323+ #1
<4> [893.973630] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019
<4> [893.973672] Workqueue: i915 retire_work_handler [i915]
<4> [893.973687] RIP: 0010:__list_del_entry_valid+0x79/0x90
<4> [893.973689] Code: 0b 31 c0 c3 48 89 fe 48 c7 c7 f8 35 0a 88 e8 2e af bd ff 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 30 36 0a 88 e8 17 af bd ff <0f> 0b 31 c0 c3 48 c7 c7 70 36 0a 88 e8 06 af bd ff 0f 0b 31 c0 c3
<4> [893.973691] RSP: 0018:ffffa112001ebda0 EFLAGS: 00010086
<4> [893.973693] RAX: 0000000000000000 RBX: ffff993d52590e80 RCX: 0000000000000000
<4> [893.973694] RDX: 0000000000000007 RSI: 0000000000000000 RDI: 00000000ffffffff
<4> [893.973696] RBP: ffff993d3843a2b0 R08: 0000000000000000 R09: 0000000000000001
<4> [893.973697] R10: ffffa112001ebd28 R11: 0000000000000000 R12: ffff993d38943848
<4> [893.973699] R13: ffff993d38943868 R14: ffff993d52590ee0 R15: 0000000000000000
<4> [893.973701] FS:  0000000000000000(0000) GS:ffff993d5fd00000(0000) knlGS:0000000000000000
<4> [893.973703] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [893.973704] CR2: 00007f64d98ce778 CR3: 000000015f210005 CR4: 0000000000760ee0
<4> [893.973706] PKRU: 55555554
<4> [893.973708] Call Trace:
<4> [893.973735]  i915_request_cancel_breadcrumb+0x14f/0x180 [i915]
<4> [893.973778]  i915_request_retire+0x562/0x840 [i915]
<4> [893.973821]  ring_retire_requests+0x47/0x50 [i915]
<4> [893.973829]  i915_retire_requests+0x57/0xc0 [i915]
<4> [893.973829]  retire_work_handler+0x27/0x60 [i915]
<4> [893.973829]  process_one_work+0x245/0x610
<4> [893.973829]  worker_thread+0x37/0x380
<4> [893.973829]  ? process_one_work+0x610/0x610
<4> [893.973829]  kthread+0x119/0x130
<4> [893.973829]  ? kthread_park+0x80/0x80
<4> [893.973829]  ret_from_fork+0x24/0x50
<4> [893.973829] irq event stamp: 225410
<4> [893.973829] hardirqs last  enabled at (225409): [<ffffffff87237708>] __slab_free+0x3e8/0x4f0
<4> [893.973829] hardirqs last disabled at (225410): [<ffffffffc0507389>] i915_request_retire+0x249/0x840 [i915]
<4> [893.973829] softirqs last  enabled at (224652): [<ffffffff872940a4>] wb_workfn+0x4c4/0x5f0
<4> [893.973829] softirqs last disabled at (224648): [<ffffffff871e9609>] wb_wakeup_delayed+0x29/0x60
<4> [893.973829] WARNING: CPU: 2 PID: 1171 at lib/list_debug.c:53 __list_del_entry_valid+0x79/0x90
<4> [893.973829] ---[ end trace 3b4e2fba04d7e206 ]---
<6> [903.633639] [IGT] gem_exec_balancer: exiting, ret=0
<5> [903.634058] Setting dangerous option reset - tainting kernel
<6> [903.647122] Console: switching to colour frame buffer device 240x75
Comment 1 CI Bug Log 2019-07-17 10:36:33 UTC
The CI Bug Log issue associated with this bug has been updated.

### New filters associated

* ICL: igt@gem_exec_balancer@nop - dmesg-warn - list_del corruption
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_323/fi-icl-u3/igt@gem_exec_balancer@nop.html
Comment 2 Chris Wilson 2019-07-17 10:46:42 UTC
https://patchwork.freedesktop.org/series/63712/
Comment 3 Chris Wilson 2019-07-19 11:57:29 UTC
commit 7d6b60dbc6a015dbdc444e4d39549600f7156690 (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jul 16 13:49:30 2019 +0100

    drm/i915/execlists: Cancel breadcrumb on preempting the virtual engine
    
    As we unwind the requests for a preemption event, we return a virtual
    request back to its original virtual engine (so that it is available for
    execution on any of its siblings). In the process, this means that its
    breadcrumb should no longer be associated with the original physical
    engine, and so we are forced to decouple it. Previously, as the request
    could not complete without our awareness, we would move it to the next
    real engine without any danger. However, preempt-to-busy allowed for
    requests to continue on the HW and complete in the background as we
    unwound, which meant that we could end up retiring the request before
    fixing up the breadcrumb link.
    
    [51679.517943] INFO: trying to register non-static key.
    [51679.517956] the code is fine but needs lockdep annotation.
    [51679.517960] turning off the locking correctness validator.
    [51679.517966] CPU: 0 PID: 3270 Comm: kworker/u8:0 Tainted: G     U            5.2.0+ #717
    [51679.517971] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017
    [51679.518012] Workqueue: i915 retire_work_handler [i915]
    [51679.518017] Call Trace:
    [51679.518026]  dump_stack+0x67/0x90
    [51679.518031]  register_lock_class+0x52c/0x540
    [51679.518038]  ? find_held_lock+0x2d/0x90
    [51679.518042]  __lock_acquire+0x68/0x1800
    [51679.518047]  ? find_held_lock+0x2d/0x90
    [51679.518073]  ? __i915_sw_fence_complete+0xff/0x1c0 [i915]
    [51679.518079]  lock_acquire+0x90/0x170
    [51679.518105]  ? i915_request_cancel_breadcrumb+0x29/0x160 [i915]
    [51679.518112]  _raw_spin_lock+0x27/0x40
    [51679.518138]  ? i915_request_cancel_breadcrumb+0x29/0x160 [i915]
    [51679.518165]  i915_request_cancel_breadcrumb+0x29/0x160 [i915]
    [51679.518199]  i915_request_retire+0x43f/0x530 [i915]
    [51679.518232]  retire_requests+0x4d/0x60 [i915]
    [51679.518263]  i915_retire_requests+0xdf/0x1f0 [i915]
    [51679.518294]  retire_work_handler+0x4c/0x60 [i915]
    [51679.518301]  process_one_work+0x22c/0x5c0
    [51679.518307]  worker_thread+0x37/0x390
    [51679.518311]  ? process_one_work+0x5c0/0x5c0
    [51679.518316]  kthread+0x116/0x130
    [51679.518320]  ? kthread_create_on_node+0x40/0x40
    [51679.518325]  ret_from_fork+0x24/0x30
    [51679.520177] ------------[ cut here ]------------
    [51679.520189] list_del corruption, ffff88883675e2f0->next is LIST_POISON1 (dead000000000100)
    
    Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190716124931.5870-4-chris@chris-wilson.co.uk
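
Note on the two list_del messages quoted in this report: both come from the kernel's list debugging check (`__list_del_entry_valid` in lib/list_debug.c), which refuses to unlink an entry whose neighbours no longer point back at it, or whose links were already poisoned by an earlier delete. The following is a minimal userspace sketch of that idea, not the kernel code and not the i915 fix. The helper names (`list_init`, `list_del_entry_valid`, `list_del_checked`) and the `poison1`/`poison2` sentinels are stand-ins chosen for illustration, and the "entry re-added to a second list without being unlinked first" scenario is only an assumed way to produce a stale link, loosely analogous to a breadcrumb left attached to the wrong engine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Circular doubly-linked list node, shaped like the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Stand-ins for LIST_POISON1/LIST_POISON2 (assumption: the kernel uses
 * fixed non-canonical addresses; here we use two static objects). */
static struct list_head poison1, poison2;

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Return false if unlinking 'entry' would act on inconsistent links. */
static bool list_del_entry_valid(struct list_head *entry)
{
	if (entry->next == &poison1) {
		fprintf(stderr, "list_del corruption, %p->next is poisoned\n",
			(void *)entry);
		return false;
	}
	if (entry->prev->next != entry) {
		fprintf(stderr,
			"list_del corruption. prev->next should be %p, but was %p\n",
			(void *)entry, (void *)entry->prev->next);
		return false;
	}
	if (entry->next->prev != entry) {
		fprintf(stderr,
			"list_del corruption. next->prev should be %p, but was %p\n",
			(void *)entry, (void *)entry->next->prev);
		return false;
	}
	return true;
}

/* Unlink 'entry' only if the check passes, then poison its links. */
static bool list_del_checked(struct list_head *entry)
{
	if (!list_del_entry_valid(entry))
		return false;
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	entry->next = &poison1;
	entry->prev = &poison2;
	return true;
}

#ifdef LIST_DEMO_MAIN
int main(void)
{
	struct list_head engine_a, engine_b, rq1, rq2;

	list_init(&engine_a);
	list_init(&engine_b);
	list_add_tail(&rq1, &engine_a);
	list_add_tail(&rq2, &engine_a);

	/* rq1 is moved to another list without being unlinked first. */
	list_add_tail(&rq1, &engine_b);

	/* rq2's predecessor (rq1) no longer points at rq2: check fails. */
	assert(!list_del_checked(&rq2));

	/* Deleting rq1 twice trips the poison check on the second attempt. */
	assert(list_del_checked(&rq1));
	assert(!list_del_checked(&rq1));
	return 0;
}
#endif
```

Built with `-DLIST_DEMO_MAIN`, the demo prints one message of each kind on stderr. The first has the same shape as the "prev->next should be ..., but was ..." warning in the description, the second corresponds to the LIST_POISON1 variant quoted in the commit message.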

