Bug 111695 - [CI]igt@i915_selftest@live_execlists - incomplete - list_del corruption. prev->next should be ffff888361ffca10, but was ffff88840ac2c730
Summary: [CI]igt@i915_selftest@live_execlists - incomplete - list_del corruption. prev...
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: XOrg git
Hardware: Other All
: not set not set
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-09-16 11:31 UTC by Lakshmi
Modified: 2019-09-19 11:03 UTC (History)
1 user (show)

See Also:
i915 platform: ICL
i915 features: GEM/Other


Attachments

Description Lakshmi 2019-09-16 11:31:09 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6898/fi-icl-u2/igt@i915_selftest@live_execlists.html

 list_add corruption. prev->next should be next (ffff8883f931a1f8), but was dead000000000100. (prev=ffff888361ffa610).
<4> [876.736136] WARNING: CPU: 2 PID: 21 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70
<4> [876.736137] Modules linked in: i915(+) amdgpu gpu_sched ttm vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_intel_nhlt snd_hda_codec snd_hwdep snd_hda_core ghash_clmulni_intel e1000e cdc_ether usbnet mii snd_pcm ptp pps_core mei_me mei prime_numbers btusb btrtl btbcm btintel bluetooth ecdh_generic ecc [last unloaded: i915]
<4> [876.736154] CPU: 2 PID: 21 Comm: ksoftirqd/2 Tainted: G     U            5.3.0-CI-CI_DRM_6898+ #1
<4> [876.736156] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019
<4> [876.736157] RIP: 0010:__list_add_valid+0x4d/0x70
<4> [876.736159] Code: c3 48 89 d1 48 c7 c7 20 33 0e 82 48 89 c2 e8 4a 4a bc ff 0f 0b 31 c0 c3 48 89 c1 4c 89 c6 48 c7 c7 70 33 0e 82 e8 33 4a bc ff <0f> 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 c0 33 0e 82 e8
<4> [876.736160] RSP: 0018:ffffc9000018bd30 EFLAGS: 00010082
<4> [876.736162] RAX: 0000000000000000 RBX: ffff888361ffc840 RCX: 0000000000000104
<4> [876.736163] RDX: 0000000080000104 RSI: 0000000000000000 RDI: 00000000ffffffff
<4> [876.736164] RBP: ffffc9000018bd68 R08: 0000000000000000 R09: 0000000000000001
<4> [876.736165] R10: 00000000aed95de3 R11: 000000007fe927eb R12: ffff888361ffca10
<4> [876.736166] R13: ffff888361ffa610 R14: ffff888361ffc880 R15: ffff8883f931a1f8
<4> [876.736168] FS:  0000000000000000(0000) GS:ffff88849fd00000(0000) knlGS:0000000000000000
<4> [876.736169] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [876.736170] CR2: 00007f093a9173c0 CR3: 00000003bba08005 CR4: 0000000000760ee0
<4> [876.736171] PKRU: 55555554
<4> [876.736172] Call Trace:
<4> [876.736226]  __i915_request_submit+0x152/0x370 [i915]
<4> [876.736263]  __execlists_submission_tasklet+0x6da/0x1f50 [i915]
<4> [876.736293]  ? execlists_submission_tasklet+0x29/0x50 [i915]
<4> [876.736321]  execlists_submission_tasklet+0x34/0x50 [i915]
<4> [876.736325]  tasklet_action_common.isra.5+0x47/0xb0
<4> [876.736328]  __do_softirq+0xd8/0x4ae
<4> [876.736332]  ? smpboot_thread_fn+0x23/0x280
<4> [876.736334]  ? smpboot_thread_fn+0x6b/0x280
<4> [876.736336]  run_ksoftirqd+0x2b/0x50
<4> [876.736338]  smpboot_thread_fn+0x1d3/0x280
<4> [876.736341]  ? sort_range+0x20/0x20
<4> [876.736343]  kthread+0x119/0x130
<4> [876.736345]  ? kthread_park+0xa0/0xa0
<4> [876.736347]  ret_from_fork+0x24/0x50
<4> [876.736353] irq event stamp: 2290145
<4> [876.736356] hardirqs last  enabled at (2290144): [<ffffffff8123cde8>] __slab_free+0x3e8/0x500
<4> [876.736358] hardirqs last disabled at (2290145): [<ffffffff819cfb4d>] _raw_spin_lock_irqsave+0xd/0x50
<4> [876.736360] softirqs last  enabled at (2290114): [<ffffffff81c0033e>] __do_softirq+0x33e/0x4ae
<4> [876.736361] softirqs last disabled at (2290119): [<ffffffff810b815b>] run_ksoftirqd+0x2b/0x50
<4> [876.736363] WARNING: CPU: 2 PID: 21 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70
<4> [876.736364] ---[ end trace 3e58d6c7356c65bf ]---
Comment 1 CI Bug Log 2019-09-16 11:31:47 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* ICL: igt@i915_selftest@live_execlists - incomplete - list_del corruption. prev-&gt;next should be ffff888361ffca10, but was ffff88840ac2c730
  (No new failures associated)
Comment 2 Chris Wilson 2019-09-16 11:39:36 UTC
https://patchwork.freedesktop.org/series/66753/
Comment 3 Chris Wilson 2019-09-19 11:03:37 UTC
commit 37fa0de3c137d5f54f7e64f53495c9d501d42a4d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Sep 18 15:54:50 2019 +0100

    drm/i915: Verify the engine after acquiring the active.lock
    
    When using virtual engines, the rq->engine is not stable until we hold
    the engine->active.lock (as the virtual engine may be exchanged with the
    sibling). Since commit 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
    we may retire a request concurrently with resubmitting it to HW, we need
    to be extra careful to verify we are holding the correct lock for the
    request's active list. This is similar to the issue we saw with
    rescheduling the virtual requests, see sched_lock_engine().
    
    Or else:
    
    <4> [876.736126] list_add corruption. prev->next should be next (ffff8883f931a1f8), but was dead000000000100. (prev=ffff888361ffa610).
    <4> [876.736136] WARNING: CPU: 2 PID: 21 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70
    <4> [876.736137] Modules linked in: i915(+) amdgpu gpu_sched ttm vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_intel_nhlt snd_hda_codec snd_hwdep snd_hda_core ghash_clmulni_intel e1000e cdc_ether usbnet mii snd_pcm ptp pps_core mei_me mei prime_numbers btusb btrtl btbcm btintel bluetooth ecdh_generic ecc [last unloaded: i915]
    <4> [876.736154] CPU: 2 PID: 21 Comm: ksoftirqd/2 Tainted: G     U            5.3.0-CI-CI_DRM_6898+ #1
    <4> [876.736156] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019
    <4> [876.736157] RIP: 0010:__list_add_valid+0x4d/0x70
    <4> [876.736159] Code: c3 48 89 d1 48 c7 c7 20 33 0e 82 48 89 c2 e8 4a 4a bc ff 0f 0b 31 c0 c3 48 89 c1 4c 89 c6 48 c7 c7 70 33 0e 82 e8 33 4a bc ff <0f> 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 c0 33 0e 82 e8
    <4> [876.736160] RSP: 0018:ffffc9000018bd30 EFLAGS: 00010082
    <4> [876.736162] RAX: 0000000000000000 RBX: ffff888361ffc840 RCX: 0000000000000104
    <4> [876.736163] RDX: 0000000080000104 RSI: 0000000000000000 RDI: 00000000ffffffff
    <4> [876.736164] RBP: ffffc9000018bd68 R08: 0000000000000000 R09: 0000000000000001
    <4> [876.736165] R10: 00000000aed95de3 R11: 000000007fe927eb R12: ffff888361ffca10
    <4> [876.736166] R13: ffff888361ffa610 R14: ffff888361ffc880 R15: ffff8883f931a1f8
    <4> [876.736168] FS:  0000000000000000(0000) GS:ffff88849fd00000(0000) knlGS:0000000000000000
    <4> [876.736169] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    <4> [876.736170] CR2: 00007f093a9173c0 CR3: 00000003bba08005 CR4: 0000000000760ee0
    <4> [876.736171] PKRU: 55555554
    <4> [876.736172] Call Trace:
    <4> [876.736226]  __i915_request_submit+0x152/0x370 [i915]
    <4> [876.736263]  __execlists_submission_tasklet+0x6da/0x1f50 [i915]
    <4> [876.736293]  ? execlists_submission_tasklet+0x29/0x50 [i915]
    <4> [876.736321]  execlists_submission_tasklet+0x34/0x50 [i915]
    <4> [876.736325]  tasklet_action_common.isra.5+0x47/0xb0
    <4> [876.736328]  __do_softirq+0xd8/0x4ae
    <4> [876.736332]  ? smpboot_thread_fn+0x23/0x280
    <4> [876.736334]  ? smpboot_thread_fn+0x6b/0x280
    <4> [876.736336]  run_ksoftirqd+0x2b/0x50
    <4> [876.736338]  smpboot_thread_fn+0x1d3/0x280
    <4> [876.736341]  ? sort_range+0x20/0x20
    <4> [876.736343]  kthread+0x119/0x130
    <4> [876.736345]  ? kthread_park+0xa0/0xa0
    <4> [876.736347]  ret_from_fork+0x24/0x50
    <4> [876.736353] irq event stamp: 2290145
    <4> [876.736356] hardirqs last  enabled at (2290144): [<ffffffff8123cde8>] __slab_free+0x3e8/0x500
    <4> [876.736358] hardirqs last disabled at (2290145): [<ffffffff819cfb4d>] _raw_spin_lock_irqsave+0xd/0x50
    <4> [876.736360] softirqs last  enabled at (2290114): [<ffffffff81c0033e>] __do_softirq+0x33e/0x4ae
    <4> [876.736361] softirqs last disabled at (2290119): [<ffffffff810b815b>] run_ksoftirqd+0x2b/0x50
    <4> [876.736363] WARNING: CPU: 2 PID: 21 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70
    <4> [876.736364] ---[ end trace 3e58d6c7356c65bf ]---
    <4> [876.736406] ------------[ cut here ]------------
    <4> [876.736415] list_del corruption. prev->next should be ffff888361ffca10, but was ffff88840ac2c730
    <4> [876.736421] WARNING: CPU: 2 PID: 5490 at lib/list_debug.c:53 __list_del_entry_valid+0x79/0x90
    <4> [876.736422] Modules linked in: i915(+) amdgpu gpu_sched ttm vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_intel_nhlt snd_hda_codec snd_hwdep snd_hda_core ghash_clmulni_intel e1000e cdc_ether usbnet mii snd_pcm ptp pps_core mei_me mei prime_numbers btusb btrtl btbcm btintel bluetooth ecdh_generic ecc [last unloaded: i915]
    <4> [876.736433] CPU: 2 PID: 5490 Comm: i915_selftest Tainted: G     U  W         5.3.0-CI-CI_DRM_6898+ #1
    <4> [876.736435] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019
    <4> [876.736436] RIP: 0010:__list_del_entry_valid+0x79/0x90
    <4> [876.736438] Code: 0b 31 c0 c3 48 89 fe 48 c7 c7 30 34 0e 82 e8 ae 49 bc ff 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 68 34 0e 82 e8 97 49 bc ff <0f> 0b 31 c0 c3 48 c7 c7 a8 34 0e 82 e8 86 49 bc ff 0f 0b 31 c0 c3
    <4> [876.736439] RSP: 0018:ffffc900003ef758 EFLAGS: 00010086
    <4> [876.736440] RAX: 0000000000000000 RBX: ffff888361ffc840 RCX: 0000000000000002
    <4> [876.736442] RDX: 0000000080000002 RSI: 0000000000000000 RDI: 00000000ffffffff
    <4> [876.736443] RBP: ffffc900003ef780 R08: 0000000000000000 R09: 0000000000000001
    <4> [876.736444] R10: 000000001418e4b7 R11: 000000007f0ea93b R12: ffff888361ffcab8
    <4> [876.736445] R13: ffff88843b6d0000 R14: 000000000000217c R15: 0000000000000001
    <4> [876.736447] FS:  00007f4e6f255240(0000) GS:ffff88849fd00000(0000) knlGS:0000000000000000
    <4> [876.736448] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    <4> [876.736449] CR2: 00007f093a9173c0 CR3: 00000003bba08005 CR4: 0000000000760ee0
    <4> [876.736450] PKRU: 55555554
    <4> [876.736451] Call Trace:
    <4> [876.736488]  i915_request_retire+0x224/0x8e0 [i915]
    <4> [876.736521]  i915_request_create+0x4b/0x1b0 [i915]
    <4> [876.736550]  nop_virtual_engine+0x230/0x4d0 [i915]
    
    Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111695
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Cc: Matthew Auld <matthew.auld@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190918145453.8800-1-chris@chris-wilson.co.uk


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.