Bug 112246

Summary: [CI][SHARDS]igt@gem_ctx_persistence@smoketest - dmesg-warn - Invalid lrc state found before submission
Product: DRI
Reporter: Lakshmi <lakshminarayana.vudum>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: RESOLVED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: not set
Priority: not set
CC: intel-gfx-bugs
Version: DRI git
Hardware: Other
OS: All
Whiteboard:
i915 platform: BXT, GLK, ICL, KBL, TGL
i915 features: GEM/Other

Description Lakshmi 2019-11-11 12:26:21 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7293/shard-tglb8/igt@gem_ctx_persistence@smoketest.html

<6> [429.970646] Console: switching to colour dummy device 80x25
<6> [429.970686] [IGT] gem_ctx_persistence: executing
<5> [429.974240] Setting dangerous option enable_hangcheck - tainting kernel
<5> [429.974587] Setting dangerous option reset - tainting kernel
<7> [429.976197] [drm:i915_gem_execbuffer2_ioctl [i915]] EINVAL at i915_gem_execbuffer2_ioctl:2820
<6> [429.976664] [IGT] gem_ctx_persistence: starting subtest smoketest
<3> [430.286256] rcs0: context submitted with incorrect RING_START [007ef000], expected 00923000
<4> [430.286389] ------------[ cut here ]------------
<4> [430.286391] Invalid lrc state found before submission
<4> [430.286471] WARNING: CPU: 5 PID: 198 at drivers/gpu/drm/i915/gt/intel_lrc.c:1042 execlists_schedule_in.isra.81+0x354/0x450 [i915]
<4> [430.286473] Modules linked in: vgem i915 mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul cdc_ether usbnet mii ghash_clmulni_intel mei_me mei prime_numbers
<4> [430.286484] CPU: 5 PID: 198 Comm: kworker/u16:3 Tainted: G     U            5.4.0-rc6-CI-CI_DRM_7293+ #1
<4> [430.286485] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake U DDR4 SODIMM RVP, BIOS TGLSFWI1.R00.2321.A08.1909162051 09/16/2019
<4> [430.286512] Workqueue: events_unbound fence_work [i915]
<4> [430.286539] RIP: 0010:execlists_schedule_in.isra.81+0x354/0x450 [i915]
<4> [430.286542] Code: e4 fe 0d 00 00 00 01 41 89 06 80 3d d6 07 26 00 00 0f 85 5e fe ff ff 48 c7 c7 f8 26 2a a0 c6 05 c2 07 26 00 01 e8 ec 46 f9 e0 <0f> 0b e9 44 fe ff ff 4c 8b a5 88 00 00 00 4c 8b ad 90 00 00 00 e9
<4> [430.286543] RSP: 0018:ffffc900022f7c50 EFLAGS: 00010086
<4> [430.286545] RAX: 0000000000000000 RBX: ffff88842f78b100 RCX: 0000000000000003
<4> [430.286547] RDX: 0000000080000003 RSI: 0000000000000000 RDI: 00000000ffffffff
<4> [430.286548] RBP: ffff88848fd49240 R08: 0000000000000000 R09: 0000000000000001
<4> [430.286550] R10: 000000006a6d3819 R11: 000000005cb8c97e R12: ffff888489fd2000
<4> [430.286551] R13: ffff88842f78b100 R14: ffffc9000096e184 R15: ffff888489fd2000
<4> [430.286553] FS:  0000000000000000(0000) GS:ffff8884a0680000(0000) knlGS:0000000000000000
<4> [430.286555] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [430.286556] CR2: 00007f0be3a03040 CR3: 000000045f774003 CR4: 0000000000760ee0
<4> [430.286558] PKRU: 55555554
<4> [430.286559] Call Trace:
<4> [430.286590]  __execlists_submission_tasklet+0xbce/0x1670 [i915]
<4> [430.286628]  ? i915_sched_lookup_priolist+0x98/0x350 [i915]
<4> [430.286657]  execlists_submit_request+0x102/0x1e0 [i915]
<4> [430.286693]  submit_notify+0xa8/0x13c [i915]
<4> [430.286717]  __i915_sw_fence_complete+0x81/0x250 [i915]
<4> [430.286744]  dma_i915_sw_fence_wake+0x1b/0x30 [i915]
<4> [430.286749]  dma_fence_signal_locked+0x9e/0x1b0
<4> [430.286753]  dma_fence_signal+0x1f/0x40
<4> [430.286776]  fence_work+0x28/0x80 [i915]
<4> [430.286780]  process_one_work+0x26a/0x620
<4> [430.286786]  worker_thread+0x37/0x380
<4> [430.286791]  ? process_one_work+0x620/0x620
<4> [430.286793]  kthread+0x119/0x130
<4> [430.286796]  ? kthread_park+0x80/0x80
<4> [430.286801]  ret_from_fork+0x24/0x50
<4> [430.286809] irq event stamp: 706846
<4> [430.286812] hardirqs last  enabled at (706845): [<ffffffff81001bea>] trace_hardirqs_on_thunk+0x1a/0x20
<4> [430.286827] hardirqs last disabled at (706846): [<ffffffff819e4f9d>] _raw_spin_lock_irqsave+0xd/0x50
<4> [430.286830] softirqs last  enabled at (706844): [<ffffffff81c00385>] __do_softirq+0x385/0x47f
<4> [430.286833] softirqs last disabled at (706839): [<ffffffff810b7faa>] irq_exit+0xba/0xc0
<4> [430.286840] ---[ end trace 9bdb664234aebd11 ]---
Comment 2 Chris Wilson 2019-11-11 12:40:17 UTC
https://patchwork.freedesktop.org/series/69247/
Comment 3 Chris Wilson 2019-11-11 16:39:36 UTC
commit 31b61f0ef9af62b6404d8df5dcd2cf58f80c9f53 (HEAD -> drm-intel-next-queued, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Nov 11 13:32:05 2019 +0000

    drm/i915/execlists: Move reset_active() from schedule-out to schedule-in
    
    The gem_ctx_persistence/smoketest was detecting an odd coherency issue
    inside the LRC context image: the address of the ring buffer did
    not match our associated struct intel_ring. As we set the address into
    the context image when we pin the ring buffer into place before the
    context is active, that leaves the question of where it got
    overwritten. Either the HW context save occurred after our pin, which
    would imply that our idle barriers are broken, or we overwrote the
    context image ourselves. It is only in reset_active() where we dabble
    inside the context image outside of a serialised path from schedule-out;
    but we could equally perform the operation inside schedule-in which is
    then fully serialised with the context pin -- and remains serialised by
    the engine pulse with kill_context(). (The only downside, aside from
    doing more work inside the engine->active.lock, is that the plan to merge
    all the reset paths into doing their context scrubbing on schedule-out
    needs more thought.)
    
    Fixes: d12acee84ffb ("drm/i915/execlists: Cancel banned contexts on schedule-out")
    Testcase: igt/gem_ctx_persistence/smoketest
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20191111133205.11590-3-chris@chris-wilson.co.uk
Comment 4 CI Bug Log 2019-11-12 08:10:45 UTC
A CI Bug Log filter associated to this bug has been updated:

{- KBL GLK ICL TGL: igt@gem_ctx_persistence@smoketest - dmesg-warn - Invalid lrc state found before submission -}
{+ KBL GLK ICL TGL: igt@gem_ctx_persistence@smoketest - dmesg-warn - Invalid lrc state found before submission +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5269/shard-apl3/igt@gem_ctx_persistence@smoketest.html
Comment 5 CI Bug Log 2019-11-12 08:11:26 UTC
A CI Bug Log filter associated to this bug has been updated:

{- KBL GLK ICL TGL: igt@gem_ctx_persistence@smoketest - dmesg-warn - Invalid lrc state found before submission -}
{+ APL KBL GLK ICL TGL: igt@gem_ctx_persistence@smoketest - dmesg-warn - Invalid lrc state found before submission +}


  No new failures caught with the new filter
