Bug 111905 - [CI][BAT] igt@i915_selftest@live_hangcheck - incomplete - BUG: kernel NULL pointer dereference, address: 0000000000000028
Status: RESOLVED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Reported: 2019-10-04 07:37 UTC by Lakshmi
Modified: 2019-10-25 17:10 UTC

i915 platform: BDW
i915 features: GEM/Other


Description Lakshmi 2019-10-04 07:37:49 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5211/fi-bdw-5557u/igt@i915_selftest@live_hangcheck.html

<1> [472.177001] BUG: kernel NULL pointer dereference, address: 0000000000000028
<1> [472.177005] #PF: supervisor read access in kernel mode
<1> [472.177007] #PF: error_code(0x0000) - not-present page
<6> [472.177008] PGD 0 P4D 0 
<4> [472.177011] Oops: 0000 [#1] PREEMPT SMP PTI
<4> [472.177013] CPU: 2 PID: 4908 Comm: i915_selftest Tainted: G     U            5.4.0-rc1-CI-CI_DRM_6999+ #1
<4> [472.177014] Hardware name:  /NUC5i7RYB, BIOS RYBDWi35.86A.0362.2017.0118.0940 01/18/2017
<4> [472.177058] RIP: 0010:trace_ports+0x1bd/0x2c0 [i915]
<4> [472.177060] Code: 48 c7 c2 6e 23 67 a0 be 01 00 00 00 48 c7 c7 a0 79 24 82 e8 a5 5e aa e0 48 8b 45 d0 44 3b 7d cc 4c 8b 38 48 c7 c0 68 e0 81 a0 <4d> 8b 4f 28 0f 89 85 fe ff ff 48 c7 c0 65 e0 81 a0 e9 79 fe ff ff
<4> [472.177061] RSP: 0018:ffffc90000453738 EFLAGS: 00010046
<4> [472.177063] RAX: ffffffffa081e068 RBX: ffff8881d55aa570 RCX: 00000000cbec4aa8
<4> [472.177064] RDX: ffff88823df1d7d0 RSI: 00000000b00abb6e RDI: 00000000ffffffff
<4> [472.177065] RBP: ffffc90000453770 R08: ffff88823df1d820 R09: 00000000fffffffe
<4> [472.177066] R10: 0000000022822714 R11: 00000000d6ff0aea R12: ffffffffa081e156
<4> [472.177068] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
<4> [472.177069] FS:  00007f4d8305de40(0000) GS:ffff88824eb00000(0000) knlGS:0000000000000000
<4> [472.177070] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [472.177072] CR2: 0000000000000028 CR3: 000000020b58c006 CR4: 00000000003606e0
<4> [472.177073] Call Trace:
<4> [472.177116]  assert_pending_valid+0x1a/0x1a0 [i915]
<4> [472.177159]  process_csb+0x3b6/0xaa0 [i915]
<4> [472.177201]  __execlists_reset+0x30/0xac0 [i915]
<4> [472.177243]  execlists_reset+0x3d/0x50 [i915]
<4> [472.177284]  intel_engine_reset+0xdf/0x230 [i915]
<4> [472.177323]  __igt_atomic_reset_engine+0x4b/0x90 [i915]
<0> [472.180412] igt/rcs0-5002    3.... 471435087us : i915_request_retire_upto: rcs0 fence 298a6:176, current 176
<0> [472.187736] igt/rcs0-5002    3d..1 471438393us : trace_ports: rcs0: submit { 298a6:182, 0:0 }
<0> [472.197054] ksoftirq-26      3d.s1 471442049us : __execlists_submission_tasklet: vecs0: queue_priority_hint:-2147483648, submit:no
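
Note on reading the oops above: the `Code:` line marks the faulting instruction byte with angle brackets (`<4d>`). The following is a minimal Python sketch, not a general disassembler, that pulls out those bytes and decodes the single instruction form seen here (a 64-bit `mov` load with an 8-bit displacement). The helper names `faulting_bytes` and `decode_mov_load` are hypothetical and exist only for this illustration.

```python
# Byte string copied from the "Code:" line of the oops above.
CODE_LINE = (
    "48 c7 c2 6e 23 67 a0 be 01 00 00 00 48 c7 c7 a0 79 24 82 e8 a5 5e aa e0 "
    "48 8b 45 d0 44 3b 7d cc 4c 8b 38 48 c7 c0 68 e0 81 a0 <4d> 8b 4f 28 "
    "0f 89 85 fe ff ff 48 c7 c0 65 e0 81 a0 e9 79 fe ff ff"
)

# x86-64 general-purpose registers indexed by their 4-bit encoding.
REGS = [
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
    "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
]


def faulting_bytes(code_line):
    """Return the bytes starting at the <xx> marker (the faulting instruction)."""
    tokens = code_line.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return bytes(int(tok.strip("<>"), 16) for tok in tokens[start:])


def decode_mov_load(insn):
    """Decode only 'REX.W 8B /r' with mod=01 (disp8) and no SIB byte.

    Returns (destination register, base register, displacement).
    Any other instruction form raises ValueError.
    """
    rex, opcode, modrm, disp = insn[0], insn[1], insn[2], insn[3]
    if rex & 0xF8 != 0x48 or opcode != 0x8B:
        raise ValueError("not a 64-bit mov load")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("unsupported addressing form")
    reg |= (rex & 0x4) << 1  # REX.R extends the destination register
    rm |= (rex & 0x1) << 3   # REX.B extends the base register
    if disp >= 0x80:         # disp8 is sign-extended
        disp -= 0x100
    return REGS[reg], REGS[rm], disp


dst, base, disp = decode_mov_load(faulting_bytes(CODE_LINE))
print(f"mov {disp:#x}(%{base}),%{dst}")
```

This decodes the faulting bytes `4d 8b 4f 28` as a load from offset 0x28 of `%r15` into `%r9`. The register dump shows R15 as zero and CR2 as 0x28, which is consistent with reading a field at offset 0x28 through a NULL pointer. Which structure and field that corresponds to is not determined by the log itself.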
Comment 1 CI Bug Log 2019-10-04 07:42:24 UTC
The CI Bug Log issue associated with this bug has been updated.

### New filters associated

* BDW: igt@i915_selftest@live_hangcheck - incomplete - BUG: kernel NULL pointer dereference, address: 0000000000000028
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5211/fi-bdw-5557u/igt@i915_selftest@live_hangcheck.html
Comment 2 Chris Wilson 2019-10-04 10:19:59 UTC
* blinks

That's worrisome, but I hope it's just the debug printer itself.
Comment 3 Chris Wilson 2019-10-09 10:25:18 UTC
We're overdue for a kasan run or three. Maybe that will turn up something useful.
Comment 4 Francesco Balestrieri 2019-10-10 06:08:17 UTC
This occurred in a BAT run. It has happened only once so far, but since it is recent we need to keep tracking it. Setting priority to medium for now; will adjust based on the occurrence rate.
Comment 5 Chris Wilson 2019-10-10 13:21:55 UTC
commit c949ae431467764277cdd88d7c26ff963a9db40a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Oct 9 11:09:54 2019 +0100

    drm/i915/execlists: Protect peeking at execlists->active
    
    Now that we dropped the engine->active.lock serialisation from around
    process_csb(), direct submission can run concurrently to the interrupt
    handler. As such execlists->active may be advanced as we dequeue,
    dropping the reference to the request. We need to employ our RCU request
    protection to ensure that the request is not freed too early.
    
    Fixes: df403069029d ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20191009100955.21477-1-chris@chris-wilson.co.uk

is a vaguely related patch. It shouldn't impact this path along reset, but maybe it provides a hint?
Comment 6 Chris Wilson 2019-10-25 17:10:25 UTC
Writing it off as a one-off and pretending the supposedly unrelated patch, in fact, fixed everything.

