Bug 109981 - [CI][SHARDS] igt@i915_selftest@mock_requests - dmesg-warn - BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
 
Reported: 2019-03-12 15:39 UTC by Martin Peres
Modified: 2019-04-17 14:28 UTC

i915 platform: ICL

Description Martin Peres 2019-03-12 15:39:53 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4881/shard-iclb8/igt@i915_selftest@mock_requests.html

<6> [65.039091] i915: Running i915_request_mock_selftests/mock_breadcrumbs_smoketest
<3> [65.862647] waiting for 395 fences (last 1069:2) on mock timed out!
<1> [65.862738] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
<1> [65.862745] #PF error: [INSTR]
<6> [65.862748] PGD 0 P4D 0 
<4> [65.862752] Oops: 0010 [#1] PREEMPT SMP NOPTI
<4> [65.862758] CPU: 2 PID: 1138 Comm: igt/2 Tainted: G     U            5.0.0-CI-CI_DRM_5734+ #1
<4> [65.862765] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3087.A00.1902250334 02/25/2019
<4> [65.862774] RIP: 0010:          (null)
<4> [65.862780] Code: Bad RIP value.
<4> [65.862783] RSP: 0018:ffffc90000577de0 EFLAGS: 00010246
<4> [65.862788] RAX: 0000000000000000 RBX: ffff8884900c8058 RCX: 0000000000000000
<4> [65.862794] RDX: 0000000000000001 RSI: 00000000000001ff RDI: ffff888499c1c000
<4> [65.862800] RBP: ffff888499c1c000 R08: 0000000000000f0a R09: ffff88849d289000
<4> [65.862805] R10: ffff88849d289d68 R11: ffff88849e1bcfb8 R12: ffff8884900c8058
<4> [65.862811] R13: ffff8884900c0918 R14: ffff8884900c8058 R15: ffff8884900c8098
<4> [65.862816] FS:  0000000000000000(0000) GS:ffff88849fe80000(0000) knlGS:0000000000000000
<4> [65.862823] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [65.862828] CR2: ffffffffffffffd6 CR3: 0000000495914001 CR4: 0000000000760ee0
<4> [65.862834] PKRU: 55555554
<4> [65.862836] Call Trace:
<4> [65.862887]  ? __i915_gem_set_wedged.part.5+0x7a/0x1f0 [i915]
<4> [65.862895]  ? printk+0x4d/0x69
<4> [65.862928]  ? i915_gem_set_wedged+0x2a/0x40 [i915]
<4> [65.862972]  ? __igt_breadcrumbs_smoketest+0x514/0x6f0 [i915]
<4> [65.862980]  ? wait_woken+0xa0/0xa0
<4> [65.863022]  ? i915_request_add+0x9d0/0x9d0 [i915]
<4> [65.863027]  ? kthread+0x119/0x130
<4> [65.863031]  ? kthread_park+0x80/0x80
<4> [65.863037]  ? ret_from_fork+0x24/0x50
Comment 1 CI Bug Log 2019-03-12 15:41:44 UTC
The CI Bug Log issue associated with this bug has been updated.

### New filters associated

* ICL: igt@i915_selftest@mock_requests - dmesg-warn - BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4881/shard-iclb8/igt@i915_selftest@mock_requests.html

* ICL: igt@runner@aborted - fail - Previous test: i915_selftest (mock_requests)
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4881/shard-iclb8/igt@runner@aborted.html
Comment 2 Chris Wilson 2019-03-12 15:48:23 UTC
For whatever reason, the test failed after an inexplicable delay, and then hit cleanup code that had never been executed before and which doesn't exist!
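
The `#PF error: [INSTR]` and `RIP: 0010: (null)` lines in the oops above are the usual signature of a call through a NULL function pointer: the CPU tried to fetch an instruction from address 0. A minimal userspace sketch of how such a gap in a hook table could be detected before the call is made follows. The struct, its field names, and `count_missing_hooks()` are hypothetical and only illustrate the idea; they are not the i915 data structures.

```c
#include <stddef.h>

/* Hypothetical hook table; field names are illustrative, not the i915 structs. */
struct engine_ops {
    void (*reset_prepare)(void);
    void (*reset_finish)(void);
    void (*cancel_requests)(void);
};

/* A hook that does nothing, used to populate a table in the example. */
static void nop_hook(void)
{
}

/*
 * Count the hooks that were left NULL. Calling any of them would make the
 * CPU jump to address 0, which is what "RIP: (null)" reports.
 */
static int count_missing_hooks(const struct engine_ops *ops)
{
    int missing = 0;

    if (ops->reset_prepare == NULL)
        missing++;
    if (ops->reset_finish == NULL)
        missing++;
    if (ops->cancel_requests == NULL)
        missing++;

    return missing;
}
```

A table initialised with only `.reset_prepare = nop_hook` would report two missing hooks. A path that is only reached on test failure can hide such gaps for a long time, because nothing calls the hooks while the test passes.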
Comment 3 Chris Wilson 2019-03-20 09:03:38 UTC
Ignoring the root cause of the erroneous delay, and merely focusing on the secondary explosion:

commit d315d4faf82092df6fe82f456fd26dc8b247b627 (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Mar 19 21:42:33 2019 +0000

    drm/i915/selftests: Provide stub reset functions
    
    If a test fails, we quite often mark the device as wedged. Provide the
    stub functions so that we can wedge the mock device, and avoid exploding
    on test failures.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109981
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190319214233.25498-3-chris@chris-wilson.co.uk
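
The approach described in the commit message, giving the mock device stub functions so that wedging it is safe, can be sketched in userspace C roughly as follows. This is an illustration under stated assumptions, not the patch itself: `struct mock_engine`, `struct mock_device`, the `stub_*` hooks, and `mock_set_wedged()` are simplified, hypothetical stand-ins, and the real hook names and call order in the driver may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real i915 structures. */
struct mock_engine {
    void (*reset_prepare)(struct mock_engine *engine);
    void (*cancel_requests)(struct mock_engine *engine);
    void (*reset_finish)(struct mock_engine *engine);
    int calls; /* number of hooks that ran, for demonstration only */
};

struct mock_device {
    struct mock_engine engine;
    bool wedged;
};

/* Stub hooks: there is no hardware behind a mock engine, so they only count calls. */
static void stub_reset_prepare(struct mock_engine *engine)
{
    engine->calls++;
}

static void stub_cancel_requests(struct mock_engine *engine)
{
    engine->calls++;
}

static void stub_reset_finish(struct mock_engine *engine)
{
    engine->calls++;
}

/* Install a stub for every hook so that no pointer is left NULL. */
static void mock_engine_init(struct mock_engine *engine)
{
    engine->reset_prepare = stub_reset_prepare;
    engine->cancel_requests = stub_cancel_requests;
    engine->reset_finish = stub_reset_finish;
    engine->calls = 0;
}

/* Wedge path: calls each hook unconditionally, as a failing test would trigger. */
static void mock_set_wedged(struct mock_device *dev)
{
    struct mock_engine *engine = &dev->engine;

    engine->reset_prepare(engine);
    engine->cancel_requests(engine);
    engine->reset_finish(engine);
    dev->wedged = true;
}
```

With the stubs installed, calling `mock_set_wedged()` on an initialised device runs all three hooks and marks the device as wedged instead of jumping to address 0. Installing no-op stubs keeps the wedge path free of per-call NULL checks, at the cost of having to remember a stub for every new hook.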
Comment 4 Martin Peres 2019-04-17 14:28:45 UTC
(In reply to Chris Wilson from comment #3)

Thanks, it seems to have done the trick! It happened twice in 4 runs before the fix, and has not happened in the 180 runs since.
Comment 5 CI Bug Log 2019-04-17 14:28:51 UTC
The CI Bug Log issue associated with this bug has been archived.

New failures matching the above filters will no longer be associated with this bug.

