https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3683/shard-kbl4/igt@drv_selftest@live_hangcheck.html

[ 200.430432] BUG: stack guard page was hit at 00000000a99d6f9e (stack is 00000000433da8e3..00000000af5c7482)
[ 200.430438] kernel stack overflow (double-fault): 0000 [#1] PREEMPT SMP PTI
https://bugs.freedesktop.org/show_bug.cgi?id=104262#c8

> With a kasan run to investigate the stack page overflow:
>
> https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_1702/shard-kbl6/igt@drv_selftest@live_hangcheck.html
>
> it passed. Note that one thing that kasan does is disable CONFIG_VMAP_STACK,
> so changing the stack allocation/layout. Still not the positive lead I was
> hoping for.
Mysterious
https://intel-gfx-ci.01.org/CI/CI_DRM_3687/shard-kbl1/igt@drv_selftest@live_hangcheck.html

Let's see if it has magically resolved itself!
(In reply to Chris Wilson from comment #2)
> Mysterious
> https://intel-gfx-ci.01.org/CI/CI_DRM_3687/shard-kbl1/igt@drv_selftest@live_hangcheck.html
>
> Let's see if it has magically resolved itself!

Just a fluke.
Infinite recursion:

[ 202.346915] BUG: stack guard page was hit at 00000000fdec3e36 (stack is 00000000cfb340f3..00000000783f2a8f)
[ 202.346915] kernel stack overflow (double-fault): 0000 [#1] PREEMPT SMP PTI
[ 202.346915] Dumping ftrace buffer:
[ 202.346915] (ftrace buffer empty)
[ 202.346916] Modules linked in: i915(+) snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec x86_pkg_temp_thermal intel_powerclamp coretemp e1000e snd_hwdep crct10dif_pclmul snd_hda_core crc32_pclmul ghash_clmulni_intel snd_pcm mei_me mei ptp pps_core prime_numbers [last unloaded: i915]
[ 202.346919] CPU: 0 PID: 5985 Comm: drv_selftest Tainted: G U 4.15.0-CI-Trybot_1728+ #1
[ 202.346919] Hardware name: /NUC7i5BNB, BIOS BNKBL357.86A.0054.2017.1025.1822 10/25/2017
[ 202.346920] RIP: 0010:__lock_acquire+0x3b/0x1b60
[ 202.346920] RSP: 0018:ffffc90000243f90 EFLAGS: 00010086
[ 202.346920] RAX: 0000000000000000 RBX: 0000000000000086 RCX: 0000000000000000
[ 202.346921] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff82244958
[ 202.346921] RBP: ffffc90000244050 R08: 0000000000000001 R09: 0000000000000001
[ 202.346921] R10: 0000000000000000 R11: ffffffff810eae9e R12: 0000000000000000
[ 202.346921] R13: ffff880273260040 R14: 0000000000000001 R15: 0000000000000000
[ 202.346922] FS: 00007f654b75f8c0(0000) GS:ffff88027ec00000(0000) knlGS:0000000000000000
[ 202.346922] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 202.346922] CR2: ffffc90000243f88 CR3: 000000025578e006 CR4: 00000000003606f0
[ 202.346922] Call Trace:
[ 202.346922] ? lock_acquire+0xaf/0x200
[ 202.346923] lock_acquire+0xaf/0x200
[ 202.346923] ? vprintk_emit+0x6e/0x3b0
[ 202.346923] _raw_spin_lock+0x2a/0x40
[ 202.346923] ? vprintk_emit+0x6e/0x3b0
[ 202.346923] vprintk_emit+0x6e/0x3b0
[ 202.346924] dev_vprintk_emit+0x94/0x200
[ 202.346924] ? deactivate_slab.isra.23+0x856/0x880
[ 202.346924] dev_printk_emit+0x36/0x40
[ 202.346924] ? lock_acquire+0xaf/0x200
[ 202.346924] dev_notice+0x50/0x60
[ 202.346925] ? i915_gem_unset_wedged+0x151/0x180 [i915]
[ 202.346925] i915_reset+0x177/0x270 [i915]
[ 202.346925] __i915_wait_request_check_and_reset.isra.9.part.10+0x26/0x30 [i915]
[ 202.346925] i915_wait_request+0x77b/0x820 [i915]
[ 202.346925] ? ___slab_alloc.constprop.30+0x152/0x3d0
[ 202.346926] ? wake_up_q+0x70/0x70
[ 202.346926] ? wake_up_q+0x70/0x70
[ 202.346926] wait_for_space+0x91/0x150 [i915]
[ 202.346926] intel_ring_begin+0x113/0x1a0 [i915]
[ 202.346926] gen8_emit_flush_render+0x96/0x260 [i915]
[ 202.346927] i915_gem_request_alloc+0x2c8/0x5e0 [i915]
[ 202.346927] i915_gem_reset+0x107/0x130 [i915]
[ 202.346927] i915_reset+0x207/0x270 [i915]
[ 202.346927] __i915_wait_request_check_and_reset.isra.9.part.10+0x26/0x30 [i915]
[ 202.346928] i915_wait_request+0x77b/0x820 [i915]
[ 202.346928] ? ___slab_alloc.constprop.30+0x152/0x3d0
[ 202.346928] ? wake_up_q+0x70/0x70
[ 202.346928] ? wake_up_q+0x70/0x70
[ 202.346928] wait_for_space+0x91/0x150 [i915]
[ 202.346929] intel_ring_begin+0x113/0x1a0 [i915]
[ 202.346929] gen8_emit_flush_render+0x96/0x260 [i915]
[ 202.346929] i915_gem_request_alloc+0x2c8/0x5e0 [i915]
[ 202.346929] i915_gem_reset+0x107/0x130 [i915]
[ 202.346929] i915_reset+0x207/0x270 [i915]
[...]
[ 202.346958] wait_for_space+0
[ 202.346958] Lost 238 message(s)!

Mainly due to the selftest design. Hmm.
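The cycle in the trace (i915_reset -> i915_gem_reset -> i915_gem_request_alloc -> gen8_emit_flush_render -> intel_ring_begin -> wait_for_space -> i915_wait_request -> __i915_wait_request_check_and_reset -> i915_reset) can be reduced to a toy model. This is a hypothetical sketch, not i915 code: `struct engine_model`, `model_reset()`, `model_emit_request()` and `model_wait_for_space()` are invented names, and `max_depth` stands in for the finite kernel stack. It assumes the reset always emits a request, the kernel_context ring stays full, and waiting on the hung engine escalates straight into another reset.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified engine state; not the real i915 structures. */
struct engine_model {
	bool ring_full; /* kernel_context ring has no free space */
	bool hung;      /* engine never completes, so waits escalate to a reset */
	int depth;      /* current recursion depth */
	int max_depth;  /* stand-in for the finite kernel stack */
};

static bool model_reset(struct engine_model *e);

/* Cf. wait_for_space() -> i915_wait_request(): a hung engine triggers a reset. */
static bool model_wait_for_space(struct engine_model *e)
{
	if (e->hung)
		return model_reset(e);
	e->ring_full = false; /* a healthy engine eventually retires requests */
	return true;
}

/* Cf. intel_ring_begin(): emitting a request needs ring space first. */
static bool model_emit_request(struct engine_model *e)
{
	if (e->ring_full)
		return model_wait_for_space(e);
	return true;
}

/* Cf. i915_reset() -> i915_gem_reset(): always emits a post-reset request. */
static bool model_reset(struct engine_model *e)
{
	bool ok;

	/* Out of "stack": the real kernel hits the guard page here instead. */
	if (e->depth >= e->max_depth)
		return false;

	e->depth++;
	ok = model_emit_request(e);
	e->depth--;
	assert(e->depth >= 0);
	return ok;
}
```

With both `ring_full` and `hung` set, `model_reset()` only stops because of the artificial depth limit; the kernel has no such limit and runs into the stack guard page instead.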
*** Bug 104594 has been marked as a duplicate of this bug. ***
(In reply to Chris Wilson from comment #5)
> *** Bug 104594 has been marked as a duplicate of this bug. ***

OK Chris. Just a reminder that bug 104262 is the one tracked in cibuglog.
Even after the stack overflow is fixed, the bad read from the CSB register is still likely to occur sporadically, so 104262 will have a long life yet.
commit 01b8fdc5222007bdfc905941173f82576898a7f7
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Feb 5 15:24:31 2018 +0000

    drm/i915: Skip post-reset request emission if the engine is not idle

    Since commit 7b6da818d86f ("drm/i915: Restore the kernel context after a GPU reset on an idle engine") we submit a request following the engine reset. The intent is that we don't submit a request if the engine is busy (as it will restart active by itself) but we only checked to see if there were remaining requests in flight on the hardware and skipped checking to see if there were any ready requests that would be immediately submitted on restart (the same time as our new request would be).

    Having convinced the engine to appear idle in the previous patch, we can use intel_engine_is_idle() as a better test to only submit a new request if there are no pending requests.

    As it happens, this is tripping up igt/drv_selftest/live_hangcheck in CI as we overfill the kernel_context ringbuffer and trigger an infinite recursion from within the reset.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104786
    References: 7b6da818d86f ("drm/i915: Restore the kernel context after a GPU reset on an idle engine")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Cc: Michel Thierry <michel.thierry@intel.com>
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180205152431.12163-4-chris@chris-wilson.co.uk
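The gist of the fix can be sketched as a predicate deciding whether to emit the post-reset request. This is a simplified model, not the actual patch: `struct engine_state`, its `inflight`/`queued` counters, `old_should_emit()`, `new_should_emit()` and `model_engine_is_idle()` are hypothetical, and `model_engine_is_idle()` only mimics the intent of the real intel_engine_is_idle() as the commit message describes it (nothing on the hardware and nothing ready to submit).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-engine bookkeeping; field names are illustrative only. */
struct engine_state {
	int inflight; /* requests still on the hardware */
	int queued;   /* ready requests that will be submitted on restart */
};

/* Before the fix (simplified): only the hardware queue was consulted. */
static bool old_should_emit(const struct engine_state *e)
{
	return e->inflight == 0;
}

/* Stand-in for intel_engine_is_idle(): nothing in flight, nothing pending. */
static bool model_engine_is_idle(const struct engine_state *e)
{
	assert(e->inflight >= 0 && e->queued >= 0);
	return e->inflight == 0 && e->queued == 0;
}

/* After the fix (simplified): skip the post-reset request unless truly idle. */
static bool new_should_emit(const struct engine_state *e)
{
	return model_engine_is_idle(e);
}
```

An engine with nothing on the hardware but requests ready for submission passes the old check, so the post-reset request was piled on top of the pending work; the new check skips it and leaves the engine to restart from its own queue.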
Closing; please re-open if it still occurs.