| Summary: | [CI] igt@* - incomplete - intel_engine_unpin_breadcrumbs_irq:226 GEM_BUG_ON(!b->irq_enabled) | | |
|---|---|---|---|
| Product: | DRI | Reporter: | Martin Peres <martin.peres> |
| Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
| Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
| Severity: | normal | | |
| Priority: | medium | CC: | chris, intel-gfx-bugs, marta.lofstedt |
| Version: | XOrg git | | |
| Hardware: | Other | | |
| OS: | All | | |
| Whiteboard: | ReadyForDev | | |
| i915 platform: | CFL, KBL | i915 features: | firmware/guc |
Description
Martin Peres
2018-05-28 14:56:51 UTC
*** Bug 105864 has been marked as a duplicate of this bug. ***

commit 209b7955e59e361fe8ba1911fac68f46355ac0cf
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Jul 17 21:29:32 2018 +0100

drm/i915/guc: Keep guc submission permanently engaged

We make a decision at module load whether to use the GuC backend or not, but lose that setup across set-wedge. Currently, the guc doesn't override the engine->set_default_submission hook letting execlists sneak back in temporarily on unwedging leading to an unbalanced park/unpark.

v2: Remove comment about switching back temporarily to execlists on guc_submission_disable(). We currently only call disable on shutdown, and plan to also call disable before suspend and reset, in which case we will either restore guc submission or mark the driver as wedged, making the reset back to execlists pointless.
v3: Move reset.prepare across

Fixes: 63572937cebf ("drm/i915/execlists: Flush pending preemption events during reset")
Testcase: igt/drv_module_reload/basic-reload-inject
Testcase: igt/gem_eio
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180717202932.1423-1-chris@chris-wilson.co.uk

(In reply to Chris Wilson from comment #2)
> commit 209b7955e59e361fe8ba1911fac68f46355ac0cf
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Tue Jul 17 21:29:32 2018 +0100
>
> drm/i915/guc: Keep guc submission permanently engaged
>
> We make a decision at module load whether to use the GuC backend or not,
> but lose that setup across set-wedge. Currently, the guc doesn't
> override the engine->set_default_submission hook letting execlists sneak
> back in temporarily on unwedging leading to an unbalanced park/unpark.
>
> v2: Remove comment about switching back temporarily to execlists on
> guc_submission_disable(). We currently only call disable on shutdown,
> and plan to also call disable before suspend and reset, in which case we
> will either restore guc submission or mark the driver as wedged, making
> the reset back to execlists pointless.
> v3: Move reset.prepare across
>
> Fixes: 63572937cebf ("drm/i915/execlists: Flush pending preemption
> events during reset")
> Testcase: igt/drv_module_reload/basic-reload-inject
> Testcase: igt/gem_eio
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
> Link: https://patchwork.freedesktop.org/patch/msgid/20180717202932.1423-1-chris@chris-wilson.co.uk

I would like to believe this solved it, but the evidence has started piling up... It looks like this commit did not change anything :s

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_110/fi-cfl-guc/igt@prime_busy@hang-default.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_110/fi-kbl-guc/igt@prime_busy@wait-hang-default.html

Nah, this GEM_BUG_ON is definitely solved. There's no way the guc can trigger it anymore. Those, they are another matter :-p

(In reply to Chris Wilson from comment #4)
> Nah, this GEM_BUG_ON is definitely solved. There's no way the guc can
> trigger it anymore. Those, they are another matter :-p

Meaning, need a separate bug for the ongoing failures?

I see that this is happening still

Call Trace:
<4> [479.329919] <IRQ>
<4> [479.329931] ? lock_acquire+0xa6/0x1c0
<4> [479.329937] ? handle_irq_event+0x3a/0x50
<4> [479.329947] tasklet_action_common.isra.5+0x47/0xb0
<4> [479.329957] __do_softirq+0xd8/0x483
<4> [479.329964] ? _raw_spin_unlock+0x29/0x40
<4> [479.329973] irq_exit+0xa9/0xc0
<4> [479.329977] do_IRQ+0x9a/0x120
<4> [479.329985] common_interrupt+0xf/0xf
<4> [479.329989] </IRQ>
<4> [479.329996] RIP: 0010:cpuidle_enter_state+0xab/0x340
<4> [479.330000] Code: 44 00 00 31 ff e8 25 88 94 ff 45 84 f6 74 12 9c 58 f6 c4 02 0f 85 70 02 00 00 31 ff e8 0e 2b 9b ff e8 39 fd 9e ff fb 4c 29 fb <48> ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7 ea b8 ff
<4> [479.330003] RSP: 0018:ffffc900000a7e90 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffde
<4> [479.330010] RAX: ffff88027623ce40 RBX: 00000000000345c1 RCX: 0000000000000000
<4> [479.330013] RDX: 0000000000000046 RSI: ffffffff82124e7a RDI: ffffffff820d38bf
<4> [479.330017] RBP: 0000000000000004 R08: 0000000000000001 R09: 0000000000000000
<4> [479.330020] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880277bac980
<4> [479.330024] R13: ffffffff82298578 R14: 0000000000000000 R15: 0000006f99a3e023
<4> [479.330045] do_idle+0x1f3/0x260
<4> [479.330054] cpu_startup_entry+0x6a/0x70
<4> [479.330061] start_secondary+0x19d/0x1f0
<4> [479.330068] secondary_startup_64+0xa4/0xb0
<4> [479.330084] irq event stamp: 8264263
<4> [479.330089] hardirqs last enabled at (8264262): [<ffffffff8108c8d9>] tasklet_action_common.isra.5+0x29/0xb0
<4> [479.330094] hardirqs last disabled at (8264263): [<ffffffff8194113d>] _raw_spin_lock_irqsave+0xd/0x50
<4> [479.330098] softirqs last enabled at (8264258): [<ffffffff8108c488>] irq_enter+0x58/0x60
<4> [479.330103] softirqs last disabled at (8264259): [<ffffffff8108c539>] irq_exit+0xa9/0xc0
<4> [479.330190] WARNING: CPU: 3 PID: 0 at drivers/gpu/drm/i915/intel_guc_submission.c:638 guc_submission_tasklet+0x7db/0x960 [i915]
<4> [479.330196] ---[ end trace ef18452cc0701dee ]---
<3> [479.330205] __i915_request_submit:445 GEM_BUG_ON(intel_engine_signaled(engine, seqno))
<4> [479.330404] ------------[ cut here ]------------
<2> [479.330408] kernel BUG at drivers/gpu/drm/i915/i915_request.c:445!
<4> [479.330421] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
<4> [479.330426] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G U W 4.19.0-rc8-CI-CI_DRM_4984+ #1
<4> [479.330430] Hardware name: Intel corporation NUC6CAYS/NUC6CAYB, BIOS AYAPLCEL.86A.0056.2018.0926.1100 09/26/2018
<4> [479.330518] RIP: 0010:__i915_request_submit+0x271/0x280 [i915]
<4> [479.330522] Code: 2e 4b f9 e0 48 8b 35 9e fd 1b 00 49 c7 c0 28 79 28 a0 b9 bd 01 00 00 48 c7 c2 20 38 25 a0 48 c7 c7 1c 55 16 a0 e8 ff da ff e0 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 41 54 55 48 89 fd 53
<4> [479.330526] RSP: 0018:ffff880277b83e70 EFLAGS: 00010082
<4> [479.330531] RAX: 0000000000000011 RBX: ffff8801886c0940 RCX: 0000000000000000
<4> [479.330534] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffff880276991a98
<4> [479.330538] RBP: ffff880277b83e98 R08: 000000000009903e R09: ffff8802762c5000
<4> [479.330541] R10: ffff880277b83e88 R11: ffff880276991a98 R12: 0000000000000005
<4> [479.330544] R13: ffff8802373b4730 R14: ffff8802373b42a8 R15: ffff8801886c0b38
<4> [479.330548] FS: 0000000000000000(0000) GS:ffff880277b80000(0000) knlGS:0000000000000000
<4> [479.330551] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [479.330555] CR2: 00005652ab31c4a8 CR3: 0000000005210000 CR4: 00000000003406e0
<4> [479.330558] Call Trace:
<4> [479.330561] <IRQ>
<4> [479.330648] guc_submission_tasklet+0x33e/0x960 [i915]
<4> [479.330659] tasklet_action_common.isra.5+0x47/0xb0
<4> [479.330666] __do_softirq+0xd8/0x483
<4> [479.330671] ? _raw_spin_unlock+0x29/0x40
<4> [479.330677] irq_exit+0xa9/0xc0
<4> [479.330682] do_IRQ+0x9a/0x120
<4> [479.330687] common_interrupt+0xf/0xf
<4> [479.330691] </IRQ>
<4> [479.330695] RIP: 0010:cpuidle_enter_state+0xab/0x340
<4> [479.330699] Code: 44 00 00 31 ff e8 25 88 94 ff 45 84 f6 74 12 9c 58 f6 c4 02 0f 85 70 02 00 00 31 ff e8 0e 2b 9b ff e8 39 fd 9e ff fb 4c 29 fb <48> ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7 ea b8 ff
<4> [479.330703] RSP: 0018:ffffc900000a7e90 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffde
<4> [479.330708] RAX: ffff88027623ce40 RBX: 00000000000345c1 RCX: 0000000000000000
<4> [479.330711] RDX: 0000000000000046 RSI: ffffffff82124e7a RDI: ffffffff820d38bf
<4> [479.330714] RBP: 0000000000000004 R08: 0000000000000001 R09: 0000000000000000
<4> [479.330718] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880277bac980
<4> [479.330721] R13: ffffffff82298578 R14: 0000000000000000 R15: 0000006f99a3e023
<4> [479.330734] do_idle+0x1f3/0x260
<4> [479.330740] cpu_startup_entry+0x6a/0x70
<4> [479.330746] start_secondary+0x19d/0x1f0
<4> [479.330751] secondary_startup_64+0xa4/0xb0
<4> [479.330760] Modules linked in: i915(+) amdgpu chash gpu_sched ttm vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic btusb btrtl x86_pkg_temp_thermal coretemp btbcm crct10dif_pclmul btintel crc32_pclmul bluetooth ghash_clmulni_intel ecdh_generic lpc_ich r8169 snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me pinctrl_broxton pinctrl_intel mei prime_numbers [last unloaded: i915]
<0> [479.330813] Dumping ftrace buffer:
<0> [479.330818] ---------------------------------

Filed https://bugs.freedesktop.org/show_bug.cgi?id=108732 for GEM_BUG_ON(intel_engine_signaled(engine, seqno)), which is the only one happening in BAT.

Now closing this bug and archiving. I will write new bugs when issues will come in drmtip.