Summary: | [CI][BAT][guc] igt@*(suspend|s3)* - dmesg-warn - Unexpected event: no suitable handler | ||
---|---|---|---|
Product: | DRI | Reporter: | Martin Peres <martin.peres> |
Component: | DRM/Intel | Assignee: | Fernando Pacheco <fernando.pacheco> |
Status: | RESOLVED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | normal | ||
Priority: | highest | CC: | intel-gfx-bugs, lakshminarayana.vudum |
Version: | XOrg git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | ReadyForDev | ||
i915 platform: | CFL | i915 features: | firmware/guc |
Description
Martin Peres
2019-06-18 10:35:40 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* GUC: suspend tests - dmesg-warn - Unexpected event: no suitable handler
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_3168/fi-cfl-guc/igt@gem_exec_suspend@basic-s3.html

The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* GUC: igt@runner@aborted - fail - Previous test: .*(suspend|s3).*
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13196/fi-cfl-guc/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_3168/fi-cfl-guc/igt@runner@aborted.html

Michal pointed out that in guc_disable_communication() we set the function pointer to the _nop function inside intel_guc_ct_disable() and therefore before guc_disable_interrupts(), so this issue might be caused by a log flush interrupt between those 2 calls. Simply flipping the order of the 2 calls should fix the warning, but we'll lose the flush event. However, after CTBs are disabled the guc events are logged in the SCRATCH_15 register, so we should be able to get it from there. I'll check with Michal if he has any issues with this approach or any better idea and if he hasn't I'll prepare a patch.

A CI Bug Log filter associated to this bug has been updated:

{- GUC: suspend tests - dmesg-warn - Unexpected event: no suitable handler -}
{+ GUC: suspend tests / igt@i915_selftest@live_* - dmesg-warn - Unexpected event: no suitable handler +}

No new failures caught with the new filter

A CI Bug Log filter associated to this bug has been updated:

{- GUC: igt@runner@aborted - fail - Previous test: .*(suspend|s3).* -}
{+ GUC: igt@runner@aborted - fail - Previous test: .*(suspend|s3|live_).* +}

No new failures caught with the new filter

Soon will be resolved/fixed?
Patches to fix this are in flight (https://patchwork.freedesktop.org/series/62412/), just a couple of minor comments to address.

commit e29cc1d7e820d662d2d73b35281b70f01a24610b
Author: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Date: Fri Jun 21 11:21:22 2019 -0700

    drm/i915/guc: reorder enable/disable communication steps

    Make sure we always have CT buffers enabled when the interrupts are enabled, so we can always handle interrupts from GuC. Also move the setting of the guc->send and guc->handler functions to the GuC communication control functions for consistency.

    The reorder also fixes the onion unwinding of intel_uc_init_hw, because guc_enable_communication would've left interrupts enabled when failing to enable CTB.

    v2: always return the result of ctch_enable() in intel_guc_ct_enable() (Michal)

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110943
    Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
    Reviewed-by: Matthew Brost <matthew.brost@intel.com>
    Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190621182123.31368-1-daniele.ceraolospurio@intel.com

*** Bug 110910 has been marked as a duplicate of this bug.
***

Happened again in CI_DRM_6499

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6499/fi-icl-dsi/igt@i915_pm_rpm@module-reload.html

<4> [332.218126] Unexpected event: no suitable handler
<4> [332.218288] WARNING: CPU: 4 PID: 0 at drivers/gpu/drm/i915/gt/uc/intel_guc.c:325 intel_guc_to_host_event_handler_nop+0xc/0x10 [i915]
<4> [332.218293] Modules linked in: i915(-) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp x86_pkg_temp_thermal coretemp ax88179_178a usbnet crct10dif_pclmul mii crc32_pclmul snd_hda_codec e1000e snd_hwdep snd_hda_core ghash_clmulni_intel ptp pps_core snd_pcm mei_me mei prime_numbers [last unloaded: snd_hda_intel]
<4> [332.218322] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G U 5.2.0-CI-CI_DRM_6499+ #1
<4> [332.218325] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake Y LPDDR4x T4 RVP TLC, BIOS ICLSFWR1.R00.3102.A00.1903052247 03/05/2019
<4> [332.218417] RIP: 0010:intel_guc_to_host_event_handler_nop+0xc/0x10 [i915]
<4> [332.218421] Code: 48 c7 c7 2f 85 98 a0 e8 92 43 85 e0 0f 0b b8 ed ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 48 c7 c7 b0 09 96 a0 e8 74 43 85 e0 <0f> 0b c3 90 41 54 55 48 89 fd 53 48 8b 47 e8 80 b8 da 0b 00 00 0a
<4> [332.218425] RSP: 0018:ffffc900001fcea0 EFLAGS: 00010086
<4> [332.218429] RAX: 0000000000000000 RBX: ffffc90009000000 RCX: 0000000000000000
<4> [332.218432] RDX: 0000000000000007 RSI: 0000000000000000 RDI: 00000000ffffffff
<4> [332.218435] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000001
<4> [332.218438] R10: ffffc900001fce38 R11: 0000000000000000 R12: 0000000000000000
<4> [332.218441] R13: 0000000000000000 R14: 0000000000000019 R15: ffff88827b270000
<4> [332.218445] FS: 0000000000000000(0000) GS:ffff88829be00000(0000) knlGS:0000000000000000
<4> [332.218448] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [332.218452] CR2: 00007f50551b8c38 CR3: 000000024a894006 CR4: 0000000000760ee0
<4> [332.218455] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4> [332.218458] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4> [332.218461] PKRU: 55555554
<4> [332.218463] Call Trace:
<4> [332.218468] <IRQ>
<4> [332.218525] gen11_irq_handler+0x2f5/0x420 [i915]
<4> [332.218540] __handle_irq_event_percpu+0x41/0x2d0
<4> [332.218544] ? handle_irq_event+0x27/0x50
<4> [332.218552] handle_irq_event_percpu+0x2b/0x70
<4> [332.218557] handle_irq_event+0x2f/0x50
<4> [332.218564] handle_edge_irq+0xee/0x1a0
<4> [332.218570] handle_irq+0x17/0x20
<4> [332.218577] do_IRQ+0x5e/0x130
<4> [332.218584] common_interrupt+0xf/0xf
<4> [332.218588] </IRQ>
<4> [332.218594] RIP: 0010:cpuidle_enter_state+0xae/0x450
<4> [332.218598] Code: 44 00 00 31 ff e8 52 87 92 ff 45 84 f6 74 12 9c 58 f6 c4 02 0f 85 78 03 00 00 31 ff e8 7b 69 99 ff e8 76 6d 9d ff fb 45 85 ed <0f> 88 c9 02 00 00 4c 2b 24 24 48 ba cf f7 53 e3 a5 9b c4 20 49 63
<4> [332.218601] RSP: 0018:ffffc9000012be80 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffdc
<4> [332.218605] RAX: ffff88829a09c040 RBX: ffffffff822a0a60 RCX: 0000000000000004
<4> [332.218608] RDX: 0000000000000046 RSI: 0000000000000006 RDI: ffffffff8212d6d1
<4> [332.218611] RBP: ffff88829720d508 R08: 0000000000000000 R09: 0000000000000000
<4> [332.218614] R10: 0000000000000000 R11: 0000000000000000 R12: 0000004d59b7ee90
<4> [332.218616] R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000002
<4> [332.218637] cpuidle_enter+0x24/0x40
<4> [332.218654] do_idle+0x1f3/0x260
<4> [332.218663] cpu_startup_entry+0x14/0x20
<4> [332.218669] start_secondary+0x173/0x1c0
<4> [332.218677] secondary_startup_64+0xa4/0xb0
<4> [332.218692] irq event stamp: 24479052
<4> [332.218697] hardirqs last enabled at (24479049): [<ffffffff817bf2ba>] cpuidle_enter_state+0xaa/0x450
<4> [332.218701] hardirqs last disabled at (24479050): [<ffffffff810019f0>] trace_hardirqs_off_thunk+0x1a/0x1c
<4> [332.218707] softirqs last enabled at (24479052): [<ffffffff810b6408>] irq_enter+0x58/0x60
<4> [332.218712] softirqs last disabled at (24479051): [<ffffffff810b63ed>] irq_enter+0x3d/0x60
<4> [332.218744] WARNING: CPU: 4 PID: 0 at drivers/gpu/drm/i915/gt/uc/intel_guc.c:325 intel_guc_to_host_event_handler_nop+0xc/0x10 [i915]

Looks like a different issue. I believe the problem here is due to a known issue in the unload path where we call uc_sanitize before we call intel_uc_fini_hw and therefore we hit the !intel_guc_is_loaded(guc) check in the latter, thus skipping the guc_disable_communication(). We saw this issue with guc submission enabled (because we skip the intel_guc_submission_disable call) and Fernando was looking at fixing it; we just didn't realize there was an implication on the non-submission path as well.

*** Bug 111164 has been marked as a duplicate of this bug. ***

A CI Bug Log filter associated to this bug has been updated:

{- GUC: suspend tests / igt@i915_selftest@live_* - dmesg-warn - Unexpected event: no suitable handler -}
{+ ICL GUC: reload/suspend tests / igt@i915_selftest@live_* - dmesg-warn - Unexpected event: no suitable handler +}

No new failures caught with the new filter

Fernando, this is a highest-priority bug, which needs an update every day (the SLA for a highest-priority bug is 1 day). Any updates here? Do you agree with the priority of this issue? What are the next steps?

Sorry for the slowness in updating. Patch to address this is currently under review: https://patchwork.freedesktop.org/patch/324196/

Updated patchset in review: https://patchwork.freedesktop.org/series/65901/

commit 9be02fde93e5b564663f10a48f19159fba718e3d (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Fernando Pacheco <fernando.pacheco@intel.com>
Date: Thu Aug 29 10:41:53 2019 -0700

    drm/i915/uc: Extract common code from GuC stop/disable comm

    During normal driver unload we attempt to disable GuC communication while it is currently stopped. This results in a nop'd call to intel_guc_ct_disable within guc_disable_communication because stop/disable rely on the same flag to prevent further comms with CT.

    We can avoid the call to disable and still leave communication in a satisfactory state by extracting a set of shared steps from stop/disable. This set can include guc_disable_interrupts as we do not require the single caller of guc_stop_communication to be atomic: "drm/i915/selftests: Fixup atomic reset checking".

    This situation (stop -> disable) only occurs during intel_uc_fini_hw, so during fini, call guc_disable_communication only if currently enabled. The symmetric calls to enable/disable remain unmodified for all other scenarios.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110943
    Signed-off-by: Fernando Pacheco <fernando.pacheco@intel.com>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
    Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
    Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190829174154.14675-1-fernando.pacheco@intel.com
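The stop -> disable situation described in the commit message above can be sketched as a small user-space model. This is an illustration under assumptions, not the patch itself: `struct fake_uc`, `fake_quiesce()` and the other `fake_*` functions are hypothetical names, and the single `comm_enabled` flag stands in for the flag that stop and disable share.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; field and function names are illustrative only. */
struct fake_uc {
	bool comm_enabled;    /* the single flag shared by stop and disable */
	bool irq_enabled;
	int ct_disable_calls; /* how often the full disable path ran */
};

static void fake_enable_communication(struct fake_uc *uc)
{
	uc->irq_enabled = true;
	uc->comm_enabled = true;
}

/* Steps shared by stop and disable: afterwards no event can arrive. */
static void fake_quiesce(struct fake_uc *uc)
{
	uc->irq_enabled = false;
	uc->comm_enabled = false;
}

static void fake_stop_communication(struct fake_uc *uc)
{
	fake_quiesce(uc);
}

static void fake_disable_communication(struct fake_uc *uc)
{
	assert(uc->comm_enabled); /* the caller is expected to check first */
	fake_quiesce(uc);
	uc->ct_disable_calls++;
}

/* During fini, only disable if communication is still enabled. */
static void fake_fini_hw(struct fake_uc *uc)
{
	if (uc->comm_enabled)
		fake_disable_communication(uc);
}
```

In this model, calling `fake_stop_communication()` followed by `fake_fini_hw()` leaves interrupts off without ever entering the disable path, while calling `fake_fini_hw()` directly on an enabled instance runs the disable path exactly once.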