108622 – [CI][BAT] igt@drv_selftest@live_execlists - dmesg-warn - *ERROR* MMIO: GuC action 0x2 failed with error -5 0xf000f000

Status:	RESOLVED WONTFIX

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	XOrg git
Hardware:	Other All

Importance:	high normal
Assignee:	Jon Ewins
QA Contact:	Intel GFX Bugs mailing list

Whiteboard:	ReadyForDev

Duplicates (1):	108732

Reported:	2018-11-01 11:09 UTC by Martin Peres
Modified:	2019-08-21 13:23 UTC
CC List:	1 user

i915 platform:	BXT
i915 features:	firmware/guc

Description Martin Peres 2018-11-01 11:09:28 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4704/fi-apl-guc/igt@drv_selftest@live_execlists.html

<3> [557.763361] [drm:intel_guc_send_mmio [i915]] *ERROR* MMIO: GuC action 0x2 failed with error -5 0xf000f000
<4> [557.763716] ------------[ cut here ]------------
<4> [557.763815] WARN_ON(intel_guc_send(guc, data, (sizeof(data) / sizeof((data)[0]) + (sizeof(struct { int:(-!!(__builtin_types_compatible_p(typeof((data)), typeof(&(data)[0])))); })))))
<4> [557.763945] WARNING: CPU: 0 PID: 256 at drivers/gpu/drm/i915/intel_guc_submission.c:620 inject_preempt_context+0x4f4/0x600 [i915]
<4> [557.763950] Modules linked in: i915(+) amdgpu chash gpu_sched ttm vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic btusb btrtl btbcm btintel x86_pkg_temp_thermal coretemp bluetooth crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ecdh_generic lpc_ich r8169 snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me mei pinctrl_broxton pinctrl_intel prime_numbers [last unloaded: i915]
<4> [557.764064] CPU: 0 PID: 256 Comm: kworker/u9:0 Tainted: G     U            4.19.0-CI-CI_DRM_5064+ #1
<4> [557.764068] Hardware name: Intel corporation NUC6CAYS/NUC6CAYB, BIOS AYAPLCEL.86A.0056.2018.0926.1100 09/26/2018
<4> [557.764153] Workqueue: i915-guc_preempt inject_preempt_context [i915]
<4> [557.764240] RIP: 0010:inject_preempt_context+0x4f4/0x600 [i915]
<4> [557.764244] Code: 00 00 48 c7 c2 50 91 59 a0 48 c7 c7 6a 16 4e a0 e8 f1 19 c8 e0 0f 0b 48 c7 c6 c8 39 5d a0 48 c7 c7 a7 2f 5b a0 e8 5c 34 ba e0 <0f> 0b 49 0f ba b5 98 04 00 00 01 f0 49 0f ba ad d8 03 00 00 00 0f
<4> [557.764248] RSP: 0018:ffffc90000247e00 EFLAGS: 00010286
<4> [557.764255] RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000000
<4> [557.764259] RDX: 0000000000000007 RSI: ffffffff820c31be RDI: 00000000ffffffff
<4> [557.764263] RBP: ffffc90000485112 R08: 00000000a47aa0a9 R09: 0000000000000000
<4> [557.764266] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8802685735a8
<4> [557.764270] R13: ffff880269822158 R14: ffff880236661310 R15: ffff88025e45ba78
<4> [557.764274] FS:  0000000000000000(0000) GS:ffff880277a00000(0000) knlGS:0000000000000000
<4> [557.764278] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [557.764281] CR2: 000055ad3b1b8980 CR3: 0000000005210000 CR4: 00000000003406f0
<4> [557.764285] Call Trace:
<4> [557.764304]  process_one_work+0x245/0x610
<4> [557.764318]  worker_thread+0x37/0x380
<4> [557.764327]  ? process_one_work+0x610/0x610
<4> [557.764332]  kthread+0x119/0x130
<4> [557.764337]  ? kthread_park+0x80/0x80
<4> [557.764346]  ret_from_fork+0x3a/0x50
<4> [557.764363] irq event stamp: 164702
<4> [557.764369] hardirqs last  enabled at (164701): [<ffffffff810f828a>] console_unlock+0x3fa/0x5f0
<4> [557.764375] hardirqs last disabled at (164702): [<ffffffff81001930>] trace_hardirqs_off_thunk+0x1a/0x1c
<4> [557.764380] softirqs last  enabled at (164684): [<ffffffff81c00319>] __do_softirq+0x319/0x48e
<4> [557.764386] softirqs last disabled at (164677): [<ffffffff8108c5e9>] irq_exit+0xa9/0xc0
<4> [557.764470] WARNING: CPU: 0 PID: 256 at drivers/gpu/drm/i915/intel_guc_submission.c:620 inject_preempt_context+0x4f4/0x600 [i915]
<4> [557.764474] ---[ end trace 0e556d9560b6a02b ]---
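
The long WARN_ON condition in the log is the preprocessor expansion of an array-length macro applied to `data`: the element count plus a term that is zero for a real array and a compile error for a pointer. A minimal sketch of that idiom follows, assuming a GCC- or Clang-compatible compiler; the macro and function names are hypothetical and are not the driver's own.

```c
#include <assert.h>
#include <stddef.h>

/* Evaluates to 1 when a and b have compatible types (GCC/Clang builtin). */
#define SAME_TYPE(a, b) \
	__builtin_types_compatible_p(__typeof__(a), __typeof__(b))

/*
 * Size-0 struct when e is 0; a negative bit-field width, and therefore a
 * compile error, when e is non-zero. Relies on a GNU C extension.
 */
#define BUILD_ERROR_OR_ZERO(e) (sizeof(struct { int : (-!!(e)); }))

/* An array never has the same type as a pointer to its first element. */
#define MUST_BE_ARRAY(a) BUILD_ERROR_OR_ZERO(SAME_TYPE((a), &(a)[0]))

/* Element count of a true array; refuses to compile for a plain pointer. */
#define ELEMENT_COUNT(a) (sizeof(a) / sizeof((a)[0]) + MUST_BE_ARRAY(a))

/* Hypothetical stand-in for building a request buffer and measuring it. */
static size_t request_length(void)
{
	unsigned int data[7] = { 0 };

	/* The guard term contributes 0, so only the division remains. */
	assert(ELEMENT_COUNT(data) == sizeof(data) / sizeof(data[0]));
	return ELEMENT_COUNT(data);
}
```

Read this way, the warning only says that the send call wrapped by WARN_ON returned non-zero; the sizeof arithmetic is the length argument, not part of the failure.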

Comment 1 Martin Peres 2018-11-01 11:10:41 UTC

The priority is set to high because the guc is not used in production.

Comment 2 Chris Wilson 2018-11-01 11:23:56 UTC

A simple solution would be to accept that GuC preemption is not yet functional and clear the preemption flag when GuC submission is selected:

--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -1285,6 +1285,7 @@ static void guc_set_default_submission(struct intel_engine_cs *engine)
        engine->reset.prepare = guc_reset_prepare;
 
        engine->flags &= ~I915_ENGINE_SUPPORTS_STATS;
+       engine->flags &= ~I915_ENGINE_HAS_PREEMPTION;
 }
 
 int intel_guc_submission_enable(struct intel_guc *guc)
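
The one-line change above clears a capability bit so that later checks see the engine as unable to preempt. A minimal, self-contained sketch of that pattern follows; the flag values, the struct, and the function names are hypothetical stand-ins, and the real definitions in the i915 driver may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define ENGINE_SUPPORTS_STATS (1u << 0)
#define ENGINE_HAS_PREEMPTION (1u << 1)

struct engine {
	uint32_t flags;
};

/* Drop the capabilities that this submission backend cannot honour. */
static void set_default_submission(struct engine *engine)
{
	assert(engine != NULL);
	engine->flags &= ~ENGINE_SUPPORTS_STATS;
	engine->flags &= ~ENGINE_HAS_PREEMPTION;
}

/* Callers test the bit instead of assuming preemption is available. */
static bool engine_has_preemption(const struct engine *engine)
{
	return (engine->flags & ENGINE_HAS_PREEMPTION) != 0;
}
```

Because `&= ~FLAG` touches only the named bit, any other capability bits on the engine are left as they were.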

Comment 3 Chris Wilson 2018-11-13 16:43:43 UTC

*** Bug 108732 has been marked as a duplicate of this bug. ***

Comment 4 Lakshmi 2018-11-25 20:24:09 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_145/fi-apl-guc/igt@runner@aborted.html

Aborting.
Previous test: i915_hangman (hangcheck-unterminated)
Next test: gem_exec_params (rel-constants-invalid)

Kernel tainted (0x240 -- 200)

Test result of igt@i915_hangman@hangcheck-unterminated is pass.

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_145/fi-apl-guc/igt%40i915_hangman%40hangcheck-unterminated.html
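
The taint value 0x240 quoted above can be decoded bit by bit. In the kernel's documented taint flags, bit 6 is 'U' (taint requested from userspace, matching the "Tainted: G U" line in the original log) and bit 9 is 'W' (the kernel issued a warning). The second number, 200, appears to be the subset of bits that triggered the abort; that reading is an assumption. A small sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>

/* Bit positions as listed in the kernel's tainted-kernels documentation. */
#define TAINT_USER_BIT 6 /* 'U': taint requested from userspace */
#define TAINT_WARN_BIT 9 /* 'W': the kernel issued a warning */

/* Test a single bit of the taint mask. */
static bool taint_bit_set(unsigned long mask, unsigned int bit)
{
	assert(bit < 8 * sizeof(mask));
	return (mask >> bit) & 1ul;
}

/* True when exactly the U and W bits are set, as in the value 0x240. */
static bool only_user_and_warn(unsigned long mask)
{
	const unsigned long expected =
		(1ul << TAINT_USER_BIT) | (1ul << TAINT_WARN_BIT);

	return mask == expected;
}
```

Under this reading, the abort was caused by the WARN taint left behind by an earlier kernel warning, even though the test that ran immediately before the abort passed.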

Comment 5 Jon Ewins 2019-02-15 22:21:38 UTC

Issue reported on a BXT part with boot parameter i915.enable_guc=3, which means submission via GuC is enabled. Further investigation will be deferred until after the upcoming update to the GuC version, and will happen only if the issue is seen with i915.enable_guc=2 (no submission via GuC), which is the supported model for Gen9.

Comment 6 Chris Wilson 2019-02-15 22:27:12 UTC

(In reply to Jon Ewins from comment #5)
> Issue reported on a BXT part with boot parameter i915.enable_guc=3, which
> means submission via GuC is enabled. Further investigation will be deferred
> until after the upcoming update to the GuC version, and will happen only if
> the issue is seen with i915.enable_guc=2 (no submission via GuC), which is
> the supported model for Gen9.

Of course it won't be seen with enable_guc=2; the issue is that preemption is broken via the GuC! I take it we should just remove the unstable, unsupported feature from the GuC submission path?

Comment 7 Jon Ewins 2019-02-16 00:47:11 UTC

Agreed, this was too generic an update for this particular bug. We are about to upstream new patches for Gen9 and Gen11 that support a changed, unified interface to the GuC along with the corresponding firmware, and so are deferring further action on current GuC issues. In this specific case, the existing Gen9 GuC preemption paths are no longer required and GuC submission on Gen9 will not be supported. The imminent new GuC patches will initially disable preemption with GuC, but will not touch or remove this code. Follow-up patches will reinstate GuC-based preemption for Gen11+, at which time this code will be cleaned up. Certainly some of the current code can be cleaned up now, but how the new preemption implementation will be staged in terms of H2G interactions is TBD.

Comment 8 Chris Wilson 2019-05-28 11:17:49 UTC

commit a2904ade3dc28cf1a1b7deded41f4369f75e664c
Author: Michal Wajdeczko <michal.wajdeczko@intel.com>
Date:   Mon May 27 18:35:58 2019 +0000

    drm/i915/guc: Don't allow GuC submission
    
    Due to the upcoming changes to the GuC ABI interface, we must
    disable GuC submission mode until final ABI will be available
    on all GuC firmwares.
    
    To avoid regressions on systems configured to run with no longer
    supported configuration "enable_guc=3" or "enable_guc=1" clear
    GuC submission bit.
    
    v2: force switch to non-GuC submission mode
    v3: use GEM_BUG_ON (Joonas)
    
    Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
    Cc: John Spotswood <john.a.spotswood@intel.com>
    Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
    Cc: Tony Ye <tony.ye@intel.com>
    Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
    Cc: Jeff Mcgee <jeff.mcgee@intel.com>
    Cc: Antonio Argenziano <antonio.argenziano@intel.com>
    Cc: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com>
    Cc: Martin Peres <martin.peres@linux.intel.com>
    Acked-by: Martin Peres <martin.peres@linux.intel.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190527183613.17076-3-michal.wajdeczko@intel.com
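
The commit message treats enable_guc as a bitmask: values 3 and 1 have the submission bit set, while 2 (HuC load only) does not, which agrees with comments 5 and 6. A minimal sketch of clearing that bit, assuming bit 0 selects GuC submission and bit 1 selects HuC loading; the macro and function names are hypothetical and this is not the code from the commit.

```c
#include <assert.h>

/* Assumed bit layout of the enable_guc module parameter. */
#define GUC_SUBMISSION (1 << 0)
#define GUC_LOAD_HUC   (1 << 1)

/*
 * Force the non-GuC submission mode while leaving the HuC-load request
 * untouched. Negative "auto" values are outside the scope of this sketch.
 */
static int sanitize_enable_guc(int enable_guc)
{
	assert(enable_guc >= 0);
	return enable_guc & ~GUC_SUBMISSION;
}
```

With this mapping, a system booted with enable_guc=3 ends up behaving as if enable_guc=2 had been given, and enable_guc=1 ends up as 0.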

Comment 9 CI Bug Log 2019-08-21 13:23:54 UTC

The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.
