Continuing the series of "Initial findings" from Intel-GFX-CI and i915 selftests:

drv_selftest@live_guc fails with GuC enabled on SKL, KBL and CFL with RIP: 0010:destroy_doorbell. Patches from Michał Winiarski exist to fix this issue.

Example:

[ 533.683289] WARN_ON(!has_doorbell(client))
[ 533.683339] WARNING: CPU: 2 PID: 9821 at drivers/gpu/drm/i915/intel_guc_submission.c:226 create_doorbell+0x104/0x120 [i915]
[ 533.683341] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul snd_hda_codec crc32_pclmul snd_hwdep snd_hda_core ghash_clmulni_intel snd_pcm e1000e mei_me mei prime_numbers [last unloaded: i915]
[ 533.683375] CPU: 2 PID: 9821 Comm: drv_selftest Tainted: G U 4.18.0-rc3-CI-CI_DRM_4455+ #1
[ 533.683377] Hardware name: Micro-Star International Co., Ltd. MS-7B54/Z370M MORTAR (MS-7B54), BIOS 1.10 12/28/2017
[ 533.683414] RIP: 0010:create_doorbell+0x104/0x120 [i915]
[ 533.683416] Code: 04 48 8b 74 24 10 65 48 33 34 25 28 00 00 00 75 22 48 83 c4 18 5b c3 48 c7 c6 38 02 5c a0 48 c7 c7 96 fe 59 a0 e8 cc 26 bb e0 <0f> 0b b8 ed ff ff ff eb ce e8 9e 29 bb e0 0f 1f 40 00 66 2e 0f 1f
[ 533.683539] RSP: 0018:ffffc90000457aa8 EFLAGS: 00010282
[ 533.683542] RAX: 0000000000000000 RBX: ffff8801f5690000 RCX: 0000000000000001
[ 533.683544] RDX: 0000000080000001 RSI: ffffffff820c643c RDI: 00000000ffffffff
[ 533.683546] RBP: ffff8801f5691548 R08: 0000000008efb11c R09: 0000000000000000
[ 533.683548] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801f5691550
[ 533.683550] R13: ffff8801f5691260 R14: ffff8801f5691548 R15: ffff8801f5691550
[ 533.683552] FS: 00007f729b5d5980(0000) GS:ffff880266280000(0000) knlGS:0000000000000000
[ 533.683554] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 533.683556] CR2: 0000563a31e48508 CR3: 0000000260a16006 CR4: 00000000003606e0
[ 533.683558] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 533.683559] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 533.683561] Call Trace:
[ 533.683601] guc_clients_doorbell_init.isra.14+0x12/0x50 [i915]
[ 533.683639] igt_guc_clients+0x262/0x450 [i915]
[ 533.683680] __i915_subtests+0x44/0xd0 [i915]
[ 533.683720] __run_selftests+0x10b/0x190 [i915]
[ 533.683758] i915_live_selftests+0x2c/0x60 [i915]
[ 533.683789] i915_pci_probe+0x3b/0x90 [i915]
[ 533.683794] pci_device_probe+0xa1/0x130
[ 533.683799] driver_probe_device+0x306/0x480
...

Full trace at:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4455/fi-cfl-guc/igt@drv_selftest@live_guc.html

More traces available at:
https://intel-gfx-ci.01.org/tree/drm-tip/igt@drv_selftest@live_guc.html
commit a63983f26008804e8db12457e429e5fc18841894 (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Michał Winiarski <michal.winiarski@intel.com>
Date:   Thu Jul 12 12:20:13 2018 +0100

    drm/i915/selftests: Fixup GuC FW negative test

    Since:
    0d4b78b3d2c0 ("drm/i915/guc: Assert we have the doorbell before setting it up")

    We have asserts in GuC doorbell related functions, which is a good thing.
    Unfortunately, we were using those to check whether GuC FW is refusing
    to allocate invalid doorbell - which makes the test fail.
    Well, it would make the test WARN, except we fumbled cleanup ordering
    and eat the BUG_ON instead.
    Let's keep the asserts and use the internal implementation in the test.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107186
    Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Michel Thierry <michel.thierry@intel.com>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180712112013.3253-1-chris@chris-wilson.co.uk
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
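For readers following along, the idea in the commit message (keep the asserts in the normal path, have the negative test call the internal implementation) can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the i915 code: `struct client`, `MAX_DOORBELLS`, `create_doorbell_nocheck()`, `negative_test()` and the error codes returned here are all hypothetical stand-ins.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical limit; ids at or above it count as "no doorbell". */
#define MAX_DOORBELLS 256
#define INVALID_DOORBELL MAX_DOORBELLS

/* Hypothetical client state, reduced to the one field we need. */
struct client {
    unsigned int doorbell_id;
};

/* Counts how often the asserting path complained. */
static int warn_count;

static bool has_doorbell(const struct client *c)
{
    return c->doorbell_id < MAX_DOORBELLS;
}

/*
 * Internal implementation (hypothetical name): no assertion, it simply
 * reports whether the simulated firmware accepts the doorbell.
 */
static int create_doorbell_nocheck(struct client *c)
{
    return has_doorbell(c) ? 0 : -EIO;
}

/*
 * Normal entry point: asserts the precondition first, so a caller that
 * passes an invalid doorbell triggers a warning instead of reaching the
 * firmware call.
 */
static int create_doorbell(struct client *c)
{
    if (!has_doorbell(c)) {
        warn_count++;
        fprintf(stderr, "WARN_ON(!has_doorbell(client))\n");
        return -ENODEV;
    }
    return create_doorbell_nocheck(c);
}

/*
 * Negative test: we expect the firmware to refuse an invalid doorbell,
 * so we bypass the asserting wrapper and call the internal helper.
 */
static int negative_test(void)
{
    struct client c = { .doorbell_id = INVALID_DOORBELL };

    return create_doorbell_nocheck(&c) == -EIO ? 0 : -EINVAL;
}
```

In this sketch, calling `negative_test()` leaves `warn_count` untouched, whereas routing the same invalid client through `create_doorbell()` would trip the warning, which is roughly the situation the selftest was in before the fix.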
The warning seems to have gone, but the tests still fail with

[ 530.511387] [drm:intel_guc_send_mmio [i915]] *ERROR* MMIO: GuC action 0x10 failed with error -5 0xf000f000
[ 530.683625] i915: probe of 0000:00:02.0 failed with error -25

Chris,
Is this new error the same as https://bugs.freedesktop.org/show_bug.cgi?id=107258 now?
Same old error as bug 107258.
*** This bug has been marked as a duplicate of bug 107258 ***
(In reply to Dhinakaran Pandiyan from comment #4)
>
> *** This bug has been marked as a duplicate of bug 107258 ***

No. The *original* bug was completely different. That is the bug that has been fixed.
Got it, thanks for correcting the resolution.
Closing the bug as the original failure has been addressed and has not been seen in CI since CI_DRM_4476 (https://intel-gfx-ci.01.org/tree/drm-tip/igt@drv_selftest@live_guc.html).

The current "dmesg-warn"s are tracked under https://bugs.freedesktop.org/show_bug.cgi?id=107258