Bug 107186

Summary: [CI][GUC] drv_selftest@live_guc RIP: 0010:destroy_doorbell
Product: DRI
Component: DRM/Intel
Status: CLOSED FIXED
Severity: normal
Priority: medium
Version: XOrg git
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:
Reporter: Tomi Sarvela <tomi.p.sarvela>
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
CC: intel-gfx-bugs

Description Tomi Sarvela 2018-07-11 08:20:05 UTC
Continuing the series of "Initial findings" from Intel-GFX-CI and the i915 selftests: drv_selftest@live_guc fails with GuC enabled on SKL, KBL and CFL with RIP: 0010:destroy_doorbell. Patches from Michał Winiarski exist to fix this issue.

(example):

[  533.683289] WARN_ON(!has_doorbell(client))
[  533.683339] WARNING: CPU: 2 PID: 9821 at drivers/gpu/drm/i915/intel_guc_submission.c:226 create_doorbell+0x104/0x120 [i915]
[  533.683341] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul snd_hda_codec crc32_pclmul snd_hwdep snd_hda_core ghash_clmulni_intel snd_pcm e1000e mei_me mei prime_numbers [last unloaded: i915]
[  533.683375] CPU: 2 PID: 9821 Comm: drv_selftest Tainted: G     U            4.18.0-rc3-CI-CI_DRM_4455+ #1
[  533.683377] Hardware name: Micro-Star International Co., Ltd. MS-7B54/Z370M MORTAR (MS-7B54), BIOS 1.10 12/28/2017
[  533.683414] RIP: 0010:create_doorbell+0x104/0x120 [i915]
[  533.683416] Code: 04 48 8b 74 24 10 65 48 33 34 25 28 00 00 00 75 22 48 83 c4 18 5b c3 48 c7 c6 38 02 5c a0 48 c7 c7 96 fe 59 a0 e8 cc 26 bb e0 <0f> 0b b8 ed ff ff ff eb ce e8 9e 29 bb e0 0f 1f 40 00 66 2e 0f 1f 
[  533.683539] RSP: 0018:ffffc90000457aa8 EFLAGS: 00010282
[  533.683542] RAX: 0000000000000000 RBX: ffff8801f5690000 RCX: 0000000000000001
[  533.683544] RDX: 0000000080000001 RSI: ffffffff820c643c RDI: 00000000ffffffff
[  533.683546] RBP: ffff8801f5691548 R08: 0000000008efb11c R09: 0000000000000000
[  533.683548] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801f5691550
[  533.683550] R13: ffff8801f5691260 R14: ffff8801f5691548 R15: ffff8801f5691550
[  533.683552] FS:  00007f729b5d5980(0000) GS:ffff880266280000(0000) knlGS:0000000000000000
[  533.683554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  533.683556] CR2: 0000563a31e48508 CR3: 0000000260a16006 CR4: 00000000003606e0
[  533.683558] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  533.683559] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  533.683561] Call Trace:
[  533.683601]  guc_clients_doorbell_init.isra.14+0x12/0x50 [i915]
[  533.683639]  igt_guc_clients+0x262/0x450 [i915]
[  533.683680]  __i915_subtests+0x44/0xd0 [i915]
[  533.683720]  __run_selftests+0x10b/0x190 [i915]
[  533.683758]  i915_live_selftests+0x2c/0x60 [i915]
[  533.683789]  i915_pci_probe+0x3b/0x90 [i915]
[  533.683794]  pci_device_probe+0xa1/0x130
[  533.683799]  driver_probe_device+0x306/0x480
...

Full trace at:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4455/fi-cfl-guc/igt@drv_selftest@live_guc.html

More traces available at:
https://intel-gfx-ci.01.org/tree/drm-tip/igt@drv_selftest@live_guc.html
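
When comparing many such traces, it can help to pull the faulting symbol out of the RIP line mechanically. The following is a minimal sketch, assuming the usual `symbol+offset/size [module]` layout seen in the trace above; the helper name `parse_rip` and the regular expression are illustrative and not part of any CI tooling.

```python
import re

# Matches an oops line such as "RIP: 0010:create_doorbell+0x104/0x120 [i915]".
# The trailing "[module]" part is optional because built-in symbols lack it.
RIP_RE = re.compile(
    r"RIP: (?P<segment>[0-9a-f]{4}):(?P<symbol>[\w.]+)"
    r"\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)


def parse_rip(line):
    """Return symbol, offset, function size and module from a RIP line, or None."""
    m = RIP_RE.search(line)
    if m is None:
        return None
    return {
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),
        "size": int(m.group("size"), 16),
        "module": m.group("module"),
    }


line = "[  533.683414] RIP: 0010:create_doorbell+0x104/0x120 [i915]"
info = parse_rip(line)
print(info)
```

Applied to the example trace, this reports `create_doorbell` as the symbol, which is a quick way to check which function a given CI run actually stopped in.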
Comment 1 Chris Wilson 2018-07-12 14:29:19 UTC
commit a63983f26008804e8db12457e429e5fc18841894 (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Michał Winiarski <michal.winiarski@intel.com>
Date:   Thu Jul 12 12:20:13 2018 +0100

    drm/i915/selftests: Fixup GuC FW negative test
    
    Since:
    0d4b78b3d2c0 ("drm/i915/guc: Assert we have the doorbell before setting it up")
    
    We have asserts in GuC doorbell related functions, which is a good thing.
    Unfortunately, we were using those to check whether GuC FW is refusing
    to allocate invalid doorbell - which makes the test fail.
    Well, it would make the test WARN, except we fumbled cleanup ordering
    and eat the BUG_ON instead.
    Let's keep the asserts and use the internal implementation in the test.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107186
    Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Michel Thierry <michel.thierry@intel.com>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180712112013.3253-1-chris@chris-wilson.co.uk
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 2 Dhinakaran Pandiyan 2018-07-19 20:40:12 UTC
The warning seems to have gone, but the tests still fail with

[  530.511387] [drm:intel_guc_send_mmio [i915]] *ERROR* MMIO: GuC action 0x10 failed with error -5 0xf000f000
[  530.683625] i915: probe of 0000:00:02.0 failed with error -25

Chris,
Is this new error the same as https://bugs.freedesktop.org/show_bug.cgi?id=107258 now?
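
As an aside on the two return values in the log lines above: kernel code conventionally returns negated errno values, so they can be decoded with the standard errno table. A minimal sketch, assuming Linux errno numbering (where 5 is EIO and 25 is ENOTTY); the helper name `describe` is made up for this example.

```python
import errno
import os


def describe(ret):
    """Map a negative kernel-style return value to its errno name and message."""
    code = -ret
    return errno.errorcode.get(code, "unknown"), os.strerror(code)


for ret in (-5, -25):
    name, message = describe(ret)
    print(ret, name, message)
```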
Comment 3 Chris Wilson 2018-07-19 20:45:21 UTC
Same old error as bug 107258.
Comment 4 Dhinakaran Pandiyan 2018-07-19 20:53:46 UTC

*** This bug has been marked as a duplicate of bug 107258 ***
Comment 5 Chris Wilson 2018-07-20 07:15:13 UTC
(In reply to Dhinakaran Pandiyan from comment #4)
> 
> *** This bug has been marked as a duplicate of bug 107258 ***

No. The *original* bug was completely different. That is the bug that has been fixed.
Comment 6 Dhinakaran Pandiyan 2018-07-21 00:20:24 UTC
Got it, thanks for correcting the resolution.
Comment 7 Dhinakaran Pandiyan 2018-07-21 00:39:23 UTC
Closing the bug, as the original failure has been addressed and has not been seen in CI since CI_DRM_4476 (https://intel-gfx-ci.01.org/tree/drm-tip/igt@drv_selftest@live_guc.html).

The current "dmesg-warn"s are tracked under
https://bugs.freedesktop.org/show_bug.cgi?id=107258
