106884 – [CI][BYT only] igt@drv_suspend@forcewake - dmesg-warn - Expected 00000005 fw_domains to be active, but 00000005 are off

Status:	CLOSED WORKSFORME

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	XOrg git
Hardware:	Other All

Importance:	medium normal
Assignee:	Intel GFX Bugs mailing list
QA Contact:	Intel GFX Bugs mailing list

Whiteboard:	ReadyForDev

Reported:	2018-06-11 12:38 UTC by Martin Peres
Modified:	2018-06-19 14:25 UTC
CC List:	1 user

i915 platform:	BYT
i915 features:	power/suspend-resume

Description Martin Peres 2018-06-11 12:38:36 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_58/fi-byt-n2820/igt@drv_suspend@forcewake.html

[  566.558990] ------------[ cut here ]------------
[  566.559160] Expected 00000005 fw_domains to be active, but 00000005 are off
[  566.559333] WARNING: CPU: 1 PID: 1440 at drivers/gpu/drm/i915/intel_uncore.c:784 assert_forcewakes_active+0x4b/0xa0 [i915]
[  566.559339] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 btusb btrtl intel_powerclamp btbcm coretemp crct10dif_pclmul btintel snd_hda_intel crc32_pclmul ghash_clmulni_intel snd_hda_codec bluetooth snd_hwdep snd_hda_core ecdh_generic snd_pcm r8169 mii lpc_ich prime_numbers i2c_hid
[  566.559480] CPU: 1 PID: 1440 Comm: kworker/u4:15 Tainted: G     U            4.17.0-rc7-g1d2b97e15aaf-drmtip_58+ #1
[  566.559486] Hardware name: \xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff \xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff/DN2820FYK, BIOS FYBYT10H.86A.0059.2017.0607.2130 06/07/2017
[  566.559498] Workqueue: events_unbound async_run_entry_fn
[  566.559602] RIP: 0010:assert_forcewakes_active+0x4b/0xa0 [i915]
[  566.559608] RSP: 0000:ffffb7720060fd38 EFLAGS: 00010282
[  566.559619] RAX: 0000000000000000 RBX: ffff9d2663b10000 RCX: 0000000000000001
[  566.559625] RDX: 0000000080000001 RSI: ffffffff820fbde9 RDI: 00000000ffffffff
[  566.559631] RBP: ffff9d26635e0008 R08: 00000000b6bb6cf9 R09: 0000000000000000
[  566.559637] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9d2663b176b8
[  566.559643] R13: ffff9d2663b10068 R14: 0000000000000000 R15: 0000000000000000
[  566.559649] FS:  0000000000000000(0000) GS:ffff9d267fd00000(0000) knlGS:0000000000000000
[  566.559656] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  566.559661] CR2: 0000000000000000 CR3: 000000000d210000 CR4: 00000000001006e0
[  566.559667] Call Trace:
[  566.559775]  reset_ring+0x40/0x2f0 [i915]
[  566.559878]  i915_gem_sanitize+0x81/0x100 [i915]
[  566.559971]  ? vlv_resume_prepare+0x670/0x670 [i915]
[  566.560056]  i915_pm_resume_early+0xba/0x150 [i915]
[  566.560072]  dpm_run_callback+0x5d/0x2f0
[  566.560088]  device_resume_early+0xa6/0xe0
[  566.560102]  async_resume_early+0x14/0x40
[  566.560111]  async_run_entry_fn+0x34/0x160
[  566.560124]  process_one_work+0x229/0x6a0
[  566.560148]  worker_thread+0x35/0x380
[  566.560162]  ? process_one_work+0x6a0/0x6a0
[  566.560170]  kthread+0x119/0x130
[  566.560179]  ? kthread_flush_work_fn+0x10/0x10
[  566.560194]  ret_from_fork+0x3a/0x50
[  566.560225] Code: 00 85 c0 74 52 23 b3 a4 0e 00 00 8b 93 a8 0e 00 00 f7 d2 21 f2 75 08 48 83 c4 08 5b c3 f3 c3 48 c7 c7 d8 ad 66 c0 e8 45 09 b1 c0 <0f> 0b eb e8 80 3d 1b 4b 18 00 00 75 c3 48 c7 c7 28 a9 66 c0 89 
[  566.560542] irq event stamp: 928
[  566.560551] hardirqs last  enabled at (927): [<ffffffff810fd267>] vprintk_emit+0x4b7/0x4d0
[  566.560559] hardirqs last disabled at (928): [<ffffffff81a0111c>] error_entry+0x7c/0x100
[  566.560566] softirqs last  enabled at (850): [<ffffffff81c0032b>] __do_softirq+0x32b/0x4e1
[  566.560575] softirqs last disabled at (843): [<ffffffff81090104>] irq_exit+0xa4/0xb0
[  566.560679] WARNING: CPU: 1 PID: 1440 at drivers/gpu/drm/i915/intel_uncore.c:784 assert_forcewakes_active+0x4b/0xa0 [i915]
[  566.560684] ---[ end trace 687f48bc3df179e2 ]---

This seems to be a regression introduced in drmtip_58.
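
For readers triaging similar CI logs, the masks in the warning can be decoded into domain names. The following is only a sketch: the bit layout in FW_DOMAIN_BITS (render = bit 0, blitter = bit 1, media = bit 2) is an assumption modelled on the i915 forcewake domain enum of this kernel era, and the helper names are hypothetical, not part of IGT or the driver.

```python
import re

# ASSUMPTION: bit layout modelled on the i915 forcewake domain enum of
# this era; verify against intel_uncore.h in the matching kernel tree.
FW_DOMAIN_BITS = {0: "render", 1: "blitter", 2: "media"}

# Matches the message printed by assert_forcewakes_active in the log above.
WARN_RE = re.compile(
    r"Expected ([0-9a-f]{8}) fw_domains to be active, but ([0-9a-f]{8}) are off"
)


def decode_mask(mask):
    """Return the names of the domains whose bits are set in mask."""
    names = []
    bit = 0
    while mask >> bit:
        if mask & (1 << bit):
            names.append(FW_DOMAIN_BITS.get(bit, f"bit{bit}"))
        bit += 1
    return names


def parse_warning(line):
    """Extract the (expected, off) masks from a dmesg line, or None."""
    m = WARN_RE.search(line)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)


line = ("[  566.559160] Expected 00000005 fw_domains to be active, "
        "but 00000005 are off")
expected, off = parse_warning(line)
print(decode_mask(expected), decode_mask(off))
# prints: ['render', 'media'] ['render', 'media']
```

Under that assumed layout, the warning says that every domain the check expected (render and media) was not tracked as held at the time of the assert.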

Comment 1 Chris Wilson 2018-06-11 12:45:01 UTC

Should not be possible after commit c3160da9a6af0e2d8f4fb3410df9d027a178ca3d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu May 31 09:22:45 2018 +0100

    drm/i915: After reset on sanitization, reset the engine backends
    
    As we reset the GPU on suspend/resume, we also do need to reset the
    engine state tracking so call into the engine backends. This is
    especially important so that we can also sanitize the state tracking
    across resume.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=106702
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180531082246.9763-3-chris@chris-wilson.co.uk

as that does grab all forcewake domains. If it is persisting, then either we aren't grabbing the domains or we are failing to mark them as held.
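
A toy model may help show why those two explanations look identical from the assert's side. This is a sketch, not the driver code: UncoreModel, its fields and its methods are hypothetical names, and it assumes the check only consults the driver's own bookkeeping of held domains (masked by the domains present on the platform) rather than the hardware state.

```python
class UncoreModel:
    """Hypothetical stand-in for the driver's forcewake bookkeeping."""

    def __init__(self, present):
        self.present = present  # domains that exist on this platform
        self.active = 0         # domains the driver believes it holds

    def get(self, domains, mark_held=True):
        """Pretend to wake `domains`; optionally skip the bookkeeping."""
        if mark_held:
            self.active |= domains & self.present

    def check(self, wanted):
        """Return the warning text, or None if everything wanted is held."""
        wanted &= self.present
        off = wanted & ~self.active
        if off == 0:
            return None
        return (f"Expected {wanted:08x} fw_domains to be active, "
                f"but {off:08x} are off")


ALL_DOMAINS = 0x7  # ASSUMPTION: an "all domains" request mask
BYT_PRESENT = 0x5  # ASSUMPTION: the domains present on BYT, per the log

never_grabbed = UncoreModel(BYT_PRESENT)

not_marked = UncoreModel(BYT_PRESENT)
not_marked.get(ALL_DOMAINS, mark_held=False)

healthy = UncoreModel(BYT_PRESENT)
healthy.get(ALL_DOMAINS)

print(never_grabbed.check(ALL_DOMAINS))
print(not_marked.check(ALL_DOMAINS))
print(healthy.check(ALL_DOMAINS))
```

In this model the first two cases produce the same message, so the warning alone cannot tell a missing grab from missing bookkeeping.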

Comment 2 Chris Wilson 2018-06-11 12:48:48 UTC

Since it was still happening this weekend, it appears that we fail to take forcewake during resume. Odd.

Comment 3 Chris Wilson 2018-06-19 10:59:58 UTC

The assert was removed in

commit 4fdd5b4e9aba5fbbc6d3072a5a87fa1d3f3fc030
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Jun 16 21:25:34 2018 +0100

    drm/i915: Fix fallout of fake reset along resume
    
    commit b2209e62a450 ("drm/i915/execlists: Reset the CSB head tracking on
    reset/sanitization") and commit 1288786b18f7 ("drm/i915: Move GEM sanitize
    from resume_early to resume") show the conflicting requirements on the
    code. We must reset the GPU before trashing live state on a fast resume
    (hibernation debug, or error paths), but we must only reset our state
    tracking iff the GPU is reset (or power cycled). This is tricky if we
    are disabling GPU reset to simulate broken hardware; we reset our state
    tracking but the GPU is left intact and recovers from its stale state.
    
    v2: Again without the assertion for forcewake, no longer required since
    commit b3ee09a4de33 ("drm/i915/ringbuffer: Fix context restore upon reset")
    as the contexts are reset from the CS ensuring everything is powered up.
    
    Fixes: b2209e62a450 ("drm/i915/execlists: Reset the CSB head tracking on reset/sanitization")
    Fixes: 1288786b18f7 ("drm/i915: Move GEM sanitize from resume_early to resume")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180616202534.18767-1-chris@chris-wilson.co.uk

so we no longer need to worry about why we weren't restoring the fw_domains correctly in this corner case. The underlying bug still exists, to be sure...

Comment 4 Jani Saarinen 2018-06-19 14:25:47 UTC

Closing, thanks.
