Bug 106953

Summary: [CI] igt@gem_eio@(hibernate|suspend) - incomplete - WARNING execlists_submission_tasklet
Product: DRI
Reporter: Martin Peres <martin.peres>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: intel-gfx-bugs
Version: XOrg git
Hardware: Other
OS: All
Whiteboard: ReadyForDev
i915 platform: BDW, BSW/CHT, BXT, CFL, CNL, GLK, KBL, SKL
i915 features: GEM/Other

Description Martin Peres 2018-06-18 14:04:13 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_64/fi-cfl-8700k/igt@gem_eio@hibernate.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_64/fi-cfl-8700k/igt@gem_eio@suspend.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_64/fi-bsw-n3050/igt@gem_eio@hibernate.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_64/fi-bsw-n3050/igt@gem_eio@suspend.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_64/fi-bdw-5557u/igt@gem_eio@hibernate.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_64/fi-bdw-5557u/igt@gem_eio@suspend.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_64/fi-kbl-7500u/igt@gem_eio@hibernate.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_64/fi-kbl-7560u/igt@gem_eio@suspend.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_64/fi-skl-6260u/igt@gem_eio@hibernate.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_64/fi-skl-6700hq/igt@gem_eio@hibernate.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_64/fi-skl-6700k2/igt@gem_eio@hibernate.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_64/fi-skl-6770hq/igt@gem_eio@hibernate.html


<0>[  277.303716] ---------------------------------
<4>[  277.303717] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_pcm mei_me e1000e mei prime_numbers
<4>[  277.303727] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U            4.17.0-rc7-g02d8db1a894b-drmtip_64+ #1
<4>[  277.303727] Hardware name: Micro-Star International Co., Ltd. MS-7B54/Z370M MORTAR (MS-7B54), BIOS 1.00 10/31/2017
<4>[  277.303743] RIP: 0010:process_csb+0x53c/0x8b0 [i915]
<4>[  277.303744] RSP: 0018:ffff940966203e18 EFLAGS: 00010286
<4>[  277.303745] RAX: 000000000000000d RBX: 0000000000000018 RCX: 0000000000000000
<4>[  277.303746] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffff94096542fa38
<4>[  277.303746] RBP: ffff940966203e90 R08: 00000000001840b0 R09: ffff940965481000
<4>[  277.303747] R10: 0000000000000000 R11: ffff94096542fa38 R12: 0000000000000003
<4>[  277.303747] R13: ffff94095690c6f0 R14: ffff9409527b6040 R15: ffff94095690c2a8
<4>[  277.303748] FS:  0000000000000000(0000) GS:ffff940966200000(0000) knlGS:0000000000000000
<4>[  277.303749] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  277.303750] CR2: 000056541d36ab90 CR3: 0000000101210002 CR4: 00000000003606f0
<4>[  277.303750] Call Trace:
<4>[  277.303751]  <IRQ>
<4>[  277.303768]  execlists_submission_tasklet+0xb1/0xe20 [i915]
<4>[  277.303770]  ? lock_acquire+0xa6/0x210
<4>[  277.303772]  ? handle_irq_event+0x3a/0x50
<4>[  277.303774]  tasklet_action_common.isra.5+0x47/0xb0
<4>[  277.303776]  __do_softirq+0xc1/0x4e1
<4>[  277.303778]  ? _raw_spin_unlock+0x29/0x40
<4>[  277.303780]  irq_exit+0xa4/0xb0
<4>[  277.303781]  do_IRQ+0x9a/0x120
<4>[  277.303782]  common_interrupt+0xf/0xf
<4>[  277.303783]  </IRQ>
<4>[  277.303785] RIP: 0010:cpuidle_enter_state+0xac/0x360
<4>[  277.303785] RSP: 0018:ffffffff8f203e70 EFLAGS: 00000206 ORIG_RAX: ffffffffffffffd8
<4>[  277.303786] RAX: ffffffff8f2167c0 RBX: 0000000000014582 RCX: 0000000000000000
<4>[  277.303787] RDX: 0000000000000046 RSI: ffffffff8f0fc071 RDI: ffffffff8f0a8eef
<4>[  277.303788] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
<4>[  277.303788] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8f296138
<4>[  277.303789] R13: ffffcb4dffa00a70 R14: 0000000000000000 R15: 000000408f13b924
<4>[  277.303792]  do_idle+0x1f3/0x250
<4>[  277.303794]  cpu_startup_entry+0x6a/0x70
<4>[  277.303796]  start_kernel+0x4a2/0x4c2
<4>[  277.303799]  secondary_startup_64+0xa5/0xb0
<4>[  277.303801] Code: e8 f3 1f bb cd 48 8b 35 03 63 19 00 49 c7 c0 90 c5 64 c0 b9 27 04 00 00 48 c7 c2 b0 4f 61 c0 48 c7 c7 07 be 54 c0 e8 64 8a c1 cd <0f> 0b 48 8d 83 20 16 00 00 48 89 c7 48 89 45 c8 e8 3f d9 3f ce 
<1>[  277.303840] RIP: process_csb+0x53c/0x8b0 [i915] RSP: ffff940966203e18
<4>[  277.303866] ---[ end trace bb212b5641eb6667 ]---
Comment 1 Chris Wilson 2018-06-18 14:10:32 UTC
commit 4fdd5b4e9aba5fbbc6d3072a5a87fa1d3f3fc030
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Jun 16 21:25:34 2018 +0100

    drm/i915: Fix fallout of fake reset along resume
    
    commit b2209e62a450 ("drm/i915/execlists: Reset the CSB head tracking on
    reset/sanitization") and commit 1288786b18f7 ("drm/i915: Move GEM sanitize
    from resume_early to resume") show the conflicting requirements on the
    code. We must reset the GPU before trashing live state on a fast resume
    (hibernation debug, or error paths), but we must only reset our state
    tracking iff the GPU is reset (or power cycled). This is tricky if we
    are disabling GPU reset to simulate broken hardware; we reset our state
    tracking but the GPU is left intact and recovers from its stale state.
    
    v2: Again without the assertion for forcewake, no longer required since
    commit b3ee09a4de33 ("drm/i915/ringbuffer: Fix context restore upon reset")
    as the contexts are reset from the CS ensuring everything is powered up.
    
    Fixes: b2209e62a450 ("drm/i915/execlists: Reset the CSB head tracking on reset/sanitization")
    Fixes: 1288786b18f7 ("drm/i915: Move GEM sanitize from resume_early to resume")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180616202534.18767-1-chris@chris-wilson.co.uk
Comment 2 Jani Saarinen 2018-06-19 14:25:57 UTC
Closing, thanks.
Comment 3 Martin Peres 2018-06-19 20:48:41 UTC
Still happening with drm-tip: 2018y-06m-17d-12h-42m-13s UTC integration manifest

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_66/fi-whl-u/igt@gem_eio@suspend.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_66/fi-whl-u/igt@gem_eio@hibernate.html

The output looks a little different though, so I guess it is progress? :)
Comment 4 Chris Wilson 2018-06-19 20:56:00 UTC
Bug fix hasn't percolated as far as drmtip-66. Check again after drmtip-67/-68!
Comment 5 Martin Peres 2018-06-19 21:09:18 UTC
(In reply to Chris Wilson from comment #4)
> Bug fix hasn't percolated as far as drmtip-66. Check again after
> drmtip-67/-68!

Are you sure? You pushed the patch on the 16th, and the drmtip run was with drmtip 2018y-06m-17d-12h-42m-13s.
Comment 6 Chris Wilson 2018-06-19 21:15:28 UTC
Pretty confident, yes. The error (CSB head==5 but mmio reads 1) is the same as fixed by the patch, and the same as showing up in the shards for https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4332/ fixed in CI_DRM_4333.

I do think we need a clearer indication of what base drmtip is using.

drm-intel drm-intel-next-queued f677bd558de2e98b70b7f8c522024b26d2d1120d
	drm/i915/icl: update VBT's child_device_config flags2 field

which is just (2 patches!) before

commit 4fdd5b4e9aba5fbbc6d3072a5a87fa1d3f3fc030
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Jun 16 21:25:34 2018 +0100

    drm/i915: Fix fallout of fake reset along resume

was committed.
Comment 7 Martin Peres 2018-06-20 07:54:43 UTC
Indeed, it was not reproduced on drmtip_67. Closing! Thanks and sorry for the noise :s
