Bug 107257 - [BAT] igt@drv_selftest@live_workarounds - WARN_ON(i915->gt.awake)
Summary: [BAT] igt@drv_selftest@live_workarounds - WARN_ON(i915->gt.awake)
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-07-17 08:25 UTC by Martin Peres
Modified: 2018-08-27 12:51 UTC

See Also:
i915 platform: BSW/CHT, CFL, KBL, SKL
i915 features: GEM/Other


Description Martin Peres 2018-07-17 08:25:21 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4495/fi-kbl-x1275/igt@drv_selftest@live_workarounds.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4498/fi-kbl-7567u/igt@drv_selftest@live_workarounds.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4495/fi-bsw-cyan/igt@drv_selftest@live_workarounds.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4495/fi-skl-6700k2/igt@drv_selftest@live_workarounds.html

[  521.561085] ---------------------------------
[  521.577805] Whitelist not preserved in context across engine reset!
[  521.577856] i915/intel_workarounds_live_selftests: live_reset_whitelist failed with error -5
[  521.603781] ------------[ cut here ]------------
[  521.603783] WARN_ON(i915->gt.awake)
[  521.603841] WARNING: CPU: 6 PID: 9659 at drivers/gpu/drm/i915/i915_gem.c:5094 i915_gem_suspend+0x137/0x140 [i915]
[  521.603843] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal snd_hda_codec intel_powerclamp snd_hwdep coretemp snd_hda_core crct10dif_pclmul crc32_pclmul snd_pcm ghash_clmulni_intel e1000e mei_me mei prime_numbers [last unloaded: i915]
[  521.603866] CPU: 6 PID: 9659 Comm: drv_selftest Tainted: G     U            4.18.0-rc4-CI-CI_DRM_4495+ #1
[  521.603867] Hardware name: System manufacturer System Product Name/Z170 PRO GAMING, BIOS 0802 09/02/2015
[  521.603900] RIP: 0010:i915_gem_suspend+0x137/0x140 [i915]
[  521.603901] Code: c7 c7 81 9d 2f a0 e8 58 d1 e9 e0 0f 0b 48 89 ef e8 1e 7a ff ff eb d0 48 c7 c6 b7 a0 2f a0 48 c7 c7 81 9d 2f a0 e8 39 d1 e9 e0 <0f> 0b eb ad 0f 1f 44 00 00 41 57 41 56 31 f6 41 55 41 54 49 89 fd 
[  521.603958] RSP: 0018:ffffc90000363b48 EFLAGS: 00010282
[  521.603960] RAX: 0000000000000000 RBX: ffff88018ae1b0f8 RCX: 0000000000000001
[  521.603962] RDX: 0000000080000001 RSI: ffffffff820c65b4 RDI: 00000000ffffffff
[  521.603963] RBP: ffff88018ae10000 R08: 00000000baacb979 R09: 0000000000000000
[  521.603964] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa02d63e0
[  521.603965] R13: ffffffffa039d8d0 R14: ffffffffa039d860 R15: ffffc90000363ea0
[  521.603966] FS:  00007fc19cfe3980(0000) GS:ffff880255d80000(0000) knlGS:0000000000000000
[  521.603968] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  521.603969] CR2: 0000562458cfa170 CR3: 000000023d7ba006 CR4: 00000000003606e0
[  521.603970] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  521.603971] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  521.603972] Call Trace:
[  521.604001]  i915_driver_unload+0x63/0x110 [i915]
[  521.604028]  i915_pci_remove+0x19/0x30 [i915]
[  521.604053]  i915_pci_probe+0x60/0xa0 [i915]
[  521.604057]  pci_device_probe+0xa1/0x130
[  521.604061]  driver_probe_device+0x306/0x480
[  521.604063]  __driver_attach+0xdb/0x100
[  521.604065]  ? driver_probe_device+0x480/0x480
[  521.604067]  ? driver_probe_device+0x480/0x480
[  521.604069]  bus_for_each_dev+0x74/0xc0
[  521.604072]  bus_add_driver+0x15f/0x250
[  521.604074]  ? 0xffffffffa06b1000
[  521.604076]  driver_register+0x56/0xe0
[  521.604078]  ? 0xffffffffa06b1000
[  521.604080]  do_one_initcall+0x58/0x370
[  521.604083]  ? do_init_module+0x1d/0x1ea
[  521.604085]  ? rcu_read_lock_sched_held+0x6f/0x80
[  521.604087]  ? kmem_cache_alloc_trace+0x282/0x2e0
[  521.604091]  do_init_module+0x56/0x1ea
[  521.604093]  load_module+0x2435/0x2b20
[  521.604104]  ? __se_sys_finit_module+0xd3/0xf0
[  521.604106]  __se_sys_finit_module+0xd3/0xf0
[  521.604112]  do_syscall_64+0x55/0x190
[  521.604115]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  521.604117] RIP: 0033:0x7fc19c8b8839
[  521.604117] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48 
[  521.604174] RSP: 002b:00007ffdb75b3a48 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[  521.604177] RAX: ffffffffffffffda RBX: 000055948cd67540 RCX: 00007fc19c8b8839
[  521.604178] RDX: 0000000000000000 RSI: 000055948cd6eb50 RDI: 0000000000000004
[  521.604179] RBP: 000055948cd6eb50 R08: 0000000000000004 R09: 0000000000000000
[  521.604180] R10: 00007ffdb75b3bc0 R11: 0000000000000246 R12: 0000000000000000
[  521.604181] R13: 000055948cd63bf0 R14: 0000000000000020 R15: 000000000000003f
[  521.604187] irq event stamp: 245336
[  521.604189] hardirqs last  enabled at (245335): [<ffffffff810f896c>] console_unlock+0x3fc/0x600
[  521.604191] hardirqs last disabled at (245336): [<ffffffff81a0111c>] error_entry+0x7c/0x100
[  521.604194] softirqs last  enabled at (245016): [<ffffffff817efa6c>] peernet2id+0x4c/0x70
[  521.604195] softirqs last disabled at (245014): [<ffffffff817efa4d>] peernet2id+0x2d/0x70
[  521.604226] WARNING: CPU: 6 PID: 9659 at drivers/gpu/drm/i915/i915_gem.c:5094 i915_gem_suspend+0x137/0x140 [i915]
[  521.604228] ---[ end trace 35a99a529d605ef6 ]---
Comment 1 Chris Wilson 2018-07-17 08:31:18 UTC
The pronouncement from live_workarounds about the nature of the failure is a little misleading. The root cause here is the unrecoverable reset (same as live_hangcheck), but what's more interesting in this case is the subsequent oops after we have wedged the device... That shouldn't occur -- wedging should be fail-safe!
Comment 3 Chris Wilson 2018-07-19 07:43:09 UTC
commit 01f8f33e9986aed42a0366704f715cf30e7cb41c (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jul 17 09:41:21 2018 +0100

    drm/i915: Always retire residual requests before suspend
    
    If the driver is wedged, we skip idling the GPU. However, we may still
    have a few requests still not retired following the wedging (since they
    will be waiting for a background worker trying to acquire struct_mutex).
    As we hold the struct_mutex, always do a quick request retirement in
    order to flush the wedged path.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107257
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180717084121.28185-1-chris@chris-wilson.co.uk

This will reduce the failure down to the existing bug for the reset failing inside live_workarounds: bug 107188.
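
The behaviour the commit message describes can be illustrated with a toy model. This is only a sketch under stated assumptions: struct toy_gt and every toy_* function are hypothetical names invented for illustration, not the real i915 structures or the actual patch. It models a wedged device that skips idling and is left awake at suspend, and the same device when a request retirement is always performed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not the real i915 code. */
struct toy_gt {
	bool wedged;   /* reset failed, device declared wedged */
	int  pending;  /* requests not yet retired */
	bool awake;    /* stands in for i915->gt.awake */
};

/* In this model, retiring the last request lets the GT go idle. */
void toy_retire_requests(struct toy_gt *gt)
{
	gt->pending = 0;
	gt->awake = false;
}

/* Before the fix, as described: a wedged device skips idling altogether. */
bool toy_suspend_old(struct toy_gt *gt)
{
	if (!gt->wedged)
		toy_retire_requests(gt);  /* stands in for the wait-for-idle path */
	return !gt->awake;                /* false models WARN_ON(i915->gt.awake) */
}

/* After the fix, as described: always retire, wedged or not. */
bool toy_suspend_fixed(struct toy_gt *gt)
{
	if (!gt->wedged)
		toy_retire_requests(gt);  /* stands in for the wait-for-idle path */
	toy_retire_requests(gt);          /* flush residual requests after wedging */
	return !gt->awake;
}

/* Small self-check comparing both paths on a wedged device. */
void toy_selfcheck(void)
{
	struct toy_gt a = { .wedged = true, .pending = 2, .awake = true };
	struct toy_gt b = a;

	assert(!toy_suspend_old(&a));   /* still awake: the warning would fire */
	assert(toy_suspend_fixed(&b));  /* residual requests retired, GT idle */
}
```

Calling toy_selfcheck() from a test harness exercises both paths. The model deliberately ignores struct_mutex and the background worker mentioned in the commit message.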
Comment 4 Dhinakaran Pandiyan 2018-07-19 19:45:34 UTC
While the test still fails, WARN_ON(i915->gt.awake) appears to have been fixed.
Comment 5 Francesco Balestrieri 2018-08-04 09:21:29 UTC
Can we close this then? Martin?
Comment 6 Lakshmi 2018-08-24 05:57:37 UTC
Closing the bug; it was last seen one month ago.
Comment 7 Lakshmi 2018-08-27 12:51:06 UTC
This bug used to appear every 5 rounds of execution, but it has not appeared in the last 198 rounds. Closing this bug.
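
As a rough sanity check on that closing rationale, the sketch below estimates how likely a clean streak would be if the bug were still present. It assumes (this is not stated in the report) that "every 5 rounds" means an independent failure chance of about 1 in 5 per round; clean_streak_probability is a hypothetical helper name.

```c
#include <assert.h>

/*
 * Probability of seeing `rounds` consecutive clean runs if each run
 * fails independently with probability `failure_rate` (an assumption).
 */
double clean_streak_probability(double failure_rate, unsigned int rounds)
{
	double p = 1.0;

	for (unsigned int i = 0; i < rounds; i++)
		p *= 1.0 - failure_rate;
	return p;
}

/* With a 1-in-5 failure rate, 198 clean rounds are vanishingly unlikely. */
void clean_streak_selfcheck(void)
{
	double p = clean_streak_probability(0.2, 198);

	assert(p > 0.0);
	assert(p < 1e-18);
}
```

Under those assumptions, 198 clean rounds would be extremely unlikely by chance alone, which is consistent with treating the warning as fixed.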