106683 – [CI] igt@drv_suspend@sysfs-reader - dmesg-warn - *ERROR* bcs0: reset request timeout


Status:	CLOSED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	XOrg git
Hardware:	Other All

Importance:	medium normal
Assignee:	Intel GFX Bugs mailing list
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:	ReadyForDev
Keywords:

Depends on:
Blocks:

Reported:	2018-05-28 11:14 UTC by Martin Peres
Modified:	2018-10-16 08:39 UTC
CC List:	1 user

See Also:
i915 platform:	SKL
i915 features:	GEM/Other

Description Martin Peres 2018-05-28 11:14:19 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_38/fi-skl-6600u/igt@drv_suspend@sysfs-reader.html

[  378.040826] [drm:gen8_reset_engines [i915]] *ERROR* bcs0: reset request timeout
[  378.041039] ------------[ cut here ]------------
[  378.041040] WARN_ON(intel_gpu_reset(i915, (~0)))
[  378.041079] WARNING: CPU: 1 PID: 126 at drivers/gpu/drm/i915/i915_gem.c:4978 i915_gem_sanitize+0x4d/0x80 [i915]
[  378.041080] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core i915 snd_pcm asix usbnet btusb btrtl btbcm btintel mii bluetooth ecdh_generic mei_me mei prime_numbers i2c_hid pinctrl_sunrisepoint pinctrl_intel
[  378.041121] CPU: 1 PID: 126 Comm: kworker/u8:2 Tainted: G     U  W         4.17.0-rc4-gfe5bde58dca5-drmtip_38+ #1
[  378.041122] Hardware name: Dell Inc. XPS 13 9350/, BIOS 1.4.12 11/30/2016
[  378.041126] Workqueue: events_unbound async_run_entry_fn
[  378.041152] RIP: 0010:i915_gem_sanitize+0x4d/0x80 [i915]
[  378.041154] RSP: 0018:ffff9ec7004c7cc8 EFLAGS: 00010286
[  378.041156] RAX: 0000000000000000 RBX: ffff8b2865c90000 RCX: 0000000000000001
[  378.041158] RDX: 0000000080000001 RSI: ffffffffb20fb2c9 RDI: 00000000ffffffff
[  378.041159] RBP: ffff8b2865c90068 R08: 00000000564bbeab R09: 0000000000000000
[  378.041161] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b2865c989b0
[  378.041162] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  378.041164] FS:  0000000000000000(0000) GS:ffff8b287dc80000(0000) knlGS:0000000000000000
[  378.041165] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  378.041167] CR2: 000055eba5d3b1a8 CR3: 0000000026210006 CR4: 00000000003606e0
[  378.041168] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  378.041170] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  378.041171] Call Trace:
[  378.041196]  i915_gem_suspend+0xec/0x140 [i915]
[  378.041215]  i915_drm_suspend+0x5f/0x160 [i915]
[  378.041220]  pci_pm_suspend+0x7c/0x130
[  378.041223]  ? pci_pm_freeze+0xc0/0xc0
[  378.041226]  dpm_run_callback+0x5d/0x2f0
[  378.041230]  __device_suspend+0x11f/0x600
[  378.041234]  ? dpm_watchdog_set+0x60/0x60
[  378.041240]  async_suspend+0x15/0x90
[  378.041243]  async_run_entry_fn+0x34/0x160
[  378.041247]  process_one_work+0x229/0x6a0
[  378.041252]  worker_thread+0x35/0x380
[  378.041256]  ? process_one_work+0x6a0/0x6a0
[  378.041258]  kthread+0x119/0x130
[  378.041261]  ? _kthread_create_on_node+0x60/0x60
[  378.041279]  ret_from_fork+0x3a/0x50
[  378.041286] Code: e0 03 00 84 c0 74 f1 be ff ff ff ff 48 89 df e8 5a de 03 00 85 c0 74 e0 48 c7 c6 c0 b3 6c c0 48 c7 c7 1d 30 6b c0 e8 93 28 ae f0 <0f> 0b eb c9 48 8d 6f 68 31 f6 48 89 ef e8 81 e8 39 f1 48 89 df 
[  378.041359] irq event stamp: 3276
[  378.041362] hardirqs last  enabled at (3275): [<ffffffffb10fc757>] vprintk_emit+0x4b7/0x4d0
[  378.041365] hardirqs last disabled at (3276): [<ffffffffb1a0111c>] error_entry+0x7c/0x100
[  378.041367] softirqs last  enabled at (3258): [<ffffffffb1c0032b>] __do_softirq+0x32b/0x4e1
[  378.041370] softirqs last disabled at (3237): [<ffffffffb108f6f4>] irq_exit+0xa4/0xb0
[  378.041393] WARNING: CPU: 1 PID: 126 at drivers/gpu/drm/i915/i915_gem.c:4978 i915_gem_sanitize+0x4d/0x80 [i915]
[  378.041395] ---[ end trace 84ea7be84ec5687c ]---

Comment 1 Chris Wilson 2018-08-13 15:50:52 UTC

The behaviour should have substantially changed with

commit f4e60c5cfbf217cc9faa3aeb63742860154fcfef (HEAD -> drm-intel-next-queued, drm-intel/drm-intel-next-queued)
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Mon Aug 13 16:01:16 2018 +0300

    drm/i915: Force reset on unready engine
    
    If engine reports that it is not ready for reset, we
    give up. Evidence shows that forcing a per engine reset
    on an engine which is not reporting to be ready for reset,
    can bring it back into a working order. There is risk that
    we corrupt the context image currently executing on that
    engine. But that is a risk worth taking as if we unblock
    the engine, we prevent a whole device wedging in a case
    of full gpu reset.
    
    Reset individual engine even if it reports that it is not
    prepared for reset, but only if we aim for full gpu reset
    and not on first reset attempt.
    
    v2: force reset only on later attempts, readability (Chris)
    v3: simplify with adequate caffeine levels (Chris)
    v4: comment about risks and migitations (Chris)
    
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180813130116.7250-1-mika.kuoppala@linux.intel.com
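
The policy described in the commit message above can be sketched as a small decision helper. This is only an illustration under stated assumptions, not the i915 code: the function name should_reset_engine, its parameters, and the convention that attempts are counted from 0 are all hypothetical.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not part of i915): decide whether to go ahead
 * with resetting one engine.
 *
 * engine_ready   - the engine reported that it is ready for reset
 * full_gpu_reset - the caller is aiming for a full GPU reset
 * attempt        - reset attempt number, assumed to start at 0
 */
static bool should_reset_engine(bool engine_ready, bool full_gpu_reset,
                                unsigned int attempt)
{
    /* A ready engine is always reset. */
    if (engine_ready)
        return true;

    /*
     * An unready engine is forced only when a full GPU reset is the
     * goal and this is not the first attempt. Forcing risks corrupting
     * the context image currently executing on that engine.
     */
    return full_gpu_reset && attempt >= 1;
}
```

Under these assumptions, the first pass skips an engine that is not ready, and only a later pass of a full GPU reset forces it.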

Comment 2 Lakshmi 2018-08-24 06:39:13 UTC

Closing the bug, as this was last seen 2 months ago.

Comment 3 Lakshmi 2018-08-28 06:29:15 UTC

This has occurred only twice, with a gap of 20 rounds of drmtip execution between the two occurrences. To make sure the bug is really gone, we can wait for a few more rounds of execution to see whether it still occurs, so I am reopening this issue.

But this doesn't mean that this issue needs a fix.

Comment 4 Lakshmi 2018-10-16 08:39:03 UTC

This issue was last seen with drmtip_59 (4 months, 1 week / 2229 runs ago).
Closing this bug.
