Bug 107957 - [CI][SHARDS] igt@gem_eio@in-flight-suspend - dmesg-warn - Failed to initialise HW following reset (-5)
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: low normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-09-17 07:30 UTC by Martin Peres
Modified: 2019-03-18 11:01 UTC

See Also:
i915 platform: BXT, GLK, ICL
i915 features: GEM/Other


Description Martin Peres 2018-09-17 07:30:49 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4829/shard-apl5/igt@gem_eio@in-flight-suspend.html

[179.251251] [drm:i915_reset [i915]] *ERROR* Failed to initialise HW following reset (-5)

Relevant log output leading up to this line:
<7> [179.246841] [drm:i915_reset_device [i915]] resetting chip
<5> [179.247282] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [179.247880] [drm:drm_dp_i2c_do_msg] native defer
<7> [179.249532] i915_gem_set_wedged rcs0
<7> [179.249540] i915_gem_set_wedged \x09current seqno 46, last 46, hangcheck 46 [138 ms]
<7> [179.249545] i915_gem_set_wedged \x09Reset count: 1 (global 5)
<7> [179.249554] [drm:drm_dp_i2c_do_msg] native defer
<7> [179.249561] i915_gem_set_wedged \x09Requests:
<7> [179.249606] i915_gem_set_wedged \x09RING_START: 0x00000000
<7> [179.249613] i915_gem_set_wedged \x09RING_HEAD:  0x00000000
<7> [179.249620] i915_gem_set_wedged \x09RING_TAIL:  0x00000000
<7> [179.249629] i915_gem_set_wedged \x09RING_CTL:   0x00000000
<7> [179.249638] i915_gem_set_wedged \x09RING_MODE:  0x00000200 [idle]
<7> [179.249645] i915_gem_set_wedged \x09RING_IMR: ffffffff
<7> [179.249657] i915_gem_set_wedged \x09ACTHD:  0x00000000_00000000
<7> [179.249668] i915_gem_set_wedged \x09BBADDR: 0x00000000_00000000
<7> [179.249726] i915_gem_set_wedged \x09DMA_FADDR: 0x00000000_00000000
<7> [179.249733] i915_gem_set_wedged \x09IPEIR: 0x00000000
<7> [179.249739] i915_gem_set_wedged \x09IPEHR: 0x00000000
<7> [179.249748] i915_gem_set_wedged \x09Execlist status: 0x00000001 00000000
<7> [179.249756] i915_gem_set_wedged \x09Execlist CSB read 5, write 5 [mmio:7], tasklet queued? no (disabled)
<7> [179.249761] i915_gem_set_wedged \x09\x09ELSP[0] idle
<7> [179.249765] i915_gem_set_wedged \x09\x09ELSP[1] idle
<7> [179.249769] i915_gem_set_wedged \x09\x09HW active? 0x0
<7> [179.249848] i915_gem_set_wedged \x09\x09Queue priority: -1024
<7> [179.249928] i915_gem_set_wedged \x09\x09Q 0 [5:16] prio=-1024 @ 3ms: (null)
<7> [179.249941] i915_gem_set_wedged IRQ? 0x0 (breadcrumbs? no)
<7> [179.249945] i915_gem_set_wedged HWSP:
<7> [179.249952] i915_gem_set_wedged [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.249956] i915_gem_set_wedged *
<7> [179.249965] i915_gem_set_wedged [0040] 00000001 00000000 00000018 00000000 00000001 00000000 00000018 00000003
<7> [179.249973] i915_gem_set_wedged [0060] 00008002 00000002 00008002 00000002 00000000 00000000 00000000 00000005
<7> [179.249980] i915_gem_set_wedged [0080] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.249984] i915_gem_set_wedged *
<7> [179.249990] i915_gem_set_wedged [00c0] 00000046 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.249995] i915_gem_set_wedged [00e0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.249999] i915_gem_set_wedged *
<7> [179.250007] i915_gem_set_wedged Idle? no
<7> [179.250014] i915_gem_set_wedged bcs0
<7> [179.250022] i915_gem_set_wedged \x09current seqno 4, last 4, hangcheck 4 [139 ms]
<7> [179.250027] i915_gem_set_wedged \x09Reset count: 0 (global 5)
<7> [179.250032] i915_gem_set_wedged \x09Requests:
<7> [179.250040] i915_gem_set_wedged \x09RING_START: 0x00000000
<7> [179.250047] i915_gem_set_wedged \x09RING_HEAD:  0x00000000
<7> [179.250059] i915_gem_set_wedged \x09RING_TAIL:  0x00000000
<7> [179.250069] i915_gem_set_wedged \x09RING_CTL:   0x00000000
<7> [179.250078] i915_gem_set_wedged \x09RING_MODE:  0x00000200 [idle]
<7> [179.250085] i915_gem_set_wedged \x09RING_IMR: ffffffff
<7> [179.250100] i915_gem_set_wedged \x09ACTHD:  0x00000000_00000000
<7> [179.250111] i915_gem_set_wedged \x09BBADDR: 0x00000000_00000000
<7> [179.250124] i915_gem_set_wedged \x09DMA_FADDR: 0x00000000_00000000
<7> [179.250131] i915_gem_set_wedged \x09IPEIR: 0x00000000
<7> [179.250139] i915_gem_set_wedged \x09IPEHR: 0x00000000
<7> [179.250149] i915_gem_set_wedged \x09Execlist status: 0x00000001 00000000
<7> [179.250157] i915_gem_set_wedged \x09Execlist CSB read 5, write 5 [mmio:7], tasklet queued? no (disabled)
<7> [179.250161] i915_gem_set_wedged \x09\x09ELSP[0] idle
<7> [179.250166] i915_gem_set_wedged \x09\x09ELSP[1] idle
<7> [179.250170] i915_gem_set_wedged \x09\x09HW active? 0x0
<7> [179.250178] i915_gem_set_wedged \x09\x09Queue priority: -1024
<7> [179.250186] i915_gem_set_wedged \x09\x09Q 0 [8:a] prio=-1024 @ 3ms: (null)
<7> [179.250193] i915_gem_set_wedged IRQ? 0x0 (breadcrumbs? no)
<7> [179.250198] i915_gem_set_wedged HWSP:
<7> [179.250204] i915_gem_set_wedged [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.250208] i915_gem_set_wedged *
<7> [179.250215] i915_gem_set_wedged [0040] 00000001 00000000 00000018 00000000 00000001 00000000 00000018 00000003
<7> [179.250221] i915_gem_set_wedged [0060] 00000018 00000000 00000001 00000000 00000000 00000000 00000000 00000005
<7> [179.250229] i915_gem_set_wedged [0080] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.250234] i915_gem_set_wedged *
<7> [179.250242] i915_gem_set_wedged [00c0] 00000004 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.250249] i915_gem_set_wedged [00e0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.250253] i915_gem_set_wedged *
<7> [179.250258] i915_gem_set_wedged Idle? no
<7> [179.250263] i915_gem_set_wedged vcs0
<7> [179.250267] i915_gem_set_wedged \x09current seqno 4, last 4, hangcheck 4 [139 ms]
<7> [179.250275] i915_gem_set_wedged \x09Reset count: 0 (global 5)
<7> [179.250282] i915_gem_set_wedged \x09Requests:
<7> [179.250292] i915_gem_set_wedged \x09RING_START: 0x00000000
<7> [179.250299] i915_gem_set_wedged \x09RING_HEAD:  0x00000000
<7> [179.250305] i915_gem_set_wedged \x09RING_TAIL:  0x00000000
<7> [179.250315] i915_gem_set_wedged \x09RING_CTL:   0x00000000
<7> [179.250326] i915_gem_set_wedged \x09RING_MODE:  0x00000200 [idle]
<7> [179.250334] i915_gem_set_wedged \x09RING_IMR: ffffffff
<7> [179.250346] i915_gem_set_wedged \x09ACTHD:  0x00000000_00000000
<7> [179.250358] i915_gem_set_wedged \x09BBADDR: 0x00000000_00000000
<7> [179.250371] i915_gem_set_wedged \x09DMA_FADDR: 0x00000000_00000000
<7> [179.250378] i915_gem_set_wedged \x09IPEIR: 0x00000000
<7> [179.250385] i915_gem_set_wedged \x09IPEHR: 0x00000000
<7> [179.250395] i915_gem_set_wedged \x09Execlist status: 0x00000001 00000000
<7> [179.250404] i915_gem_set_wedged \x09Execlist CSB read 5, write 5 [mmio:7], tasklet queued? no (disabled)
<7> [179.250410] i915_gem_set_wedged \x09\x09ELSP[0] idle
<7> [179.250414] i915_gem_set_wedged \x09\x09ELSP[1] idle
<7> [179.250418] i915_gem_set_wedged \x09\x09HW active? 0x0
<7> [179.250427] i915_gem_set_wedged \x09\x09Queue priority: -1024
<7> [179.250434] i915_gem_set_wedged \x09\x09Q 0 [b:7] prio=-1024 @ 3ms: (null)
<7> [179.250439] i915_gem_set_wedged IRQ? 0x0 (breadcrumbs? no)
<7> [179.250443] i915_gem_set_wedged HWSP:
<7> [179.250450] i915_gem_set_wedged [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.250456] i915_gem_set_wedged *
<7> [179.250463] i915_gem_set_wedged [0040] 00000001 00000000 00000018 00000000 00000001 00000000 00000018 00000003
<7> [179.250471] i915_gem_set_wedged [0060] 00008002 00000002 00000018 00000002 00000000 00000000 00000000 00000005
<7> [179.250477] i915_gem_set_wedged [0080] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.250480] i915_gem_set_wedged *
<7> [179.250486] i915_gem_set_wedged [00c0] 00000004 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.250493] i915_gem_set_wedged [00e0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.250499] i915_gem_set_wedged *
<7> [179.250506] i915_gem_set_wedged Idle? no
<7> [179.250511] i915_gem_set_wedged vecs0
<7> [179.250515] i915_gem_set_wedged \x09current seqno 4, last 4, hangcheck 4 [139 ms]
<7> [179.250519] i915_gem_set_wedged \x09Reset count: 0 (global 5)
<7> [179.250524] i915_gem_set_wedged \x09Requests:
<7> [179.250534] i915_gem_set_wedged \x09RING_START: 0x00000000
<7> [179.250542] i915_gem_set_wedged \x09RING_HEAD:  0x00000000
<7> [179.250550] i915_gem_set_wedged \x09RING_TAIL:  0x00000000
<7> [179.250558] i915_gem_set_wedged \x09RING_CTL:   0x00000000
<7> [179.250569] i915_gem_set_wedged \x09RING_MODE:  0x00000200 [idle]
<7> [179.250579] i915_gem_set_wedged \x09RING_IMR: ffffffff
<7> [179.250591] i915_gem_set_wedged \x09ACTHD:  0x00000000_00000000
<7> [179.250603] i915_gem_set_wedged \x09BBADDR: 0x00000000_00000000
<7> [179.250614] i915_gem_set_wedged \x09DMA_FADDR: 0x00000000_00000000
<7> [179.250622] i915_gem_set_wedged \x09IPEIR: 0x00000000
<7> [179.250629] i915_gem_set_wedged \x09IPEHR: 0x00000000
<7> [179.250639] i915_gem_set_wedged \x09Execlist status: 0x00000001 00000000
<7> [179.250646] i915_gem_set_wedged \x09Execlist CSB read 5, write 5 [mmio:7], tasklet queued? no (disabled)
<7> [179.250656] i915_gem_set_wedged \x09\x09ELSP[0] idle
<7> [179.250661] i915_gem_set_wedged \x09\x09ELSP[1] idle
<7> [179.250665] i915_gem_set_wedged \x09\x09HW active? 0x0
<7> [179.250670] i915_gem_set_wedged \x09\x09Queue priority: -1024
<7> [179.250675] i915_gem_set_wedged \x09\x09Q 0 [e:9] prio=-1024 @ 3ms: (null)
<7> [179.250704] i915_gem_set_wedged IRQ? 0x0 (breadcrumbs? no)
<7> [179.250710] i915_gem_set_wedged HWSP:
<7> [179.250717] i915_gem_set_wedged [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.250721] i915_gem_set_wedged *
<7> [179.250727] i915_gem_set_wedged [0040] 00000001 00000000 00000018 00000000 00000001 00000000 00000018 00000003
<7> [179.250732] i915_gem_set_wedged [0060] 00000018 00000002 00000001 00000000 00000000 00000000 00000000 00000005
<7> [179.250739] i915_gem_set_wedged [0080] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.250745] i915_gem_set_wedged *
<7> [179.250753] i915_gem_set_wedged [00c0] 00000004 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.250759] i915_gem_set_wedged [00e0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.250763] i915_gem_set_wedged *
<7> [179.250768] i915_gem_set_wedged Idle? no
<3> [179.251251] [drm:i915_reset [i915]] *ERROR* Failed to initialise HW following reset (-5)
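
For triage, a log like the one above can be filtered down to its error-level lines. The following is only a minimal sketch, assuming the "<level> [timestamp] message" layout shown in this report; the helper names parse_dmesg and errors are hypothetical and not part of any CI tooling.

```python
import re

# Matches lines of the form "<7> [179.246841] message", as in the log above.
LINE_RE = re.compile(r"^<(\d)>\s*\[\s*(\d+\.\d+)\]\s?(.*)$")


def parse_dmesg(text):
    """Return (level, timestamp, message) tuples for each recognised line."""
    records = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            records.append(
                (int(match.group(1)), float(match.group(2)), match.group(3))
            )
    return records


def errors(records, max_level=3):
    """Keep records at kernel log level max_level (3 = KERN_ERR) or more severe."""
    return [rec for rec in records if rec[0] <= max_level]


# Three lines copied from the log in this report.
sample = """\
<7> [179.246841] [drm:i915_reset_device [i915]] resetting chip
<5> [179.247282] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<3> [179.251251] [drm:i915_reset [i915]] *ERROR* Failed to initialise HW following reset (-5)
"""

recs = parse_dmesg(sample)
errs = errors(recs)
# Time from "resetting chip" to the error, in milliseconds.
elapsed_ms = (errs[0][1] - recs[0][1]) * 1000
for level, stamp, message in errs:
    print(level, stamp, message)
```

On this excerpt the only error-level line is the "Failed to initialise HW" message, logged a few milliseconds after "resetting chip".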
Comment 1 Chris Wilson 2018-09-17 07:40:09 UTC
In init_hw, we found the device was still wedged; so either we failed to clear the wedged status at the start of the reset (with reporting an error) or another thread set it wedged during the reset. As the first is impossible, the race seems more likely.
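
The suspected race can be illustrated with a toy model. This is only a sketch under stated assumptions: the Device class, reset, set_wedged and the after_clear hook are hypothetical stand-ins and do not reflect the actual i915 code. The hook exists only to make the interleaving deterministic.

```python
import threading


class Device:
    """Toy stand-in for the driver state; not the i915 data structures."""

    def __init__(self):
        self.wedged = True  # the test wedged the device manually
        self.lock = threading.Lock()


def set_wedged(dev):
    """What a second thread would do if it wedged the device again."""
    with dev.lock:
        dev.wedged = True


def reset(dev, after_clear=None):
    """Clear the wedged status, then re-initialise; return 0 or -5 (-EIO)."""
    # Step 1: clear the wedged status at the start of the reset.
    with dev.lock:
        dev.wedged = False
    # Window in which another thread may run. The hook is an assumption
    # used here only to force the interleaving for demonstration.
    if after_clear is not None:
        after_clear()
    # Step 2: hardware init fails if the device is found wedged again.
    with dev.lock:
        return -5 if dev.wedged else 0


# No interference: the reset succeeds.
quiet = Device()
ok = reset(quiet)

# Another thread wedges the device between step 1 and step 2.
contended = Device()


def racer():
    thread = threading.Thread(target=set_wedged, args=(contended,))
    thread.start()
    thread.join()


raced = reset(contended, after_clear=racer)
print(ok, raced)
```

In this model the uncontended reset returns 0, while the one that loses the race returns -5, matching the error code in the log.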
Comment 2 Lakshmi 2018-10-12 10:11:43 UTC
This issue has occurred only once, 3 weeks and 6 days ago.
Comment 3 Martin Peres 2018-10-30 14:41:08 UTC
Also seen on GLK: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5051/shard-glk3/igt@gem_eio@in-flight-suspend.html

<3> [909.326086] [drm:i915_reset [i915]] *ERROR* Failed to initialise HW following reset (-5)
Comment 4 Martin Peres 2018-11-29 14:37:35 UTC
Also seen on ICL: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5219/shard-iclb5/igt@gem_eio@in-flight-suspend.html

<3> [1705.709370] [drm:i915_reset [i915]] *ERROR* Failed to initialise HW following reset (-5)
Comment 5 Francesco Balestrieri 2019-01-09 07:46:10 UTC
Last occurrence three weeks ago: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_173/fi-glk-j4005/igt@gem_eio@in-flight-suspend.html

It was happening sporadically before that, so we need to keep monitoring, but I'm lowering the priority for now.
Comment 6 Francesco Balestrieri 2019-03-18 11:01:17 UTC
This issue hasn't shown up in 2 months. On the other hand, previous occurrences were 1-2 months apart, so there is no guarantee it's gone, at least judging by the CI results alone. I'm lowering the priority and unassigning it for now.