Bug 107957 - [CI][SHARDS] igt@gem_eio@in-flight-suspend - dmesg-warn - Failed to initialise HW following reset (-5)
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: low normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-09-17 07:30 UTC by Martin Peres
Modified: 2019-07-31 12:13 UTC
CC List: 1 user

See Also:
i915 platform: BXT, GLK, ICL
i915 features: GEM/Other



Description Martin Peres 2018-09-17 07:30:49 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4829/shard-apl5/igt@gem_eio@in-flight-suspend.html

[179.251251] [drm:i915_reset [i915]] *ERROR* Failed to initialise HW following reset (-5)

Relevant information leading up to this line:
<7> [179.246841] [drm:i915_reset_device [i915]] resetting chip
<5> [179.247282] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [179.247880] [drm:drm_dp_i2c_do_msg] native defer
<7> [179.249532] i915_gem_set_wedged rcs0
<7> [179.249540] i915_gem_set_wedged \x09current seqno 46, last 46, hangcheck 46 [138 ms]
<7> [179.249545] i915_gem_set_wedged \x09Reset count: 1 (global 5)
<7> [179.249554] [drm:drm_dp_i2c_do_msg] native defer
<7> [179.249561] i915_gem_set_wedged \x09Requests:
<7> [179.249606] i915_gem_set_wedged \x09RING_START: 0x00000000
<7> [179.249613] i915_gem_set_wedged \x09RING_HEAD:  0x00000000
<7> [179.249620] i915_gem_set_wedged \x09RING_TAIL:  0x00000000
<7> [179.249629] i915_gem_set_wedged \x09RING_CTL:   0x00000000
<7> [179.249638] i915_gem_set_wedged \x09RING_MODE:  0x00000200 [idle]
<7> [179.249645] i915_gem_set_wedged \x09RING_IMR: ffffffff
<7> [179.249657] i915_gem_set_wedged \x09ACTHD:  0x00000000_00000000
<7> [179.249668] i915_gem_set_wedged \x09BBADDR: 0x00000000_00000000
<7> [179.249726] i915_gem_set_wedged \x09DMA_FADDR: 0x00000000_00000000
<7> [179.249733] i915_gem_set_wedged \x09IPEIR: 0x00000000
<7> [179.249739] i915_gem_set_wedged \x09IPEHR: 0x00000000
<7> [179.249748] i915_gem_set_wedged \x09Execlist status: 0x00000001 00000000
<7> [179.249756] i915_gem_set_wedged \x09Execlist CSB read 5, write 5 [mmio:7], tasklet queued? no (disabled)
<7> [179.249761] i915_gem_set_wedged \x09\x09ELSP[0] idle
<7> [179.249765] i915_gem_set_wedged \x09\x09ELSP[1] idle
<7> [179.249769] i915_gem_set_wedged \x09\x09HW active? 0x0
<7> [179.249848] i915_gem_set_wedged \x09\x09Queue priority: -1024
<7> [179.249928] i915_gem_set_wedged \x09\x09Q 0 [5:16] prio=-1024 @ 3ms: (null)
<7> [179.249941] i915_gem_set_wedged IRQ? 0x0 (breadcrumbs? no)
<7> [179.249945] i915_gem_set_wedged HWSP:
<7> [179.249952] i915_gem_set_wedged [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.249956] i915_gem_set_wedged *
<7> [179.249965] i915_gem_set_wedged [0040] 00000001 00000000 00000018 00000000 00000001 00000000 00000018 00000003
<7> [179.249973] i915_gem_set_wedged [0060] 00008002 00000002 00008002 00000002 00000000 00000000 00000000 00000005
<7> [179.249980] i915_gem_set_wedged [0080] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.249984] i915_gem_set_wedged *
<7> [179.249990] i915_gem_set_wedged [00c0] 00000046 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.249995] i915_gem_set_wedged [00e0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.249999] i915_gem_set_wedged *
<7> [179.250007] i915_gem_set_wedged Idle? no
<7> [179.250014] i915_gem_set_wedged bcs0
<7> [179.250022] i915_gem_set_wedged \x09current seqno 4, last 4, hangcheck 4 [139 ms]
<7> [179.250027] i915_gem_set_wedged \x09Reset count: 0 (global 5)
<7> [179.250032] i915_gem_set_wedged \x09Requests:
<7> [179.250040] i915_gem_set_wedged \x09RING_START: 0x00000000
<7> [179.250047] i915_gem_set_wedged \x09RING_HEAD:  0x00000000
<7> [179.250059] i915_gem_set_wedged \x09RING_TAIL:  0x00000000
<7> [179.250069] i915_gem_set_wedged \x09RING_CTL:   0x00000000
<7> [179.250078] i915_gem_set_wedged \x09RING_MODE:  0x00000200 [idle]
<7> [179.250085] i915_gem_set_wedged \x09RING_IMR: ffffffff
<7> [179.250100] i915_gem_set_wedged \x09ACTHD:  0x00000000_00000000
<7> [179.250111] i915_gem_set_wedged \x09BBADDR: 0x00000000_00000000
<7> [179.250124] i915_gem_set_wedged \x09DMA_FADDR: 0x00000000_00000000
<7> [179.250131] i915_gem_set_wedged \x09IPEIR: 0x00000000
<7> [179.250139] i915_gem_set_wedged \x09IPEHR: 0x00000000
<7> [179.250149] i915_gem_set_wedged \x09Execlist status: 0x00000001 00000000
<7> [179.250157] i915_gem_set_wedged \x09Execlist CSB read 5, write 5 [mmio:7], tasklet queued? no (disabled)
<7> [179.250161] i915_gem_set_wedged \x09\x09ELSP[0] idle
<7> [179.250166] i915_gem_set_wedged \x09\x09ELSP[1] idle
<7> [179.250170] i915_gem_set_wedged \x09\x09HW active? 0x0
<7> [179.250178] i915_gem_set_wedged \x09\x09Queue priority: -1024
<7> [179.250186] i915_gem_set_wedged \x09\x09Q 0 [8:a] prio=-1024 @ 3ms: (null)
<7> [179.250193] i915_gem_set_wedged IRQ? 0x0 (breadcrumbs? no)
<7> [179.250198] i915_gem_set_wedged HWSP:
<7> [179.250204] i915_gem_set_wedged [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.250208] i915_gem_set_wedged *
<7> [179.250215] i915_gem_set_wedged [0040] 00000001 00000000 00000018 00000000 00000001 00000000 00000018 00000003
<7> [179.250221] i915_gem_set_wedged [0060] 00000018 00000000 00000001 00000000 00000000 00000000 00000000 00000005
<7> [179.250229] i915_gem_set_wedged [0080] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.250234] i915_gem_set_wedged *
<7> [179.250242] i915_gem_set_wedged [00c0] 00000004 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.250249] i915_gem_set_wedged [00e0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.250253] i915_gem_set_wedged *
<7> [179.250258] i915_gem_set_wedged Idle? no
<7> [179.250263] i915_gem_set_wedged vcs0
<7> [179.250267] i915_gem_set_wedged \x09current seqno 4, last 4, hangcheck 4 [139 ms]
<7> [179.250275] i915_gem_set_wedged \x09Reset count: 0 (global 5)
<7> [179.250282] i915_gem_set_wedged \x09Requests:
<7> [179.250292] i915_gem_set_wedged \x09RING_START: 0x00000000
<7> [179.250299] i915_gem_set_wedged \x09RING_HEAD:  0x00000000
<7> [179.250305] i915_gem_set_wedged \x09RING_TAIL:  0x00000000
<7> [179.250315] i915_gem_set_wedged \x09RING_CTL:   0x00000000
<7> [179.250326] i915_gem_set_wedged \x09RING_MODE:  0x00000200 [idle]
<7> [179.250334] i915_gem_set_wedged \x09RING_IMR: ffffffff
<7> [179.250346] i915_gem_set_wedged \x09ACTHD:  0x00000000_00000000
<7> [179.250358] i915_gem_set_wedged \x09BBADDR: 0x00000000_00000000
<7> [179.250371] i915_gem_set_wedged \x09DMA_FADDR: 0x00000000_00000000
<7> [179.250378] i915_gem_set_wedged \x09IPEIR: 0x00000000
<7> [179.250385] i915_gem_set_wedged \x09IPEHR: 0x00000000
<7> [179.250395] i915_gem_set_wedged \x09Execlist status: 0x00000001 00000000
<7> [179.250404] i915_gem_set_wedged \x09Execlist CSB read 5, write 5 [mmio:7], tasklet queued? no (disabled)
<7> [179.250410] i915_gem_set_wedged \x09\x09ELSP[0] idle
<7> [179.250414] i915_gem_set_wedged \x09\x09ELSP[1] idle
<7> [179.250418] i915_gem_set_wedged \x09\x09HW active? 0x0
<7> [179.250427] i915_gem_set_wedged \x09\x09Queue priority: -1024
<7> [179.250434] i915_gem_set_wedged \x09\x09Q 0 [b:7] prio=-1024 @ 3ms: (null)
<7> [179.250439] i915_gem_set_wedged IRQ? 0x0 (breadcrumbs? no)
<7> [179.250443] i915_gem_set_wedged HWSP:
<7> [179.250450] i915_gem_set_wedged [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.250456] i915_gem_set_wedged *
<7> [179.250463] i915_gem_set_wedged [0040] 00000001 00000000 00000018 00000000 00000001 00000000 00000018 00000003
<7> [179.250471] i915_gem_set_wedged [0060] 00008002 00000002 00000018 00000002 00000000 00000000 00000000 00000005
<7> [179.250477] i915_gem_set_wedged [0080] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.250480] i915_gem_set_wedged *
<7> [179.250486] i915_gem_set_wedged [00c0] 00000004 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.250493] i915_gem_set_wedged [00e0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.250499] i915_gem_set_wedged *
<7> [179.250506] i915_gem_set_wedged Idle? no
<7> [179.250511] i915_gem_set_wedged vecs0
<7> [179.250515] i915_gem_set_wedged \x09current seqno 4, last 4, hangcheck 4 [139 ms]
<7> [179.250519] i915_gem_set_wedged \x09Reset count: 0 (global 5)
<7> [179.250524] i915_gem_set_wedged \x09Requests:
<7> [179.250534] i915_gem_set_wedged \x09RING_START: 0x00000000
<7> [179.250542] i915_gem_set_wedged \x09RING_HEAD:  0x00000000
<7> [179.250550] i915_gem_set_wedged \x09RING_TAIL:  0x00000000
<7> [179.250558] i915_gem_set_wedged \x09RING_CTL:   0x00000000
<7> [179.250569] i915_gem_set_wedged \x09RING_MODE:  0x00000200 [idle]
<7> [179.250579] i915_gem_set_wedged \x09RING_IMR: ffffffff
<7> [179.250591] i915_gem_set_wedged \x09ACTHD:  0x00000000_00000000
<7> [179.250603] i915_gem_set_wedged \x09BBADDR: 0x00000000_00000000
<7> [179.250614] i915_gem_set_wedged \x09DMA_FADDR: 0x00000000_00000000
<7> [179.250622] i915_gem_set_wedged \x09IPEIR: 0x00000000
<7> [179.250629] i915_gem_set_wedged \x09IPEHR: 0x00000000
<7> [179.250639] i915_gem_set_wedged \x09Execlist status: 0x00000001 00000000
<7> [179.250646] i915_gem_set_wedged \x09Execlist CSB read 5, write 5 [mmio:7], tasklet queued? no (disabled)
<7> [179.250656] i915_gem_set_wedged \x09\x09ELSP[0] idle
<7> [179.250661] i915_gem_set_wedged \x09\x09ELSP[1] idle
<7> [179.250665] i915_gem_set_wedged \x09\x09HW active? 0x0
<7> [179.250670] i915_gem_set_wedged \x09\x09Queue priority: -1024
<7> [179.250675] i915_gem_set_wedged \x09\x09Q 0 [e:9] prio=-1024 @ 3ms: (null)
<7> [179.250704] i915_gem_set_wedged IRQ? 0x0 (breadcrumbs? no)
<7> [179.250710] i915_gem_set_wedged HWSP:
<7> [179.250717] i915_gem_set_wedged [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.250721] i915_gem_set_wedged *
<7> [179.250727] i915_gem_set_wedged [0040] 00000001 00000000 00000018 00000000 00000001 00000000 00000018 00000003
<7> [179.250732] i915_gem_set_wedged [0060] 00000018 00000002 00000001 00000000 00000000 00000000 00000000 00000005
<7> [179.250739] i915_gem_set_wedged [0080] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.250745] i915_gem_set_wedged *
<7> [179.250753] i915_gem_set_wedged [00c0] 00000004 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.250759] i915_gem_set_wedged [00e0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [179.250763] i915_gem_set_wedged *
<7> [179.250768] i915_gem_set_wedged Idle? no
<3> [179.251251] [drm:i915_reset [i915]] *ERROR* Failed to initialise HW following reset (-5)
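The "(-5)" in the error line is a negated errno value. Assuming Linux errno numbering (where EIO is 5), it means the HW re-initialisation step returned -EIO. A small helper for decoding such return values; the function name decode_kernel_errno is hypothetical, not part of any CI tooling:

```python
import errno
import os


def decode_kernel_errno(ret: int) -> str:
    """Turn a negative kernel return value such as -5 into its errno name.

    Assumes Linux errno numbering (errno.EIO == 5 on Linux).
    """
    if ret >= 0:
        return "success"
    code = -ret
    name = errno.errorcode.get(code, "unknown")
    return f"{name} ({os.strerror(code)})"


print(decode_kernel_errno(-5))  # on Linux: EIO (Input/output error)
```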
Comment 1 Chris Wilson 2018-09-17 07:40:09 UTC
In init_hw, we found the device was still wedged; so either we failed to clear the wedged status at the start of the reset (without reporting an error), or another thread called set-wedged during the reset. As the first is impossible, the race seems more likely.
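A toy model of the suspected race, for illustration only: Device, reset(), init_hw() and set_wedged() are hypothetical names, not i915 code. The device starts wedged (the log shows it was manually wedged before the reset). The reset clears the flag, and init_hw refuses a wedged device with -EIO. The "concurrent" hook runs inline to make the bad interleaving deterministic: each step takes the lock, yet the window between clearing the flag and re-initialising remains open.

```python
import threading

EIO = 5  # Linux errno value for EIO (assumed Linux numbering)


class Device:
    """Toy device with a 'wedged' flag (hypothetical, not the i915 driver)."""

    def __init__(self) -> None:
        self.wedged = True  # starts wedged, like the manually wedged device
        self.lock = threading.Lock()

    def set_wedged(self) -> None:
        with self.lock:
            self.wedged = True

    def init_hw(self) -> int:
        # Refuse to initialise a wedged device, mirroring the -EIO in the log.
        with self.lock:
            return -EIO if self.wedged else 0

    def reset(self, concurrent=None) -> int:
        # Step 1: clear the wedged status at the start of the reset.
        with self.lock:
            self.wedged = False
        # Race window: another thread may wedge the device again here.
        if concurrent is not None:
            concurrent()
        # Step 2: re-initialise the HW; fails if the device was re-wedged.
        return self.init_hw()


dev = Device()
print(dev.reset())  # 0: no interference
dev2 = Device()
print(dev2.reset(concurrent=dev2.set_wedged))  # -5: set_wedged raced the reset
```

Locking each access individually does not help here; closing the race would need the whole clear-then-init sequence to be serialised against set-wedged.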
Comment 2 Lakshmi 2018-10-12 10:11:43 UTC
This issue has occurred only once in 3 weeks and 6 days.
Comment 3 Martin Peres 2018-10-30 14:41:08 UTC
Also seen on GLK: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5051/shard-glk3/igt@gem_eio@in-flight-suspend.html

<3> [909.326086] [drm:i915_reset [i915]] *ERROR* Failed to initialise HW following reset (-5)
Comment 4 Martin Peres 2018-11-29 14:37:35 UTC
Also seen on ICL: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5219/shard-iclb5/igt@gem_eio@in-flight-suspend.html

<3> [1705.709370] [drm:i915_reset [i915]] *ERROR* Failed to initialise HW following reset (-5)
Comment 5 Francesco Balestrieri 2019-01-09 07:46:10 UTC
Last occurrence three weeks ago: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_173/fi-glk-j4005/igt@gem_eio@in-flight-suspend.html

It was happening sporadically before that, so we need to keep monitoring, but I'm lowering the priority for now.
Comment 6 Francesco Balestrieri 2019-03-18 11:01:17 UTC
This issue hasn't shown up in 2 months, but on the other hand previous occurrences were 1-2 months apart, so there is no guarantee it's gone, at least not from the CI results alone. I'm lowering the priority and unassigning it for now.
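A rough way to quantify why a quiet spell proves little, assuming (purely as a modelling choice, not something CI reports) that occurrences follow a Poisson process with a mean interval of 1 to 2 months. The chance of seeing no occurrence in a window of length t is exp(-t / mean_interval):

```python
import math


def prob_no_occurrence(quiet_months: float, mean_interval_months: float) -> float:
    """P(no event in the window) for an assumed Poisson process."""
    return math.exp(-quiet_months / mean_interval_months)


for mean in (1.0, 1.5, 2.0):
    p = prob_no_occurrence(2.0, mean)
    print(f"mean interval {mean} months -> P(2 quiet months) = {p:.2f}")
# mean interval 1.0 months -> P(2 quiet months) = 0.14
# mean interval 1.5 months -> P(2 quiet months) = 0.26
# mean interval 2.0 months -> P(2 quiet months) = 0.37
```

Under this model, two quiet months would happen by chance roughly 14% to 37% of the time even if the bug were still present.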
Comment 7 Chris Wilson 2019-03-23 22:17:07 UTC
commit 9a3b19a16dc28ab717cf1663d09ffee0715b735a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Feb 13 23:20:47 2019 +0000

    drm/i915: Only try to park engines after a failed reset
    
    Currently we try to stop the engine by programming the ring registers to
    be disabled before we perform the reset. Sometimes, we see the context
    image also have invalid ring registers, which one presumes may be
    actually caused by us doing so. Lets risk not doing programming the
    ring to zero on the first attempt to avoid preserving that corruption
    into the context image, leaving the w/a in place for subsequent
    reset attempts.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190213232047.8486-1-chris@chris-wilson.co.uk
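The control flow described in the commit message can be sketched as follows. This is a hypothetical model, not the actual driver change: reset_engines, do_reset and park_engines are illustrative names, and it assumes the reset is retried a bounded number of times. The first attempt resets without touching the ring registers; the parking workaround (programming the rings to zero) runs only before retries, after an attempt has already failed.

```python
from typing import Callable


def reset_engines(do_reset: Callable[[], bool],
                  park_engines: Callable[[], None],
                  max_attempts: int = 3) -> bool:
    """Only park engines after a failed reset (hypothetical sketch)."""
    for attempt in range(max_attempts):
        if attempt > 0:
            # Workaround kept for subsequent attempts only.
            park_engines()
        if do_reset():
            return True
    return False


# Usage: the first reset fails, the second succeeds.
log = []
results = iter([False, True])
ok = reset_engines(lambda: log.append("reset") or next(results),
                   lambda: log.append("park"))
print(ok, log)  # True ['reset', 'park', 'reset']
```

The point of the ordering is that a reset that succeeds first time never has the zeroed ring registers saved into the context image.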
Comment 8 Lakshmi 2019-07-31 12:13:14 UTC
Thanks, closing this bug as fixed. No occurrences in the last few months.
Comment 9 CI Bug Log 2019-07-31 12:13:20 UTC
The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.

