Summary: | [CI] igt@gem_eio@in-flight-suspend - dmesg-warn - WARNING: CPU: 4 PID: 1503 at drivers/gpu/drm/i915/intel_ringbuffer.c | incomplete | ||
---|---|---|---|
Product: | DRI | Reporter: | Marta Löfstedt <marta.lofstedt> |
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | normal | ||
Priority: | medium | CC: | intel-gfx-bugs |
Version: | DRI git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | ReadyForDev | ||
i915 platform: | HSW, SNB | i915 features: | GEM/Other |
Description
Marta Löfstedt
2017-10-13 12:47:17 UTC
Also, incomplete on shards-SNB:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3228/shard-snb2/igt@gem_eio@in-flight-suspend.html

<7>[ 41.696323] [drm:missed_breadcrumb [i915]] rcs0 missed breadcrumb at intel_breadcrumbs_hangcheck+0x61/0x80 [i915], irq posted? no, current seqno=cf3a, last=cf7c
<7>[ 49.696247] [drm:i915_reset_device [i915]] resetting chip
<5>[ 49.696598] i915 0000:00:02.0: Resetting chip after gpu hang
<7>[ 49.697285] [drm:i915_reset [i915]] GPU reset disabled
...
<4>[ 54.749897] WARN_ON((dev_priv->uncore.funcs.mmio_readl(dev_priv, (((const i915_reg_t){ .reg = (((engine)->mmio_base)+0x9c) })), true) & (1 << 9)) == 0)
<4>[ 54.749918] ------------[ cut here ]------------
<4>[ 54.749968] WARNING: CPU: 4 PID: 1503 at drivers/gpu/drm/i915/intel_ringbuffer.c:448 init_ring_common+0x606/0x610 [i915]

So at a basic level it is a side-effect of the test. As we disable the GPU reset to cause the EIO, the ring is not idle when we try to restart it.

It looks like we can (a) always do stop-rings upon reset regardless of the availability of the GPU reset, and (b) extend the stop-ring coverage in init_ring_common() to not clear the STOP bit until after we are ready to restart.

(In reply to Marta Löfstedt from comment #1)
> Also, incomplete on shards-SNB
>
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3228/shard-snb2/igt@gem_eio@in-flight-suspend.html

Unlikely to be this. My theory for those is https://patchwork.freedesktop.org/series/31848/

Note that at least the test so far gives us stable results.
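The WARN_ON text in the log spells out the check that failed: read the engine register at mmio_base + 0x9c and expect bit 9 to be set. The sketch below restates that bit test in Python against a fake register file. It is only an illustration: `would_warn`, `ring_reports_idle`, the dictionary-based register file and the base address 0x2000 are hypothetical, and reading bit 9 as an "idle" flag is an inference from the discussion above, not something the log states.

```python
# Toy restatement of the check quoted in the WARN_ON line; not kernel code.
# The offset and bit number are taken from the log; everything else is
# hypothetical and exists only for this example.

MODE_REG_OFFSET = 0x9C   # from the log: ((engine)->mmio_base) + 0x9c
IDLE_BIT = 1 << 9        # from the log: (1 << 9); "idle" meaning is assumed


def ring_reports_idle(read_reg, mmio_base):
    """Return True when bit 9 of the register at mmio_base + 0x9c is set."""
    return (read_reg(mmio_base + MODE_REG_OFFSET) & IDLE_BIT) != 0


def would_warn(read_reg, mmio_base):
    """Mirror the WARN_ON condition: it fires when the masked value is 0."""
    return not ring_reports_idle(read_reg, mmio_base)


def make_reader(regs):
    """Build a read function over a dict that stands in for MMIO space."""
    return lambda addr: regs.get(addr, 0)


if __name__ == "__main__":
    base = 0x2000  # hypothetical engine base address for the example
    busy = make_reader({base + MODE_REG_OFFSET: 0})
    idle = make_reader({base + MODE_REG_OFFSET: IDLE_BIT})
    print("busy ring warns:", would_warn(busy, base))
    print("idle ring warns:", would_warn(idle, base))
```

In this model the warning corresponds to the first case: the register is read while bit 9 is still clear, which matches the observation that the ring is not idle when the driver tries to restart it.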
Both incomplete and dmesg-warn are also on CI_DRM_3227:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3227/shard-snb5/igt@gem_eio@in-flight-suspend.html
and
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3227/shard-hsw4/igt@gem_eio@in-flight-suspend.html

commit 5896a5c8c9c01b09af05b02cdb2ae275ef143959
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Oct 13 14:12:18 2017 +0100

    drm/i915: Always stop the rings before a missing GPU reset

    Always try to stop the rings, even if the GPU reset itself has been disabled (via modparam i915.reset). This should at least stop the hw from spinning in the background consuming resources (e.g. power and memory bandwidth) letting the system rest-in-peace.

    References: https://bugs.freedesktop.org/show_bug.cgi?id=103260
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20171013131218.18013-2-chris@chris-wilson.co.uk
    Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

commit 7836cd02f27c03af2fca04b450177c51fc7caf1e
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Oct 13 14:12:17 2017 +0100

    drm/i915: Keep the rings stopped until they have been re-initialized

    Before modifying the ring register (RING_START, HEAD, TAIL, CTL) we first make sure it is stopped (or else the hw may not resample the registers). However, we do not need to let the hw restart until after we have reprogrammed all the rings. This should help prevent situations where pending operations on the ring may resume (because we are trying to re-initialize following an unsuccessful GPU hang, i.e. from i915_gem_unset_wedged).

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103260
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20171013131218.18013-1-chris@chris-wilson.co.uk
    Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

The above fixes were integrated into CI_DRM_3237, and the dmesg-warn for HSW-shards appears to be gone. However, the SNB incompletes are still present.

According to CI results, this specific warning hasn't been seen in a while, and neither have the SNB incompletes with this test. So closing, thank you.
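For readers who want the ordering from the two commit messages in one place, here is a deliberately simplified Python model. It is a sketch under stated assumptions, not the driver's logic: `ToyRing`, `handle_hang`, `simulate` and the `always_stop` flag are hypothetical names, and the model assumes a stop request idles the ring immediately, which real hardware does not guarantee.

```python
# Toy model of the sequencing described in the two commits; not kernel code.
# All class, function and flag names here are hypothetical.


class ToyRing:
    def __init__(self):
        self.running = True          # ring is executing requests
        self.stop_requested = False  # stands in for the STOP bit
        self.warnings = []

    def stop(self):
        # Toy assumption: requesting a stop idles the ring immediately.
        self.stop_requested = True
        self.running = False

    def reinitialize(self):
        # Reprogramming START/HEAD/TAIL/CTL expects a stopped ring.
        if self.running:
            self.warnings.append("ring not idle while reprogramming")
        # The stop request is cleared only after reprogramming is done.
        self.stop_requested = False
        self.running = True


def handle_hang(ring, reset_enabled, always_stop):
    """Stop the ring on a hang; with always_stop, do so even without reset."""
    if reset_enabled or always_stop:
        ring.stop()


def simulate(always_stop):
    """Hang with GPU reset disabled (as the test does), then re-init."""
    ring = ToyRing()
    handle_hang(ring, reset_enabled=False, always_stop=always_stop)
    ring.reinitialize()
    return ring.warnings


if __name__ == "__main__":
    print("without always-stop:", simulate(always_stop=False))
    print("with always-stop:   ", simulate(always_stop=True))
```

In the model, disabling the reset without stopping the ring leaves it running at re-initialization time and records a warning, while stopping it unconditionally does not.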