Summary: | [CI] igt@gem_eio@in-flight-external - incomplete - i915_gem_find_active_request:2880 GEM_BUG_ON((__builtin_constant_p((DMA_FENCE_FLAG_SIGNALED_BIT)) ? constant_test_bit((DMA_FENCE_FLAG_SIGNALED_BIT), (&request->fence.flags)) : variable_test_bit((DMA_FENCE | ||
---|---|---|---|
Product: | DRI | Reporter: | Marta Löfstedt <marta.lofstedt> |
Component: | DRM/Intel | Assignee: | Marta Löfstedt <marta.lofstedt> |
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | normal | ||
Priority: | medium | CC: | intel-gfx-bugs |
Version: | DRI git | ||
Hardware: | Other | ||
OS: | All | ||
See Also: | https://bugs.freedesktop.org/show_bug.cgi?id=104945 https://bugs.freedesktop.org/show_bug.cgi?id=105358 | ||
Whiteboard: | ReadyForDev | ||
i915 platform: | HSW | i915 features: | GEM/Other |
Description
Marta Löfstedt
2018-03-05 06:40:09 UTC
Further set-wedge vs reset race. :|

I closed bug 104945; now we are hitting this issue instead:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3875/shard-kbl5/igt@gem_eio@in-flight-external.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3877/shard-apl6/igt@gem_eio@in-flight-contexts.html

From pstore:

<5>[ 136.759626] i915 0000:00:02.0: Resetting chip after gpu hang
<3>[ 136.759926] i915_gem_find_active_request:2880 GEM_BUG_ON((__builtin_constant_p((DMA_FENCE_FLAG_SIGNALED_BIT)) ? constant_test_bit((DMA_FENCE_FLAG_SIGNALED_BIT), (&request->fence.flags)) : variable_test_bit((DMA_FENCE_FLAG_SIGNALED_BIT), (&request->fence.flags))))
<4>[ 136.760269] ------------[ cut here ]------------
<2>[ 136.760274] kernel BUG at drivers/gpu/drm/i915/i915_gem.c:2880!
<4>[ 136.760325] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI

Unfortunately this link is dead:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3878/shard-apl2/igt@gem_eio@in-flight.html

However, these links work:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3878/shard-apl2/run6.log
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3878/shard-apl2/dmesg6.log
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3878/shard-apl2/pstore6-1520336826_Panic_3.log

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3903/fi-cnl-drrs/igt@gem_eio@in-flight-contexts.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3903/fi-cnl-y3/igt@gem_eio@in-flight-contexts.html

commit ac697ae8013a7c7301174c9c3b02a92fe418b7ea
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Mar 15 15:10:15 2018 +0000

    drm/i915: Stop engines when declaring the machine wedged

    If we fail to reset the GPU, we declare the machine wedged. However, the GPU may well still be running in the background with an in-flight request. So despite our efforts in cleaning up the request queue and faking the breadcrumb in the HWSP, the GPU may eventually write the in-flight seqno there breaking all of our assumptions and throwing the driver into a deep turmoil, wedging beyond wedged.

    To avoid this we ideally want to reset the GPU. Since that has already failed, make sure the rings have the stop bit set instead. This is part of the normal GPU reset sequence, but that is actually disabled by igt/gem_eio to force the wedged state. If we assume the worst, we must poke at the bit again before we give up.

    v2: Move the intel_gpu_reset() from set-wedged in the reset error path into i915_gem_set_wedged() itself. Even if the reset fails (e.g. if it is disabled by gem_eio), it still tries to make sure the engines are stopped. For i915_gem_set_wedged() callers from outside of i915_reset(), this should make sure the GPU is disabled while the driver is marked as being wedged.

    Testcase: igt/gem_eio
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Cc: Michał Winiarski <michal.winiarski@intel.com>
    Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
    Cc: Michel Thierry <michel.thierry@intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180315151015.22741-1-chris@chris-wilson.co.uk

It's looking green!
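For readers unfamiliar with the driver, the idea in the commit message above (attempt a reset from inside the set-wedged path, and leave the engines stopped even if that reset fails) can be illustrated with a toy model. This is only a sketch under stated assumptions: every name in it (toy_gpu, toy_engine, toy_gpu_reset, toy_set_wedged, and so on) is hypothetical and is not the i915 API, and the real driver manipulates hardware registers rather than booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NUM_ENGINES 3 /* hypothetical engine count for the model */

/* Toy stand-in for one engine; NOT the i915 engine structure. */
struct toy_engine {
    bool stop_ring; /* models the ring's stop bit */
    bool executing; /* true while an in-flight request could still write its seqno */
};

/* Toy stand-in for the device; all fields are assumptions for illustration. */
struct toy_gpu {
    struct toy_engine engines[TOY_NUM_ENGINES];
    bool reset_disabled; /* models a test such as gem_eio disabling reset */
    bool wedged;
};

/* Set the stop bit on every engine so nothing keeps running in the background. */
void toy_stop_engines(struct toy_gpu *gpu)
{
    assert(gpu != NULL);
    for (size_t i = 0; i < TOY_NUM_ENGINES; i++) {
        gpu->engines[i].stop_ring = true;
        gpu->engines[i].executing = false;
    }
}

/*
 * Model of a reset attempt: stopping the engines is the first step and
 * happens whether or not the reset itself is allowed to proceed.
 * Returns true if the reset completed.
 */
bool toy_gpu_reset(struct toy_gpu *gpu)
{
    toy_stop_engines(gpu);
    if (gpu->reset_disabled)
        return false; /* reset failed; engines remain stopped */
    for (size_t i = 0; i < TOY_NUM_ENGINES; i++)
        gpu->engines[i].stop_ring = false; /* engines may be restarted */
    return true;
}

/*
 * Model of declaring the device wedged: try the reset here, ignore its
 * result, and rely on the engines having been stopped in either case.
 */
void toy_set_wedged(struct toy_gpu *gpu)
{
    (void)toy_gpu_reset(gpu);
    gpu->wedged = true;
}

/* True if any engine could still write a late seqno. */
bool toy_engines_running(const struct toy_gpu *gpu)
{
    for (size_t i = 0; i < TOY_NUM_ENGINES; i++)
        if (gpu->engines[i].executing)
            return true;
    return false;
}
```

In this model, calling toy_set_wedged() on a device with reset_disabled set and a busy engine leaves the device marked wedged with no engine still executing, which is the property the commit aims for: no late seqno write after the driver has given up.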