Bug 110946 - [CI][BAT] igt@gem_sync@basic-store-each - fail - Failed assertion: !"GPU hung"
Summary: [CI][BAT] igt@gem_sync@basic-store-each - fail - Failed assertion: !"GPU hung"
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: highest normal
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Duplicates: 110942 110948
Depends on:
Blocks:
 
Reported: 2019-06-19 06:00 UTC by Martin Peres
Modified: 2019-09-09 08:10 UTC
CC List: 2 users

See Also:
i915 platform: BDW, BXT, ICL, KBL, SKL
i915 features: GEM/Other



Description Martin Peres 2019-06-19 06:00:36 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6297/fi-bdw-5557u/igt@gem_sync@basic-store-each.html

Starting subtest: basic-store-each
(gem_sync:2797) igt_aux-CRITICAL: Test assertion failure function sig_abort, file ../lib/igt_aux.c:502:
(gem_sync:2797) igt_aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-store-each failed.
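
For readers triaging similar logs, the failing assertion can be pulled out of IGT output mechanically. The following is only a minimal sketch: the regular expression and the helper name `failed_assertions` are assumptions derived from the three log lines quoted above, not from IGT or the CI Bug Log source.

```python
import re

# Hypothetical pattern, inferred from the log lines quoted in this report:
# "(<test>:<pid>) <domain>-CRITICAL: <message>"
CRITICAL_RE = re.compile(
    r"^\((?P<test>[\w-]+):(?P<pid>\d+)\) (?P<domain>[\w-]+)-CRITICAL: (?P<message>.*)$"
)

ASSERT_PREFIX = "Failed assertion: "


def failed_assertions(log_text):
    """Return the text of every 'Failed assertion:' CRITICAL line in the log."""
    found = []
    for line in log_text.splitlines():
        match = CRITICAL_RE.match(line.strip())
        if match is None:
            continue  # not a CRITICAL line
        message = match.group("message")
        if message.startswith(ASSERT_PREFIX):
            found.append(message[len(ASSERT_PREFIX):])
    return found


SAMPLE = """Starting subtest: basic-store-each
(gem_sync:2797) igt_aux-CRITICAL: Test assertion failure function sig_abort, file ../lib/igt_aux.c:502:
(gem_sync:2797) igt_aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-store-each failed."""

print(failed_assertions(SAMPLE))
```

Matching on the assertion text alone, rather than the test name, is what lets one filter cover the same hang across several tests.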
Comment 1 CI Bug Log 2019-06-19 06:01:13 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* BDW KBL: igt@gem_sync@basic-store-each - fail - Failed assertion: !"GPU hung"
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6297/fi-bdw-5557u/igt@gem_sync@basic-store-each.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6299/fi-kbl-7567u/igt@gem_sync@basic-store-each.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13337/fi-bdw-5557u/igt@gem_sync@basic-store-each.html
Comment 2 Chris Wilson 2019-06-19 08:43:50 UTC
Odd. They all look the same: the currently executing ring finished, but we did not switch to the second port and did not get a completion CS. Nothing looked funky in recent commits, but this definitely shouldn't be happening and will cause unpleasant GPU stalls.
Comment 3 Chris Wilson 2019-06-19 09:54:45 UTC
Given the indication that engine idling is too soon, this could be more fallout.
Comment 4 Chris Wilson 2019-06-19 10:28:09 UTC
*** Bug 110942 has been marked as a duplicate of this bug. ***
Comment 5 Chris Wilson 2019-06-19 10:32:15 UTC
[  386.221128] i915: Running i915_gem_context_live_selftests/igt_vm_isolation
[  386.345411] assert_pending_valid:646 GEM_BUG_ON(!intel_context_is_pinned(ce))
Comment 6 CI Bug Log 2019-06-19 11:26:31 UTC
A CI Bug Log filter associated to this bug has been updated:

{- BDW KBL: igt@gem_sync@basic-store-each - fail - Failed assertion: !"GPU hung" -}
{+ BDW SKL KBL: igt@gem_sync@basic-store-each - fail - Failed assertion: !"GPU hung" +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6293/shard-skl6/igt@gem_sync@basic-store-each.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6301/fi-skl-iommu/igt@gem_sync@basic-store-each.html
Comment 7 Chris Wilson 2019-06-19 12:58:16 UTC
*** Bug 110948 has been marked as a duplicate of this bug. ***
Comment 8 Eero Tamminen 2019-06-19 14:20:07 UTC
This bug is marked for BDW & KBL. The hang that was marked as a duplicate of this one was happening on BXT/SKL/KBL, but not on BDW, and another duplicate was on ICL, so I added ICL/SKL/BXT to the platforms.

(Bug 110948 has BXT/SKL/KBL error states as attachments.)
Comment 9 Chris Wilson 2019-06-19 19:23:56 UTC
commit 09c5ab384f6fb30f834a5777888b4486dd7f015d (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jun 19 18:01:35 2019 +0100

    drm/i915: Keep rings pinned while the context is active
    
    Remember to keep the rings pinned as well as the context image until the
    GPU is no longer active.
    
    v2: Introduce a ring->pin_count primarily to hide the
    mock_ring that doesn't fit into the normal GGTT vma picture.
    
    v3: Order is important in teardown, ringbuffer submission needs to drop
    the pin count on the engine->kernel_context before it can gleefully free
    its ring.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110946
    Fixes: ce476c80b8bf ("drm/i915: Keep contexts pinned until after the next kernel context switch")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190619170135.15281-1-chris@chris-wilson.co.uk
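
The idea in the commit message, that a ring must stay pinned until the GPU is no longer using it, can be illustrated with a toy pin-count model. This is only a sketch under stated assumptions: the `Ring` and `Context` classes and the `demo` function are hypothetical and written in Python for brevity; they are not the i915 structures or the code from the patch.

```python
class Ring:
    """Toy ring buffer that stays mapped while its pin count is nonzero."""

    def __init__(self):
        self.pin_count = 0
        self.mapped = False

    def pin(self):
        if self.pin_count == 0:
            self.mapped = True  # first pin maps the ring
        self.pin_count += 1

    def unpin(self):
        if self.pin_count == 0:
            raise RuntimeError("unbalanced unpin")
        self.pin_count -= 1
        if self.pin_count == 0:
            self.mapped = False  # last unpin releases the ring


class Context:
    """Toy context that holds its own pin on the ring while work is in flight."""

    def __init__(self, ring):
        self.ring = ring
        self.active = False

    def submit(self):
        if not self.active:
            self.ring.pin()  # keep the ring alive for the GPU
            self.active = True

    def retire(self):
        if self.active:
            self.active = False
            self.ring.unpin()  # GPU is done, drop the activity pin


def demo():
    ring = Ring()
    ctx = Context(ring)
    ring.pin()    # caller's own pin
    ctx.submit()  # in-flight work takes a second pin
    ring.unpin()  # caller drops its pin early
    mapped_while_active = ring.mapped
    ctx.retire()
    return mapped_while_active, ring.mapped


print(demo())
```

In this model the ring survives the caller's early unpin because the active context still holds a pin, and it is only released once the context retires.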
Comment 10 CI Bug Log 2019-06-20 06:07:22 UTC
A CI Bug Log filter associated to this bug has been updated:

{- BDW SKL KBL: igt@gem_sync@basic-store-each - fail - Failed assertion: !"GPU hung" -}
{+ BDW SKL KBL ICL: all tests - fail - Failed assertion: !"GPU hung" +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6304/fi-icl-dsi/igt@gem_exec_parallel@basic.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6309/fi-icl-dsi/igt@gem_exec_create@basic.html
Comment 11 CI Bug Log 2019-06-20 06:31:35 UTC
A CI Bug Log filter associated to this bug has been updated:

{- BDW SKL KBL ICL: all tests - fail - Failed assertion: !"GPU hung" -}
{+ BDW SKL GLK KBL ICL: all tests - fail - Failed assertion: !"GPU hung" +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6300/shard-glk2/igt@gem_exec_create@forked.html
Comment 12 Eero Tamminen 2019-06-20 08:12:04 UTC
I'm no longer seeing the hangs from duplicate bug 110948, so for my part I can verify the fix (except on BXT, due to bug 110848).
Comment 13 CI Bug Log 2019-09-09 08:10:09 UTC
The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.

