Bug 110380 - [CI][SHARDS] igt@gem_exec_schedule@semaphore-codependency - fail - Failed assertion: !"GPU hung"
Summary: [CI][SHARDS] igt@gem_exec_schedule@semaphore-codependency - fail - Failed assertion: !"GPU hung"
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other
OS: All
Priority: high
Severity: normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2019-04-10 13:07 UTC by Martin Peres
Modified: 2019-04-17 07:57 UTC
CC: 1 user

See Also:
i915 platform: BXT, GLK, ICL, KBL
i915 features: GEM/Other


Attachments

Description Martin Peres 2019-04-10 13:07:50 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4937/shard-glk6/igt@gem_exec_schedule@semaphore-codependency.html

Starting subtest: semaphore-codependency
(gem_exec_schedule:12832) igt_aux-CRITICAL: Test assertion failure function sig_abort, file ../lib/igt_aux.c:501:
(gem_exec_schedule:12832) igt_aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest semaphore-codependency failed.
Comment 1 CI Bug Log 2019-04-10 13:08:57 UTC
The CI Bug Log issue associated with this bug has been updated.

### New filters associated

* APL KBL GLK ICL: igt@gem_exec_schedule@semaphore-codependency - fail - Failed assertion: !"GPU hung"
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_2832/shard-apl6/igt@gem_exec_schedule@semaphore-codependency.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_2832/shard-iclb6/igt@gem_exec_schedule@semaphore-codependency.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_2832/shard-kbl1/igt@gem_exec_schedule@semaphore-codependency.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_2833/shard-apl4/igt@gem_exec_schedule@semaphore-codependency.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_2833/shard-glk8/igt@gem_exec_schedule@semaphore-codependency.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_2833/shard-kbl2/igt@gem_exec_schedule@semaphore-codependency.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4937/shard-apl2/igt@gem_exec_schedule@semaphore-codependency.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4937/shard-glk6/igt@gem_exec_schedule@semaphore-codependency.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4937/shard-iclb6/igt@gem_exec_schedule@semaphore-codependency.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4937/shard-kbl4/igt@gem_exec_schedule@semaphore-codependency.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_2834/shard-apl8/igt@gem_exec_schedule@semaphore-codependency.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_2834/shard-glk5/igt@gem_exec_schedule@semaphore-codependency.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_2834/shard-kbl2/igt@gem_exec_schedule@semaphore-codependency.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4936/shard-glk8/igt@gem_exec_schedule@semaphore-codependency.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4936/shard-iclb5/igt@gem_exec_schedule@semaphore-codependency.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4936/shard-kbl6/igt@gem_exec_schedule@semaphore-codependency.html
Comment 2 Chris Wilson 2019-04-10 13:11:34 UTC
Hindsight is perfect. If only I had thought of this test before writing the code.

The fix is ready to go (https://patchwork.freedesktop.org/series/59232/); just waiting on a report from the media guys as to whether it fixes their perf regression. Maybe they'll even get around to reporting a bug.
Comment 3 CI Bug Log 2019-04-10 14:22:57 UTC
A CI Bug Log filter associated with this bug has been updated:

{- APL KBL GLK ICL: igt@gem_exec_schedule@semaphore-codependency - fail - Failed assertion: !"GPU hung" -}
{+ SKL APL KBL GLK ICL: igt@gem_exec_schedule@semaphore-codependency - fail - Failed assertion: !"GPU hung" +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5898/shard-skl8/igt@gem_exec_schedule@semaphore-codependency.html
Comment 4 Chris Wilson 2019-04-11 06:43:59 UTC
commit b7404c7ecb38b66f103cec694e23a8e99252829e (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Apr 9 16:29:22 2019 +0100

    drm/i915: Bump ready tasks ahead of busywaits
    
    Consider two tasks that are running in parallel on a pair of engines
    (vcs0, vcs1), but then must complete on a shared engine (rcs0). To
    maximise throughput, we want to run the first ready task on rcs0 (i.e.
    the first task that completes on either of vcs0 or vcs1). When using
    semaphores, however, we will instead queue onto rcs in submission order.
    
    To resolve this incorrect ordering, we want to re-evaluate the priority
    queue when each of the request is ready. Normally this happens because
    we only insert into the priority queue requests that are ready, but with
    semaphores we are inserting ahead of their readiness and to compensate
    we penalize those tasks with reduced priority (so that tasks that do not
    need to busywait should naturally be run first). However, given a series
    of tasks that each use semaphores, the queue degrades into submission
    fifo rather than readiness fifo, and so to counter this we give a small
    boost to semaphore users as their dependent tasks are completed (and so
    we no longer require any busywait prior to running the user task as they
    are then ready themselves).
    
    v2: Fixup irqsave for schedule_lock (Tvrtko)
    
    Testcase: igt/gem_exec_schedule/semaphore-codependency
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
    Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190409152922.23894-1-chris@chris-wilson.co.uk
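The scheduling idea described in the commit message can be sketched outside the kernel. This is a minimal, illustrative model only: the class, the priority values, and the boost mechanics below are stand-ins chosen for clarity, not the actual i915 data structures or constants. It shows how a fixed penalty for semaphore users degrades into submission FIFO, and how a small boost at the point of readiness restores readiness-FIFO ordering:

```python
import heapq
from itertools import count

# Illustrative priority values (not the real i915 constants): requests
# that will busywait on a semaphore are queued with a small penalty, and
# receive a boost once their dependencies complete and they become ready.
SEMAPHORE_PENALTY = -1
READY_BOOST = 1

class Request:
    def __init__(self, name, uses_semaphore):
        self.name = name
        # Semaphore users start penalized so already-ready work runs first.
        self.priority = SEMAPHORE_PENALTY if uses_semaphore else 0
        self.ready = not uses_semaphore

    def signal_dependencies_complete(self):
        # The fix in the commit: bump the request back up once it no
        # longer needs to busywait, so it competes on readiness, not
        # on submission order.
        self.ready = True
        self.priority += READY_BOOST

def run_order(requests):
    # heapq is a min-heap, so push the negated priority; the counter
    # preserves submission order as the tie-breaker (FIFO within a
    # priority level) -- which is exactly the degenerate behaviour when
    # every semaphore user carries the same unboosted penalty.
    seq = count()
    heap = [(-r.priority, next(seq), r) for r in requests]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, r = heapq.heappop(heap)
        order.append(r.name)
    return order
```

For example, with two semaphore-using requests submitted as (vcs0-batch, vcs1-batch), boosting vcs1-batch when its dependencies complete lets it run first even though it was submitted second; without the boost, the equal penalties would fall through to the submission-order tie-breaker.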
Comment 5 Martin Peres 2019-04-17 07:57:26 UTC
(In reply to Chris Wilson from comment #4)
> commit b7404c7ecb38b66f103cec694e23a8e99252829e
>     drm/i915: Bump ready tasks ahead of busywaits
> [...]

Thanks, this definitely fixed the issue! It used to fail multiple times per run (~3 failures), and it has not been seen in 36 runs since.
Comment 6 CI Bug Log 2019-04-17 07:57:34 UTC
The CI Bug Log issue associated with this bug has been archived.

New failures matching the above filters will no longer be associated with this bug.
