Bug 110987 - [CI][SHARDS] igt@gem_ctx_engines@independent - fail - Engine instance [2] executed too late
Summary: [CI][SHARDS] igt@gem_ctx_engines@independent - fail - Engine instance [2] exe...
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2019-06-25 06:33 UTC by Martin Peres
Modified: 2019-08-13 15:43 UTC
CC List: 1 user

See Also:
i915 platform: GLK
i915 features: GEM/Other


Description Martin Peres 2019-06-25 06:33:06 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6333/shard-glk9/igt@gem_ctx_engines@independent.html

Starting subtest: independent
(gem_ctx_engines:3353) CRITICAL: Test assertion failure function independent, file ../tests/i915/gem_ctx_engines.c:485:
(gem_ctx_engines:3353) CRITICAL: Failed assertion: (map[i] - last) > 0
(gem_ctx_engines:3353) CRITICAL: Engine instance [2] executed too late
Comment 1 CI Bug Log 2019-06-25 06:34:21 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* GLK: igt@gem_ctx_engines@independent - fail - Engine instance [2] executed too late
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6333/shard-glk9/igt@gem_ctx_engines@independent.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13219/shard-glk4/igt@gem_ctx_engines@independent.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13255/shard-glk7/igt@gem_ctx_engines@independent.html
Comment 2 Chris Wilson 2019-06-26 10:20:33 UTC
There's a danger here as the test assumes in-order execution without explicit fencing (or else it is hard to say that each channel is independent), but we may trigger a timeslice evaluation in the middle and reorder.

The goal of the test is to say that the engine[] are distinct and have no inherent common timeline (i.e. they all have their own rings and timelines). So we set them up with a fence that encourages them to execute in the opposite order to submission.

Easiest way forward then would be to trickle feed the fences.
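
As an illustration of the hazard described above, here is a minimal toy model in Python. It is not the IGT test code: `run_en_masse`, `in_fence_order` and the `pick` callback (standing in for whatever the scheduler decides) are hypothetical names made up for this sketch. It assumes request i is fenced to wait for timeline point N - i, so that fence order is the reverse of submission order.

```python
def run_en_masse(num_channels, pick):
    """Toy model: all fences are released at once, then `pick` chooses
    which ready request executes next (a stand-in for the scheduler)."""
    # Request i is submitted i-th but waits for timeline point
    # num_channels - i, i.e. the last request submitted should run first.
    wait_for = {i: num_channels - i for i in range(num_channels)}
    timeline = num_channels  # every fence is signalled in one go
    ready = [i for i, point in wait_for.items() if point <= timeline]
    executed = []
    while ready:
        chosen = pick(ready)  # any ready request may legally go next
        ready.remove(chosen)
        executed.append(chosen)
    return executed


def in_fence_order(executed):
    """True if the channels ran in reverse submission order."""
    return executed == sorted(executed, reverse=True)


# A scheduler that happens to follow fence order passes the check...
print(in_fence_order(run_en_masse(64, max)))  # True
# ...but one that reorders the ready requests does not.
print(in_fence_order(run_en_masse(64, min)))  # False
```

Once every request is ready, nothing in this model constrains the order, which is why a check that assumes in-order completion can fail without the channels sharing a timeline.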
Comment 3 Francesco Balestrieri 2019-07-30 04:21:17 UTC
Seen once in the past month, though apparently out of only 8 runs. Based on this, and on the description of the test, I'm setting the priority to medium.
Comment 5 Chris Wilson 2019-08-13 15:43:25 UTC
I claim
commit bfd7241fa594d772e1414574e09d1e4d9fa6643a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jun 26 11:26:10 2019 +0100

    i915/gem_ctx_engine: Drip feed requests into 'independent'
    
    The intent of the test is to exercise that each channel in the engine[]
    is an independent context/ring/timeline. It sets up 64 channels pointing
    to rcs0 and then submits one request to each in turn waiting on a
    timeline that will force them to run out of submission order. They can
    only run in fence order and not submission order if the timelines of
    each channel are truly independent.
    
    However, we released the fences en masse, and once the requests are
    ready they are independent and may be executed in any order by the HW,
    especially true with timeslicing that may reorder the requests on a
    whim. So instead of releasing all requests at once, increment the
    timeline step by step and check we get our results advancing. If the
    requests can not be run in fence order and fall back to submission
    order, we will time out waiting for our incremental results and trigger
    a few GPU hangs.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110987
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Andi Shyti <andi.shyti@intel.com>

is the fix here.
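
The drip-feed idea from the commit message can be sketched with a small toy model in Python. This is an illustration under stated assumptions, not the actual change to gem_ctx_engines.c: `run_drip_feed` and the `pick` callback (a stand-in for the scheduler's choice) are hypothetical, and request i is assumed to wait for timeline point N - i.

```python
import random


def run_drip_feed(num_channels, pick):
    """Toy model: advance the timeline one point at a time and let every
    newly ready request finish before the next point is signalled."""
    # Request i is submitted i-th but waits for timeline point
    # num_channels - i, so fence order is the reverse of submission order.
    wait_for = {i: num_channels - i for i in range(num_channels)}
    pending = set(wait_for)
    executed = []
    for timeline in range(1, num_channels + 1):
        ready = sorted(i for i in pending if wait_for[i] <= timeline)
        while ready:
            chosen = pick(ready)  # the scheduler may pick any ready request
            ready.remove(chosen)
            pending.discard(chosen)
            executed.append(chosen)
    return executed


# Only one request is ever ready per step, so even a scheduler that
# picks at random cannot change the resulting order.
order = run_drip_feed(64, random.choice)
print(order == list(range(63, -1, -1)))  # True
```

In this model the scheduler never has more than one ready request to choose from, so the completion order depends only on the fences. The timeout path mentioned in the commit message (requests stuck in submission order) is not modelled here.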

