Bug 110987

Summary:	[CI][SHARDS] igt@gem_ctx_engines@independent - fail - Engine instance [2] executed too late
Product:	DRI	Reporter:	Martin Peres <martin.peres>
Component:	DRM/Intel	Assignee:	Intel GFX Bugs mailing list <intel-gfx-bugs>
Status:	RESOLVED FIXED	QA Contact:	Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity:	normal
Priority:	medium	CC:	intel-gfx-bugs
Version:	XOrg git
Hardware:	Other
OS:	All
Whiteboard:	ReadyForDev
i915 platform:	GLK	i915 features:	GEM/Other

Description Martin Peres 2019-06-25 06:33:06 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6333/shard-glk9/igt@gem_ctx_engines@independent.html

Starting subtest: independent
(gem_ctx_engines:3353) CRITICAL: Test assertion failure function independent, file ../tests/i915/gem_ctx_engines.c:485:
(gem_ctx_engines:3353) CRITICAL: Failed assertion: (map[i] - last) > 0
(gem_ctx_engines:3353) CRITICAL: Engine instance [2] executed too late

Comment 1 CI Bug Log 2019-06-25 06:34:21 UTC

The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* GLK: igt@gem_ctx_engines@independent - fail - Engine instance [2] executed too late
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6333/shard-glk9/igt@gem_ctx_engines@independent.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13219/shard-glk4/igt@gem_ctx_engines@independent.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13255/shard-glk7/igt@gem_ctx_engines@independent.html

Comment 2 Chris Wilson 2019-06-26 10:20:33 UTC

There's a danger here as the test assumes in-order execution without explicit fencing (or else it is hard to say that each channel is independent), but we may trigger a timeslice evaluation in the middle and reorder.

The goal of the test is to say that the engine[] are distinct and have no inherent common timeline (i.e. they all have their own rings and timelines). So we set them up with a fence that encourages them to execute in the opposite order to submission.

Easiest way forward then would be to trickle feed the fences.
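
To illustrate the race described above, here is a minimal sketch in Python. It is a toy model, not the IGT test: the function names, the random shuffle standing in for a timeslice reordering, and the integer "stamps" are all assumptions made for illustration. It shows that once every fence is signaled together, nothing forces the requests to run in fence order, so a check of the form `(map[i] - last) > 0` can fail.

```python
import random

def release_all_at_once(n_channels, rng):
    """Signal every fence together and return a per-channel execution stamp."""
    ready = list(range(n_channels))  # every request becomes runnable at once
    rng.shuffle(ready)               # stand-in for a timeslice reordering them
    stamps = [0] * n_channels
    for tick, channel in enumerate(ready, start=1):
        stamps[channel] = tick       # record when this channel's request ran
    return stamps

def first_late_channel(stamps, expected_order):
    """Return the first channel whose stamp does not advance, else None."""
    last = 0
    for channel in expected_order:
        if stamps[channel] - last <= 0:  # modeled on the failed (map[i] - last) > 0
            return channel
        last = stamps[channel]
    return None

n = 8
fence_order = list(reversed(range(n)))  # opposite of submission order
stamps = release_all_at_once(n, random.Random(0))
print(first_late_channel(stamps, fence_order))
```

In this model the check only passes when the shuffle happens to reproduce the fence order exactly, which is why the failure would show up intermittently rather than on every run.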

Comment 3 Francesco Balestrieri 2019-07-30 04:21:17 UTC

Seen once in the past month, though apparently out of only 8 runs. Based on this, and the description of the test, I'm setting the priority to medium.

Comment 4 Chris Wilson 2019-08-03 11:13:46 UTC

https://patchwork.freedesktop.org/patch/320668/?series=64451&rev=1

Comment 5 Chris Wilson 2019-08-13 15:43:25 UTC

I claim
commit bfd7241fa594d772e1414574e09d1e4d9fa6643a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jun 26 11:26:10 2019 +0100

    i915/gem_ctx_engine: Drip feed requests into 'independent'
    
    The intent of the test is to exercise that each channel in the engine[]
    is an independent context/ring/timeline. It sets up 64 channels pointing
    to rcs0 and then submits one request to each in turn waiting on a
    timeline that will force them to run out of submission order. They can
    only run in fence order and not submission order if the timelines of
    each channel are truly independent.
    
    However, we released the fences en masse, and once the requests are
    ready they are independent and may be executed in any order by the HW,
    especially true with timeslicing that may reorder the requests on a
    whim. So instead of releasing all requests at once, increment the
    timeline step by step and check we get our results advancing. If the
    requests can not be run in fence order and fall back to submission
    order, we will time out waiting for our incremental results and trigger
    a few GPU hangs.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110987
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Andi Shyti <andi.shyti@intel.com>

is the fix here.
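
The drip-feed idea from the commit message can be sketched as a small Python model. This is an illustration under stated assumptions, not the code from the commit: `drip_feed`, the `independent` flag, and the use of `TimeoutError` to stand in for the test timing out (and the resulting GPU hangs) are all hypothetical. Channel `i` is submitted `i`-th and waits on a timeline point chosen so that fence order is the reverse of submission order.

```python
def drip_feed(n_channels, independent):
    """Advance the timeline one step at a time and return completion order.

    Raises TimeoutError if a step produces no result, which models the
    requests falling back to submission order.
    """
    # Channel ch waits for this timeline value (reverse of submission order).
    fence_point = {ch: n_channels - ch for ch in range(n_channels)}
    pending = list(range(n_channels))  # kept in submission order
    completed = []
    for timeline in range(1, n_channels + 1):  # increment step by step
        ready = [ch for ch in pending if fence_point[ch] <= timeline]
        if not independent:
            # A shared timeline runs strictly in submission order, so only
            # the oldest pending request may run, and only if it is ready.
            ready = [ch for ch in ready if ch == pending[0]]
        if not ready:
            raise TimeoutError(f"no result after timeline step {timeline}")
        # Each single step readies exactly one new request, so the
        # scheduler has nothing to reorder.
        ch = ready[0]
        pending.remove(ch)
        completed.append(ch)
    return completed

print(drip_feed(4, independent=True))
```

With `independent=False` the first step already finds nothing runnable, because the oldest submitted request has the latest fence point. The model therefore still detects channels that share a timeline, while no longer depending on how the hardware orders requests that are ready at the same time.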
