Bug 104676 - [CI] igt@gem_eio@in-flight* - fail - Failed assertion: sync_fence_status(fence[n]) == -5
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Marta Löfstedt
QA Contact: Intel GFX Bugs mailing list
Whiteboard: ReadyForDev
 
Reported: 2018-01-17 13:44 UTC by Marta Löfstedt
Modified: 2018-02-26 08:00 UTC

i915 platform: HSW
i915 features: GEM/Other


Description Marta Löfstedt 2018-01-17 13:44:03 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3640/shard-hsw8/igt@gem_eio@in-flight-contexts.html

(gem_eio:2714) CRITICAL: Test assertion failure function test_inflight_contexts, file gem_eio.c:333:
(gem_eio:2714) CRITICAL: Failed assertion: sync_fence_status(fence[n]) == -5
(gem_eio:2714) CRITICAL: error: 1 != -5
Subtest in-flight-contexts failed.
Comment 1 Chris Wilson 2018-01-17 14:47:23 UTC
This also showed up a couple of runs earlier in gem_eio/in-flight-suspend. The check says that the fence completed normally and was not detected as causing a hang (which matches the test completing far too quickly as well). So the spinner didn't spin? Fishy.
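
To make the "1 != -5" in the log easier to read, here is a minimal sketch in C of how the fence status value can be interpreted. It assumes the usual Linux sync-file convention (1 = signaled without error, 0 = still active, negative = error code) and that -5 is -EIO. The enum and function names are hypothetical and are not part of IGT.

```c
#include <errno.h>

/* Hypothetical classification of a sync fence status value (not an IGT API). */
enum fence_state {
    FENCE_ACTIVE,       /* status == 0: fence has not signaled yet */
    FENCE_SIGNALED_OK,  /* status == 1: fence completed normally */
    FENCE_ERROR_EIO,    /* status == -EIO: request was cancelled by wedging */
    FENCE_ERROR_OTHER   /* any other value: some other error */
};

/* Map the raw status integer onto the states above. */
static enum fence_state classify_fence_status(int status)
{
    if (status == 0)
        return FENCE_ACTIVE;
    if (status == 1)
        return FENCE_SIGNALED_OK;
    if (status == -EIO)
        return FENCE_ERROR_EIO;
    return FENCE_ERROR_OTHER;
}
```

Under these assumptions the test expected classify_fence_status() to yield FENCE_ERROR_EIO for every in-flight fence, while the failing runs reported the equivalent of FENCE_SIGNALED_OK.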
Comment 2 Marta Löfstedt 2018-01-18 06:54:17 UTC
Also,
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4151/shard-hsw1/igt@gem_eio@in-flight.html

(gem_eio:1741) CRITICAL: Test assertion failure function test_inflight, file gem_eio.c:222:
(gem_eio:1741) CRITICAL: Failed assertion: sync_fence_status(fence[n]) == -5
(gem_eio:1741) CRITICAL: error: 1 != -5
Subtest in-flight failed.
Comment 3 Marta Löfstedt 2018-01-18 06:55:00 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3639/shard-hsw7/igt@gem_eio@in-flight-suspend.html

(gem_eio:1876) CRITICAL: Test assertion failure function test_inflight_suspend, file gem_eio.c:268:
(gem_eio:1876) CRITICAL: Failed assertion: sync_fence_status(fence[n]) == -5
(gem_eio:1876) CRITICAL: error: 1 != -5
Subtest in-flight-suspend failed.
Comment 5 Chris Wilson 2018-02-06 10:38:26 UTC
Having no luck yet catching this for myself. Nothing obviously looks wrong, but the test is complaining that the wedging didn't occur. The dmesg concurs in that there doesn't appear to be a reset in the middle of the test...
Comment 6 Chris Wilson 2018-02-23 10:20:13 UTC
Swapped out the quick hang injection for a slow spinner, seems to have fixed this on hsw, but found a whole new issue on execlists. (Seems to be that the CS interrupt is firing and kicking off the execlists as we are trying to prune it. tasklet_kill() where are you?)

commit 9ba3717a86553e15aa6e4aec8a77c2e3460fd4d3
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Feb 6 22:55:33 2018 +0000

    igt/gem_eio: Use slow spinners to inject hangs
    
    One weird issue we see in bug 104676 is that the hangs are too fast on
    HSW! So force the use of the slow spinners that do not try to trigger
    a hang by injecting random bytes into the batch.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104676
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Closing this bug report for cibuglog, Marta will you file a new one for the incompletes?
Comment 7 Marta Löfstedt 2018-02-23 10:56:34 UTC
(In reply to Chris Wilson from comment #6)
> Swapped out the quick hang injection for a slow spinner, seems to have fixed
> this on hsw, but found a whole new issue on execlists. (Seems to be that the
> CS interrupt is firing and kicking off the execlists as we are trying to
> prune it. tasklet_kill() where are you?)
> 
> commit 9ba3717a86553e15aa6e4aec8a77c2e3460fd4d3
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Tue Feb 6 22:55:33 2018 +0000
> 
>     igt/gem_eio: Use slow spinners to inject hangs
>     
>     One weird issue we see in bug 104676 is that the hangs are too fast on
>     HSW! So force the use of the slow spinners that do not try to trigger
>     a hang by injecting random bytes into the batch.
>     
>     Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104676
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>     Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> 
> Closing this bug report for cibuglog, Marta will you file a new one for the
> incompletes?

OK, I will monitor this over the weekend. As far as I can see there are no incompletes on igt@gem_eio@in-flight* on HSW.

The GLK, KBL, APL igt@gem_eio@in-flight* incompletes are on bug 104945. However, I am wondering if KBL is hitting a new issue now.
Comment 8 Marta Löfstedt 2018-02-23 11:02:53 UTC
The fix was already in CI_DRM_3815 and HSW is all green, but is the recent increase in incompletes on APL and KBL perhaps related? See bug 104945.

