Bug 106125

Summary: [CI] igt@gem_wait@wait-default - fail - Failed assertion: wait.timeout_ns > 0
Product: DRI
Reporter: Martin Peres <martin.peres>
Component: DRM/Intel
Assignee: Francesco Balestrieri <francesco.balestrieri>
Status: CLOSED WORKSFORME
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: intel-gfx-bugs
Version: XOrg git
Hardware: Other
OS: All
Whiteboard: ReadyForDev
i915 platform: ALL
i915 features: GEM/Other

Description Martin Peres 2018-04-18 10:59:27 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4062/shard-apl3/igt@gem_wait@wait-default.html

(gem_wait:1355) CRITICAL: Test assertion failure function basic, file ../tests/gem_wait.c:116:
(gem_wait:1355) CRITICAL: Failed assertion: wait.timeout_ns > 0
(gem_wait:1355) CRITICAL: Last errno: 62, Timer expired
Subtest wait-default failed.
Comment 1 Martin Peres 2018-04-20 12:27:01 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_23/fi-ivb-3520m/igt@gem_wait@wait-bsd.html

(gem_wait:2916) CRITICAL: Test assertion failure function basic, file ../tests/gem_wait.c:116:
(gem_wait:2916) CRITICAL: Failed assertion: wait.timeout_ns > 0
(gem_wait:2916) CRITICAL: Last errno: 62, Timer expired
Subtest wait-bsd failed.
Comment 2 Martin Peres 2018-05-02 12:12:00 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4119/fi-bxt-j4205/igt@gem_wait@basic-wait-all.html

(gem_wait:3555) CRITICAL: Test assertion failure function basic, file ../tests/gem_wait.c:116:
(gem_wait:3555) CRITICAL: Failed assertion: wait.timeout_ns > 0
(gem_wait:3555) CRITICAL: Last errno: 62, Timer expired
Subtest basic-wait-all failed.
Comment 3 Chris Wilson 2018-05-02 12:16:19 UTC
Not as bad as it seems; the wait completed as intended. Just for whatever reason the wakeup took 1ns longer than we allowed for.
Comment 4 Chris Wilson 2018-05-02 12:16:37 UTC
(Ok, can be many more ns, as all beyond the limit become 0 ;)
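To illustrate why any overshoot, whether 1ns or many, ends up reported as 0, here is a minimal sketch of the clamping described above. The helper name remaining_ns() and its parameters are hypothetical and not taken from the kernel or IGT sources; it only models the arithmetic.

```c
#include <stdint.h>

/*
 * Hypothetical model of the "time remaining" value handed back after a wait.
 * budget_ns:  timeout the caller allowed for the wait
 * elapsed_ns: how long the wait actually took, including wakeup latency
 * Any overshoot is clamped, so 1ns late and 1s late both report 0.
 */
int64_t remaining_ns(int64_t budget_ns, int64_t elapsed_ns)
{
    int64_t left = budget_ns - elapsed_ns;

    return left < 0 ? 0 : left;
}
```

With a 1s budget, a wakeup after 0.4s leaves 600000000ns, while a wakeup after 1s + 1ns leaves 0, which is exactly the value that tripped the assertion "wait.timeout_ns > 0".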
Comment 5 Chris Wilson 2018-05-03 14:16:41 UTC
Warning be silenced:

commit f772d9a910130b3aec8efa4f09ed723618fae656 (HEAD, upstream/master)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed May 2 12:11:26 2018 +0100

    igt/gem_wait: Relax assertion for wait completion
    
    When waiting for a finite batch, all that we require is that the batch
    completes. If it takes the full second (or longer) for us to wake up and
    notice the completed batch is immaterial, so only assert that we don't
    report an infinite timeout afterwards.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

We can still get a timer error if the spin_set_timeout() doesn't fire within 1s (target is 0.5s), but we no longer trigger a warning (as in this case) when we don't wake up within 1s (due to whatever scheduling latency) but have detected the completed batch (or we wouldn't wake up at all... except stray signals?)
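For readers without the commit at hand, the difference between the old and the relaxed check can be sketched as two predicates. This is an assumption-laden illustration, not the actual diff: strict_check() mirrors the assertion from the failure log, while relaxed_check() assumes that "infinite timeout" is encoded as a negative timeout_ns and that the relaxed form is ">= 0". Both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Check from the failure log: some of the timeout must be left over. */
bool strict_check(int64_t timeout_ns)
{
    return timeout_ns > 0;
}

/*
 * Assumed shape of the relaxed check: only reject a negative value,
 * taken here to mean an infinite timeout. A fully consumed timeout (0)
 * is accepted, because the batch still completed.
 */
bool relaxed_check(int64_t timeout_ns)
{
    return timeout_ns >= 0;
}
```

Under these assumptions, a late wakeup that reports 0 fails strict_check() but passes relaxed_check(), which matches the behaviour change described in the commit message.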
Comment 6 Martin Peres 2018-05-22 20:39:22 UTC
(In reply to Chris Wilson from comment #5)
> Warning be silenced:
> 
> commit f772d9a910130b3aec8efa4f09ed723618fae656 (HEAD, upstream/master)
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Wed May 2 12:11:26 2018 +0100
> 
>     igt/gem_wait: Relax assertion for wait completion
>     
>     When waiting for a finite batch, all that we require is that the batch
>     completes. If it takes the full second (or longer) for us to wake up and
>     notice the completed batch is immaterial, so only assert that we don't
>     report an infinite timeout afterwards.
>     
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>     Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> 
> We can still get a timer error if the spin_set_timeout() doesn't fire within
> 1s (target is 0.5s), but we no longer trigger a warning (as in this case)
> when we don't wake up within 1s (due to whatever scheduling latency) but
> have detected the completed batch (or we wouldn't wake up at all... except
> stray signals?)

That makes sense, thanks!

However, is this delay something we would like to reduce for our users? I know that Linux is not an RTOS, but taking 1s to realise that a batch buffer has executed sounds terrible, especially since it is something mesa would need.

What's your take on this?
