Bug 106125 - [CI] igt@gem_wait@wait-default - fail - Failed assertion: wait.timeout_ns > 0
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Francesco Balestrieri
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-04-18 10:59 UTC by Martin Peres
Modified: 2018-05-22 20:39 UTC

See Also:
i915 platform: ALL
i915 features: GEM/Other



Description Martin Peres 2018-04-18 10:59:27 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4062/shard-apl3/igt@gem_wait@wait-default.html

(gem_wait:1355) CRITICAL: Test assertion failure function basic, file ../tests/gem_wait.c:116:
(gem_wait:1355) CRITICAL: Failed assertion: wait.timeout_ns > 0
(gem_wait:1355) CRITICAL: Last errno: 62, Timer expired
Subtest wait-default failed.
Comment 1 Martin Peres 2018-04-20 12:27:01 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_23/fi-ivb-3520m/igt@gem_wait@wait-bsd.html

(gem_wait:2916) CRITICAL: Test assertion failure function basic, file ../tests/gem_wait.c:116:
(gem_wait:2916) CRITICAL: Failed assertion: wait.timeout_ns > 0
(gem_wait:2916) CRITICAL: Last errno: 62, Timer expired
Subtest wait-bsd failed.
Comment 2 Martin Peres 2018-05-02 12:12:00 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4119/fi-bxt-j4205/igt@gem_wait@basic-wait-all.html

(gem_wait:3555) CRITICAL: Test assertion failure function basic, file ../tests/gem_wait.c:116:
(gem_wait:3555) CRITICAL: Failed assertion: wait.timeout_ns > 0
(gem_wait:3555) CRITICAL: Last errno: 62, Timer expired
Subtest basic-wait-all failed.
Comment 3 Chris Wilson 2018-05-02 12:16:19 UTC
Not as bad as it seems; the wait completed as intended. Just for whatever reason the wakeup took 1ns longer than we allowed for.
Comment 4 Chris Wilson 2018-05-02 12:16:37 UTC
(Ok, can be many more ns, as all beyond the limit become 0 ;)
Comment 5 Chris Wilson 2018-05-03 14:16:41 UTC
Warning be silenced:

commit f772d9a910130b3aec8efa4f09ed723618fae656 (HEAD, upstream/master)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed May 2 12:11:26 2018 +0100

    igt/gem_wait: Relax assertion for wait completion
    
    When waiting for a finite batch, all that we require is that the batch
    completes. If it takes the full second (or longer) for us to wake up and
    notice the completed batch is immaterial, so only assert that we don't
    report an infinite timeout afterwards.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

We can still get a timer error if the spin_set_timeout() doesn't fire within 1s (target is 0.5s), but we no longer trigger a warning (as in this case) when we don't wake up within 1s (due to whatever scheduling latency) but have detected the completed batch (or we wouldn't wake up at all... except stray signals?)
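
A minimal sketch of the difference between the old and the relaxed check, not the actual test code. It assumes the remaining timeout is clamped to 0 once the wake-up overshoots the budget (as comment 4 describes), that a negative value stands for an infinite timeout, and that the relaxed assertion amounts to `timeout_ns >= 0`. The helper names `remaining_timeout_ns`, `strict_check` and `relaxed_check` are hypothetical and do not exist in IGT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: time left in the wait budget, clamped at 0
 * once the wake-up arrives at or after the deadline. */
static int64_t remaining_timeout_ns(int64_t budget_ns, int64_t elapsed_ns)
{
	assert(budget_ns >= 0 && elapsed_ns >= 0);
	return elapsed_ns >= budget_ns ? 0 : budget_ns - elapsed_ns;
}

/* Old check: fails whenever no time is left, even if the batch completed. */
static bool strict_check(int64_t timeout_ns)
{
	return timeout_ns > 0;
}

/* Relaxed check (assumed form): only a negative, i.e. infinite,
 * timeout is treated as a failure. */
static bool relaxed_check(int64_t timeout_ns)
{
	return timeout_ns >= 0;
}
```

With a 1s budget and a wake-up that arrives 1ns late, `remaining_timeout_ns(1000000000, 1000000001)` yields 0, so `strict_check` fails while `relaxed_check` passes.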
Comment 6 Martin Peres 2018-05-22 20:39:22 UTC
(In reply to Chris Wilson from comment #5)
> Warning be silenced:
> 
> commit f772d9a910130b3aec8efa4f09ed723618fae656 (HEAD, upstream/master)
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Wed May 2 12:11:26 2018 +0100
> 
>     igt/gem_wait: Relax assertion for wait completion
>     
>     When waiting for a finite batch, all that we require is that the batch
>     completes. If it takes the full second (or longer) for us to wake up and
>     notice the completed batch is immaterial, so only assert that we don't
>     report an infinite timeout afterwards.
>     
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>     Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> 
> We can still get a timer error if the spin_set_timeout() doesn't fire within
> 1s (target is 0.5s), but we no longer trigger a warning (as in this case)
> when we don't wake up within 1s (due to whatever scheduling latency) but
> have detected the completed batch (or we wouldn't wake up at all... except
> stray signals?)

That makes sense, thanks!

However, is this delay something we would like to reduce for our users? I know that Linux is not an RTOS, but taking 1s to realise that a batch buffer has executed sounds terrible, especially since it is something Mesa would need.

What's your take on this?

