Bug 108888 - [CI][SHARDS] igt@gem_exec_fence@basic-await-default - fail - Failed assertion: out[n] == 0
Summary: [CI][SHARDS] igt@gem_exec_fence@basic-await-default - fail - Failed assertion...
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: high normal
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-11-28 14:23 UTC by Martin Peres
Modified: 2019-02-19 08:47 UTC

See Also:
i915 platform: BYT, HSW, IVB
i915 features: GEM/Other


Description Martin Peres 2018-11-28 14:23:52 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5211/shard-hsw1/igt@gem_exec_fence@basic-await-default.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5210/shard-hsw6/igt@gem_exec_fence@basic-await-default.html

Starting subtest: basic-await-default
(gem_exec_fence:2990) CRITICAL: Test assertion failure function test_fence_await, file ../tests/i915/gem_exec_fence.c:403:
(gem_exec_fence:2990) CRITICAL: Failed assertion: out[n] == 0
(gem_exec_fence:2990) CRITICAL: error: 0x1 != 0
Subtest basic-await-default failed.
Comment 1 Chris Wilson 2018-12-03 12:28:57 UTC
commit f36c071f6344e0a335ed4b4e0b3a38c0dd54648b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Dec 3 11:36:56 2018 +0000

    drm/i915/ringbuffer: Clear semaphore sync registers on ring init
    
    Ensure that the sync registers are cleared every time we restart the
    ring to avoid stale values from creeping in from random neutrinos.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108888
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20181203113701.12106-3-chris@chris-wilson.co.uk
Comment 2 Francesco Balestrieri 2018-12-28 08:39:33 UTC
According to CI Buglog, this still occurs:

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_178/fi-hsw-4770r/igt@gem_exec_fence@basic-await-default.html

Reopening.
Comment 3 Chris Wilson 2018-12-28 14:53:53 UTC
Haswell is no longer the odd one out,

commit 6faf5916e6beb0dedb0fcbbafbaa152adeaea758
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Dec 28 14:07:35 2018 +0000

    drm/i915: Remove HW semaphores for gen7 inter-engine synchronisation
    
    The writing is on the wall for the existence of a single execution queue
    along each engine, and as a consequence we will not be able to track
    dependencies along the HW queue itself, i.e. we will not be able to use
    HW semaphores on gen7 as they use a global set of registers (and unlike
    gen8+ we can not effectively target memory to keep per-context seqno and
    dependencies).
    
    On the positive side, when we implement request reordering for gen7 we
    also can not presume a simple execution queue and would also require
    removing the current semaphore generation code. So this bring us another
    step closer to request reordering for ringbuffer submission!
    
    The negative side is that using interrupts to drive inter-engine
    synchronisation is much slower (4us -> 15us to do a nop on each of the 3
    engines on ivb). This is much better than it was at the time of introducing
    the HW semaphores and equally important userspace weaned itself off
    intermixing dependent BLT/RENDER operations (the prime culprit was glyph
    rendering in UXA). So while we regress the microbenchmarks, it should not
    impact the user.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=108888
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20181228140736.32606-2-chris@chris-wilson.co.uk
Comment 4 CI Bug Log 2018-12-31 09:57:56 UTC
A CI Bug Log filter associated to this bug has been updated:

{- HSW: igt@gem_exec_fence@basic-await-default - fail - Failed assertion: out[n] == 0 -}
{+ BYT IVB IVBm HSW: igt@gem_exec_fence@basic-await-default - fail - Failed assertion: out[n] == 0 +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_169/fi-byt-clapper/igt@gem_exec_fence@basic-await-default.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_170/fi-byt-clapper/igt@gem_exec_fence@basic-await-default.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_170/fi-byt-j1900/igt@gem_exec_fence@basic-await-default.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_170/fi-byt-n2820/igt@gem_exec_fence@basic-await-default.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_170/fi-ivb-3520m/igt@gem_exec_fence@basic-await-default.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_170/fi-ivb-3770/igt@gem_exec_fence@basic-await-default.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_171/fi-byt-clapper/igt@gem_exec_fence@basic-await-default.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_171/fi-byt-j1900/igt@gem_exec_fence@basic-await-default.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_171/fi-byt-n2820/igt@gem_exec_fence@basic-await-default.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_171/fi-ivb-3520m/igt@gem_exec_fence@basic-await-default.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_171/fi-ivb-3770/igt@gem_exec_fence@basic-await-default.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_172/fi-byt-clapper/igt@gem_exec_fence@basic-await-default.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_178/fi-byt-j1900/igt@gem_exec_fence@basic-await-default.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_178/fi-ivb-3520m/igt@gem_exec_fence@basic-await-default.html
Comment 5 Martin Peres 2018-12-31 10:00:36 UTC
(In reply to Chris Wilson from comment #3)
> Haswell is no longer the odd one out,
> 
> commit 6faf5916e6beb0dedb0fcbbafbaa152adeaea758
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Fri Dec 28 14:07:35 2018 +0000
> 
>     drm/i915: Remove HW semaphores for gen7 inter-engine synchronisation
>     
>     The writing is on the wall for the existence of a single execution queue
>     along each engine, and as a consequence we will not be able to track
>     dependencies along the HW queue itself, i.e. we will not be able to use
>     HW semaphores on gen7 as they use a global set of registers (and unlike
>     gen8+ we can not effectively target memory to keep per-context seqno and
>     dependencies).
>     
>     On the positive side, when we implement request reordering for gen7 we
>     also can not presume a simple execution queue and would also require
>     removing the current semaphore generation code. So this bring us another
>     step closer to request reordering for ringbuffer submission!
>     
>     The negative side is that using interrupts to drive inter-engine
>     synchronisation is much slower (4us -> 15us to do a nop on each of the 3
>     engines on ivb). This is much better than it was at the time of introducing
>     the HW semaphores and equally important userspace weaned itself off
>     intermixing dependent BLT/RENDER operations (the prime culprit was glyph
>     rendering in UXA). So while we regress the microbenchmarks, it should not
>     impact the user.
>     
>     References: https://bugs.freedesktop.org/show_bug.cgi?id=108888
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>     Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>     Link: https://patchwork.freedesktop.org/patch/msgid/20181228140736.32606-2-chris@chris-wilson.co.uk

Thanks! Let's hope it fixes the IVB and BYT issues too (which are also gen7).

So far, it's been only 2 drmtip runs since this patch landed but the reproduction rate was quite sporadic, so I guess we'll have to be a little patient.
Comment 6 Martin Peres 2019-01-04 15:34:02 UTC
(In reply to Martin Peres from comment #5)
> Thanks! Let's hope it fixes the IVB and BYT issues too (which are also gen7).
> 
> So far, it's been only 2 drmtip runs since this patch landed but the
> reproduction rate was quite sporadic, so I guess we'll have to be a little
> patient.

Seems like we were too optimistic: 

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_178/fi-byt-j1900/igt@gem_exec_fence@basic-await-default.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_178/fi-ivb-3520m/igt@gem_exec_fence@basic-await-default.html
Comment 7 Chris Wilson 2019-01-11 19:10:14 UTC
(In reply to Martin Peres from comment #6)
> (In reply to Martin Peres from comment #5)
> > Thanks! Let's hope it fixes the IVB and BYT issues too (which are also gen7).
> > 
> > So far, it's been only 2 drmtip runs since this patch landed but the
> > reproduction rate was quite sporadic, so I guess we'll have to be a little
> > patient.
> 
> Seems like we were too optimistic: 
> 
> https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_178/fi-byt-j1900/igt@gem_exec_fence@basic-await-default.html
> 
> https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_178/fi-ivb-3520m/igt@gem_exec_fence@basic-await-default.html

Nah, don't be confused by a real GPU hang! Which definitely didn't happen and has nothing at all to do with full-ppgtt. Nope, definitely not that at all.
Comment 8 Lakshmi 2019-02-19 08:35:35 UTC
Last seen in drmtip_178 (1 month, 3 weeks / 1032 runs ago).
I assume this issue has been fixed, so I am changing the status to Closed.
Comment 9 CI Bug Log 2019-02-19 08:47:01 UTC
The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.

