Bug 110367 - [CI][DRMTIP] igt@gem_exec_schedule@semaphore-user - warn - Failed assertion: !"GPU hung"
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: high normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2019-04-09 09:30 UTC by Lakshmi
Modified: 2019-04-17 14:15 UTC
CC List: 1 user

See Also:
i915 platform: BDW, BSW/CHT
i915 features: GPU hang


Description Lakshmi 2019-04-09 09:30:48 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_249/fi-bdw-5557u/igt@gem_exec_schedule@semaphore-user.html

Starting subtest: semaphore-user
Subtest semaphore-user: SUCCESS (6.436s)
(gem_exec_schedule:1226) igt_aux-CRITICAL: Test assertion failure function sig_abort, file ../lib/igt_aux.c:500:
(gem_exec_schedule:1226) igt_aux-CRITICAL: Failed assertion: !"GPU hung"
Test gem_exec_schedule failed.
**** DEBUG ****
(gem_exec_schedule:1226) DEBUG: Test requirement passed: gem_scheduler_has_semaphores(i915)
(gem_exec_schedule:1226) drmtest-DEBUG: Test requirement passed: is_i915_device(fd) && has_known_intel_chipset(fd)
(gem_exec_schedule:1226) igt_debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(gem_exec_schedule:1226) ioctl_wrappers-DEBUG: Test requirement passed: dir >= 0
(gem_exec_schedule:1226) ioctl_wrappers-DEBUG: Test requirement passed: err == 0
(gem_exec_schedule:1226) ioctl_wrappers-DEBUG: Test requirement passed: gem_has_ring(fd, ring)
(gem_exec_schedule:1226) igt_dummyload-DEBUG: Test requirement passed: nengine
(gem_exec_schedule:1226) DEBUG: Test requirement passed: spin
(gem_exec_schedule:1226) igt_core-INFO: Subtest semaphore-user: SUCCESS (6.436s)
(gem_exec_schedule:1226) igt_aux-CRITICAL: Test assertion failure function sig_abort, file ../lib/igt_aux.c:500:
(gem_exec_schedule:1226) igt_aux-CRITICAL: Failed assertion: !"GPU hung"
(gem_exec_schedule:1226) igt_core-INFO: Stack trace:
(gem_exec_schedule:1226) igt_core-INFO:   #0 ../lib/igt_core.c:1474 __igt_fail_assert()
(gem_exec_schedule:1226) igt_core-INFO:   #1 ../lib/igt_aux.c:504 igt_fork_hang_detector()
(gem_exec_schedule:1226) igt_core-INFO:   #2 [killpg+0x40]
(gem_exec_schedule:1226) igt_core-INFO:   #3 ../sysdeps/unix/syscall-template.S:78 ioctl()
(gem_exec_schedule:1226) igt_core-INFO:   #4 /home/cidrm/libdrm/xf86drm.c:191 drmIoctl()
(gem_exec_schedule:1226) igt_core-INFO:   #5 ../lib/ioctl_wrappers.c:591 __gem_execbuf()
(gem_exec_schedule:1226) igt_core-INFO:   #6 ../lib/ioctl_wrappers.c:1271 gem_has_ring()
(gem_exec_schedule:1226) igt_core-INFO:   #7 ../lib/igt_gt.c:652 gem_ring_has_physical_engine()
(gem_exec_schedule:1226) igt_core-INFO:   #8 ../tests/i915/gem_exec_schedule.c:1406 __real_main1349()
(gem_exec_schedule:1226) igt_core-INFO:   #9 ../tests/i915/gem_exec_schedule.c:1349 main()
(gem_exec_schedule:1226) igt_core-INFO:   #10 ../csu/libc-start.c:344 __libc_start_main()
(gem_exec_schedule:1226) igt_core-INFO:   #11 [_start+0x2a]
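
Note that the log above reports the subtest as SUCCESS and only afterwards raises the CRITICAL assertion, which is why CI recorded a warn. When triaging many such logs, a small script can pull out the failed assertions. The following is a minimal sketch, assuming every line has the shape `(binary:pid) [domain-]LEVEL: message` as seen in this log; the regular expression and the helper name `failed_assertions` are illustrative and not part of IGT.

```python
import re

# Assumed line shape, inferred from the log in this report:
#   (binary:pid) domain-LEVEL: message
# The "domain-" prefix is optional, e.g. "(gem_exec_schedule:1226) DEBUG: ...".
IGT_LINE = re.compile(
    r"^\((?P<binary>[^:()]+):(?P<pid>\d+)\)\s+"
    r"(?:(?P<domain>[\w-]+?)-)?(?P<level>DEBUG|INFO|WARNING|CRITICAL):\s*"
    r"(?P<message>.*)$"
)

PREFIX = "Failed assertion:"


def failed_assertions(log_text):
    """Return the distinct failed-assertion expressions from CRITICAL lines, in order."""
    seen = []
    for line in log_text.splitlines():
        match = IGT_LINE.match(line.strip())
        if not match or match.group("level") != "CRITICAL":
            continue
        message = match.group("message")
        if message.startswith(PREFIX):
            expression = message[len(PREFIX):].strip()
            if expression not in seen:
                seen.append(expression)
    return seen


sample = """\
(gem_exec_schedule:1226) igt_core-INFO: Subtest semaphore-user: SUCCESS (6.436s)
(gem_exec_schedule:1226) igt_aux-CRITICAL: Test assertion failure function sig_abort, file ../lib/igt_aux.c:500:
(gem_exec_schedule:1226) igt_aux-CRITICAL: Failed assertion: !"GPU hung"
(gem_exec_schedule:1226) igt_aux-CRITICAL: Failed assertion: !"GPU hung"
"""

print(failed_assertions(sample))
```

Duplicate messages are collapsed because IGT repeats the failure in its debug dump, as it does in this report.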
Comment 1 CI Bug Log 2019-04-09 09:32:46 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* BDW igt@gem_exec_schedule@semaphore-user - warn - Failed assertion: !"GPU hung"
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_249/fi-bdw-5557u/igt@gem_exec_schedule@semaphore-user.html
Comment 2 Chris Wilson 2019-04-09 09:34:16 UTC
commit bac24f59f45419a3853af2f58130cb82b7bdca64
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Mar 29 13:40:24 2019 +0000

    drm/i915/execlists: Enable coarse preemption boundaries for gen8
    
    When we introduced preemption, we chose to keep it disabled for gen8 as
    supporting preemption inside GPGPU user batches required various w/a in
    userspace. Since then, the desire to preempt long queues of requests
    between batches (e.g. within busywaiting semaphores) has grown. So allow
    arbitration within the busywaits and between requests, but disable
    arbitration within user batches so that we can preempt between requests
    and not risk breaking GPGPU.
    
    However, since this preemption is much coarser and doesn't interfere
    with userspace, we decline to include it amongst the scheduler
    capabilities. (This is also required for us to skip over the preemption
    selftests that expect to be able to preempt user batches.)
    
    Michal suggested that we could perhaps allow preemption inside gen8
    userspace batches if we can satisfy ourselves that the default
    preemption settings are viable with existing userspace (principally
    OpenCL which already should carry any known workaround). We could then
    merge the two code paths back into one, even dropping the artifical
    has-preemption device feature flag.
    
    Testcase: igt/gem_exec_scheduler/semaphore-user
    References: beecec901790 ("drm/i915/execlists: Preemption!")
    Fixes: e88619646971 ("drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Michal Winiarski <michal.winiarski@intel.com>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Michal Winiarski <michal.winiarski@intel.com> #irc
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190329134024.5254-1-chris@chris-wilson.co.uk
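
The policy described in the commit message can be pictured with a toy model. This is only a sketch of the idea (arbitration allowed in semaphore busywaits and between requests, but not inside user batches); the `Phase` names and both functions are hypothetical and do not correspond to i915 code.

```python
from enum import Enum


class Phase(Enum):
    """Hypothetical phases an engine passes through while executing requests."""
    SEMAPHORE_BUSYWAIT = "semaphore busywait"
    USER_BATCH = "user batch"
    BETWEEN_REQUESTS = "between requests"


def arbitration_enabled(phase, preempt_user_batches):
    """Return True if the toy model allows a context switch during this phase."""
    if phase is Phase.USER_BATCH:
        # Coarse mode (gen8 in the commit message) keeps arbitration off here.
        return preempt_user_batches
    # Busywaits and the gaps between requests are always preemptible.
    return True


def preemption_points(timeline, preempt_user_batches=False):
    """Indices in the timeline at which the model permits preemption."""
    return [
        index
        for index, phase in enumerate(timeline)
        if arbitration_enabled(phase, preempt_user_batches)
    ]


timeline = [
    Phase.SEMAPHORE_BUSYWAIT,
    Phase.USER_BATCH,
    Phase.BETWEEN_REQUESTS,
    Phase.USER_BATCH,
]

print(preemption_points(timeline))                             # coarse boundaries
print(preemption_points(timeline, preempt_user_batches=True))  # full preemption
```

In the coarse mode a request stuck in a semaphore busywait can still be preempted, which is the case the semaphore-user test exercises, while GPGPU user batches are never interrupted.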
Comment 3 CI Bug Log 2019-04-09 10:16:59 UTC
A CI Bug Log filter associated to this bug has been updated:

{- BDW igt@gem_exec_schedule@semaphore-user - warn - Failed assertion: !"GPU hung" -}
{+ BSW BDW igt@gem_exec_schedule@semaphore-user - warn/fail - Failed assertion: !"GPU hung" +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_249/fi-bsw-n3050/igt@gem_exec_schedule@semaphore-user.html
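
To see what the filter update changed, the two description lines can be split into fields and compared. The sketch below assumes the description is laid out as `<machine tags> <test> - <statuses> - <text>`; that layout is inferred from the two lines above and is not a documented CI Bug Log format, and `parse_filter_description` is a hypothetical helper.

```python
def parse_filter_description(description):
    """Split a filter description into machine tags, tests, statuses and text.

    Assumed layout: "<tags...> <igt@test> - <status[/status]> - <text>".
    """
    head, statuses, text = (part.strip() for part in description.split(" - ", 2))
    tokens = head.split()
    return {
        "platforms": sorted(t for t in tokens if not t.startswith("igt@")),
        "tests": [t for t in tokens if t.startswith("igt@")],
        "statuses": statuses.split("/"),
        "text": text,
    }


old = parse_filter_description(
    'BDW igt@gem_exec_schedule@semaphore-user - warn - Failed assertion: !"GPU hung"'
)
new = parse_filter_description(
    'BSW BDW igt@gem_exec_schedule@semaphore-user - warn/fail - Failed assertion: !"GPU hung"'
)

added_platforms = set(new["platforms"]) - set(old["platforms"])
added_statuses = set(new["statuses"]) - set(old["statuses"])
print("platforms added:", added_platforms)
print("statuses added:", added_statuses)
```

Under that assumption, the update widened the filter by one machine tag and one status while leaving the test name and the matched text unchanged.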
Comment 4 Lakshmi 2019-04-09 10:18:03 UTC
(In reply to CI Bug Log from comment #3)
> A CI Bug Log filter associated to this bug has been updated:
> 
> {- BDW igt@gem_exec_schedule@semaphore-user - warn - Failed assertion:
> !"GPU hung" -}
> {+ BSW BDW igt@gem_exec_schedule@semaphore-user - warn/fail - Failed
> assertion: !"GPU hung" +}
> 
> New failures caught by the filter:
> 
> *
> https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_249/fi-bsw-n3050/
> igt@gem_exec_schedule@semaphore-user.html

Also seen on BSW.
Comment 5 Martin Peres 2019-04-17 14:15:04 UTC
(In reply to Chris Wilson from comment #2)
> commit bac24f59f45419a3853af2f58130cb82b7bdca64
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Fri Mar 29 13:40:24 2019 +0000
> 
>     drm/i915/execlists: Enable coarse preemption boundaries for gen8
>     
>     When we introduced preemption, we chose to keep it disabled for gen8 as
>     supporting preemption inside GPGPU user batches required various w/a in
>     userspace. Since then, the desire to preempt long queues of requests
>     between batches (e.g. within busywaiting semaphores) has grown. So allow
>     arbitration within the busywaits and between requests, but disable
>     arbitration within user batches so that we can preempt between requests
>     and not risk breaking GPGPU.
>     
>     However, since this preemption is much coarser and doesn't interfere
>     with userspace, we decline to include it amongst the scheduler
>     capabilities. (This is also required for us to skip over the preemption
>     selftests that expect to be able to preempt user batches.)
>     
>     Michal suggested that we could perhaps allow preemption inside gen8
>     userspace batches if we can satisfy ourselves that the default
>     preemption settings are viable with existing userspace (principally
>     OpenCL which already should carry any known workaround). We could then
>     merge the two code paths back into one, even dropping the artifical
>     has-preemption device feature flag.
>     
>     Testcase: igt/gem_exec_scheduler/semaphore-user
>     References: beecec901790 ("drm/i915/execlists: Preemption!")
>     Fixes: e88619646971 ("drm/i915: Use HW semaphores for inter-engine
> synchronisation on gen8+")
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>     Cc: Michal Winiarski <michal.winiarski@intel.com>
>     Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>     Reviewed-by: Michal Winiarski <michal.winiarski@intel.com> #irc
>     Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>     Link:
> https://patchwork.freedesktop.org/patch/msgid/20190329134024.5254-1-
> chris@chris-wilson.co.uk

Thanks. The bug was seen twice in one drmtip run and has not reappeared in the following 6 runs. This means we are over the 10x rule and can consider this fixed.
Comment 6 CI Bug Log 2019-04-17 14:15:11 UTC
The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.

