Bug 111606

Summary: [CI][RESUME] igt@gem_exec_schedule@preempt-* - incomplete
Product: DRI
Reporter: Martin Peres <martin.peres>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: RESOLVED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: not set
Priority: high
CC: intel-gfx-bugs
Version: XOrg git
Hardware: Other
OS: All
Whiteboard:
i915 platform: TGL
i915 features: GEM/Other

Description Martin Peres 2019-09-09 10:20:38 UTC

    
Comment 1 Martin Peres 2019-09-09 10:21:23 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_schedule@preempt-other-blt.html

Starting subtest: preempt-other-blt
Received signal SIGQUIT.
Stack trace: 
Received signal SIGQUIT.
Stack trace: 

<6> [39.484592] Console: switching to colour dummy device 80x25
<6> [39.484633] [IGT] gem_exec_schedule: executing
<5> [39.490357] Setting dangerous option reset - tainting kernel
<5> [39.496881] Setting dangerous option reset - tainting kernel
<5> [39.497197] Setting dangerous option reset - tainting kernel
<5> [39.502267] Setting dangerous option reset - tainting kernel
<5> [39.502456] Setting dangerous option reset - tainting kernel
<5> [39.507587] Setting dangerous option reset - tainting kernel
<5> [39.507787] Setting dangerous option reset - tainting kernel
<6> [39.510678] [IGT] gem_exec_schedule: starting subtest preempt-other-blt
Comment 2 CI Bug Log 2019-09-09 10:21:37 UTC
The CI Bug Log issue associated with this bug has been updated.

### New filters associated

* TGL: igt@gem_exec_schedule@preempt-* - incomplete
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_schedule@preempt-other-blt.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_exec_schedule@preempt-queue-bsd2.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_363/fi-tgl-u/igt@gem_exec_schedule@preempt-queue-contexts-bsd1.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_363/fi-tgl-u/igt@gem_exec_schedule@preempt-queue-render.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_363/fi-tgl-u/igt@gem_exec_schedule@preempt-queue-chain-render.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_363/fi-tgl-u/igt@gem_exec_schedule@preempt-queue-contexts-chain-vebox.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_363/fi-tgl-u/igt@gem_exec_schedule@preempt-queue-contexts-bsd2.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_363/fi-tgl-u/igt@gem_exec_schedule@preempt-queue-contexts-chain-blt.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_363/fi-tgl-u/igt@gem_exec_schedule@preempt-queue-chain-blt.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_364/fi-tgl-u/igt@gem_exec_schedule@preempt-queue-chain-bsd1.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_364/fi-tgl-u/igt@gem_exec_schedule@preempt-queue-render.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_364/fi-tgl-u/igt@gem_exec_schedule@preempt-queue-chain-render.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_364/fi-tgl-u/igt@gem_exec_schedule@preempt-queue-bsd2.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_364/fi-tgl-u/igt@gem_exec_schedule@preempt-queue-contexts-blt.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_364/fi-tgl-u/igt@gem_exec_schedule@preempt-queue-contexts-render.html
Comment 3 Chris Wilson 2019-09-09 10:25:30 UTC
There's no information there as to how long it ran before the SIGQUIT.
Comment 4 Matthew Brost 2019-09-17 01:40:31 UTC
Took a quick look. This test inserts a spin batch at NOISE priority and a write batch at LOW priority on the same engine. On a different engine, a write batch is inserted at HIGH priority. This HIGH write should boost the LOW write batch past the NOISE spin batch and preempt it. Once both writes complete, the spin batch is released and the test should complete. Since the test is killed by a SIGQUIT, I'm guessing the spin batch is never preempted and the test just hangs. I'd suggest adding some tracing to the driver to see whether the LOW write batch's priority is bumped by the HIGH write batch and whether the preemption code path is triggered.

I'm specifically referring to preempt-other-blt.
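
The expected behaviour described above can be sketched as a toy model in Python. This is not driver or IGT code: the `Request` class, the `submit`, `bump` and `should_preempt` helpers, the numeric priority values and the engine names "A" and "B" are all hypothetical, and the sketch assumes the HIGH write depends on the LOW write (which is what "boost" implies).

```python
from dataclasses import dataclass, field

# Illustrative priority values only; the real test uses its own constants.
LOW, NOISE, HIGH = -1, 0, 1


@dataclass
class Request:
    name: str
    prio: int
    engine: str
    deps: list = field(default_factory=list)


def bump(req, prio):
    """Priority inheritance: raise req, and everything it waits on, to prio."""
    if req.prio >= prio:
        return
    req.prio = prio
    for dep in req.deps:
        bump(dep, prio)


def submit(queue, req):
    """Queue a request and boost its dependencies to its own priority."""
    queue.append(req)
    for dep in req.deps:
        bump(dep, req.prio)


def should_preempt(running, queue):
    """True if a waiting request on the same engine outranks the running one."""
    return any(
        r.prio > running.prio
        for r in queue
        if r.engine == running.engine and r is not running
    )


queue = []
spin = Request("spin", NOISE, "A")
low = Request("low-write", LOW, "A")
submit(queue, spin)
submit(queue, low)
before = should_preempt(spin, queue)   # LOW does not outrank NOISE

# Assumption: the HIGH write on another engine depends on the LOW write.
high = Request("high-write", HIGH, "B", deps=[low])
submit(queue, high)
after = should_preempt(spin, queue)    # boosted LOW write now outranks the spinner
```

In this model the hang would correspond to either `bump` never raising the LOW write or `should_preempt` never being acted on, which are the two points where tracing is suggested.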

It looks like other preemption subtests are also failing, specifically the preempt-queue-* subtests. These IMO are more complicated, as they issue more execbuf ioctls than preempt-other-blt, so I'd focus on debugging preempt-other-blt. All the failures are likely the same issue, so if preempt-other-blt gets fixed, I'd imagine all the other failing subtests will also get fixed.

I need to start running some of the new GuC code I've been developing on TGL later this week so I plan on grabbing a RIL TGL system. Once I have a TGL system, I'll see if I can recreate this failure and dig into it a bit.
Comment 5 Chris Wilson 2019-09-17 16:36:02 UTC
commit c210e85b8f3371168ce78c8da00b913839a84ec7 (HEAD -> drm-intel-next-queued, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Sep 17 13:30:55 2019 +0100

    drm/i915/tgl: Extend MI_SEMAPHORE_WAIT
    
    On Tigerlake, MI_SEMAPHORE_WAIT grew an extra dword, so be sure to
    update the length field and emit that extra parameter and any padding
    noop as required.
    
    v2: Define the token shift while we are adding the updated MI_SEMAPHORE_WAIT
    v3: Use int instead of bool in the addition so that readers are not left
    wondering about the intricacies of the C spec. Now they just have to
    worry what the integer value of a boolean operation is...
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
    Cc: Michal Winiarski <michal.winiarski@intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190917123055.28965-1-chris@chris-wilson.co.uk
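
The change described in the commit message can be illustrated with a small Python model of the command stream. It is only a sketch, not the patch itself: `mi_instr`, `emit_semaphore_wait`, the opcode value, the header layout, and the rule that the length field counts total dwords minus two are assumptions made for illustration, as is the idea that the padding noop keeps the emitted dword count even.

```python
MI_NOOP = 0x00000000  # assumed encoding of a no-op dword


def mi_instr(opcode, length_field):
    """Assumed header layout: opcode in the high bits, length in the low bits."""
    return (opcode << 23) | length_field


# Assumed older form: 4 dwords in total, so the length field is 4 - 2.
MI_SEMAPHORE_WAIT = mi_instr(0x1C, 2)


def emit_semaphore_wait(seqno, addr, has_token):
    """Return the dwords for a semaphore wait, with or without the extra dword."""
    extra = int(has_token)  # plain integer added to the header's length field
    cs = [
        MI_SEMAPHORE_WAIT + extra,   # header, length bumped by one if needed
        seqno & 0xFFFFFFFF,          # value to compare against
        addr & 0xFFFFFFFF,           # address, low 32 bits
        (addr >> 32) & 0xFFFFFFFF,   # address, high 32 bits
    ]
    if has_token:
        cs.append(0)        # the extra parameter dword
        cs.append(MI_NOOP)  # padding so the total stays even (assumption)
    return cs


old = emit_semaphore_wait(1, 0x1000, has_token=False)
new = emit_semaphore_wait(1, 0x1000, has_token=True)
```

Here `old` has four dwords and `new` has six, and the two headers differ only by one in the length field, which mirrors the "update the length field and emit that extra parameter and any padding noop" wording.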
