Bug 110512

Summary: [CI][BAT][guc] igt@gem_exec_suspend@basic-s3 - dmesg-warn - Failed to idle engines, declaring wedged!
Product: DRI
Reporter: Martin Peres <martin.peres>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: RESOLVED WONTFIX
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: high
CC: intel-gfx-bugs, lakshminarayana.vudum, tvrtko.ursulin
Version: XOrg git
Hardware: Other
OS: All
Whiteboard: ReadyForDev
i915 platform: BXT
i915 features: firmware/guc

Description Martin Peres 2019-04-25 12:07:08 UTC
igt@gem_exec_suspend@basic-s3 - dmesg-warn - Failed to idle engines, declaring wedged!

<3> [98.743091] i915 0000:00:02.0: Failed to idle engines, declaring wedged!
Comment 1 CI Bug Log 2019-04-25 12:07:21 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* fi-apl-guc: igt@gem_exec_suspend@basic-s3 - dmesg-warn - Failed to idle engines, declaring wedged!
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4179/fi-apl-guc/igt@gem_exec_suspend@basic-s3.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4183/fi-apl-guc/igt@gem_exec_suspend@basic-s3.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4200/fi-apl-guc/igt@gem_exec_suspend@basic-s3.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4202/fi-apl-guc/igt@gem_exec_suspend@basic-s3.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12866/fi-apl-guc/igt@gem_exec_suspend@basic-s3.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4206/fi-apl-guc/igt@gem_exec_suspend@basic-s3.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12867/fi-apl-guc/igt@gem_exec_suspend@basic-s3.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12869/fi-apl-guc/igt@gem_exec_suspend@basic-s3.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4207/fi-apl-guc/igt@gem_exec_suspend@basic-s3.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12870/fi-apl-guc/igt@gem_exec_suspend@basic-s3.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5997/fi-apl-guc/igt@gem_exec_suspend@basic-s3.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_2916/fi-apl-guc/igt@gem_exec_suspend@basic-s3.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5998/fi-apl-guc/igt@gem_exec_suspend@basic-s3.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5999/fi-apl-guc/igt@gem_exec_suspend@basic-s3.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12868/fi-apl-guc/igt@gem_exec_suspend@basic-s3.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4191/fi-apl-guc/igt@gem_exec_suspend@basic-s3.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_2917/fi-apl-guc/igt@gem_exec_suspend@basic-s3.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4192/fi-apl-guc/igt@gem_exec_suspend@basic-s3.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5996/fi-apl-guc/igt@gem_exec_suspend@basic-s3.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4194/fi-apl-guc/igt@gem_exec_suspend@basic-s3.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4174/fi-apl-guc/igt@gem_exec_suspend@basic-s3.html
Comment 2 Martin Peres 2019-04-25 12:10:45 UTC
The bug came with: drm/i915: Allow multiple user handles to the same VM (https://patchwork.freedesktop.org/series/59913/)

Assigning Chris and Cc:ing Tvrtko.
Comment 3 Chris Wilson 2019-04-25 12:13:19 UTC
The trace looks quite clean. We queue a request on each engine to switch to the kernel context on resume, and vecs0 persistently refuses to start. I wondered if it was a timing or an ordering issue in the guc resume paths, but nothing has leapt out. I hear, though, that this firmware is being phased out...
Comment 4 Chris Wilson 2019-04-25 12:14:12 UTC
(In reply to Martin Peres from comment #2)
> The bug came with: drm/i915: Allow multiple user handles to the same VM
> (https://patchwork.freedesktop.org/series/59913/)
> 
> Assigning Chris and Cc:ing Tvrtko.

No, it's "drm/i915: Invert the GEM wakeref hierarchy", but I decided it was a real issue with the fw given historical precedent.
Comment 5 Tvrtko Ursulin 2019-04-25 12:28:41 UTC
vecs has RING_START: 0x00000000, shouldn't it read back a valid address if we wrote one in?
Comment 6 Chris Wilson 2019-04-25 12:29:58 UTC
(In reply to Tvrtko Ursulin from comment #5)
> vecs has RING_START: 0x00000000, shouldn't it read back a valid address if
> we wrote one in?

s/we/guc/.

Yes.
Comment 7 Chris Wilson 2019-04-25 12:31:16 UTC
(In reply to Chris Wilson from comment #6)
> (In reply to Tvrtko Ursulin from comment #5)
> > vecs has RING_START: 0x00000000, shouldn't it read back a valid address if
> > we wrote one in?
> 
> s/we/guc/.
> 
> Yes.

More precisely: if the context-restore has occurred.
Comment 8 CI Bug Log 2019-04-27 19:10:18 UTC
A CI Bug Log filter associated to this bug has been updated:

{- fi-apl-guc: igt@gem_exec_suspend@basic-s3 - dmesg-warn - Failed to idle engines, declaring wedged! -}
{+ fi-apl-guc: igt@gem_exec_suspend@basic-s3|igt@kms_plane@plane-panning-bottom-right-suspend-pipe-b-planes - dmesg-warn - Failed to idle engines, declaring wedged! +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_265/fi-apl-guc/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-b-planes.html
Comment 9 CI Bug Log 2019-04-27 19:25:11 UTC
A CI Bug Log filter associated to this bug has been updated:

{- fi-apl-guc: igt@gem_exec_suspend@basic-s3|igt@kms_plane@plane-panning-bottom-right-suspend-pipe-b-planes - dmesg-warn - Failed to idle engines, declaring wedged! -}
{+ fi-apl-guc: all tests - dmesg-warn - Failed to idle engines, declaring wedged! +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_265/fi-apl-guc/igt@kms_vblank@pipe-b-ts-continuation-dpms-suspend.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_265/fi-apl-guc/igt@i915_suspend@sysfs-reader.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_265/fi-apl-guc/igt@gem_eio@suspend.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_265/fi-apl-guc/igt@i915_suspend@forcewake.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_266/fi-apl-guc/igt@kms_vblank@pipe-b-ts-continuation-suspend.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_266/fi-apl-guc/igt@kms_frontbuffer_tracking@fbc-suspend.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_266/fi-apl-guc/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_266/fi-apl-guc/igt@kms_vblank@pipe-a-ts-continuation-suspend.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_266/fi-apl-guc/igt@i915_suspend@fence-restore-untiled.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_266/fi-apl-guc/igt@kms_flip@flip-vs-suspend-interruptible.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_266/fi-apl-guc/igt@kms_vblank@pipe-c-ts-continuation-dpms-suspend.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_266/fi-apl-guc/igt@gem_exec_suspend@basic-s4-devices.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_267/fi-apl-guc/igt@kms_vblank@pipe-a-ts-continuation-dpms-suspend.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_267/fi-apl-guc/igt@kms_frontbuffer_tracking@fbc-suspend.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_267/fi-apl-guc/igt@i915_suspend@fence-restore-untiled.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_267/fi-apl-guc/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_267/fi-apl-guc/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-a-planes.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_267/fi-apl-guc/igt@kms_vblank@pipe-a-ts-continuation-suspend.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_267/fi-apl-guc/igt@kms_cursor_crc@cursor-128x128-suspend.html
Comment 10 Chris Wilson 2019-04-27 19:39:16 UTC
*** Bug 110537 has been marked as a duplicate of this bug. ***
Comment 11 CI Bug Log 2019-04-30 06:37:31 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* fi-apl-guc: all tests - fail - Failed assertion: !"GPU hung"
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_268/fi-apl-guc/igt@gem_exec_schedule@preempt-other-blt.html
Comment 12 Lakshmi 2019-04-30 06:38:04 UTC
(In reply to Chris Wilson from comment #10)
> *** Bug 110537 has been marked as a duplicate of this bug. ***

Another instance 
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_268/fi-apl-guc/igt@gem_exec_schedule@preempt-other-blt.html
Comment 13 Chris Wilson 2019-05-06 12:52:40 UTC
*** Bug 110620 has been marked as a duplicate of this bug. ***
Comment 14 Chris Wilson 2019-05-06 17:39:08 UTC
*** Bug 110623 has been marked as a duplicate of this bug. ***
Comment 15 CI Bug Log 2019-05-07 07:22:16 UTC
A CI Bug Log filter associated to this bug has been updated:

{- fi-apl-guc: all tests - fail - Failed assertion: !"GPU hung" -}
{+ fi-apl-guc: all tests - fail - Failed assertion: !"GPU hung" +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_275/fi-apl-guc/igt@gem_exec_store@cachelines-render.html
Comment 16 CI Bug Log 2019-05-09 08:34:27 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* fi-apl-guc: igt@gem_exec_capture@capture-vebox - dmesg-fail -  Failed assertion: gem_bo_busy(fd, obj[SCRATCH].handle) , GPU HANG
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_281/fi-apl-guc/igt@gem_exec_capture@capture-vebox.html
Comment 17 Chris Wilson 2019-05-28 11:16:59 UTC
commit a2904ade3dc28cf1a1b7deded41f4369f75e664c
Author: Michal Wajdeczko <michal.wajdeczko@intel.com>
Date:   Mon May 27 18:35:58 2019 +0000

    drm/i915/guc: Don't allow GuC submission
    
    Due to the upcoming changes to the GuC ABI interface, we must
    disable GuC submission mode until final ABI will be available
    on all GuC firmwares.
    
    To avoid regressions on systems configured to run with no longer
    supported configuration "enable_guc=3" or "enable_guc=1" clear
    GuC submission bit.
    
    v2: force switch to non-GuC submission mode
    v3: use GEM_BUG_ON (Joonas)
    
    Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
    Cc: John Spotswood <john.a.spotswood@intel.com>
    Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
    Cc: Tony Ye <tony.ye@intel.com>
    Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
    Cc: Jeff Mcgee <jeff.mcgee@intel.com>
    Cc: Antonio Argenziano <antonio.argenziano@intel.com>
    Cc: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com>
    Cc: Martin Peres <martin.peres@linux.intel.com>
    Acked-by: Martin Peres <martin.peres@linux.intel.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190527183613.17076-3-michal.wajdeczko@intel.com
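
For readers unfamiliar with the enable_guc bitmask, the following is a minimal sketch of what "clear GuC submission bit" amounts to. It is not the code from the commit above. The macro and function names are hypothetical, the submission bit is taken to be bit 0 only because the commit message names "enable_guc=3" and "enable_guc=1" as the affected values, and the meaning given to bit 1 (HuC loading) is an assumption. The assert merely stands in for a kernel-style sanity check.

```c
#include <assert.h>

/* Illustrative bit layout; these names are hypothetical, not copied from i915. */
#define GUC_SUBMISSION_BIT (1u << 0) /* set in both enable_guc=1 and enable_guc=3 */
#define HUC_LOAD_BIT       (1u << 1) /* assumed meaning of the remaining bit */

/* Return the requested mode with the GuC submission bit cleared. */
static unsigned int sanitize_enable_guc(unsigned int requested)
{
	unsigned int sanitized = requested & ~GUC_SUBMISSION_BIT;

	/* Sanity check: the submission bit must not survive sanitization. */
	assert((sanitized & GUC_SUBMISSION_BIT) == 0);
	return sanitized;
}
```

Under these assumptions a request of 3 is reduced to 2 and a request of 1 is reduced to 0, while 0 and 2 pass through unchanged.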
