Bug 109605 - [CI][SHARDS]: igt@gem_mmap_gtt@hang - incomplete/timeout - i915 0000:00:02.0: i915_reset_device timed out, cancelling all in-flight rendering.
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Whiteboard: Triaged, ReadyForDev
Reported: 2019-02-11 10:27 UTC by Lakshmi
Modified: 2019-03-06 16:21 UTC
i915 platform: ALL
Description Lakshmi 2019-02-11 10:27:41 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5571/shard-iclb7/igt@gem_mmap_gtt@hang.html

<6> [84.707626] Console: switching to colour dummy device 80x25
<6> [84.707682] [IGT] gem_mmap_gtt: executing
<6> [84.718097] [IGT] gem_mmap_gtt: starting subtest hang
<5> [84.718559] Setting dangerous option reset - tainting kernel
<5> [84.718739] Setting dangerous option reset - tainting kernel
<6> [84.764533] i915 0000:00:02.0: GPU HANG: ecode 11:0:0x00000000, Manually set wedged engine mask = ffffffffffffffff
<6> [84.764649] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
<6> [84.764652] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
<6> [84.764655] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
<6> [84.764658] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
<6> [84.764661] [drm] GPU crash dump saved to /sys/class/drm/card0/error
<7> [84.766048] [drm:i915_reset_device [i915]] resetting chip
<5> [84.766619] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.771953] [drm:i915_reset_device [i915]] resetting chip
<5> [84.772029] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.774732] [drm:i915_reset_device [i915]] resetting chip
<5> [84.774833] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.780093] [drm:i915_reset_device [i915]] resetting chip
<5> [84.780363] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.782978] [drm:i915_reset_device [i915]] resetting chip
<5> [84.783144] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.786491] [drm:i915_reset_device [i915]] resetting chip
<5> [84.786604] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.788766] [drm:i915_reset_device [i915]] resetting chip
<5> [84.788890] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.791159] [drm:i915_reset_device [i915]] resetting chip
<5> [84.791275] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.794766] [drm:i915_reset_device [i915]] resetting chip
<5> [84.794873] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.809412] [drm:i915_reset_device [i915]] resetting chip
<5> [84.809473] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.811395] [drm:i915_reset_device [i915]] resetting chip
<5> [84.811454] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.814156] [drm:i915_reset_device [i915]] resetting chip
<5> [84.814218] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.816245] [drm:i915_reset_device [i915]] resetting chip
<5> [84.816309] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.818233] [drm:i915_reset_device [i915]] resetting chip
<5> [84.818316] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.820917] [drm:i915_reset_device [i915]] resetting chip
<5> [84.821026] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.823321] [drm:i915_reset_device [i915]] resetting chip
<5> [84.823413] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.825417] [drm:i915_reset_device [i915]] resetting chip
<5> [84.825563] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.828815] [drm:i915_reset_device [i915]] resetting chip
<5> [84.828921] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.832666] [drm:i915_reset_device [i915]] resetting chip
<5> [84.832766] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.835362] [drm:i915_reset_device [i915]] resetting chip
<5> [84.835442] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.837547] [drm:i915_reset_device [i915]] resetting chip
<5> [84.837609] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.840669] [drm:i915_reset_device [i915]] resetting chip
<5> [84.840729] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.843368] [drm:i915_reset_device [i915]] resetting chip
<5> [84.843426] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.845970] [drm:i915_reset_device [i915]] resetting chip
<5> [84.846064] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.849473] [drm:i915_reset_device [i915]] resetting chip
<5> [84.849573] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.851589] [drm:i915_reset_device [i915]] resetting chip
<5> [84.851673] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.853874] [drm:i915_reset_device [i915]] resetting chip
<5> [84.853967] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.856862] [drm:i915_reset_device [i915]] resetting chip
<5> [84.856957] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.859521] [drm:i915_reset_device [i915]] resetting chip
<5> [84.859625] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.861684] [drm:i915_reset_device [i915]] resetting chip
<5> [84.861797] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.869536] [drm:i915_reset_device [i915]] resetting chip
<5> [84.869641] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.873063] [drm:i915_reset_device [i915]] resetting chip
<5> [84.873194] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.875375] [drm:i915_reset_device [i915]] resetting chip
<5> [84.875450] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.880653] [drm:i915_reset_device [i915]] resetting chip
<5> [84.880743] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [84.884095] [drm:i915_reset_device [i915]] resetting chip
<5> [84.884197] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
<7> [87.485265] [drm:edp_panel_vdd_off_sync [i915]] Turning eDP port A VDD off
<7> [87.485421] [drm:edp_panel_vdd_off_sync [i915]] PP_STATUS: 0x80000008 PP_CONTROL: 0x00000067
<7> [87.485449] [drm:intel_power_well_disable [i915]] disabling DC off
<7> [87.485477] [drm:skl_enable_dc6 [i915]] Enabling DC6
<7> [87.485502] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 02
<3> [89.981354] i915 0000:00:02.0: i915_reset_device timed out, cancelling all in-flight rendering.
<3> [186.749106] INFO: task kworker/u16:0:7 blocked for more than 60 seconds.

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_213/fi-bwr-2160/igt@gem_mmap_gtt@hang.html

<3> [102.903976] i915 0000:00:02.0: i915_reset_device timed out, cancelling all in-flight rendering.
<3> [502.988114] [drm:intel_finish_reset [i915]] *ERROR* Restoring old state failed with -4
Comment 1 CI Bug Log 2019-02-11 10:29:02 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* ALL machines: igt@gem_mmap_gtt@hang - incomplete/timeout -  i915 0000:00:02.0: i915_reset_device timed out, cancelling all in-flight rendering.
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5571/shard-iclb7/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_213/fi-blb-e6850/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_213/fi-bwr-2160/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_213/fi-cfl-8700k/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_213/fi-gdg-551/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_213/fi-kbl-r/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_213/fi-pnv-d510/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_213/fi-skl-6770hq/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_213/fi-whl-u/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_214/fi-blb-e6850/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_214/fi-bwr-2160/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_214/fi-cfl-8700k/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_214/fi-gdg-551/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_214/fi-hsw-peppy/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_214/fi-kbl-r/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_214/fi-skl-6700k2/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_214/fi-whl-u/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5578/shard-skl10/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_215/fi-blb-e6850/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_215/fi-bwr-2160/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_215/fi-cfl-8700k/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_215/fi-gdg-551/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_215/fi-kbl-r/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_215/fi-kbl-x1275/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_215/fi-pnv-d510/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_215/fi-skl-6700k2/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_215/fi-skl-iommu/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_215/fi-whl-u/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5582/shard-skl2/igt@gem_mmap_gtt@hang.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5583/shard-skl2/igt@gem_mmap_gtt@hang.html
Comment 2 Chris Wilson 2019-02-12 16:53:01 UTC
commit aeaaa55c7368ea0e7c195baa35dea37b806efb11
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Feb 12 13:08:30 2019 +0000

    drm/i915: Recursive i915_reset_trylock() verboten
    
    We cannot nest i915_reset_trylock() as the inner may wait for the
    I915_RESET_BACKOFF which in turn is waiting upon sync_srcu who is
    waiting for our outermost lock. As we take the reset srcu around the
    fence update, we have to defer taking it in i915_gem_fault() until after
    we acquire the pin on the fence to avoid nesting. This is a little ugly,
    but still works. If a reset occurs between i915_vma_pin_fence() and the
    second reset lock, the reset will restore the fence register back to the
    pinned value before the reset lock allows us to proceed (our mmap won't
    be revoked as we haven't yet marked it as being a userfault as that
    requires us to hold the reset lock), so the pagefault is still
    serialised with the revocation in reset.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109605
    Fixes: 2caffbf11762 ("drm/i915: Revoke mmaps and prevent access to fence registers across reset")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190212130831.14425-1-chris@chris-wilson.co.uk
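
The ordering problem described in the commit message can be illustrated with a toy, single-threaded userspace model. This is only a sketch under stated assumptions, not the i915 implementation: every `toy_*` name is hypothetical, the `backoff` flag merely stands in for I915_RESET_BACKOFF, and "waiting" is modelled as a return code rather than real blocking.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
enum toy_result { TOY_ACQUIRED, TOY_WOULD_WAIT, TOY_DEADLOCK };

struct toy_reset {
    bool backoff; /* stands in for I915_RESET_BACKOFF being set */
    int held;     /* read-side sections held by the faulting thread */
};

/*
 * While a reset is pending, a caller has to wait for it. If the caller
 * already holds a read-side section, the reset is in turn waiting for
 * that section to end, so the wait can never finish.
 */
static enum toy_result toy_reset_trylock(struct toy_reset *r)
{
    if (r->backoff)
        return r->held > 0 ? TOY_DEADLOCK : TOY_WOULD_WAIT;
    r->held++;
    return TOY_ACQUIRED;
}

static void toy_reset_unlock(struct toy_reset *r)
{
    assert(r->held > 0);
    r->held--;
}

/* The fence update takes the reset lock internally. */
static enum toy_result toy_pin_fence(struct toy_reset *r)
{
    enum toy_result res = toy_reset_trylock(r);

    if (res == TOY_ACQUIRED)
        toy_reset_unlock(r);
    return res;
}

/* Nested ordering: outer reset lock first, fence pin inside it. */
static enum toy_result toy_fault_nested(struct toy_reset *r, bool reset_races)
{
    enum toy_result res = toy_reset_trylock(r);

    if (res != TOY_ACQUIRED)
        return res;
    if (reset_races)
        r->backoff = true; /* a reset starts while we hold the outer lock */
    res = toy_pin_fence(r);
    toy_reset_unlock(r);
    return res;
}

/* Deferred ordering: pin the fence first, take the reset lock afterwards. */
static enum toy_result toy_fault_deferred(struct toy_reset *r, bool reset_races)
{
    enum toy_result res = toy_pin_fence(r);

    if (res != TOY_ACQUIRED)
        return res;
    if (reset_races)
        r->backoff = true; /* a reset starts between the pin and the lock */
    res = toy_reset_trylock(r);
    if (res == TOY_ACQUIRED)
        toy_reset_unlock(r);
    return res;
}
```

In this model, `toy_fault_nested()` with a racing reset reports `TOY_DEADLOCK`, whereas `toy_fault_deferred()` reports `TOY_WOULD_WAIT`: a wait the reset can always satisfy, because the faulting thread holds no read-side section at that point.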
Comment 3 Martin Peres 2019-03-06 16:21:46 UTC
(In reply to Chris Wilson from comment #2)

10 runs without any issues, as opposed to multiple failures per run. Seems like it was the right fix! Thanks!
Comment 4 CI Bug Log 2019-03-06 16:21:57 UTC
The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.

