| Summary: | [gen9] Hang recovery fails for atomic+textureBuffer hang | | |
|---|---|---|---|
| Product: | DRI | Reporter: | Jason Ekstrand <jason> |
| Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
| Status: | RESOLVED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
| Severity: | normal | | |
| Priority: | medium | CC: | intel-gfx-bugs, monsterovich |
| Version: | XOrg git | | |
| Hardware: | Other | | |
| OS: | All | | |
| Whiteboard: | Triaged | | |
| i915 platform: | CFL, KBL, SKL | i915 features: | GPU hang |
Description
Jason Ekstrand
2019-06-25 18:50:35 UTC
Looks like another situation where asking for a GPU reset turns into a request for a power cycle. It doesn't make any difference whether we ask for a full reset or rcs0 reset; with or without rings disabled; it dies a few milliseconds after asserting the GDRST. At the moment, the only way out is not to fall in.

Fwiw, the GPU hang does not present itself on bxt. Just an issue in skl and its derivatives, already fixed in icl.

Disable atomics in L3; no hang (kbl).

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 704ace01e7f5..890a3bcfacea 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -667,6 +667,10 @@ gen9_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal
                            MMCD_PCLA | MMCD_HOTSPOT_EN);
        }

+       wa_write_masked_or(wal, _MMIO(0xb008), BIT(0), 0);
+       wa_write_masked_or(wal, _MMIO(0xb118), BIT(22), 0);
+       wa_write_masked_or(wal, _MMIO(0xb11c), BIT(8), 0);
+
        /* WaDisableHDCInvalidation:skl,bxt,kbl,cfl */
        wa_write_or(wal,
                    GAM_ECOCHK,

I'm aware that disabling L3$ for atomics fixes the hang (though I hadn't realized you could disable L3$ for just atomics that easily). This bug is about the fact that the kernel fails to recover.

(In reply to Jason Ekstrand from comment #5)
> I'm aware that disabling L3$ for atomics fixes the hang (though I hadn't
> realized you could disable L3$ for just atomics that easily). This bug is
> about the fact that the kernel fails to recover.

The HW dies.

Courtesy of Jason, https://patchwork.freedesktop.org/series/64920/

Hi Jason, Chris, I tested this patch with one more game
https://github.com/doitsujin/dxvk/issues/794
And it also fixed the hang. Are there any reasons not to push this patch?

(In reply to Denis from comment #8)
> Hi Jason, Chris, I tested this patch with one more game
> https://github.com/doitsujin/dxvk/issues/794
> And it also fixed the hang. Is there any reasons not to push this patch?
We were waiting for confirmation that it helped UE4 and not just piglit. Thanks.

Hang and subsequent death avoided by

commit 9d7b01e93526efe79dbf75b69cc5972b5a4f7b37 (HEAD -> drm-intel-next-queued, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Sep 4 11:07:07 2019 +0100

    drm/i915: Restore relaxed padding (OCL_OOB_SUPPRES_ENABLE) for skl+

    This bit was fliped on for "syncing dependencies between camera and
    graphics". BSpec has no recollection why, and it is causing
    unrecoverable GPU hangs with Vulkan compute workloads.

    From BSpec, setting bit5 to 0 enables relaxed padding requirements for
    buffers, 1D and 2D non-array, non-MSAA, non-mip-mapped linear surfaces;
    and *must* be set to 0h on skl+ to ensure "Out of Bounds" case is
    suppressed.

    Reported-by: Jason Ekstrand <jason@jlekstrand.net>
    Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110998
    Fixes: 8424171e135c ("drm/i915/gen9: h/w w/a: syncing dependencies between camera and graphics")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Tested-by: denys.kostin@globallogic.com
    Cc: Jason Ekstrand <jason@jlekstrand.net>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Cc: <stable@vger.kernel.org> # v4.1+
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190904100707.7377-1-chris@chris-wilson.co.uk

Solves the immediate test case.

*** Bug 109020 has been marked as a duplicate of this bug. ***
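
For readers unfamiliar with the register pokes discussed in this thread, here is a minimal userspace sketch of what a masked read-modify-write does. It is not kernel code: `masked_or`, `disable_l3_atomics_model` and `restore_relaxed_padding_model` are hypothetical names, and the sketch assumes that `wa_write_masked_or(wal, reg, mask, val)` clears the bits in `mask` and then ORs in `val`. The offsets 0xb008, 0xb118 and 0xb11c and their bit positions are taken from the experimental diff in the thread; bit 5 is taken from the commit message, which in the text quoted here does not name the register, so `reg` is only a stand-in.

```c
#include <stdint.h>

#define BIT(n) (UINT32_C(1) << (n))

/*
 * Hypothetical model of a masked read-modify-write (assumed semantics):
 * clear every bit set in `mask`, then OR in `val`.
 */
uint32_t masked_or(uint32_t old, uint32_t mask, uint32_t val)
{
	return (old & ~mask) | val;
}

/*
 * Apply the three writes from the experimental diff to plain integers
 * standing in for the registers at 0xb008, 0xb118 and 0xb11c.
 * Each call clears exactly one bit and sets nothing.
 */
void disable_l3_atomics_model(uint32_t *r_b008, uint32_t *r_b118,
			      uint32_t *r_b11c)
{
	*r_b008 = masked_or(*r_b008, BIT(0), 0);
	*r_b118 = masked_or(*r_b118, BIT(22), 0);
	*r_b11c = masked_or(*r_b11c, BIT(8), 0);
}

/*
 * Model of the eventual fix as the commit message words it: bit 5 must
 * read back as 0. `reg` is a stand-in value, not a real register read.
 */
uint32_t restore_relaxed_padding_model(uint32_t reg)
{
	return masked_or(reg, BIT(5), 0);
}
```

Passing 0 as the value means each call only clears bits, which is why the diff reads as "disable": every other bit in the register is left untouched.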