Summary: | [CI][SHARDS] igt@gem_eio@reset-stress - fail - Failed assertion: elapsed < 250e6 | ||
---|---|---|---|
Product: | DRI | Reporter: | Martin Peres <martin.peres> |
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | normal | ||
Priority: | medium | CC: | chris, intel-gfx-bugs |
Version: | XOrg git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | ReadyForDev | ||
i915 platform: | HSW | i915 features: | GEM/Other |
Description
Martin Peres
2018-08-06 14:46:10 UTC
It is an interesting bug. To all appearances the HW doesn't generate an interrupt if we execute a MI_USER_INTERRUPT shortly after the GPU reset.

A partially successful workaround was:

commit 39f3be162c46bc2349ad7a5bd89536eb83561c81
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Jul 30 08:53:50 2018 +0100

    drm/i915: Kick waiters on resetting legacy rings

still the window after the kick and before interrupts are being received.

A long shot (one that I've tried earlier with no success) was

commit a6476ebd4350d51146ef0492b4b06bc0d31e8827
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Aug 6 15:56:47 2018 +0100

    drm/i915: Stop dropping irq around resets

    A long time ago, we were afraid of handling interrupts and signaling waiters during a reset, worrying that the confusion in request handling would interfere with our attempts to process the reset in an orderly fashion. Since then, we have isolated our irq-driven request handling by virtue of the engine->timeline.lock and control of kthreads where required, eliminating the danger of concurrently processing interrupts.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180806145647.13131-1-chris@chris-wilson.co.uk

but at least that confirms that it's not a shadow caused by the disabling of irq across reset. Fresh ideas required.

commit a4a717010f4e8cacaa3f0cae8a22f25c39ae1d41
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Aug 8 11:51:00 2018 +0100

    drm/i915: Unmask user interrupts writes into HWSP on snb/ivb/vlv/hsw

    An oddity occurs on Sandybridge, Ivybridge and Haswell (and presumably Valleyview) in that for the period following the GPU restart after a reset, there are no GT interrupts received. From Ville's notes, bit 0 in the HWSTAM corresponds to the render interrupt, and if we unmask it we do see immediate resumption of GT interrupt delivery (via the master irq handler) after the reset.

    v2: Limit the w/a to the render interrupt from rcs

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107500
    Fixes: c5498089463b ("drm/i915: Mask everything in ring HWSTAM on gen6+ in ringbuffer mode")
    References: d420a50c21ef ("drm/i915: Clean up the HWSTAM mess")
    Testcase: igt/gem_eio/reset-stress
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

This issue is resolved/fixed. Last seen 1 month 3 weeks ago, until then this failure appears in every round.
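For readers who want to see the shape of the workaround in the last commit message above, here is a minimal sketch. It is not the driver's code. The names `engine_id`, `ENGINE_RCS`, `HWSTAM_RENDER_INTERRUPT` and `hwstam_value` are hypothetical and exist only for this illustration. It also assumes that a set bit in HWSTAM masks the corresponding source, so clearing bit 0 unmasks the render interrupt. Following the v2 note, only the render engine (rcs) gets the unmasked bit.

```c
#include <stdint.h>

/* Hypothetical engine identifiers for this sketch; not the kernel's own enum. */
enum engine_id { ENGINE_RCS, ENGINE_BCS, ENGINE_VCS };

/* Per the commit message, bit 0 in HWSTAM corresponds to the render interrupt. */
#define HWSTAM_RENDER_INTERRUPT (UINT32_C(1) << 0)

/*
 * Compute the value that would be programmed into an engine's HWSTAM.
 * Assumption: a set bit masks the source, a cleared bit unmasks it.
 * Only the render engine keeps bit 0 unmasked (the "v2" limitation).
 */
static inline uint32_t hwstam_value(enum engine_id id, uint32_t mask)
{
    if (id == ENGINE_RCS)
        mask &= ~HWSTAM_RENDER_INTERRUPT;
    return mask;
}
```

With a "mask everything" request of `0xffffffff`, this helper would yield `0xfffffffe` for the render engine and leave the other engines fully masked.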