Created attachment 114141: dmesg

==System Environment==
--------------------------
Regression: not sure. GPU hangs have always occurred on SKL (already reported as bug 89037).
Non-working platforms: SKL

==kernel==
--------------------------
drm-intel-nightly/c2f6e584b215dd0d7e5a8a02716a89e985366ec0
commit c2f6e584b215dd0d7e5a8a02716a89e985366ec0
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Fri Mar 6 18:30:27 2015 +0100

    drm-intel-nightly: 2015y-03m-06d-17h-29m-53s UTC integration manifest

==Bug detailed description==
-----------------------------
Running this case for 20 cycles causes a GPU hang.

root@x-skly05:/home/lh# dmesg -r|egrep "<[1-4]>"|grep drm
<3>[ 46.724398] [drm:i915_hangcheck_elapsed [i915]] *ERROR* Hangcheck timer elapsed... blitter ring idle
<3>[ 48.589274] [drm:gen8_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun

Reproduce steps:
-------------------------
1. xinit
2. bin/copyteximage 1D_ARRAY -samples=2 -auto
Created attachment 114142: output
No error state was collected in /sys/kernel/debug/dri/0/i915_error_state.
Neil reports that hangcheck is triggering "from time to time", not just on this piglit test. No better characterization yet.
This hangcheck complaint is due to a race between a request being added to request->list and hangcheck inspecting that list. It is not a real hang, and it may not even be a missed interrupt. Most likely, hangcheck sees the ring as idle even though it is not, because of that race.
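To make the suspected window concrete, here is a deliberately simplified, single-threaded C model of it. Everything in it (model_ring, submit_emit_commands, submit_add_to_list, idle_by_list, idle_by_hw) is hypothetical and does not correspond to real i915 structures or functions; the assumption is only that submission makes the ring non-idle before the request appears on the list, so a check that trusts the list alone can report "idle" too early.

```c
#include <assert.h>   /* used by the usage checks below */
#include <stdbool.h>

/* Hypothetical model of one ring; NOT the real i915 data structures. */
struct model_ring {
    unsigned hw_head;   /* position the GPU has consumed up to */
    unsigned hw_tail;   /* position software has submitted up to */
    unsigned list_len;  /* number of entries on the software request list */
};

/* Submission modelled as two separate steps, which is where the window is:
 * the commands reach the ring before the request is put on the list. */
static void submit_emit_commands(struct model_ring *r) { r->hw_tail += 1; }
static void submit_add_to_list(struct model_ring *r)   { r->list_len += 1; }

/* A hangcheck-style idleness test that trusts only the request list. */
static bool idle_by_list(const struct model_ring *r) { return r->list_len == 0; }

/* What the (modelled) hardware state actually says. */
static bool idle_by_hw(const struct model_ring *r) { return r->hw_head == r->hw_tail; }

/* Usage:
 *   struct model_ring r = {0, 0, 0};
 *   submit_emit_commands(&r);
 *   assert(idle_by_list(&r) && !idle_by_hw(&r));   // sampled inside the window
 */
```

Sampling between the two submission steps is exactly the case where the list says "idle" while the ring still has work, which matches the "blitter ring idle" wording of the hangcheck message without implying a genuine hang.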
Please try kernel v4.4.
Known to be still broken in -nightly.
commit 7c17d377374ddbcfb7873366559fc4ed8b296e11
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jan 20 15:43:35 2016 +0200

    drm/i915: Use ordered seqno write interrupt generation on gen8+ execlists

    Broadwell and later currently use the same unordered command sequence to
    update the seqno in the HWS status page and then assert the user
    interrupt. We should apply the w/a from legacy (where we do an mmio read
    to delay the seqno read after the interrupt), but this is not enough to
    enforce coherent seqno visibilty on Skylake. Rather than search for the
    proper post-interrupt seqno barrier, use a strongly ordered command
    sequence to write the seqno, then assert the user interrupt from the ring.

    v2: Move around the wa tail dwords to avoid adding duplicate code.
    v3: Add references, comments on workarounds and bit5 check.

    References: https://bugs.freedesktop.org/show_bug.cgi?id=93693
    Testcase: igt/gem_ring_sync_loop #skl
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
    Link: http://patchwork.freedesktop.org/patch/msgid/1453297415-17793-1-git-send-email-mika.kuoppala@intel.com
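The ordering property this commit is after can be illustrated with a CPU-side C11 analogy. This is not the i915 fix, which changes the GPU command sequence; the names status_page_seqno and irq_raised are hypothetical stand-ins for the seqno in the HWS page and the user interrupt. The producer writes the seqno and then raises the "interrupt" with a release store; the consumer waits for the "interrupt" with an acquire load, so once it observes the interrupt, its seqno read is guaranteed to see the new value.

```c
#include <assert.h>   /* used by the usage checks below */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: NOT real i915 symbols. */
static _Atomic unsigned status_page_seqno; /* plays the role of the HWS seqno */
static atomic_bool irq_raised;             /* plays the role of the user interrupt */

/* Producer: write the seqno first, then raise the "interrupt" with release
 * ordering, so the seqno write cannot become visible after the interrupt. */
static void *producer(void *arg)
{
    unsigned seqno = *(const unsigned *)arg;
    atomic_store_explicit(&status_page_seqno, seqno, memory_order_relaxed);
    atomic_store_explicit(&irq_raised, true, memory_order_release);
    return NULL;
}

/* Consumer: wait for the "interrupt", then read the seqno. The acquire load
 * pairs with the release store above, so the seqno read cannot be stale. */
static unsigned wait_irq_then_read_seqno(void)
{
    while (!atomic_load_explicit(&irq_raised, memory_order_acquire))
        ; /* spin until the "interrupt" is observed */
    return atomic_load_explicit(&status_page_seqno, memory_order_relaxed);
}

/* Runs one producer thread and returns the seqno the consumer observed
 * (0 if the thread could not be created). */
static unsigned run_once(unsigned seqno)
{
    pthread_t t;
    atomic_store(&status_page_seqno, 0);
    atomic_store(&irq_raised, false);
    if (pthread_create(&t, NULL, producer, &seqno) != 0)
        return 0;
    unsigned seen = wait_irq_then_read_seqno();
    pthread_join(t, NULL);
    return seen;
}

/* Usage: assert(run_once(42) == 42); */
```

If the flag store were relaxed (unordered), the C memory model would allow the consumer to see the "interrupt" and still read a stale seqno, which is the kind of incoherent seqno visibility the commit message describes on Skylake.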