Created attachment 92828 [details]
dmesg

System Environment:
--------------------------
Platform: Sandybridge
Kernel: (drm-intel-nightly) f27f16540be56813df2ebb8e1106dd5c258f07c3

Bug detailed description:
-------------------------
The test causes a system hang on Sandybridge with the -nightly, -queued and -fixes kernels.
Bisect shows that igt commit c05c88c2b641aaab83608fb2c8e816893690c1fe is the first bad commit.

commit c05c88c2b641aaab83608fb2c8e816893690c1fe
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
AuthorDate: Tue Jan 21 17:40:08 2014 +0200
Commit: Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Wed Jan 22 09:45:27 2014 +0100

    tests/gem_reset_stats: stop only one ring when submitting hang

    If we stop all the rings, we can end up blaming the innocent rings on hangcheck.

    Reference: https://bugs.freedesktop.org/show_bug.cgi?id=73652
    Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

output:
IGT-Version: 1.5-gb5109e6 (x86_64) (Linux: 3.13.0-rc8_drm-intel-nightly_f27f16_20140126+ x86_64)
Subtest close-pending-fork-render: SUCCESS
Test requirement not met in function gem_require_ring, file ./../lib/drmtest.h:311:
Last errno: 11, Resource temporarily unavailable
Test requirement: (!(gem_has_vebox(fd)))
Subtest reset-stats-vebox: SKIP
Subtest reset-stats-ctx-vebox: SKIP
Subtest ban-vebox: SKIP
Subtest ban-ctx-vebox: SKIP
Subtest reset-count-vebox: SKIP
Subtest reset-count-ctx-vebox: SKIP
Subtest unrelated-ctx-vebox: SKIP
Subtest close-pending-vebox: SKIP
Subtest close-pending-ctx-vebox: SKIP
Subtest close-pending-fork-vebox: SKIP

Reproduce steps:
-------------------------
1. ./gem_reset_stats --run-subtest close-pending-fork-render
Can you please repaste the error message after applying:

commit 48ad03ca0c5f078b8d12a64323fd93b3858041af
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Jan 31 16:56:01 2014 +0000

    lib: Capture errno on entry

    When printing the errno, it is important that we capture the user errno
    before we make any library calls - as they may alter the value.

    References: https://bugs.freedesktop.org/show_bug.cgi?id=74007
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
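The pattern that commit message describes, saving errno before any further library call can overwrite it, can be sketched as below. This is a minimal illustration, not the igt code: `report_failure` and its message format are hypothetical, and the assignment to `errno` merely stands in for a library call that clobbers it.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical reporter (not an igt function): formats a failure message
 * using the errno value the caller had on entry.
 */
static int report_failure(char *buf, size_t len, const char *what)
{
	/* Capture errno first, before any call that may change it. */
	int saved_errno = errno;

	/* Stand-in for a library call that fails internally and sets errno. */
	errno = ENOENT;

	snprintf(buf, len, "%s: Last errno: %d, %s",
		 what, saved_errno, strerror(saved_errno));

	/* Restore the caller's value so later code sees it unchanged. */
	errno = saved_errno;
	return saved_errno;
}
```

Without the early copy, the message would report the clobbered value (ENOENT in this sketch) rather than the error the caller actually hit.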
Please retest with the latest igt and drm-intel-nightly.
Created attachment 93589 [details]
new dmesg

It still causes a system hang with the latest igt and -nightly kernel.

output:
IGT-Version: 1.5-g0269d1d (x86_64) (Linux: 3.13.0_drm-intel-nightly_8f5284_20140207+ x86_64)
Subtest close-pending-fork-render: SUCCESS
Test requirement not met in function gem_require_ring, file ./../lib/drmtest.h:312:
Last errno: 11, Resource temporarily unavailable
Test requirement: (!(gem_has_vebox(fd)))
Subtest reset-stats-vebox: SKIP
Subtest reset-stats-ctx-vebox: SKIP
Subtest ban-vebox: SKIP
Subtest ban-ctx-vebox: SKIP
Subtest reset-count-vebox: SKIP
Subtest reset-count-ctx-vebox: SKIP
Subtest unrelated-ctx-vebox: SKIP
Subtest close-pending-vebox: SKIP
Subtest close-pending-ctx-vebox: SKIP
Subtest close-pending-fork-vebox: SKIP
Ok, I've cleaned up the superfluous SKIP behaviour and some other stuff in the testcase, and I can repro the issue here. Looking at netconsole, the system seems to go down with a failed gpu reset:

[ 63.823960] [drm] stuck on render ring
[ 63.824130] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 63.824847] [drm] GPU HANG [e77fffff]
[ 63.824893] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 63.824979] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 63.825057] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 63.825138] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 63.825978] [drm] Simulated gpu hang, resetting stop_rings
[ 65.805203] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[ 71.812051] [drm] stuck on blitter ring
[ 72.316616] [drm:i915_reset] *ERROR* Failed to reset chip: -110

That shouldn't really happen ...
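The -110 in the last log line is a negated Linux errno value; on x86 Linux, 110 is ETIMEDOUT, i.e. the reset timed out. A small user-space sketch for decoding such kernel-style return codes follows; the helper name `kernel_ret_to_text` is hypothetical and not part of any library.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: translate a kernel-style return code
 * (0 on success, -errno on failure) into a human-readable string.
 */
static const char *kernel_ret_to_text(int ret)
{
	/* Negative values carry the errno with its sign flipped. */
	return ret < 0 ? strerror(-ret) : "success";
}
```

Calling `kernel_ret_to_text(-ETIMEDOUT)` yields the same text as `strerror(ETIMEDOUT)`.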
The two render ring hangs are proper hangs injected by the test, but the third one, on the blitter ring, is unexpected. Recovery from this hang fails because attempting to reset the GPU hard-hangs the whole machine. The SNB I tested with hangs immediately on setting the reset bit. As hang recovery works properly with other tests, I will decrease the importance.
With this patch here

http://patchwork.freedesktop.org/patch/21673/

I can convert the hard system hang into a failed gpu hang. Can you please test this patch and confirm that it improves the situation? gem_reset_stats should still fail, but the machine will at least survive.
(In reply to comment #6)
> With this patch here
>
> http://patchwork.freedesktop.org/patch/21673/
>
> I can convert the hard system hang in a failed gpu hang. Can you please test
> this patch and confirm that it improves the situation? gem_reset_stat should
> still fail, but the machine will survive at least.

Patch fail.
On Mon, Mar 10, 2014 at 7:03 AM, <bugzilla-daemon@freedesktop.org> wrote:
> Patch fail.

Please provide more details about the nature of the failure; this information is next to useless.
patching file drivers/gpu/drm/i915/intel_uncore.c
Hunk #1 FAILED at 989.
1 out of 1 hunk FAILED -- saving rejects to file drivers/gpu/drm/i915/intel_uncore.c.rej

drivers/gpu/drm/i915/intel_uncore.c :
    if (fw_engine)
        dev_priv->uncore.funcs.force_wake_get(dev_priv, fw_engine);
    if (IS_GEN6(dev) || IS_GEN7(dev))
        dev_priv->uncore.fifo_count =
            __raw_i915_read32(dev_priv, GTFIFOCTL) &
            GT_FIFO_FREE_ENTRIES_MASK;
    spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
    return ret;
}
int intel_gpu_reset(struct drm_device *dev)

patch:
@@ -989,9 +989,11 @@ static int gen6_do_reset(struct drm_device *dev)
     if (fw_engine)
         dev_priv->uncore.funcs.force_wake_get(dev_priv, fw_engine);
-    if (IS_GEN6(dev) || IS_GEN7(dev))
-        WARN_ON((__raw_i915_read32(dev_priv, GTFIFOCTL) &
-            GT_FIFO_FREE_ENTRIES_MASK) != 0);
+    if (IS_GEN6(dev) || IS_GEN7(dev)) {
+        if (WARN_ON((__raw_i915_read32(dev_priv, GTFIFOCTL) &
+            GT_FIFO_FREE_ENTRIES_MASK) != 0))
+            ret = -EIO;
+    }
     dev_priv->uncore.fifo_count = 0;
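The intent of the rejected hunk, turning a warning-only check into one that also fails the reset, can be modelled in user space as below. This is a sketch under assumptions: the register value is passed in as a plain integer instead of being read with `__raw_i915_read32`, `check_fifo_after_reset` is a hypothetical name, and the mask value is a placeholder rather than the driver's definition.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Placeholder mask for this sketch; not taken from the i915 headers. */
#define FIFO_FREE_ENTRIES_MASK 0x7fu

/*
 * Hypothetical model of the patched check: 'gtfifoctl' is the value read
 * back from the FIFO control register after the reset, 'ret' is the
 * result of the reset so far.
 */
static int check_fifo_after_reset(uint32_t gtfifoctl, int ret)
{
	if ((gtfifoctl & FIFO_FREE_ENTRIES_MASK) != 0) {
		/* The original code only warned here; the patch also fails. */
		fprintf(stderr, "warning: unexpected FIFO state 0x%x after reset\n",
			(unsigned int)(gtfifoctl & FIFO_FREE_ENTRIES_MASK));
		return -EIO;
	}

	/* Register looks as expected: pass the earlier result through. */
	return ret;
}
```

With the pre-patch behaviour the function would return `ret` in both branches, so a caller could not tell that the register check had tripped.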
My apologies for not updating this bug; I've already ripped out the offending code. The system hang is still there though. Re-pinging Mika to look into just disabling this specific subtest on snb.
Created attachment 95821 [details] [review]
hang preventer hack

I think I'm onto something here. Please test the attached patch and check whether the hangs disappear. Note that there might be some additional test failures with this (since it rips out a bit of code); the important part is whether the snb still hangs or not.
Created attachment 95822 [details] [review]
fix up semaphore hangcheck code

Now also a real patch. Please test both this patch and the earlier quick hack, thanks.
Created attachment 95913 [details]
dmesg

(In reply to comment #12)
> Created attachment 95822 [details] [review]
> fix up semaphore hangcheck code
>
> Now also a real patch. Please test both this patch and the earlier quick
> hack, thanks.

I tested this patch; the hang still occurs.
Mika, can you please create a patch to just skip the offending tests on snb?
Please test the below patch: http://patchwork.freedesktop.org/patch/25173/
The hang goes away on the latest -nightly and -fixes kernels. Closing.

output:
IGT-Version: 1.6-gc1404e0 (x86_64) (Linux: 3.15.0-rc2_drm-intel-fixes_7f1950_20140429+ x86_64)
Subtest close-pending-fork-render: SUCCESS
Test requirement not met in function gem_require_ring, file ioctl_wrappers.c:820:
Last errno: 0, Success
Test requirement: (!(gem_has_vebox(fd)))
Verified. Fixed.
As I failed to trigger the bug without Daniel's patch, I proceeded to bisect for the significant commit. It turned out to be:

commit 5582e8c3c49150c0e7398688b5ed167d6c3d44fd
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Apr 9 09:19:41 2014 +0100

    drm/i915: Preserve ring buffers objects across resume
Closing verified+fixed.