System Environment:
--------------------------
Platform: Broadwell
Kernel: (drm-intel-nightly) 639e4d5e3b595cdf3088e23092afd49532addf56

Bug detailed description:
-----------------------------
It causes a system hang on Broadwell with the -nightly kernel.

output:
Using monotonic timestamps
Beginning flip-vs-modeset-vs-hang-interruptible on crtc 3, connector 15
  1920x1200 60 1920 1968 2000 2080 1200 1203 1209 1235 0x9 0x48 154000
.

(BTW, I didn't get any dmesg. We use a USB network card, which doesn't support netconsole. I will try to use a serial port later.)

Steps:
---------------------------
./kms_flip --run-subtest flip-vs-modeset-vs-hang-interruptible
Created attachment 90689 [details] dmesg
This is a regression: tested on commit 798183c5, the system doesn't hang.
The latest known good commit: 798183c54799fbe1e5a5bfabb3a8c0505ffd2149
The latest known bad commit: c461562e84d180fb691af57f93a42bd9cc7eb69c
Is it possible to bisect to get the offending commit?
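A bisect between the two commits from the previous comment could be started roughly as sketched below. The helper name `bisect_plan` is made up for illustration, it only builds the command lines rather than running them, and it assumes the good and bad commits reported above actually hold.

```python
import shlex

# Commits taken from the previous comment.
GOOD = "798183c54799fbe1e5a5bfabb3a8c0505ffd2149"
BAD = "c461562e84d180fb691af57f93a42bd9cc7eb69c"


def bisect_plan(good, bad):
    """Return the git commands that start a bisect between good and bad.

    Hypothetical helper: it only builds the command lines, it does not run them.
    """
    return [
        ["git", "bisect", "start", bad, good],  # bad commit first, then good
        ["git", "bisect", "log"],               # show what has been recorded so far
    ]


# Print the commands so they can be pasted into a shell in the kernel tree.
for argv in bisect_plan(GOOD, BAD):
    print(shlex.join(argv))
```

After each step, the kernel would be built and booted, the subtest run, and the result recorded with `git bisect good` or `git bisect bad`.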
> (BTW, I didn't get any dmesg. We use USB network card, it doesn't support
> netconsole.I will try to use serial port later.)

Onboard ethernet should work just fine. Do you require USB ethernet?
(In reply to comment #2)
> It's regression, Test on commit 798183c5, the system doesn't hang.
> The latest known good commit: 798183c54799fbe1e5a5bfabb3a8c0505ffd2149
> The latest known bad commit: c461562e84d180fb691af57f93a42bd9cc7eb69c

Retested on commit 798183c5: it also hangs the system, at random. The latest -queued branch hangs at random as well. It happens in 4 of 5 runs.
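Since the hang is intermittent, a single clean run does not prove a commit good. As a rough estimate, assuming the runs are independent and the 4-in-5 rate above holds (both are assumptions, and `clean_runs_needed` is a name made up here), the number of clean runs needed before trusting a commit can be worked out like this:

```python
import math


def clean_runs_needed(hang_rate, max_false_good):
    """Smallest n such that (1 - hang_rate) ** n <= max_false_good.

    Assumes every run hangs independently with probability hang_rate.
    """
    if not 0.0 < hang_rate < 1.0:
        raise ValueError("hang_rate must be strictly between 0 and 1")
    if not 0.0 < max_false_good < 1.0:
        raise ValueError("max_false_good must be strictly between 0 and 1")
    pass_rate = 1.0 - hang_rate
    return math.ceil(math.log(max_false_good) / math.log(pass_rate))


# With a 4-in-5 hang rate, accept at most a 1% chance of a false "good".
print(clean_runs_needed(0.8, 0.01))
```

With these numbers three clean runs are enough, because 0.2 ** 3 = 0.008.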
Can we please get the netconsole output using the onboard ethernet?
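For reference, the netconsole module parameter has the form `src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac`. The sketch below only builds that string. The helper name, the interface name and every address are placeholders, not values from this setup, and the default ports are the commonly documented ones.

```python
def netconsole_param(src_ip, dev, tgt_ip, tgt_mac="", src_port=6665, tgt_port=6666):
    """Build a netconsole= module parameter string.

    Hypothetical helper; all arguments are supplied by the caller.
    An empty tgt_mac leaves the target MAC address unspecified.
    """
    return f"netconsole={src_port}@{src_ip}/{dev},{tgt_port}@{tgt_ip}/{tgt_mac}"


# Placeholder addresses from the documentation ranges, not real hosts.
print("modprobe netconsole " + netconsole_param(
    "192.0.2.10", "eth0", "192.0.2.1", "00:00:5e:00:53:01"))
```

The receiving machine then needs a UDP listener on the target port to capture the log.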
Created attachment 92250 [details] dmesg(netconsole)
The kms_flip subtest flip-vs-panning-vs-hang-interruptible also causes a system hang.
Assigning to Mika. Also, there is a rumor that this patch series might fix it: http://lists.freedesktop.org/archives/intel-gfx/2014-January/038953.html
(In reply to comment #9)
> Assigning to Mika. Also, there is a rumor that this patch series might fix
> it:
> http://lists.freedesktop.org/archives/intel-gfx/2014-January/038953.html

Tested this patch series. The hang still happens.
Created attachment 94048 [details] dmesg with patch
Created attachment 94292 [details] [review] drm/i915: fix forcewake counts for gen8
(In reply to comment #12)
> Created attachment 94292 [details] [review] [review]
> drm/i915: fix forcewake counts for gen8

Could you please test with the attached patch to see if it makes a difference?
Created attachment 94296 [details] [review] drm/i915: Fix forcewake counts for gen8
(In reply to comment #14)
> Created attachment 94296 [details] [review] [review]
> drm/i915: Fix forcewake counts for gen8

Tested this patch. The test core dumps, but it no longer hangs the system.

output:
Using monotonic timestamps
Beginning flip-vs-modeset-vs-hang-interruptible on crtc 3, connector 10
  1920x1080 60 1920 1966 1996 2080 1080 1082 1086 1112 0xa 0x48 138780
..Test assertion failure function exec_nop, file kms_flip.c:693:
Last errno: 5, Input/output error
Failed assertion: drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf) == 0
Subtest flip-vs-modeset-vs-hang-interruptible: FAIL
Test assertion failure function gem_quiescent_gpu, file drmtest.c:156:
Last errno: 5, Input/output error
Failed assertion: drmIoctl((fd), ((((1U) << (((0+8)+8)+14)) | ((('d')) << (0+8)) | (((0x40 + 0x29)) << 0) | ((((sizeof(struct drm_i915_gem_execbuffer2)))) << ((0+8)+8)))), (&execbuf)) == 0
kms_flip: drmtest.c:1113: igt_fail: Assertion `!test_with_subtests || in_fixture' failed.
Aborted (core dumped)
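The long expression in the second failed assertion is just the expanded ioctl request macro, so both assertions fail on the same execbuffer2 ioctl with EIO. The sketch below recomputes the request number from the shifts shown in the output. The 64-byte struct size is an assumption for x86_64, and the names are chosen here for illustration.

```python
import errno
import os

# Shifts exactly as they appear in the expanded macro above.
IOC_NRSHIFT = 0
IOC_TYPESHIFT = IOC_NRSHIFT + 8     # (0+8)
IOC_SIZESHIFT = IOC_TYPESHIFT + 8   # ((0+8)+8)
IOC_DIRSHIFT = IOC_SIZESHIFT + 14   # (((0+8)+8)+14)
IOC_WRITE = 1                       # the (1U) in the expansion

# Assumption: sizeof(struct drm_i915_gem_execbuffer2) is 64 bytes on x86_64.
EXECBUFFER2_SIZE = 64


def ioc(direction, type_char, nr, size):
    """Pack direction, type, number and size into one ioctl request value."""
    return ((direction << IOC_DIRSHIFT)
            | (ord(type_char) << IOC_TYPESHIFT)
            | (nr << IOC_NRSHIFT)
            | (size << IOC_SIZESHIFT))


request = ioc(IOC_WRITE, "d", 0x40 + 0x29, EXECBUFFER2_SIZE)
print(hex(request))

# "Last errno: 5" is EIO; print its symbolic name and message.
print(errno.errorcode[5], os.strerror(5))
```

Under the size assumption the request value is 0x40406469, which is the same ioctl that the first assertion names as DRM_IOCTL_I915_GEM_EXECBUFFER2.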
Created attachment 94371 [details] [review] tests/kms_flip: stop only ring when hanging the gpu
Please retest with both attached patches applied.
Created attachment 94412 [details] dmesg(2 patches)

(In reply to comment #17)
> Please retest with both attached patches applied.

Tested with these 2 patches. It still core dumps.

output:
Using monotonic timestamps
Beginning flip-vs-modeset-vs-hang-interruptible on crtc 3, connector 10
  1920x1080 60 1920 1966 1996 2080 1080 1082 1086 1112 0xa 0x48 138780
...Test assertion failure function exec_nop, file kms_flip.c:693:
Last errno: 5, Input/output error
Failed assertion: drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf) == 0
Subtest flip-vs-modeset-vs-hang-interruptible: FAIL
Test assertion failure function gem_quiescent_gpu, file drmtest.c:156:
Last errno: 5, Input/output error
Failed assertion: drmIoctl((fd), ((((1U) << (((0+8)+8)+14)) | ((('d')) << (0+8)) | (((0x40 + 0x29)) << 0) | ((((sizeof(struct drm_i915_gem_execbuffer2)))) << ((0+8)+8)))), (&execbuf)) == 0
kms_flip: drmtest.c:1113: igt_fail: Assertion `!test_with_subtests || in_fixture' failed.
Aborted (core dumped)
Created attachment 94417 [details] [review] Fallback to mmio flips

Can you please try this patch to see if it prevents the EIO being reported whilst flipping?
I think one problem is that the kernel decides to ban the default context. The test doesn't expect that.
(In reply to comment #19)
> Created attachment 94417 [details] [review] [review]
> Fallback to mmio flips
>
> Can you please try this patch to see if prevents the EIO being reported
> whilst flipping?

Tested this patch. It also core dumps.

output:
IGT-Version: 1.5-g06189c6 (x86_64) (Linux: 3.13.0_nightly_72631patch_20140221+ x86_64)
Using monotonic timestamps
Beginning flip-vs-modeset-vs-hang-interruptible on crtc 3, connector 10
  1920x1080 60 1920 1966 1996 2080 1080 1082 1086 1112 0xa 0x48 138780
...Test assertion failure function exec_nop, file kms_flip.c:693:
Last errno: 5, Input/output error
Failed assertion: drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf) == 0
Subtest flip-vs-modeset-vs-hang-interruptible: FAIL
Test assertion failure function gem_quiescent_gpu, file drmtest.c:156:
Last errno: 5, Input/output error
Failed assertion: drmIoctl((fd), ((((1U) << (((0+8)+8)+14)) | ((('d')) << (0+8)) | (((0x40 + 0x29)) << 0) | ((((sizeof(struct drm_i915_gem_execbuffer2)))) << ((0+8)+8)))), (&execbuf)) == 0
kms_flip: drmtest.c:1113: igt_fail: Assertion `!test_with_subtests || in_fixture' failed.
Aborted (core dumped)
Yeah, as Ville pointed out, the core dump comes first from exec_nop() failing, not from the pageflip failing (which would be next).
This should stop that particular assertion failure:

commit 3db29744f74017a99d1b430b30623dce405ebb1a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Feb 21 09:38:43 2014 +0000

    kms_flip: Try to make hang_gpu() robust against hanging the GPU

    On a bad day, hanging the GPU may be terminal. Yet even if the GPU is
    terminally wedged we expect modesetting (and pageflips) to continue.
    That deserves to be a dedicated test, but in the meantime we should
    strive to avoid falling over just because the code is not resilient.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

but there are probably a couple of other spots that require help before we can run the testing against a terminally wedged GPU.
I pushed a couple of other things: one patch to prevent the kernel from getting stuck waiting for page flips, and another to fail the hang tests when they didn't actually test anything.

commit 5f190f2d674222b27eff9f80d14761fde2e8fe7a
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Fri Feb 21 16:08:28 2014 +0200

    kms_flip: Fail the subtest if page flip hang recovery wasn't actually tested

    Context banning can prevent the page flip hang tests from actaully
    testing anything, so make the relevant subtests fail in that case.

    Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

commit 48ba2cdf969698a2520193ec0c9cff99f89fe1f6
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Fri Feb 21 15:14:33 2014 +0200

    kms_flip: Restore rings to running state in unhang_gpu()

    If things go bad, make sure the rings aren't left in the stopped state.

    Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Ran on igt commit 8ebc02a54c22b7a83a34c923153861848183cd96. It still fails with a system hang.

output:
IGT-Version: 1.5-g8ebc02a (x86_64) (Linux: 3.13.0_drm-intel-nightly_1be8f2_20140
Using monotonic timestamps
Beginning flip-vs-modeset-vs-hang-interruptible on crtc 3, connector 10
  1920x1080 60 1920 1966 1996 2080 1080 1082 1086 1112 0xa 0x48 138780
.
Created attachment 94630 [details] dmesg(commit 8ebc02a)
Should be fixed with:

commit ccc7bed05e27a654db1e9e248ce5fb291c12add1
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Fri Feb 21 16:26:47 2014 +0200

    drm/i915: Don't ban default context when stop_rings!=0

Fallout handled in:
https://bugs.freedesktop.org/show_bug.cgi?id=75876
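Read literally, the commit subject suggests a rule along the lines of the toy model below. This is not the kernel code. The function name, its arguments and the "hung too fast" criterion are assumptions made for illustration, and only the stop_rings exception for the default context comes from the commit title.

```python
def should_ban_context(is_default_context, hung_too_fast, stop_rings):
    """Toy model of the banning decision; NOT the actual i915 implementation.

    hung_too_fast is an assumed stand-in for whatever criterion the
    kernel uses to decide that a context hangs the GPU too often.
    """
    if not hung_too_fast:
        return False
    if is_default_context and stop_rings != 0:
        # Assumed reading of the commit title: with stop_rings set, the
        # hang is being simulated, so keep the default context usable.
        return False
    return True


# Default context, simulated hang via stop_rings: not banned in this model.
print(should_ban_context(True, True, stop_rings=1))
# Default context, no stop_rings: still banned in this model.
print(should_ban_context(True, True, stop_rings=0))
```

In this model the hang tests can keep submitting work on the default context after a simulated hang, which is what the test expects according to the earlier comments.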
Verified. Fixed.
Closing old verified.