Bug 75145 - [SNB Regression]igt/kms_flip/flip-vs-modeset-vs-hang doesn't exit testing
Summary: [SNB Regression]igt/kms_flip/flip-vs-modeset-vs-hang doesn't exit testing
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: All Linux (All)
Importance: medium major
Assignee: Todd Previte
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-02-18 08:32 UTC by lu hua
Modified: 2017-10-06 14:39 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:


Attachments
dmesg (60.77 KB, text/plain)
2014-02-18 08:32 UTC, lu hua
[PATCH] drm/i915: Don't ban default context when stop_rings!=0 (1.48 KB, patch)
2014-03-04 19:03 UTC, Ville Syrjala
dmesg (126.54 KB, text/plain)
2014-03-07 03:36 UTC, lu hua

Description lu hua 2014-02-18 08:32:25 UTC
Created attachment 94270 [details]
dmesg

System Environment:
--------------------------
Platform: Sandybridge
kernel:   (drm-intel-nightly)1be8f2b4dd6d3db00af24d4891c82d2650bd282d

Bug detailed description:
---------------------------
Run ./kms_flip --run-subtest flip-vs-modeset-vs-hang. The test never exits.
It happens on Sandybridge with both the -queued and -nightly kernels.

The latest known good commit: c461562e84d180fb691af57f93a42bd9cc7eb69c
The latest known bad commit:  4c0e552882114d1edb588242d45035246ab078a0

output:
IGT-Version: 1.5-g9597836 (i686) (Linux: 3.13.0_drm-intel-nightly_1be8f2_20140218+ i686)
Using monotonic timestamps
Beginning flip-vs-modeset-vs-hang on crtc 3, connector 7
  1280x1024 60 1280 1328 1440 1688 1024 1025 1028 1066 0x5 0x48 108000
...Test assertion failure function exec_nop, file kms_flip.c:693:
Last errno: 5, Input/output error
Failed assertion: drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf) == 0
Subtest flip-vs-modeset-vs-hang: FAIL
Test assertion failure function gem_quiescent_gpu, file drmtest.c:156:
Last errno: 5, Input/output error
Failed assertion: drmIoctl((fd), ((((1U) << (((0+8)+8)+14)) | ((('d')) << (0+8)) | (((0x40 + 0x29)) << 0) | ((((sizeof(struct drm_i915_gem_execbuffer2)))) << ((0+8)+8)))), (&execbuf)) == 0

Reproduce steps:
---------------------------- 
1. ./kms_flip --run-subtest flip-vs-modeset-vs-hang
Comment 1 lu hua 2014-02-18 08:35:14 UTC
The following cases also have this issue:
igt/kms_flip/flip-vs-modeset-vs-hang-interruptible
igt/kms_flip/flip-vs-panning-vs-hang
igt/kms_flip/flip-vs-panning-vs-hang-interruptible
igt/kms_flip/rcs-wf_vblank-vs-dpms-interruptible
igt/kms_flip/plain-flip-ts-check
Comment 2 Daniel Vetter 2014-03-03 14:40:58 UTC
This sounds like either the gpu reset failed (and the gpu is gone for good) or we have a spurious -EIO somewhere.

Do other hang tests (anything which contains "hang" somewhere in the test/subtest name) also fail like this on snb?

Mika and I are aware that gpu reset seems to be a bit busted currently on snb :(
Comment 3 Ville Syrjala 2014-03-03 15:41:08 UTC
This is probably just the same old default context ban issue. There's a kernel patch on the list to prevent the ban, but I'm not sure if we can apply it since Mika tells me it's got something to do with dmesg spam handling in piglit. Another way to work around the problem is to add a ~10 second sleep between the iterations of these subtests. That will prevent the context ban, but it will slow down the test significantly.

The reason the test actually gets stuck while trying to terminate is that it does something signal-unsafe (printf is the likely culprit) from the signal handler, and then gets stuck in some glibc malloc futex.
Comment 4 lu hua 2014-03-04 07:33:23 UTC
The failure is intermittent: sometimes the test times out and sometimes it aborts. It failed to exit in 3 of 5 runs; the remaining runs aborted.
output:
IGT-Version: 1.5-g072d358 (i686) (Linux: 3.14.0-rc5_drm-intel-nightly_2bbdb4_20140304+ i686)
Using monotonic timestamps
Beginning flip-vs-panning-vs-hang on crtc 3, connector 7
  1280x1024 60 1280 1328 1440 1688 1024 1025 1028 1066 0x5 0x48 108000
...Test assertion failure function run_test_step, file kms_flip.c:933:
Last errno: 5, Input/output error
Failed assertion: hang
failed to exercise page flip hang recovery
Subtest flip-vs-panning-vs-hang: FAIL
Test assertion failure function gem_quiescent_gpu, file drmtest.c:156:
Last errno: 5, Input/output error
Failed assertion: drmIoctl((fd), ((((1U) << (((0+8)+8)+14)) | ((('d')) << (0+8)) | (((0x40 + 0x29)) << 0) | ((((sizeof(struct drm_i915_gem_execbuffer2)))) << ((0+8)+8)))), (&execbuf)) == 0
kms_flip: drmtest.c:1113: igt_fail: Assertion `!test_with_subtests || in_fixture' failed.
Aborted (core dumped)
Comment 5 Daniel Vetter 2014-03-04 18:39:49 UTC
Can anyone please dig out the default context ban prevention patch for Lu to test?
Comment 6 Ville Syrjala 2014-03-04 19:03:55 UTC
Created attachment 95112 [details] [review]
[PATCH] drm/i915: Don't ban default context when stop_rings!=0

Here you go.
Comment 7 lu hua 2014-03-05 05:42:55 UTC
(In reply to comment #6)
> Created attachment 95112 [details] [review]
> [PATCH] drm/i915: Don't ban default context when stop_rings!=0
> 
> Here you go.

Fixed by this patch.
output:
IGT-Version: 1.5-g072d358 (i686) (Linux: 3.14.0-rc5_prts_558104_20140305 i686)
Using monotonic timestamps
Beginning flip-vs-modeset-vs-hang on crtc 3, connector 7
  1280x1024 60 1280 1328 1440 1688 1024 1025 1028 1066 0x5 0x48 108000
....
flip-vs-modeset-vs-hang on crtc 3, connector 7: PASSED

Beginning flip-vs-modeset-vs-hang on crtc 5, connector 7
  1280x1024 60 1280 1328 1440 1688 1024 1025 1028 1066 0x5 0x48 108000
....
flip-vs-modeset-vs-hang on crtc 5, connector 7: PASSED

Subtest flip-vs-modeset-vs-hang: SUCCESS
Comment 8 Daniel Vetter 2014-03-06 07:51:20 UTC
Fix merged into dinq:

commit ccc7bed05e27a654db1e9e248ce5fb291c12add1
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Fri Feb 21 16:26:47 2014 +0200

    drm/i915: Don't ban default context when stop_rings!=0
Comment 9 lu hua 2014-03-07 03:36:52 UTC
Created attachment 95278 [details]
dmesg

The output shows a pass, but a call trace appears in dmesg.
IGT-Version: 1.5-gcdf74b6 (i686) (Linux: 3.13.0_drm-intel-next-queued_eb162c_20140307+ i686)
Using monotonic timestamps
Beginning flip-vs-modeset-vs-hang on crtc 3, connector 7
  1280x1024 60 1280 1328 1440 1688 1024 1025 1028 1066 0x5 0x48 108000
....
flip-vs-modeset-vs-hang on crtc 3, connector 7: PASSED

Beginning flip-vs-modeset-vs-hang on crtc 5, connector 7
  1280x1024 60 1280 1328 1440 1688 1024 1025 1028 1066 0x5 0x48 108000
....
flip-vs-modeset-vs-hang on crtc 5, connector 7: PASSED

Subtest flip-vs-modeset-vs-hang: SUCCESS

[   28.739294] ------------[ cut here ]------------
[   28.739309] WARNING: CPU: 0 PID: 736 at drivers/gpu/drm/i915/intel_uncore.c:994 intel_gpu_reset+0x125/0x426 [i915]()
[   28.739310] Modules linked in: dm_mod snd_hda_codec_hdmi snd_hda_codec_realtek dcdbas pcspkr serio_raw i2c_i801 iTCO_wdt iTCO_vendor_support snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc lpc_ich snd_timer mfd_core snd soundcore acpi_cpufreq i915 video button drm_kms_helper drm
[   28.739322] CPU: 0 PID: 736 Comm: kworker/0:1 Not tainted 3.13.0_drm-intel-next-queued_eb162c_20140307+ #534
[   28.739323] Hardware name: Dell Inc. OptiPlex 990/0DXWW6, BIOS A02 02/26/2011
[   28.739329] Workqueue: events i915_error_work_func [i915]
[   28.739330]  00000007 c08910a9 00000000 c022b7a3 f81d0c41 f4d20000 f5145000 f4d20064
[   28.739333]  00000000 c022b7c3 00000009 00000000 f81d0c41 00000293 f5145000 f5145000
[   28.739336]  f5145014 f4d20000 f81808f5 0000000f f4d21aa8 f5145000 f5e11c80 00000000
[   28.739338] Call Trace:
[   28.739342]  [<c08910a9>] ? dump_stack+0x3e/0x4e
[   28.739345]  [<c022b7a3>] ? warn_slowpath_common+0x61/0x74
[   28.739352]  [<f81d0c41>] ? intel_gpu_reset+0x125/0x426 [i915]
[   28.739354]  [<c022b7c3>] ? warn_slowpath_null+0xd/0x10
[   28.739363]  [<f81d0c41>] ? intel_gpu_reset+0x125/0x426 [i915]
[   28.739372]  [<f81808f5>] ? i915_reset+0x3b/0x11e [i915]
[   28.739378]  [<f81855f1>] ? i915_error_work_func+0xa5/0xf5 [i915]
[   28.739382]  [<c023a32e>] ? process_one_work+0x16b/0x278
[   28.739384]  [<c023a800>] ? worker_thread+0x19b/0x27b
[   28.739385]  [<c023a665>] ? rescuer_thread+0x20d/0x20d
[   28.739387]  [<c023e396>] ? kthread+0xa1/0xa6
[   28.739390]  [<c0899c77>] ? ret_from_kernel_thread+0x1b/0x28
[   28.739392]  [<c023e2f5>] ? kthread_freezable_should_stop+0x3b/0x3b
[   28.739393] ---[ end trace ec54be492b20dfeb ]---
Comment 10 Elizabeth 2017-10-06 14:39:50 UTC
Closing old verified.

