Bug 103160

Summary: [CI] gt@kms_flip@vblank-vs-[dpms-suspend-interruptible|modeset-suspend-interruptible|suspend-interruptible] - Softdog incomplete
Product: DRI Reporter: Marta Löfstedt <marta.lofstedt>
Component: DRM/IntelAssignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: critical    
Priority: high CC: imre.deak, intel-gfx-bugs
Version: DRI git   
Hardware: Other   
OS: All   
Whiteboard: ReadyForDev
i915 platform: BXT i915 features: display/Other

Description Marta Löfstedt 2017-10-09 10:46:08 UTC
There has been flip-flipping softdog incompletes for the APL-shards on the following tests:

igt@kms_flip@vblank-vs-dpms-suspend-interruptible
igt@kms_flip@vblank-vs-modeset-suspend-interruptible
igt@kms_flip@vblank-vs-suspend-interruptible

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3180/shard-apl8/igt@kms_flip@vblank-vs-dpms-suspend-interruptible.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3184/shard-apl1/igt@kms_flip@vblank-vs-modeset-suspend-interruptible.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3190/shard-apl3/igt@kms_flip@vblank-vs-suspend-interruptible.html
Comment 1 Marta Löfstedt 2017-10-09 10:47:00 UTC
Note this could be duplicate of BUG 102490.
Comment 2 Imre Deak 2017-10-11 14:24:13 UTC
AFAICS, in the tests the 500Hz interrupt frequency is too high for the main thread, so syscalls there will be interrupted and delayed indefinitely leading to watchdog timeout. To test this theory we should try something like:

diff --git a/lib/igt_aux.c b/lib/igt_aux.c
index fa6594c3..2f01c723 100644
--- a/lib/igt_aux.c
+++ b/lib/igt_aux.c
@@ -278,7 +278,7 @@ static void __attribute__((noreturn)) signal_helper_process(pid_t pid)
 {
        /* Interrupt the parent process at 500Hz, just to be annoying */
        while (1) {
-               usleep(1000 * 1000 / 500);
+               usleep(1000 * 1000 / 50);
                if (kill(pid, SIGCONT)) /* Parent has died, so must we. */
                        exit(0);
        }
Comment 3 Daniel Vetter 2017-10-11 14:51:13 UTC
The softdog fires when the kernel stops making progress. Constantly getting back to userspace is progress from the kernel's pov.

It would result in the test not completing, but that's not what's going on here.
Comment 4 Chris Wilson 2017-10-11 18:59:48 UTC
This is what I get:

ickle@broxton:~/intel-gpu-tools/tests$ sudo ./kms_flip --r vblank-vs-dpms-suspend-interruptible
IGT-Version: 1.20-gd8493ea7 (x86_64) (Linux: 4.14.0-rc4+ x86_64)
Using monotonic timestamps
Beginning vblank-vs-dpms-suspend-interruptible on pipe A, connector DP-2
  1920x1080 60 1920 2008 2052 2200 1080 1084 1089 1125 0x5 0x48 148500
.rtcwake: wakeup from "mem" using /dev/rtc0 at Wed Oct 11 18:58:44 2017

Message from syslogd@broxton at Oct 11 19:58:44 ...
 kernel:[24901.110247] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: a600000000020408

Message from syslogd@broxton at Oct 11 19:58:44 ...
 kernel:[24901.110251] mce: [Hardware Error]: TSC 0 ADDR fef13b00 

Message from syslogd@broxton at Oct 11 19:58:44 ...
 kernel:[24901.110267] mce: [Hardware Error]: PROCESSOR 0:506c9 TIME 1507748324 SOCKET 0 APIC 0 microcode 20
(kms_flip:31975) CRITICAL: Test assertion failure function run_test_step, file kms_flip.c:750:
(kms_flip:31975) CRITICAL: Failed assertion: (drmWaitVBlank(drm_fd, &vbl)) == 0
(kms_flip:31975) CRITICAL: Last errno: 22, Invalid argument
Stack trace:
  #0 [__igt_fail_assert+0xf1]
  #1 [run_test_on_crtc_set.constprop.14+0x1d36]
  #2 [run_test+0x272]
  #3 [main+0x492]
  #4 [__libc_start_main+0xf1]
  #5 [_start+0x2a]
  #6 [<unknown>+0x2a]
Subtest vblank-vs-dpms-suspend-interruptible failed.
**** DEBUG ****
(kms_flip:31975) DEBUG: Test requirement passed: (flags & TEST_HANG) == 0 || !is_wedged(drm_fd)
(kms_flip:31975) DEBUG: Test requirement passed: modes
(kms_flip:31975) INFO: Beginning vblank-vs-dpms-suspend-interruptible on pipe A, connector DP-2
(kms_flip:31975) igt-fb-DEBUG: igt_create_fb_with_bo_size(width=1920, height=1080, format=0x34325258, tiling=0x0, size=0)
(kms_flip:31975) igt-fb-DEBUG: igt_create_fb_with_bo_size(handle=2, pitch=7680)
(kms_flip:31975) igt-fb-DEBUG: igt_create_fb_with_bo_size(width=1920, height=1080, format=0x34325258, tiling=0x0, size=0)
(kms_flip:31975) igt-fb-DEBUG: igt_create_fb_with_bo_size(handle=3, pitch=7680)
(kms_flip:31975) igt-kms-INFO:   1920x1080 60 1920 2008 2052 2200 1080 1084 1089 1125 0x5 0x48 148500
(kms_flip:31975) igt-core-DEBUG: Test requirement passed: !igt_run_in_simulation()
(kms_flip:31975) igt-aux-DEBUG: Test requirement passed: (power_dir = open("/sys/power", O_RDONLY)) >= 0
(kms_flip:31975) igt-aux-DEBUG: Test requirement passed: get_supported_suspend_states(power_dir) & (1 << state)
(kms_flip:31975) igt-aux-DEBUG: Test requirement passed: test == SUSPEND_TEST_NONE || faccessat(power_dir, "pm_test", R_OK | W_OK, 0) == 0
(kms_flip:31975) igt-aux-DEBUG: Test requirement passed: ret == 0
(kms_flip:31975) DEBUG: name = vblank
last_ts = 0.000000
last_received_ts = 0.000000
last_seq = 0
current_ts = 24901.122355
current_received_ts = 24901.976559
current_seq = 11
count = 0
seq_step = 10
(kms_flip:31975) CRITICAL: Test assertion failure function run_test_step, file kms_flip.c:750:
(kms_flip:31975) CRITICAL: Failed assertion: (drmWaitVBlank(drm_fd, &vbl)) == 0
(kms_flip:31975) CRITICAL: Last errno: 22, Invalid argument
(kms_flip:31975) igt-core-INFO: Stack trace:
(kms_flip:31975) igt-core-INFO:   #0 [__igt_fail_assert+0xf1]
(kms_flip:31975) igt-core-INFO:   #1 [run_test_on_crtc_set.constprop.14+0x1d36]
(kms_flip:31975) igt-core-INFO:   #2 [run_test+0x272]
(kms_flip:31975) igt-core-INFO:   #3 [main+0x492]
(kms_flip:31975) igt-core-INFO:   #4 [__libc_start_main+0xf1]
(kms_flip:31975) igt-core-INFO:   #5 [_start+0x2a]
(kms_flip:31975) igt-core-INFO:   #6 [<unknown>+0x2a]
****  END  ****
Subtest vblank-vs-dpms-suspend-interruptible: FAIL (0.960s)
Comment 5 Chris Wilson 2017-10-11 19:02:27 UTC
Hmm, chvt broke after running that test:

[<ffffffff813513ac>] __vt_event_wait.isra.0.part.1+0x2c/0x60
[<ffffffff813515ac>] vt_waitactive+0x6c/0xc0
[<ffffffff813522c8>] vt_ioctl+0xb78/0x1220
[<ffffffff8134592b>] tty_ioctl+0xdb/0x850
[<ffffffff81193ddf>] do_vfs_ioctl+0x8f/0x5b0
[<ffffffff8119433c>] SyS_ioctl+0x3c/0x70
[<ffffffff81538660>] entry_SYSCALL_64_fastpath+0x13/0x94
Comment 6 Marta Löfstedt 2017-10-12 08:51:45 UTC
Here is another APL-shards softdog

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3210/shard-apl6/igt@gem_cpu_reloc@full.html
Comment 7 Jani Saarinen 2017-10-16 10:56:42 UTC
Commit in IGT:
author	Imre Deak <imre.deak@intel.com>	2017-10-13 11:30:09 (GMT)
committer	Imre Deak <imre.deak@intel.com>	2017-10-16 10:35:41 (GMT)
commit	2aeececacc18204f763ec946b35877afb1f8f5b5
Comment 8 Marta Löfstedt 2017-10-17 05:33:57 UTC
Imre patch was integrated in CI_DRM_3241 since then the incompletes on APL-shards on igt@kms_flip@vblank-vs-dpms-suspend-interruptible,
igt@kms_flip@vblank-vs-modeset-suspend-interruptible and
igt@kms_flip@vblank-vs-suspend-interruptible has not been reproduced.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.