Bug 105702 - [CI] igt@kms_flip@* - fail - Failed assertion: (drmWaitVBlank(drm_fd, &wait)) == 0
Summary: [CI] igt@kms_flip@* - fail - Failed assertion: (drmWaitVBlank(drm_fd, &wait))...
Status: REOPENED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: DRI git
Hardware: Other All
: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-03-23 08:54 UTC by Marta Löfstedt
Modified: 2019-06-26 13:25 UTC (History)
2 users (show)

See Also:
i915 platform: ALL
i915 features: display/Other


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Marta Löfstedt 2018-03-23 08:54:52 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_6/fi-kbl-r/igt@kms_flip@plain-flip-ts-check-interruptible.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_6/fi-skl-6600u/igt@kms_flip@plain-flip-ts-check-interruptible.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_6/fi-kbl-r/igt@kms_flip@flip-vs-absolute-wf_vblank.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_6/fi-skl-6600u/igt@kms_flip@flip-vs-absolute-wf_vblank.html

(kms_flip:1637) CRITICAL: Test assertion failure function calibrate_ts, file ../tests/kms_flip.c:1126:
(kms_flip:1637) CRITICAL: Failed assertion: (drmWaitVBlank(drm_fd, &wait)) == 0
(kms_flip:1637) CRITICAL: Last errno: 16, Device or resource busy
Subtest plain-flip-ts-check-interruptible failed.
Comment 2 Martin Peres 2018-04-13 11:04:05 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_8/fi-ctg-p8600/igt@kms_flip@blocking-wf_vblank.html

(kms_flip:1268) CRITICAL: Test assertion failure function calibrate_ts, file ../tests/kms_flip.c:1126:
(kms_flip:1268) CRITICAL: Failed assertion: (drmWaitVBlank(drm_fd, &wait)) == 0
(kms_flip:1268) CRITICAL: Last errno: 16, Device or resource busy
Subtest blocking-wf_vblank failed.

We also got the following WARNING in dmesg:
[  300.104209] vblank wait timed out on crtc 1
Comment 3 Martin Peres 2018-04-13 11:46:02 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_8/fi-ctg-p8600/igt@kms_flip@basic-flip-vs-wf_vblank.html

(kms_flip:1554) CRITICAL: Test assertion failure function calibrate_ts, file ../tests/kms_flip.c:1126:
(kms_flip:1554) CRITICAL: Failed assertion: (drmWaitVBlank(drm_fd, &wait)) == 0
(kms_flip:1554) CRITICAL: Last errno: 16, Device or resource busy
Subtest basic-flip-vs-wf_vblank failed.
Comment 5 Martin Peres 2018-04-20 12:23:39 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_23/fi-ctg-p8600/igt@kms_flip@2x-flip-vs-wf_vblank.html

(kms_flip:1332) CRITICAL: Test assertion failure function calibrate_ts, file ../tests/kms_flip.c:1126:
(kms_flip:1332) CRITICAL: Failed assertion: (drmWaitVBlank(drm_fd, &wait)) == 0
(kms_flip:1332) CRITICAL: Last errno: 16, Device or resource busy
Subtest 2x-flip-vs-wf_vblank failed.

[  408.544218] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:39:pipe A] flip_done timed out
[  418.784228] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CONNECTOR:52:LVDS-1] flip_done timed out
[  429.024211] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [PLANE:28:primary A] flip_done timed out
[  429.128199] ------------[ cut here ]------------
[  429.128211] vblank wait timed out on crtc 0
[  429.128268] WARNING: CPU: 0 PID: 1332 at drivers/gpu/drm/drm_vblank.c:1084 drm_wait_one_vblank+0x175/0x180
[  429.128276] Modules linked in: vgem i915 coretemp snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core lpc_ich snd_pcm mei_me e1000e mei prime_numbers
[  429.128400] CPU: 0 PID: 1332 Comm: kms_flip Tainted: G     U            4.17.0-rc1-gc1645edc253f-drmtip_23+ #1
[  429.128409] Hardware name: LENOVO 74591P0/74591P0, BIOS 6DET28WW (1.05 ) 07/30/2008
[  429.128421] RIP: 0010:drm_wait_one_vblank+0x175/0x180
[  429.128429] RSP: 0018:ffffa4618075ba38 EFLAGS: 00010282
[  429.128446] RAX: 0000000000000000 RBX: ffff939725320000 RCX: 0000000000000001
[  429.128454] RDX: 0000000080000001 RSI: ffffffffb6084f36 RDI: 00000000ffffffff
[  429.128463] RBP: 0000000000000000 R08: 0000000033a4da69 R09: 0000000000000000
[  429.128471] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  429.128480] R13: 0000000000004a85 R14: ffff93971ed05d30 R15: ffff93972dc52e98
[  429.128490] FS:  00007ff258c87980(0000) GS:ffff93973bc00000(0000) knlGS:0000000000000000
[  429.128499] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  429.128508] CR2: 00007f77a19f0068 CR3: 000000012cdf2000 CR4: 00000000000406f0
[  429.128516] Call Trace:
[  429.128537]  ? wait_woken+0x90/0x90
[  429.128714]  intel_pre_plane_update+0x17c/0x220 [i915]
[  429.128883]  intel_atomic_commit_tail+0xce/0xd20 [i915]
[  429.129067]  intel_atomic_commit+0x240/0x320 [i915]
[  429.129087]  ? drm_mode_getcrtc+0x170/0x170
[  429.129102]  drm_atomic_helper_set_config+0x7b/0x90
[  429.129109]  __drm_mode_set_config_internal+0x62/0x120
[  429.129109]  drm_mode_setcrtc+0x41b/0x5c0
[  429.129109]  ? drm_mode_getcrtc+0x170/0x170
[  429.129109]  drm_ioctl_kernel+0x7c/0xf0
[  429.129109]  drm_ioctl+0x2e6/0x3a0
[  429.129109]  ? drm_mode_getcrtc+0x170/0x170
[  429.129109]  do_vfs_ioctl+0xa0/0x6c0
[  429.129109]  ? lock_acquire+0xa6/0x210
[  429.129109]  ksys_ioctl+0x35/0x60
[  429.129109]  __x64_sys_ioctl+0x11/0x20
[  429.129109]  do_syscall_64+0x55/0x190
[  429.129109]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  429.129109] RIP: 0033:0x7ff2579d85d7
[  429.129109] RSP: 002b:00007ffdb09af698 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  429.129109] RAX: ffffffffffffffda RBX: 00007ffdb09af6d0 RCX: 00007ff2579d85d7
[  429.129109] RDX: 00007ffdb09af6d0 RSI: 00000000c06864a2 RDI: 0000000000000003
[  429.129109] RBP: 00007ffdb09af6d0 R08: 0000000000000000 R09: 0000000000000000
[  429.129109] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c06864a2
[  429.129109] R13: 0000000000000003 R14: 00007ff2588856d1 R15: 00007ff258886c34
[  429.129109] Code: 2f ff ff ff e8 1d 71 a7 ff 48 89 e6 4c 89 f7 e8 c2 83 ac ff 45 85 e4 0f 85 0c ff ff ff 89 ee 48 c7 c7 38 fa 0c b6 e8 1b 6e a7 ff <0f> 0b e9 f7 fe ff ff 0f 1f 40 00 8b b7 f8 00 00 00 48 8b 3f e9 
[  429.129109] WARNING: CPU: 0 PID: 1332 at drivers/gpu/drm/drm_vblank.c:1084 drm_wait_one_vblank+0x175/0x180
[  429.129109] irq event stamp: 111740
[  429.129109] hardirqs last  enabled at (111739): [<ffffffffb50fbf06>] console_unlock+0x426/0x640
[  429.129109] hardirqs last disabled at (111740): [<ffffffffb5a0111c>] error_entry+0x7c/0x100
[  429.129109] softirqs last  enabled at (111456): [<ffffffffb5c0032b>] __do_softirq+0x32b/0x4e1
[  429.129109] softirqs last disabled at (111449): [<ffffffffb508f594>] irq_exit+0xa4/0xb0
[  429.129109] ---[ end trace 89784d36fb754821 ]---
Comment 6 Martin Peres 2018-06-19 22:37:53 UTC
Used to happen every roughly every 3 runs, not it has not happened since drmtip_48 (3 weeks, 5 days / 18 runs ago). Closing!
Comment 8 Lakshmi 2018-10-19 15:28:10 UTC
Update: Last seen drmtip_104 (1 month, 1 week / 590 runs ago).
The frequency of this bug occurrence is in between 1 week to 2 months.
Comment 9 CI Bug Log 2019-01-31 11:26:18 UTC
A CI Bug Log filter associated to this bug has been updated:

{- all machines: igt@kms_flip@* - Failed assertion: (drmWaitVBlank(drm_fd, &amp;wait)) == 0 -}
{+ all machines: igt@kms_flip@* - Failed assertion: (drmWaitVBlank(drm_fd, &amp;wait)) == 0 +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_192/fi-skl-6600u/igt@kms_flip@plain-flip-fb-recreate.html
Comment 10 CI Bug Log 2019-03-13 10:39:49 UTC
A CI Bug Log filter associated to this bug has been updated:

{- all machines: igt@kms_flip@* - Failed assertion: (drmWaitVBlank(drm_fd, &amp;wait)) == 0 -}
{+ all machines: igt@kms_flip@* - Failed assertion: (drmWaitVBlank(drm_fd, &amp;wait)) == 0 +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_240/fi-kbl-7560u/igt@kms_flip@2x-plain-flip-fb-recreate.html
Comment 11 Arek Hiler 2019-06-26 13:25:05 UTC
This is an old one that we haven't seen for 3.5 months. Prior to that the reproduction rate varied quite a bit. Shortly before disappearance it was seen about once every two weeks. This means we can close the bug after 20 weeks ~= 5 months, so around mid August 2019.

As of what happens, code explains it pretty well (from kms_flip.c/calibrate_ts()):

	memset(&wait, 0, sizeof(wait));
	wait.request.type = kmstest_get_vbl_flag(crtc_idx);
	wait.request.type |= DRM_VBLANK_RELATIVE | DRM_VBLANK_NEXTONMISS;
	do_or_die(drmWaitVBlank(drm_fd, &wait));

	last_seq = wait.reply.sequence;
	last_timestamp = wait.reply.tval_sec;
	last_timestamp *= 1000000;
	last_timestamp += wait.reply.tval_usec;

	memset(&wait, 0, sizeof(wait));
	wait.request.type = kmstest_get_vbl_flag(crtc_idx);
	wait.request.type |= DRM_VBLANK_ABSOLUTE | DRM_VBLANK_EVENT;
	wait.request.sequence = last_seq;
	for (n = 0; n < CALIBRATE_TS_STEPS; n++) {
		drmVBlank check = {};

		++wait.request.sequence;
		do_or_die(drmWaitVBlank(drm_fd, &wait));

		/* Double check that haven't already missed the vblank */
		check.request.type = kmstest_get_vbl_flag(crtc_idx);
		check.request.type |= DRM_VBLANK_RELATIVE;
		do_or_die(drmWaitVBlank(drm_fd, &check));

		igt_assert(!igt_vblank_after(check.reply.sequence, wait.request.sequence));
	}

So we are waiting for a begining of next vblank to get its seq number (NEXTONMISS), then for the next CALIBRATE_TS_STEPS we are waiting for the very next vblank (DRM_VBLANK_ABSOLUTE, ++wait.request.sequence) and double checking.

drmWaitVBlank(drm_fd, &wait) seems to be failing with -EBUSY:
    do {
       ret = ioctl(fd, DRM_IOCTL_WAIT_VBLANK, vbl);
       vbl->request.type &= ~DRM_VBLANK_RELATIVE;
       if (ret && errno == EINTR) {
           clock_gettime(CLOCK_MONOTONIC, &cur);
           /* Timeout after 1s */
           if (cur.tv_sec > timeout.tv_sec + 1 ||
               (cur.tv_sec == timeout.tv_sec && cur.tv_nsec >=
                timeout.tv_nsec)) {
                   errno = EBUSY;
                   ret = -1;
                   break;
           }
       }
    } while (ret && errno == EINTR);

out:
    return ret;


Seems like we are missing the vblank with given seq and then bailing out after 1s with EBUSY, which sounds rather serious and may point to a bug in the kernel, as this amount of code should not take to long even with unfavorable scheduling. Let's see whether it reappears, but hopefully it was "accidentally fixed."


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.