Bug 110536

Summary: [CI][DRMTIP] igt@kms_flip@busy-flip - fail - Failed assertion: do_page_flip(o, new_fb_id, 0) == -16
Product: DRI Reporter: Lakshmi <lakshminarayana.vudum>
Component: DRM/IntelAssignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: NEW --- QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: minor    
Priority: low CC: intel-gfx-bugs, matthew.d.roper
Version: DRI git   
Hardware: Other   
OS: All   
Whiteboard: ReadyForDev
i915 platform: BSW/CHT, BYT, I965GM i915 features: display/Other

Description Lakshmi 2019-04-27 18:11:57 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_264/fi-bwr-2160/igt@kms_flip@busy-flip.html
Starting subtest: busy-flip
(kms_flip:1144) CRITICAL: Test assertion failure function run_test_step, file ../tests/kms_flip.c:767:
(kms_flip:1144) CRITICAL: Failed assertion: do_page_flip(o, new_fb_id, 0) == -16
(kms_flip:1144) CRITICAL: error: 0 != -16
Subtest busy-flip failed.
**** DEBUG ****
(kms_flip:1144) DEBUG: Test requirement passed: (flags & TEST_HANG) == 0 || !is_wedged(drm_fd)
(kms_flip:1144) DEBUG: Test requirement passed: resources
(kms_flip:1144) DEBUG: Test requirement passed: modes
(kms_flip:1144) INFO: Beginning busy-flip on pipe A, connector VGA-1
(kms_flip:1144) igt_fb-DEBUG: igt_create_fb_with_bo_size(width=1024, height=768, format=XR24(0x34325258), modifier=0x0, size=0)
(kms_flip:1144) igt_fb-DEBUG: igt_create_fb_with_bo_size(handle=2, pitch=4096)
(kms_flip:1144) ioctl_wrappers-DEBUG: Test requirement passed: igt_has_fb_modifiers(fd)
(kms_flip:1144) igt_fb-DEBUG: igt_create_fb_with_bo_size(width=1024, height=768, format=XR24(0x34325258), modifier=0x0, size=0)
(kms_flip:1144) igt_fb-DEBUG: igt_create_fb_with_bo_size(handle=3, pitch=4096)
(kms_flip:1144) ioctl_wrappers-DEBUG: Test requirement passed: igt_has_fb_modifiers(fd)
(kms_flip:1144) igt_fb-DEBUG: Test requirement passed: cairo_surface_status(fb->cairo_surface) == CAIRO_STATUS_SUCCESS
(kms_flip:1144) igt_fb-DEBUG: Test requirement passed: cairo_surface_status(fb->cairo_surface) == CAIRO_STATUS_SUCCESS
(kms_flip:1144) igt_kms-INFO:   1024x768 60 1024 1048 1184 1344 768 771 777 806 0xa 0x40 65000
(kms_flip:1144) DEBUG: name = flip
last_ts = 0.000000
Comment 1 CI Bug Log 2019-04-27 18:13:07 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* BWR: igt@kms_flip@busy-flip - fail - Failed assertion: do_page_flip(o, new_fb_id, 0) == -16
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_264/fi-bwr-2160/igt@kms_flip@busy-flip.html
Comment 2 CI Bug Log 2019-07-01 10:52:52 UTC
A CI Bug Log filter associated to this bug has been updated:

{- BWR: igt@kms_flip@busy-flip - fail - Failed assertion: do_page_flip(o, new_fb_id, 0) == -16 -}
{+ BWR GLK: igt@kms_flip@busy-flip - fail - Failed assertion: do_page_flip(o, new_fb_id, 0) == -16 +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5072/shard-glk3/igt@kms_flip@busy-flip.html
Comment 3 CI Bug Log 2019-08-19 06:46:43 UTC
A CI Bug Log filter associated to this bug has been updated:

{- BWR GLK: igt@kms_flip@busy-flip - fail - Failed assertion: do_page_flip(o, new_fb_id, 0) == -16 -}
{+ BWR BSW GLK: igt@kms_flip@busy-flip - fail - Failed assertion: do_page_flip(o, new_fb_id, 0) == -16 +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_345/fi-bsw-n3050/igt@kms_flip@busy-flip.html
Comment 4 CI Bug Log 2019-09-16 11:18:20 UTC
A CI Bug Log filter associated to this bug has been updated:

{- BWR BSW GLK: igt@kms_flip@busy-flip - fail - Failed assertion: do_page_flip(o, new_fb_id, 0) == -16 -}
{+ BWR BYT BSW GLK: igt@kms_flip@busy-flip - fail - Failed assertion: do_page_flip(o, new_fb_id, 0) == -16 +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_369/fi-byt-j1900/igt@kms_flip@busy-flip.html
Comment 5 Matt Roper 2019-10-10 23:10:56 UTC
Our pageflip ioctl is supposed to return -EBUSY to userspace if a second pageflip is submitted while a first is still pending.  This test tries to exercise that by attempting to submit two flip requests within the same vblank (since the first flip won't be able to complete if the vblank hasn't happened yet); in this case the second flip is occasionally returning a success return code rather than -EBUSY, hence the failure.

The problem here most likely does not arise from the kernel behavior, but rather from the fragility of the testing approach.  The test software in userspace doesn't have any way to guarantee that both flips are actually submitted within the same vblank, so any unexpected delay in OS scheduling could cause the two flips to land in different vblanks; when that happens the first flip has already completed before the second is issued, to the second flip is successful (rather than EBUSY) and the test sees that as incorrect behavior.  The test attempts to mitigate this unluckiness as much as possible by doing a forced vblank wait before the first flip in the hopes that the first flip will happen very shortly after the vblank and give the second flip time to be submitted within the same window, but this strategy is not foolproof and unexpected scheduling delay caused by other background tasks can still cause one or both of the flips to be issued later than we'd hope.  It's probably not surprising that the occasional failures we get from this test happen on some of our slowest machines (atom rather than core).

Since we can't absolutely guarantee that both flips are submitted within the same vblank window, one alternative to consider would be to try to make the first flip take longer than a single vblank to complete.  Flips will block until outstanding rendering against their framebuffer is completed, so issuing a complicated render operation on the first framebuffer being flipped could make that flip take multiple vblank periods to complete, effectively extending the window for us to submit a second flip and receive -EBUSY.  Of course performing complicated rendering operations in IGT without a full GL stack isn't terribly easy either.  Maybe another alternative would be a debugfs interface that causes the very next flip issued to take an artificially long time to complete by internally creating a fence that the flip must wait on?

There shouldn't be any end-user impact here, only a small impact to CI from false positive failures.  Marking this bug as 'low' importance.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.