Bug 106008 - [CI] igt@kms_rmfb@(close-fd|rmfb-ioctl) - fail - Failed assertion: planeres->fb_id == 0
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
 
Reported: 2018-04-12 17:27 UTC by Martin Peres
Modified: 2019-11-29 17:45 UTC
i915 platform: I915G, I965G, PNV
i915 features: display/Other


Description Martin Peres 2018-04-12 17:27:24 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_7/fi-bwr-2160/igt@kms_rmfb@close-fd.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_7/fi-gdg-551/igt@kms_rmfb@close-fd.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_7/fi-elk-e7500/igt@kms_rmfb@close-fd.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_7/fi-blb-e6850/igt@kms_rmfb@close-fd.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_7/fi-pnv-d510/igt@kms_rmfb@close-fd.html

(kms_rmfb:1621) CRITICAL: Test assertion failure function test_rmfb, file ../tests/kms_rmfb.c:120:
(kms_rmfb:1621) CRITICAL: Failed assertion: planeres->fb_id == 0
(kms_rmfb:1621) CRITICAL: Last errno: 22, Invalid argument
(kms_rmfb:1621) CRITICAL: error: 60 != 0
Subtest close-fd failed.
Comment 1 Martin Peres 2018-04-12 17:49:07 UTC
Also seen on https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_7/fi-blb-e6850/igt@kms_rmfb@rmfb-ioctl.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_7/fi-bwr-2160/igt@kms_rmfb@rmfb-ioctl.html

(kms_rmfb:1491) CRITICAL: Test assertion failure function test_rmfb, file ../tests/kms_rmfb.c:120:
(kms_rmfb:1491) CRITICAL: Failed assertion: planeres->fb_id == 0
(kms_rmfb:1491) CRITICAL: error: 46 != 0
Subtest rmfb-ioctl failed.
Comment 2 Arek Hiler 2019-08-23 08:33:57 UTC
It seems that on pre-ILK platforms rmfb does not behave as it should.

The test looks correct and does what it advertises:
/*
 * 1. Set primary plane to a known fb.
 * 2. Make sure getcrtc returns the correct fb id.
 * 3. Call rmfb on the fb.
 * 4. Make sure getcrtc returns 0 fb id.
 *
 * RMFB is supposed to free the framebuffers from any and all planes,
 * so test this and make sure it works.
 */
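
The invariant the test checks can be sketched as a toy model. This is not the IGT test or kernel code: struct toy_plane, toy_rmfb() and toy_fb_is_released() are hypothetical names made up for illustration, and the model assumes rmfb only has to clear the fb id on each plane that references it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for per-plane state; not a kernel or libdrm type. */
struct toy_plane {
	uint32_t fb_id; /* 0 means no framebuffer is attached */
};

/*
 * Model of the expected RMFB behaviour: detach fb_id from every plane
 * that still references it. Returns the number of planes that were cleared.
 */
size_t toy_rmfb(struct toy_plane *planes, size_t count, uint32_t fb_id)
{
	size_t cleared = 0;

	assert(fb_id != 0); /* 0 is reserved for "no fb" in this model */

	for (size_t i = 0; i < count; i++) {
		if (planes[i].fb_id == fb_id) {
			planes[i].fb_id = 0;
			cleared++;
		}
	}
	return cleared;
}

/* Mirrors the failing check: returns 1 if no plane references fb_id. */
int toy_fb_is_released(const struct toy_plane *planes, size_t count,
		       uint32_t fb_id)
{
	for (size_t i = 0; i < count; i++) {
		if (planes[i].fb_id == fb_id)
			return 0;
	}
	return 1;
}
```

In terms of this model, the CI failure corresponds to toy_fb_is_released() still returning 0 after the rmfb call, for example with fb id 60 as in the close-fd log.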

The first thing that can go wrong is a race condition, since freeing framebuffers from the planes is a deferred task:
  /*
   * we now own the reference that was stored in the fbs list
   *
   * drm_framebuffer_remove may fail with -EINTR on pending signals,
   * so run this in a separate stack as there's no way to correctly
   * handle this after the fb is already removed from the lookup table.
   */
  struct drm_mode_rmfb_work arg;

  INIT_WORK_ONSTACK(&arg.work, drm_mode_rmfb_work_fn);
  INIT_LIST_HEAD(&arg.fbs);
  list_add_tail(&fb->filp_head, &arg.fbs);

  schedule_work(&arg.work);
  flush_work(&arg.work);
  destroy_work_on_stack(&arg.work);

There are also two paths here, one for atomic and one for legacy removal of the fb:
  if (drm_drv_uses_atomic_modeset(dev)) {
  	int ret = atomic_remove_fb(fb);
  	WARN(ret, "atomic remove_fb failed with %i\n", ret);
  } else {
  	legacy_remove_fb(fb);
  }

And the offending machines seem to exactly match the legacy path:
  /* Disable nuclear pageflip by default on pre-ILK */
  if (!i915_modparams.nuclear_pageflip && match_info->gen < 5)
  	dev_priv->drm.driver_features &= ~DRIVER_ATOMIC;
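
A rough sketch of how the two quoted snippets would combine, assuming the path choice depends only on the DRIVER_ATOMIC flag. That is a simplification: the real drm_drv_uses_atomic_modeset() may look at more than this flag. All toy_* names and the flag value are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag value only; the real DRIVER_ATOMIC lives in the DRM headers. */
#define TOY_DRIVER_ATOMIC (1u << 4)

enum toy_rmfb_path { TOY_PATH_LEGACY, TOY_PATH_ATOMIC };

/* Mirrors the quoted i915 check: atomic is masked out below gen 5 unless forced. */
uint32_t toy_driver_features(uint32_t features, int gen, bool nuclear_pageflip)
{
	if (!nuclear_pageflip && gen < 5)
		features &= ~TOY_DRIVER_ATOMIC;
	return features;
}

/* Simplified stand-in for drm_drv_uses_atomic_modeset(); only checks the flag. */
enum toy_rmfb_path toy_rmfb_path(uint32_t features)
{
	return (features & TOY_DRIVER_ATOMIC) ? TOY_PATH_ATOMIC : TOY_PATH_LEGACY;
}
```

Under that assumption a gen 4 machine without the nuclear_pageflip module parameter ends up on the legacy path, while gen 5 and later stay on the atomic one.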


The code on the legacy path looks correct, though. The main difference seems to be:
  drm_modeset_lock_all(dev);

This lock may take time to acquire, so the failure may actually be a combination of the two possible issues above.


Possible user impact in the worst case: a memory leak when using rmfb on pre-atomic platforms. Quite likely it is not that, though, and there have been no reports of this in the wild. Keeping the priority as "medium".
Comment 3 Arek Hiler 2019-08-23 09:51:45 UTC
Let's find out whether it's just the delay:
https://patchwork.freedesktop.org/series/65694/
Comment 4 Arek Hiler 2019-08-26 05:40:34 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/TrybotIGT_36/fi-elk-e7500/igt@kms_rmfb@close-fd.html

The test still fails on non-atomic platforms, even with the 5s 'wait_on'. I think we actually have an issue with the legacy codepath not releasing the FB correctly. This is beyond what can be debugged over Trybot.
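
For reference, the "is it just a delay" experiment boils down to polling with a timeout. The sketch below is not the code from the patchwork series. toy_wait_fb_released() and the fake plane are hypothetical helpers that only illustrate the idea: keep reading the plane's fb id until it reports 0 or the deadline passes.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callback: returns the fb id currently reported for a plane. */
typedef uint32_t (*toy_read_fb_id_fn)(void *ctx);

static int64_t toy_now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Poll read_fb_id() until it reports 0 or timeout_ms elapses.
 * Returns true if the fb was released in time, false on timeout.
 */
bool toy_wait_fb_released(toy_read_fb_id_fn read_fb_id, void *ctx,
			  int64_t timeout_ms, long poll_interval_ms)
{
	const struct timespec pause = {
		.tv_sec = poll_interval_ms / 1000,
		.tv_nsec = (poll_interval_ms % 1000) * 1000000L,
	};
	const int64_t deadline = toy_now_ms() + timeout_ms;

	for (;;) {
		if (read_fb_id(ctx) == 0)
			return true;
		if (toy_now_ms() >= deadline)
			return false;
		nanosleep(&pause, NULL);
	}
}

/* Fake plane for demonstration: keeps its fb until polls_left reaches 0. */
struct toy_fake_plane {
	uint32_t fb_id;
	int polls_left; /* negative means "never released" */
};

uint32_t toy_fake_read_fb_id(void *ctx)
{
	struct toy_fake_plane *plane = ctx;

	if (plane->polls_left > 0)
		plane->polls_left--;
	else if (plane->polls_left == 0)
		plane->fb_id = 0;
	return plane->fb_id;
}
```

A plane that releases its fb late passes such a wait, while one that never releases it times out. The Trybot result above matches the second case.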
Comment 6 Martin Peres 2019-11-29 17:45:35 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/intel/issues/107.