Bug 112092

Summary: [CI][RESUME] igt@kms_flip_tiling@* - fail - Failed assertion: ret == 0
Product: DRI Reporter: Lakshmi <lakshminarayana.vudum>
Component: DRM/Intel Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: RESOLVED MOVED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: medium CC: intel-gfx-bugs, jani.saarinen, matthew.d.roper
Version: DRI git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: TGL i915 features: display/Other
Attachments:
Description Flags
Requested info none

Description Lakshmi 2019-10-22 07:31:03 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7104/re-tgl-u/igt@kms_flip_tiling@flip-to-y-tiled.html
Starting subtest: flip-to-Y-tiled
(kms_flip_tiling:929) igt_kms-CRITICAL: Test assertion failure function igt_primary_plane_commit_legacy, file ../lib/igt_kms.c:2814:
(kms_flip_tiling:929) igt_kms-CRITICAL: Failed assertion: ret == 0
(kms_flip_tiling:929) igt_kms-CRITICAL: Last errno: 22, Invalid argument
(kms_flip_tiling:929) igt_kms-CRITICAL: error: -22 != 0
Subtest flip-to-Y-tiled failed.
Comment 1 CI Bug Log 2019-10-22 07:32:23 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* TGL:  igt@kms_flip_tiling@* - fail - Failed assertion: ret == 0
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7104/re-tgl-u/igt@kms_flip_tiling@flip-to-y-tiled.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7104/re-tgl-u/igt@kms_flip_tiling@flip-y-tiled.html
Comment 2 Matt Roper 2019-10-29 22:42:28 UTC
<7> [429.032901] [drm:skl_allocate_pipe_ddb [i915]] Requested display configuration exceeds system DDB limitations
<7> [429.032960] [drm:skl_allocate_pipe_ddb [i915]] minimum required 243/238

That seems somewhat suspicious given that we don't seem to be trying to use a bunch of planes+connectors here.

Can you provide the contents of:
  /sys/kernel/debug/dri/0/i915_display_info
  /sys/kernel/debug/dri/0/i915_ddb_info
  /sys/kernel/debug/dri/0/i915_pri_wm_latency
Comment 3 Jani Saarinen 2019-10-30 05:14:26 UTC
Created attachment 145839 [details]
Requested info

Requested info attached
Comment 4 Matt Roper 2019-10-30 18:13:04 UTC
Okay, it looks like we have 3 very large displays attached, so maybe this failure is legitimate.  TGL has 2048 DDB blocks split across two DBUFs, but today our driver only supports a single DBUF, so we only have 1024 available to us.  That buffer allocation needs to be split between the active pipes, and according to display_info you have 3 pipes active with modes 4k, 4k, 5k.  The DDB is partitioned proportional to the display data rates, so your pipes get 307, 307, 410 blocks respectively.  Some of those blocks are reserved for use by the cursor plane, so on pipe A (where the failure arose) we had 238 blocks left to drive the primary plane (and any sprites we might have wanted to turn on).

For a large 4k display, needing more than 238 blocks doesn't seem too unreasonable (the exact requirements depend on a bunch of variables, including the latency of the RAM your platform is using, so it's not easy to calculate by hand without a spreadsheet).  So I suspect everything is actually working as expected and we simply don't have the resources necessary to satisfy IGT's request here (flipping to Y-tiled buffers generally requires more DDB than linear/X-tiled).
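
The 307/307/410 split can be reproduced with a small sketch. It is not the i915 code: `split_ddb` is a hypothetical helper, and the weights are an assumption (horizontal widths of 3840, 3840 and 5120 pixels standing in for the "4k, 4k, 5k" data rates, which this bug does not list). Only the 1024-block total and the 243/238 figures come from the comments and log above.

```python
def split_ddb(total_blocks, weights):
    """Split total_blocks across pipes in proportion to weights.

    Boundaries are computed cumulatively with integer division, so the
    per-pipe sizes always sum to total_blocks.
    """
    total_weight = sum(weights)
    sizes = []
    seen = 0
    start = 0
    for w in weights:
        seen += w
        end = total_blocks * seen // total_weight
        sizes.append(end - start)
        start = end
    return sizes


# ASSUMPTION: horizontal widths for the 4k, 4k and 5k modes (not stated in the bug).
widths = [3840, 3840, 5120]
sizes = split_ddb(1024, widths)
print(sizes)  # [307, 307, 410]

# From the kernel log in comment 2: "minimum required 243/238".
required, available = 243, 238
print(required - available)  # 5 blocks short on pipe A
```

With these assumed weights, pipe A ends up only 5 blocks short of what the primary plane needs.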

A lot of the DDB pressure here comes from the fact that we don't yet enable the second DBUF that's available on ICL+.  Stanislav is working on some patches to address that now, so the situation will likely improve soon and we'll be able to avoid running out of DDB space on large monitors unless we also try to turn on lots of planes.

So the failure returned by the kernel here is correct/expected; in theory a sophisticated userspace compositor would react to this failure by retrying with a less aggressive display setup (possibly reducing the resolution of monitor(s)) until the configuration could be satisfied.  Also, the situation isn't really any worse than we already have on ICL or before, so marking this as medium priority/exposure.

Since IGT isn't sophisticated enough to retry with less aggressive configurations when DDB exhaustion occurs, we could probably avoid these kinds of issues by updating this test (and any others that hit similar DDB exhaustion) to make sure they turn off any pipes they aren't explicitly trying to use before they start their testing.  That would allow i915 to give the full platform DDB allocation of 1024 blocks to the one active pipe, meaning we'd have plenty of blocks to run this test without running out.
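
As a rough illustration of both ideas (retrying with a smaller configuration, and turning off unused pipes), here is a toy model. It is neither IGT nor i915 code, and every function name in it is hypothetical. It assumes the primary-plane requirement stays at the 243 blocks seen in the log, that the reserved amount is the 69 blocks implied by 307 - 238, and that the pipes are weighted by assumed widths of 3840, 3840 and 5120 pixels. In the real driver all of these depend on the mode and watermark calculations.

```python
import errno

DDB_BLOCKS = 1024          # single enabled DBUF, per comment 4
PRIMARY_MIN_BLOCKS = 243   # "minimum required 243/238" from the log in comment 2
RESERVED_BLOCKS = 69       # ASSUMPTION: 307 - 238, implied by the figures in comment 4


def pipe_a_blocks(active_widths):
    """Blocks left for pipe A's planes; pipe A is active_widths[0]."""
    share = DDB_BLOCKS * active_widths[0] // sum(active_widths)
    return share - RESERVED_BLOCKS


def try_commit(active_widths):
    """Return 0 on success or -EINVAL, mimicking the failing commit."""
    if pipe_a_blocks(active_widths) < PRIMARY_MIN_BLOCKS:
        return -errno.EINVAL
    return 0


def commit_with_fallback(active_widths):
    """Drop the last active pipe until the commit fits; pipe A is never dropped."""
    widths = list(active_widths)
    while True:
        ret = try_commit(widths)
        if ret == 0 or len(widths) == 1:
            return ret, widths
        widths.pop()


print(try_commit([3840, 3840, 5120]))            # -22, as in the failing subtest
print(commit_with_fallback([3840, 3840, 5120]))  # (0, [3840, 3840])
print(pipe_a_blocks([3840]))                     # 955 with only pipe A active
```

Under these assumptions, dropping a single unused pipe is already enough for the commit to fit, and running with only pipe A active leaves a wide margin.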
Comment 5 Lakshmi 2019-10-31 07:00:10 UTC
(In reply to Matt Roper from comment #4)
> Okay, it looks like we have 3 very large displays attached, so maybe this
> failure is legitimate.  TGL has 2048 DDB blocks split across two DBUFs, but
> today our driver only supports a single DBUF so we only have 1024 available
> to us.  That buffer allocation needs to be split between the active pipes
> and according to display_info you have 3 pipes active with modes 4k, 4k, 5k.
> The DDB is partitioned proportional to the display data rates, so your pipes
> get 307, 307, 410 respectively.  Some of those blocks are reserved for use
> by the cursor plane, so on pipe A (where the failure arose) we had 238
> blocks left to drive the primary plane (and any sprites we might have wanted
> to turn on).  For a large 4k display, 238 doesn't seem too unreasonable (the
> exact requirements depend on a bunch of variables, including the latency of
> the RAM your platform is using so it's not easy to calculate by hand without
> a spreadsheet).  So I suspect everything is actually working as expected and
> we simply don't have the resources necessary to satisfy IGT's request here
> (flipping to Y-tiled buffers generally requires more DDB than
> linear/X-tiled).
> 
> A lot of the DDB pressure here comes from the fact that we don't yet enable
> the second DBUF that's available on ICL+.  Stanislav is working on some
> patches to address that now, so the situation will likely improve soon and
> we'll be able to avoid running out of DDB space on large monitors unless we
> also try to turn on lots of planes too.
> 
> So the failure returned by the kernel here is correct/expected; in theory a
> sophisticated userspace compositor would react to this failure by retrying
> with a less aggressive display setup (possibly reducing the resolution of
> monitor(s)) until the configuration could be satisfied.  Also, the situation
> isn't really any worse than we already have on ICL or before, so marking
> this as medium priority/exposure.
> 
> Since IGT isn't sophisticated enough to retry with less aggressive
> configurations when DDB exhaustion occurs, we could probably avoid these
> kinds of issues by updating this test (and any others that hit similar DDB
> exhaustion) to make sure they turn off any pipes they aren't explicitly
> trying to use before they start their testing.  That would allow i915 to
> give the full platform DDB allocation of 1024 blocks to the one active pipe,
> meaning we'd have plenty of blocks to run this test without running out.

Matt, should this issue be filed under IGT?
Comment 6 Matt Roper 2019-10-31 12:29:43 UTC
In the short term, it can be worked around in IGT by making sure the tests disable resources they aren't actively using.  However, arguably the more important "fix" will be https://patchwork.freedesktop.org/series/67771/ (and equivalent enablement for TGL), which will double the effective resources available for the driver to use.
Comment 7 Martin Peres 2019-11-29 19:42:54 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/intel/issues/538.
