107724 – [CI] [BAT] [DRMTIP] igt@* - dmesg-warn / dmesg-fail - *ERROR* CPU pipe [ABC] FIFO underrun

Summary: [CI] [BAT] [DRMTIP] igt@* - dmesg-warn / dmesg-fail - *ERROR* CPU pipe [ABC] ...

Status:	RESOLVED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	XOrg git
Hardware:	Other All

Importance:	highest critical
Assignee:	Ville Syrjala
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:	ReadyForDev
Keywords:

Duplicates (4):	105681 105915 107720 108336
Depends on:
Blocks:	105980

Reported:	2018-08-28 15:23 UTC by Martin Peres
Modified:	2019-06-05 11:46 UTC
CC List:	7 users

See Also:
i915 platform:	ICL
i915 features:	display/Other

Description Martin Peres 2018-08-28 15:23:37 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_95/fi-icl-u/igt@kms_plane_lowres@pipe-a-tiling-y.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_95/fi-icl-u/igt@kms_vblank@pipe-b-ts-continuation-modeset-hang.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_95/fi-icl-u/igt@kms_flip@wf_vblank-ts-check-interruptible.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_95/fi-icl-u/igt@kms_plane_lowres@pipe-b-tiling-x.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_95/fi-icl-u/igt@kms_flip@busy-flip.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_95/fi-icl-u/igt@kms_vblank@pipe-b-query-idle.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_95/fi-icl-u/igt@kms_ccs@pipe-a-crc-primary-rotation-180.html

[  214.128456] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe C FIFO underrun
[  224.735148] asynchronous wait on fence i915:kms_busy[1456]/0:1 timed out
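The "[ABC]" in this bug's title reads like a character class over the pipe letters. Below is a minimal sketch of a matcher for underrun lines like the one in the excerpt above. It assumes the message format shown there. The helper name underrun_pipes is hypothetical and is not part of IGT or the CI tooling.

```python
import re

# Matches the i915 message quoted in the title, e.g.
# "[drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe C FIFO underrun".
# The pipe letter is captured so hits can be grouped per pipe.
UNDERRUN_RE = re.compile(r"\*ERROR\* CPU pipe ([ABC]) FIFO underrun")


def underrun_pipes(dmesg_lines):
    """Return the pipe letter of every CPU FIFO underrun line, in log order."""
    pipes = []
    for line in dmesg_lines:
        m = UNDERRUN_RE.search(line)
        if m:
            pipes.append(m.group(1))
    return pipes


log = [
    "[  214.128456] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe C FIFO underrun",
    "[  224.735148] asynchronous wait on fence i915:kms_busy[1456]/0:1 timed out",
]
print(underrun_pipes(log))  # ['C']
```

The fence-timeout line is ignored, so only the underrun is reported.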

Comment 1 James Ausmus 2018-09-24 23:03:21 UTC

*** Bug 107720 has been marked as a duplicate of this bug. ***

Comment 2 Paulo Zanoni 2018-09-28 17:16:22 UTC

Update: working on it. I have already identified a few problems, this is not something we're going to solve with a single patch. I'll provide more updates once I have real patches.

Comment 3 Paulo Zanoni 2018-10-04 23:33:06 UTC

Submitted https://patchwork.freedesktop.org/series/50579/ but I'm not sure it will solve the problem.

I was looking at the logs for fi-icl-u and it seems that sometimes during boot the interrupts just act like crazy: either you get a ton of interrupts, or the IIR registers are unclearable and contain crazy values, with even reserved bits set. I simply can't reproduce this type of problem you're seeing. Example:

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_115/fi-icl-u/boot3.log

This could be some memory corruption happening, or it could simply be a bad BIOS.
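Both IIR symptoms can be made concrete. i915 acknowledges IIR bits by writing the pending value back (write-1-to-clear), so a healthy register reads back zero after the ack. Below is a hypothetical model of that behaviour. The DEFINED_BITS mask and the "stuck" bits are illustrative assumptions, not the real ICL register layout.

```python
# Hypothetical model of a write-1-to-clear IIR (interrupt identity) register.
# DEFINED_BITS is an illustrative mask, NOT the real ICL layout.
DEFINED_BITS = 0x0000_FFFF


class IIRModel:
    def __init__(self, value=0, stuck=0):
        self.value = value
        self.stuck = stuck  # bits that refuse to clear ("unclearable" symptom)

    def read(self):
        return self.value

    def write(self, mask):
        # Write-1-to-clear: every 1 in mask clears that bit, unless it is stuck.
        self.value &= ~(mask & ~self.stuck)


def ack_and_check(iir):
    """Ack all pending bits; return (bits left after ack, reserved bits seen)."""
    pending = iir.read()
    reserved = pending & ~DEFINED_BITS
    iir.write(pending)
    return iir.read(), reserved


healthy = IIRModel(0x0004)
print(ack_and_check(healthy))  # (0, 0)
```

A non-zero first element corresponds to "unclearable", and a non-zero second element corresponds to "reserved bits set".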

Another thing that was brought to my attention is that we often get the mysterious crazy interrupts right after enabling DMC. Would it be possible to run a few tests with DMC disabled? The CI pages suggest the problem happens only around 11% of the time, so a few rounds of tests would probably be needed :/
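Here is a rough answer to "how many rounds". Assume runs are independent and each reproduces with the ~11% rate quoted above. The chance of n runs all missing is 0.89^n, so n must satisfy 0.89^n <= 0.05 to see at least one failure with 95% probability. The helper below is a back-of-envelope sketch, not a CI tool.

```python
def rounds_needed(fail_rate, confidence):
    """Smallest n with P(at least one failure in n runs) >= confidence.

    Assumes independent runs, each failing with probability fail_rate.
    """
    if not 0 < fail_rate <= 1 or not 0 <= confidence < 1:
        raise ValueError("need 0 < fail_rate <= 1 and 0 <= confidence < 1")
    miss = 1.0 - fail_rate  # probability that a single run does not reproduce
    n, p_all_miss = 0, 1.0
    while 1.0 - p_all_miss < confidence:
        n += 1
        p_all_miss *= miss
    return n


print(rounds_needed(0.11, 0.95))  # 26
```

So about 26 clean runs with DMC disabled would be needed before "no failures" starts to mean something. This holds only if the 11% rate and independence hold.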

Perhaps giving me remote access to the machine would also help us move forward a little faster.

Thanks,
Paulo

Comment 4 steven.j.hockemeier 2018-10-09 20:22:52 UTC

I also noticed that this is being classified as Highest, but I thought that classification was reserved for showstoppers. Does an 11% failure rate still fall within that severity? (just checking)

Comment 5 Lakshmi 2018-10-10 06:52:19 UTC

This issue is occurring in every round of drm-tip execution.

Comment 6 Paulo Zanoni 2018-10-12 17:09:45 UTC

(In reply to Lakshmi from comment #5)
> This issue is occurring in every round of drm-tip execution.

When it happens, boot.log always shows craziness during machine initialization:

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_125/fi-icl-u/boot8.log

None of the machines we have here can reproduce this problem. I wonder if it's a hardware/bios issue with the specific ICL machine that's in CI.

Comment 7 Ville Syrjala 2018-10-12 20:33:30 UTC

The pipe C IIR noise is a bit odd. Bspec seems to be telling me that these registers live in PG2, whereas the code appears to assume that the register lives in whatever power well the pipe lives in. So either the code is a bit wrong (though it should probably still work just fine) or the spec is wrong. Either way, since all the power wells up to PG4 should be enabled, it shouldn't really matter here. I agree with Paulo that DMC might have something to do with this as well.
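The "it shouldn't matter" argument can be checked mechanically. Treat the power wells as nested, so enabling PGn implies PG1..PGn-1 are on. The code's assumption (pipe C's IIR lives in the pipe's own well) and the Bspec reading (it lives in PG2) then only disagree when some well above PG2 is off. In the sketch below, the pipe-to-well mapping (B in PG3, C in PG4) is our assumption for illustration, and all names are hypothetical.

```python
# Hypothetical model: nested power wells, everything up to the highest
# enabled well is powered. The pipe -> well mapping is an assumption.
PIPE_WELL = {"A": 1, "B": 3, "C": 4}
BSPEC_IIR_WELL = 2  # the comment's reading of Bspec: pipe IIRs live in PG2


def accessible(highest_enabled_pg, well):
    return well <= highest_enabled_pg


def readings_agree(pipe, highest_enabled_pg):
    """Do the code's view and the Bspec view agree on IIR accessibility?"""
    code_view = accessible(highest_enabled_pg, PIPE_WELL[pipe])
    spec_view = accessible(highest_enabled_pg, BSPEC_IIR_WELL)
    return code_view == spec_view


print(readings_agree("C", 4))  # True: with PG1-PG4 on, both say accessible
print(readings_agree("C", 2))  # False: only differs when PG3/PG4 are off
```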

The WARN_ON(!intel_pstate->base.fb) is also mysterious. We should have either reused the BIOS fb or disabled all the planes, so I can't really see how an enabled plane could get that far without a framebuffer.
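The invariant behind that WARN_ON can be stated in a few lines: after taking over the BIOS configuration, every visible plane must have a framebuffer. The sketch below models the two allowed outcomes, reusing the BIOS fb or disabling the plane. PlaneState, sanitize_after_takeover and violations are hypothetical names, not i915 structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlaneState:
    name: str
    visible: bool
    fb: Optional[str]  # None means no framebuffer attached


def sanitize_after_takeover(planes, bios_fb_reusable):
    """Either keep the BIOS fb on enabled planes or disable those planes."""
    for p in planes:
        if p.visible and p.fb is None:
            if bios_fb_reusable:
                p.fb = "bios-fb"
            else:
                p.visible = False
    return planes


def violations(planes):
    # The state the WARN_ON fires on: an enabled plane with no framebuffer.
    return [p.name for p in planes if p.visible and p.fb is None]
```

If violations() is non-empty after sanitizing, something skipped both paths, which is exactly the mystery described above.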

Comment 8 Lakshmi 2018-10-15 16:30:03 UTC

(In reply to Paulo Zanoni from comment #6)
> (In reply to Lakshmi from comment #5)
> > This issue is occurring in every round of drm-tip execution.
> 
> When it happens, boot.log always shows craziness during machine
> initialization:
> 
> https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_125/fi-icl-u/boot8.log
> 
> None of the machines we have here can reproduce this problem. I wonder if
> it's a hardware/bios issue with the specific ICL machine that's in CI.

Paulo, this issue was last seen here:
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_127/fi-icl-u/igt@kms_busy@extended-modeset-hang-oldfb-render-a.html

Looks like this is still happening. Do you think this occurred for some other reason?

Comment 9 James Ausmus 2018-10-18 18:28:00 UTC

After discussion with Paulo and JaniS, it appears this problem is specific to this one ICL board, as it's not reproducing on the other ICL in CI, and we can't reproduce on any of our local hardware. It was agreed to swap a different board in for CI. I'm lowering this to Medium, and if the issue doesn't reproduce with the new CI HW, we should close this.

Comment 10 Lakshmi 2018-10-30 14:47:36 UTC

(In reply to James Ausmus from comment #9)
> After discussion with Paulo and JaniS, it appears this problem is specific
> to this one ICL board, as it's not reproducing on the other ICL in CI, and
> we can't reproduce on any of our local hardware. It was agreed to swap a
> different board in for CI. I'm lowering this to Medium, and if the issue
> doesn't reproduce with the new CI HW, we should close this.

This issue appears on fi-icl-u2 as well.
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4700/fi-icl-u2/igt@gem_ctx_create@basic-files.html
Raising the priority as it happens in BAT.

Comment 11 James Ausmus 2018-10-30 15:00:08 UTC

Hmm - this looks suspicious. If you take a look at https://intel-gfx-ci.01.org/tree/drm-tip/fi-icl-u2.html you'll see that the only two times igt@gem_ctx_create@basic-files *failed* are the two times that igt@debugfs_test@read_all_entries and igt@gem_exec_suspend@basic-s3 *didn't* fail with powerwell related errors.

There could be interrelation with the powerwell issues.
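The pattern above is an anticorrelation: one test fails only in runs where the others pass. A minimal way to tally it is a 2x2 co-occurrence count over per-run results. The sketch below assumes each run is represented as the set of test names that failed in it. The function name is hypothetical and no real CI data is included.

```python
from collections import Counter


def cooccurrence(runs, test_a, test_b):
    """Count (a_failed, b_failed) pairs across CI runs.

    runs: iterable of sets, each holding the names of tests that failed in one run.
    """
    return Counter((test_a in failed, test_b in failed) for failed in runs)
```

If the (True, True) cell stays at zero while both (True, False) and (False, True) are populated, the failures are mutually exclusive. That is the suspicious pattern described here. With only a handful of runs it is a hint, not proof.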

Comment 12 Lakshmi 2018-10-30 15:11:52 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_135/fi-icl-u/igt@kms_plane_scaling@pipe-c-scaler-with-pixel-format.html

Comment 13 James Ausmus 2018-11-02 16:01:33 UTC

This hasn't occurred again in BAT since the two runs with the suspicious result pattern. Moving this back to High.

Comment 14 James Ausmus 2018-11-14 21:23:26 UTC

Waiting to move forward with this on the idea that Ville's watermark series at https://patchwork.freedesktop.org/series/51878/ might help out here.

Comment 15 Jani Saarinen 2018-11-15 15:46:21 UTC

Ville, you have some WM patches in review; should they help here?
In the latest runs this is getting worse, also on ICL-u2.

Comment 16 Radosław Szwichtenberg 2018-11-16 13:13:02 UTC

Affecting 289 tests on CI.

Comment 17 Jani Saarinen 2018-12-12 07:43:24 UTC

Hi Matt, I saw comments on https://bugs.freedesktop.org/show_bug.cgi?id=105458 that
"I also notice that CI indicates a bunch of pre-existing ICL watermark failures are no longer happening with my series (or the earlier revisions of my series), so it's possible that we've also fixed
https://bugs.freedesktop.org/show_bug.cgi?id=107724 "by accident" with this series."

Fingers crossed!

Comment 18 Jani Saarinen 2018-12-17 19:29:16 UTC

Nope, it did not fix it. Still WIP.

Comment 19 Jani Saarinen 2019-01-18 08:43:30 UTC

Still WIP to find the root cause.

Comment 20 Stanislav Lisovskiy 2019-01-24 12:41:31 UTC

*** Bug 108336 has been marked as a duplicate of this bug. ***

Comment 21 CI Bug Log 2019-02-12 11:07:43 UTC

A CI Bug Log filter associated to this bug has been updated:

{- ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend) -}
{+ ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium \(hdmi-cmp-nv12\) +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_217/fi-icl-u2/igt@runner@aborted.html

Comment 22 Petri Latvala 2019-02-25 10:03:39 UTC

*** Bug 105915 has been marked as a duplicate of this bug. ***

Comment 23 Martin Peres 2019-04-02 11:31:05 UTC

Ville has been working on it, and he has patches addressing the issue: https://patchwork.freedesktop.org/series/58299/

They have not landed because IGT does not cope well with the additional restrictions on plane size. The IGT test fixes are coming along, and we can expect to close this issue in the coming week or so.

The customer impact of this issue is very high as this could lead to transient black screens / flickering, due to exceeding the memory bandwidth available to the display controller.
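For a sense of scale, the bandwidth a plane needs is roughly pixels per frame x refresh rate x bytes per pixel. The sketch below uses that simplification. It counts active pixels only and ignores blanking, scaling, compression and latency, so it is an order-of-magnitude estimate, not the i915 computation.

```python
def plane_data_rate(width, height, refresh_hz, bytes_per_pixel):
    """Approximate bytes/s the display engine must fetch for one plane.

    Simplification: active pixels only; no blanking, scaling or compression.
    """
    return width * height * refresh_hz * bytes_per_pixel


print(plane_data_rate(3840, 2160, 60, 4))  # 1990656000, about 2 GB/s
```

A single 4K60 plane with 4 bytes per pixel already needs about 2 GB/s. Several such planes across pipes add up quickly. If memory cannot keep pace, the display FIFO drains and an underrun is reported.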

To fix these issues for good, a test is being developed by Stan to stress-test watermark selection, which will hopefully allow us to iron out the final kinks and make FIFO underruns a thing of the past.
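Watermarks tie that data rate to memory latency: the FIFO must hold enough data to keep scanning out while a memory request is outstanding. The sketch below is loosely modeled on the "method 1" idea used for SKL+ in i915, latency x pixel rate x bytes per pixel, expressed in 512-byte blocks. Real code adds further terms (tiling, per-level latencies, minimums), and the parameter names here are hypothetical.

```python
def wm_blocks(latency_us, pixel_rate_khz, cpp, block_bytes=512):
    """Blocks a plane consumes during one memory-latency window, rounded up.

    pixel_rate_khz * latency_us / 1000 gives pixels; times cpp gives bytes.
    """
    num = latency_us * pixel_rate_khz * cpp
    den = 1000 * block_bytes
    return -(-num // den)  # integer ceiling division


print(wm_blocks(2, 533250, 4))  # 9
```

A stress test of watermark selection would, in effect, look for configurations where the blocks reserved fall short of what this kind of estimate demands.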

Comment 24 Martin Peres 2019-04-04 12:09:31 UTC

*** Bug 107720 has been marked as a duplicate of this bug. ***

Comment 25 Martin Peres 2019-04-08 11:34:13 UTC

*** Bug 105681 has been marked as a duplicate of this bug. ***

Comment 26 CI Bug Log 2019-04-09 06:50:55 UTC

A CI Bug Log filter associated to this bug has been updated:

{- ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium \(hdmi-cmp-nv12\) -}
{+ ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none) +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_249/fi-icl-u2/igt@runner@aborted.html

Comment 27 CI Bug Log 2019-04-09 06:51:06 UTC

A CI Bug Log filter associated to this bug has been updated:

{- ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none) -}
{+ ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none) +}

No new failures caught with the new filter

Comment 28 Jani Saarinen 2019-04-11 10:48:51 UTC

Fix WIP. Ville, any ETA here?

Comment 29 Jani Saarinen 2019-04-15 08:04:23 UTC

Still WIP.

Comment 30 Jani Saarinen 2019-04-23 11:05:15 UTC

Still WIP, talking to architects.

Comment 31 Jani Saarinen 2019-04-25 06:51:28 UTC

Ville, we need patches sent for review asap.

Comment 32 CI Bug Log 2019-04-29 11:41:29 UTC

A CI Bug Log filter associated to this bug has been updated:

{- ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none) -}
{+ ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none|x) +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_267/fi-icl-u2/igt@runner@aborted.html

Comment 33 CI Bug Log 2019-04-30 06:58:18 UTC

A CI Bug Log filter associated to this bug has been updated:

{- ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none|x) -}
{+ ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none|x) +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_268/fi-icl-u2/igt@runner@aborted.html

Comment 34 Jani Saarinen 2019-05-02 15:04:07 UTC

Still WIP. Ville, please post an update when there is one.

Comment 35 Jani Saarinen 2019-05-06 05:25:15 UTC

Reference: https://patchwork.freedesktop.org/series/60271/

Comment 36 Jani Saarinen 2019-05-10 13:20:56 UTC

Still under review.

Comment 37 CI Bug Log 2019-05-16 08:21:10 UTC

A CI Bug Log filter associated to this bug has been updated:

{- ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none|x) -}
{+ ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none|x) +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_286/fi-icl-dsi/igt@runner@aborted.html

Comment 38 CI Bug Log 2019-05-16 08:33:36 UTC

A CI Bug Log filter associated to this bug has been updated:

{- ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none|x) -}
{+ ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none|x) +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_286/fi-icl-dsi/igt@runner@aborted.html

Comment 39 CI Bug Log 2019-05-16 08:34:29 UTC

The CI Bug Log issue associated to this bug has been updated.

### Removed filters

* ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none|x) (added on a minute ago)

### New filters associated

* ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none|x)
  (No new failures associated)

Comment 40 CI Bug Log 2019-05-16 08:34:40 UTC

The CI Bug Log issue associated to this bug has been updated.

### Removed filters

* ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none|x) (added on 13 minutes ago)

### New filters associated

* ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none|x)
  (No new failures associated)

Comment 41 Jani Saarinen 2019-06-05 11:46:06 UTC

Changes from Ville were merged in CI_DRM_6150. 
c457d9cf256e drm/i915: Make sure we have enough memory bandwidth on ICL
d284d5145eb8 drm/i915: Make sandybridge_pcode_read() deal with the second data register
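The first commit's title describes a guard that rejects display configurations needing more memory bandwidth than is available, before they can cause underruns. Below is a hypothetical sketch of that idea, assuming a single limit in MB/s and the kernel convention of returning -EINVAL from an atomic check. The real change derives its limits from memory parameters read from the pcode firmware. That is presumably why the second commit extends sandybridge_pcode_read(), but this sketch does not model it.

```python
import errno


def bw_atomic_check(plane_rates_mbps, max_bw_mbps):
    """Return 0 if all active planes fit in the bandwidth budget, else -EINVAL.

    plane_rates_mbps: data rates (MB/s) of every active plane on every pipe.
    max_bw_mbps: hypothetical single bandwidth limit for the whole display.
    """
    if sum(plane_rates_mbps) > max_bw_mbps:
        return -errno.EINVAL  # reject up front instead of underrunning later
    return 0
```

Rejecting at check time turns a silent, intermittent underrun into a deterministic, reportable failure. That is also why IGT had to learn to cope with the new plane restrictions.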

There might still be some underruns, but this bug was about the major issues seen, and those should now be fixed. If new issues appear, let's file a new bug rather than re-opening this one.
