Summary: | [CI] [BAT] [DRMTIP] igt@* - dmesg-warn / dmesg-fail - *ERROR* CPU pipe [ABC] FIFO underrun | ||
---|---|---|---|
Product: | DRI | Reporter: | Martin Peres <martin.peres> |
Component: | DRM/Intel | Assignee: | Ville Syrjala <ville.syrjala> |
Status: | RESOLVED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | critical | ||
Priority: | highest | CC: | intel-gfx-bugs, james.ausmus, marta.lofstedt, matthew.d.roper, przanoni, ricardo.o.perez, ville.syrjala |
Version: | XOrg git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | ReadyForDev | ||
i915 platform: | ICL | i915 features: | display/Other |
Bug Depends on: | |||
Bug Blocks: | 105980 |
Description
Martin Peres
2018-08-28 15:23:37 UTC
*** Bug 107720 has been marked as a duplicate of this bug. ***

Update: working on it. I have already identified a few problems; this is not something we're going to solve with a single patch. I'll provide more updates once I have real patches.

Submitted https://patchwork.freedesktop.org/series/50579/, but I'm not sure it will solve the problem.

I was looking at the logs for fi-icl-u, and it seems that sometimes during boot the interrupts just go crazy: either you get a flood of interrupts, or the IIR registers cannot be cleared and contain bogus values, with even reserved bits set. I simply can't reproduce the kind of problem you're seeing. Example: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_115/fi-icl-u/boot3.log

This could be memory corruption, or it could simply be a bad BIOS. Another thing that was brought to my attention is that we often get these mysterious interrupts right after enabling DMC. Would it be possible to run a few tests with DMC disabled? The CI pages suggest the problem happens only around 11% of the time, so a few rounds of tests would probably be needed :/ Giving me remote access to the machine might also help us move forward a little faster.

Thanks,
Paulo

I also noticed that this is classified as Highest, but I thought that classification was reserved for showstoppers. Does an 11% failure rate still fall within that severity? (Just checking.)

This issue is occurring in every round of drm-tip execution.

(In reply to Lakshmi from comment #5)
> This issue is occurring in every round of drm-tip execution.

When it happens, boot.log always shows craziness during machine initialization: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_125/fi-icl-u/boot8.log

None of the machines we have here can reproduce this problem. I wonder if it's a hardware/BIOS issue with the specific ICL machine that's in CI.

The pipe C IIR noise is a bit odd.
Bspec seems to be telling me that these registers live in PG2, whereas the code appears to assume that each register lives in whatever power well its pipe lives in. So either the code is slightly wrong (though it should probably still work just fine) or the spec is wrong. Since all the power wells up to PG4 should be enabled, it shouldn't really matter here either way. I agree with Paulo that DMC might have something to do with this as well.

The WARN_ON(!intel_pstate->base.fb) is also mysterious. We should have either reused the BIOS fb or disabled all the planes, so I can't really see how an enabled plane could get that far without a framebuffer.

(In reply to Paulo Zanoni from comment #6)
> (In reply to Lakshmi from comment #5)
> > This issue is occurring in every round of drm-tip execution.
>
> When it happens, boot.log always shows craziness during machine
> initialization:
>
> https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_125/fi-icl-u/boot8.log
>
> None of the machines we have here can reproduce this problem. I wonder if
> it's a hardware/BIOS issue with the specific ICL machine that's in CI.

Paulo, this issue was last seen here: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_127/fi-icl-u/igt@kms_busy@extended-modeset-hang-oldfb-render-a.html

It looks like this is still happening. Do you think this occurred for some other reason?

After discussion with Paulo and JaniS, it appears this problem is specific to this one ICL board, as it's not reproducing on the other ICL in CI, and we can't reproduce on any of our local hardware. It was agreed to swap a different board in for CI. I'm lowering this to Medium, and if the issue doesn't reproduce with the new CI HW, we should close this.

(In reply to James Ausmus from comment #9)
> After discussion with Paulo and JaniS, it appears this problem is specific
> to this one ICL board, as it's not reproducing on the other ICL in CI, and
> we can't reproduce on any of our local hardware.
> It was agreed to swap a different board in for CI. I'm lowering this to
> Medium, and if the issue doesn't reproduce with the new CI HW, we should
> close this.

This issue appears on fi-icl-u2 as well: https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4700/fi-icl-u2/igt@gem_ctx_create@basic-files.html

Raising the priority, as it happens in BAT.

Hmm, this looks suspicious. If you take a look at https://intel-gfx-ci.01.org/tree/drm-tip/fi-icl-u2.html you'll see that the only two times igt@gem_ctx_create@basic-files *failed* are the two times that igt@debugfs_test@read_all_entries and igt@gem_exec_suspend@basic-s3 *didn't* fail with power-well-related errors. There could be a relationship with the power well issues.

This hasn't occurred again in BAT since the two failures with the suspicious result pattern. Moving this back to High.

Waiting to move forward with this, on the idea that Ville's watermark series at https://patchwork.freedesktop.org/series/51878/ might help out here.

Ville, you have some WM patches in review; should they help here?

On the latest runs this is getting worse, also on ICL-u2. It is affecting 289 tests on CI.

Hi Matt, I saw comments on https://bugs.freedesktop.org/show_bug.cgi?id=105458 that:

> I also notice that CI indicates a bunch of pre-existing ICL watermark failures are no longer happening with my series (or the earlier revisions of my series), so it's possible that we've also fixed https://bugs.freedesktop.org/show_bug.cgi?id=107724 "by accident" with this series.

Fingers crossed!

Nope, it did not fix it.

Still WIP.

Still WIP, working to find the root cause.

*** Bug 108336 has been marked as a duplicate of this bug. ***

A CI Bug Log filter associated to this bug has been updated:
{- ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend) -}
{+ ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium \(hdmi-cmp-nv12\) +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_217/fi-icl-u2/igt@runner@aborted.html

*** Bug 105915 has been marked as a duplicate of this bug. ***

Ville has been working on it, and he has patches addressing the issue: https://patchwork.freedesktop.org/series/58299/

They have not landed yet because IGT does not cope well with the additional restrictions on plane size. The IGT test fixes are coming along, and we can expect to close this issue in the coming week or so.

The customer impact of this issue is very high, as it can lead to transient black screens or flickering when the memory bandwidth available to the display controller is exceeded. To fix these issues for good, Stan is developing a test that stress-tests watermark selection, which will hopefully allow us to iron out the final kinks and make FIFO underruns a thing of the past.

*** Bug 107720 has been marked as a duplicate of this bug. ***

*** Bug 105681 has been marked as a duplicate of this bug. ***

A CI Bug Log filter associated to this bug has been updated:
{- ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium \(hdmi-cmp-nv12\) -}
{+ ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none) +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_249/fi-icl-u2/igt@runner@aborted.html

A CI Bug Log filter associated to this bug has been updated:
{- ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none) -}
{+ ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none) +}

No new failures caught with the new filter

Fix WIP. Ville, any ETA here?

Still WIP.

Still WIP, talking to architects.

Ville, we need patches sent for review ASAP.
A CI Bug Log filter associated to this bug has been updated:
{- ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none) -}
{+ ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none|x) +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_267/fi-icl-u2/igt@runner@aborted.html

A CI Bug Log filter associated to this bug has been updated:
{- ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none|x) -}
{+ ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none|x) +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_268/fi-icl-u2/igt@runner@aborted.html

Still WIP. Ville, please post an update when you have one.
Still under review.

A CI Bug Log filter associated to this bug has been updated:
{- ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none|x) -}
{+ ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none|x) +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_286/fi-icl-dsi/igt@runner@aborted.html

The CI Bug Log issue associated to this bug has been updated.

### Removed filters

* ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none|x) (added on a minute ago)

### New filters associated

* ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none|x) (No new failures associated)

The CI Bug Log issue associated to this bug has been updated.
### Removed filters

* ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none|x) (added on 13 minutes ago)

### New filters associated

* ICL: igt@runnerr@aborted - fail - Previous test: kms_cursor_crc (cursor-64x64-suspend)|kms_chamelium (hdmi-cmp-nv12)|kms_plane_lowres (pipe-b-tiling-none|x) (No new failures associated)

Changes from Ville were merged in CI_DRM_6150:

* c457d9cf256e drm/i915: Make sure we have enough memory bandwidth on ICL
* d284d5145eb8 drm/i915: Make sandybridge_pcode_read() deal with the second data register

There may still be some underruns, but this bug tracked the major issues, and those should now be fixed. If new issues appear, let's file a new bug rather than reopening this one.