Bug 102403 - [BAT][BXT] loads of "*ERROR* Atomic update failure on pipe A" when using a DSI panel
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: high major
Assignee: Vidya Srinivas
QA Contact: Intel GFX Bugs mailing list
Whiteboard: ReadyForDev
Duplicates: 103020
Depends on:
Reported: 2017-08-25 10:12 UTC by Martin Peres
Modified: 2017-11-17 08:51 UTC

See Also:
i915 platform: BXT
i915 features: display/DSI


Description Martin Peres 2017-08-25 10:12:44 UTC
A lot of tests generate the following dmesg warning on the machine fi-bxt-dsi (Broxton with a DSI panel attached) when running IGT:

[  522.912705] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=21 end=22) time 186 us, min 1192, max 1199, scanline start 1, end 1
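
When scanning CI logs for this symptom, it can help to pull the numbers out of the message. The following is a minimal Python sketch, assuming the message keeps exactly the layout shown above; the function and field names are hypothetical and chosen only for illustration.

```python
import re

# Matches the "Atomic update failure" message in the layout quoted above.
PATTERN = re.compile(
    r"Atomic update failure on pipe (?P<pipe>\w) "
    r"\(start=(?P<start>\d+) end=(?P<end>\d+)\) "
    r"time (?P<time_us>\d+) us, min (?P<min>\d+), max (?P<max>\d+), "
    r"scanline start (?P<scan_start>\d+), end (?P<scan_end>\d+)"
)


def parse_update_failure(line):
    """Return the fields of an atomic update failure line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    pipe = fields.pop("pipe")
    # Everything except the pipe letter is a decimal integer.
    result = {name: int(value) for name, value in fields.items()}
    result["pipe"] = pipe
    return result


line = ("[  522.912705] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic "
        "update failure on pipe A (start=21 end=22) time 186 us, min 1192, "
        "max 1199, scanline start 1, end 1")
info = parse_update_failure(line)
# A scanline that reads the same at start and end hints that the
# scanline readout is not advancing.
print(info["scan_start"] == info["scan_end"])  # True
```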

Full logs:
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3001/fi-bxt-dsi/igt@pm_rps@basic-api.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3001/fi-bxt-dsi/igt@prime_vgem@basic-busy-default.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3001/fi-bxt-dsi/igt@gem_exec_flush@basic-wb-ro-before-default.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3001/fi-bxt-dsi/igt@gem_ctx_switch@basic-default.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3001/fi-bxt-dsi/igt@gem_exec_reloc@basic-write-read-noreloc.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3001/fi-bxt-dsi/igt@gem_pread@basic.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3001/fi-bxt-dsi/igt@gem_wait@basic-wait-all.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3001/fi-bxt-dsi/igt@gem_busy@basic-busy-default.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3001/fi-bxt-dsi/igt@gem_ringfill@basic-default-fd.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3001/fi-bxt-dsi/igt@gem_exec_reloc@basic-gtt-read-noreloc.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3001/fi-bxt-dsi/igt@drv_module_reload@basic-no-display.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3001/fi-bxt-dsi/igt@kms_pipe_crc_basic@read-crc-pipe-a.html
Comment 1 Martin Peres 2017-08-25 16:37:28 UTC
This bug happens randomly on all tests, making it a blocker for CI.
Comment 2 Chris Wilson 2017-08-25 17:11:54 UTC
The suggestion was that PIPEDSL() doesn't work on DSI, and always returns 0 (bumped to 1 by crtc->scanline_offset). That means that vblank avoidance is non-existent.
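
To illustrate why a stuck scanline defeats vblank evasion, here is a minimal Python sketch (not the i915 code). It assumes the evasion check simply asks whether the current scanline lies in a window just before vblank start. The values vblank_start = 1200 and an 8-line window are inferred from the "min 1192, max 1199" values in the error message in the description, and the function name is hypothetical.

```python
def must_wait_for_vblank(scanline, vblank_start, evasion_lines):
    """Return True if an update started now could straddle vblank start."""
    min_line = vblank_start - evasion_lines  # first line of the danger window
    max_line = vblank_start - 1              # last line before vblank starts
    return min_line <= scanline <= max_line


# A working scanline readout inside the window asks the caller to wait.
print(must_wait_for_vblank(1195, 1200, 8))  # True

# A readout stuck at 1 never falls inside the window, so the update is
# never delayed, no matter where scanout really is.
print(must_wait_for_vblank(1, 1200, 8))     # False
```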
Comment 3 Jani Nikula 2017-08-29 14:36:31 UTC
Bspec topic 7652 on PIPEDSL (or PIPE_SCANLINE) says, "Not supported with MIPI DSI".
Comment 4 Uma Shankar 2017-09-08 14:19:25 UTC
On gen9 platforms, DSI timings are driven from the port instead of the pipe
(unlike DDI). Thus, we can't rely on the pipe registers to get the timing
information. Even the scanline register read is not functional.
This causes the vblank evasion logic to fail, since it relies on the
scanline, which results in the atomic update failure warnings.

We created a patch which uses the pipe frame timestamp and current timestamp registers
to calculate the scanline. This is an indirect way to get the scanline.
It helps resolve the atomic update failures on gen9 DSI platforms.
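
The arithmetic behind such an indirect scanline can be sketched as follows. This is a minimal Python illustration, not the merged patch. It assumes both timestamps are 32-bit microsecond counters, that the frame timestamp is latched at the start of vblank, and that the pixel clock is given in kHz. All names are hypothetical.

```python
def scanline_from_timestamps(frame_ts_us, now_ts_us, pixel_clock_khz,
                             htotal, vtotal, vblank_start):
    """Estimate the current scanline from two microsecond timestamps."""
    # Time since the frame timestamp was latched, tolerating 32-bit wraparound.
    elapsed_us = (now_ts_us - frame_ts_us) & 0xFFFFFFFF
    # Pixels scanned = elapsed_us * clock_khz / 1000; lines = pixels / htotal.
    lines = (elapsed_us * pixel_clock_khz) // (1000 * htotal)
    # Never report more than one frame's worth of lines.
    lines = min(lines, vtotal - 1)
    # Assumption: counting starts at vblank start, so offset and wrap.
    return (lines + vblank_start) % vtotal


# 100 MHz pixel clock and htotal = 1000 give 10 us per line, so 100 us
# after vblank start we are 10 lines further on.
print(scanline_from_timestamps(0, 100, 100000, 1000, 1250, 1200))  # 1210
```

The result is only as accurate as the timestamp resolution, which is why the description above calls this an indirect way to get the scanline.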
Comment 5 Jani Saarinen 2017-09-20 06:09:35 UTC
The trybot results also show that the patches fix the issues seen:
Comment 6 Jani Nikula 2017-09-25 11:55:56 UTC
(In reply to Uma Shankar from comment #4)
> https://patchwork.kernel.org/patch/9944249/

Side note, please use the patchwork instance at https://patchwork.freedesktop.org/project/intel-gfx only. We have all the CI results etc. there. Plus you can reference patches using the message-id:
Comment 7 Jani Saarinen 2017-09-26 09:31:15 UTC
Patch merged: https://cgit.freedesktop.org/drm-tip/commit/?id=aec0246f3e3882065b5c29916a84b539afe4e4af

CI also green on that run. Resolving and whitelisting machine on CI.
Comment 8 Marta Löfstedt 2017-10-06 05:45:16 UTC
The issue is back but this time only on one test.

Comment 9 Vidya Srinivas 2017-10-06 09:35:11 UTC
We executed the test in a loop for around 4 hours with a delay of 5 sec between each iteration. We couldn't reproduce this issue on our end.

Is there any specific scenario (BKM) to reproduce it? Does it happen consistently, or is it a one-off/random issue?

We tested with the following configuration:

DSI APL with AUO panel (1920x1200p)
branch drm-tip
top commit:
commit 247cb84af034b8e90ecd22cd69adb13a7a305350
Author: Jani Nikula <jani.nikula@intel.com>
Date:   Thu Oct 5 09:44:50 2017 +0300

    drm-tip: 2017y-10m-05d-06h-44m-09s UTC integration manifest

As per our understanding, this particular issue is not related to the evasion issues caused by scanline reporting and is more of a modeset/flip sequence/locking problem related to the cursor.

So, although the error message is similar, the cause is different. Hence it will require separate debugging.

Note: There have been some fixes related to cursor updates by Maarten. Can Maarten also have a look in parallel at why this cursor test is failing? Any pointers based on past debugging would be helpful.
Comment 10 Maarten Lankhorst 2017-10-09 10:15:04 UTC
Could you try with the debug kernel then? It is more likely to hit timeouts because lockdep makes it slow.

Comment 11 Vidya Srinivas 2017-10-09 14:12:56 UTC
We tested with the drm.debug=0x2e kernel command line parameter (all logs were enabled). We were not able to reproduce the issue.
Comment 12 Marta Löfstedt 2017-10-10 13:19:06 UTC
*** Bug 103020 has been marked as a duplicate of this bug. ***
Comment 13 Jani Saarinen 2017-10-10 14:04:37 UTC
Lowering the priority, as this is not really a blocker anymore.
Comment 14 Marta Löfstedt 2017-11-17 08:51:54 UTC
The issue hasn't been seen since CI_DRM_3180 (2017-10-05).
