106678 – [CI] igt@* - dmesg-warn/fail - *ERROR* Potential atomic update failure on pipe [ABC]


Status:	CLOSED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	XOrg git
Hardware:	Other All

Importance:	high normal
Assignee:	Dhinakaran Pandiyan
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:	ReadyForDev
Keywords:

Depends on:
Blocks:

Reported:	2018-05-28 07:46 UTC by Martin Peres
Modified:	2018-08-07 08:12 UTC
CC List:	2 users

See Also:
i915 platform:	CFL, KBL
i915 features:	display/PSR

Description Martin Peres 2018-05-28 07:46:19 UTC

Starting from drmtip_50, we got over a thousand "*ERROR* Potential atomic update failure on pipe *" in dmesg. This is a pretty serious regression :s

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_50/fi-cfl-s3/igt@kms_cursor_crc@cursor-128x42-offscreen.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_50/fi-cfl-u/igt@kms_cursor_crc@cursor-256x256-suspend.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_50/fi-kbl-7560u/igt@perf_pmu@idle-no-semaphores-bcs0.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_50/fi-kbl-r/igt@perf_pmu@idle-no-semaphores-bcs0.html

[  158.366252] [drm:intel_pipe_update_start [i915]] *ERROR* Potential atomic update failure on pipe A
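
To check how often the error shows up in a saved dmesg capture, and on which pipe, something like the following could be used. This is only a sketch: the regular expression is modeled on the single log line quoted above, and the function name and the placeholder second line in the sample are made up for illustration.

```python
import re
from collections import Counter

# Pattern modeled on the one dmesg line quoted in this report.
ATOMIC_FAIL_RE = re.compile(
    r"\*ERROR\* Potential atomic update failure on pipe ([A-C])"
)


def count_atomic_failures(dmesg_text):
    """Return a Counter mapping pipe letter -> number of matching lines."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = ATOMIC_FAIL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# First line is the one from this report; the second is a made-up
# placeholder that must not match.
sample = (
    "[  158.366252] [drm:intel_pipe_update_start [i915]] "
    "*ERROR* Potential atomic update failure on pipe A\n"
    "placeholder line that is not an atomic update error\n"
)
print(count_atomic_failures(sample))
```

Feeding it the full dmesg of a run gives a per-pipe count that can be compared between drm-tip builds.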

Comment 1 Maarten Lankhorst 2018-05-28 08:34:13 UTC

Differences in drm-misc-fixes:

$ git shortlog 2b6207291b7b277a5df9d1aab44b56815a292dba...2bc5ff0bdc00d81d719dad74589317a260d583ed
Dhinakaran Pandiyan (1):
      drm/psr: Fix missed entry in PSR setup time table.

Tomi Valkeinen (1):
      drm/omap: fix NULL deref crash with SDI displays

Differences in drm-intel-fixes:
$ git shortlog 771c577c23bac90597c685971d7297ea00f99d11...57ebdafc306af9decd893b4cb11bd834a7e27ed1
Chris Wilson (2):
      drm/i915/lvds: Move acpi lid notification registration to registration phase
      drm/i915/query: Protect tainted function pointer lookup

Ondrej Zary (1):
      drm/i915: Disable LVDS on Radiant P845

Ville Syrjälä (1):
      drm/i915: Restore planes after load detection

Differences in drm-misc-next:
$ git shortlog 3c5f134ac9d0e405a15af652c3ce8cbaa9bf1bc7...2edd4e698dc8a0c497a502c75561c87be0e8a9a6
Chris Wilson (4):
      drm/mm: Reject over-sized allocation requests early
      drm/mm: Add a search-by-address variant to only inspect a single hole
      drm/i915: Limit searching for PIN_HIGH
      drm/i915: Pin the ring high

Souptick Joarder (1):
      gpu: drm: vgem: Change return type to vm_fault_t

Differences in drm-intel-next-queued:
$ git shortlog c894d63c6b36de20f0248d88801be5ace8e6bee8...09a4c02e58c1b3d9748f78242962b7f63c68477e
Chris Wilson (1):
      drm/i915: Look for an active kernel context before switching

Dhinakaran Pandiyan (6):
      drm/i915/psr: Nuke PSR support for VLV and CHV
      drm/i915/psr: Avoid DPCD reads when panel does not support PSR
      drm/i915/psr: Check for SET_POWER_CAPABLE bit at PSR init time.
      drm/i915/psr: Avoid unnecessary DPCD read of DP_PSR_CAPS
      drm/i915/psr: Fall back to max. synchronization latency if DPCD read fails
      drm/i915/psr: Fix ALPM cap check for PSR2

Vathsala Nagaraju (1):
      drm/i915/psr: vbt change for psr

Yunwei Zhang (3):
      drm/i915/cnl: Implement WaProgramMgsrForCorrectSliceSpecificMmioReads
      drm/i915/icl: Enable WaProgramMgsrForCorrectSliceSpecificMmioReads
      drm/i915: Implement WaProgramMgsrForL3BankSpecificMmioReads


Are the failures on PSR capable systems by any chance?

Comment 2 Maarten Lankhorst 2018-05-28 10:05:36 UTC

Seems to be the case, all the systems have PSR in common. Does the issue trigger on drm-intel-next-queued runs as well?

Comment 3 Tomi Sarvela 2018-05-28 12:01:23 UTC

We don't really know, because the shards don't have PSR panels, and there is no "shard-run" for anything other than DRM-Tip.

Comment 4 Jani Nikula 2018-05-29 10:38:14 UTC

Trimmed list of suspects based on gut feeling.

(In reply to Maarten Lankhorst from comment #1)
> Dhinakaran Pandiyan (1):
>       drm/psr: Fix missed entry in PSR setup time table.
> 
> Ville Syrjälä (1):
>       drm/i915: Restore planes after load detection
> 
> Dhinakaran Pandiyan (6):
>       drm/i915/psr: Nuke PSR support for VLV and CHV
>       drm/i915/psr: Avoid DPCD reads when panel does not support PSR
>       drm/i915/psr: Check for SET_POWER_CAPABLE bit at PSR init time.
>       drm/i915/psr: Avoid unnecessary DPCD read of DP_PSR_CAPS
>       drm/i915/psr: Fall back to max. synchronization latency if DPCD read
> fails
>       drm/i915/psr: Fix ALPM cap check for PSR2
> 
> Vathsala Nagaraju (1):
>       drm/i915/psr: vbt change for psr

Comment 5 Dhinakaran Pandiyan 2018-05-29 17:56:38 UTC

(In reply to Martin Peres from comment #0)
> Starting from drmtip_50, 

@martin,
What kind of runs are these? I don't see frontbuffer_tracking: fbcpsr* tests being part of fastfeedback. And isn't the full suite executed only on shards?

> we got over a thousand "*ERROR* Potential atomic
> update failure on pipe *" in dmesg. This is a pretty serious regression :s

Comment 6 Martin Peres 2018-05-29 18:25:04 UTC

(In reply to Dhinakaran Pandiyan from comment #5)
> (In reply to Martin Peres from comment #0)
> > Starting from drmtip_50, 
> 
> @martin,
> What kind of runs are these? I don't see frontbuffer_tracking: fbcpsr* tests
> being part of fastfeedback. And isn't the full suite executed only on shards?

These are the runs with the shards machine's testlist (CI-Full), but executed during the idle time of all the other machines. We get about 4 to 6 of these runs per week.

Comment 7 Dhinakaran Pandiyan 2018-05-29 23:52:11 UTC

"drm/i915/psr: vbt change for psr" changed the exit link training time from 500 us to 2.5 ms on these machines. The frame counter is possibly stuck for a longer duration now and pipe_update_start() is not aware that the counter is stuck and warns.

We've been discussing this problem for some time now and the VBT change appears to have made it more likely to occur.
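
A toy model of the hypothesis above, not the i915 code: it assumes the wait in pipe_update_start() polls the counter and gives up after a fixed timeout. The 1 ms timeout, the 100 us polling step and the function name are assumptions chosen for illustration; only the 500 us and 2.5 ms figures come from this comment.

```python
def counter_moves_before_timeout(stuck_us, timeout_us=1000, step_us=100):
    """Poll a simulated counter that stays stuck for stuck_us microseconds.

    Returns True if the counter resumed before the (assumed) timeout,
    False if the waiter gave up first, which is the case that would warn.
    """
    elapsed_us = 0
    while elapsed_us < timeout_us:
        if elapsed_us >= stuck_us:
            # The counter is running again, so the wait can finish normally.
            return True
        elapsed_us += step_us
    return False


# Exit link training times before and after the VBT change.
for stuck_us in (500, 2500):
    ok = counter_moves_before_timeout(stuck_us)
    print(stuck_us, "us stuck ->", "ok" if ok else "would warn")
```

Under these assumptions the old 500 us value stays inside the wait budget while 2.5 ms does not, which would match the warning becoming far more frequent after the VBT change.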

Related discussion can be found in the April email archives under:
"[Intel-gfx] [RFC] drm/i915: Rework "Potential atomic update error" to handle PSR exit" 

I wish this was caught in pre-merge instead of these drm-tip runs.

Comment 8 Jani Saarinen 2018-05-30 06:50:57 UTC

There were no PSR panels on the shards.

Comment 9 Martin Peres 2018-06-19 13:55:47 UTC

(In reply to Dhinakaran Pandiyan from comment #7)
> "drm/i915/psr: vbt change for psr" changed the exit link training time from
> 500 us to 2.5 ms on these machines. The frame counter is possibly stuck for
> a longer duration now and pipe_update_start() is not aware that the counter
> is stuck and warns.
> 
> We've been discussing this problem for some time now and the VBT change
> appears to have made it more likely to occur.
> 
> Related discussion can be found in the April email archives under:
> "[Intel-gfx] [RFC] drm/i915: Rework "Potential atomic update error" to
> handle PSR exit" 
> 
> I wish this was caught in pre-merge instead of these drm-tip runs.

The failure is still happening... If making a patch to fix this issue is taking too long, why has this patch not been reverted yet?

We need to be more aggressive about keeping the bug count low...

Comment 10 Dhinakaran Pandiyan 2018-07-14 21:48:39 UTC

Fixes merged to drm-tip:
c3d433617d20 drm/i915: Use crtc_state->has_psr instead of CAN_PSR for pipe update
a608987970b9 drm/i915: Wait for PSR exit before checking for vblank evasion

Marking the bug resolved, please re-open if there are "Potential atomic update failure on pipe A" errors on *PSR* machines.
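
The idea behind the two commits, as their titles describe it, can be sketched roughly as follows. This is a simplified model and not the merged code: the function, its parameters and the 1 ms evasion timeout are hypothetical, and the model simply treats the counter as running again once PSR exit has been waited for.

```python
def simulated_pipe_update_start(has_psr, psr_exit_us,
                                wait_for_psr_exit=True,
                                evasion_timeout_us=1000):
    """Return the warnings a simulated pipe update would emit.

    has_psr stands in for the per-CRTC-state flag: only a CRTC that
    actually has PSR enabled can see the counter stuck during PSR exit.
    """
    warnings = []
    counter_stuck_us = psr_exit_us if has_psr else 0

    if has_psr and wait_for_psr_exit:
        # Fixed ordering: let PSR exit complete first, so the counter is
        # running again by the time the vblank evasion check starts.
        counter_stuck_us = 0

    if counter_stuck_us > evasion_timeout_us:
        warnings.append("Potential atomic update failure")
    return warnings


# Old ordering on a PSR machine with the 2.5 ms exit time: warns.
print(simulated_pipe_update_start(True, 2500, wait_for_psr_exit=False))
# New ordering on the same machine: no warning.
print(simulated_pipe_update_start(True, 2500))
```

In this model a machine without PSR never takes the extra wait, which is the point of keying the check on the CRTC state rather than on general PSR capability.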

Comment 11 Francesco Balestrieri 2018-08-04 09:25:05 UTC

Martin, OK to close?

Comment 12 Francesco Balestrieri 2018-08-07 08:12:34 UTC

Not seen in cibuglogger, closing.