93699 – [BAT BDW] sporadic WARNING: "Device suspended during HW access" during kms_pipe_crc_basic on pipe B/C

Bug 93699 - [BAT BDW] sporadic WARNING: "Device suspended during HW access" during kms_pipe_crc_basic on pipe B/C

Summary: [BAT BDW] sporadic WARNING: "Device suspended during HW access" during kms_pi...

Status:	CLOSED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel (show other bugs)
Version:	XOrg git
Hardware:	Other All

Importance:	highest normal
Assignee:	Imre Deak
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2016-01-13 15:11 UTC by Daniel Vetter
Modified:	2017-07-24 22:43 UTC (History)
CC List:	1 user (show)

See Also:
i915 platform:
i915 features:

Attachments

Description Daniel Vetter 2016-01-13 15:11:08 UTC

This seems to be a fairly sporadic issue on bdw-ultra. See long-term view for affected testcases:

/archive/results/CI_IGT_test/igt@kms_pipe_crc_basic@read-crc-pipe-b.html

/archive/results/CI_IGT_test/igt@kms_pipe_crc_basic@read-crc-pipe-c.html

I noticed this since it hit one of my own patch series. The testcase itself passes, it's just dmesg backtrace that's upsetting. Backtrace looks like

[  308.179223] ------------[ cut here ]------------
[  308.179292] WARNING: CPU: 1 PID: 6179 at drivers/gpu/drm/i915/intel_drv.h:1467 gen8_write32+0x230/0x2f0 [i915]()
[  308.179297] Device suspended during HW access
[  308.179300] Modules linked in: i915 ax88179_178a i2c_hid x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul cdc_ncm usbnet mii mei_me mei lpc_ich i2c_designware_platform i2c_designware_core e1000e sdhci_pci ptp pps_core sdhci_acpi sdhci mmc_core [last unloaded: i915]
[  308.179356] CPU: 1 PID: 6179 Comm: kms_pipe_crc_ba Tainted: G     U          4.4.0-gfxbench+ #1
[  308.179361] Hardware name: Intel Corporation Broadwell Client platform/Wilson Beach SDS, BIOS BDW-E2R1.86C.0095.R09.1410300006 10/30/2014
[  308.179365]  ffffffffa03b8db0 ffff880242d97908 ffffffff813df90c ffff880242d97950
[  308.179376]  ffff880242d97940 ffffffff810746e1 0000000000045110 ffff880089cf0000
[  308.179385]  ffff8800aa6d67b0 0000000008670a00 ffff880242d97a40 ffff880242d979a0
[  308.179394] Call Trace:
[  308.179406]  [<ffffffff813df90c>] dump_stack+0x4e/0x82
[  308.179414]  [<ffffffff810746e1>] warn_slowpath_common+0x81/0xc0
[  308.179419]  [<ffffffff81074767>] warn_slowpath_fmt+0x47/0x50
[  308.179466]  [<ffffffffa03226c0>] gen8_write32+0x230/0x2f0 [i915]
[  308.179491]  [<ffffffffa02d3988>] _ilk_disable_lp_wm+0x98/0xd0 [i915]
[  308.179517]  [<ffffffffa02d8ebf>] ilk_program_watermarks+0x3ef/0x880 [i915]
[  308.179544]  [<ffffffffa02d9582>] ilk_initial_watermarks+0x102/0x120 [i915]
[  308.179594]  [<ffffffffa0344c91>] intel_pre_plane_update+0x81/0xf0 [i915]
[  308.179642]  [<ffffffffa0345635>] intel_atomic_commit+0x525/0x17c0 [i915]
[  308.179652]  [<ffffffff8150dc45>] ? drm_atomic_check_only+0x145/0x660
[  308.179658]  [<ffffffff8150d728>] ? drm_atomic_set_crtc_for_connector+0x38/0xe0
[  308.179664]  [<ffffffff8150e192>] drm_atomic_commit+0x32/0x50
[  308.179673]  [<ffffffff814eb155>] drm_atomic_helper_set_config+0x75/0xb0
[  308.179680]  [<ffffffff814fd090>] drm_mode_set_config_internal+0x60/0x110
[  308.179687]  [<ffffffff81501e26>] drm_mode_setcrtc+0x186/0x4f0
[  308.179693]  [<ffffffff81183ef8>] ? __might_fault+0x48/0xa0
[  308.179700]  [<ffffffff814f3eed>] drm_ioctl+0x13d/0x590
[  308.179707]  [<ffffffff81501ca0>] ? drm_mode_setplane+0x1b0/0x1b0
[  308.179714]  [<ffffffff811d4c4c>] do_vfs_ioctl+0x2fc/0x550
[  308.179722]  [<ffffffff8179a8a6>] ? int_ret_from_sys_call+0x52/0x9f
[  308.179730]  [<ffffffff811e06ba>] ? __fget_light+0x6a/0x90
[  308.179735]  [<ffffffff811d4edc>] SyS_ioctl+0x3c/0x70
[  308.179741]  [<ffffffff8179a71b>] entry_SYSCALL_64_fastpath+0x16/0x73
[  308.179747] ---[ end trace d0b8db053feb462f ]---

Comment 1 Ville Syrjala 2016-01-13 15:29:32 UTC

I'd say it's the same problem as in bug #93698, namely that the modeset path calls intel_pre_plane_update() somewhat unconditionally. A quick glance suggests that we should just:

--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13831,9 +13831,8 @@ static int intel_atomic_commit(struct drm_device *dev,
                if (!needs_modeset(crtc->state))
                        continue;
 
-               intel_pre_plane_update(intel_crtc);
-
                if (crtc_state->active) {
+                       intel_pre_plane_update(intel_crtc);
                        intel_crtc_disable_planes(crtc, crtc_state->plane_mask);
                        dev_priv->display.crtc_disable(crtc);
                        intel_crtc->active = false;
@@ -13885,12 +13884,12 @@ static int intel_atomic_commit(struct drm_device *dev,
                        hw_check = true;
                }
 
-               if (!modeset)
-                       intel_pre_plane_update(intel_crtc);
-
                if (crtc->state->active &&
-                   (crtc->state->planes_changed || update_pipe))
+                   (crtc->state->planes_changed || update_pipe)) {
+                       if (!modeset)
+                               intel_pre_plane_update(intel_crtc);
                        drm_atomic_helper_commit_planes_on_crtc(crtc_state);
+               }
 
                if (put_domains)
                        modeset_put_power_domains(dev_priv, put_domains);

but it's bit of mess all around so it's hard to be sure without more digging.

Comment 2 Daniel Vetter 2016-01-19 18:52:34 UTC

Regression because it's a new WARNING causing all kinds of CI trouble.

Imre, please revert/tune down to debug level asap.

Comment 3 Imre Deak 2016-01-19 18:57:36 UTC

(In reply to Daniel Vetter from comment #2)
> Regression because it's a new WARNING causing all kinds of CI trouble.
> 
> Imre, please revert/tune down to debug level asap.

The assert that checks if the device is suspended during HW has been always around. My wakeref check patches added an additional check that checks for the wakeref reference count, which is a separate thing.

Comment 4 Daniel Vetter 2016-01-19 19:04:25 UTC

(In reply to Imre Deak from comment #3)
> (In reply to Daniel Vetter from comment #2)
> > Regression because it's a new WARNING causing all kinds of CI trouble.
> > 
> > Imre, please revert/tune down to debug level asap.
> 
> The assert that checks if the device is suspended during HW has been always
> around. My wakeref check patches added an additional check that checks for
> the wakeref reference count, which is a separate thing.

Sorry, mixed it up with another bug. We still need to remedy this somehow, since it's causing piles of noise in CI due to the WARN_ONCE.

Comment 5 Imre Deak 2016-02-10 11:42:11 UTC

I can't see this error any more on the CI results, so I assume something has fixed the actual problem. I consider a WARN or at least ERROR message still useful to catch regressions in the future, so until it causes problems for CI again I would leave it as-is and close this bug.

Daniel, if you disagree please reopen, but also consider using error instead of debug level.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.