Summary: | [SKL] igt/kms_plane/plane-position-covered-pipe-b-plane-2 fail and cause system crash | |
---|---|---|---
Product: | DRI | Reporter: | Olivier Berthier <olivierx.berthier>
Component: | DRM/Intel | Assignee: | cprigent <christophe.prigent>
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: | blocker | |
Priority: | highest | CC: | intel-gfx-bugs, matthew.d.roper, patrik.r.jakobsson
Version: | DRI git | |
Hardware: | x86-64 (AMD64) | |
OS: | Linux (All) | |
Whiteboard: | | |
i915 platform: | SKL | i915 features: | display/atomic
Description
Olivier Berthier
2015-09-29 16:48:00 UTC
Bug scrub: Priority updated, this is a blocker

Created attachment 118887
dmesg log file on drm-intel-testing-2015-10-10

The bug is still reproduced on Sky Lake Y with the drm-intel-testing-2015-10-10 kernel and with the patches:
http://lists.freedesktop.org/archives/intel-gfx/2015-August/074657.html
http://lists.freedesktop.org/archives/intel-gfx/2015-August/074828.html

Setup:
------
kernel: drm-intel-testing 2015-10-10 c38f2c24fb6484fc6900efa6f8d968e8ee964e9c
cairo: 1.14.2 93422b3cb5e0ef8104b8194c8873124ce2f5ea2d
libdrm: 2.4.65 c3496167637e35cf8a52d5e7e53a412e79d80db0
intel-driver: 1.6.1 35858c69166b845c59ca32e19a3dbb0b758df209
libva: 1.6.1 613eb962b45fbbd1526d751e88e0d8897af6c0e0
mesa: 11.0.3 914966befcd57764941405707d8f57d3e7e7f768
xf86-video-intel: 2.99.917 baec802b21387d04aebb10ac29e719a1800c5aa0
xserver: 1.17.2 2123f7682d522619f101b05fb75efa75dabbe371
intel-gpu-tools: origin/master, origin/HEAD bfea74a9f64a900bcb90f946b38746781017449f

I'm seeing the below WARN when testing with displayport connected, previously I would probably get the same hard hang as well.
[ 68.099084] ------------[ cut here ]------------
[ 68.099101] WARNING: CPU: 0 PID: 1261 at drivers/gpu/drm/i915/intel_uncore.c:619 hsw_unclaimed_reg_debug+0x66/0x80 [i915]()
[ 68.099102] Unclaimed register detected after writing to register 0x71240
[ 68.099122] CPU: 0 PID: 1261 Comm: kms_plane Tainted: G U W 4.3.0-rc3-patser+ #4398
[ 68.099124] Hardware name: Intel Corporation Skylake Client platform/Skylake DT DDR4 RVP8, BIOS SKLSE2R1.R00.B084.B02.1505180148 05/18/2015
[ 68.099125] ffffffffc01a8600 ffff880089dbb9c0 ffffffff812d3cfc ffff880089dbba08
[ 68.099127] ffff880089dbb9f8 ffffffff8107e2ad ffff880455b60000 0000000000071240
[ 68.099130] 0000000000071240 ffff880455b60080 0000000000000246 ffff880089dbba58
[ 68.099132] Call Trace:
[ 68.099136] [<ffffffff812d3cfc>] dump_stack+0x4e/0x82
[ 68.099139] [<ffffffff8107e2ad>] warn_slowpath_common+0x7d/0xb0
[ 68.099141] [<ffffffff8107e327>] warn_slowpath_fmt+0x47/0x50
[ 68.099144] [<ffffffff810a417d>] ? get_parent_ip+0xd/0x50
[ 68.099159] [<ffffffffc0124eb6>] hsw_unclaimed_reg_debug+0x66/0x80 [i915]
[ 68.099172] [<ffffffffc0129107>] gen9_write32+0x2a7/0x320 [i915]
[ 68.099181] [<ffffffffc00e6803>] skl_update_wm+0x2e3/0x710 [i915]
[ 68.099190] [<ffffffffc00e7bc9>] intel_update_watermarks+0x19/0x20 [i915]
[ 68.099205] [<ffffffffc014841e>] intel_atomic_commit+0x48e/0x13d0 [i915]
[ 68.099208] [<ffffffff813f3f2f>] ? drm_atomic_check_only+0x13f/0x5b0
[ 68.099211] [<ffffffff813f3a88>] ? drm_atomic_add_affected_connectors+0x88/0xf0
[ 68.099214] [<ffffffff813f43d2>] drm_atomic_commit+0x32/0x50
[ 68.099216] [<ffffffff813d2132>] drm_atomic_helper_set_config+0x72/0xb0
[ 68.099219] [<ffffffff813e440d>] drm_mode_set_config_internal+0x5d/0xf0
[ 68.099220] [<ffffffff813e84f3>] drm_mode_setcrtc+0x183/0x4c0
[ 68.099223] [<ffffffff813da43b>] drm_ioctl+0x12b/0x550
[ 68.099225] [<ffffffff813e8370>] ? drm_mode_setplane+0x1a0/0x1a0
[ 68.099228] [<ffffffff81285ea7>] ? ioctl_has_perm+0xa7/0xc0
[ 68.099230] [<ffffffff811a89fc>] do_vfs_ioctl+0x2fc/0x550
[ 68.099232] [<ffffffff81285f0b>] ? selinux_file_ioctl+0x4b/0xd0
[ 68.099235] [<ffffffff8127b4ae>] ? security_file_ioctl+0x3e/0x60
[ 68.099236] [<ffffffff811a8cc4>] SyS_ioctl+0x74/0x80
[ 68.099239] [<ffffffff817126d7>] entry_SYSCALL_64_fastpath+0x12/0x6a
[ 68.099240] ---[ end trace a22947acc427d3eb ]---

It looks like the disabled watermarks get written after the power well is turned off. It should probably be done before crtc is turned off. I'll add Matt Roper to CC so he can take a look.

(In reply to Maarten Lankhorst from comment #4)
> It looks like the disabled watermarks get written after the power well is
> turned off. It should probably be done before crtc is turned off. I'll add
> Matt Roper to CC so he can take a look.

Yeah, I see the same unclaimed register on BXT too. I don't think it's related to the atomic watermark changes as far as I can tell.

Honestly I'm not terribly familiar with the details of power well handling. It seems to fix the issue if I apply changes like the patch below, but I'm not really sure that's the proper way to be handling it. Paulo can probably comment on whether this is right or not...
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index df22b9c..fdc8a4e 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3619,8 +3619,15 @@ static void skl_update_wm(struct drm_crtc *crtc)
     results->dirty[intel_crtc->pipe] = true;

     skl_update_other_pipe_wm(dev, crtc, &config, results);
+
+    intel_display_power_get(dev_priv, POWER_DOMAIN_PIPE_A);
+    intel_display_power_get(dev_priv, POWER_DOMAIN_PIPE_B);
+    intel_display_power_get(dev_priv, POWER_DOMAIN_PIPE_C);
     skl_write_wm_values(dev_priv, results);
     skl_flush_wm_values(dev_priv, results);
+    intel_display_power_put(dev_priv, POWER_DOMAIN_PIPE_C);
+    intel_display_power_put(dev_priv, POWER_DOMAIN_PIPE_B);
+    intel_display_power_put(dev_priv, POWER_DOMAIN_PIPE_A);

     /* store the new configuration */
     dev_priv->wm.skl_hw = *results;

I think you should just write watermark values before turning off the crtc if the crtc will end up being disabled. After that you should no longer touch the crtc-specific registers since it's turned off and dead.

IIUC this is fixed by

commit 1d337b286098c8e9057854ee59dff05f8ffa81e6
Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Date: Thu Oct 22 13:56:34 2015 +0200

    drm/i915/skl: Prevent unclaimed register writes on skylake.

in drm-intel-fixes, headed for v4.3. If the problem persists with current -fixes or -nightly, please do reopen.

Created attachment 119946
dmesg log file on drm-intel-next-fixes-2015-11-06

Maybe it isn't linked to this bug: the test passes but still causes a system crash with an external DP screen.
Setup:
------
Hardware
Platform: SKY LAKE Y A0
CPU : Intel(R) Core(TM) m5-6Y57 CPU @ 1.10GHz (family: 6, model: 78 stepping: 3)
MCP : SKL-Y D0 2+2 (or ULX-D1)
QDF : QJA4
CPU : SKL D0
Chipset PCH: Sunrise Point LP C1
CRB : SKY LAKE Y LPDDR3 RVP3 CRB FAB2
Reworks : All Mandatories

Software
BIOS : SKLSE2R1.R00.B104.B01.1511110114
ME FW : 11.0.0.1191
Ksc (EC FW): 1.19
Linux : Ubuntu 15.04 64 bits
Kernel : 4.3.0-rc5 drm-intel-next-fixes-2015-11-06
commit 816d2206f0f9953ca854e4ff1a2749a5cbd62715
Merge: d0baf92 1b0e3a0
Author: Dave Airlie <airlied@gmail.com>
Date: Sat Nov 7 17:16:59 2015 +1000

    Merge tag 'drm-intel-next-fixes-2015-11-06' of git://anongit.freedesktop.org/drm-in

cairo: 1.14.2
drm: 2.4.65
vaapi/intel-driver: 1.6.1
vaapi/libva: 1.6.1
mesa: mesa-11.0.5
xf86-video-intel: 2.99.917
xserver: xorg-server-1.17.2
Intel GPU Tools: master bfea74a9f64a900bcb90f946b38746781017449f

The hang and the wm_changed warning look like two unrelated issues. On my SKL-Y the hang occurs only when using DP - Pipe B - Plane A+B. All other combinations I've tried so far work fine. Running with disable_power_well=0 makes no difference, so I'm guessing watermarks are the problem here.

Can you try out the latest -nightly? I wasn't able to reproduce the bug with our Skylake-Y. This patch, which recently landed in -nightly, might have a positive effect:
https://patchwork.freedesktop.org/patch/66442/

Tried with drm-intel-nightly from yesterday (2015-11-30) but the system still hangs. CATERR_LED lights up, if that is of any help. Didn't see any fifo underrun warnings this time, so that might be an improvement at least.

Is this still an issue?

I haven't tested in a week but last time I looked I found that this is a regression and bisected it to:

commit 942840371cde152fe57c15e0e8483b760e7763e3
Author: Matt Roper <matthew.d.roper@intel.com>
Date: Mon Sep 21 17:21:48 2015 -0700

It seems that at some point we're unreferencing a framebuffer that is in use.
(In reply to Patrik Jakobsson from comment #13)
> I haven't tested in a week but last time I looked I found that this is a
> regression and bisected it to:
> commit 942840371cde152fe57c15e0e8483b760e7763e3
> Author: Matt Roper <matthew.d.roper@intel.com>
> Date: Mon Sep 21 17:21:48 2015 -0700
>
> It seems that at some point we're unreferencing a framebuffer that is in use.

Has fb reference counting always been an issue since this bug was opened back in September or is that a relatively recent discovery? I just fixed a bug that could lead to refcount problems with http://patchwork.freedesktop.org/patch/68713/ although I feel like that bug was introduced too recently to be the original root cause here. I'm wondering if it could be multiple bugs contributing to the behavior we're seeing rather than a single bug.

It seems to have been a problem since drm enabled atomic fbdev restore. I've rerun the bisect several times but unfortunately I now get varying results. So we have points where the problem is less visible after 9428403 (possibly due to i915 atomic changes) but it still looks like refcounting of fbs is the problem here.

Is this not fixed with commit 7118fd9bd975a9f309323?

Assigning to QA to verify whether fixed with commit 7118fd9bd975a9f309323

I just tested with latest nightly (7118fd9bd975a9f309323 included) but I still get the hang.
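The refcounting concern raised above (a framebuffer being unreferenced while a plane still uses it) can be illustrated with a toy model. This is only a minimal sketch under stated assumptions: `toy_fb`, `toy_plane` and every function below are hypothetical stand-ins invented for illustration, not the real DRM structures or helpers, and nothing here claims to reproduce the actual bug in this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a framebuffer object (not struct drm_framebuffer). */
struct toy_fb {
    int refcount;   /* number of current holders */
    bool released;  /* set once the last reference is dropped */
};

/* Hypothetical stand-in for a plane that scans out of a framebuffer. */
struct toy_plane {
    struct toy_fb *fb;
};

static void toy_fb_init(struct toy_fb *fb)
{
    fb->refcount = 1;       /* the creator holds the first reference */
    fb->released = false;
}

static void toy_fb_get(struct toy_fb *fb)
{
    fb->refcount++;
}

static void toy_fb_put(struct toy_fb *fb)
{
    if (--fb->refcount == 0)
        fb->released = true;  /* a real driver would free the object here */
}

/* Balanced variant: the plane takes its own reference before storing the pointer. */
static void toy_plane_set_fb(struct toy_plane *plane, struct toy_fb *fb)
{
    if (fb)
        toy_fb_get(fb);
    if (plane->fb)
        toy_fb_put(plane->fb);
    plane->fb = fb;
}

/* Unbalanced variant: the pointer is stored without taking a reference. */
static void toy_plane_set_fb_buggy(struct toy_plane *plane, struct toy_fb *fb)
{
    plane->fb = fb;
}

/* True if the plane still points at a framebuffer that has been released. */
static bool toy_plane_fb_dangling(const struct toy_plane *plane)
{
    return plane->fb != NULL && plane->fb->released;
}
```

In the balanced variant the creator can drop its reference and the plane's framebuffer stays alive; in the unbalanced variant the same sequence leaves the plane pointing at a released object, which is the kind of state that could plausibly end in a hard hang.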
Created attachment 121455
4.5-rc1_nuc-skly_kms-flip_kern.log

I don't reproduce the crash (tested 10 times); the test succeeds, but with the following log:

./kms_plane --run-subtest plane-position-covered-pipe-B-plane-2
IGT-Version: 1.13-NOT-GIT (x86_64) (Linux: 4.5.0-rc1-nightly+ x86_64)
Testing connector HDMI-A-1 using pipe B plane 2
Testing connector DP-1 using pipe B plane 2
Subtest plane-position-covered-pipe-B-plane-2: SUCCESS (0.746s)

[ 33.164446] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
[ 33.178989] [drm:intel_atomic_commit [i915]] *ERROR* mismatch in dpll_hw_state.cfgcr1 (expected 0x80400173, found 0x000003a5)
[ 33.179006] [drm:intel_atomic_commit [i915]] *ERROR* mismatch in base.adjusted_mode.crtc_clock (expected 148500, found 168400)
[ 33.179020] [drm:intel_atomic_commit [i915]] *ERROR* mismatch in port_clock (expected 148500, found 168400)

Hardware
Platform: NUC6i3SYH
CPU: Intel(R) Core(TM) i3-6100U CPU @ 2.30GHZ (family 6, model 78, stepping 3)
Motherboard version: H81132-502
GPU: Intel® HD Graphics 520 - Intel Corporation Sky Lake Integrated Graphics (rev 07)
Memory: one 8GB card Kingston KVR21S15D8/8
SSD: Samsung 850 EVO M.2 120 GB

Software
Bios: SYSKLi35.86A.0024.2015.1027.2142
Linux distribution: Ubuntu 15.10 64 bits
Kernel: drm-intel-nightly 4.5.0-rc1 5d3deb0 from http://cgit.freedesktop.org/drm-intel
commit 5d3deb0902a962218ad9b0e583e4d1bbdec29f9a
Author: Rodrigo Vivi <rodrigo.vivi@intel.com>
Date: Mon Feb 1 12:05:18 2016 -0800

    drm-intel-nightly: 2016y-02m-01d-20h-05m-03s UTC integration manifest

drm: tag libdrm-2.4.66 e342c0f from http://cgit.freedesktop.org/mesa/drm/
mesa: tag mesa-11.0.8 261daab from http://cgit.freedesktop.org/mesa/mesa/
cairo: tag 1.15.2 db8a7f1 from http://cgit.freedesktop.org/cairo
waffle: master bb29b2a from https://github.com/waffle-gl/waffle
xorg-server-macros: master d7acec2 from git://git.freedesktop.org/git/xorg/util/macros
libva: tag libva-1.6.1 cb418f6 from http://cgit.freedesktop.org/libva/
vaapi-intel-driver: tag 1.6.1 2110b3a from http://cgit.freedesktop.org/vaapi/intel-driver

I still see the hang with nightly 2016-02-10.

System info
Platform: Skylake Y RVP3 D0/C1
CPU: Intel(R) Core(TM) m7-6Y75 CPU @ 1.20GHz
BIOS: 94.4 (SKLSE2R1.R00.B094.B04.1508102148)
EC: 1.15
Monitors:
eDP: 3200x1800
DP: 1600x900

I know I'm on a rather old BIOS and EC. Will upgrade to see if that makes a difference.

Reassign to Patrik Jakobsson to check with the latest BIOS and EC.

The following patch by Matt fixes the problem on my machine (note that it's not yet merged):
https://patchwork.freedesktop.org/patch/79170/

Assigning to QA for verification

This test is passing under the following configuration.

Software configuration
=======================
Linux distribution: Ubuntu 15.10 64 bits
Kernel: drm-intel-nightly 4.6.0-rc3_d9131d6 from http://cgit.freedesktop.org/drm-intel/
commit d9131d62d18ba94fb3ca019f1156c22b5f4ce23c
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date: Fri Apr 15 14:54:26 2016 +0100

    drm-intel-nightly: 2016y-04m-15d-13h-53m-44s UTC integration manifest

drm: tag libdrm-2.4.66-33-gf884af9
libdrm 2.4.67-25 cc9a53f from git://git.freedesktop.org/git/mesa/drm
mesa 11.1.2 7bcd827 from git://git.freedesktop.org/git/mesa/mesa
cairo 1.15.2 db8a7f1 from git://git.freedesktop.org/git/cairo
xorg/xserver 1.18.0-274 8437955 from git://git.freedesktop.org/git/xorg/xserver
xf86-video-intel 2.99.917-634 81029be from git://git.freedesktop.org/git/xorg/driver/xf86-video-intel
vaapi/libva 1.7.0-1 2339d10 from git://git.freedesktop.org/git/vaapi/libva
vaapi/intel-driver 1.7.0-8 2c1bec0 from git://git.freedesktop.org/git/vaapi/intel-driver
intel-gpu-tool 1.14 7bd2ac6 from http://anongit.freedesktop.org/git/xorg/app/intel-gpu-tools.git
rendercheck master 44032a7 from http://anongit.freedesktop.org/git/xorg/app/rendercheck.git

test output
==============
#./kms_plane --run-subtest plane-position-covered-pipe-B-plane-2
IGT-Version: 1.14-g41a26b5 (x86_64) (Linux: 4.6.0-rc3-nightly+ x86_64)
Testing connector eDP-1 using pipe B plane 2
Testing connector HDMI-A-1 using pipe B plane 2
Subtest plane-position-covered-pipe-B-plane-2: SUCCESS (3.746s)
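For readers following the earlier unclaimed-register discussion, the two approaches mentioned in the thread (holding a pipe power reference around the watermark write, versus writing the disabled watermarks before the pipe is powered down) can be sketched with a toy model. Everything here is an assumption made for illustration: `toy_display`, `toy_power_get`, `toy_write_wm` and the other names are hypothetical and are not the i915 functions, and the model ignores that real hardware loses register state when a power well is off.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_pipe { TOY_PIPE_A, TOY_PIPE_B, TOY_PIPE_C, TOY_PIPE_COUNT };

/* Hypothetical model of per-pipe power references and watermark registers. */
struct toy_display {
    int power_refs[TOY_PIPE_COUNT];  /* reference count per pipe power domain */
    unsigned wm[TOY_PIPE_COUNT];     /* pretend watermark register per pipe */
    int unclaimed_writes;            /* writes that hit a powered-down pipe */
};

static void toy_power_get(struct toy_display *d, enum toy_pipe p)
{
    d->power_refs[p]++;
}

static void toy_power_put(struct toy_display *d, enum toy_pipe p)
{
    assert(d->power_refs[p] > 0);
    d->power_refs[p]--;
}

/* A raw register write: counted as "unclaimed" if the pipe has no power reference. */
static void toy_write_wm(struct toy_display *d, enum toy_pipe p, unsigned val)
{
    if (d->power_refs[p] == 0) {
        d->unclaimed_writes++;
        return;
    }
    d->wm[p] = val;
}

/* Approach 1 (as in the proposed patch): hold a reference for the duration of the write. */
static void toy_write_wm_guarded(struct toy_display *d, enum toy_pipe p, unsigned val)
{
    toy_power_get(d, p);
    toy_write_wm(d, p, val);
    toy_power_put(d, p);
}

/* Approach 2 (as suggested in review): write the disabled value first, then power down. */
static void toy_disable_pipe(struct toy_display *d, enum toy_pipe p)
{
    toy_write_wm(d, p, 0);
    toy_power_put(d, p);
}
```

The second approach avoids powering a pipe back up only to program registers for a pipe that is about to be, or already is, disabled, which matches the objection raised against the get/put patch.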