Bug 88909

Summary: [all bisected] igt/kms_flip/bo-too-big-interruptible causes system hang
Product: DRI
Component: DRM/Intel
Status: CLOSED FIXED
Severity: critical
Priority: high
Version: DRI git
Hardware: Other
OS: All
Reporter: Ding Heng <hengx.ding>
Assignee: Matt Roper <matthew.d.roper>
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
CC: daniel, intel-gfx-bugs
Whiteboard:
i915 platform:
i915 features:
Attachments:
  dmesg (flags: none)

Description Ding Heng 2015-02-02 06:25:40 UTC
Created attachment 113032
dmesg

==System Environment==
--------------------------
Regression: yes
Non-working platforms:  BYT
==kernel==
--------------------------
drm-intel-nightly/8b4216f91c7bf8d3459cadf9480116220bd6545e(2015-02-02)

==Bug detailed description==
-----------------------------
igt/kms_flip/bo-too-big-interruptible fails with a segmentation fault and produces abnormal output in dmesg.
Running igt/kms_flip/bo-too-big causes similar output in dmesg.

./kms_flip --run-subtest bo-too-big-interruptible
IGT-Version: 1.9-g51d87b8 (x86_64) (Linux: 3.19.0-rc6_drm-intel-nightly_8b4216_20150202+ x86_64)
Using monotonic timestamps
Beginning bo-too-big-interruptible on crtc 20, connector 38
  1920x1080 60 1920 1966 1996 2080 1080 1082 1086 1112 0xa 0x48 138780
Segmentation fault

==Reproduce steps==
---------------------------- 
1. ./kms_flip --run-subtest bo-too-big-interruptible
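
Because the subtest can hang rather than exit, it may help to run the reproduce step under a watchdog. The following is a minimal sketch, not part of igt: `run_with_timeout` and its parameters are hypothetical names, and the timeout values are arbitrary. It starts the command, waits up to `timeout_s` seconds, sends SIGKILL on timeout, and reports "stuck" if the process still cannot be reaped, which can happen when a task is blocked inside the kernel.

```python
import subprocess


def run_with_timeout(cmd, timeout_s, reap_s=5.0):
    """Run cmd and classify the outcome.

    Returns ("exited", returncode) if the command finished on its own,
    ("killed", returncode) if it had to be killed after timeout_s seconds,
    or ("stuck", None) if it could not be reaped within reap_s seconds
    even after being killed.
    """
    proc = subprocess.Popen(cmd)
    try:
        return ("exited", proc.wait(timeout=timeout_s))
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX systems
        try:
            return ("killed", proc.wait(timeout=reap_s))
        except subprocess.TimeoutExpired:
            # The process did not go away; the caller should treat the
            # machine as suspect and collect dmesg over ssh if possible.
            return ("stuck", None)
```

For this bug the call would look like `run_with_timeout(["./kms_flip", "--run-subtest", "bo-too-big-interruptible"], 600)`. If the whole machine locks up, the watchdog itself will not run, so this only helps in the case where the system stays reachable.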
Comment 1 Ding Heng 2015-02-02 06:29:19 UTC
a679064a7e9e8799177a64a31668a34a1bc6a4f1 is the first bad commit.
commit a679064a7e9e8799177a64a31668a34a1bc6a4f1
Author:     Matt Roper <matthew.d.roper@intel.com>
AuthorDate: Fri Jan 30 16:22:37 2015 -0800
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Sat Jan 31 10:35:46 2015 +0100

    drm/i915: Switch planes from transitional helpers to full atomic helpers
    
    There are two sets of helper functions provided by the DRM core that can
    implement the .update_plane() and .disable_plane() hooks in terms of a
    driver's atomic entrypoints.  The transitional helpers (which we have
    been using so far) create a plane state and then use the plane's atomic
    entrypoints to perform the atomic begin/check/prepare/commit/finish
    sequence on that single plane only.  The full atomic helpers create a
    top-level atomic state (which is capable of holding multiple object
    states for planes, crtc's, and/or connectors) and then passes the
    top-level atomic state through the full "atomic modeset" pipeline.
    
    Switching from the transitional to full helpers here shouldn't result in
    any functional change, but will enable us to exercise/test more of the
    internal atomic pipeline with the legacy API's used by existing
    applications.
    
    Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Comment 2 lu hua 2015-02-02 08:52:12 UTC
It impacts all platforms. I tested it on SNB and PNV, and the system hangs on both. They bisect to the same commit.
On SNB, the test runs for more than 10 minutes without exiting, and Ctrl+C does not stop it. I can still connect via ssh and reboot.
On PNV, the system becomes unresponsive.
Output on SNB:
IGT-Version: 1.9-g51d87b8 (x86_64) (Linux: 3.19.0-rc5_kcloud_a67906_20150202+ x86_64)
Using monotonic timestamps
Beginning bo-too-big on crtc 19, connector 25
  1600x900 60 1600 1664 1706 1970 900 903 906 912 0xa 0x48 107800
^C^C^C^C^C^C^C^C^C^C

output on PNV:
IGT-Version: 1.9-g51d87b8 (i686) (Linux: 3.19.0-rc6_drm-intel-nightly_8b4216_20150201+ i686)
Using monotonic timestamps
Beginning bo-too-big on crtc 22, connector 23
  1024x600 60 1024 1072 1104 1200 600 603 609 625 0xa 0x48 45000

Message from syslogd@x-pnv2 at Feb  2 16:12:53 ...
 kernel:[20110.832007] CPU: 0 PID: 1596 Comm: kms_flip Tainted: G        W      3.19.0-rc6_drm-intel-nightly_8b4216_20150201+ #90

Message from syslogd@x-pnv2 at Feb  2 16:12:53 ...
 kernel:[20110.832007] Hardware name: MICRO-STAR INTERNATIONAL CO., LTD MS-N014/MS-N014, BIOS EN014IMS.10B 11/30/2009

Message from syslogd@x-pnv2 at Feb  2 16:12:53 ...
 kernel:[20110.832007] task: f634e300 ti: f08f6000 task.ti: f08f6000

Message from syslogd@x-pnv2 at Feb  2 16:12:53 ...
 kernel:[20110.832007] Stack:

Message from syslogd@x-pnv2 at Feb  2 16:12:53 ...
 kernel:[20110.832007] Call Trace:

Message from syslogd@x-pnv2 at Feb  2 16:12:53 ...
 kernel:[20110.832007] Code: cb 38 f8 68 b1 10 00 00 68 02 c2 38 f8 e8 32 3a d1 c8 83 c4 0c eb c7 5a 5b 5e c3 56 53 89 c3 56 e8 3d ff ff ff 85 c0 89 c6 75 02 <0f> 0b 8a 40 60 88 44 24 02 24 0f 88 44 24 03 75 02 0f 0b 8b 43

Message from syslogd@x-pnv2 at Feb  2 16:12:53 ...
 kernel:[20110.832007] EIP: [<f831f24a>] i915_gem_object_ggtt_unpin+0x10/0x61 [i915] SS:ESP 0068:f08f7cc8
Comment 3 Jani Nikula 2015-02-03 12:38:58 UTC
Matt, any ideas?
Comment 4 Matt Roper 2015-02-03 21:11:35 UTC
(In reply to Jani Nikula from comment #3)
> Matt, any ideas?

I think this should fix it:
  http://patchwork.freedesktop.org/patch/41682/
Comment 5 Jani Nikula 2015-02-04 12:59:43 UTC
commit 706dc7b549175e47f23e913b7f1e52874a7d0f56
Author: Matt Roper <matthew.d.roper@intel.com>
Date:   Tue Feb 3 13:10:04 2015 -0800

    drm/i915: Ensure plane->state->fb stays in sync with plane->fb

Please reopen if the problem persists.
Comment 6 Ding Heng 2015-02-06 06:04:49 UTC
(In reply to Jani Nikula from comment #5)
> commit 706dc7b549175e47f23e913b7f1e52874a7d0f56
> Author: Matt Roper <matthew.d.roper@intel.com>
> Date:   Tue Feb 3 13:10:04 2015 -0800
>
>     drm/i915: Ensure plane->state->fb stays in sync with plane->fb
>
> Please reopen if the problem persists.

Passes on this commit. Changing state to verified.
Comment 7 Jani Nikula 2015-03-13 13:43:06 UTC
Also fixed in drm-intel-fixes for v4.0-rc by

commit 2dccc9898d45cd552f372c3f0b4a7f42126312f1
Author: Xi Ruoyao <xry111@outlook.com>
Date:   Thu Mar 12 20:16:32 2015 +0800

    drm/i915: Ensure plane->state->fb stays in sync with plane->fb
Comment 8 Andreas Reis 2015-03-18 22:11:24 UTC
This fix causes my Haswell to fail to boot (it instantly freezes on udev's "starting version 219" message) when both of two HDMI monitors are connected.

If only one is connected during the entire boot process, the kernel boots as usual. It still freezes if the second one is disconnected only after power-on.
Comment 9 Josh Boyer 2015-03-23 19:02:40 UTC
I'm also having issues with this in 4.0-rc5 on a headless NUC machine as reported on lkml and intel-gfx.  It looks like the reference counting is at a minimum causing kref_get to be very unhappy, and then things go downhill from there.  I reverted 319c1d420a0b62 from 4.0-rc5 and the machine boots fine then.
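
For readers unfamiliar with the invariant named in the fix's subject line (plane->state->fb staying in sync with plane->fb), here is a toy model of it. This is only a sketch and not the i915 patch: the classes and the `set_state_fb` / `sync_state_fb` helpers are hypothetical names, and the integer counter merely stands in for a kref. The idea is that whenever the legacy pointer and the state pointer disagree, the state pointer is moved over, taking a reference on the new framebuffer and dropping the one held on the old.

```python
class Framebuffer:
    """Toy stand-in for a reference-counted framebuffer."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1  # reference held by the creator

    def get(self):
        # Taking a reference on an already-freed object is a bug.
        if self.refcount <= 0:
            raise RuntimeError(f"get() on freed framebuffer {self.name}")
        self.refcount += 1

    def put(self):
        if self.refcount <= 0:
            raise RuntimeError(f"put() on freed framebuffer {self.name}")
        self.refcount -= 1


class PlaneState:
    def __init__(self):
        self.fb = None  # models plane->state->fb


class Plane:
    def __init__(self):
        self.fb = None  # models the legacy plane->fb pointer
        self.state = PlaneState()


def set_state_fb(state, fb):
    """Point state.fb at fb, taking the new reference before dropping the old."""
    if fb is not None:
        fb.get()
    if state.fb is not None:
        state.fb.put()
    state.fb = fb


def sync_state_fb(plane):
    """Bring plane.state.fb back in line with plane.fb if they differ."""
    if plane.state.fb is not plane.fb:
        set_state_fb(plane.state, plane.fb)
```

In this model every pointer held by the state is backed by exactly one reference, so calling `sync_state_fb` repeatedly is harmless. A get without a matching put (or the reverse) is the kind of imbalance that a refcount check would complain about.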
Comment 10 Andreas Reis 2015-03-29 21:16:00 UTC
Everything back to normal for me with drm-intel-fixes-2015-03-26.
Comment 11 Matt Roper 2015-04-01 22:32:18 UTC
Andreas confirmed this is fixed on drm-intel-fixes and I believe -rc6 should also be fixed now.  Please re-open if anyone still has problems.
Comment 12 Ding Heng 2015-04-02 02:58:35 UTC
(In reply to Andreas Reis from comment #10)
> Everything back to normal for me with drm-intel-fixes-2015-03-26.

Passes on the latest nightly branch and on the fixes branch. Closing this bug.
