Bug 89388 - [all bisected]system hang and BUG: unable to handle kernel NULL pointer dereference at 0000000000000084 when kill X
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: All Linux (All)
Importance: high critical
Assignee: Matt Roper
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-03-02 05:47 UTC by lu hua
Modified: 2017-10-06 14:31 UTC
CC List: 4 users

See Also:
i915 platform:
i915 features:


Attachments
dmesg (119.37 KB, text/plain)
2015-03-02 05:47 UTC, lu hua
Xorg.0.log (15.19 KB, text/plain)
2015-03-02 05:48 UTC, lu hua

Description lu hua 2015-03-02 05:47:51 UTC
Created attachment 113904
dmesg

==System Environment==
--------------------------
Regression: Yes, regression on drm-intel-next-queued branch.
good commit: cda54fe1188d4900843c0616acab7fb9c2989eef
bad commit: fd2d61341bf39d1054256c07d6eddd624ebc4241

Non-working platforms: all

Libdrm:         (master)libdrm-2.4.59-31-gf799a527db2851b2890146a9ce777f73fea30176
Mesa:           (master)b51ff50a767cc78d678ed3d2c25995f5c4194fea
Xf86_video_intel:   (master)2.99.917-164-g9fb815462902a1d2047e135cf5037f47eb0d83d2
Xserver:        (master)xorg-server-1.17.0-16-g3a06faf3fcdb7451125a46181f9152e8e59e9770
Libva:          (master)f9741725839ea144e9a6a1827f74503ee39946c3
Libva_intel_driver:   (master)e8fde1cdaafb93c2b54d6092a728d099ad7cdd11

==kernel==
--------------------------
drm-intel-nightly/855932144a48a66081a62288bea6f2bbbf48e2e7
commit 855932144a48a66081a62288bea6f2bbbf48e2e7
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Fri Feb 27 20:02:52 2015 +0100

    drm-intel-nightly: 2015y-02m-27d-18h-58m-02s UTC integration manifest

==Bug detailed description==
-----------------------------
Start X, then kill X: the system hangs.

dmesg:
[   41.365871] BUG: unable to handle kernel NULL pointer dereference at 0000000000000084
[   41.366821] IP: [<ffffffffa007fbc7>] ilk_update_wm+0x11c/0xa81 [i915]
[   41.367783] PGD 148dfb067 PUD 1444c7067 PMD 0
[   41.368736] Oops: 0000 [#1] SMP
[   41.369681] Modules linked in: dm_mod iTCO_wdt iTCO_vendor_support ppdev snd_hda_codec_hdmi pcspkr i2c_i801 snd_hda_intel lpc_ich snd_hda_controller mfd_core snd_hda_codec snd_hwdep snd_pcm snd_timer snd soundcore battery parport_pc parport ac acpi_cpufreq i915 button video drm_kms_helper drm
[   41.371877] CPU: 1 PID: 4823 Comm: X Not tainted 4.0.0-rc1_drm-intel-nightly_855932_20150228+ #10
[   41.372996] task: ffff8800a44828f0 ti: ffff8801492c8000 task.ti: ffff8801492c8000
[   41.374129] RIP: 0010:[<ffffffffa007fbc7>]  [<ffffffffa007fbc7>] ilk_update_wm+0x11c/0xa81 [i915]
[   41.375283] RSP: 0018:ffff8801492cb978  EFLAGS: 00010246
[   41.376429] RAX: 0000000000000000 RBX: ffff880144660000 RCX: ffff8801444d1c00
[   41.377583] RDX: 0000000000021e1c RSI: 0000000000000008 RDI: ffff88014952e000
[   41.378734] RBP: ffff8800a7ca5000 R08: ffff8800a7ca5358 R09: ffff8801493fc4f0
[   41.379898] R10: 0000000000000003 R11: 0000000000003201 R12: ffff88014952e000
[   41.381061] R13: ffff8801492cba64 R14: 0000000000000000 R15: ffff8800a7ca5358
[   41.382221] FS:  00007fbe643058c0(0000) GS:ffff88014ec40000(0000) knlGS:0000000000000000
[   41.383398] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   41.384586] CR2: 0000000000000084 CR3: 0000000144c9e000 CR4: 00000000003406e0
[   41.385791] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   41.387002] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   41.388208] Stack:
[   41.389400]  0000000000000000 ffff880148ee3840 0000000000011220 0000000000000000
[   41.390625]  ffff880148ee3870 00000000492cb9e8 ffff880000000000 ffffffff810d2bf4
[   41.391859]  ffff8800a44828f0 0000000000011220 0000000000000000 0000000000000000
[   41.393098] Call Trace:
[   41.394333]  [<ffffffff810d2bf4>] ? mempool_alloc+0x52/0x11c
[   41.395594]  [<ffffffffa00cad73>] ? intel_begin_crtc_commit+0x12e/0x165 [i915]
[   41.396855]  [<ffffffffa0056986>] ? drm_atomic_helper_commit_planes+0x4e/0x143 [drm_kms_helper]
[   41.398142]  [<ffffffffa00e1c69>] ? intel_atomic_commit+0xa7/0xcd [i915]
[   41.399415]  [<ffffffffa005738d>] ? drm_atomic_helper_disable_plane+0xbf/0x111 [drm_kms_helper]
[   41.400700]  [<ffffffffa000cbce>] ? __setplane_internal+0x4a/0x295 [drm]
[   41.401988]  [<ffffffffa00181f8>] ? drm_modeset_lock_all_crtcs+0x69/0x81 [drm]
[   41.403283]  [<ffffffffa000f994>] ? drm_mode_setplane+0x17f/0x1dc [drm]
[   41.404580]  [<ffffffffa00047d4>] ? drm_ioctl+0x344/0x3b3 [drm]
[   41.405873]  [<ffffffffa000f815>] ? drm_mode_getplane+0xcb/0xcb [drm]
[   41.407167]  [<ffffffff811228d2>] ? dput+0x192/0x1aa
[   41.408458]  [<ffffffff8110baf3>] ? kmem_cache_free+0xe2/0x11a
[   41.409754]  [<ffffffff811228d2>] ? dput+0x192/0x1aa
[   41.411051]  [<ffffffff8111fa62>] ? do_vfs_ioctl+0x360/0x424
[   41.412350]  [<ffffffff810442ea>] ? __set_task_blocked+0x5d/0x64
[   41.413648]  [<ffffffff8104616d>] ? __set_current_blocked+0x2d/0x40
[   41.414949]  [<ffffffff8111fb6f>] ? SyS_ioctl+0x49/0x7a
[   41.416248]  [<ffffffff810462d7>] ? SyS_rt_sigprocmask+0x5f/0x94
[   41.417542]  [<ffffffff81798e92>] ? system_call_fastpath+0x12/0x17
[   41.418827] Code: 8b 44 24 70 be 08 00 00 00 89 94 24 98 00 00 00 48 8b 80 38 02 00 00 48 8b 40 10 c6 84 24 d8 00 00 00 04 c6 84 24 a9 00 00 00 01 <8b> 80 84 00 00 00 99 f7 fe 88 84 24 a8 00 00 00 8b 81 d0 01 00
[   41.420280] RIP  [<ffffffffa007fbc7>] ilk_update_wm+0x11c/0xa81 [i915]
[   41.421675]  RSP <ffff8801492cb978>
[   41.423056] CR2: 0000000000000084
[   41.434517] ---[ end trace 73cfb94e14dba7a1 ]---


==Reproduce steps==
---------------------------- 
1. clean boot system
2. xinit
3. pkill X
Comment 1 lu hua 2015-03-02 05:48:30 UTC
Created attachment 113905
Xorg.0.log

output:
root@x-bdw05:~# pkill X
root@x-bdw05:~# xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":0"
xinit: connection to X server lost

waiting for X server to shut down ..........
xinit: X server slow to shut down, sending KILL signal

waiting for server to die
Comment 2 Paulo Zanoni 2015-03-02 18:59:07 UTC
Can you please bisect this?
Comment 3 lu hua 2015-03-03 03:23:43 UTC
Bisect shows: fd2d61341bf39d1054256c07d6eddd624ebc4241 is the first bad commit
commit fd2d61341bf39d1054256c07d6eddd624ebc4241
Author:     Matt Roper <matthew.d.roper@intel.com>
AuthorDate: Fri Feb 27 10:12:01 2015 -0800
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Fri Feb 27 19:56:17 2015 +0100

    drm/i915: Use plane->state->fb in watermark code (v2)

    plane->fb is a legacy pointer that not always be up-to-date (or updated
    early enough).  Make sure the watermark code uses plane->state->fb so
    that we're always doing our calculations based on the correct
    framebuffers.

    This patch was generated by Coccinelle with the following semantic
    patch:

            @@
            struct drm_plane *P;
            @@
            - P->fb
            + P->state->fb

    v2: Rebase

    Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

With this commit reverted, the issue goes away.
Comment 4 Jesse Barnes 2015-03-03 19:55:21 UTC
Should be fixed by the revert of that patch that Daniel pushed today.  Can you verify?
Comment 5 Michael Leuchtenburg 2015-03-03 19:56:19 UTC
I get a very similar crash on BDW hardware when mode switching; changing to the console results in an oops in the same function. gdb reports that the IP was in the line changed in fd2d613:
	p->pri.bytes_per_pixel = crtc->primary->state->fb->bits_per_pixel / 8;
so I suspect it's the same bug with a different trigger. If I revert the patch, I can change to the console without a crash, though the console mode is not correct and the text flickers in three locations.
Comment 6 Matt Roper 2015-03-04 02:18:02 UTC
I believe this should be fixed by these two patches:

http://patchwork.freedesktop.org/patch/43919/
http://patchwork.freedesktop.org/patch/43920/
Comment 7 lu hua 2015-03-04 03:21:15 UTC
(In reply to Matt Roper from comment #6)
> I believe this should be fixed by these two patches:
> 
> http://patchwork.freedesktop.org/patch/43919/
> http://patchwork.freedesktop.org/patch/43920/

Tested on the latest -nightly kernel (commit a5217f77503a1089ae, 2015y-03m-03d-08h-25m-29s) without these two patches: the issue still exists.
With the two patches applied on top of commit a5217f77503a1089ae, it works well.
Comment 8 wendy.wang 2015-03-10 05:55:07 UTC
Hello Matt,
Would you pls help merge up your patches for this bug?
Comment 9 Matt Roper 2015-03-10 15:38:06 UTC
(In reply to wendy.wang from comment #8)
> Hello Matt,
> Would you pls help merge up your patches for this bug?

We actually reworked the patches a bit, but Daniel just merged the last one this morning, so I think this should be fixed upstream; moving status to 'resolved.'

Please reopen this defect if you still see the bug on the latest drm-intel-nightly.
Comment 10 lu hua 2015-03-11 08:29:49 UTC
Tested on the latest -nightly kernel; it works well.
Comment 11 Elizabeth 2017-10-06 14:31:19 UTC
Closing old verified.

