Bug 90067 - [HSW Bisected]HSW boot fail
Summary: [HSW Bisected]HSW boot fail
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: All Linux (All)
Importance: highest blocker
Assignee: Damien Lespiau
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-04-17 07:50 UTC by lu hua
Modified: 2017-10-06 14:30 UTC

See Also:
i915 platform:
i915 features:


Attachments
dmesg (66.13 KB, text/plain)
2015-04-17 07:50 UTC, lu hua
dmesg info with two patches after system boot up (64.91 KB, text/plain)
2015-04-29 03:38 UTC, ye.tian

Description lu hua 2015-04-17 07:50:34 UTC
Created attachment 115150 [details]
dmesg

==System Environment==
--------------------------
Regression: Yes

good commit: c7240c3bc5d6610b42dbb10fda71bbbf1dad5515
bad commit: 097f8261ddda4b1896dd335dec95dedeecfeaa1b

Non-working platforms: HSW

==kernel==
--------------------------
drm-intel-nightly/d600654ab94b325f253e267422dcf60302120ea0
commit d600654ab94b325f253e267422dcf60302120ea0
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Thu Apr 16 17:54:10 2015 +0200

    drm-intel-nightly: 2015y-04m-16d-15h-53m-28s UTC integration manifest

==Bug detailed description==
-----------------------------
The system fails to boot on HSW with both the drm-intel-nightly and drm-intel-next-queued kernels.
Tested on two machines: on one, login works, but the system becomes unresponsive when running xinit or rebooting; on the other, login is not possible at all.

dmesg:
[   28.578707] BUG: unable to handle kernel paging request at ffff8800f4369193
[   28.662009] IP: [<ffffffffa00ed510>] intel_prepare_ddi+0x5d/0x308 [i915]
[   28.742222] PGD 33d7067 PUD 33db067 PMD 0 
[   28.791193] Oops: 0000 [#1] SMP 
[   28.829750] Modules linked in: i915(+) button video drm_kms_helper drm
[   28.907910] CPU: 7 PID: 1375 Comm: udevd Not tainted 4.0.0_drm-intel-nightly_d60065_20150417+ #234
[   29.015147] Hardware name: Dell Inc. OptiPlex 9020/0DNKMN, BIOS A03 09/17/2013
[   29.101560] task: ffff8800db9920c0 ti: ffff8800dae28000 task.ti: ffff8800dae28000
[   29.191099] RIP: 0010:[<ffffffffa00ed510>]  [<ffffffffa00ed510>] intel_prepare_ddi+0x5d/0x308 [i915]
[   29.300448] RSP: 0018:ffff8800dae2b958  EFLAGS: 00010283
[   29.363968] RAX: ffff8800db373f08 RBX: ffff88011953b000 RCX: 000000001953d800
[   29.449353] RDX: 0000000080000007 RSI: 0000000000000282 RDI: ffff8800dae2b998
[   29.534730] RBP: ffff8800daf60000 R08: 0000000000000004 R09: 0000000000000000
[   29.620105] R10: 0000000000000001 R11: 0000000000000003 R12: ffff88011953b060
[   29.705475] R13: ffff8800db373f00 R14: ffff88011953b000 R15: 000000001953d800
[   29.790848] FS:  00007f71429bc840(0000) GS:ffff88011ebc0000(0000) knlGS:0000000000000000
[   29.887669] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   29.956376] CR2: ffff8800f4369193 CR3: 00000000db885000 CR4: 00000000001406e0
[   30.041751] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   30.127130] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   30.212502] Stack:
[   30.236459]  ffff8800daf60000 ffff88011953b340 ffff8800daf60140 0000000000000001
[   30.325006]  0000000000000001 ffffffffa00bb256 ffff8800daf60000 0000000000f60080
[   30.413551]  0000000000000282 ffff88011953b000 ffff8800daf60000 ffff88011953b060
[   30.502086] Call Trace:
[   30.531262]  [<ffffffffa00bb256>] ? __intel_uncore_forcewake_put+0x8e/0xa5 [i915]
[   30.620815]  [<ffffffffa00da14b>] ? intel_modeset_init_hw+0x9/0x2c [i915]
[   30.702041]  [<ffffffffa00dc28a>] ? intel_modeset_gem_init+0x77/0x13b [i915]
[   30.786379]  [<ffffffffa010667a>] ? i915_driver_load+0xf9e/0x119c [i915]
[   30.866539]  [<ffffffffa00076f5>] ? drm_dev_register+0x73/0xe5 [drm]
[   30.942535]  [<ffffffffa00098b4>] ? drm_get_pci_dev+0xf7/0x1b3 [drm]
[   31.018535]  [<ffffffff8135fd01>] ? local_pci_probe+0x35/0x79
[   31.087248]  [<ffffffff8135fe10>] ? pci_device_probe+0xcb/0xef
[   31.157007]  [<ffffffff813ea0e6>] ? driver_probe_device+0x9c/0x1d1
[   31.230929]  [<ffffffff813ea2a3>] ? __driver_attach+0x53/0x73
[   31.299650]  [<ffffffff813ea250>] ? __device_attach+0x35/0x35
[   31.368369]  [<ffffffff813e89b9>] ? bus_for_each_dev+0x6e/0x78
[   31.438132]  [<ffffffff813e99b4>] ? bus_add_driver+0x101/0x1cb
[   31.507884]  [<ffffffff813ea899>] ? driver_register+0x83/0xbb
[   31.576600]  [<ffffffffa0151000>] ? 0xffffffffa0151000
[   31.638032]  [<ffffffff810002fd>] ? do_one_initcall+0xe2/0x161
[   31.707794]  [<ffffffff81109093>] ? kmem_cache_alloc_trace+0x2a/0xfb
[   31.783809]  [<ffffffff81792459>] ? do_init_module+0x55/0x1b5
[   31.852538]  [<ffffffff8108f2bd>] ? load_module+0x1479/0x1951
[   31.921278]  [<ffffffff8108ce10>] ? store_uevent+0x36/0x36
[   31.986887]  [<ffffffff8179cb62>] ? page_fault+0x22/0x30
[   32.050416]  [<ffffffff8108f839>] ? SyS_init_module+0xa4/0xd3
[   32.119143]  [<ffffffff8179b0f2>] ? system_call_fastpath+0x12/0x17
[   32.193073] Code: 00 00 4c 8d 68 f8 49 8d 86 40 03 00 00 48 89 44 24 08 49 8d 45 08 48 3b 44 24 08 0f 84 a6 02 00 00 45 8b bd d8 00 00 00 44 89 f9 <80> 7c 0c 3b 00 0f 85 84 02 00 00 49 8b 5e 28 49 63 c7 8a 84 43 
[   32.419751] RIP  [<ffffffffa00ed510>] intel_prepare_ddi+0x5d/0x308 [i915]
[   32.501009]  RSP <ffff8800dae2b958>
[   32.542666] CR2: ffff8800f4369193
[   32.582245] ---[ end trace 933ad1f5dc5c917f ]---

==Reproduce steps==
---------------------------- 
1. Clean-boot the system.
Comment 1 lu hua 2015-04-17 08:15:22 UTC
Bisect shows: b403745c84592b26a0713e6944c2b109f6df5c82 is the first bad commit
commit b403745c84592b26a0713e6944c2b109f6df5c82
Author:     Damien Lespiau <damien.lespiau@intel.com>
AuthorDate: Mon Aug 4 22:01:33 2014 +0100
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Thu Apr 16 11:42:38 2015 +0200

    drm/i915: Iterate through the initialized DDIs to prepare their buffers

    Not every DDIs is necessarily connected can be strapped off and, in the
    future, we'll have platforms with a different number of default DDI
    ports. So, let's only call intel_prepare_ddi_buffers() on DDI ports that
    are actually detected.

    We also use the opportunity to give a struct intel_digital_port to
    intel_prepare_ddi_buffers() as we'll need it in a following patch to
    query if the port supports HMDI or not.

    On my HSW machine this removes the initialization of a couple of
    (unused) DDIs.

    Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
    Reviewed-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Comment 2 Imre Deak 2015-04-17 10:42:23 UTC
We should dereference intel_digital_port only for encoders with a digital port. This breaks on HSW for the VGA encoder. Will follow up with a fix.
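
A minimal sketch of the guard described above, assuming simplified stand-in types. The names struct encoder, struct digital_port, enc_to_dig_port() and count_preparable_ports() are hypothetical and chosen for illustration only; this is not the actual i915 code or the actual patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real i915 structures. */
enum encoder_type { ENCODER_ANALOG, ENCODER_DDI };

struct encoder {
	enum encoder_type type;
};

struct digital_port {
	struct encoder base;	/* embedded encoder, container_of()-style layout */
	int port;
	bool supports_hdmi;
};

/* Return the enclosing digital port, or NULL if the encoder has none. */
static struct digital_port *enc_to_dig_port(struct encoder *enc)
{
	if (enc->type != ENCODER_DDI)
		return NULL;	/* e.g. VGA: no digital port behind this encoder */
	return (struct digital_port *)((char *)enc -
				       offsetof(struct digital_port, base));
}

/* Count the encoders whose DDI buffers would be prepared. */
static int count_preparable_ports(struct encoder **encoders, size_t n)
{
	int prepared = 0;

	assert(encoders != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct digital_port *dp = enc_to_dig_port(encoders[i]);

		if (dp == NULL)
			continue;	/* skip rather than dereference a bogus pointer */
		/* From here on it is safe to read dp->port and dp->supports_hdmi. */
		prepared++;
	}
	return prepared;
}
```

In this sketch, a list holding one analog encoder and one DDI encoder yields a count of 1. Without the type check, reading dp->port for the analog encoder would access memory that does not belong to it.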
Comment 3 Imre Deak 2015-04-17 12:11:33 UTC
Could you try the following? :
http://lists.freedesktop.org/archives/intel-gfx/2015-April/064973.html
Comment 4 lu hua 2015-04-20 02:05:56 UTC
(In reply to Imre Deak from comment #3)
> Could you try the following? :
> http://lists.freedesktop.org/archives/intel-gfx/2015-April/064973.html

Fixed by this patch.
Comment 5 wendy.wang 2015-04-24 07:49:54 UTC
Hello Imre,

Could you please help get your patch merged? This bug is blocking testing of the latest kernel on the HSW platform.

Thanks.
Comment 6 Jani Nikula 2015-04-28 13:18:33 UTC
(In reply to lu hua from comment #4)
> (In reply to Imre Deak from comment #3)
> > Could you try the following? :
> > http://lists.freedesktop.org/archives/intel-gfx/2015-April/064973.html
> 
> Fixed by this patch.

Please re-test with patches

http://patchwork.freedesktop.org/patch/47388/
http://patchwork.freedesktop.org/patch/47389/
Comment 7 ye.tian 2015-04-29 03:34:39 UTC
(In reply to Jani Nikula from comment #6)
> (In reply to lu hua from comment #4)
> > (In reply to Imre Deak from comment #3)
> > > Could you try the following? :
> > > http://lists.freedesktop.org/archives/intel-gfx/2015-April/064973.html
> > 
> > Fixed by this patch.
> 
> Please re-test with patches
> 
> http://patchwork.freedesktop.org/patch/47388/
> http://patchwork.freedesktop.org/patch/47389/

Re-tested on the latest nightly kernel (9a4da5ec4) with the two patches:
1. The machine boots up the first time.
2. The machine fails to reboot after step 1.
3. A successful cold boot takes at least 3 attempts.
Comment 8 ye.tian 2015-04-29 03:38:03 UTC
Created attachment 115424 [details]
dmesg info with two patches after system boot up
Comment 9 Imre Deak 2015-04-29 10:01:40 UTC
(In reply to ye.tian from comment #8)
> Created attachment 115424 [details]
> dmesg info with two patches after system boot up

Looks like an independent issue, also reported in bug 90229. Since the original bug fixed by these patches is blocking other people, I would suggest disabling the sound driver and retesting the two patches.
Comment 10 ye.tian 2015-04-30 06:46:28 UTC
(In reply to Imre Deak from comment #9)
> (In reply to ye.tian from comment #8)
> > Created attachment 115424 [details]
> > dmesg info with two patches after system boot up
> 
> Looks like an independent issue, also reported at bug 90229. Since the
> original bug fixed by these patches are blocking other people, I would
> suggest disabling the sound driver and retesting the two patches.



Re-tested on the latest nightly kernel (e53e6002) with the two patches and the sound driver disabled:
1. The machine boots and reboots successfully.
2. Starting X and running glxinfo causes a GPU hang.


output:
=====================

[  143.760583] [drm] stuck on render ring
[  143.761326] [drm] GPU HANG: ecode 7:0:0xabeff7fb, in glxinfo [4009], reason: Ring hung, action: reset
[  143.761327] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  143.761328] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  143.761329] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  143.761329] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  143.761330] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  143.763406] drm/i915: Resetting chip after gpu hang
Comment 11 Mika Kuoppala 2015-04-30 07:01:47 UTC
To me it seems that the display init bug is fixed, but you have now run into this:
https://bugs.freedesktop.org/show_bug.cgi?id=90190
Comment 12 ye.tian 2015-04-30 07:26:05 UTC
(In reply to Mika Kuoppala from comment #11)
> For me it seems that the display init bug is fixed but then you have run
> into this:
> https://bugs.freedesktop.org/show_bug.cgi?id=90190

Re-tested on the latest nightly kernel (e53e6002) with the two patches, the two patches from bug 90190, and the sound driver disabled: this issue no longer occurs.
Comment 13 ye.tian 2015-04-30 07:45:03 UTC
(In reply to ye.tian from comment #12)
> (In reply to Mika Kuoppala from comment #11)
> > For me it seems that the display init bug is fixed but then you have run
> > into this:
> > https://bugs.freedesktop.org/show_bug.cgi?id=90190
> 
> Re-test on the latest nightly kernel(e53e6002)with two patches and bug 90190
> two patches and disabling the sound driver, this issue does not exists.

With the sound driver enabled, this issue does not occur either.
Comment 14 Jani Nikula 2015-04-30 09:37:46 UTC
Fixed by

commit faa0cdbec1c258896bff8bb59051bbada4fd6f09
Author: Imre Deak <imre.deak@intel.com>
Date:   Fri Apr 17 19:31:22 2015 +0300

    drm/i915: fix intel_prepare_ddi

in drm-intel-next-queued, closing. Let's track the other issues mentioned in other bugs; please file new ones as necessary.
Comment 15 ye.tian 2015-04-30 10:05:15 UTC
(In reply to Jani Nikula from comment #14)
> Fixed by
> 
> commit faa0cdbec1c258896bff8bb59051bbada4fd6f09
> Author: Imre Deak <imre.deak@intel.com>
> Date:   Fri Apr 17 19:31:22 2015 +0300
> 
>     drm/i915: fix intel_prepare_ddi
> 
> in drm-intel-next-queued, closing. Let's track the other issues mentioned in
> other bugs; please file new ones as necessary.

Tested on the drm-intel-next-queued kernel: the system boots successfully.
Verified.

The other issues are tracked in bug 90190.
Comment 16 Elizabeth 2017-10-06 14:30:27 UTC
Closing old verified bug.

