Bug 76182 - Platforms with i915 won't boot without modeset=0
Summary: Platforms with i915 won't boot without modeset=0
Status: CLOSED NOTOURBUG
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Rodrigo Vivi
QA Contact: Intel GFX Bugs mailing list
 
Reported: 2014-03-14 16:26 UTC by Tom Wylegala
Modified: 2017-07-24 22:55 UTC

Description Tom Wylegala 2014-03-14 16:26:21 UTC
Description of problem:
The server failed to boot.

Version-Release number of selected component (if applicable):
Intel Haswell SoC (i7-4860EQ mobile processor) with Intel Iris Pro Graphics 5200 device (8086:0d26).

How reproducible:
regularly

Steps to Reproduce:
1.	Boot into the PXE menu and select RHEL6.5.
2.	Install the OS in text mode.
3.	Reboot the machine after the OS install.
4.	A kernel panic occurs when the i915 driver is loaded after drm.
5.	However, RHEL6.5 boots fine with “nomodeset” or “i915.modeset=0” (these boot parameters turn off KMS mode for the i915 driver).


Additional info (log/error/screenshot): Please refer to the dmesg output below for more information (the drm.debug boot parameter was used to capture the log).
Workaround: RHEL6.5 boots fine with the “nomodeset” or “i915.modeset=0” boot parameter. These boot parameters turn off KMS mode.
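
To confirm which of the two workaround parameters a given boot actually used, the check can be sketched in Python as below. This is only an illustration: the function name kms_disabled is hypothetical, and on a running system the input string would be read from /proc/cmdline.

```python
def kms_disabled(cmdline: str) -> bool:
    """Return True if the kernel command line turns off KMS for i915.

    Only the two parameters named in this report are checked:
    "nomodeset" and "i915.modeset=0".
    """
    # Kernel parameters are separated by whitespace.
    params = cmdline.split()
    return "nomodeset" in params or "i915.modeset=0" in params
```

For example, kms_disabled(open("/proc/cmdline").read()) reports whether the currently running kernel was booted with the workaround.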

Log outputs:
device-mapper: uevent: version 1.0.3
device-mapper: ioctl: 4.24.6-ioctl (2013-01-15) initialised: dm-devel@redhat.com
udev: starting version 147
[drm] Initialized drm 1.1.0 20060810
i915 0000:00:02.0: can't derive routing for PCI INT A
i915 0000:00:02.0: PCI INT A: no GSI - using IRQ 5
[drm] Memory usable by graphics device = 2048M
[drm:i915_write32] *ERROR* Unknown unclaimed register before writing to c5100
[drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[drm] Driver supports precise vblank timestamp query.
vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
[drm] failed to retrieve link info, disabling eDP
[drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake to ack request.
[drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake to ack request.
[drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake to ack request.
[drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake to ack request.
[drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake to ack request.
[drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake to ack request.
[drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake to ack request.
[drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake to ack request.
[drm] GMBUS [i915 gmbus vga] timed out, falling back to bit banging on pin 2
i915 0000:00:02.0: No connectors reported connected with modes
[drm] Cannot find any crtc or sizes - going 1024x768
fbcon: inteldrmfb (fb0) is primary device
Console: switching to colour frame buffer device 128x48
i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
i915 0000:00:02.0: registered panic notifier
Slow work thread pool: Starting up
Slow work thread pool: Ready
[drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
dracut: Starting plymouth daemon
dracut: rd_NO_DM: removing DM RAID activation
Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
Pid: 35, comm: events/0 Not tainted 2.6.32-431.el6.x86_64 #1
Call Trace:
<NMI>  [<ffffffff815271fa>] ? panic+0xa7/0x16f
[<ffffffff810153a3>] ? native_sched_clock+0x13/0x80
[<ffffffff810e697d>] ? watchdog_overflow_callback+0xcd/0xd0
[<ffffffff8111c857>] ? __perf_event_overflow+0xa7/0x240
[<ffffffff8101d93d>] ? x86_perf_event_set_period+0xdd/0x170
[<ffffffff8111ce24>] ? perf_event_overflow+0x14/0x20
[<ffffffff81022d87>] ? intel_pmu_handle_irq+0x187/0x2f0
[<ffffffff8152cee6>] ? kprobe_exceptions_notify+0x16/0x430
[<ffffffff8152ba59>] ? perf_event_nmi_handler+0x39/0xb0
[<ffffffff8152d515>] ? notifier_call_chain+0x55/0x80
[<ffffffff8152d57a>] ? atomic_notifier_call_chain+0x1a/0x20
[<ffffffff810a154e>] ? notify_die+0x2e/0x30
[<ffffffff8152b1db>] ? do_nmi+0x1bb/0x340
[<ffffffff8152aaa0>] ? nmi+0x20/0x30
[<ffffffffa01157e7>] ? __gen6_gt_force_wake_mt_get+0xf7/0x160 [i915]
<<EOE>>  [<ffffffffa0115ad4>] ? gen6_gt_force_wake_get+0x44/0x60 [i915]
[<ffffffffa0115b5f>] ? intel_gen6_powersave_work+0x6f/0x660 [i915]
[<ffffffffa0115af0>] ? intel_gen6_powersave_work+0x0/0x660 [i915]
[<ffffffff81094d20>] ? worker_thread+0x170/0x2a0
[<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff81094bb0>] ? worker_thread+0x0/0x2a0
[<ffffffff8109aef6>] ? kthread+0x96/0xa0
[<ffffffff8100c20a>] ? child_rip+0xa/0x20
[<ffffffff8109ae60>] ? kthread+0x0/0xa0
[<ffffffff8100c200>] ? child_rip+0x0/0x20
drm_kms_helper: panic occurred, switching back to text console
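
When comparing logs like the one above across kernels, it can help to tally the "[drm:<function>] *ERROR*" lines per function. The following Python sketch does this for a block of dmesg text. The helper name count_drm_errors and the regular expression are assumptions written for this log format, not part of any drm tooling.

```python
import re
from collections import Counter

# Matches lines such as "[drm:some_function] *ERROR* some message".
# An optional "[ 364.401032] " style timestamp before it is ignored by search().
DRM_ERROR = re.compile(r"\[drm:(?P<func>\w+)\] \*ERROR\* (?P<msg>.*)")


def count_drm_errors(log_text: str) -> Counter:
    """Count drm *ERROR* lines per reporting function in dmesg text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = DRM_ERROR.search(line)
        if match:
            counts[match.group("func")] += 1
    return counts
```

Applied to the log in this report, such a tally would make the repeated forcewake timeouts from __gen6_gt_force_wake_mt_get stand out from the single i915_write32 error.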


====
Similar problems occur on other Linux distributions.
Comment 1 David Herrmann 2014-03-14 16:39:15 UTC
This is a driver bug, not a kmscon bug. Please report that to the i915 maintainers. RHEL6.5 doesn't even ship kmscon.

Note that kmscon is a user-space console based on KMS; it is not involved with kernel-side KMS development, so it is unrelated to any boot problems.
Comment 2 Chris Wilson 2014-03-14 17:11:55 UTC
Or more likely, that is a BIOS bug.
Comment 3 Tom Wylegala 2014-03-14 17:18:12 UTC
Another example of this problem, this time on an Ubuntu kernel:

root@Andersc13:~# uname -a
Linux Andersc13 3.14.0-0-generic #1~lp1284816v2 SMP Fri Mar 7 20:22:15 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

[ 364.342970] microcode: CPU0 sig=0x40661, pf=0x20, revision=0xf
[ 364.346113] device-mapper: multipath: version 1.6.0 loaded
[ 364.352572] microcode: CPU1 sig=0x40661, pf=0x20, revision=0xf
[ 364.352589] microcode: CPU2 sig=0x40661, pf=0x20, revision=0xf
[ 364.352602] microcode: CPU3 sig=0x40661, pf=0x20, revision=0xf
[ 364.352615] microcode: CPU4 sig=0x40661, pf=0x20, revision=0xf
[ 364.352631] microcode: CPU5 sig=0x40661, pf=0x20, revision=0xf
[ 364.352644] microcode: CPU6 sig=0x40661, pf=0x20, revision=0xf
[ 364.352656] microcode: CPU7 sig=0x40661, pf=0x20, revision=0xf
[ 364.352723] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
[ 364.359351] [drm] Initialized drm 1.1.0 20060810
[ 364.394150] AVX2 version of gcm_enc/dec engaged.
[ 364.401032] [drm:drm_pci_agp_init] *ERROR* Cannot initialize the agpgart module.
Comment 4 Rodrigo Vivi 2014-03-25 20:07:12 UTC
About this "[drm:drm_pci_agp_init] *ERROR* Cannot initialize the agpgart" error on 3.14: I believe it is just a matter of removing the "nomodeset" option, and then you might get i915 working on your Ubuntu.

Regarding RHEL6.5 on HSW: looking at the logs, I saw a 2.6.32 kernel. Is that correct? HSW is only supported on kernels > 3.8 or with proper backports from distros.
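
As a rough illustration of that version threshold, the following Python sketch compares a `uname -r` style release string against 3.8. The function names are hypothetical, and the strict "> 3.8" comparison is taken literally from the sentence above. Because distros can backport support into older kernels, the result is only a hint, not proof that HSW is or is not supported.

```python
import re


def kernel_version(release: str) -> tuple:
    """Extract (major, minor) from a release string such as '2.6.32-431.el6.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: " + repr(release))
    return int(match.group(1)), int(match.group(2))


def newer_than_3_8(release: str) -> bool:
    """Return True if the upstream base version is strictly greater than 3.8."""
    # Tuple comparison orders by major first, then minor.
    return kernel_version(release) > (3, 8)
```

With the two kernels seen in this report, "2.6.32-431.el6.x86_64" falls below the threshold and "3.14.0-0-generic" is above it.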

What is your scenario? What kernel are you using? Isn't it possible to upgrade to a newer version?

Also, I recall a discussion with Adam in which he was against backporting HSW to RHEL6.5, because at that time we had a critical eDP issue that was probably going to break all installation images. This is the reason we introduced preliminary_hw_support. And as far as I can remember, HSW support was left out of the RHEL6.5 installation image, not even under preliminary_hw_support. So it is strange to see these log messages here, and even stranger to see such an old kernel version on HSW with those messages.

But please provide more info about your system and distro versions, and please try to upgrade and test again.
Comment 5 Rodrigo Vivi 2014-03-25 20:55:20 UTC
Another idea here is to check the result of booting our drm-intel/drm-intel-nightly development branch.
Comment 6 Tom Wylegala 2014-03-25 21:01:25 UTC
We would like to support our server on Red Hat Enterprise Linux, and therefore must use the 2.6.32 kernel.  For the most part, RHEL 6.5 has good support for Haswell.

Can you identify a set of patches that we could try to backport onto 6.5?  If that does in fact break all of the installation images, then we will have to find another plan.
Comment 7 Rodrigo Vivi 2014-03-25 21:13:07 UTC
Hi Tom, could you please test the drm-intel-nightly branch? Then we can be sure this is not a BIOS bug, and that there might be a patch or a series to backport that would solve the issue.

Thanks
Comment 8 Tom Wylegala 2014-03-25 21:37:38 UTC
Thanks, Rodrigo.
We will try the drm-intel/drm-intel-nightly branch and report the results.

Meanwhile, you can ignore my comments about Ubuntu.  Ubuntu 14.04, which is based on the 3.13 kernel, emits the drm initialization error.  This error is fixed in the 3.14 kernel.  Right now, the Ubuntu team is engaged in a bisection activity to determine exactly which patch fixes the initialization error.  There is no current problem with Ubuntu.
Comment 9 Daniel Vetter 2014-03-26 18:31:32 UTC
Quick maintainer note: This bugzilla is for upstream bugs, i.e. if this works on -nightly and recently released kernels, then backporting to RHEL is out of scope and needs to be tracked somewhere else. Thanks.
Comment 10 Tom Wylegala 2014-03-31 23:06:40 UTC
We are trying to build the drm-intel/drm-intel-nightly branch to confirm that the fix there is totally satisfactory, but the first attempt ran into some difficulties due to missing libraries.  We will try to do the build again.

In the meantime, we would be very interested in knowing the identities of the upstream patches that repaired the driver, so we can attempt to apply them against RHEL 6.5.

We are also doing a similar exercise with Ubuntu.  We can get a kernel that works properly, but we are still in the midst of the bisection activity to identify exactly what patch has effected the driver repair.
Comment 11 Tom Wylegala 2014-04-03 17:17:41 UTC
We have successful execution on our platform based on the kernel built from drm-intel/drm-intel-nightly branch, so we are satisfied that there is proper upstream support for the i915.  In particular, it appears that the following patch is the one that remedies the problem we experienced:

commit b30324adaf8d2e5950a602bde63030d15a61826f
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Wed Nov 13 22:11:25 2013 +0100

    drm/i915: Deprecated UMS support

We have commenced an activity to back-port that patch onto the older Red Hat distribution that we are trying to enable.  If there is someone with more intimate knowledge of the drm/i915 driver who can suggest other earlier patches that are needed in addition to the one cited, we would be happy to have that information.  Otherwise, we can consider the issue raised in this Bugzilla to be closed.
Comment 12 Daniel Vetter 2014-04-05 10:53:29 UTC
Ah, apparently the old UMS frankenstein driver that Red Hat ships on some releases blows up (iirc the 5.x series or so, but not sure). Definitely not our issue; please file this issue with Red Hat support. Note that the patch itself simply forces modeset=0 through a build parameter.

