Bug 54615 - [830M] i915 locks up my Thinkpad X30 (using a GMA 82830)
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86 (IA32) Linux (All)
Importance: medium major
Assignee: Chris Wilson
Reported: 2012-09-06 21:10 UTC by monnier
Modified: 2017-07-24 23:00 UTC

Attachments
log of failed boot (76.18 KB, text/plain)
2012-09-06 21:10 UTC, monnier
failed boot with Debian's 3.2.30-1 (51.85 KB, text/plain)
2012-10-15 19:03 UTC, monnier

Description monnier 2012-09-06 21:10:17 UTC
Created attachment 66751
log of failed boot

A recent upgrade of my kernel bumped into a regression in the i915 kernel driver.
To give some context: there seems to be a bug in the KMS code for the 82830 chip which prevents me from using KMS on this machine (I can boot and use the accelerated graphics, but resume from suspend does not restore the GPU state properly), so I've been running with "options i915 modeset=0" and the fbdev Xorg driver, which is good enough for my use case.

Sometime between Debian's 3.2.0-2 and 3.2.0-3 a new problem showed up: the machine locks up during boot.  I can still reproduce the lockup with Debian's 3.5-trunk, which is much more vanilla and indicates that the problem hasn't yet been fixed upstream (aka here).  It seems the lockup sometimes happens before and sometimes after dmesg gets logged to /var/log/messages, so I haven't been able to get a drm.debug=6 boot log yet, but I do have another boot log.

The key element is that the lock up happens soon after seeing the message "fb: conflicting fb hw usage inteldrmfb vs VESA VGA - removing generic driver" on the console.

I can work around this boot lockup by blacklisting the i915 kernel module, but that in turn brings new problems when resuming from suspend.
Comment 1 Daniel Vetter 2012-09-06 22:47:41 UTC
The inteldrmfb framebuffer driver is only registered when you're booting with modesetting enabled, so maybe double-check whether you're booting with the right options.

For the hang itself, this should be fixed in

commit d8636a2717bb3da2a7ce2154bf08de90bb8c87b0
Author: Dave Airlie <airlied@redhat.com>
Date:   Tue Aug 21 16:29:47 2012 +1000

    fbcon: fix race condition between console lock and cursor timer (v1.1)

The issue is very old, but can only be hit if the timing is just right.
Comment 2 monnier 2012-09-07 20:26:48 UTC
I have "options i915 modeset=0" in my /etc/modprobe.d/i915-kms.conf and it seems to be properly taken into account when booting into 3.0.2-2, but I have just tried booting 3.5-trunk again adding "nomodeset i915.modeset=0" to the kernel's command line and I get the same lock up with the same inteldrmfb message.

So either the way to specify "no KMS, thank you" has changed, or the bug also shows up when KMS is disabled, or some other bug in i915 makes it ignore modeset=0.
Comment 3 Daniel Vetter 2012-09-07 20:54:30 UTC
I have no idea what's going on in your system but
- inteldrmfb is _only_ created when modeset is enabled
- i915.modeset=0 disables things.
For some odd reason your dmesg doesn't contain any "Command line:" output (usually on line 4 in dmesg), so I can't check.

In any case, please ensure you have the mentioned patch in your kernel, otherwise things can blow up when inteldrmfb kicks the firmware fb driver.
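
Since the attached log had no "Command line:" entry, the options can also be checked on the running system. What follows is only a minimal sketch, assuming the text of /proc/cmdline has already been read into a string; `cmdline_has_token` and `kms_disabled_on_cmdline` are hypothetical helper names, not kernel or libc functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if `token` appears as a whole space-separated word in
 * `cmdline` (for example the contents of /proc/cmdline).
 * Hypothetical helper written for this sketch.
 */
bool cmdline_has_token(const char *cmdline, const char *token)
{
    assert(cmdline != NULL && token != NULL);

    size_t len = strlen(token);
    if (len == 0)
        return false;

    const char *p = cmdline;
    while ((p = strstr(p, token)) != NULL) {
        /* The match must start at the beginning or right after a space... */
        bool starts_word = (p == cmdline) || (p[-1] == ' ');
        /* ...and must end at a space, a newline, or the end of the string. */
        char end = p[len];
        bool ends_word = (end == '\0') || (end == ' ') || (end == '\n');

        if (starts_word && ends_word)
            return true;
        p += 1; /* keep searching after this partial match */
    }
    return false;
}

/* True if either spelling used in this report to disable KMS is present. */
bool kms_disabled_on_cmdline(const char *cmdline)
{
    return cmdline_has_token(cmdline, "nomodeset") ||
           cmdline_has_token(cmdline, "i915.modeset=0");
}
```

This only confirms what was passed at boot; options set through /etc/modprobe.d do not appear on the kernel command line, and a matching token does not prove the driver acted on it.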
Comment 4 monnier 2012-10-15 19:03:38 UTC
Created attachment 68586
failed boot with Debian's 3.2.30-1

The kernel command line is on line 56
Comment 5 monnier 2012-10-15 19:17:08 UTC
Jonathan Nieder <jrnieder@gmail.com> tells me the patch was installed in the upcoming Debian 3.2 kernel, so I tried it, but it does not seem to help.

But I think I misrepresented the behavior, or maybe the patch partly changed it.
Not sure if it was different with the unpatched kernel, but at least the way it boots now, the boot actually goes through: it's only the display that's frozen.

I have attached a new log of failed boot.  The display freezes one line after displaying the "removing generic driver" message (this other line is not always the same).  IOW the display appears to freeze just before displaying "Console: switching to colour dummy device 80x25".  And indeed, this "Console: switching to colour dummy device 80x25" appears in the same cases as the "removing generic driver" message.

Also worth noting: a successful boot with older code (Debian's 3.2.12-1) shows the same 6 [drm] messages:

 [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
 [drm] No driver support for vblank timestamp query.
 [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
 [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
 [drm] No driver support for vblank timestamp query.
 [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.1 on minor 1
Comment 6 Daniel Vetter 2012-10-15 19:27:36 UTC
(In reply to comment #5)
>  [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
>  [drm] No driver support for vblank timestamp query.
>  [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
>  [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
>  [drm] No driver support for vblank timestamp query.
>  [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.1 on minor 1

Here we go, we load the driver twice, once for each of the .0 and .1 subfunctions. That can't end well at all. I have an X30 here, too, and this doesn't seem to happen here. Can you please attach lspci -nn for your machine?
Comment 7 Jonathan Nieder 2012-10-19 03:05:54 UTC
(In reply to comment #6)
> Can you please attach lspci -nn for your machine?

From <http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=5;bug=686284> (see that page for other devices):

00:02.0 VGA compatible controller [0300]: Intel Corporation 82830M/MG Integrated Graphics Controller [8086:3577] (rev 04) (prog-if 00 [VGA controller])
	Subsystem: IBM ThinkPad A/T/X Series [1014:0513]
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 11
	Region 0: Memory at e0000000 (32-bit, prefetchable) [size=128M]
	Region 1: Memory at d0000000 (32-bit, non-prefetchable) [size=512K]
	Expansion ROM at <unassigned> [disabled]
	Capabilities: <access denied>

00:02.1 Display controller [0380]: Intel Corporation 82830M/MG Integrated Graphics Controller [8086:3577]
	Subsystem: IBM ThinkPad A/T/X Series [1014:0513]
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Region 0: Memory at e8000000 (32-bit, prefetchable) [size=128M]
	Region 1: Memory at d0080000 (32-bit, non-prefetchable) [size=512K]
	Capabilities: <access denied>
Comment 8 Daniel Vetter 2012-10-19 07:08:57 UTC
Ok, I think I understand what's going on: Since you boot with nomodeset we do shadow binding, which doesn't check for the 2nd pci function, hence why the driver binds twice. Dunno why that could cause a regression, but I guess you'd need to bisect to figure this out.

The kms resume issue is already tracked in bug #49838
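
To illustrate the missing check on the 2nd PCI function, here is a user-space sketch of a probe routine that refuses any function other than 0. It is not the i915 code: `sketch_probe` and the `DEVFN*` macros are hypothetical names, written on the assumption that they mirror the devfn encoding behind PCI_SLOT/PCI_FUNC in the kernel headers.

```c
#include <assert.h>
#include <errno.h>

/*
 * devfn packs a 5-bit device (slot) number and a 3-bit function number.
 * These local macros are assumed to match the kernel's PCI_DEVFN,
 * PCI_SLOT and PCI_FUNC encoding.
 */
#define DEVFN(slot, func)  ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define DEVFN_SLOT(devfn)  (((devfn) >> 3) & 0x1f)
#define DEVFN_FUNC(devfn)  ((devfn) & 0x07)

/*
 * Hypothetical stand-in for a driver's probe callback: bind only to
 * function 0 (00:02.0 here) and decline the secondary function (00:02.1).
 */
int sketch_probe(unsigned int devfn)
{
    assert(devfn <= 0xff);

    if (DEVFN_FUNC(devfn) != 0)
        return -ENODEV; /* leave the .1 display controller unbound */

    return 0;
}
```

With such a check, only one "Initialized i915" line would be expected for the device in the lspci output above; whether the double binding is actually related to the regression is not established in this report.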
Comment 9 monnier 2012-10-19 14:12:04 UTC
BTW, note that the "binding twice" behavior was already present in Debian's 3.2.0-2 but did not pose a problem there.
Also the problem with bisecting is that this machine is not super-fast (as you can guess) and moreover I do not often have the opportunity to reboot it.  Is there something else I can do to try and collect info that would help?
Comment 10 Chris Wilson 2012-10-19 14:24:49 UTC
As inteldrmfb is really a KMS feature, we shouldn't be throwing out the existing vesafb. So first shot in the dark would be:

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index f92c849..6858df5 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1508,7 +1508,8 @@ int i915_driver_load(struct drm_device *dev, unsigned long
                goto put_gmch;
        }
 
-       i915_kick_out_firmware_fb(dev_priv);
+       if (drm_core_check_feature(dev, DRIVER_MODESET))
+               i915_kick_out_firmware_fb(dev_priv);
 
        pci_set_master(dev->pdev);

But once we get this issue resolved, we really should make sure KMS also works for you. (And outside of a few DVO devices that have never had specs released, it should.)
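
The effect of the hunk can be modelled outside the kernel. The following is a user-space sketch only: every `sketch_`/`SKETCH_` name is hypothetical, the flag value is made up, and the real drm_core_check_feature() and i915_kick_out_firmware_fb() are replaced by trivial stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_DRIVER_MODESET 0x1u /* made-up flag value for this model */

struct sketch_device {
    unsigned int features;    /* feature bits the driver was loaded with */
    bool firmware_fb_present; /* is the firmware fb (e.g. vesafb) still there? */
};

/* Stand-in for drm_core_check_feature(). */
static bool sketch_check_feature(const struct sketch_device *dev,
                                 unsigned int feature)
{
    return (dev->features & feature) != 0;
}

/* Stand-in for i915_kick_out_firmware_fb(). */
static void sketch_kick_out_firmware_fb(struct sketch_device *dev)
{
    dev->firmware_fb_present = false;
}

/* Patched behaviour: only evict the firmware fb when KMS will take over. */
void sketch_driver_load(struct sketch_device *dev)
{
    assert(dev != NULL);

    if (sketch_check_feature(dev, SKETCH_DRIVER_MODESET))
        sketch_kick_out_firmware_fb(dev);
}

/* Convenience wrapper: does the firmware fb survive a load with `features`? */
bool firmware_fb_survives_load(unsigned int features)
{
    struct sketch_device dev = {
        .features = features,
        .firmware_fb_present = true,
    };

    sketch_driver_load(&dev);
    return dev.firmware_fb_present;
}
```

In this model, loading without the modeset feature leaves the firmware framebuffer in place, while loading with it still evicts the firmware framebuffer as before.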
Comment 11 monnier 2012-10-25 15:31:53 UTC
> if (drm_core_check_feature(dev, DRIVER_MODESET))

Yay!  We have a winner.  I built a 3.7.0-rc1 kernel from the linux-stable git, then applied the patch and built a second 3.7.0-rc1+ (using Debian's make-kpkg), and the boot fails in the same way for 3.7.0-rc1, whereas it succeeds for 3.7.0-rc1+.
This 3.7.0-rc1+ kernel doesn't want to wake up from s2ram (even with nomodeset), so I'm still stuck with Debian's 3.2.0-2 for now, but at least this issue is resolved.

> But once we get this issue resolved, we really should make sure KMS also works
> for you. (And outside of a few DVO devices that have never had specs released,
> it should.)

Being able to use the external 1600x1200 display (like I used to before KMS) would be great, yes.  I'm keeping an eye on bug#49838.
Comment 12 Daniel Vetter 2012-10-25 15:42:09 UTC
Thanks for the update of the bug.

On Thu, Oct 25, 2012 at 5:31 PM, <bugzilla-daemon@freedesktop.org> wrote:

> Being able to use the external 1600x1200 display (like I used to before KMS)
> would be great, yes.  I'm keeping an eye on bug#49838 <https://bugs.freedesktop.org/show_bug.cgi?id=49838>.
>
Yeah, me too ;-) I have an i830M here which exhibits the problem, but so far haven't found enough time to dig into the resume issue.
Comment 13 Jonathan Nieder 2012-10-25 18:59:31 UTC
(In reply to comment #11)
> Yay!  We have a winner.

Thanks!  I'm reopening since this doesn't seem to be fixed in drm-intel-fixes or drm-intel-next.
Comment 14 Daniel Vetter 2012-10-25 19:54:43 UTC
Chris, can you please submit the hunk from comment #10 as a patch? I've mixed things up among all the different i830M bugs ...
Comment 15 Florian Mickler 2012-11-05 23:10:28 UTC
A patch referencing this bug report has been merged in Linux v3.7-rc4:

commit 1623392af9da983f3ad088a75076c9da05e5600d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Oct 26 12:06:41 2012 +0100

    drm/i915: Only kick out vesafb if we takeover the fbcon with KMS
Comment 16 Daniel Vetter 2012-11-06 09:22:13 UTC
Should be fixed with

commit 1623392af9da983f3ad088a75076c9da05e5600d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Oct 26 12:06:41 2012 +0100

    drm/i915: Only kick out vesafb if we takeover the fbcon with KMS

which is included in 3.7-rc4.
Comment 17 Ben Hutchings 2012-11-09 05:37:27 UTC
(In reply to comment #16)
> Should be fixed with
> 
> commit 1623392af9da983f3ad088a75076c9da05e5600d
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Fri Oct 26 12:06:41 2012 +0100
> 
>     drm/i915: Only kick out vesafb if we takeover the fbcon with KMS
> 
> which is included in 3.7-rc4.

Why is the fix marked as:

    Cc: stable@vger.kernel.org # v3.6

when this bug was found in 3.2?
Comment 18 Chris Wilson 2012-11-09 08:52:57 UTC
(In reply to comment #17)
> (In reply to comment #16)
> > Should be fixed with
> > 
> > commit 1623392af9da983f3ad088a75076c9da05e5600d
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Fri Oct 26 12:06:41 2012 +0100
> > 
> >     drm/i915: Only kick out vesafb if we takeover the fbcon with KMS
> > 
> > which is included in 3.7-rc4.
> 
> Why is the fix marked as:
> 
>     Cc: stable@vger.kernel.org # v3.6
> 
> when this bug was found in 3.2?

At the time I wrote the comment, the bug had only trickled back as far as 3.6.

