Bug 53222 - [IVB gt2 server] Total system crash on trying to start X w/ intel driver
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Daniel Vetter
Reported: 2012-08-07 19:50 UTC by Maik Zumstrull
Modified: 2017-07-24 23:00 UTC

Attachments
Kernel output, via netconsole, with drm.debug=0x0e (15.54 KB, application/x-bzip)
2012-08-08 18:39 UTC, Maik Zumstrull

Description Maik Zumstrull 2012-08-07 19:50:16 UTC
That's a C216 board. I'm running Debian, testing/unstable/experimental mix, x86_64, UEFI boot.

Debian bug: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=683167

xserver-xorg-core 2:1.12.3-1
libdrm2 2.4.33-3
xserver-xorg-video-intel 2:2.20.2-1

Reproducible: 100%

Just boot a 3.4 or 3.5 kernel and try to start X. Unless i915.ko and the intel X driver have been blacklisted, the system's gone.

I will get dmesg with drm.debug later, probably tomorrow.

My guess is the problem is in the kernel, but it might as well be the X driver. Feel free to reassign there as appropriate. The i915 module works fine for text console, only crashes when trying to go into X.
Comment 1 Daniel Vetter 2012-08-07 20:44:22 UTC
lspci -nn would be interesting. If you can, try to grab the dmesg over netconsole, so that we can see the last dying breaths of your system. Also, does it die when i915.ko loads, or only when X is started?
Comment 2 Daniel Vetter 2012-08-07 20:45:12 UTC
And last but not least: Is this a regression?
Comment 3 Maik Zumstrull 2012-08-07 20:52:32 UTC
(In reply to comment #1 and #2)
> lspci -nn would be interesting.

00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v2/Ivy Bridge DRAM Controller [8086:0158] (rev 09)
00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port [8086:0151] (rev 09)
00:02.0 VGA compatible controller [0300]: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller [8086:016a] (rev 09)
00:06.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port [8086:015d] (rev 09)
00:14.0 USB controller [0c03]: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller [8086:1e31] (rev 04)
00:16.0 Communication controller [0780]: Intel Corporation 7 Series/C210 Series Chipset Family MEI Controller #1 [8086:1e3a] (rev 04)
00:1a.0 USB controller [0c03]: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #2 [8086:1e2d] (rev 04)
00:1b.0 Audio device [0403]: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller [8086:1e20] (rev 04)
00:1c.0 PCI bridge [0604]: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 1 [8086:1e10] (rev c4)
00:1c.5 PCI bridge [0604]: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 6 [8086:1e1a] (rev c4)
00:1c.6 PCI bridge [0604]: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 7 [8086:1e1c] (rev c4)
00:1d.0 USB controller [0c03]: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #1 [8086:1e26] (rev 04)
00:1e.0 PCI bridge [0604]: Intel Corporation 82801 PCI Bridge [8086:244e] (rev a4)
00:1f.0 ISA bridge [0601]: Intel Corporation C216 Series Chipset LPC Controller [8086:1e53] (rev 04)
00:1f.2 RAID bus controller [0104]: Intel Corporation 82801 SATA Controller [RAID mode] [8086:2822] (rev 04)
00:1f.3 SMBus [0c05]: Intel Corporation 7 Series/C210 Series Chipset Family SMBus Controller [8086:1e22] (rev 04)
04:00.0 Ethernet controller [0200]: Intel Corporation 82574L Gigabit Network Connection [8086:10d3]
05:00.0 Ethernet controller [0200]: Intel Corporation 82574L Gigabit Network Connection [8086:10d3]

That is one intely system.

> If you can, try to grab the dmesg over netconsole, so that we can see the last dying breaths of your system.

Done (but without drm.debug so far) for the Debian bug. It dies so fast there's no output from the crash, though.

> Also, does it die when i915.ko loads, or only when X is started?

Only when X is started, KMS text console works.

> And last but not least: Is this a regression?

Nope, it has never worked for me. I've tried kernels 3.4 and 3.5, X driver 2.19 and 2.20 (as available from Debian).

Older kernels don't crash, but only because they've never heard of the hardware and so don't try to use it; it's quite new.
Comment 4 Daniel Vetter 2012-08-08 07:50:55 UTC
Can you please retest with xorg-video-intel 2.20.3? That contains some ivb fixes ...
Comment 5 Chris Wilson 2012-08-08 07:55:13 UTC
Nope, already in his Debian package and only required for gt1; he has the Bromolow GT2 IVB server variant.

In the reported cases, it was only a GPU hang and not a total system hang.
Comment 6 Maik Zumstrull 2012-08-08 18:39:36 UTC
Created attachment 65307
Kernel output, via netconsole, with drm.debug=0x0e
Comment 7 Chris Wilson 2012-08-11 13:14:51 UTC
Let's start X by itself and see if we can pinpoint the crash.

So from a vt, can you execute 'X -ac' and see what happens?
Comment 8 Maik Zumstrull 2012-08-12 13:08:43 UTC
(In reply to comment #7)

> So from a vt, can you execute 'X -ac' and see what happens?

The screen goes black, except for a non-blinking cursor in the top left corner.

Doing the same via ssh, this is the output:


X.Org X Server 1.12.3
Release Date: 2012-07-09
X Protocol Version 11, Revision 0
Build Operating System: Linux 3.2.0-3-amd64 x86_64 Debian
Current Operating System: Linux antares 3.5-trunk-amd64 #1 SMP Thu Aug 2 17:16:27 UTC 2012 x86_64
Kernel command line: BOOT_IMAGE=/vmlinuz-3.5-trunk-amd64 root=/dev/mapper/luks-root ro single acpi_os=Linux
Build Date: 18 July 2012  08:00:38AM
xorg-server 2:1.12.3-1 (Julien Cristau <jcristau@debian.org>) 
Current version of pixman: 0.26.0
        Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Sun Aug 12 15:05:36 2012
(==) Using config file: "/etc/X11/xorg.conf"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
X: ../../intel/intel_bufmgr_gem.c:2783: drm_intel_bufmgr_gem_init: Assertion `0' failed.
Aborted


Surprisingly, I find the ssh connection still works after this. It seems the entire system doesn't die after all; just the display and the input devices are left in useless modes.
Comment 9 Maik Zumstrull 2012-08-12 13:11:11 UTC
Apparent duplicate w/ fix: https://bugzilla.redhat.com/show_bug.cgi?id=840180
Comment 10 Chris Wilson 2012-08-12 13:18:20 UTC
Aha! Indeed, libdrm is out of date: you need libdrm 2.4.34 (or a backport of the PCI ID patch, or a backport of the assertion removal).

commit e057a56448e2e785f74bc13dbd6ead8572ebed91
Author: Eugeni Dodonov <eugeni@dodonov.net>
Date:   Thu Mar 29 21:03:29 2012 -0300

    intel: add Ivy Bridge GT2 server variant
    
    We were missing this one and it is being used by Bromolow.
    
    Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>

commit 9a2b57d229fe3e6a1c9799e8cd5397969202d223
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jul 25 16:28:59 2012 +0100

    intel: Bail gracefully if we encounter an unknown Intel device
    
    Otherwise we end up with X hitting a fail-loop as the embedded libGL
    stacks asserts whilst initialising.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
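
For readers who want to see how the two commits fit together, here is a minimal sketch in C. It is not libdrm's actual code: every name in it (sketch_gen, sketch_known_ids, sketch_lookup_gen, sketch_bufmgr_init) is hypothetical, and the table holds only the one device ID taken from the lspci output in comment 3 (8086:016a). The first commit corresponds to adding the missing ID to the table; the second corresponds to returning an error for an unknown device instead of hitting assert(0).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generation codes for this sketch; not libdrm's real types. */
enum sketch_gen {
    SKETCH_GEN_UNKNOWN = 0,
    SKETCH_GEN_7 = 7            /* Ivy Bridge is a Gen7 part */
};

struct sketch_pci_id {
    uint16_t device;            /* PCI device ID (vendor 0x8086 assumed) */
    enum sketch_gen gen;
};

/* Illustrative table with a single entry. 0x016a is the graphics
 * controller reported by lspci in comment 3 (IVB GT2 server). */
static const struct sketch_pci_id sketch_known_ids[] = {
    { 0x016a, SKETCH_GEN_7 },
};

/* Look up the generation for a device ID; unknown IDs map to
 * SKETCH_GEN_UNKNOWN rather than aborting. */
static enum sketch_gen sketch_lookup_gen(uint16_t device)
{
    size_t n = sizeof(sketch_known_ids) / sizeof(sketch_known_ids[0]);
    for (size_t i = 0; i < n; i++) {
        if (sketch_known_ids[i].device == device)
            return sketch_known_ids[i].gen;
    }
    return SKETCH_GEN_UNKNOWN;
}

/* Hypothetical init routine: returns 0 and fills *gen_out on success,
 * -1 if the device is unknown or gen_out is NULL. The caller decides
 * how to fall back; nothing here calls assert(0). */
static int sketch_bufmgr_init(uint16_t device, enum sketch_gen *gen_out)
{
    enum sketch_gen gen;

    if (gen_out == NULL)
        return -1;

    gen = sketch_lookup_gen(device);
    if (gen == SKETCH_GEN_UNKNOWN)
        return -1;              /* bail gracefully on unknown hardware */

    *gen_out = gen;
    return 0;
}
```

With the 0x016a entry present, sketch_bufmgr_init(0x016a, &gen) succeeds; with the entry missing (the situation in libdrm 2.4.33), the same call would return -1, and the caller could fail cleanly instead of leaving the display and input devices in an unusable state.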

