Bug 34017

Summary: [i965gm] GPU lockup during login
Product: xorg
Reporter: Bryce Harrington <bryce>
Component: Driver/intel
Assignee: Chris Wilson <chris>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: major
Priority: high
CC: mdz
Version: 7.6 (2010.12)
Hardware: x86 (IA32)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments (description: flags):
BootDmesg.txt: none
CurrentDmesg.txt: none
XorgLog.txt: none
i915_error_state.txt: none

Description Bryce Harrington 2011-02-07 18:47:18 UTC
Forwarding this bug from Ubuntu reporter Iliyan:
http://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/710321

[Problem]
[i965gm] GPU lockup during login, with "Bad length (4) in MI_DISPLAY_BUFFER_INFO, [3, 3]" error messages listed in the gpu dump text.

We've seen several other reports with the same error in their gpu dumps, which I'm assuming are all dupes:
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/710322
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/710325
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/711645
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/711691


[Original Description]
crashed during log-in

From the GPU dump:

0x00000a78:      0x0a000002: MI_DISPLAY_BUFFER_INFO
Bad length (4) in MI_DISPLAY_BUFFER_INFO, [3, 3]
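
For readers wondering where the "(4)" comes from, here is a minimal sketch of how a decoder might derive a command length from the header dword and compare it with the allowed [min, max] range. The field positions, masks, and the "+2" bias are assumptions chosen because they reproduce the values in the dump; the function names are hypothetical and are not taken from intel_decode.c.

```python
# Sketch only: the field layout below is an assumption, not copied from intel_decode.c.
OPCODE_SHIFT = 23   # assumed position of the MI opcode field
OPCODE_MASK = 0x3F  # assumed width of the MI opcode field
LENGTH_MASK = 0x3F  # assumed width of the length field
LENGTH_BIAS = 2     # assumed: the stored length excludes two dwords


def mi_opcode(header: int) -> int:
    """Return the opcode field of an MI command header dword."""
    return (header >> OPCODE_SHIFT) & OPCODE_MASK


def mi_length(header: int) -> int:
    """Return the total command length in dwords implied by the header."""
    return (header & LENGTH_MASK) + LENGTH_BIAS


def length_is_valid(header: int, min_len: int, max_len: int) -> bool:
    """Check the implied length against a table entry [min_len, max_len]."""
    return min_len <= mi_length(header) <= max_len


header = 0x0A000002  # the dword reported at 0x00000a78 in the dump
print(hex(mi_opcode(header)), mi_length(header), length_is_valid(header, 3, 3))
```

Under these assumptions the header implies a length of 4 dwords, which falls outside the [3, 3] range the decoder expected, matching the message in the dump.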


---
[36928.228061] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[36928.229395] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 5477943 at 5477942, next 5477984)
[36928.230010] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
[36928.484890] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
[36928.504630] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
[36928.524640] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
[36928.544696] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
[36928.564642] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
[36928.584642] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
[36928.604625] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
[36928.624623] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
[36928.644639] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
[36928.664658] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
...
[36936.704058] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[36936.704077] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 5477993 at 5477942, next 5477994)
[36936.704502] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[36936.704505] [drm:i915_reset] *ERROR* Failed to reset chip.
---

ProblemType: Crash
DistroRelease: Ubuntu 11.04
Package: xserver-xorg-video-intel 2:2.14.0-1ubuntu2
ProcVersionSignature: Ubuntu 2.6.38-1.28-generic 2.6.38-rc2
Uname: Linux 2.6.38-1-generic i686
Architecture: i386
Chipset: i965gm
CompizPlugins: No value set for `/apps/compiz-1/general/allscreens/options/active_plugins'
CompositorRunning: compiz
DRM.card0.DVI.D.1:
 status: disconnected
 enabled: disabled
 dpms: Off
 modes:
 edid-base64:
DRM.card0.LVDS.1:
 status: connected
 enabled: enabled
 dpms: Off
 modes: 1400x1050
 edid-base64:
DRM.card0.VGA.1:
 status: connected
 enabled: enabled
 dpms: On
 modes: 1280x1024 1280x1024 1152x864 1024x768 1024x768 1024x768 832x624 800x600 800x600 800x600 800x600 640x480 640x480 640x480 640x480 720x400
 edid-base64: AP///////wBMLY8BNzFKTSEPAQNsIht4KqqlplRUmSYUUFS/74CBgHFPAQEBAQEBAQEBAQEBMCoAmFEAKkAwcBMAUg4RAAAeAAAA/QA4Sx5RDgAKICAgICAgAAAA/ABTeW5jTWFzdGVyCiAgAAAA/wBIRERZODEwMjYwCiAgAMg=
Date: Sun Jan 30 22:05:15 2011
DistUpgraded: Yes, recently upgraded Log time: 2010-12-04 00:59:39.625229
DistroCodename: natty
DistroVariant: ubuntu
DkmsStatus: vboxhost, 4.0.2: added
DumpSignature: e4a1dc33
ExecutablePath: /usr/share/apport/apport-gpu-error-intel.py
GraphicsCard:
 Subsystem: Fujitsu Limited. Device [10cf:13f5]
   Subsystem: Fujitsu Limited. Device [10cf:13f5]
InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Release i386 (20101007)
InterpreterPath: /usr/bin/python2.7
MachineType: FUJITSU SIEMENS LIFEBOOK E8310
PccardctlIdent:
 Socket 0:
   no product info available
 Socket 1:
   product info: "O2Micro", "SmartCardBus Reader", "V1.0", ""
   manfid: 0xffff, 0x0001
PccardctlStatus:
 Socket 0:
   no card
 Socket 1:
   5.0V 16-bit PC Card
   Subdevice 0 (function 0) [unbound]
ProcCmdline: /usr/bin/python /usr/share/apport/apport-gpu-error-intel.py
ProcEnviron:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.38-1-generic root=UUID=12c58bc3-f876-43c1-b2a0-ec9eae5e0dae ro quiet splash vt.handoff=7
ProcKernelCmdLine_: BOOT_IMAGE=/boot/vmlinuz-2.6.38-1-generic root=UUID=12c58bc3-f876-43c1-b2a0-ec9eae5e0dae ro quiet splash vt.handoff=7
RelatedPackageVersions:
 xserver-xorg 1:7.5+6ubuntu8
 libdrm2 2.4.23-1ubuntu3
 xserver-xorg-video-intel 2:2.14.0-1ubuntu2
Renderer: Hardware acceleration
SourcePackage: xserver-xorg-video-intel
Title: [i965gm] GPU lockup e4a1dc33
UnitySupportTest:

UserGroups:

dmi.bios.date: 07/18/2007
dmi.bios.vendor: FUJITSU // Phoenix Technologies Ltd.
dmi.bios.version: Version 1.08
dmi.board.name: FJNB1CE
dmi.board.vendor: FUJITSU
dmi.board.version: 1PCP331350-03
dmi.chassis.type: 10
dmi.chassis.vendor: FUJITSU SIEMENS
dmi.chassis.version: E8310
dmi.modalias: dmi:bvnFUJITSU//PhoenixTechnologiesLtd.:bvrVersion1.08:bd07/18/2007:svnFUJITSUSIEMENS:pnLIFEBOOKE8310:pvr:rvnFUJITSU:rnFJNB1CE:rvr1PCP331350-03:cvnFUJITSUSIEMENS:ct10:cvrE8310:
dmi.product.name: LIFEBOOK E8310
dmi.sys.vendor: FUJITSU SIEMENS
version.libdrm2: libdrm2 2.4.23-1ubuntu3
version.libgl1-mesa-glx: libgl1-mesa-glx 7.10-1ubuntu1
version.xserver-xorg: xserver-xorg 1:7.5+6ubuntu8
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:6.13.2+git20110124.fadee040-0ubuntu1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.14.0-1ubuntu2
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau N/A
Comment 1 Bryce Harrington 2011-02-07 18:49:22 UTC
Created attachment 43074
BootDmesg.txt
Comment 2 Bryce Harrington 2011-02-07 18:49:58 UTC
Created attachment 43075
CurrentDmesg.txt
Comment 3 Bryce Harrington 2011-02-07 18:50:29 UTC
Created attachment 43076
XorgLog.txt
Comment 5 Chris Wilson 2011-02-08 01:44:28 UTC
commit 3c5b1399e29ef577b8b91655b5e1c215d1b6dfbb
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Nov 9 20:20:06 2010 +0000

    i915: Disable maximum state addresses
    
    As the kernel controls the relocation of state buffers, we should not
    hard code the maximum permissible value for them.
    
    Fixes an eventual hang with full-gtt.
    
    Reported-by: Peter Clifton <pcjc2@cam.ac.uk>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 6 Bryce Harrington 2011-02-08 13:28:46 UTC
The version of the driver the user tested was:

  version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.14.0-1ubuntu2

which already included commit 3c5b1399e29ef577b8b91655b5e1c215d1b6dfbb
Comment 7 Chris Wilson 2011-02-09 02:03:59 UTC
The error state says differently:

0x0d6a8014:      0x61010004: STATE_BASE_ADDRESS
0x0d6a8018:      0x00000001:    general state base address 0x00000000
0x0d6a801c:      0x0d8f2001:    surface state base address 0x0d8f2000
0x0d6a8020:      0x00000001:    indirect state base address 0x00000000
0x0d6a8024:      0x10000001:    general state upper bound 0x10000000
0x0d6a8028:      0x10000001:    indirect state upper bound 0x10000000
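
To make the point of the dump explicit, a small sketch that splits each STATE_BASE_ADDRESS dword into an address and a modify-enable flag. It assumes bit 0 is the modify-enable bit and that the address occupies the 4 KiB-aligned upper bits; that layout is an assumption consistent with the decoded values above, and the names are hypothetical.

```python
# Sketch only: the bit layout is an assumption that matches the decoded dump.
MODIFY_ENABLE = 0x00000001  # assumed: bit 0 tells the GPU to apply the new value
ADDRESS_MASK = 0xFFFFF000   # assumed: addresses are 4 KiB aligned


def decode_base_address(dword: int) -> tuple[int, bool]:
    """Split a dword into (address, modify_enabled)."""
    return dword & ADDRESS_MASK, bool(dword & MODIFY_ENABLE)


# Dwords copied from the error state above.
error_state = [
    ("general state base address", 0x00000001),
    ("surface state base address", 0x0D8F2001),
    ("indirect state base address", 0x00000001),
    ("general state upper bound", 0x10000001),
    ("indirect state upper bound", 0x10000001),
]

for name, dword in error_state:
    address, modify = decode_base_address(dword)
    print(f"{name}: 0x{address:08x} modify={modify}")
```

Read this way, both upper bounds are still being programmed to 0x10000000 with the modify bit set, which is what contradicts the claim that the hard-coded maximum had already been removed from the tested driver.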
Comment 8 Chris Wilson 2011-02-09 02:05:04 UTC
Not only that, but we also have a typo in intel_decode.c.
Comment 9 Christopher James Halse Rogers 2011-02-09 15:32:44 UTC
You appear to still be setting BASE_ADDRESS_MODIFY to 0x10000000 in i965_emit_video_setup for general_state, object_state, and instruction_state.  I'm not sure if that's relevant, but I suspect it is.
Comment 10 Christopher James Halse Rogers 2011-02-09 15:33:37 UTC
(In reply to comment #9)
> You appear to still be setting BASE_ADDRESS_MODIFY to 0x10000000 in
> i965_emit_video_setup for general_state, object_state, and instruction_state. 
  ^^^^^^^^^^^^^^^^^^ in src/i965_video.c, if that's not obvious.
Comment 11 Chris Wilson 2011-02-11 16:08:50 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > You appear to still be setting BASE_ADDRESS_MODIFY to 0x10000000 in
> > i965_emit_video_setup for general_state, object_state, and instruction_state. 
>   ^^^^^^^^^^^^^^^^^^ in src/i965_video.c, if that's not obvious.

Please. No. I don't want to be reminded of how much badness there exists in our code.
Comment 12 Chris Wilson 2011-02-12 02:46:35 UTC
Christopher, what's the likelihood of Xv being used during login? I suspect this code was not the cause, but nevertheless it is still very wrong.

commit 23f9b14df7c102c1036134835dd5d1a508059858
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Feb 12 10:42:34 2011 +0000

    i965: Remove broken maximum base addresses from video
    
    WRONG.
    
    The hardware was never limited to 0x1000000 and the kernel can quite
    rightly place objects above that limit. Specifying such had no relation
    to reality, so why did we do it? TWICE!
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34017
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 13 Bryce Harrington 2011-02-14 16:01:04 UTC
Thanks for the fix, Chris. I would be a little surprised if we exercise Xv at all during boot, but stranger things have happened. In any case, we've been hearing of a few issues with video in general; perhaps this will solve those.
Comment 14 Chris Wilson 2011-02-19 03:07:29 UTC
*** Bug 34461 has been marked as a duplicate of this bug. ***
