Bug 35576

Summary: [arrandale] GPU lockup (IPEHR: 0x01820000)
Product: xorg Reporter: Bryce Harrington <bryce>
Component: Driver/intel Assignee: Chris Wilson <chris>
Status: RESOLVED DUPLICATE QA Contact: Xorg Project Team <xorg-team>
Severity: major    
Priority: high CC: mikael
Version: 7.6 (2010.12)   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
i915_error_state.txt none
XorgLog.txt none
CurrentDmesg.txt none
BootDmesg.txt none
Scanlines are inclusive... none

Description Bryce Harrington 2011-03-22 18:52:12 UTC
Forwarding this bug from Ubuntu reporter jradwans:
http://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/737110

[Problem]
Another arrandale GPU lockup, but with IPEHR: 0x01820000.

LP bug #725206 is similar with an IPEHR of 0x01800002; I don't know if that is significant, but presume it may be a dupe.
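One way to compare the two IPEHR values is to split each into an opcode and its flag bits. The sketch below assumes the MI command header layout used by the kernel's MI_INSTR(opcode, flags) macro, with the opcode in bits 28:23 and per-command flags in the low bits. The helper name split_mi_header is hypothetical and not part of any driver.

```python
# Hypothetical helper, not driver code.
# Assumption: MI command headers carry the opcode in bits 28:23 and
# per-command flags in the bits below that, as in MI_INSTR(opcode, flags).
MI_OPCODE_SHIFT = 23
MI_OPCODE_MASK = 0x3F


def split_mi_header(dword):
    """Return (opcode, flags) for a 32-bit MI command header."""
    opcode = (dword >> MI_OPCODE_SHIFT) & MI_OPCODE_MASK
    flags = dword & ((1 << MI_OPCODE_SHIFT) - 1)
    return opcode, flags


# The IPEHR from this report and the one from LP bug #725206.
for ipehr in (0x01820000, 0x01800002):
    opcode, flags = split_mi_header(ipehr)
    print(f"IPEHR {ipehr:#010x}: opcode {opcode:#04x}, flags {flags:#08x}")
```

Under that assumption both values carry the same opcode (0x03) and differ only in their flag bits, which suggests the same kind of command was executing in both hangs but does not by itself prove the bugs are duplicates.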

ProblemType: Crash
DistroRelease: Ubuntu 11.04
Package: xserver-xorg-video-intel 2:2.14.0-4ubuntu2
ProcVersionSignature: Ubuntu 2.6.38-1.28-generic 2.6.38-rc2
Uname: Linux 2.6.38-1-generic x86_64
Architecture: amd64
Chipset: arrandale
CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins'
CompositorRunning: compiz
DRM.card0.DP.1:
 status: disconnected
 enabled: disabled
 dpms: Off
 modes: 
 edid-base64:
DRM.card0.DP.2:
 status: disconnected
 enabled: disabled
 dpms: Off
 modes: 
 edid-base64:
DRM.card0.DP.3:
 status: disconnected
 enabled: disabled
 dpms: Off
 modes: 
 edid-base64:
DRM.card0.HDMI.A.1:
 status: disconnected
 enabled: disabled
 dpms: Off
 modes: 
 edid-base64:
DRM.card0.HDMI.A.2:
 status: disconnected
 enabled: disabled
 dpms: Off
 modes: 
 edid-base64:
DRM.card0.HDMI.A.3:
 status: disconnected
 enabled: disabled
 dpms: Off
 modes: 
 edid-base64:
DRM.card0.LVDS.1:
 status: connected
 enabled: enabled
 dpms: On
 modes: 1366x768 1366x768
 edid-base64: AP///////wAwrrBAAAAAADUTAQOAIxN46nuVnFdXlCkVUFQAAAABAQEBAQEBAQEBAQEBAQEBRR1Wu1AAJDA4JUYAWMEQAAAYnBdWqFAAFDAzIjUAWMEQAAAYAAAADwCMCTKMCSgWCQANr1AUAAAA/gBOMTU2QjYtTDBBCiAgAIM=
DRM.card0.VGA.1:
 status: disconnected
 enabled: disabled
 dpms: Off
 modes: 
 edid-base64:
Date: Thu Mar 17 19:12:24 2011
DistUpgraded: Log time: 2010-12-25 11:21:08.520570
DistroCodename: natty
DistroVariant: ubuntu
DkmsStatus: virtualbox-ose, 4.0.4, 2.6.38-1-generic, x86_64: installed
DumpSignature: 7d85e61d
ExecutablePath: /usr/share/apport/apport-gpu-error-intel.py
GraphicsCard:
 Intel Corporation Core Processor Integrated Graphics Controller [8086:0046] (rev 02) (prog-if 00 [VGA controller])
   Subsystem: Lenovo Device [17aa:215a]
InstallationMedia: Ubuntu 11.04 "Natty Narwhal" - Alpha amd64 (20101202)
InterpreterPath: /usr/bin/python2.7
MachineType: LENOVO 2598RM4
ProcCmdline: /usr/bin/python /usr/share/apport/apport-gpu-error-intel.py
ProcEnviron:
 
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-2.6.38-1-generic root=/dev/mapper/ubuntu--root-ubuntu--root--ogic ro vga=773 quiet splash vt.handoff=7
ProcKernelCmdLine_: BOOT_IMAGE=/vmlinuz-2.6.38-1-generic root=/dev/mapper/ubuntu--root-ubuntu--root--ogic ro vga=773 quiet splash vt.handoff=7
RelatedPackageVersions:
 xserver-xorg             1:7.6~3ubuntu11
 libdrm2                  2.4.23-1ubuntu3
 xserver-xorg-video-intel 2:2.14.0-4ubuntu2
Renderer: Unknown
SourcePackage: xserver-xorg-video-intel
Title: [arrandale] GPU lockup 7d85e61d
UpgradeStatus: Upgraded to natty on 2011-03-17 (0 days ago)
UserGroups:
 
dmi.bios.date: 01/11/2011
dmi.bios.vendor: LENOVO
dmi.bios.version: 81ET49WW (1.25 )
dmi.board.name: 2598RM4
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr81ET49WW(1.25):bd01/11/2011:svnLENOVO:pn2598RM4:pvrThinkPadL512:rvnLENOVO:rn2598RM4:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 2598RM4
dmi.product.version: ThinkPad L512
dmi.sys.vendor: LENOVO
version.compiz: compiz 1:0.9.4-0ubuntu5
version.libdrm2: libdrm2 2.4.23-1ubuntu3
version.libgl1-mesa-glx: libgl1-mesa-glx 7.10.1-0ubuntu3
version.xserver-xorg: xserver-xorg 1:7.6~3ubuntu11
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:6.14.0-0ubuntu2
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.14.0-4ubuntu2
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:0.0.16+git20110107+b795ca6e-0ubuntu5
Comment 1 Bryce Harrington 2011-03-22 18:52:41 UTC
Created attachment 44739
i915_error_state.txt
Comment 2 Bryce Harrington 2011-03-22 18:53:02 UTC
Created attachment 44740
XorgLog.txt
Comment 3 Bryce Harrington 2011-03-22 18:53:23 UTC
Created attachment 44741
CurrentDmesg.txt
Comment 4 Bryce Harrington 2011-03-22 18:53:41 UTC
Created attachment 44742
BootDmesg.txt
Comment 5 Chris Wilson 2011-03-25 03:49:38 UTC
Created attachment 44844
Scanlines are inclusive...

If you can reproduce the bug, could you please try the attached patch?
Comment 6 Chris Wilson 2011-03-25 14:23:21 UTC
*** Bug 35575 has been marked as a duplicate of this bug. ***
Comment 7 Chris Wilson 2011-04-04 09:04:21 UTC
commit 972569f6fd1e14519f46e9f50d2509faf1d0aa55
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Mar 25 10:46:14 2011 +0000

    MI_LOAD_SCAN_LINES_INCL are inclusive and range [0, display height-1]
    
    We have seen GPU hangs with:
    
    batchbuffer at 0x0f9b4000:
    0x0f9b4000:      0x09000000: MI_LOAD_SCAN_LINES_INCL
    0x0f9b4004:      0x00000300:    dword 1
    0x0f9b4008:      0x09000000: MI_LOAD_SCAN_LINES_INCL
    0x0f9b400c:      0x00000300:    dword 1
    0x0f9b4010:      0x01820000: MI_WAIT_FOR_EVENT
    0x0f9b4014: HEAD 0x02000006: MI_FLUSH
    
    on a 1366x768 display. According to the specs, that is an invalid
    command for the pipe.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=35576
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
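
The range check described in the commit message can be sketched as follows. This is an illustration, not the actual patch: it assumes the MI_LOAD_SCAN_LINES_INCL payload dword holds the start scanline in bits 31:16 and the end scanline in bits 15:0, which is consistent with the 0x00000300 dword in the dump above. The function names are hypothetical.

```python
# Illustrative sketch only; not the code from the commit.
# Assumption: payload dword = (start << 16) | end, both bounds inclusive.

def decode_window(dword):
    """Split the payload dword into (start, end) scanlines."""
    return (dword >> 16) & 0xFFFF, dword & 0xFFFF


def window_is_valid(start, end, display_height):
    """Inclusive bounds: the last legal scanline is display_height - 1."""
    return 0 <= start <= end <= display_height - 1


def encode_window(y1, y2, display_height):
    """Encode the half-open band [y1, y2) as a clamped inclusive window."""
    start = max(0, min(y1, display_height - 1))
    end = max(start, min(y2 - 1, display_height - 1))
    return (start << 16) | end


# The dword from the hung batch, checked against a 768-line display.
start, end = decode_window(0x00000300)
print(start, end, window_is_valid(start, end, 768))

# The same full-height band, encoded with inclusive bounds.
print(hex(encode_window(0, 768, 768)))
```

With these assumptions the dumped dword asks for scanlines 0 through 768, one past the last line of a 768-line display, whereas the inclusive encoding ends at 767.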
Comment 8 Bryce Harrington 2011-04-04 16:14:13 UTC
Thanks.  I prepared a package for the user to test which includes this patch.  After installing and testing it, they received another crash with the same IPEHR value:

i915_error_state:
 Time: 1301810387 s 154509 us
 PCI ID: 0x0046
 EIR: 0x00000000
 PGTBL_ER: 0x00000000
 Render command stream:
   ACTHD: 0x006e1014
   IPEIR: 0x00000000
   IPEHR: 0x01820000
   INSTDONE: 0xffffffff
   INSTDONE1: 0xbfffffff
   INSTPS: 0x80000020
   INSTPM: 0x00000000

Full details for this new crash are here:
  https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/737110/+attachment/1971485/+files/xserver-xorg-video-intel.2011-04-03_07%3A59%3A49.020752.crash
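
To compare dumps across reports, the register lines of an error state can be pulled into a dictionary. The sketch below is a minimal parser that assumes every register line has the form "NAME: 0xVALUE", as in the excerpt above. The names parse_registers and REG_LINE are hypothetical, and the parser has only been checked against this excerpt, not against full i915_error_state files.

```python
import re

# Excerpt copied from the error state quoted in this comment.
SAMPLE = """\
 Time: 1301810387 s 154509 us
 PCI ID: 0x0046
 EIR: 0x00000000
 PGTBL_ER: 0x00000000
 Render command stream:
   ACTHD: 0x006e1014
   IPEIR: 0x00000000
   IPEHR: 0x01820000
   INSTDONE: 0xffffffff
   INSTDONE1: 0xbfffffff
   INSTPS: 0x80000020
   INSTPM: 0x00000000
"""

# Assumption: a register line is "NAME: 0x<hex>" with nothing after the value.
REG_LINE = re.compile(r"^\s*([A-Za-z0-9_ ]+):\s*(0x[0-9a-fA-F]+)\s*$")


def parse_registers(text):
    """Return {register name: integer value} for lines matching REG_LINE."""
    regs = {}
    for line in text.splitlines():
        match = REG_LINE.match(line)
        if match:
            regs[match.group(1).strip()] = int(match.group(2), 16)
    return regs


regs = parse_registers(SAMPLE)
print(f"IPEHR = {regs['IPEHR']:#010x}")
print("same signature as the original report:", regs["IPEHR"] == 0x01820000)
```

Lines without a hexadecimal value, such as the Time line and the "Render command stream:" heading, are skipped.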
Comment 9 Chris Wilson 2011-04-05 00:28:27 UTC
Hmm, might be worth testing with

  Option "DebugFlushCaches" "True"

and/or

  Option "DebugFlushBatches" "True"

to rule out the possibility that a bad render op goes uncaught until the wait.
Comment 10 Bryce Harrington 2011-04-13 14:33:01 UTC
The reporter has not responded about testing those two options.  But three more bug reports have come in with the same IPEHR value, on various chips:

760068  [arrandale] GPU lockup (IPEHR: 0x01820000)
757399  [gm45] GPU lockup (IPEHR: 0x01820000)
752249  [q45] GPU lockup (0x01820000)
Comment 11 Eric Anholt 2011-06-08 15:41:40 UTC
Given how often we have problems with this, we should probably include the pipe/plane regs in the error_state dump so we can see whether we're waiting for scanlines that will ever change.
Comment 12 Chris Wilson 2011-06-26 02:20:34 UTC
commit c4a1d9e4dc5d5313cfec2cc0c9d630efe8a6f287
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Nov 21 13:12:35 2010 +0000

    drm/i915: Capture interesting display registers on error
    
    When trying to diagnose mysterious errors on resume, capture the
    display register contents as well.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 13 Chris Wilson 2011-07-08 03:17:30 UTC
Forward duping to the bug that has the patch...

*** This bug has been marked as a duplicate of bug 36515 ***
