Bug 28964

Summary: [i965gm] GPU infinite MI_WAIT_FOR_EVENT while watching video in Totem
Product: xorg Reporter: Bryce Harrington <bryce>
Component: Driver/intel Assignee: Chris Wilson <chris>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: critical    
Priority: medium CC: brian, gomyhr, mdz
Version: 7.5 (2009.10)   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description                  Flags
BootDmesg.txt                none
CurrentDmesg.txt             none
Dependencies.txt             none
GdmLog.txt                   none
GdmLog1.txt                  none
GdmLog2.txt                  none
Lspci.txt                    none
Lsusb.txt                    none
PciDisplay.txt               none
ProcCpuinfo.txt              none
ProcInterrupts.txt           none
ProcMaps.txt                 none
ProcModules.txt              none
ProcStatus.txt               none
RelatedPackageVersions.txt   none
UdevDb.txt                   none
UdevLog.txt                  none
XorgConf.txt                 none
XorgLog.txt                  none
XorgLogOld.txt               none
Xrandr.txt                   none
glxinfo.txt                  none
i915_error_state.txt         none
setxkbmap.txt                none
xdpyinfo.txt                 none
xkbcomp.txt                  none

Description Bryce Harrington 2010-07-08 12:09:34 UTC
Forwarding this bug from an Ubuntu reporter:
http://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/603064

[Problem]
GPU lockup while watching video in Totem.  GPU dump attached

[Original Description]
This happened while watching a video in Totem.

ProblemType: Crash
DistroRelease: Ubuntu 10.10
Package: xserver-xorg-video-intel 2:2.11.0-1ubuntu2
ProcVersionSignature: Ubuntu 2.6.35-6.9-generic 2.6.35-rc3
Uname: Linux 2.6.35-6-generic x86_64
Architecture: amd64
Chipset: i965gm
DRM.card0.DVI.D.1:
 status: disconnected
 enabled: disabled
 dpms: Off
 modes: 
 edid-base64:
DRM.card0.LVDS.1:
 status: connected
 enabled: enabled
 dpms: Off
 modes: 1680x1050
 edid-base64: AP///////wAkTYcoAAAAAAAPAQOAIRV4CrylmFhViygkUFQAAAABAQEBAQEBAQEBAQEBAQEBHC+Q0GAaD0AgMBMAS88QAAAZRSeQ0GAaD0AgMBMAS88QAAAZAAAADwCzCjKzCigUAQAyDAAAAAAA/gBMUDE1NFcwMi1UTDA2AL8=
DRM.card0.VGA.1:
 status: connected
 enabled: enabled
 dpms: On
 modes: 1920x1200 1600x1200 1680x1050 1280x1024 1440x900 1280x960 1280x800 1024x768 800x600 800x600 640x480
 edid-base64: AP///////wBMLeYDNjJXVCMSAQMONyJ4Kv4hqFM3riQRUFQjCACpQIGAgUCBAJUAswABAQEBKDyAoHCwI0AwIDYAJlQhAAAaAAAA/QA4PB5REQAKICAgICAgAAAA/ABTeW5jTWFzdGVyCiAgAAAA/wBIVkRRODAwNDkwCiAgANc=
Date: Tue Jul  6 23:41:42 2010
DkmsStatus: Error: [Errno 2] No such file or directory
DumpSignature: 77c6dfe5
ExecutablePath: /usr/share/apport/apport-gpu-error-intel.py
InterpreterPath: /usr/bin/python2.6
MachineType: LENOVO 6465CTO
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.35-6-generic root=UUID=305dde78-d20a-4248-aaf4-09447b7c5791 ro quiet splash
ProcCmdline: /usr/bin/python /usr/share/apport/apport-gpu-error-intel.py
ProcEnviron:
 
SourcePackage: xserver-xorg-video-intel
Title: [i965gm] GPU lockup 77c6dfe5
UserGroups:
 
dmi.bios.date: 01/21/2008
dmi.bios.vendor: LENOVO
dmi.bios.version: 7LETB0WW (2.10 )
dmi.board.name: 6465CTO
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr7LETB0WW(2.10):bd01/21/2008:svnLENOVO:pn6465CTO:pvrThinkPadT61:rvnLENOVO:rn6465CTO:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 6465CTO
dmi.product.version: ThinkPad T61
dmi.sys.vendor: LENOVO
system: codename:           maverick
 architecture:       x86_64
 kernel:             2.6.35-6-generic
Comment 1 Bryce Harrington 2010-07-08 13:21:02 UTC
Created attachment 36858 [details]
BootDmesg.txt
Comment 2 Bryce Harrington 2010-07-08 13:21:07 UTC
Created attachment 36859 [details]
CurrentDmesg.txt
Comment 3 Bryce Harrington 2010-07-08 13:21:13 UTC
Created attachment 36860 [details]
Dependencies.txt
Comment 4 Bryce Harrington 2010-07-08 13:21:19 UTC
Created attachment 36861 [details]
GdmLog.txt
Comment 5 Bryce Harrington 2010-07-08 13:21:24 UTC
Created attachment 36862 [details]
GdmLog1.txt
Comment 6 Bryce Harrington 2010-07-08 13:21:30 UTC
Created attachment 36863 [details]
GdmLog2.txt
Comment 7 Bryce Harrington 2010-07-08 13:21:43 UTC
Created attachment 36864 [details]
Lspci.txt
Comment 8 Bryce Harrington 2010-07-08 13:21:49 UTC
Created attachment 36865 [details]
Lsusb.txt
Comment 9 Bryce Harrington 2010-07-08 13:21:55 UTC
Created attachment 36866 [details]
PciDisplay.txt
Comment 10 Bryce Harrington 2010-07-08 13:22:01 UTC
Created attachment 36867 [details]
ProcCpuinfo.txt
Comment 11 Bryce Harrington 2010-07-08 13:22:12 UTC
Created attachment 36868 [details]
ProcInterrupts.txt
Comment 12 Bryce Harrington 2010-07-08 13:22:21 UTC
Created attachment 36869 [details]
ProcMaps.txt
Comment 13 Bryce Harrington 2010-07-08 13:22:27 UTC
Created attachment 36870 [details]
ProcModules.txt
Comment 14 Bryce Harrington 2010-07-08 13:22:33 UTC
Created attachment 36871 [details]
ProcStatus.txt
Comment 15 Bryce Harrington 2010-07-08 13:22:40 UTC
Created attachment 36872 [details]
RelatedPackageVersions.txt
Comment 16 Bryce Harrington 2010-07-08 13:22:46 UTC
Created attachment 36873 [details]
UdevDb.txt
Comment 17 Bryce Harrington 2010-07-08 13:22:53 UTC
Created attachment 36874 [details]
UdevLog.txt
Comment 18 Bryce Harrington 2010-07-08 13:22:59 UTC
Created attachment 36875 [details]
XorgConf.txt
Comment 19 Bryce Harrington 2010-07-08 13:23:05 UTC
Created attachment 36876 [details]
XorgLog.txt
Comment 20 Bryce Harrington 2010-07-08 13:23:11 UTC
Created attachment 36877 [details]
XorgLogOld.txt
Comment 21 Bryce Harrington 2010-07-08 13:23:18 UTC
Created attachment 36878 [details]
Xrandr.txt
Comment 22 Bryce Harrington 2010-07-08 13:23:24 UTC
Created attachment 36879 [details]
glxinfo.txt
Comment 23 Bryce Harrington 2010-07-08 13:23:31 UTC
Created attachment 36880 [details]
i915_error_state.txt
Comment 24 Bryce Harrington 2010-07-08 13:23:39 UTC
Created attachment 36881 [details]
setxkbmap.txt
Comment 25 Bryce Harrington 2010-07-08 13:23:46 UTC
Created attachment 36882 [details]
xdpyinfo.txt
Comment 26 Bryce Harrington 2010-07-08 13:23:53 UTC
Created attachment 36883 [details]
xkbcomp.txt
Comment 27 Chris Wilson 2010-07-09 02:28:45 UTC
batchbuffer at 0x0edac000:
0x0edac000:      0x09000000: MI_LOAD_SCAN_LINES_INCL
0x0edac004:      0x000004b0:    dword 1
0x0edac008:      0x09000000: MI_LOAD_SCAN_LINES_INCL
0x0edac00c:      0x000004b0:    dword 1
0x0edac010:      0x01800002: MI_WAIT_FOR_EVENT
0x0edac014: HEAD 0x54f08806: XY_SRC_COPY_BLT (rgb enabled, alpha enabled, src tile 1, dst tile 1)
0x0edac018:      0x03cc0780:    format 8888, dst pitch 1920, clipping disabled
0x0edac01c:      0x00000000:    dst (0,0)
0x0edac020:      0x04b00780:    dst (1920,1200)
0x0edac024:      0x08f93000:    dst offset 0x08f93000
0x0edac028:      0x00000000:    src (0,0)
0x0edac02c:      0x00000780:    src pitch 1920
0x0edac030:      0x07546000:    src offset 0x07546000
0x0edac034:      0x02000000: MI_FLUSH
0x0edac038:      0x00000000: MI_NOOP
0x0edac03c:      0x05000000: MI_BATCH_BUFFER_END
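
For readers following the dump, here is a minimal sketch of how the packed coordinate dwords can be split by hand. It assumes x sits in the low 16 bits and y in the high 16 bits, which is consistent with the decoder printing dst (1920,1200) for 0x04b00780. The helper name `unpack_xy` is hypothetical, and reading the low half of the MI_LOAD_SCAN_LINES_INCL dword as a line number is an assumption, not something the dump states.

```python
from typing import Tuple


def unpack_xy(dword: int) -> Tuple[int, int]:
    """Split a 32-bit dword into (x, y): x from bits 0-15, y from bits 16-31."""
    return dword & 0xFFFF, (dword >> 16) & 0xFFFF


# Dwords copied from the batchbuffer dump above.
dst_top_left = unpack_xy(0x00000000)
dst_bottom_right = unpack_xy(0x04B00780)

# Assumption: the low 16 bits of the scan-line dword hold a line number.
scan_line_low_half = 0x000004B0 & 0xFFFF

print(dst_top_left, dst_bottom_right, scan_line_low_half)
```

Under that assumption the scan-line dword carries 1200 in its low half, the same value as the height of the blit rectangle.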
Comment 28 Chris Wilson 2010-07-09 02:40:57 UTC
Hmm, it is interesting how much more careful the dri code is in handling the MI_WAIT_FOR_EVENT.
Comment 29 Chris Wilson 2010-07-09 02:54:48 UTC
I've pushed additional checks from the dri WAIT_FOR_EVENT handling, as they didn't appear to negatively impact my machine:

commit 272d1c14a39c32ade39b5a8b080a891f2b3d6e8e
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jul 9 10:41:19 2010 +0100

    video: apply the crtc box checks from dri.
    
    The dri code is much more careful in ensuring that the scan lines that
    is waits for are valid. Copy this code to video, with a bit of work this
    can be refactored, and perhaps even teach dri how to handle rotated
    front buffers.
    
    References:
    
      Bug 28964 - [i965gm] GPU infinite MI_WAIT_FOR_EVENT while watching video
                  in Totem
      https://bugs.freedesktop.org/show_bug.cgi?id=28964

However, these are just a set of extra sanity checks. It is not clear under what circumstances the machine froze so I cannot say whether this is the fix for the bug.
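
The kind of sanity check described above can be sketched roughly as follows. This is not the driver code from the commit; it is an illustration that assumes the check amounts to clipping the requested rows against the rows the CRTC actually scans out, and skipping the wait when nothing is left. The function name `clamp_scanline_window`, its parameters, and the inclusive end line are all hypothetical.

```python
from typing import Optional, Tuple


def clamp_scanline_window(
    y1: int, y2: int, crtc_y: int, crtc_height: int
) -> Optional[Tuple[int, int]]:
    """Clip rows [y1, y2) to the CRTC and return CRTC-relative (start, end).

    The returned end line is inclusive (an assumption of this sketch).
    Returns None when no row overlaps the CRTC, meaning no wait should
    be emitted at all.
    """
    top = max(y1, crtc_y)
    bottom = min(y2, crtc_y + crtc_height)
    if top >= bottom:
        return None
    return top - crtc_y, bottom - crtc_y - 1


# A full-height 1200-row copy on a 1200-line CRTC stays within lines 0..1199.
print(clamp_scanline_window(0, 1200, 0, 1200))
# The same copy against a 1050-line CRTC is clipped to lines 0..1049.
print(clamp_scanline_window(0, 1200, 0, 1050))
# Rows entirely outside the CRTC produce no wait.
print(clamp_scanline_window(1300, 1400, 0, 1200))
```

The point of returning None is that a wait on lines the pipe never reaches would never complete, so the caller should drop the wait rather than emit it.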
Comment 30 Chris Wilson 2010-07-09 03:07:18 UTC
Clearing the regression keyword: nothing in the report suggests that this bug was recently introduced. If it can be narrowed down to a particular commit (or range thereof), that would be most useful.
Comment 31 Chris Wilson 2010-09-06 10:08:57 UTC
http://cgit.freedesktop.org/~ickle/drm-intel/log/?h=drm-intel-next contains a new check in hangcheck that should fix these as a last resort.
Comment 32 Chris Wilson 2010-09-11 01:18:10 UTC
Repository moved:

git://git.kernel.org/pub/scm/linux/kernel/git/ickle/drm-intel.git drm-intel-next
Comment 33 Chris Wilson 2010-11-13 01:56:30 UTC
commit 85345517fe6d4de27b0d6ca19fef9d28ac947c4a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Nov 13 09:49:11 2010 +0000

    drm/i915: Retire any pending operations on the old scanout when switching
    
    An old and oft reported bug, is that of the GPU hanging on a
    MI_WAIT_FOR_EVENT following a mode switch. The cause is that the GPU is
    waiting on a scanline counter on an inactive pipe, and so waits for a
    very long time until eventually the user reboots his machine.
    
    We can prevent this either by moving the WAIT into the kernel and
    thereby incurring considerable cost on every swapbuffers, or by waiting
    for the GPU to retire the last batch that accesses the framebuffer
    before installing a new one. As mode switches are much rarer than swap
    buffers, this looks like an easy choice.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=28964
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=29252
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: stable@kernel.org
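
The ordering chosen in the commit message above (retire whatever still touches the old framebuffer, then install the new one) can be modelled with a toy sketch. This is not kernel code; `ToyGpu`, `ToyCrtc` and their methods are hypothetical names, and "retiring" here simply removes entries from a list, where the real driver would wait for the GPU to finish.

```python
from typing import List, Tuple


class ToyGpu:
    """Tracks submitted batches and the framebuffer each one accesses."""

    def __init__(self) -> None:
        self.pending: List[Tuple[int, str]] = []

    def submit(self, batch_id: int, framebuffer: str) -> None:
        self.pending.append((batch_id, framebuffer))

    def retire_all_using(self, framebuffer: str) -> List[int]:
        """Retire every pending batch that accesses `framebuffer`.

        Stand-in for blocking until the GPU has completed those batches.
        """
        retired = [bid for bid, fb in self.pending if fb == framebuffer]
        self.pending = [(bid, fb) for bid, fb in self.pending if fb != framebuffer]
        return retired


class ToyCrtc:
    def __init__(self, gpu: ToyGpu, framebuffer: str) -> None:
        self.gpu = gpu
        self.framebuffer = framebuffer

    def set_framebuffer(self, new_framebuffer: str) -> List[int]:
        # Retire work on the old scanout first, so no batch is left
        # waiting on a pipe that is about to change.
        retired = self.gpu.retire_all_using(self.framebuffer)
        self.framebuffer = new_framebuffer
        return retired


gpu = ToyGpu()
crtc = ToyCrtc(gpu, "old_fb")
gpu.submit(1, "old_fb")    # e.g. a batch containing a scan-line wait
gpu.submit(2, "other_fb")  # unrelated work is left alone
retired_ids = crtc.set_framebuffer("new_fb")
print(retired_ids, gpu.pending, crtc.framebuffer)
```

The cost of the extra wait is paid only on a mode switch, which matches the commit's reasoning that mode switches are much rarer than buffer swaps.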
