Bug 48829 - [i965gm] GPU hang, stray GL_DEPTH_BUFFER clear
Summary: [i965gm] GPU hang, stray GL_DEPTH_BUFFER clear
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: 8.0
Hardware: x86-64 (AMD64) Linux (All)
Importance: high major
Assignee: Ian Romanick
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-04-17 10:20 UTC by Bryce Harrington
Modified: 2012-05-04 14:59 UTC
CC List: 1 user

See Also:
i915 platform:
i915 features:


Attachments
BootDmesg.txt (64.77 KB, text/plain)
2012-04-17 10:21 UTC, Bryce Harrington
CurrentDmesg.txt (86.83 KB, text/plain)
2012-04-17 10:21 UTC, Bryce Harrington
i915_error_state.txt (766.76 KB, text/plain)
2012-04-17 10:21 UTC, Bryce Harrington
XorgLog.txt (38.76 KB, text/plain)
2012-04-17 10:22 UTC, Bryce Harrington
Screenshot from 2012-04-12 12:05:52.png (162.14 KB, image/png)
2012-04-17 10:24 UTC, Bryce Harrington
Another crash dump (290.80 KB, application/x-gzip)
2012-04-19 09:11 UTC, Ben Gamari
Yet another dump (147.62 KB, application/x-gzip)
2012-04-19 09:21 UTC, Ben Gamari
i915_error_state from odd hang (894.48 KB, application/octet-stream)
2012-04-19 09:28 UTC, Ben Gamari
crash-20120419-1230 (303.52 KB, application/x-gzip)
2012-04-19 09:31 UTC, Ben Gamari
crash-20120419-1233 (169.39 KB, application/x-gzip)
2012-04-19 09:33 UTC, Ben Gamari
crash-20120419-1242 (288.09 KB, application/x-gzip)
2012-04-19 09:42 UTC, Ben Gamari

Description Bryce Harrington 2012-04-17 10:20:49 UTC
Forwarding this bug from Ubuntu reporter Ben Gamari:
http://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/980017

[Problem]
Lockups occur several times a day, preceded by display corruption. They started immediately after upgrading to precise on 1st April. The problem does not occur with Unity 2D, and seems to be triggered more frequently when using the Unity 3D application switcher.

We saw quite a few 0x02000004 bugs last cycle in Ubuntu Oneiric, but this cycle just a couple.  (LP #978836 being the other report).
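
For readers unfamiliar with signatures like 0x02000004: the IPEHR value is, as far as I understand it, the header dword of the command the render ring was parsing when the hang was detected. The sketch below splits such a dword using the MI command layout as I recall it from Intel's hardware documentation (command type in bits 31:29, opcode in bits 28:23). The function name and dictionary keys are made up for illustration. Opcode 0x04 appears to correspond to MI_FLUSH in the kernel's i915_reg.h, but treat that mapping as an assumption to verify.

```python
def decode_mi_header(dword: int) -> dict:
    """Split a command header dword into MI-style fields.

    Assumed layout (not taken from this bug report):
    bits 31:29 = command type, bits 28:23 = opcode, bits 22:0 = flags.
    """
    return {
        "command_type": (dword >> 29) & 0x7,  # 0 is assumed to mean an MI command
        "opcode": (dword >> 23) & 0x3F,       # opcode within that command type
        "flags": dword & 0x7FFFFF,            # opcode-specific low bits
    }


fields = decode_mi_header(0x02000004)
print(fields)
```

Under these assumptions the signature says only which command the parser had reached, not what caused the hang, which fits with the same value showing up across otherwise unrelated reports.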

[Original Description]
Machine sporadically locks up while running compiz.

Lockup is generally preceded by obvious display corruption. Eventually the GPU locks up, resulting in a blank screen and a machine that is unresponsive, even to sysrq.

Attached is an example of the sort of corruption exhibited. Note the color and grey boxes of the menu bar at the top of the screen.

It actually seems that the system will respond to sysrq if issued not too long after the screen turns blank. After a few seconds however, it will not respond.

The application switcher seems to be very good at reproducing the crash, which occurs quite often, usually within five minutes of logging in. The machine is completely stable under Unity 2D.


ProblemType: Crash
DistroRelease: Ubuntu 12.04
Package: xserver-xorg-video-intel 2:2.17.0-1ubuntu4
ProcVersionSignature: Ubuntu 3.2.0-23.36-generic 3.2.14
Uname: Linux 3.2.0-23-generic x86_64
.tmp.unity.support.test.0:
 
ApportVersion: 2.0.1-0ubuntu2
Architecture: amd64
Chipset: i965gm
CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins'
CompositorRunning: compiz
Date: Thu Apr 12 11:54:19 2012
DistUpgraded: 2012-04-01 17:31:17,679 DEBUG enabling apt cron job
DistroCodename: precise
DistroVariant: ubuntu
DuplicateSignature: [i965gm] GPU lockup  render.IPEHR: 0x02000004 Ubuntu 12.04
ExecutablePath: /usr/share/apport/apport-gpu-error-intel.py
ExtraDebuggingInterest: Yes, whatever it takes to get this fixed in Ubuntu
GpuHangFrequency: Several times a day
GpuHangReproducibility: Seems to happen randomly
GpuHangStarted: Immediately after installing this version of Ubuntu
GraphicsCard:
 Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (primary) [8086:2a02] (rev 0c) (prog-if 00 [VGA controller])
   Subsystem: Dell Device [1028:01fe]
InstallationMedia: Ubuntu 11.10 "Oneiric Ocelot" - Release amd64 (20111012)
InterpreterPath: /usr/bin/python2.7
MachineType: Dell Inc. Latitude D830
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
ProcCmdline: /usr/bin/python /usr/share/apport/apport-gpu-error-intel.py
ProcEnviron:
 
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.2.0-23-generic root=UUID=27880cc8-df42-4098-8e07-3c4fb9dba0a5 ro quiet splash vt.handoff=7
RelatedPackageVersions:
 xserver-xorg             1:7.6+12ubuntu1
 libdrm2                  2.4.32-1ubuntu1
 xserver-xorg-video-intel 2:2.17.0-1ubuntu4
SourcePackage: xserver-xorg-video-intel
Title: [i965gm] GPU lockup  render.IPEHR: 0x02000004
UpgradeStatus: Upgraded to precise on 2012-04-01 (10 days ago)
UserGroups:
 
dmi.bios.date: 01/04/2010
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A15
dmi.board.vendor: Dell Inc.
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA15:bd01/04/2010:svnDellInc.:pnLatitudeD830:pvr:rvnDellInc.:rn:rvr:cvnDellInc.:ct8:cvr:
dmi.product.name: Latitude D830
dmi.sys.vendor: Dell Inc.
version.compiz: compiz 1:0.9.7.6-0ubuntu1~ppa3
version.ia32-libs: ia32-libs N/A
version.libdrm2: libdrm2 2.4.32-1ubuntu1
version.libgl1-mesa-dri: libgl1-mesa-dri 8.0.2-0ubuntu3
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental N/A
version.libgl1-mesa-glx: libgl1-mesa-glx 8.0.2-0ubuntu3
version.xserver-xorg-core: xserver-xorg-core 2:1.11.4-0ubuntu10
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.7.0-0ubuntu1
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:6.14.99~git20111219.aacbd629-0ubuntu2
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.17.0-1ubuntu4
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:0.0.16+git20111201+b5534a1-1build2
Comment 1 Bryce Harrington 2012-04-17 10:21:20 UTC
Created attachment 60189 [details]
BootDmesg.txt
Comment 2 Bryce Harrington 2012-04-17 10:21:32 UTC
Created attachment 60190 [details]
CurrentDmesg.txt
Comment 3 Bryce Harrington 2012-04-17 10:21:49 UTC
Created attachment 60191 [details]
i915_error_state.txt
Comment 4 Bryce Harrington 2012-04-17 10:22:15 UTC
Created attachment 60192 [details]
XorgLog.txt
Comment 5 Bryce Harrington 2012-04-17 10:24:25 UTC
Created attachment 60193 [details]
Screenshot from 2012-04-12 12:05:52.png
Comment 6 Chris Wilson 2012-04-18 06:49:49 UTC
The error-state looks ordinary and more importantly self-consistent. There are not the tell-tales of recent bugs, so I currently have no explanation for the hang. Can you please attach a few more error-states to see if a pattern forms?
Comment 7 Ben Gamari 2012-04-19 09:11:41 UTC
Created attachment 60321 [details]
Another crash dump

The panel was inactive although the machine was responsive over SSH.
Comment 8 Chris Wilson 2012-04-19 09:20:51 UTC
In the second crash dump, mesa overwrote our batch performing a depth-clear.

One more...
Comment 9 Ben Gamari 2012-04-19 09:21:15 UTC
Created attachment 60322 [details]
Yet another dump
Comment 10 Chris Wilson 2012-04-19 09:26:22 UTC
(In reply to comment #9)
> Created attachment 60322 [details]
> Yet another dump

This wasn't a hang, so I'm not going to use its vote as to whether there is an underlying UXA bug here...
Comment 11 Ben Gamari 2012-04-19 09:28:32 UTC
Created attachment 60323 [details]
i915_error_state from odd hang

This time I couldn't get the rest of the dump, because cat hits a kernel BUG when reading i915_batchbuffer_info. Nevertheless, here is i915_error_state.
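
For anyone else collecting error states after a hang, here is a minimal sketch of saving a timestamped copy while the machine is still reachable (for example over SSH). It assumes debugfs is mounted at /sys/kernel/debug, that the GPU is DRM device 0, and that the script runs with enough privileges to read the file. The function name and output naming scheme are my own, not part of any existing tool.

```python
import shutil
import time
from pathlib import Path

# Assumed default location of the error state; adjust if debugfs is mounted elsewhere.
DEFAULT_ERROR_STATE = Path("/sys/kernel/debug/dri/0/i915_error_state")


def save_error_state(dest_dir, source=DEFAULT_ERROR_STATE):
    """Copy the error state file into dest_dir under a timestamped name."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = dest_dir / f"i915_error_state-{stamp}.txt"
    # Stream the copy; the dump can be several hundred kilobytes.
    with open(source, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

Calling `save_error_state("/tmp/gpu-dumps")` right after each hang keeps the dumps separate, which makes it easier to compare them for a pattern.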
Comment 12 Ben Gamari 2012-04-19 09:31:36 UTC
Created attachment 60324 [details]
crash-20120419-1230
Comment 13 Chris Wilson 2012-04-19 09:33:30 UTC
That time, the stray depth clear hit a Mesa batch buffer. With 3 clear errors, let's presume it is first and foremost the stray clear that's causing the hangs.
Comment 14 Ben Gamari 2012-04-19 09:33:47 UTC
Created attachment 60325 [details]
crash-20120419-1233
Comment 15 Ben Gamari 2012-04-19 09:42:28 UTC
Created attachment 60327 [details]
crash-20120419-1242
Comment 16 Ben Gamari 2012-04-19 12:07:25 UTC
I can confirm that the problem appears to be gone with mesa master (dbf48e88)
Comment 17 Ben Gamari 2012-04-19 13:02:15 UTC
The 8.0 branch (6fe42b6) exhibits the problem.
Comment 18 Ben Gamari 2012-04-19 13:03:51 UTC
(In reply to comment #17)
> The 8.0 branch (6fe42b6) exhibits the problem.

To clarify: the 8.0 branch (currently 49ed43b6) exhibits the issue, as does 6fe42b6, the point where master diverged from 8.0.
Comment 19 Ben Gamari 2012-04-19 13:09:21 UTC
8f5c172c does not exhibit the problem
Comment 20 Ben Gamari 2012-04-19 13:16:10 UTC
952ca07 exhibits the problem.
Comment 21 Ben Gamari 2012-04-19 13:30:33 UTC
7335cf1c exhibits the problem.
Comment 22 Ben Gamari 2012-04-19 13:40:46 UTC
9be0f9 exhibits the issue.
Comment 23 Ben Gamari 2012-04-19 13:42:46 UTC
dbadd39 does not exhibit the problem.
Comment 24 Ben Gamari 2012-04-19 13:52:56 UTC
f00c97b does not exhibit the problem.
117a0e9 exhibits the problem.
308c6be exhibits the problem.
fbe8543 does not exhibit the problem.
e2dce7f does not exhibit the problem.
Comment 25 Ben Gamari 2012-04-19 13:53:23 UTC
Here is the first working commit:

commit e2dce7f7ee3e7da9cbb0bb33307ecd79e824426d
Author: Eric Anholt <eric@anholt.net>
Date:   Fri Feb 10 12:54:25 2012 -0800

    intel: Fix rendering from textures after RenderTexture().
    
    There's a serious trap for drivers: RenderTexture() does not indicate
    that the texture is currently bound to the draw buffer, despite
    FinishRenderTexture() signaling that the texture is just now being
    unbound from the draw buffer.
    
    We were acting as if RenderTexture() *was* the start of rendering and
    that we could make texturing incoherent with the current contents of
    the renderbuffer.  This caused intel oglconform sRGB
    Mipmap.1D_textures to fail, because we got a call to TexImage() and
    thus RenderTexture() on a texture bound to a framebuffer that wasn't
    the draw buffer, so we skipped validating the new image into the
    texture object used for rendering.
    
    We can't (easily) make RenderTexture() indicate the start of drawing,
    because both our driver and gallium are using it as the moment to set
    up the renderbuffer wrapper used for things like MapRenderbuffer().
    Instead, postpone the setup of the workaround render target miptree
    until update_renderbuffer time, so that we no longer need to skip
    validation of miptrees used as render targets.  As a bonus, this
    should make GL_NV_texture_barrier possible.
    
    (This also fixes a regression in the gen4 small-mipmap rendering since
    3b38b33c1648b07e75dc4d8340758171e109c598, which switched
    set_draw_offset from image->mt to irb->mt but didn't move the irb->mt
    replacement up before set_draw_offset).
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44961
    NOTE: This is a candidate for the 8.0 branch.
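
The search in comments 16 to 24 is a bisection for the first commit that no longer hangs, between a known-broken point (6fe42b6) and a known-working one (dbf48e88). A minimal sketch of that idea, assuming a linear list of commits ordered oldest to newest and a single broken-to-fixed transition; the commit names and the `is_fixed` callback are placeholders, and real Mesa history is not linear:

```python
from typing import Callable, Sequence


def first_fixed(commits: Sequence[str], is_fixed: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_fixed() is true.

    Assumes commits[0] is known broken, commits[-1] is known fixed, and the
    results switch from broken to fixed exactly once in between.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] broken, commits[hi] fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(commits[mid]):
            hi = mid  # the fix is at mid or earlier
        else:
            lo = mid  # the fix is after mid
    return commits[hi]


# Hypothetical history c0..c7 in which the fix landed in c5.
history = [f"c{i}" for i in range(8)]
result = first_fixed(history, lambda c: int(c[1:]) >= 5)
print(result)
```

Each call to `is_fixed` stands for one build-and-test cycle like the ones reported above, so the number of cycles grows only with the logarithm of the number of commits in the range.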
Comment 26 Eric Anholt 2012-05-04 14:59:17 UTC
Pushed the cherry pick.

