Bug 41099 - [gm45] GPU lockup render.IPEHR: 0x78080003 playing Minecraft (mesa?)
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: 7.11
Hardware: x86 (IA32) Linux (All)
Importance: high major
Assignee: Ian Romanick
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 42991 44622
Reported: 2011-09-21 18:04 UTC by Bryce Harrington
Modified: 2012-08-10 10:04 UTC
CC List: 6 users

See Also:
i915 platform:
i915 features:


Attachments
XorgLog.txt (31.49 KB, text/plain)
2011-09-21 18:04 UTC, Bryce Harrington
i915_error_state.txt (1.53 MB, text/plain)
2011-09-21 18:05 UTC, Bryce Harrington
hs_err_pid5105.log (64.00 KB, text/x-log)
2011-09-21 18:07 UTC, Bryce Harrington
CurrentDmesg.txt (1.96 KB, text/plain)
2011-09-21 18:07 UTC, Bryce Harrington
BootDmesg.txt (60.98 KB, text/plain)
2011-09-21 18:07 UTC, Bryce Harrington
i915_error_state after a gpu hang on SNB (2.17 MB, application/octet-stream)
2011-10-17 07:07 UTC, Michael Groh
Dmesg output after most recent GPU hang (603 bytes, text/plain)
2011-11-09 09:25 UTC, Corbin

Description Bryce Harrington 2011-09-21 18:04:08 UTC
Forwarding this bug from Ubuntu reporter Corbin:
http://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/852812

[Problem]
Reproducible GPU lockup when using Minecraft.
(Only this one person reporting this problem, so far.)

[   35.612020] hci_cmd_timer: hci0 command tx timeout
[   36.612020] hci_cmd_timer: hci0 command tx timeout
[   37.612020] hci_cmd_timer: hci0 command tx timeout
[ 1337.456032] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1337.456038] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 1337.458392] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 520757 at 520755, next 520807)
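
For triage, the hang lines in the excerpt above can be picked out of a saved dmesg dump mechanically. The following is a minimal sketch, assuming the message wording matches the excerpt exactly; the function name `find_gpu_hangs` and the dictionary keys are illustrative and not part of any driver tooling. The three numbers in the `i915_wait_request` line are kept as plain integers without further interpretation.

```python
import re

# Matches the hangcheck message as worded in the excerpt above.
HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:i915_hangcheck_elapsed\].*GPU hung"
)
# Matches the three numbers reported by the i915_wait_request line.
WAIT_RE = re.compile(
    r"awaiting (?P<awaiting>\d+) at (?P<at>\d+), next (?P<next>\d+)"
)


def find_gpu_hangs(dmesg_text):
    """Return one dict per 'GPU hung' message found in dmesg_text."""
    hangs = []
    for line in dmesg_text.splitlines():
        hang = HANG_RE.search(line)
        if hang:
            hangs.append({"timestamp": float(hang.group("ts"))})
            continue
        wait = WAIT_RE.search(line)
        if wait and hangs:
            # Attach the numbers to the most recent hang seen.
            hangs[-1].update({k: int(v) for k, v in wait.groupdict().items()})
    return hangs
```

Feeding it the text of CurrentDmesg.txt (or the output of `dmesg` saved to a file) gives a quick count of hangs and their kernel timestamps, which helps when comparing how soon after launch each one occurred.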

[Original Description]
This happens when I play the game Minecraft in 11.10 Oneiric. To be honest, I don't know if it is my video card, Java, or my processor that is the problem. When I run Minecraft, Java usually crashes after a few minutes. And not just Java, it usually brings down a few other processes with it. I could run this fine in 11.04. After Java crashed this time, Apport had me report this bug. Also, after Java crashed this time, Unity also crashed, so I currently have no panels or window decorations. I'll reboot and try to crash it all again, and see if I can get some more useful information. All I know is that Java consistently crashes in Oneiric while running Minecraft on my system, and almost always causes more problems when it crashes.

ProblemType: Crash
DistroRelease: Ubuntu 11.10
Package: xserver-xorg-video-intel 2:2.15.901-1ubuntu2
ProcVersionSignature: Ubuntu 3.0.0-11.18-generic 3.0.4
Uname: Linux 3.0.0-11-generic i686
.tmp.unity.support.test.0:
 
ApportVersion: 1.23-0ubuntu1
Architecture: i386
Chipset: gm45
CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins'
CompositorRunning: None
Date: Sat Sep 17 12:54:52 2011
DistUpgraded: Log time: 2011-09-09 10:05:06.097430
DistroCodename: oneiric
DistroVariant: ubuntu
DuplicateSignature: [gm45] GPU lockup  render.IPEHR: 0x78080003 Ubuntu 11.10
ExecutablePath: /usr/share/apport/apport-gpu-error-intel.py
ExtraDebuggingInterest: Yes, whatever it takes to get this fixed in Ubuntu
GpuHangFrequency: Several times a week
GpuHangReproducibility: Yes, I can easily reproduce it
GpuHangStarted: I don't know
GraphicsCard:
 Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller [8086:2a42] (rev 07) (prog-if 00 [VGA controller])
   Subsystem: Dell Device [1028:0401]
InstallationMedia: Ubuntu 11.04 "Natty Narwhal" - Release i386 (20110427.1)
InterpreterPath: /usr/bin/python2.7
MachineType: Dell Inc. Vostro 1014
ProcCmdline: /usr/bin/python /usr/share/apport/apport-gpu-error-intel.py
ProcEnviron:
 
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.0.0-11-generic root=UUID=436f32d2-4869-4b5a-94f3-ffdf2fede543 ro quiet splash vt.handoff=7
RelatedPackageVersions:
 xserver-xorg             1:7.6+7ubuntu7
 libdrm2                  2.4.26-1ubuntu1
 xserver-xorg-video-intel 2:2.15.901-1ubuntu2
SourcePackage: xserver-xorg-video-intel
Title: [gm45] GPU lockup  render.IPEHR: 0x78080003
UpgradeStatus: Upgraded to oneiric on 2011-09-09 (8 days ago)
UserGroups:
 
dmi.bios.date: 11/26/2010
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A07
dmi.board.name: 0XHND9
dmi.board.vendor: Dell Inc.
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA07:bd11/26/2010:svnDellInc.:pnVostro1014:pvr:rvnDellInc.:rn0XHND9:rvr:cvnDellInc.:ct8:cvr:
dmi.product.name: Vostro 1014
dmi.sys.vendor: Dell Inc.
version.compiz: compiz 1:0.9.5.94+bzr2803-0ubuntu5
version.libdrm2: libdrm2 2.4.26-1ubuntu1
version.libgl1-mesa-dri: libgl1-mesa-dri 7.11-0ubuntu3
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental N/A
version.libgl1-mesa-glx: libgl1-mesa-glx 7.11-0ubuntu3
version.xserver-xorg: xserver-xorg 1:7.6+7ubuntu7
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.6.0-1ubuntu13
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:6.14.99~git20110811.g93fc084-0ubuntu1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.15.901-1ubuntu2
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:0.0.16+git20110411+8378443-1
Comment 1 Bryce Harrington 2011-09-21 18:04:55 UTC
Created attachment 51487
XorgLog.txt
Comment 2 Bryce Harrington 2011-09-21 18:05:38 UTC
Created attachment 51488
i915_error_state.txt
Comment 3 Bryce Harrington 2011-09-21 18:07:28 UTC
Created attachment 51489
hs_err_pid5105.log
Comment 4 Bryce Harrington 2011-09-21 18:07:38 UTC
Created attachment 51490
CurrentDmesg.txt
Comment 5 Bryce Harrington 2011-09-21 18:07:55 UTC
Created attachment 51491
BootDmesg.txt
Comment 6 Corbin 2011-09-21 21:51:56 UTC
Hello. I am the original reporter. I am here if you need any more logs, reproductions, patch testing, etc. I'm not sure how to subscribe to this, so if I forget about it, you can email me directly at jdgregson@gmail.com.
Comment 7 Chris Wilson 2011-09-22 01:51:06 UTC
Dies near the start of a batch. How early into the start of minecraft does it crash?

This should be amenable to bisection, if you fancy the challenge.
Comment 8 Corbin 2011-09-22 11:33:51 UTC
It is kind of difficult to get it to happen at the same time. For example, last night it crashed twice, within two minutes of starting the game. Today, it crashed twice only after the game had been running for ten or fifteen minutes. And by 'crash twice' I mean two separate incidents. The time and cause of the crash seem somewhat random. The only thing that is not random is the fact that it always crashes. Also, I'm having difficulty getting the logs, because after it crashes, Unity also goes down most of the time, so I have to reboot, which seems to clear the Xorg logs. Perhaps I could try running it in a desktop environment other than Unity, which is already pretty unstable at this point.

What do you mean by "This should be amenable to bisection, if you fancy the challenge."?
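
One way to keep the logs from being lost on reboot is to copy them somewhere persistent right after a hang, for example from a text console or over SSH. The helper below is only a sketch: the name `save_logs` and the timestamped folder layout are made up for illustration, and the paths in the usage note are assumptions about a typical system rather than something confirmed in this report.

```python
import shutil
import time
from pathlib import Path


def save_logs(paths, dest_dir):
    """Copy each existing file in paths into a new timestamped folder.

    Returns the list of copies that were written. Missing files are skipped.
    """
    target = Path(dest_dir) / time.strftime("hang-%Y%m%d-%H%M%S")
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for source in map(Path, paths):
        if source.is_file():
            copy = target / source.name
            shutil.copyfile(source, copy)
            saved.append(copy)
    return saved
```

A call such as `save_logs(["/var/log/Xorg.0.log", "/sys/kernel/debug/dri/0/i915_error_state"], "/home/user/hang-logs")` would be a plausible starting point. Both source paths and the destination are assumptions; the kernel message in this report names `/debug/dri/0/i915_error_state`, and reading the error state usually needs root.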
Comment 9 Bryce Harrington 2011-09-22 19:53:46 UTC
> What do you mean by "This should be amenable to bisection, if you fancy the
> challenge."?

It means, you could probably narrow the bug down by following the steps documented here:

  https://wiki.ubuntu.com/Kernel/KernelBisection

You know that it worked in natty, so take as your starting point "2.6.38", and end point "3.0.0".

As a shortcut, or if  building things from git looks like *too* much of a challenge, there are some pre-built kernel .debs at http://kernel.ubuntu.com/~kernel-ppa/mainline/, though this won't get you down to the exact commit.
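
The idea behind bisection can be sketched in a few lines: given an ordered list of versions where the first is known good and the last is known bad, repeatedly test the midpoint and halve the range. This is only an illustration of the search, not the `git bisect` tool itself. The function name `first_bad` is hypothetical, and `is_bad` stands in for the manual step of booting a kernel and playing until it either hangs or survives long enough to be trusted.

```python
def first_bad(versions, is_bad):
    """Binary-search an ordered list for the first version where is_bad is True.

    Assumes versions[0] is known good and versions[-1] is known bad.
    """
    good, bad = 0, len(versions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(versions[mid]):
            bad = mid   # the regression is at mid or earlier
        else:
            good = mid  # the regression is after mid
    return versions[bad]
```

With the pre-built mainline packages, `versions` would be the list of available builds between 2.6.38 and 3.0.0, and each `is_bad` call is one install, reboot, and test session. The number of sessions grows only with the logarithm of the list length, which is what makes the approach practical for an intermittent hang.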
Comment 10 Corbin 2011-09-22 20:26:31 UTC
Alright, I am going to try to bisect this. I was wanting to work on kernels anyway. But I'm not confident that I will get any useful results. Is it possible that one of you could bisect and build them, and then have me test them?
Comment 11 Bryce Harrington 2011-09-22 23:55:38 UTC
Hi Corbin,

I wish we could pre-build all the kernels to make this easier, and maybe one day we will but unfortunately we're not really set up for that currently.  So, sorry...  we'll try to be available for questions though (but be aware the next couple weeks are crunch time for getting oneiric finalized.)
Comment 12 Corbin 2011-09-23 09:44:52 UTC
I understand. I will seek closer support elsewhere, and post back if we find anything. Thank you for your help.
Comment 13 Michael Groh 2011-10-17 07:07:44 UTC
Created attachment 52422
i915_error_state after a gpu hang on SNB

Hello everyone,

I also get hangs with Minecraft; however, I have an Intel(R) Sandybridge Mobile (GT2+) running 3.1-rc6+ (drm-intel-next git 64a742fac3a22f57303d8f1b7e347350a1c48254).

I can run Minecraft, but after some time I get a GPU hang. dmesg then says:


[18291.377254] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[18291.377262] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[18291.380448] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 3243224 at 3243214, next 3243229)

Said /debug/dri/0/i915_error_state is attached.

If I need to provide more information, feel free to ask.

Thanks for your help,
Michael
Comment 14 Kenneth Graunke 2011-10-24 20:07:47 UTC
Michael,

Though in the same game, your hang on Sandybridge turned out to be unrelated.  I fixed it today in Mesa master:

commit 3cc0a7be23ab603ed40d602595f673a44e079885
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Fri Oct 21 01:03:37 2011 -0700

    i965: Apply post-sync non-zero workaround to homebrew workaround.

The fix should appear in Mesa 7.11.1.

However, that doesn't resolve Corbin's original issue on GM45.  This bug report continues to track that.
Comment 15 Michael Groh 2011-10-25 00:56:55 UTC
Kenneth,

I can confirm that this fixes my problem. I just updated Mesa from git and played for some time without crashes.

Too bad that this doesn't fix things for Corbin; I hope that problem will be fixed soon :)

Thanks for your work Kenneth!

Michael
Comment 16 Eugeni Dodonov 2011-11-03 13:59:03 UTC
The patch was cherry-picked to 7.11, so closing this.
Comment 17 Corbin 2011-11-03 14:03:58 UTC
Wait, did the patch resolve Michael's issue, as stated above, or is there another patch that also resolves my issue?
Comment 18 Kenneth Graunke 2011-11-03 17:44:47 UTC
As far as I know, it's still broken on GM45.  Not sure why, unfortunately.  Reopening.
Comment 19 Corbin 2011-11-03 17:50:12 UTC
Okay, thank you. I can still provide any testing needed, but I discovered that bisecting the kernel myself is quite a bit outside of my skill set.
Comment 20 Corbin 2011-11-09 09:24:29 UTC
Good morning,
This bug does still exist. After about an hour of testing, my tester's game-play was ended by a GPU hang (see attached terminal output).

Before this was reproduced, there was a similar problem, which may or may not be related. Because I have no idea what happened, I'll talk about it here. While the tester was playing, the X server crashed, and the screen switched to a TTY-like session with kernel messages about the error, and the kernel seems to have completely crashed. I believe that the kernel crashed because the computer was taking no input. I couldn't switch TTY Sessions or anything, I just had to do a hard reboot. Because I had no X server,  the only valuable output I could get was by taking pictures of the screen with my cell phone. I uploaded the images to my site here: http://jdgregson.com/temp/ubuntubug/ It may be important to note that this error occurred while the tester was using the guest session included in lightdm.  

Hope this helps.
Comment 21 Corbin 2011-11-09 09:25:29 UTC
Created attachment 53334
Dmesg output after most recent GPU hang
Comment 22 Kenneth Graunke 2011-11-09 13:17:50 UTC
Nasty...looks like something deeper than Mesa.  CC'ing Chris and Daniel, maybe one of them will have some insight.
Comment 23 Daniel Vetter 2011-11-10 04:20:27 UTC
Comment on attachment 52422
i915_error_state after a gpu hang on SNB

This is an error_state for another bug!!
Comment 24 Daniel Vetter 2011-11-10 04:28:32 UTC
I looked at it a bit and did not notice much. We have a few fancy reports of strange crashes, so I think another error_state would be interesting to gauge the randomness involved in this issue.
Comment 25 Eric Anholt 2012-08-08 19:39:14 UTC
Running minecraft on my g45, it's been up and rendering for about 10 minutes and I've wandered around for a bit, no GPU hang.

Can you figure out some way to reliably produce the hang in just a few minutes, so that we can actually debug?  Even if you can't come up with a way for us to reproduce the problem, if you can manage to do it in a short time period once with INTEL_DEBUG=aub set in the environment, the resulting intel.aub file may be useful.  It requires upstream Mesa, though -- 8.0.x doesn't have that logging feature.

(note, for those with theories of possible GPU hangs, minecraft isn't using FBOs)
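
For anyone attempting the capture described above, the only required step is having `INTEL_DEBUG=aub` in the environment of the process that does the rendering. A minimal sketch of a launcher follows; the helper names `aub_env` and `run_with_aub` are made up for illustration, and the command passed to it depends on how the game is started on a given system.

```python
import os
import subprocess


def aub_env(base_env=None):
    """Return a copy of the environment with INTEL_DEBUG=aub added."""
    env = dict(os.environ if base_env is None else base_env)
    env["INTEL_DEBUG"] = "aub"
    return env


def run_with_aub(command):
    """Run command (a list of arguments) with AUB logging requested."""
    return subprocess.call(command, env=aub_env())
```

Something like `run_with_aub(["java", "-jar", "minecraft.jar"])` would be the intended use, where the jar name is a placeholder. As noted in the comment above, this needs upstream Mesa rather than 8.0.x, and the file to attach afterwards is the resulting intel.aub.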
Comment 26 Corbin 2012-08-08 20:03:52 UTC
I haven't been playing this game very much anymore, but the GPU hang hasn't happened for a while. It never was predictable, but it would usually happen within an hour or two. About a month ago my friend and I played for about eight hours straight and it didn't happen.

It's possible that this was fixed somehow. I definitely remember some i915 updates since this GPU hang happened last.
Comment 27 Daniel Vetter 2012-08-10 10:04:46 UTC
I guess then that we can tentatively close this issue. Thanks a lot for reporting it, and please reopen if it blows up again.

