Bug 26333 - [GM45] system locks up at random intervals
Summary: [GM45] system locks up at random intervals
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Chris Wilson
QA Contact:
URL: https://bugs.launchpad.net/bugs/469820
Whiteboard:
Keywords: NEEDINFO
Depends on:
Blocks:
 
Reported: 2010-01-30 03:34 UTC by Geir Ove Myhr
Modified: 2017-07-24 23:08 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:


Attachments
lspci -vvnn (12.21 KB, text/plain)
2010-01-30 03:37 UTC, Geir Ove Myhr
no flags
Batchbuffer dump from 2010-01-26 (390.12 KB, application/x-compressed-tar)
2010-01-30 03:42 UTC, Geir Ove Myhr
no flags
Batchbuffer dump from 2010-01-29 (389.64 KB, application/x-compressed-tar)
2010-01-30 03:43 UTC, Geir Ove Myhr
no flags
Record batch buffer at time of error (15.37 KB, patch)
2010-02-19 08:45 UTC, Chris Wilson
no flags

Description Geir Ove Myhr 2010-01-30 03:34:22 UTC
Originally reported by xXPenGuiNXx at:
  https://bugs.launchpad.net/bugs/469820

[Problem]

GPU hangs at random intervals on an HP G71 notebook with a GM45. The hangs have been present since at least Ubuntu 9.04 (kernel 2.6.28, xf86-video-intel 2.6.3) and still occur with Ubuntu 10.04 with xorg-edgers (X.Org and drivers from git) and kernel 2.6.31-rc5. Sometimes the mouse locks up, sometimes it doesn't.

[Original bug report from Ubuntu 9.10]

Binary package hint: xorg

The system just freezes, but the mouse can still be moved. Clicking does nothing, and the keyboard is unresponsive, so no virtual consoles can be reached. syslog and dmesg show an i2c adapter error (unable to read EDID block) and an i915 error (no EDID data).
There doesn't seem to be a trigger; the lockup has happened during multiple tasks, including the screensaver and hibernation.

I can ssh into the locked-up system, and the other subsystems seem fine. I can run commands etc., but the gdm restart, start and stop commands don't help. The times they did work, the system locked up again soon after.

I had the same lockup problem with this system in 9.04, as I installed it a week before the Karmic release. I thought it was a kernel problem that would be fixed in 9.10, but to no avail. Research led me to similar graphics server problems.

ProblemType: Bug
Architecture: amd64
Date: Sun Nov  1 16:38:14 2009
DistroRelease: Ubuntu 9.10
InstallationMedia: Ubuntu 9.10 "Karmic Koala" - Release amd64 (20091027)
MachineType: Hewlett-Packard HP G71 Notebook PC
Package: xorg 1:7.4+3ubuntu7
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.31-14-generic root=UUID=019e32e1-6813-4e60-8ecc-66a711cf49c0 ro quiet splash
ProcEnviron:
 LANG=en_CA.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-14.48-generic
RelatedPackageVersions:
 xserver-xorg 1:7.4+3ubuntu7
 libgl1-mesa-glx 7.6.0-1ubuntu4
 libdrm2 2.4.14-1ubuntu1
 xserver-xorg-video-intel 2:2.9.0-1ubuntu2
 xserver-xorg-video-ati 1:6.12.99+git20090929.7968e1fb-0ubuntu1
SourcePackage: xorg
Uname: Linux 2.6.31-14-generic x86_64
XorgConf: Error: [Errno 2] No such file or directory: '/etc/X11/xorg.conf'
XsessionErrors:
 (gnome-settings-daemon:1770): GLib-CRITICAL **: g_propagate_error: assertion `src != NULL' failed
 (polkit-gnome-authentication-agent-1:1814): GLib-CRITICAL **: g_once_init_leave: assertion `initialization_value != 0' failed
 (nautilus:1802): Eel-CRITICAL **: eel_preferences_get_boolean: assertion `preferences_is_initialized ()' failed
dmi.bios.date: 08/21/2009
dmi.bios.vendor: Hewlett-Packard
dmi.bios.version: F.11
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.name: 306B
dmi.board.vendor: Quanta
dmi.board.version: 21.12
dmi.chassis.type: 10
dmi.chassis.vendor: Quanta
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnHewlett-Packard:bvrF.11:bd08/21/2009:svnHewlett-Packard:pnHPG71NotebookPC:pvrRev1:rvnQuanta:rn306B:rvr21.12:cvnQuanta:ct10:cvrN/A:
dmi.product.name: HP G71 Notebook PC
dmi.product.version: Rev 1
dmi.sys.vendor: Hewlett-Packard
fglrx: Not loaded
system:
 distro:             Ubuntu
 architecture:       x86_64
 kernel:             2.6.31-14-generic
Comment 1 Geir Ove Myhr 2010-01-30 03:37:14 UTC
Created attachment 32925
lspci -vvnn
Comment 2 Geir Ove Myhr 2010-01-30 03:42:58 UTC
Created attachment 32926
Batchbuffer dump from 2010-01-26

During this freeze the mouse locked up too.

The tarball contains Xorg.0.log, intel_gpu_dump.txt and dmesg.txt in addition to the i915_* files (with multiple freezes, isn't it best to keep all the logs in each tarball?)
Comment 3 Geir Ove Myhr 2010-01-30 03:43:58 UTC
Created attachment 32927
Batchbuffer dump from 2010-01-29

Interesting. The freeze looked like the ones I have been having, except that when I tested the keys (I toggle Num Lock or Caps Lock to check for keyboard activity), the screen went black and the mouse pointer turned into a wheel for a second. I then checked the virtual terminals, and I could access all of them. An i915 error was waiting for me, reporting a GPU hang. I made a batchbuffer dump.
Comment 4 Jesse Barnes 2010-02-05 15:17:12 UTC
Chris has been looking at some of these; here's hoping this is a dupe of an existing bug.
Comment 5 Chris Wilson 2010-02-08 02:57:48 UTC
Sorry Geir, ever since hangcheck started resetting the GPU on hangs, dumping has lost its usefulness; the dumps you captured contain no information. :(

I have a patch that I've been meaning to encourage Eric to apply; time to poke him again.
Comment 6 Chris Wilson 2010-02-10 06:19:40 UTC
Geir, this commit [in libdrm] should fix many of the completely mysterious hangs; can you check whether it has a positive effect for you as well:

commit 4f0f871730b76730ca58209181d16725b0c40184
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Feb 10 09:45:13 2010 +0000

    intel: Handle resetting of input params after EINTR during SET_TILING
    
    The SET_TILING is pernicious in that it overwrites the input arguments
    following an error in order to report the current tiling state of the
    buffer. This caught us by surprise as we then fed those arguments back
    into the ioctl unmodified following an EINTR and so the kernel then
    reported success for the no-op. We interpreted this success as meaning
    that the tiling on the buffer had changed so updated our state and
    started using the buffer incorrectly in the new tiled/untiled manner.
    This led to all sorts of random corruption and GPU hangs, even though
    the batch buffers would look sane (when the GPU had not wandered off
    into forbidden territory).
    
    References:
    
      Bug 25475 - [i915] Xorg crash / Execbuf while wedged
      http://bugs.freedesktop.org/show_bug.cgi?id=25475
    
      Bug 25554 - i830_uxa_prepare_access: gtt bo map failed: Input/output error
      http://bugs.freedesktop.org/show_bug.cgi?id=25554
    
    (And probably every other weird bug in the last few months.)
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Comment 7 Geir Ove Myhr 2010-02-15 00:17:42 UTC
(In reply to comment #6)
> Geir, this commit [in libdrm] should fix many of the completely mysterious
> hangs, can you check whether it has a positive effect for you as well:

I patched libdrm 2.4.17 with this commit and had the original reporter test it. Unfortunately, it didn't have a positive effect.
Comment 8 Chris Wilson 2010-02-19 08:45:55 UTC
Created attachment 33423
Record batch buffer at time of error

Can you please apply this error capture patch and upload the resulting /debug/dri/.../i915_error_state?
Comment 9 Geir Ove Myhr 2010-02-19 08:58:03 UTC
(In reply to comment #8)
> Can you please apply this error capture patch and upload the resulting
> /debug/dri/.../i915_error_state?

I'm having problems booting self-compiled kernels in Ubuntu at the moment, but once I figure out what the problem is, I will build a kernel with the patch, ask xxpenguinxx to test it, and clear the NEEDINFO flag once that's done.

xxpenguinxx, there is a kernel with v7 of the patch at http://www.kvante.info/recordbatchbuffer/ that you may try if you feel lucky, but as I said, it doesn't boot on my system. It is possible that uninstalling Plymouth will make it boot.
Comment 10 Geir Ove Myhr 2010-02-22 03:19:59 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > Can you please apply this error capture patch and upload the resulting
> > /debug/dri/.../i915_error_state?
> xxpenguinxx, there is a kernel with v7 of the patch at
> http://www.kvante.info/recordbatchbuffer/ that you may try if you feel lucky,
> but as I said, it doesn't boot on my system. It is possible that uninstalling
> Plymouth will make it boot.

xxpenguinxx, after an apt-get dist-upgrade of my Lucid install today, the kernels now boot (there are some mount- and plymouth-related error messages at boot, though). So can you try the kernel from http://www.kvante.info/recordbatchbuffer/ and capture the file /sys/kernel/debug/dri/0/i915_error_state when it freezes? This is version 7 of the patch. If that is a problem (Chris?), I may build one with version 8 later.
Comment 11 xxpenguinxx 2010-02-22 13:50:29 UTC
Absolutely. I haven't had time lately to get the kernel installed, but I haven't had a lockup in quite a few days either, so it hasn't been at the top of my list. I will try to carve out an hour or so in the next few days and take the patched kernel for a test drive.

Comment 12 Chris Wilson 2010-07-24 04:19:55 UTC
Closing, as the last comment indicated no sightings of the bug and there have been no further reports...

