Bug 16091 - once a week I find my computer with screensaver hung
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: haihao
QA Contact: Xorg Project Team
 
Reported: 2008-05-25 06:11 UTC by martin
Modified: 2009-01-13 18:41 UTC

Description martin 2008-05-25 06:11:03 UTC
Full report here:
https://bugs.launchpad.net/ubuntu/+source/xorg/+bug/234768

I'm using the "2:2.2.1-1ubuntu12" version of the xserver-xorg-video-intel package. If you give me step-by-step instructions, I can try a later version of the driver.

Also, do you have any ideas on how to debug this further? Since I can't use any "sudo" commands, I currently don't know how to extract more information about this bug.
Comment 1 Gordon Jin 2008-05-25 20:26:44 UTC
Is "sudo not working" a side effect of the weekly hang, or always an issue on your system?

It's really hard to debug such a hard-to-reproduce issue. Anyway, you can try the latest upstream driver by following the guide at http://www.intellinuxgraphics.org/install.html.
Comment 2 martin 2008-05-25 23:41:54 UTC
"sudo not working" is a side effect of this bug. I usually don't have any problems running sudo, and I can normally also attach to X.org with symbols etc. without problems.

I know this is a hard bug to fix. Anyway, to make a long story short: yesterday I came up with two ideas that, when combined, allowed me to make progress on this bug:

A) attach "sudo gdb" before the crash happens and keep it attached the whole time
B) turn off "blank screen in 20 minutes", which makes the screensaver run forever.

The first trick made it possible for me to get a backtrace; the second made this bug A LOT easier to repro (when I woke up this morning it was hung again).

I'm going to do some additional testing during this week. I really recommend that you activate the busy spheres screensaver and turn off "blank screen" in power options. This is a great way to stress the driver.
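
Trick A can be scripted so that the backtrace is written to a file even if nobody is at the keyboard when the server dies. This is only a minimal sketch, assuming the X server process is named Xorg and that gdb's -p, -batch and -ex options are available; the helper names and the output file name are hypothetical and not part of any driver tooling.

```python
import subprocess


def build_gdb_command(pid, use_sudo=True):
    """Return the argv for a gdb session that attaches to `pid`, lets it run
    until it stops (for example on SIGABRT), then prints a full backtrace."""
    cmd = ["gdb", "-p", str(pid), "-batch", "-ex", "continue", "-ex", "bt full"]
    return ["sudo"] + cmd if use_sudo else cmd


def find_xorg_pid():
    """Look up the X server PID with pidof (assumes the process is named Xorg)."""
    out = subprocess.run(["pidof", "Xorg"], capture_output=True, text=True, check=True)
    return int(out.stdout.split()[0])


def capture_backtrace(log_path="xorg-backtrace.txt"):
    """Attach gdb to the running X server and save everything gdb prints."""
    pid = find_xorg_pid()
    with open(log_path, "w") as log:
        # Blocks until the server stops; gdb then prints "bt full" and exits.
        subprocess.run(build_gdb_command(pid), stdout=log, stderr=subprocess.STDOUT)
```

Calling capture_backtrace() is best done from a text console or an ssh session rather than from a terminal inside the X session being debugged, because the server stays stopped once it has crashed under gdb.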
Comment 3 martin 2008-05-25 23:43:04 UTC
(Read the Ubuntu bug report for details on the backtrace etc.)
Comment 4 martin 2008-05-26 11:25:44 UTC
I just repro'ed this bug using the xserver-xorg-video-intel_2.3.1-1_i386.deb driver as well.
Comment 5 unggnu 2008-05-27 02:57:28 UTC
Just for the record:

Uploaded files:
http://launchpadlibrarian.net/14699416/DSC_0053.jpg (Screenshot of the frozen screen)
http://launchpadlibrarian.net/14699465/Xorg.0.log
http://launchpadlibrarian.net/14701101/dmesg.log
http://launchpadlibrarian.net/14701110/lspci.log


Backtrace against the Ubuntu 8.04 Intel Driver 2:2.2.1-1ubuntu13

(gdb) bt full
#0 0xb7ef5410 in __kernel_vsyscall ()
No symbol table info available.
#1 0xb7c98085 in raise () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#2 0xb7c99a01 in abort () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#3 0xb7c9110e in __assert_fail () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#4 0xa75f0d68 in bm_fake_NotifyContendedLockTake () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#5 0xa75f6871 in LOCK_HARDWARE () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#6 0xa760fff5 in ?? () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#7 0x0840e880 in ?? ()
No symbol table info available.
#8 0xb7bd7a0c in __pthread_mutex_unlock_usercnt () from /lib/tls/i686/cmov/libpthread.so.0
No symbol table info available.
#9 0xa7610545 in brw_draw_prims () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#10 0xa76ac96c in ?? () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#11 0x0840e880 in ?? ()
No symbol table info available.
#12 0x08862030 in ?? ()
No symbol table info available.
#13 0xbfe30d30 in ?? ()
No symbol table info available.
#14 0x00000001 in ?? ()
No symbol table info available.
#15 0x00000000 in ?? ()
No symbol table info available.
(gdb)
Comment 6 martin 2008-05-27 12:38:56 UTC
When I boot up for the first time after this crash, I see the following at the end of /var/log/Xorg.0.log.old:

(WW) intel(0): ESR is 0x00000001
(WW) intel(0): PRB0_CTL (0x0001f001) indicates ring buffer enabled
(WW) intel(0): PRB0_HEAD (0x95e1c1a0) and PRB0_TAIL (0x0001fd30) indicate ring buffer not flushed
(WW) intel(0): Existing errors found in hardware state.
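
The head and tail values in the warning above can be compared by masking off the non-address bits. This is a rough sketch only: the two masks are assumptions modelled on the ring-buffer register layout of Intel drivers from this period and should be checked against the driver's register header before relying on them, and the function name is hypothetical.

```python
import re

# Assumed address masks for the ring head and tail registers; not taken
# from this bug report. Verify against the driver's register definitions.
HEAD_ADDR_MASK = 0x001FFFFC
TAIL_ADDR_MASK = 0x001FFFF8

LOG_LINE = ("(WW) intel(0): PRB0_HEAD (0x95e1c1a0) and PRB0_TAIL (0x0001fd30) "
            "indicate ring buffer not flushed")

REGISTERS_RE = re.compile(
    r"PRB0_HEAD \((0x[0-9a-fA-F]+)\) and PRB0_TAIL \((0x[0-9a-fA-F]+)\)")


def ring_offsets(line):
    """Return the masked (head, tail) offsets from a log line, or None if
    the line does not contain both register values."""
    match = REGISTERS_RE.search(line)
    if match is None:
        return None
    head = int(match.group(1), 16) & HEAD_ADDR_MASK
    tail = int(match.group(2), 16) & TAIL_ADDR_MASK
    return head, tail


head, tail = ring_offsets(LOG_LINE)
# Differing offsets mean commands were queued that the GPU never consumed.
print(f"head=0x{head:05x} tail=0x{tail:05x} pending={head != tail}")
```

Under these assumed masks the two offsets differ, which is consistent with the driver's own "ring buffer not flushed" message.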


A complete copy of the Xorg.0.log.old file taken on the first boot after the crash is available here:

http://launchpadlibrarian.net/14752765/Xorg.0.log.old
Comment 7 martin 2008-05-27 23:48:43 UTC
I downloaded and burned Ubuntu Gutsy Gibbon yesterday and ran an overnight stress test with the busy spheres screensaver. The hardy intel driver often won't even last 60 minutes with busy spheres, whereas the gutsy driver ran for the entire night and is still running this morning with no sign of the bug.

Therefore I believe that, with a pretty high probability, this bug is a regression. The regression range for it, although very wide, is:

It worked fine with the "2:2.1.1-0ubuntu9" package, and the bug had already been introduced in "2:2.2.1-1ubuntu12".

However, there is still a chance that the bug is present in both drivers, but that some other non-driver change in ubuntu between gutsy and hardy made the driver run through a new code path etc. One such change that comes to mind is that compiz was enabled for my graphics card between gutsy and hardy. It was previously blacklisted because video didn't work well when compiz was active.

Further, the fact that it runs on gutsy also makes it pretty unlikely that the graphics card in this machine is defective somehow.
Comment 8 martin 2008-06-04 13:08:20 UTC
When I turn off compiz I can run the same screensavers without problems while using the 2:2.2.1-1ubuntu12 driver.

When I tried 2:2.1.1-0ubuntu9 before, I did it with the gutsy gibbon live CD, which has compiz turned off by default (so it was kind of a bad test; it's entirely possible that the 2:2.1.1-0ubuntu9 version is also affected).
Comment 9 martin 2008-06-07 12:57:04 UTC
FWIW, the game gunroar also seems to trigger this bug. I can rarely play for more than, say, 15 minutes before my entire X freezes with the exact same graphics defects etc.

Use "sudo apt-get install gunroar" if you want to try it.
Comment 10 martin 2008-08-05 13:29:45 UTC
Today I got a new version (2:2.2.1-1ubuntu13.6) of the intel driver and also the intel-dbg package and so I decided to see if this bug still exists. The bug is still easily reproducible unfortunately. However, the new dbg package seems to provide a much better stacktrace (now I also get some, but not all, DRI function names which I think could be useful):

Program received signal SIGABRT, Aborted.
[Switching to Thread 0xb7c2ea30 (LWP 6312)]
0xb7f56410 in __kernel_vsyscall ()
(gdb) bt full
#0  0xb7f56410 in __kernel_vsyscall ()
No symbol table info available.
#1  0xb7cf8085 in raise () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#2  0xb7cf9a01 in abort () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#3  0xb7cf110e in __assert_fail () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#4  0xa7650d68 in bm_fake_NotifyContendedLockTake ()
   from /usr/lib/dri/i965_dri.so
No symbol table info available.
#5  0xa7656871 in LOCK_HARDWARE () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#6  0xa7656391 in ?? () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#7  0xa77ae76f in _mesa_resizebuffers () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#8  0xa7693e3c in _mesa_make_current () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#9  0xa7656add in intelMakeCurrent () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#10 0xa764cd3a in ?? () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#11 0xb7b94d2a in __glXDRIcontextForceCurrent ()
   from /usr/lib/xorg/modules/extensions//libglx.so
No symbol table info available.
#12 0xb7b5f506 in __glXForceCurrent ()
   from /usr/lib/xorg/modules/extensions//libglx.so
No symbol table info available.
#13 0xb7b5b2b7 in DoRender () from /usr/lib/xorg/modules/extensions//libglx.so
No symbol table info available.
#14 0xb7b5b44c in __glXDisp_Render ()
   from /usr/lib/xorg/modules/extensions//libglx.so
No symbol table info available.
#15 0xb7b5f996 in __glXDispatch ()
   from /usr/lib/xorg/modules/extensions//libglx.so
No symbol table info available.
#16 0x081506ee in ?? ()
No symbol table info available.
#17 0x0808d8df in Dispatch ()
No symbol table info available.
#18 0x0807471b in main ()
No symbol table info available.
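
Both backtraces in this report abort in bm_fake_NotifyContendedLockTake, reached from LOCK_HARDWARE. To compare traces like these without reading every frame by hand, the resolved function names can be pulled out of the gdb output. This is only a sketch: the function name resolved_frames is hypothetical, and the regular expression assumes the "bt" line layout shown above.

```python
import re

# Matches lines such as "#5  0xa7656871 in LOCK_HARDWARE () from ..."
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-fA-F]+ in (\S+) \(\)")


def resolved_frames(backtrace_text):
    """Return (frame number, function name) pairs, skipping the frames
    that gdb could not resolve and printed as ??."""
    frames = []
    for line in backtrace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and match.group(2) != "??":
            frames.append((int(match.group(1)), match.group(2)))
    return frames


SAMPLE = """\
#4  0xa7650d68 in bm_fake_NotifyContendedLockTake ()
   from /usr/lib/dri/i965_dri.so
No symbol table info available.
#5  0xa7656871 in LOCK_HARDWARE () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#6  0xa7656391 in ?? () from /usr/lib/dri/i965_dri.so
No symbol table info available.
"""

for number, function in resolved_frames(SAMPLE):
    print(number, function)
```

Running the same function over two saved traces and comparing the name lists shows quickly whether they fail along the same path.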
Comment 11 Michael Fu 2008-12-28 22:11:41 UTC
haihao, would you please check whether you can still reproduce this on your 965GM machine? Otherwise, we may close this bug. Thanks.
Comment 12 haihao 2009-01-13 18:41:06 UTC
I can't reproduce it with the latest drivers, so I am marking this bug as fixed. If you still experience this issue with the latest drivers, feel free to reopen it.

