Bug 7962 - [i965] SPECViewperf9.0.3 crashed X server
Status: VERIFIED FIXED
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assigned To: Zou Nan hai
Keywords: NEEDINFO
 
Reported: 2006-08-23 01:22 UTC by veelion (inactive account)
Modified: 2009-08-24 12:24 UTC (History)

Attachments
Xorg.log (64.55 KB, text/plain)
2006-08-23 01:28 UTC, veelion (inactive account)
xorg.con (2.74 KB, text/plain)
2006-08-23 01:29 UTC, veelion (inactive account)
our test result, every time the test always crashed when 3dsmax is running (577.94 KB, application/zip)
2006-09-06 20:00 UTC, veelion (inactive account)
output of viewperf, we are using C1 (3.30 KB, text/plain)
2006-09-06 20:02 UTC, veelion (inactive account)
peter's kernel log (11.08 KB, application/x-gzip)
2007-03-01 22:47 UTC, Peter Cordes
peter's X log (9.75 KB, application/x-gzip)
2007-03-01 22:48 UTC, Peter Cordes
peter's x config (5.63 KB, text/plain)
2007-03-01 22:50 UTC, Peter Cordes
Xorg log error with SPECViewperf test (45.26 KB, text/plain)
2007-09-01 17:23 UTC, Wagner Macedo
peter's Xorg.log, Nov 2k7 (43.39 KB, text/plain)
2007-11-12 20:57 UTC, Peter Cordes

Description veelion (inactive account) 2006-08-23 01:22:59 UTC
Using the latest Intel driver (with i965 support) with Mesa CVS and X7.1, 
SPECViewperf81 crashed the X server after running for several minutes.
kernel: 2.6.17.7
OS: FC5/ia32e
platform: G965 C0

how to reproduce:
1. ./src/Configure
2. ./Run_All.csh

After running for a short time, the X server aborted with this error:
Error in I830WaitLpRing(), now is 972543175, start is 972541174
pgetbl_ctl: 0x3ff80001 pgetbl_err: 0x0
ipeir: 0 iphdr: 0
LP ring tail: 20a8 head: 1a0c8 len: 1f001 start 0
eir: 0 esr: 1 emr: ffdf
instdone: 0 instpm: 0
memmode: 0 instps: 0
hwstam: dfff ier: 0 imr: dfff iir: a0
space: 98328 wanted 131064

Fatal server error:
lockup

Error in I830WaitLpRing(), now is 972545195, start is 972543194
pgetbl_ctl: 0x3ff80001 pgetbl_err: 0x0
ipeir: 0 iphdr: 0
LP ring tail: 20b0 head: 1a0c8 len: 1f001 start 0
eir: 0 esr: 1 emr: ffdf
instdone: 0 instpm: 0
memmode: 0 instps: 0
hwstam: dfff ier: 0 imr: dfff iir: a0
space: 98320 wanted 131064

FatalError re-entered, aborting
lockup
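
The register dump above is internally consistent, which can be checked with a little arithmetic. The sketch below assumes (this report does not state it) that the ring length is `(len >> 12) + 1` pages of 4096 bytes and that free space is `head - (tail + 8)`, wrapped by the ring size; the function names are hypothetical. Under those assumptions it reproduces the logged `space` values.

```python
PAGE_SIZE = 4096  # assumption: the ring length is counted in 4 KiB pages


def ring_size(len_reg: int) -> int:
    """Ring size in bytes, assuming bits 12 and up of `len` hold (pages - 1)."""
    return ((len_reg >> 12) + 1) * PAGE_SIZE


def ring_space(head: int, tail: int, size: int) -> int:
    """Free bytes in a circular ring, keeping an 8-byte gap before head."""
    space = head - (tail + 8)
    if space < 0:
        space += size  # tail is ahead of head, so wrap around the ring
    return space


if __name__ == "__main__":
    size = ring_size(0x1F001)
    # head and tail values taken from the two dumps above
    for tail in (0x20A8, 0x20B0):
        print(hex(tail), "space:", ring_space(0x1A0C8, tail, size),
              "wanted:", size - 8)
```

If the assumptions hold, `wanted` is the whole ring minus 8 bytes, so the server was waiting for the ring to drain completely while `head` stayed at 1a0c8 between the two dumps.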
Comment 1 veelion (inactive account) 2006-08-23 01:24:09 UTC
This crash appears on both 32-bit and 64-bit machines.
Comment 2 veelion (inactive account) 2006-08-23 01:28:37 UTC
Created attachment 6654 [details]
Xorg.log
Comment 3 veelion (inactive account) 2006-08-23 01:29:06 UTC
Created attachment 6655 [details]
xorg.con
Comment 4 Alan Hourihane 2006-09-05 07:42:02 UTC
Fixed in latest Mesa CVS.
Comment 5 Gordon Jin 2006-09-06 01:15:55 UTC
We're still seeing the same error.
Please let me know what info you want.
Comment 6 Keith Whitwell 2006-09-06 03:48:42 UTC
(In reply to comment #5)
> We're still seeing the same error.
> Please let me know what info you want.

What output do you get from viewperf?  Specifically, what test is running during
the crash?  Is it always the same?

Have you got more recent hardware than C0?  I only have a C1 here and there were
definitely some differences - can you see whether the crash happens for you on C1?
Comment 7 veelion (inactive account) 2006-09-06 20:00:13 UTC
Created attachment 6854 [details]
our test result, every time the test always crashed when 3dsmax is running
Comment 8 veelion (inactive account) 2006-09-06 20:02:53 UTC
Created attachment 6855 [details]
output of viewperf, we are using C1
Comment 9 veelion (inactive account) 2006-09-06 20:12:10 UTC
BTW, when we compiled the source code, we got an error:
clock.c: In function ‘stopclock’:
clock.c:85: error: “CLK_TCK” undeclared (first use in this function)
clock.c:85: error: (Each undeclared identifier is reported only once
clock.c:85: error: for each function it appears in.)
make: *** [Release/Linux32/clock.o] Error 1
Our solution: add this definition to clock.c:
		#define CLK_TCK ((__clock_t) __sysconf (2)) /* 2 is _SC_CLK_TCK*/
At the end of clock.c, there are some lines like this:
#ifdef WIN32
        period = (float) (GetTickCount() - gtime) / 1000.0F;
#else
        period = (float) (times(&tbuf) - gtime) / (float) CLK_TCK;
#endif
        return (period);

I tried changing (float) CLK_TCK to 1000.0F, as in the WIN32 branch, but the 
same crash appeared.
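
For reference, the workaround macro amounts to a `sysconf` query for the number of clock ticks per second, and `times()` returns elapsed time in those ticks. A minimal sketch of the same conversion follows, written in Python rather than C purely for illustration; `ticks_per_second` and `ticks_to_seconds` are hypothetical helper names.

```python
import os


def ticks_per_second() -> int:
    """Clock ticks per second, the value C code reads via sysconf(_SC_CLK_TCK)."""
    return os.sysconf("SC_CLK_TCK")


def ticks_to_seconds(ticks: int, clk_tck: int) -> float:
    """Convert a tick count (as returned by times() in C) to seconds."""
    return ticks / clk_tck


if __name__ == "__main__":
    hz = ticks_per_second()
    print("ticks per second:", hz)
    print("250 ticks =", ticks_to_seconds(250, hz), "s")
```

On Linux the tick rate is typically 100, so dividing by 1000.0F would presumably only shrink the reported period by a factor of ten; it would not be expected to affect the crash.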
Comment 10 Gordon Jin 2006-10-26 00:06:27 UTC
This issue goes away when testing on production G965 system.
Comment 11 Peter Cordes 2007-03-01 22:46:37 UTC
I see a very similar bug on my Intel DG965WH motherboard, with SPECviewperf 9.0.3.  I bought this board retail from tigerdirect.ca, so it better be production hardware. :)  Tell me how to find out the hardware version and I'll post it.  I'm not keen on taking the heatsink off the northbridge to read the stamp, but other than that...

 I'm using AMD64 Ubuntu Edgy (xorg 7.1).  (I changed the "Hardware" field for this bug, since it was closed for ia32 hardware.  I hope that's ok.)  I have drm (including kernel-side), mesa, and xf86-video-intel compiled from git sources.  (updated yesterday, march 1).  X is generally working quite stably for playing games, e.g. vegastrike.  2GB of dual channel DDR2-6400 and a core2duo 2.4GHz kick butt. :)  BTW, I bought Intel graphics hardware specifically because the drivers were Free and well supported.  I want to be able to run Xen, and I just plain like Free software.

 Err, back to the bug.  In my quest to find new and exciting ways to crash X in a way that would force me to reboot (before I start using this machine as my server for everything at home as well as a desktop) I tried SPECviewperf.  The 3dsmax test doesn't crash the server, but it does segfault.
Run All Summary
3dsmax-04 Weighted Geometric Mean = 0.00000
catia-02 Weighted Geometric Mean = 0.00000
ensight-03 Weighted Geometric Mean =   1.335
light-08 Weighted Geometric Mean =   2.672
maya-02 Weighted Geometric Mean =   7.840
proe-04 Weighted Geometric Mean =   1.219
sw-01 Weighted Geometric Mean =   1.818
ugnx-01 Weighted Geometric Mean = 0.00000
tcvis-01 Weighted Geometric Mean = 0.00000

 I guess the server crashed during the last test, ugs, since sum_results/ugs doesn't have a summary.txt.
The relevant messages in my kernel log are:

Mar  1 20:56:51 tesla kernel: [95331.474024] viewperf[21069]: segfault at 00002afda95bfe68 rip 00002afca7bbc105 rsp 00007fff04532a38 error 4
Mar  1 20:57:08 tesla kernel: [95348.330566] viewperf[21090]: segfault at 00002ba004981e78 rip 00002b9f0377e82c rsp 00007fffa8974f88 error 4
Mar  1 20:57:47 tesla kernel: [95387.069937] [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 3012069 emitted: 3013566
Mar  1 20:57:52 tesla kernel: [95392.140823] [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 3015814 emitted: 3017333
Mar  1 20:57:57 tesla kernel: [95397.259699] [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 3019581 emitted: 3021098
Mar  1 20:58:02 tesla kernel: [95402.382584] [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 3023326 emitted: 3024865
Mar  1 20:58:08 tesla kernel: [95407.585200] [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 3027057 emitted: 3028632
Mar  1 20:58:13 tesla kernel: [95412.920028] [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 3030859 emitted: 3032395
Mar  1 20:58:18 tesla kernel: [95418.134891] [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 3034042 emitted: 3036163
Mar  1 20:58:21 tesla kernel: [95421.134232] [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 3035667 emitted: 3036163
Mar  1 20:58:28 tesla kernel: [95427.832940] [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 3038184 emitted: 3038455
Mar  1 20:58:33 tesla kernel: [95433.499696] [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 3040150 emitted: 3040738
Mar  1 21:09:34 tesla kernel: [96093.646095] viewperf[21217]: segfault at fffffffff1218980 rip fffffffff1218980 rsp 00007fffb9e6e0c8 error 14
Mar  1 21:09:39 tesla kernel: [96099.111426] viewperf[21218]: segfault at ffffffffa9b4a980 rip ffffffffa9b4a980 rsp 00007fff0153b7a8 error 14
Mar  1 21:15:26 tesla kernel: [96446.134405] viewperf[21230]: segfault at 00000000787dc980 rip 00000000787dc980 rsp 00007fff328a7ae8 error 14
Mar  1 21:19:45 tesla kernel: [96705.174709] [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 63645579 emitted: 63865738


My Xorg.0.log ends with:
Error in I830WaitLpRing(), now is 272316156, start is 272314155
pgetbl_ctl: 0x7ff80001 pgetbl_err: 0x0
ipeir: 0 iphdr: 0
LP ring tail: 1cb00 head: 12b18 len: 1f001 start 0
eir: 0 esr: 1 emr: ffdf
instdone: 0 instpm: 0
memmode: 0 instps: 0
hwstam: cffe ier: 82 imr: 0 iir: 70
space: 90128 wanted 131064
(II) I810(0): [drm] removed 1 reserved context for kernel
(II) I810(0): [drm] unmapping 8192 bytes of SAREA 0x2efff000 at 0x2b12a3439000

Fatal server error:
lockup

Error in I830WaitLpRing(), now is 272318181, start is 272316180
pgetbl_ctl: 0x7ff80001 pgetbl_err: 0x0
ipeir: 0 iphdr: 0
LP ring tail: 1cb08 head: 12b18 len: 1f001 start 0
eir: 0 esr: 1 emr: ffdf
instdone: 0 instpm: 0
memmode: 0 instps: 0
hwstam: dfff ier: 0 imr: dfff iir: 70
space: 90120 wanted 131064

FatalError re-entered, aborting
lockup

 I can gather more info if you can't reproduce this.  I'll post my kernel log and xorg log.

BTW, clock.c compiles ok if you use -D_XOPEN_SOURCE, or some other feature-test macro that results in time.h defining the CLK_TCK (which isn't in C99, or something, so glibc doesn't define it normally).  viewperf 9.0.3 compiles fine for me out of the box, even though it still uses CLK_TCK.
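
To compare the `i915_wait_irq` messages in the kernel log above, the `rec` and `emitted` counters can be pulled out of each line. The sketch below is only a log-parsing aid: the regular expression and the name `irq_gaps` are assumptions based on the line format shown in this comment, and reading `emitted - rec` as the number of outstanding requests is an interpretation, not something the log confirms.

```python
import re
from typing import List, Tuple

# Matches the counters in lines such as:
#   [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 3012069 emitted: 3013566
EBUSY_RE = re.compile(r"i915_wait_irq: EBUSY -- rec: (\d+) emitted: (\d+)")


def irq_gaps(log_text: str) -> List[Tuple[int, int, int]]:
    """Return (rec, emitted, emitted - rec) for every EBUSY line found."""
    gaps = []
    for match in EBUSY_RE.finditer(log_text):
        rec, emitted = int(match.group(1)), int(match.group(2))
        gaps.append((rec, emitted, emitted - rec))
    return gaps


if __name__ == "__main__":
    sample = (
        "[drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 3012069 emitted: 3013566\n"
        "[drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 63645579 emitted: 63865738\n"
    )
    for rec, emitted, gap in irq_gaps(sample):
        print(rec, emitted, gap)
```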
Comment 12 Peter Cordes 2007-03-01 22:47:44 UTC
Created attachment 8933 [details]
peter's kernel log
Comment 13 Peter Cordes 2007-03-01 22:48:10 UTC
Created attachment 8934 [details]
peter's X log
Comment 14 Peter Cordes 2007-03-01 22:50:16 UTC
Created attachment 8935 [details]
peter's x config

I ran X with -layout simple, which just uses the i965 head.  I wasn't doing any -sharevts multiseat stuff either; I have a separate xorg.conf for that. :)
Comment 15 Peter Cordes 2007-03-01 22:54:54 UTC
I think I just spammed you guys with a bunch of emails while I tweaked the mime type on an attachment.  sorry, I didn't realize emails were getting sent for every time I did something minor, or I would have left it alone :(
Comment 16 Peter Cordes 2007-03-02 17:22:51 UTC
I reproduced this again.  This time, without being preceded by other segfaulting graphics programs that might have caused problems.  I did run 32bit glxgears and googleearth to test the 32bit dri libs I compiled, but they worked fine.

 This time I noticed that viewperf always does
viewperf: main/framebuffer.c:219: _mesa_free_framebuffer_data: Assertion `fb->RefCount == 0' failed.
Aborted
instead of exiting.  I think this is at the end of a test, when it would have exited anyway.

 The kernel log messages:
Mar  2 20:17:26 tesla kernel: [61215.978734] [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 164764684 emitted: 164766183
 all come from the ensight test.

 The three viewperf segfaults are all from the ugnx test.

The test that crashes X is:
$ ./Run_Viewset.csh tcvis-01 tcvis results
Running: tcvis-01.csh
Writing PNG file '../results/tcvis/tcvis01.png'...done.
Writing PNG file '../results/tcvis/tcvis01-depth.png'...done.
Writing PNG file '../results/tcvis/tcvis01Full.png'...done.
Writing PNG file '../results/tcvis/tcvis01Full-depth.png'...done.
DRM_I830_BATCHBUFFER: -13
viewperf: intel_context.c:694: UNLOCK_HARDWARE: Assertion `intel->batch->ptr == intel->batch->map + intel->batch->offset' failed.
Aborted

 The errors at the end of Xorg.0.log are the same as before, just with a few different numbers.  Also, it only takes < 15 seconds for tcvis to crash X.  I'm running viewperf at 1024x768, in case that matters.  My X resolution was 1280x1024 this time, but I think last time I was at 1024x768.  (my CRT only does 60Hz at 1280x1024, so I usually use xrandr to bring it down.)

 Trying to start X again fails, with essentially the same error.  And suspend/resume doesn't seem to work anymore (it used to re-post the video, now it oopses the kernel with the video still off).  It would be really nice if crashing X didn't mean I had to reboot before I could use X again on that head.  I'm typing now on X running on the PCI r128 in this machine.  I think I forgot to mention that this didn't lock the machine at all.  Only the video hardware is out to lunch.
Comment 17 Peter Cordes 2007-03-02 18:38:42 UTC
tested again, this time fresh from a reboot.
booted up (to a text console)
sudo X -config ... -layout ...
DISPLAY=:0 fluxbox

in X
open a terminal
crash X within 10 seconds of starting tcvis. (configured to run at 640x480)
peter@tesla:/usr/local/src/opengl/SPEC/SPECViewperf9.0$ ./Run_Viewset.csh tcvis-01 tcvis results
Running: tcvis-01.csh
Writing PNG file '../results/tcvis/tcvis01.png'...done.
Writing PNG file '../results/tcvis/tcvis01-depth.png'...done.
Writing PNG file '../results/tcvis/tcvis01Full.png'...done.
Writing PNG file '../results/tcvis/tcvis01Full-depth.png'...done.
DRM_I830_BATCHBUFFER: -13
viewperf: intel_context.c:694: UNLOCK_HARDWARE: Assertion `intel->batch->ptr == intel->batch->map + intel->batch->offset' failed.
Aborted

 So this is a very reproducible bug on my system, and now I'm sure that nothing I did before running viewperf could have confused the driver.
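
The `-13` printed after `DRM_I830_BATCHBUFFER` looks like a negated errno value, which is a common way for ioctl wrappers to report failures (an assumption here, not something the log states). A small sketch for decoding such numbers, with `decode_neg_errno` as a hypothetical helper name:

```python
import errno
import os


def decode_neg_errno(value: int) -> str:
    """Map a negated errno (e.g. -13) to its symbolic name and message."""
    code = -value
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    print(decode_neg_errno(-13))
```

On Linux, errno 13 is EACCES (permission denied).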
Comment 18 Wagner Macedo 2007-09-01 17:23:27 UTC
Created attachment 11381 [details]
Xorg log error with SPECViewperf test

I'm having errors with the SPECViewperf (9.0.3) benchmark too. I'm using Ubuntu 7.10, which ships Mesa 7.0.x. When I run the test under a window manager (Gnome), it fails on the light-08 test. If I run a clean X without GDM and Gnome (command: 'xinit -e xterm'), the light-08 test passes, but the run fails on tcvis-01.
Comment 19 Wagner Macedo 2007-09-01 17:46:00 UTC
I forgot to say in my previous comment: I opened a bug (12235) about errors in some GL applications (e.g. the Torcs game). The error is very similar to the one in this bug, so I think they may be related.
Comment 20 Gordon Jin 2007-10-09 06:01:44 UTC
Peter, are you still seeing the xserver crash with the latest git driver?
SpecViewPerf 9.0.3 works fine on my i965, on both 32-bit and 64-bit system.
Comment 21 Gordon Jin 2007-10-20 05:24:33 UTC
Peter, 
I forgot to mention we added "#define GLX_GLXEXT_PROTOTYPES" at the head of the file viewperf.c to avoid a segfault in ugnx-01 on x86-64.
Please check whether that makes a difference.
Comment 22 Peter Cordes 2007-11-12 20:55:36 UTC
I finally got around to trying this again.  I can still reproduce this, now on AMD64 Ubuntu Gutsy. :(

> I forgot to mention we added "#define GLX_GLXEXT_PROTOTYPES" at the head of the
> file viewperf.c to avoid segfault error in ugnx-01 on x86-64.

 Thanks, that does make viewperf run cleanly except when it locks up X.

 I used ./Configure to compile a 64bit viewperf.  (after wasting several hours without realizing I was running 32bit viewperf with non-updated 32bit mesa.)

 my libdrm, kernel-side drm, and mesa are from git as of Nov 11th.  My X server, kernel, and xorg intel driver are from Ubuntu Gutsy.  I'm not that familiar with git, so I did a fresh checkout of the mesa git tree.  (diff showed it was in fact identical to my git tree, except for configs/linux-dri-x86_64, which I'd changed).  That also made sure I was compiling with the standard gcc flags, instead of my usual -Os -march=nocona -mtune=generic (for core2duo).  Anyway, that's what the LD_LIBRARY_PATH= and LIBGL_DRIVERS_PATH= is about.

peter@tesla:/usr/local/src/opengl/SPEC/SPECViewperf9.0$ libgl=/usr/local/src/g965/mesa.fresh/lib
peter@tesla:/usr/local/src/opengl/SPEC/SPECViewperf9.0$ LD_LIBRARY_PATH="$libgl" LIBGL_DRIVERS_PATH="$libgl" LIBGL_DEBUG=verbose MESA_DEBUG=1 /usr/bin/time ./Run_Viewset.csh tcvis-01 tcvis results
Running: tcvis-01.csh
libGL: XF86DRIGetClientDriverName: 1.8.0 i965 (screen 0)
libGL: OpenDriver: trying /usr/local/src/g965/mesa.fresh/lib/i965_dri.so
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is 6, (OK)
drmOpenByBusid: Searching for BusID pci:0000:00:02.0
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is 6, (OK)
drmOpenByBusid: drmOpenMinor returns 6
drmOpenByBusid: drmGetBusid reports pci:0000:00:02.0
Mesa warning: couldn't open libtxc_dxtn.so, software DXTn compression/decompression unavailable
libGL error:
Can't open configuration file /etc/drirc: No such file or directory.
Writing PNG file '../results/tcvis/tcvis01.png'...done.
Writing PNG file '../results/tcvis/tcvis01-depth.png'...done.
Writing PNG file '../results/tcvis/tcvis01Full.png'...done.
Writing PNG file '../results/tcvis/tcvis01Full-depth.png'...done.
intelWaitIrq: drmI830IrqWait: -16
19.91user 9.75system 0:34.27elapsed 86%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (43major+73954minor)pagefaults 0swaps
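
The invocation above points both the dynamic linker and libGL's driver loader at a locally built Mesa tree. A hedged sketch of wrapping that in a script follows; the environment variable names are the ones used in the command above, while `mesa_env` and `run_with_mesa` are hypothetical helper names.

```python
import os
import subprocess
from typing import Dict, List, Optional


def mesa_env(libdir: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with libGL and the DRI driver taken from libdir."""
    env = dict(os.environ if base is None else base)
    env["LD_LIBRARY_PATH"] = libdir     # where the dynamic linker finds libGL
    env["LIBGL_DRIVERS_PATH"] = libdir  # where libGL looks for i965_dri.so
    env["LIBGL_DEBUG"] = "verbose"      # log driver loading, as in the output above
    env["MESA_DEBUG"] = "1"
    return env


def run_with_mesa(libdir: str, argv: List[str]) -> int:
    """Run argv against the locally built Mesa and return its exit status."""
    return subprocess.run(argv, env=mesa_env(libdir)).returncode


if __name__ == "__main__":
    # Only prints the variables; nothing is executed here.
    print(mesa_env("/usr/local/src/g965/mesa.fresh/lib", base={}))
```

Building the environment in one place makes it harder to end up running a 64-bit binary against a stale system libGL by accident.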

kernel:
[   49.112607] [drm] Initialized i915 1.11.0 20070209 on minor 0                                                                    
...                                                                                                                                 
[  772.653501] [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 5505330 emitted: 5545013   

Xorg.0.log: excerpt
Error in I830WaitLpRing(), timeout for 2 seconds
pgetbl_ctl: 0xcff80001 pgetbl_err: 0x0
ipeir: 0 iphdr: 60020100
LP ring tail: fae0 head: 7150 len: 1f001 start 0
Err ID (eir): 0 Err Status (esr): 1 Err Mask (emr): ffffffdf
instdone: 6fe5fafd instdone_1: ffff0
instpm: 0
memmode: 0 instps: 409f02e
HW Status mask (hwstam): fffecffe
IRQ enable (ier): 2 imr: fffe0000 iir: 10c0
acthd: 5ff1430 dma_fadd_p: 5ff1430
ecoskpd: 307 excc: 0
cache_mode: 6800/180
mi_arb_state: 44
IA_VERTICES_COUNT_QW 0/0
IA_PRIMITIVES_COUNT_QW 0/0
VS_INVOCATION_COUNT_QW 0/0
GS_INVOCATION_COUNT_QW 0/0
GS_PRIMITIVES_COUNT_QW 0/0
CL_INVOCATION_COUNT_QW 0/0
CL_PRIMITIVES_COUNT_QW 0/0
PS_INVOCATION_COUNT_QW 0/0
PS_DEPTH_COUNT_QW 0/0
WIZ_CTL 0
TS_CTL 0  TS_DEBUG_DATA b1618b5b
TD_CTL 0 / 0
space: 95848 wanted 131064
(II) intel(0): [drm] removed 1 reserved context for kernel
(II) intel(0): [drm] unmapping 8192 bytes of SAREA 0x2efff000 at 0x2b9056a43000

Fatal server error:
lockup
(then the above repeats)


I don't use a multiseat setup any more, so my xorg.conf looks like:

Section "ServerFlags"
        Option "DefaultServerLayout" "simple"
        Option "AllowMouseOpenFail"  "true"
        Option "AIGLX" "false"
EndSection

Section "Module"
        Load    "i2c"
        Load    "bitmap"
        Load    "ddc"
        Load    "dri"
        Load    "extmod"
        Load    "freetype"
        Load    "glx"
        Load    "int10"
        Load    "type1"
        Load    "vbe"
EndSection

Section "ServerLayout"
        Identifier      "simple"
        Screen          "intel Screen"
        InputDevice     "Generic Keyboard"
        InputDevice     "Configured Mouse"
EndSection

Section "Device"
        Identifier      "intel"
        Driver          "intel"
        BusID           "PCI:0:2:0"
EndSection

Section "Screen"
        Identifier      "intel Screen"
        Device          "intel"
        Monitor         "auto" # Monitor section with no options set.
        DefaultDepth    24
        SubSection "Display"
                Depth           24
                Modes           "1680x1050" ...
        EndSubSection
EndSection



(normally I use this Modules section, because if I want to use a multiseat serverlayout, I need e.g. int10 and vbe commented out.  and dga crashes X.)
Section "Module"
#       Load    "i2c"
        Load    "bitmap"
#       Load    "ddc"
        Load    "dri"
        SubSection "extmod"
                Option "omit xfree86-dga"
        EndSubSection
#       Load    "extmod" # subsection does this
        Load    "freetype"
        Load    "glx"
#       Load    "int10"
        Load    "type1"
#       Load    "vbe"
EndSection
Comment 23 Peter Cordes 2007-11-12 20:57:35 UTC
Created attachment 12489 [details]
peter's Xorg.log, Nov 2k7
Comment 24 Peter Cordes 2007-11-12 22:21:15 UTC
BTW, this is 100% reproducible for me, at about 20 seconds in.

Other things I forgot to mention:
Another reason I did a fresh git checkout was that this symlink looked messed up:
peter@tesla:/usr/local/src/g965/mesa$ ll src/mesa/drivers/dri/i965/server/
total 0
lrwxrwxrwx 1 peter src 27 2007-11-11 20:24 intel_dri.c -> ../intel/server/intel_dri.c

 It's a broken symlink, which I'm guessing should be a symlink to ../../intel/...
(This is obviously a separate bug, but easy to fix so I just meant to mention it here.)
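
A relative symlink target is resolved against the directory that contains the link, which is why one missing `..` is enough to break it. The sketch below shows that resolution plus a broken-link check; the helper names are hypothetical, and the `../../intel/server/intel_dri.c` target is only the guess made above, not a confirmed fix.

```python
import os


def resolve_link_target(link_path: str, target: str) -> str:
    """Path a relative symlink target refers to, without touching the disk."""
    return os.path.normpath(os.path.join(os.path.dirname(link_path), target))


def is_broken_symlink(path: str) -> bool:
    """True if path is a symlink whose target does not exist."""
    return os.path.islink(path) and not os.path.exists(path)


if __name__ == "__main__":
    link = "src/mesa/drivers/dri/i965/server/intel_dri.c"
    # Target as found in the tree, then the guessed correction.
    print(resolve_link_target(link, "../intel/server/intel_dri.c"))
    print(resolve_link_target(link, "../../intel/server/intel_dri.c"))
```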



 Also, Unreal Tournament 2004 causes lockups that look the same as the viewperf tcvis ones.  The only difference is that the X server doesn't log anything or exit until you do a killall ut2004-bin.  This was reported on an Ubuntu bug report that was originally about Ubuntu's mesa being very prone to lockups on g965...
https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/104673

 A few other users found that bug report and reported their problems on it.  I was able to reproduce the lockups with the ut2004 demo.  The demo is freely available, with Linux binaries for AMD64 and x86.  Try http://treefort.icculus.org/ut2004/
bf9f483902c6006b94c327fb7b585086  UT2004-LNX-Demo3334.run
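
The line above is `md5sum`-style output for the demo installer. A minimal sketch for checking a download against it, assuming the file sits in the current directory (`md5_of_file` is a hypothetical helper):

```python
import hashlib
import os

EXPECTED_MD5 = "bf9f483902c6006b94c327fb7b585086"  # checksum quoted in this comment


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    name = "UT2004-LNX-Demo3334.run"
    if os.path.exists(name):
        print("OK" if md5_of_file(name) == EXPECTED_MD5 else "MISMATCH")
    else:
        print(name, "not found")
```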

 As I said on the Ubuntu bug report, the lockups are reproduced most quickly on the "bombing run" game type.  The error messages (including the Xorg.log) look quite similar to the viewperf lockups.

peter@tesla:~$ libgl=/usr/local/src/g965/mesa.fresh/lib; LD_LIBRARY_PATH="$libgl" LIBGL_DRIVERS_PATH="$libgl" LIBGL_DEBUG=verbose MESA_DEBUG=1 /usr/bin/time ut2004
WARNING: ALC_EXT_capture is subject to change!
libGL: XF86DRIGetClientDriverName: 1.8.0 i965 (screen 0)
libGL: OpenDriver: trying /usr/local/src/g965/mesa.fresh/lib/i965_dri.so
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is 153, (OK)
drmOpenByBusid: Searching for BusID pci:0000:00:02.0
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is 153, (OK)
drmOpenByBusid: drmOpenMinor returns 153
drmOpenByBusid: drmGetBusid reports pci:0000:00:02.0
Mesa warning: couldn't open libtxc_dxtn.so, software DXTn compression/decompression unavailable
libGL error:
Can't open configuration file /etc/drirc: No such file or directory.
intelWaitIrq: drmI830IrqWait: -16
Signal: SIGTERM [terminate]
Requesting Exit.
Signal: SIGQUIT [quit]
Aborting.

Crash information will be saved to your logfile.
Command exited with non-zero status 1
174.79user 4.44system 6:34.00elapsed 45%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (68major+117331minor)pagefaults 0swaps

(lockups on bombing run usually happen much faster than the "assault" game I played.)
Comment 25 Zou Nan hai 2008-03-02 22:51:46 UTC
Peter,
  We have fixed some critical crash bugs in our 3D driver.
Could you check if the issue is still valid with the latest mesa and drm driver?
Thanks
Comment 26 Peter Cordes 2008-03-10 01:32:30 UTC
(In reply to comment #25)
> Peter,
>   We have fixed some critical crash bugs in our 3D driver.
> Could you check if the issue is still valid with the latest mesa and drm
> driver?

 SPECviewperf 9.0.3 runs fine now. :)  Nice work, guys.  mesa updated 2k8/3/9, drm kernel and user updated 2k8/2/29.  (didn't want to reboot again to update drm.)

 I tried once on a fairly fresh X server and left it alone.  I ran it again and put the window behind other stuff I was doing.  (firefox, vnc, mplayer).  Nothing I did caused a lockup that made the X server exit.

 There's a bad interaction between mplayer -vo gl:yuv=2:swapinterval=1:lscale=1:cscale=0  and SPECviewperf.  While viewperf was running, I tried to run mplayer, but the graphics froze after the mplayer window opened.  I thought I'd managed to crash X like usual, but sshing in and killing mplayer unfroze the desktop with no ill effects.  (I think -QUIT worked sometimes, but other times -KILL was needed.)  mplayer -vo xv didn't cause any problems.

 Unreal Tournament doesn't crash X anymore either, but it doesn't work.  It segfaults while trying to load a game.  (after all the menus, while the level is loading).

 These last two are obviously separate bugs from the dri lockups this bug report was about, so it can finally get closed.  And they're _much_ less serious, since they don't need a reset of the computer to get the video hardware back to a useable state.  Needing to reboot after an X lockup was always the most annoying thing.

 I guess I should file bug reports for those two things I mentioned above.  I'll probably do that tomorrow.
Comment 27 Zou Nan hai 2008-03-10 01:47:07 UTC
Thanks Peter, 
 I am closing the bug. Please open separate bug reports for the issues you mentioned.

BTW:
 ut2004 runs well here on i965 machines. My favorite method to debug UT is to modify the ut2004 script, 
replacing exec "./ut2004-bin" with exec gdb "./ut2004-bin"
so you can see a backtrace when ut crashes.
Comment 28 Peter Cordes 2008-10-13 15:00:22 UTC
> Please open seperate bug reports for the issues you mentioned.

 I was going to, but they went away when I upgraded to Ubuntu Hardy, with its newer X server and intel driver.  I'm only bothering to say this because I got an email about a tag being added to this bug, so I thought other people might be looking at it and wondering what happened.

 happy hacking.
Comment 29 ajax at nwnk dot net 2009-08-24 12:24:09 UTC
Mass version move, cvs -> git