Bug 3493 - r128: Entire X-windowing system hangs during normal 3D gaming
r128: Entire X-windowing system hangs during normal 3D gaming
Status: REOPENED
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/r128
5.0.2
x86 (IA32) Linux (All)
: high normal
Assigned To: Default DRI bug account
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2005-06-08 08:24 UTC by Alan Grimes
Modified: 2013-03-15 08:02 UTC (History)
4 users (show)

See Also:
i915 platform:
i915 features:


Attachments
Xorg config (2.88 KB, text/plain)
2006-04-24 02:26 UTC, Andrea Vettorello
Details
Xorg log (44.71 KB, text/plain)
2006-04-24 02:27 UTC, Andrea Vettorello
Details
Xorg.0.log.old with some "tossed events which came in late" (74.84 KB, application/x-trash)
2008-02-08 10:57 UTC, Jérôme Carretero
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Alan Grimes 2005-06-08 08:24:57 UTC
This bug is reproducable with all OpenGL games I've tried. 

The server will hang for exactly ten seconds (or some multiple thereof) during
3D gaming. It is reproducable after a few minutes with ppracer...

Here's some log output: ( Noteworthy lines highlighted..) 


X Window System Version 6.8.2
Release Date: 9 February 2005
X Protocol Version 11, Revision 0, Release 6.8.2
Build Operating System: Linux 2.6.11.7 i686 [ELF] 
Current Operating System: Linux leenooks 2.6.12-rc5 #2 SMP Sun Jun 5 19:03:50
EST 2005 i686
Build Date: 05 June 2005
##################

(II) Screen 0 shares mem & io resources
(II) Screen 1 shares mem & io resources
(II) Entity 0 shares no resources
(II) Entity 1 shares no resources
##################################
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 1025
(EE) R128(0): Idle timed out, resetting engine...
##################################
(II) Screen 0 shares mem & io resources
(II) Screen 1 shares mem & io resources
(II) Entity 0 shares no resources
(II) Entity 1 shares no resources
(II) Screen 0 shares mem & io resources
(II) Screen 1 shares mem & io resources
(WW) Open APM failed (/dev/apm_bios) (No such file or directory)
(II) Screen 0 shares mem & io resources
(II) Screen 1 shares mem & io resources
(II) Screen 0 shares mem & io resources
(II) Screen 1 shares mem & io resources
(II) Entity 0 shares no resources
(II) Entity 1 shares no resources
(II) Mouse0: ps2EnableDataReporting: succeeded
(II) 3rd Button detected: disabling emulate3Button
Comment 1 Andrea Vettorello 2006-04-24 02:26:21 UTC
Created attachment 5431 [details]
Xorg config
Comment 2 Andrea Vettorello 2006-04-24 02:27:05 UTC
Created attachment 5432 [details]
Xorg log
Comment 3 Andrea Vettorello 2006-04-24 02:32:58 UTC
Sorry for the spam, i've filed the attachments to the wrong bug.
Comment 4 Benjamin Herrenschmidt 2006-04-24 08:24:08 UTC
There can be various causes from such hangs, though the most common are:

 - Mesa segfaulted in a SW fallback (it happens quite regulary with the r300
driver at least, with foobillard and glob2 it's 100% repro here) while the HW
lock was held by the DRI driver. Similar seems to happen if something writes to
stderr while holding the HW lock.

 - The card disliked something in the command stream and locked up.

The former can usually be diagnosed by launching the app from an ssh session or
redirecting stdout/stderr output to a file provided you manage to sync your
filesystem after the hang.

Comment 5 Peter Cordes 2007-02-28 11:06:28 UTC
I see a similar thing with the  engine  Mesa demo.  ./engine on my r128 locks out the keyboard for the server.  I have AMD64 Ubuntu Edgy, but drm (including the kernel component) and mesa compiled from git (updated today, feb 28 2007).  My kernel is Ubuntu's 2.6.20-9-generic.

 I don't think X unfreezes for me.  Maybe I can try killing engine and then maybe X will recover...

 My Xorg log file ends with:
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 49
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 49
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 51
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 41
...
(about 25 lines)

 My kernel log has:
[ 9506.298142] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock


 Other mesa demos are fine.  I don't do much 3D stuff on this vid card, since it's a PCI r128 AIW w/16MB, and the motherboard has g965 graphics (hence the dev versions of mesa, drm, and xf86-video-intel).  I run it dual-seat, with -sharevts and evdev input.  This is reproducible when no X server is running on the g965 hardware, though.
Comment 6 Slava Gorbunov 2007-03-01 08:33:21 UTC
I observe analogous behavior. The X server hangs during active 3d rendering with following message in syslog:

Mar  1 18:52:29 fpfe [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock

It appears that only X server is affected (alt-sysrq key combination work).

I use kernel 2.6.19.2, mesa-6.5.2, xf86-video-ati-6.6.3
Comment 7 Peter Cordes 2007-03-02 18:22:20 UTC
This is extremely easy to reproduce with SPECviewperf 9.0.3.  (I have it installed to test my i965 hardware in the same machine.)  Although unlike last time, I didn't get any 
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 49
messages in my Xorg.0.log.  X unfroze when I killed the viewperf process that caused the
[drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
message.

 Terminal output from a full SPECviewperf run:
peter@tesla:/usr/local/src/opengl/SPEC/SPECViewperf9.0$ ./Run_All.csh
Running: 3dsmax-04.csh
Terminated
Running: catia.csh
Writing PNG file '../results/catia/01.png'...done.
Terminated
Running: Ensight-03.sh
Terminated
Running: light-08.csh
Terminated
Running: maya-02.csh
Writing PNG file '../results/maya/01.png'...done.
Terminated
Running: proe-04.csh
Writing PNG file '../results/proe/proe01.png'...done.
Terminated
Running: suzuki-engine
Writing PNG file '../results/sw/sw01.png'...done.
Writing PNG file '../results/sw/sw01Full.png'...done.
Writing PNG file '../results/sw/sw02.png'...done.
Writing PNG file '../results/sw/sw02Full.png'...done.
Illegal instruction
Running: ugnx-01.csh
Segmentation fault
Segmentation fault
Terminated
Terminated
Terminated
Terminated
Terminated
Segmentation fault
Terminated
Terminated
Terminated
Running: tcvis-01.csh
Terminated
Run All Summary
3dsmax-04 Weighted Geometric Mean = 0.00000
catia-02 Weighted Geometric Mean = 0.00000
ensight-03 Weighted Geometric Mean = 0.00000
light-08 Weighted Geometric Mean = 0.00000
maya-02 Weighted Geometric Mean = 0.00000
proe-04 Weighted Geometric Mean = 0.00000
sw-01 Weighted Geometric Mean = 0.00000
ugnx-01 Weighted Geometric Mean = 0.00000
tcvis-01 Weighted Geometric Mean = 0.00000


kernel log:
Mar  2 21:25:46 tesla kernel: [65315.110796] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 21:28:21 tesla kernel: [65470.003625] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 21:28:32 tesla kernel: [65481.086515] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 21:28:42 tesla kernel: [65491.348416] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 21:29:07 tesla kernel: [65515.740211] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 21:29:48 tesla kernel: [65557.313944] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 21:31:07 tesla kernel: [65636.105453] viewperf[16959] trap invalid opcode rip:2b2deb5fea1b rsp:7fffc0bc8ed8 error:0
Mar  2 21:31:40 tesla kernel: [65669.276782] viewperf[16970]: segfault at 0000000007048f70 rip 0000000007048f70 rsp 00007fffa4036dc8 error 15
Mar  2 21:32:42 tesla kernel: [65731.491660] viewperf[16972]: segfault at ffffffffcd242f70 rip ffffffffcd242f70 rsp 00007fffdde3fbe8 error 14
Mar  2 21:35:38 tesla kernel: [65907.385109] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 21:44:39 tesla kernel: [66448.322471] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 21:49:10 tesla kernel: [66719.224818] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 21:51:42 tesla kernel: [66870.685200] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 21:56:50 tesla kernel: [67178.608402] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 22:01:33 tesla kernel: [67461.398885] viewperf[17042]: segfault at ffffffffe9c1df70 rip ffffffffe9c1df70 rsp 00007fffc14641e8 error 14
Mar  2 22:01:42 tesla kernel: [67470.173738] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 22:07:58 tesla kernel: [67846.066669] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 22:09:18 tesla kernel: [67926.606019] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 22:09:40 tesla kernel: [67948.741423] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock


 ugnx locked up the most.  Maybe it just has a test script that invokes viewperf many times, instead of the batch mode the others use.  Every "Terminated" line in that output is from me doing killall viewperf after a new kernel log message, and lack of CPU load, indicates that it got stuck again.  I have no idea why it would run an illegal instruction, or segfault.  Maybe the SIGILL was just from a jump to an address in an executable memory region that wasn't meant to be amd64 instructions.

 BTW, I compiled SPECviewperf 9.0.3 from the source in the src directory.  ./Configure, and choose Linux64, IIRC.  Mesa, r128_dri, and libdrm are all built from git sources.
Comment 8 Yatheendra 2007-10-27 04:05:58 UTC
I hit this bug *always* on OpenGL apps (atleast glxgears and gl-117).
I have DRI enabled and used (see glxinfo later below)

H/W:
    ATI Rage 128 Pro GL PF (AGP) (ChipID = 0x5046)
        VideoRAM: 32768 kByte (64-bit SDR SGRAM 1:1)
    AMD Sempron(tm) Processor 2600+ stepping 00
    MSI-K8N VIAK8M motherboard
    512 MB RAM
S/W:
    Stock Debian 4.0 Etch
    (kernel 2.6.18-5-k7 #1 SMP, X.org X Window System Version 7.1.1)
    (my xorg.conf has NO GLcore, has "dri" and "glx", color depth 16,
     r128 driver, force AGP to 2x although 1x gives the same result)

bash $ LIBGL_DEBUG=verbose MESA_DEBUG=1 glxinfo 
name of display: :0.0
libGL: XF86DRIGetClientDriverName: 4.1.0 r128 (screen 0)
libGL: OpenDriver: trying /usr/lib/dri/r128_dri.so
drmOpenByBusid: Searching for BusID pci:0000:01:00.0
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is 4, (OK)
drmOpenByBusid: drmOpenMinor returns 4
drmOpenByBusid: drmGetBusid reports pci:0000:01:00.0
Mesa warning: couldn't open libtxc_dxtn.so, software DXTn compression/decompression unavailable
display: :0  screen: 0
direct rendering: Yes
server glx vendor string: SGI
server glx version string: 1.2
server glx extensions:
    ....
client glx vendor string: SGI
client glx version string: 1.4
client glx extensions:
    ....
....
OpenGL vendor string: VA Linux Systems, Inc.
OpenGL renderer string: Mesa DRI Rage 128 Pro 20051027 AGP 2x
OpenGL version string: 1.2 Mesa 6.5.1
....

On running OpenGL apps via 'xinit' from console, the kernel log contains:
Oct 27 15:48:17 <NAME> kernel: agpgart: Found an AGP 3.0 compliant device at 0000:00:00.0.
Oct 27 15:48:17 <NAME> kernel: agpgart: Device is in legacy mode, falling back to 2.x
Oct 27 15:48:17 <NAME> kernel: agpgart: Putting AGP V2 device at 0000:00:00.0 into 2x mode
Oct 27 15:48:17 <NAME> kernel: agpgart: Putting AGP V2 device at 0000:01:00.0 into 2x mode
Oct 27 15:48:45 <NAME> kernel: [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock

As an aside, my Xorg.log has this 1 (non-fatal) error:
    (EE) R128(0): No DFP detected

I will see if I can reproduce this issue on Mesa 7.0.1.
Comment 9 Alex Deucher 2007-10-27 07:21:33 UTC
this may be related to bug 9379
Comment 10 Slava Gorbunov 2007-11-06 10:39:55 UTC
I have found that this bug can be triggered by adding circle to the current scene in blender-2.45. X hangs, but mouse cursor moves, and system could be accessed through the network. I have noticed that there are following messages at the end of Xorg.log:

tossed event which came in late
mieqEnequeue: out-of-order valuator event; dropping.

They are repeated many times.

In the syslog there is already mentioned here message:

Nov  6 21:12:05 fpfe [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock

Software: xorg-server-1.4 xf86-video-ati-6.6.3, mesa-7.0.1, kernel 2.6.22
Hardware: ATI Technologies Inc Rage 128 Pro Ultra TF
Comment 11 Daniel Stone 2007-11-06 16:46:48 UTC
(In reply to comment #10)
> I have found that this bug can be triggered by adding circle to the current
> scene in blender-2.45. X hangs, but mouse cursor moves, and system could be
> accessed through the network. I have noticed that there are following messages
> at the end of Xorg.log:
> 
> tossed event which came in late
> mieqEnequeue: out-of-order valuator event; dropping.
> 
> They are repeated many times.

This is just indicative of the main thread either in an infinite loop, or sleeping and never waking up.
Comment 12 Daniel Stone 2007-11-06 16:59:30 UTC
Sorry, didn't mean to resolve this.
Comment 13 Alex Deucher 2007-11-07 07:41:16 UTC
this may be related to bug 9379
Comment 14 Jérôme Carretero 2008-02-08 10:57:52 UTC
Created attachment 14219 [details]
Xorg.0.log.old with some "tossed events which came in late"

I've been having this behavior since some weeks.
First appeared with Xorg-server 1.3, continues with 1.4.0.90.

You'll find relevant information in my attached Xorg logfile.
System is Linux MacGyver 2.6.24-MacGyver #4 PREEMPT Tue Jan 29 16:59:41 EST 2008 i686 Intel(R) Pentium(R) M processor 1.70GHz GenuineIntel GNU/Linux

Gentoo, gcc 4.2.2.

I don't use 3D.

STFW for "tossed event which came in late" and you'll get a lot of recent bugs.

I don't think my GPU is dying yet ;)

Regards,
Comment 15 Jérôme Carretero 2008-02-08 11:10:36 UTC
as I'm not doing 3D I'll file a new bug.
Comment 16 Nico R. 2009-04-04 03:57:23 UTC
Could this be related to bug 5562 (even if this bug is specifically about 3D)?
Comment 17 chemtech 2013-03-15 08:02:31 UTC
Alan Grimes,
Do you still experience this issue with newer drivers ?
Please check the status of your issue.