This bug is reproducable with all OpenGL games I've tried. The server will hang for exactly ten seconds (or some multiple thereof) during 3D gaming. It is reproducable after a few minutes with ppracer... Here's some log output: ( Noteworthy lines highlighted..) X Window System Version 6.8.2 Release Date: 9 February 2005 X Protocol Version 11, Revision 0, Release 6.8.2 Build Operating System: Linux 2.6.11.7 i686 [ELF] Current Operating System: Linux leenooks 2.6.12-rc5 #2 SMP Sun Jun 5 19:03:50 EST 2005 i686 Build Date: 05 June 2005 ################## (II) Screen 0 shares mem & io resources (II) Screen 1 shares mem & io resources (II) Entity 0 shares no resources (II) Entity 1 shares no resources ################################## (EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 1025 (EE) R128(0): Idle timed out, resetting engine... ################################## (II) Screen 0 shares mem & io resources (II) Screen 1 shares mem & io resources (II) Entity 0 shares no resources (II) Entity 1 shares no resources (II) Screen 0 shares mem & io resources (II) Screen 1 shares mem & io resources (WW) Open APM failed (/dev/apm_bios) (No such file or directory) (II) Screen 0 shares mem & io resources (II) Screen 1 shares mem & io resources (II) Screen 0 shares mem & io resources (II) Screen 1 shares mem & io resources (II) Entity 0 shares no resources (II) Entity 1 shares no resources (II) Mouse0: ps2EnableDataReporting: succeeded (II) 3rd Button detected: disabling emulate3Button
Created attachment 5431 [details] Xorg config
Created attachment 5432 [details] Xorg log
Sorry for the spam, i've filed the attachments to the wrong bug.
There can be various causes from such hangs, though the most common are: - Mesa segfaulted in a SW fallback (it happens quite regulary with the r300 driver at least, with foobillard and glob2 it's 100% repro here) while the HW lock was held by the DRI driver. Similar seems to happen if something writes to stderr while holding the HW lock. - The card disliked something in the command stream and locked up. The former can usually be diagnosed by launching the app from an ssh session or redirecting stdout/stderr output to a file provided you manage to sync your filesystem after the hang.
I see a similar thing with the engine Mesa demo. ./engine on my r128 locks out the keyboard for the server. I have AMD64 Ubuntu Edgy, but drm (including the kernel component) and mesa compiled from git (updated today, feb 28 2007). My kernel is Ubuntu's 2.6.20-9-generic. I don't think X unfreezes for me. Maybe I can try killing engine and then maybe X will recover... My Xorg log file ends with: (EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 49 (EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50 (EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50 (EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50 (EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 49 (EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50 (EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 51 (EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50 (EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50 (EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50 (EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50 (EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50 (EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50 (EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50 (EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50 (EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 41 ... (about 25 lines) My kernel log has: [ 9506.298142] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock Other mesa demos are fine. I don't do much 3D stuff on this vid card, since it's a PCI r128 AIW w/16MB, and the motherboard has g965 graphics (hence the dev versions of mesa, drm, and xf86-video-intel). I run it dual-seat, with -sharevts and evdev input. This is reproducible when no X server is running on the g965 hardware, though.
I observe analogous behavior. The X server hangs during active 3d rendering with following message in syslog: Mar 1 18:52:29 fpfe [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock It appears that only X server is affected (alt-sysrq key combination work). I use kernel 2.6.19.2, mesa-6.5.2, xf86-video-ati-6.6.3
This is extremely easy to reproduce with SPECviewperf 9.0.3. (I have it installed to test my i965 hardware in the same machine.) Although unlike last time, I didn't get any (EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 49 messages in my Xorg.0.log. X unfroze when I killed the viewperf process that caused the [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock message. Terminal output from a full SPECviewperf run: peter@tesla:/usr/local/src/opengl/SPEC/SPECViewperf9.0$ ./Run_All.csh Running: 3dsmax-04.csh Terminated Running: catia.csh Writing PNG file '../results/catia/01.png'...done. Terminated Running: Ensight-03.sh Terminated Running: light-08.csh Terminated Running: maya-02.csh Writing PNG file '../results/maya/01.png'...done. Terminated Running: proe-04.csh Writing PNG file '../results/proe/proe01.png'...done. Terminated Running: suzuki-engine Writing PNG file '../results/sw/sw01.png'...done. Writing PNG file '../results/sw/sw01Full.png'...done. Writing PNG file '../results/sw/sw02.png'...done. Writing PNG file '../results/sw/sw02Full.png'...done. Illegal instruction Running: ugnx-01.csh Segmentation fault Segmentation fault Terminated Terminated Terminated Terminated Terminated Segmentation fault Terminated Terminated Terminated Running: tcvis-01.csh Terminated Run All Summary 3dsmax-04 Weighted Geometric Mean = 0.00000 catia-02 Weighted Geometric Mean = 0.00000 ensight-03 Weighted Geometric Mean = 0.00000 light-08 Weighted Geometric Mean = 0.00000 maya-02 Weighted Geometric Mean = 0.00000 proe-04 Weighted Geometric Mean = 0.00000 sw-01 Weighted Geometric Mean = 0.00000 ugnx-01 Weighted Geometric Mean = 0.00000 tcvis-01 Weighted Geometric Mean = 0.00000 kernel log: Mar 2 21:25:46 tesla kernel: [65315.110796] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock Mar 2 21:28:21 tesla kernel: [65470.003625] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock Mar 2 21:28:32 tesla kernel: [65481.086515] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock Mar 2 21:28:42 tesla kernel: [65491.348416] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock Mar 2 21:29:07 tesla kernel: [65515.740211] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock Mar 2 21:29:48 tesla kernel: [65557.313944] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock Mar 2 21:31:07 tesla kernel: [65636.105453] viewperf[16959] trap invalid opcode rip:2b2deb5fea1b rsp:7fffc0bc8ed8 error:0 Mar 2 21:31:40 tesla kernel: [65669.276782] viewperf[16970]: segfault at 0000000007048f70 rip 0000000007048f70 rsp 00007fffa4036dc8 error 15 Mar 2 21:32:42 tesla kernel: [65731.491660] viewperf[16972]: segfault at ffffffffcd242f70 rip ffffffffcd242f70 rsp 00007fffdde3fbe8 error 14 Mar 2 21:35:38 tesla kernel: [65907.385109] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock Mar 2 21:44:39 tesla kernel: [66448.322471] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock Mar 2 21:49:10 tesla kernel: [66719.224818] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock Mar 2 21:51:42 tesla kernel: [66870.685200] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock Mar 2 21:56:50 tesla kernel: [67178.608402] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock Mar 2 22:01:33 tesla kernel: [67461.398885] viewperf[17042]: segfault at ffffffffe9c1df70 rip ffffffffe9c1df70 rsp 00007fffc14641e8 error 14 Mar 2 22:01:42 tesla kernel: [67470.173738] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock Mar 2 22:07:58 tesla kernel: [67846.066669] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock Mar 2 22:09:18 tesla kernel: [67926.606019] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock Mar 2 22:09:40 tesla kernel: [67948.741423] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock ugnx locked up the most. Maybe it just has a test script that invokes viewperf many times, instead of the batch mode the others use. Every "Terminated" line in that output is from me doing killall viewperf after a new kernel log message, and lack of CPU load, indicates that it got stuck again. I have no idea why it would run an illegal instruction, or segfault. Maybe the SIGILL was just from a jump to an address in an executable memory region that wasn't meant to be amd64 instructions. BTW, I compiled SPECviewperf 9.0.3 from the source in the src directory. ./Configure, and choose Linux64, IIRC. Mesa, r128_dri, and libdrm are all built from git sources.
I hit this bug *always* on OpenGL apps (atleast glxgears and gl-117). I have DRI enabled and used (see glxinfo later below) H/W: ATI Rage 128 Pro GL PF (AGP) (ChipID = 0x5046) VideoRAM: 32768 kByte (64-bit SDR SGRAM 1:1) AMD Sempron(tm) Processor 2600+ stepping 00 MSI-K8N VIAK8M motherboard 512 MB RAM S/W: Stock Debian 4.0 Etch (kernel 2.6.18-5-k7 #1 SMP, X.org X Window System Version 7.1.1) (my xorg.conf has NO GLcore, has "dri" and "glx", color depth 16, r128 driver, force AGP to 2x although 1x gives the same result) bash $ LIBGL_DEBUG=verbose MESA_DEBUG=1 glxinfo name of display: :0.0 libGL: XF86DRIGetClientDriverName: 4.1.0 r128 (screen 0) libGL: OpenDriver: trying /usr/lib/dri/r128_dri.so drmOpenByBusid: Searching for BusID pci:0000:01:00.0 drmOpenDevice: node name is /dev/dri/card0 drmOpenDevice: open result is 4, (OK) drmOpenByBusid: drmOpenMinor returns 4 drmOpenByBusid: drmGetBusid reports pci:0000:01:00.0 Mesa warning: couldn't open libtxc_dxtn.so, software DXTn compression/decompression unavailable display: :0 screen: 0 direct rendering: Yes server glx vendor string: SGI server glx version string: 1.2 server glx extensions: .... client glx vendor string: SGI client glx version string: 1.4 client glx extensions: .... .... OpenGL vendor string: VA Linux Systems, Inc. OpenGL renderer string: Mesa DRI Rage 128 Pro 20051027 AGP 2x OpenGL version string: 1.2 Mesa 6.5.1 .... On running OpenGL apps via 'xinit' from console, the kernel log contains: Oct 27 15:48:17 <NAME> kernel: agpgart: Found an AGP 3.0 compliant device at 0000:00:00.0. Oct 27 15:48:17 <NAME> kernel: agpgart: Device is in legacy mode, falling back to 2.x Oct 27 15:48:17 <NAME> kernel: agpgart: Putting AGP V2 device at 0000:00:00.0 into 2x mode Oct 27 15:48:17 <NAME> kernel: agpgart: Putting AGP V2 device at 0000:01:00.0 into 2x mode Oct 27 15:48:45 <NAME> kernel: [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock As an aside, my Xorg.log has this 1 (non-fatal) error: (EE) R128(0): No DFP detected I will see if I can reproduce this issue on Mesa 7.0.1.
this may be related to bug 9379
I have found that this bug can be triggered by adding circle to the current scene in blender-2.45. X hangs, but mouse cursor moves, and system could be accessed through the network. I have noticed that there are following messages at the end of Xorg.log: tossed event which came in late mieqEnequeue: out-of-order valuator event; dropping. They are repeated many times. In the syslog there is already mentioned here message: Nov 6 21:12:05 fpfe [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock Software: xorg-server-1.4 xf86-video-ati-6.6.3, mesa-7.0.1, kernel 2.6.22 Hardware: ATI Technologies Inc Rage 128 Pro Ultra TF
(In reply to comment #10) > I have found that this bug can be triggered by adding circle to the current > scene in blender-2.45. X hangs, but mouse cursor moves, and system could be > accessed through the network. I have noticed that there are following messages > at the end of Xorg.log: > > tossed event which came in late > mieqEnequeue: out-of-order valuator event; dropping. > > They are repeated many times. This is just indicative of the main thread either in an infinite loop, or sleeping and never waking up.
Sorry, didn't mean to resolve this.
Created attachment 14219 [details] Xorg.0.log.old with some "tossed events which came in late" I've been having this behavior since some weeks. First appeared with Xorg-server 1.3, continues with 1.4.0.90. You'll find relevant information in my attached Xorg logfile. System is Linux MacGyver 2.6.24-MacGyver #4 PREEMPT Tue Jan 29 16:59:41 EST 2008 i686 Intel(R) Pentium(R) M processor 1.70GHz GenuineIntel GNU/Linux Gentoo, gcc 4.2.2. I don't use 3D. STFW for "tossed event which came in late" and you'll get a lot of recent bugs. I don't think my GPU is dying yet ;) Regards,
as I'm not doing 3D I'll file a new bug.
Could this be related to bug 5562 (even if this bug is specifically about 3D)?
Alan Grimes, Do you still experience this issue with newer drivers ? Please check the status of your issue.
The r128 driver longer exists in Mesa.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.