3493 – r128: Entire X-windowing system hangs during normal 3D gaming

Bug 3493 - r128: Entire X-windowing system hangs during normal 3D gaming

Summary: r128: Entire X-windowing system hangs during normal 3D gaming

Status:	RESOLVED WONTFIX

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/DRI/r128 (show other bugs)
Version:	5.0.2
Hardware:	x86 (IA32) Linux (All)

Importance:	high normal
Assignee:	Default DRI bug account
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2005-06-08 08:24 UTC by Alan Grimes
Modified:	2015-11-30 23:43 UTC (History)
CC List:	4 users (show)

See Also:
i915 platform:
i915 features:

Attachments
Xorg config (2.88 KB, text/plain) 2006-04-24 02:26 UTC, Andrea Vettorello	Details
Xorg log (44.71 KB, text/plain) 2006-04-24 02:27 UTC, Andrea Vettorello	Details
Xorg.0.log.old with some "tossed events which came in late" (74.84 KB, application/x-trash) 2008-02-08 10:57 UTC, Jérôme Carretero	Details
Show Obsolete (2) View All

Description Alan Grimes 2005-06-08 08:24:57 UTC

This bug is reproducable with all OpenGL games I've tried. 

The server will hang for exactly ten seconds (or some multiple thereof) during
3D gaming. It is reproducable after a few minutes with ppracer...

Here's some log output: ( Noteworthy lines highlighted..) 


X Window System Version 6.8.2
Release Date: 9 February 2005
X Protocol Version 11, Revision 0, Release 6.8.2
Build Operating System: Linux 2.6.11.7 i686 [ELF] 
Current Operating System: Linux leenooks 2.6.12-rc5 #2 SMP Sun Jun 5 19:03:50
EST 2005 i686
Build Date: 05 June 2005
##################

(II) Screen 0 shares mem & io resources
(II) Screen 1 shares mem & io resources
(II) Entity 0 shares no resources
(II) Entity 1 shares no resources
##################################
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 1025
(EE) R128(0): Idle timed out, resetting engine...
##################################
(II) Screen 0 shares mem & io resources
(II) Screen 1 shares mem & io resources
(II) Entity 0 shares no resources
(II) Entity 1 shares no resources
(II) Screen 0 shares mem & io resources
(II) Screen 1 shares mem & io resources
(WW) Open APM failed (/dev/apm_bios) (No such file or directory)
(II) Screen 0 shares mem & io resources
(II) Screen 1 shares mem & io resources
(II) Screen 0 shares mem & io resources
(II) Screen 1 shares mem & io resources
(II) Entity 0 shares no resources
(II) Entity 1 shares no resources
(II) Mouse0: ps2EnableDataReporting: succeeded
(II) 3rd Button detected: disabling emulate3Button

Comment 1 Andrea Vettorello 2006-04-24 02:26:21 UTC

Created attachment 5431 [details]
Xorg config

Comment 2 Andrea Vettorello 2006-04-24 02:27:05 UTC

Created attachment 5432 [details]
Xorg log

Comment 3 Andrea Vettorello 2006-04-24 02:32:58 UTC

Sorry for the spam, i've filed the attachments to the wrong bug.

Comment 4 Benjamin Herrenschmidt 2006-04-24 08:24:08 UTC

There can be various causes from such hangs, though the most common are:

 - Mesa segfaulted in a SW fallback (it happens quite regulary with the r300
driver at least, with foobillard and glob2 it's 100% repro here) while the HW
lock was held by the DRI driver. Similar seems to happen if something writes to
stderr while holding the HW lock.

 - The card disliked something in the command stream and locked up.

The former can usually be diagnosed by launching the app from an ssh session or
redirecting stdout/stderr output to a file provided you manage to sync your
filesystem after the hang.

Comment 5 Peter Cordes 2007-02-28 11:06:28 UTC

I see a similar thing with the  engine  Mesa demo.  ./engine on my r128 locks out the keyboard for the server.  I have AMD64 Ubuntu Edgy, but drm (including the kernel component) and mesa compiled from git (updated today, feb 28 2007).  My kernel is Ubuntu's 2.6.20-9-generic.

 I don't think X unfreezes for me.  Maybe I can try killing engine and then maybe X will recover...

 My Xorg log file ends with:
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 49
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 49
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 51
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 50
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 41
...
(about 25 lines)

 My kernel log has:
[ 9506.298142] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock


 Other mesa demos are fine.  I don't do much 3D stuff on this vid card, since it's a PCI r128 AIW w/16MB, and the motherboard has g965 graphics (hence the dev versions of mesa, drm, and xf86-video-intel).  I run it dual-seat, with -sharevts and evdev input.  This is reproducible when no X server is running on the g965 hardware, though.

Comment 6 Slava Gorbunov 2007-03-01 08:33:21 UTC

I observe analogous behavior. The X server hangs during active 3d rendering with following message in syslog:

Mar  1 18:52:29 fpfe [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock

It appears that only X server is affected (alt-sysrq key combination work).

I use kernel 2.6.19.2, mesa-6.5.2, xf86-video-ati-6.6.3

Comment 7 Peter Cordes 2007-03-02 18:22:20 UTC

This is extremely easy to reproduce with SPECviewperf 9.0.3.  (I have it installed to test my i965 hardware in the same machine.)  Although unlike last time, I didn't get any 
(EE) R128(0): R128CCEWaitForIdle: (DEBUG) CCE idle took i = 49
messages in my Xorg.0.log.  X unfroze when I killed the viewperf process that caused the
[drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
message.

 Terminal output from a full SPECviewperf run:
peter@tesla:/usr/local/src/opengl/SPEC/SPECViewperf9.0$ ./Run_All.csh
Running: 3dsmax-04.csh
Terminated
Running: catia.csh
Writing PNG file '../results/catia/01.png'...done.
Terminated
Running: Ensight-03.sh
Terminated
Running: light-08.csh
Terminated
Running: maya-02.csh
Writing PNG file '../results/maya/01.png'...done.
Terminated
Running: proe-04.csh
Writing PNG file '../results/proe/proe01.png'...done.
Terminated
Running: suzuki-engine
Writing PNG file '../results/sw/sw01.png'...done.
Writing PNG file '../results/sw/sw01Full.png'...done.
Writing PNG file '../results/sw/sw02.png'...done.
Writing PNG file '../results/sw/sw02Full.png'...done.
Illegal instruction
Running: ugnx-01.csh
Segmentation fault
Segmentation fault
Terminated
Terminated
Terminated
Terminated
Terminated
Segmentation fault
Terminated
Terminated
Terminated
Running: tcvis-01.csh
Terminated
Run All Summary
3dsmax-04 Weighted Geometric Mean = 0.00000
catia-02 Weighted Geometric Mean = 0.00000
ensight-03 Weighted Geometric Mean = 0.00000
light-08 Weighted Geometric Mean = 0.00000
maya-02 Weighted Geometric Mean = 0.00000
proe-04 Weighted Geometric Mean = 0.00000
sw-01 Weighted Geometric Mean = 0.00000
ugnx-01 Weighted Geometric Mean = 0.00000
tcvis-01 Weighted Geometric Mean = 0.00000


kernel log:
Mar  2 21:25:46 tesla kernel: [65315.110796] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 21:28:21 tesla kernel: [65470.003625] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 21:28:32 tesla kernel: [65481.086515] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 21:28:42 tesla kernel: [65491.348416] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 21:29:07 tesla kernel: [65515.740211] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 21:29:48 tesla kernel: [65557.313944] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 21:31:07 tesla kernel: [65636.105453] viewperf[16959] trap invalid opcode rip:2b2deb5fea1b rsp:7fffc0bc8ed8 error:0
Mar  2 21:31:40 tesla kernel: [65669.276782] viewperf[16970]: segfault at 0000000007048f70 rip 0000000007048f70 rsp 00007fffa4036dc8 error 15
Mar  2 21:32:42 tesla kernel: [65731.491660] viewperf[16972]: segfault at ffffffffcd242f70 rip ffffffffcd242f70 rsp 00007fffdde3fbe8 error 14
Mar  2 21:35:38 tesla kernel: [65907.385109] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 21:44:39 tesla kernel: [66448.322471] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 21:49:10 tesla kernel: [66719.224818] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 21:51:42 tesla kernel: [66870.685200] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 21:56:50 tesla kernel: [67178.608402] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 22:01:33 tesla kernel: [67461.398885] viewperf[17042]: segfault at ffffffffe9c1df70 rip ffffffffe9c1df70 rsp 00007fffc14641e8 error 14
Mar  2 22:01:42 tesla kernel: [67470.173738] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 22:07:58 tesla kernel: [67846.066669] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 22:09:18 tesla kernel: [67926.606019] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
Mar  2 22:09:40 tesla kernel: [67948.741423] [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock


 ugnx locked up the most.  Maybe it just has a test script that invokes viewperf many times, instead of the batch mode the others use.  Every "Terminated" line in that output is from me doing killall viewperf after a new kernel log message, and lack of CPU load, indicates that it got stuck again.  I have no idea why it would run an illegal instruction, or segfault.  Maybe the SIGILL was just from a jump to an address in an executable memory region that wasn't meant to be amd64 instructions.

 BTW, I compiled SPECviewperf 9.0.3 from the source in the src directory.  ./Configure, and choose Linux64, IIRC.  Mesa, r128_dri, and libdrm are all built from git sources.

Comment 8 Yatheendra 2007-10-27 04:05:58 UTC

I hit this bug *always* on OpenGL apps (atleast glxgears and gl-117).
I have DRI enabled and used (see glxinfo later below)

H/W:
    ATI Rage 128 Pro GL PF (AGP) (ChipID = 0x5046)
        VideoRAM: 32768 kByte (64-bit SDR SGRAM 1:1)
    AMD Sempron(tm) Processor 2600+ stepping 00
    MSI-K8N VIAK8M motherboard
    512 MB RAM
S/W:
    Stock Debian 4.0 Etch
    (kernel 2.6.18-5-k7 #1 SMP, X.org X Window System Version 7.1.1)
    (my xorg.conf has NO GLcore, has "dri" and "glx", color depth 16,
     r128 driver, force AGP to 2x although 1x gives the same result)

bash $ LIBGL_DEBUG=verbose MESA_DEBUG=1 glxinfo 
name of display: :0.0
libGL: XF86DRIGetClientDriverName: 4.1.0 r128 (screen 0)
libGL: OpenDriver: trying /usr/lib/dri/r128_dri.so
drmOpenByBusid: Searching for BusID pci:0000:01:00.0
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is 4, (OK)
drmOpenByBusid: drmOpenMinor returns 4
drmOpenByBusid: drmGetBusid reports pci:0000:01:00.0
Mesa warning: couldn't open libtxc_dxtn.so, software DXTn compression/decompression unavailable
display: :0  screen: 0
direct rendering: Yes
server glx vendor string: SGI
server glx version string: 1.2
server glx extensions:
    ....
client glx vendor string: SGI
client glx version string: 1.4
client glx extensions:
    ....
....
OpenGL vendor string: VA Linux Systems, Inc.
OpenGL renderer string: Mesa DRI Rage 128 Pro 20051027 AGP 2x
OpenGL version string: 1.2 Mesa 6.5.1
....

On running OpenGL apps via 'xinit' from console, the kernel log contains:
Oct 27 15:48:17 <NAME> kernel: agpgart: Found an AGP 3.0 compliant device at 0000:00:00.0.
Oct 27 15:48:17 <NAME> kernel: agpgart: Device is in legacy mode, falling back to 2.x
Oct 27 15:48:17 <NAME> kernel: agpgart: Putting AGP V2 device at 0000:00:00.0 into 2x mode
Oct 27 15:48:17 <NAME> kernel: agpgart: Putting AGP V2 device at 0000:01:00.0 into 2x mode
Oct 27 15:48:45 <NAME> kernel: [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock

As an aside, my Xorg.log has this 1 (non-fatal) error:
    (EE) R128(0): No DFP detected

I will see if I can reproduce this issue on Mesa 7.0.1.

Comment 9 Alex Deucher 2007-10-27 07:21:33 UTC

this may be related to bug 9379

Comment 10 Slava Gorbunov 2007-11-06 10:39:55 UTC

I have found that this bug can be triggered by adding circle to the current scene in blender-2.45. X hangs, but mouse cursor moves, and system could be accessed through the network. I have noticed that there are following messages at the end of Xorg.log:

tossed event which came in late
mieqEnequeue: out-of-order valuator event; dropping.

They are repeated many times.

In the syslog there is already mentioned here message:

Nov  6 21:12:05 fpfe [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock

Software: xorg-server-1.4 xf86-video-ati-6.6.3, mesa-7.0.1, kernel 2.6.22
Hardware: ATI Technologies Inc Rage 128 Pro Ultra TF

Comment 11 Daniel Stone 2007-11-06 16:46:48 UTC

(In reply to comment #10)
> I have found that this bug can be triggered by adding circle to the current
> scene in blender-2.45. X hangs, but mouse cursor moves, and system could be
> accessed through the network. I have noticed that there are following messages
> at the end of Xorg.log:
> 
> tossed event which came in late
> mieqEnequeue: out-of-order valuator event; dropping.
> 
> They are repeated many times.

This is just indicative of the main thread either in an infinite loop, or sleeping and never waking up.

Comment 12 Daniel Stone 2007-11-06 16:59:30 UTC

Sorry, didn't mean to resolve this.

Comment 13 Alex Deucher 2007-11-07 07:41:16 UTC

this may be related to bug 9379

Comment 14 Jérôme Carretero 2008-02-08 10:57:52 UTC

Created attachment 14219 [details]
Xorg.0.log.old with some "tossed events which came in late"

I've been having this behavior since some weeks.
First appeared with Xorg-server 1.3, continues with 1.4.0.90.

You'll find relevant information in my attached Xorg logfile.
System is Linux MacGyver 2.6.24-MacGyver #4 PREEMPT Tue Jan 29 16:59:41 EST 2008 i686 Intel(R) Pentium(R) M processor 1.70GHz GenuineIntel GNU/Linux

Gentoo, gcc 4.2.2.

I don't use 3D.

STFW for "tossed event which came in late" and you'll get a lot of recent bugs.

I don't think my GPU is dying yet ;)

Regards,

Comment 15 Jérôme Carretero 2008-02-08 11:10:36 UTC

as I'm not doing 3D I'll file a new bug.

Comment 16 Nico R. 2009-04-04 03:57:23 UTC

Could this be related to bug 5562 (even if this bug is specifically about 3D)?

Comment 17 chemtech 2013-03-15 08:02:31 UTC

Alan Grimes,
Do you still experience this issue with newer drivers ?
Please check the status of your issue.

Comment 18 Ian Romanick 2015-11-30 23:43:25 UTC

The r128 driver longer exists in Mesa.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.