Bug 1280

Summary:	Lock up w/ r200, TCL, GL apps
Product:	Mesa	Reporter:	jor <j2o3r>
Component:	Drivers/DRI/r200	Assignee:	Default DRI bug account <dri-devel>
Status:	RESOLVED FIXED	QA Contact:
Severity:	major
Priority:	high	CC:	jaak, michel, n0nb, npeninguy, prestonbridge, sami.nieminen
Version:	unspecified
Hardware:	x86 (IA32)
OS:	Linux (All)
URL:	http://localhost
Whiteboard:
Attachments:	A trace of the X server when it is locked up

Description jor 2004-09-02 13:59:57 UTC

I get a lockup running glxgears (or any other GL app). The screen freezes almost
immediately, but the mouse cursor still works and I can still log in from
another box. The freeze takes a bit longer when I enable lots of debug output
(I can even see the wheels spinning a couple of degrees before it locks up). When I
then try to kill the X server, the whole box locks up hard (no mouse, no
network, nothing). When I disable TCL (through driconf) it all works OK.

I usually get this error message from glxgears when it freezes:
R200WaitIrq: drmRadeonIrqWait: -16

System info:
Radeon 8500 128 MB
Xorg CVS checkout from 2 Sept 2004 (note: I have had this problem for a long time;
an older DRI CVS tree also gave me this problem. I only just now had some free
time to look into it some more)
Debian unstable
Kernel 2.6.8 (with preempt)

The result of enabling debug output from the drm radeon kernel module is that
when the X screen is frozen I get the following lines in an endless loop:
Sep  2 21:51:23 jor kernel: [drm:radeon_cp_getparam] pid=2909
Sep  2 21:51:23 jor kernel: [drm:radeon_ioctl] pid=2909, cmd=0xc0086451,
nr=0x51, dev 0xe200, auth=1
Sep  2 21:51:23 jor kernel: [drm:radeon_cp_getparam] pid=2909
Sep  2 21:51:23 jor kernel: [drm:radeon_ioctl] pid=2909, cmd=0xc0086451,
nr=0x51, dev 0xe200, auth=1
etc.

When I then try to kill the X server, the system locks up, and the last messages
it gives are (written down by hand, as I could not log these, so ignore spelling errors ;):
[drm:radeon_cp_stop]
[drm:radeon_do_cp_idle]
[drm:radeon_ioctl] pid=2858, cmd=0x040086422, nr=0x42, dev 0xe200, auth=1
These lines repeat many times, then finally:
[drm:radeon_do_cp_idle]
and the box is locked up hard.
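
The repeating getparam lines above can be counted rather than read by eye. A minimal sketch, assuming the DRM debug output has been saved to a file; the function name `count_getparam` and the log path in the usage comment are illustrative and not part of the original report:

```shell
#!/bin/sh
# Count how many lines of a saved kernel log mention radeon_cp_getparam.
# A rapidly growing count while the screen is frozen matches the endless
# loop described in this report.
# Usage (log path is an assumption): count_getparam /var/log/kern.log
count_getparam() {
    grep -c 'drm:radeon_cp_getparam' "$1"
}
```

Running it twice a few seconds apart and comparing the two numbers gives a rough idea of how fast the ioctl is being issued.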

Comment 1 Sami Nieminen 2004-09-29 10:21:39 UTC

I am experiencing the same. X hangs when running glxgears or Enemy Territory.
I am also able to log in using ssh from another computer, but it's not possible
to kill X; I have to reboot.
 
This happens for me only with 6.8.0 (also happened with the 6.7.99 snapshots), 
but 6.7.0 works great. 
 
I haven't tried disabling TCL (don't know how to do that).

Comment 2 jor 2004-10-02 06:00:53 UTC

You can disable TCL by exporting R200_NO_TCL, e.g. in a Bash shell you could do:
export R200_NO_TCL=1
glxinfo
It should show "NO-TCL" in the "OpenGL renderer string". You can then try and
run glxgears in that shell/terminal and see if it locks up with it disabled.

For a more permanent solution, you can also use the driconf method to disable
TCL for all OpenGL applications, see
http://dri.sourceforge.net/cgi-bin/moin.cgi/ConfigurationInfrastructure for more
info about that.
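
The glxinfo check described above can be scripted. A minimal sketch; the helper name `renderer_has_no_tcl` is hypothetical, and it only looks for the "NO-TCL" marker in the "OpenGL renderer string" line:

```shell
#!/bin/sh
# Read glxinfo output on stdin and succeed only if the
# "OpenGL renderer string" line contains the NO-TCL marker.
# Usage: R200_NO_TCL=1 glxinfo | renderer_has_no_tcl && echo "TCL is disabled"
renderer_has_no_tcl() {
    grep 'OpenGL renderer string' | grep -q 'NO-TCL'
}
```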

Comment 3 Sami Nieminen 2004-10-04 07:24:47 UTC

Thanks, I tried disabling TCL; unfortunately, that did not help. Enemy
Territory still hangs my machine after a few minutes.

Comment 4 Valerie R. Coffman 2004-10-24 13:02:29 UTC

I also have this problem.  It would freeze regularly when running certain
screensavers.

Comment 5 Søren Sandmann Pedersen 2005-01-27 12:58:20 UTC

Created attachment 1761
A trace of the X server when it is locked up

I took this trace of the X server from another machine after the server locked
up. When I tried killing the server, the whole system froze.

Comment 6 Michel Dänzer 2005-03-28 21:46:01 UTC

Does disabling Render acceleration work around the problem?

Comment 7 Steven Wilson 2005-04-03 21:46:12 UTC

I get what seems to be the same bug (at least, an strace shows it hammering
getparam, and I can't reproduce it with NO-TCL), but it usually doesn't happen
immediately, and I haven't yet reproduced it with glxgears (StepMania and
icculus-quake2 seem to do a fine job if left running for a bit, though those are
far from ideal testcases). I can regain limited control over the system with
magic SysRQ (unraw + kill), but only enough to cleanly reboot.

Søren, how did you get that stack trace? If I try to attach gdb to my X server
when this happens, gdb just hangs.

Also, in case this is a hw-dependent bug, lspci has this to say about the card
(it may be worth noting that it identifies as a Radeon LE / R200QL despite the
fact that the card itself is labeled as a regular 8500...):

0000:01:00.0 VGA compatible controller: ATI Technologies Inc Radeon R200 QL
[Radeon 8500 LE] (prog-if 00 [VGA])
        Subsystem: ATI Technologies Inc Radeon 8500
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping+ SERR- FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (2000ns min), Cache Line Size: 0x08 (32 bytes)
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at e0000000 (32-bit, prefetchable) [size=128M]
        Region 1: I/O ports at c000 [size=256]
        Region 2: Memory at ec020000 (32-bit, non-prefetchable) [size=64K]
        Capabilities: [58] AGP version 2.0
                Status: RQ=48 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans-
64bit- FW+ AGP3- Rate=x1,x2,x4
                Command: RQ=32 ArqSz=0 Cal=0 SBA+ AGP+ GART64- 64bit- FW- Rate=x1
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

and the AGP bridge (VIA KT400, any way to get a revision / stepping or similar?):

0000:00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host
Bridge
        Subsystem: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort+ >SERR- <PERR-
        Latency: 8
        Region 0: Memory at e8000000 (32-bit, prefetchable) [size=64M]
        Capabilities: [a0] AGP version 2.0
                Status: RQ=32 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans-
64bit- FW- AGP3- Rate=x1,x2,x4
                Command: RQ=32 ArqSz=0 Cal=0 SBA+ AGP+ GART64- 64bit- FW- Rate=x1
        Capabilities: [c0] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

Comment 8 Ian Romanick 2005-11-10 08:30:23 UTC

A lot has happened in the R200 driver since May 2005.  Are people still able to
reproduce this problem with recent drivers?

Comment 9 Nicolas Peninguy 2005-11-10 16:32:29 UTC

Almost instant lockup here when running Neverwinter Nights, using 6.9.0 RC 2, on
Ubuntu 5.10.

(II) Primary Device is: PCI 01:00:0
(II) ATI:  Candidate "Device" section "ATI Technologies, Inc. Radeon 9200 (RV280)".
(WW) RADEON: No matching Device section for instance (BusID PCI:1:0:1) found
(--) Chipset ATI Radeon 9200 5961 (AGP) found

With X.org 6.8.2, lockups happen after about 30 minutes.


Will try with TCL disabled later...

Comment 10 Nicolas Peninguy 2005-11-11 12:36:48 UTC

Seems OK with TCL disabled, though quite a bit slower (but I suppose that's normal...)

Comment 11 Steven Wilson 2005-12-14 18:10:24 UTC

I can still reproduce this as of 7.0rc3 and kernel 2.6.14.

Comment 12 Laurentiu Pancescu 2006-05-08 06:34:18 UTC

I can confirm that disabling TCL through driconf makes the problem go away
completely.

I also tried with Option "CCEusecTimeout" "20000", while having tcl_mode set to
3, and it seems to help quite a lot: I do not get hangs with gl-117 (tested for
2 hours; before, it crashed after anywhere from a few seconds to, on rarer
occasions, about 20-30 minutes) or with the GL screensavers (which
always hung X).  However, vegastrike makes X hang as soon as I start spinning
around, in a few seconds; this works fine without TCL (played for at least 5 hours).

Comment 13 Roland Scheidegger 2006-05-08 10:34:12 UTC

(In reply to comment #12)
> I can confirm that disabling TCL through driconf makes the problem go away
> completely.
> 
> I also tried with Option "CCEusecTimeout" "20000", while having tcl_mode set to
> 3, and it seems to help quite a lot:
Do you mean CPusecTimeout? CCEusecTimeout only seems to be recognized by the
r128 driver, not the radeon driver.

Comment 14 Laurentiu Pancescu 2006-05-09 04:35:02 UTC

(In reply to comment #13)
> Do you mean CPusecTimeout? CCEusecTimeout only seems to be recognized by the
> r128 driver, not the radeon driver.

Unfortunately not - I had a Rage 128 Pro before buying the Radeon 9250 (I was
hoping to get rid of similar hangs with r128, having read that the 9250 has the
best open-source driver support... :).  I just carried over the option from r128,
thinking that at least part of the code is similar.  None of the CCE options are
documented in the Ubuntu man pages for X.org, and the X.org site seems to have
the same (old?) documentation.  Sorry about this, I should have grepped for WW.
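
Grepping the X server log for warning lines, as mentioned above, might look like this. A minimal sketch: the helper name `xorg_warnings` is made up, and /var/log/Xorg.0.log is only the usual default location, which may differ per system:

```shell
#!/bin/sh
# Print only the warning lines, marked "(WW)", from an X server log.
# Options the driver does not use would be expected to show up here.
# Usage (log path is an assumption): xorg_warnings /var/log/Xorg.0.log
xorg_warnings() {
    grep -F '(WW)' "$1"
}
```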

I tried CPusecTimeout, but it still hangs in vegastrike, within a couple of
minutes.  The bad news is that I just got an X.org hang with tcl_mode 0 (no
CPusecTimeout, same config as over the weekend), in gl-117.  Maybe it happens
less often with TCL disabled, or maybe I was just lucky over the weekend, as
the CCE placebo seems to indicate. :)  gl-117 was running in a window, not full
screen, and X froze as soon as another window (a mail notification) appeared on
top of the gl-117 one.

Is there anything else I could try?

Comment 15 Nate Bargmann 2006-05-16 07:26:38 UTC

I have posted a couple of comments in bug 2999 regarding lockups I have experienced
whenever trying to run OpenGL xscreensavers on my RADEON 9200.  I was advised to
check this bug, and after disabling TCL, it seems the immediate lockups are gone.
I've changed no other settings.

lspci reports:

0000:01:00.0 VGA compatible controller: ATI Technologies Inc RV280 [Radeon 9200
PRO] (rev 01) (prog-if 00 [VGA])
        Subsystem: ATI Technologies Inc RV280 [Radeon 9200 PRO]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (2000ns min), Cache Line Size: 0x08 (32 bytes)
        Interrupt: pin A routed to IRQ 11
        Region 0: Memory at d0000000 (32-bit, prefetchable) [size=256M]
        Region 1: I/O ports at c000 [size=256]
        Region 2: Memory at e3000000 (32-bit, non-prefetchable) [size=64K]
        Expansion ROM at e2000000 [disabled] [size=128K]
        Capabilities: [58] AGP version 2.0
                Status: RQ=80 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans-
64bit- FW+ AGP3- Rate=x1,x2,x4
                Command: RQ=32 ArqSz=0 Cal=0 SBA+ AGP+ GART64- 64bit- FW- Rate=x1
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

Comment 16 Steven Wilson 2006-07-07 13:34:36 UTC

I've reproduced this with another card (Radeon 9000) on the same mainboard (Soyo
Dragon KT400 / VIA KT400 chipset). Not much to report other than that it's not
limited to the particular card/chip.

Comment 17 Nicolas Peninguy 2006-08-04 14:17:05 UTC

I've tried the latest (cvs/git) Mesa + XServer + drm + ati-driver and things
seem to have improved on the hardware TCL side: with or without R200_NO_TCL=1
it now takes some minutes to get the lockup.

Anyway, that still makes OpenGL unusable :-(

Is there any active developer able to reproduce it?

Could exporting R200_DEBUG=XXX help to debug this? With which value?

Comment 18 Nicolas Peninguy 2006-08-22 03:46:01 UTC

No more lockups with the latest (cvs/git) Mesa + XServer + drm + ati-driver.

Note that I also updated to the latest BIOS revision for my motherboard (a VIA
KT333-based board), and the old BIOS backup failed, so I cannot go back and try again...

Comment 19 Steven Wilson 2006-11-24 19:34:39 UTC

I'm now running kernel 2.6.18 (also tested with 2.6.17) Xorg 7.1.1 and v6.6.3 of
the Xorg ATI driver. The 8500LE still locks up regularly, but I can't get the
9000 to lock up anymore. The 9000 lockup I reported previously may have been an
unrelated problem...

Comment 20 Chris Sharman 2007-06-25 08:28:18 UTC

I've got this problem running neverwinter nights on ubuntu 6.10
What's driconf? Can't locate it on my system.
Any suggestions for how to workaround or further debug please??
I can recover by logging in on another console and using kill -9 on the nwmain process. lspci output below.
Using a 19" 1280*1024 screen, newer than the rest of the system - quite a lot of unsupported multiverse games don't work.

Thanks
Chris

$ lspci
00:00.0 Host bridge: Intel Corporation 82845 845 (Brookdale) Chipset Host Bridge (rev 11)
00:01.0 PCI bridge: Intel Corporation 82845 845 (Brookdale) Chipset AGP Bridge (rev 11)
00:1d.0 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 81)
00:1f.0 ISA bridge: Intel Corporation 82801DB/DBL (ICH4/ICH4-L) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801DB (ICH4) IDE Controller (rev 01)
00:1f.3 SMBus: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 01)
00:1f.5 Multimedia audio controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 01)
01:00.0 VGA compatible controller: Matrox Graphics, Inc. G400/G450 (rev 05)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
02:01.0 Communication controller: Agere Systems LT WinModem (rev 02)

Comment 21 Roland Scheidegger 2007-06-25 08:46:43 UTC

(In reply to comment #20)
> I've got this problem running neverwinter nights on ubuntu 6.10
Please be more specific. Not every lockup is the same!

> What's driconf? Can't locate it on my system.
If your distribution doesn't offer it, install it manually.

> Any suggestions for how to workaround or further debug please??
Drivers in Ubuntu 6.10 are probably quite old, and an update may help.

> I can recover by logging in on another console and using kill -9 on the nwmain
> process.
So it's not really a lockup then.

> lspci output below.
> 01:00.0 VGA compatible controller: Matrox Graphics, Inc. G400/G450 (rev 05)
And apparently it's not even a remotely similar graphics chip. Don't mix this in here.
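
To check which graphics chip a machine actually has before adding to a driver-specific bug, the VGA line can be picked out of the lspci output. A minimal sketch with a hypothetical helper name, `vga_devices`:

```shell
#!/bin/sh
# Filter lspci output on stdin down to the graphics controller line(s).
# Usage: lspci | vga_devices
vga_devices() {
    grep -F 'VGA compatible controller'
}
```

Applied to the lspci listing in comment #20, this would leave only the Matrox G400/G450 line, which is not an r200-class chip.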

Comment 22 Chris Sharman 2007-06-25 10:16:10 UTC

(In reply to comment #21)
> (In reply to comment #20)
> > I've got this problem running neverwinter nights on ubuntu 6.10
> Please be more specific. Not every lockup is the same!

Screen goes blank, X hangs. ctrl/alt/F1 gets me a console login which I can use to recover.

> > What's driconf? Can't locate it on my system.
> If your distribution doesn't offer it, install it manually.

Never thought - I expected it to be part of an existing package. Now installed, thanks.
It fails to open any windows - but ctrl/C works on this one. Traceback:
$ driconf
libGL warning: 3D driver claims to not support visual 0x4b
Traceback (most recent call last):
  File "/usr/bin/driconf", line 28, in ?
    driconf.main()
  File "/usr/lib/python2.4/site-packages/driconf.py", line 52, in main
    commonui.dpy = dri.DisplayInfo ()
  File "/usr/lib/python2.4/site-packages/dri.py", line 396, in __init__
    self.getScreen (i)
  File "/usr/lib/python2.4/site-packages/dri.py", line 411, in getScreen
    screen = ScreenInfo (i, self.dpy)
  File "/usr/lib/python2.4/site-packages/dri.py", line 380, in __init__
    self.glxInfo = GLXInfo (screen, dpy)
  File "/usr/lib/python2.4/site-packages/dri.py", line 343, in __init__
    glxInfo = infopipe.read()
KeyboardInterrupt


> > Any suggestions for how to workaround or further debug please??
> Drivers in Ubuntu 6.10 are probably quite old, and an update may help.

ubuntu update manager shows no upgrades available.
various lib*mesa* packages are at 6.5.1~20060817-0ubuntu3

> > I can recover by logging in on another console and using kill -9 on the nwmain
> > process.
> So it's not really a lockup then.

X hangs, and only switching out of X (ctrl/alt/f1) allows me to do anything.

> > lspci output below.
> > 01:00.0 VGA compatible controller: Matrox Graphics, Inc. G400/G450 (rev 05)
> And apparently it's not even a remotely similar graphics chip. Don't mix this in
> here.

OK - it seemed very similar to #9 to me - I'll open another bug if you're sure it's not a duplicate.
Thanks

Comment 23 Tormod Volden 2007-06-25 13:55:45 UTC

> ubuntu update manager shows no upgrades available.
> various lib*mesa* packages are at 6.5.1~20060817-0ubuntu3

As an old, stable release, Ubuntu 6.10 won't see many updates. Upgrading to 7.04 is recommended, and if you really want to help out, please try the 7.10 betas (Gutsy) in parallel, where you can more easily get up-to-date X components.

Comment 24 Jerome Glisse 2009-06-30 07:11:43 UTC

Do you still have this issue with recent mesa & Xorg ?

Comment 25 Nate Bargmann 2009-06-30 20:20:15 UTC

(In reply to comment #24)
> Do you still have this issue with recent mesa & Xorg ?
> 

Although I still have the same graphics card, I don't recall any GL problems since my last post in bug 2999.

Comment 26 Jerome Glisse 2009-07-07 06:37:39 UTC

OK, so I am marking it as fixed. Please reopen if you experience a similar issue with recent Mesa, kernel, and DDX.
