Bugzilla – Bug 9252
complete lockups with radeon 9600 XT
Last modified: 2014-02-08 15:41:14 UTC
I'm using Debian sid, currently with Xorg 7.1, ati 6.6.3, kernel 2.6.18, libdrm
2.0.2 and mesa 6.5.1. DRI is enabled, and I'm using compiz on a Radeon 9600 XT. The big
problem is random lockups: complete lockups where not even ping
works, with nothing in the logs after reboot. It looks like the PCI bus is hosed,
but I'm not an expert.
Here is an excerpt of my config file:
Identifier "ATI Radeon 9600XT"
Option "AccelMethod" "xaa"
Option "RenderAccel" "true"
Option "EnablePageFlip" "true"
Option "ColorTiling" "true"
Option "AccelDFS" "true"
Option "GARTSize" "64"
Option "DDCMode" "true"
Could you disable page flip and render accel, attach your Xorg log,
and try enabling debug output of the radeon & drm modules? (You should
mount your log partition with the sync option to have something in the log.)
How do I enable debug output? Something like "drm debug=1" in /etc/modules?
radeon debug=1 would be more helpful (my fault). No need to restart:
you could stop the X server, rmmod radeon and modprobe radeon debug=1.
Don't forget to mount the partition where your log files are with sync.
Did you try disabling page flip and/or color tiling?
Thanks, I'll do that.
I think I already tried various combinations of various configs (BTW, AGP x8
with writecache locks up right at start), but I'll try that specifically when I
get back home.
Guys, no need to even reload the kernel module to enable debugging output...
That's what /sys/module/drm/parameters/debug is for.
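For anyone following along, the runtime toggle can be sketched as a pair of shell helpers. This is only a minimal sketch: the helper names `set_drm_debug` and `get_drm_debug` and the `DRM_DEBUG_PARAM` override are made up for illustration, and only the sysfs path itself comes from this thread. Writing to it needs root.

```shell
#!/bin/sh
# Sketch: flip DRM debug output at runtime through sysfs, no module reload.
# DRM_DEBUG_PARAM is a hypothetical override (handy for trying this on a
# scratch file); the default is the sysfs path quoted in this thread.
DRM_DEBUG_PARAM="${DRM_DEBUG_PARAM:-/sys/module/drm/parameters/debug}"

# Write a debug level (e.g. 1 to enable, 0 to disable) to the parameter file.
set_drm_debug() {
    if [ ! -w "$DRM_DEBUG_PARAM" ]; then
        echo "cannot write $DRM_DEBUG_PARAM (drm not loaded, or not root?)" >&2
        return 1
    fi
    printf '%s\n' "$1" > "$DRM_DEBUG_PARAM"
}

# Print the current debug level.
get_drm_debug() {
    cat "$DRM_DEBUG_PARAM"
}
```

As root, one would call `set_drm_debug 1` before reproducing the lockup and `set_drm_debug 0` afterwards; the extra messages end up in the kernel log.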
Here's a lockup log (syslog + X log). I couldn't find a debug parameter for
radeon, I used the drm debug param.
Created attachment 7967 [details]
syslog, 71M unzipped, with drm debug=1
Created attachment 7968 [details]
Just FYI, the lockups are still present with debian experimental's xorg 1.2
*** Bug 8833 has been marked as a duplicate of this bug. ***
Did you remount your /var partition with sync? I don't see
any lockup situation in the log (I might be wrong, but I think
once a lockup happens we shouldn't see any more cmd buffer submits).
IIRC the command to remount with sync should look like this:
mount -o remount,sync <varpartitionmountpoint>
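If it is not obvious which mount point holds the logs, something along these lines may help. It is a rough sketch under assumptions: the function names and the `RUN` switch are hypothetical, the mount point is taken from the last column of `df -P` (which breaks on mount points containing spaces), and by default it only prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: find the filesystem that holds a path and remount it with sync.

# Print the mount point of the filesystem holding the given path.
mountpoint_of() {
    df -P "$1" | awk 'NR==2 {print $6}'
}

# Dry run by default: print the remount command.
# Set RUN=1 to actually execute it (needs root).
remount_sync() {
    mp=$(mountpoint_of "$1")
    if [ -z "$mp" ]; then
        echo "could not determine mount point for $1" >&2
        return 1
    fi
    if [ "${RUN:-0}" = "1" ]; then
        mount -o remount,sync "$mp"
    else
        echo "mount -o remount,sync $mp"
    fi
}
```

For example, `remount_sync /var/log` prints the command for whatever filesystem holds /var/log, and `RUN=1 remount_sync /var/log` runs it.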
Yes, I did. But when my machine freezes, it looks like a PCI bus freeze, so everything is stuck, and I don't think it's still able to write something to the logfiles (even if mounted -o sync).
I think that I have a similar problem; actually, I get a couple of different
lock up types, but this is one of them.
I also haven't been able to get anything to disk yet, but I'm going to try with a
kernel serial console that should (hopefully) get all the DRM messages.
I'm also going to try with the R300 ring buffer debug patch, originally posted
I modified it slightly so it applies cleanly to Git master.
Just some random ideas that may or may not help you.
Created attachment 8875 [details] [review]
Patch, probably not correct!
I don't think it is correct, but it helps me here with my X700 to avoid some lockups with beryl.
Could you explain some of your reasoning for that patch? I'm just wondering if I should test this, but I'm not sure why you would disable that lock?
(In reply to comment #15)
> Could you explain some of your reasoning for that patch? I'm just wondering if
> I should test this, but I'm not sure why you would disable that lock?
Ignore the patch. With it, the lockups on my desktop with beryl came a bit later, which is why I posted it, just to see whether others' problems were affected too.
By the way I think that there is something wrong with r300_scratch in r300_cmdbuf.c in the drm module. I created a patch, for which I am not sure if it is correct, although I think it should be. I think r300_scratch was totally broken.
Created attachment 8889 [details] [review]
r300_scratch is broken.
Created attachment 8893 [details] [review]
r300_scratch is broken.
So the R300 scratch patch is related to the lockups, or another problem?
(In reply to comment #19)
> So the R300 scratch patch is related to the lockups, or another problem?
I think it fixes my lockups with beryl, (up to now of course).
Why don't you try it and post your comments?
I still get lockups with that patch, but if it solves your problem then commit it of course.
Created attachment 8909 [details] [review]
If we can't idle unlock. Lock when we should...
Well I think that this patch is correct and should probably help.
Please try it.
A comment on the patch:
You only need to retake the lock if you previously released it.
I would personally prefer it if you moved LOCK_HARDWARE() to after UNLOCK_HARDWARE() and DO_SLEEP().
Though on a functional level it shouldn't make a difference.
Created attachment 8910 [details] [review]
Updated patch to Rune's comment.
Thanks Rune. You are correct! Updated patch.
(In reply to comment #24)
> Created an attachment (id=8910) [details]
> Updated patch to Rune's comment.
> Thanks Rune. You are correct! Updated patch.
Unfortunately there again seems to be a problem. I get the same
lockups (trying to rotate the cube, it locks up when I stop rotating) if I restart beryl! What could be happening?
Created attachment 8913 [details] [review]
DO_USLEEP if it is supported
Still locks up the second time I run beryl.
(In reply to comment #26)
> Created an attachment (id=8913) [details]
> DO_USLEEP if it is supported
What exactly is the problem addressed by this patch? While it may make sense to drop the lock while sleeping (but it's not obvious why not doing so could cause lockups, except by changing the timing maybe), I don't think making the sleep conditional on the do_usleeps configuration makes sense because it's not directly related to this situation. Its purpose is to allow sleeps when the CPU gets too far ahead of the GPU, but here it would instead greatly change the time out waiting for the GPU to go idle.
> Still locks up the second time I run beryl.
So apparently, that's not related to your patches at all? What are the symptoms? E.g., is the X server still running? If not, any hints in its log file or stderr output? ...
Could you explain the r300_scratch patch? How was it broken?
(In reply to comment #28)
> Could you explain the r300_scratch patch? How was it broken?
According to Aapo Tahkola, it is not broken. See http://archive.netbsd.se/?ml=dri-devel&a=2007-02&t=3223874.
Although I think this code could be cleaner.
What is the u in the drm_r300_cmd_header_t union?
(In reply to comment #27)
> So apparently, that's not related to your patches at all? What are the
> symptoms? E.g., is the X server still running? If not, any hints in its log
> file or stderr output? ...
I think not. The patch just changed the behaviour of the lockup. The lockup
can be reproduced by running beryl (I use the latest SVN version) with AIGLX and rotating the cube. When you let go of the cube, the system locks up. No hints in the logs. With the patch I sent, the first time I ran beryl, rotated the cube and let go, the cube seemed to vanish and reappear while it was returning to its initial position, at the moment when it would have locked up without the patch. If I then rotated the cube again, everything worked just fine, with no vanishing and reappearing. If I restarted beryl and did
the same things, my system would lock up.
I hope you understand what I wrote. Sorry for my English.
I seem to experience the same bug here:
Arch Linux, kernel 2.6.21, xorg 1.3.0, mesa 6.5.3, xf86-video-ati-6.6.192
As soon as I start compiz, the system freezes irrecoverably; neither ping nor ACPI shutdown works. Nothing shows up in the logs after reboot.
This is systematic, and it has been like that for a few months.
Before that (with older versions of the kernel, xorg, ati driver, everything) I was able to use compiz at decent speed.
Other OpenGL applications don't (apparently) hit the bug. Specifically, I use Google Earth without problems.
Trying to recover from the attempted hijacks and get back to the original problem... Xavier, if you can still reproduce this, does Option "BusType" "PCI" help?
closing due to lack of feedback
Sorry, I continued discussion on debian #515326, at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=515326 ...
In a nutshell:
- "BusType" "PCI" gives me a "white screen of death" and that log:
(**) RADEON(0): Forced into PCI mode
(EE) RADEON(0): [pci] Out of memory (-12)
(EE) RADEON(0): [pci] PCI failed to initialize. Disabling the DRI.
(II) RADEON(0): [drm] removed 1 reserved context for kernel
(II) RADEON(0): [drm] unmapping 8192 bytes of SAREA 0xf82c6000 at 0xb79fc000
(II) RADEON(0): [drm] Closed DRM master.
(WW) RADEON(0): Direct rendering disabled
- I can ssh into my machine sometimes (with 6.6 I couldn't), and attaching to Xorg and taking a backtrace gives this:
#0 0xb7f22424 in __kernel_vsyscall ()
#1 0xb7bb0b29 in ioctl () from /lib/i686/cmov/libc.so.6
#2 0xb79c0bed in drmDMA (fd=10, request=0xbff3dcfc) at ../../libdrm/xf86drm.c:1266
#3 0xb79474c7 in RADEONCPGetBuffer (pScrn=0x9a83f88) at ../../src/radeon_accel.c:594
#4 0xb7999823 in RADEONPrepareSolidCP (pPix=0x9db03d0, alu=3, pm=4294967295, fg=0) at ../../src/radeon_exa_funcs.c:92
#5 0xb777d44a in exaFillRegionSolid (pDrawable=0x9db03d0, pRegion=0x9db2448, pixel=0, planemask=4294967295, alu=<value optimized out>) at ../../exa/exa_accel.c:1072
#6 0xb777edf2 in exaPolyFillRect (pDrawable=0x9db03d0, pGC=0x9d377d0, nrect=1, prect=0x9d4b51c) at ../../exa/exa_accel.c:751
#7 0x0817aad4 in damagePolyFillRect (pDrawable=0x9db03d0, pGC=0x9d377d0, nRects=1, pRects=0x9d4b51c) at ../../../miext/damage/damage.c:1404
#8 0x08089490 in ProcPolyFillRectangle (client=0x9d4b328) at ../../dix/dispatch.c:1769
#9 0x0808c51f in Dispatch () at ../../dix/dispatch.c:437
#10 0x080716f5 in main (argc=9, argv=0xbff3e064, envp=Cannot access memory at address 0xc0286431) at ../../dix/main.c:397
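In case it helps others get the same kind of backtrace over ssh, here is a minimal sketch of the attach step. The function name and the `XORG_PID` and `RUN` variables are hypothetical; it assumes gdb and pidof are installed and that the server binary is called Xorg. By default it only prints the gdb command; attaching for real usually needs root.

```shell
#!/bin/sh
# Sketch: grab a backtrace from a running (possibly hung) X server over ssh.
# XORG_PID is a hypothetical override; otherwise the pid is looked up with
# pidof (-s returns a single pid).
xorg_backtrace() {
    pid="${XORG_PID:-$(pidof -s Xorg)}"
    if [ -z "$pid" ]; then
        echo "no Xorg process found" >&2
        return 1
    fi
    # -batch: run the given commands and exit; -ex bt: print the backtrace;
    # -p: attach to the given pid.
    cmd="gdb -batch -ex bt -p $pid"
    if [ "${RUN:-0}" = "1" ]; then
        $cmd
    else
        echo "$cmd"
    fi
}
```

Running `RUN=1 xorg_backtrace > xorg-bt.txt` as root would save the backtrace to a file that can be attached here. The frames are only as readable as the debugging symbols installed for the X server and driver.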
Ah, and FWIW, the deadlocks disappeared with 6.10.0 and reappeared around 6.10.99 IIRC.
(In reply to comment #34)
> Ah, and FWIW, the deadlocks disappeared with 6.10.0 and reappeared around
> 6.10.99 IIRC.
Any chance you could bisect between those releases and find the bad commit?
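For reference, starting such a bisect could look roughly like the sketch below. The function name and the `RUN` switch are hypothetical, and the refs passed in are whatever tags or commits correspond to the good and bad releases in your checkout of the driver (check `git tag`; no tag names are assumed here). By default it only prints the command.

```shell
#!/bin/sh
# Sketch: start a bisect between a known-good and a known-bad driver release.
# Run it inside a git checkout of the driver sources.
bisect_driver() {
    good="$1"
    bad="$2"
    if [ -z "$good" ] || [ -z "$bad" ]; then
        echo "usage: bisect_driver <good-ref> <bad-ref>" >&2
        return 2
    fi
    # `git bisect start <bad> <good>` marks both endpoints in one step.
    cmd="git bisect start $bad $good"
    if [ "${RUN:-0}" = "1" ]; then
        $cmd
    else
        echo "$cmd"
    fi
}
```

After each step git checks out a candidate commit: build and install the driver, restart X, try to reproduce the lockup, then run `git bisect good` or `git bisect bad`. `git bisect reset` returns to the original checkout when done.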
I'd very much like to do that (and more), but I really don't have time for this right now. Sorry for this.
Nowadays the behavior is different: I have a segfault during startup (I think during compiz startup). I tried removing xorg.conf, same result.
Created attachment 25640 [details]
Xorg.log (without xorg.conf)
(In reply to comment #37)
> Nowadays the behavior is different: I have a segfault during startup (I think
> during compiz startup). I tried removing xorg.conf, same result.
Yes, now it is the 3D driver crashing. Could you disable compiz for now and try reproducing it with some simpler 3D app?
It would be great if you could provide us with a backtrace with mesa debugging symbols.
This may be fixed in mesa Git master and mesa_7__branch already.
Anyway, it's a separate issue so it should have been a new report and this one only reopened if the original problem reported here still happens.
Closing due to inactivity and that the reported issue seems to have changed.