8397 – r430 infinite loop under radeon_do_cp_idle

Summary: r430 infinite loop under radeon_do_cp_idle

Status:	RESOLVED NOTABUG

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/other
Version:	XOrg git
Hardware:	x86 (IA32) Linux (All)

Importance:	high normal
Assignee:	Default DRI bug account

Reported:	2006-09-22 09:38 UTC by Ari Rahikkala
Modified:	2006-10-03 09:36 UTC
CC List:	1 user

Description Ari Rahikkala 2006-09-22 09:38:28 UTC

The box contains a Radeon X800 XL (PCIe), an Athlon64, and a VIA
VT8251/K8M800/whatchamacallit motherboard. It's running a recent Gentoo Linux, a
vanilla 2.6.18 kernel (aside from the DRM), X.org 7.1.1 with the corresponding
DRI, plus libdrm and DRM modules pulled from git today.

Starting Xorg (just the server, no clients) drives CPU usage to 100%, all of it
in system time according to top. The system can otherwise still be used over
ssh, but the X server can't be killed, even with kill -9. Here is some strace
output (http://www.student.oulu.fi/~arirahik/xorg_problem/strace.txt.gz, 65k
gzipped) and gdb output
(http://www.student.oulu.fi/~arirahik/xorg_problem/gdb.txt, 10k) for the
process. strace indicates that we're stuck calling ioctl 0x6444 on
/dev/dri/card0 and getting -EBUSY every time, and gdb indicates that we're
inside some XAA code that helps with cursor handling.
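strace printed the request only as a raw number. As a sketch, assuming the generic Linux `_IOC` bit layout used on x86 and the DRM header constants `DRM_IOCTL_BASE` ('d') and `DRM_COMMAND_BASE` (0x40), the number can be split into its fields. Under those assumptions 0x6444 is a no-argument DRM ioctl with driver command index 0x04, which radeon_drm.h should list as DRM_RADEON_CP_IDLE, consistent with radeon_do_cp_idle in the bug title. The helper names `decode_ioctl` and `drm_driver_command` are hypothetical.

```python
# Assumed generic Linux ioctl layout (as on x86): 8-bit number, 8-bit type,
# 14-bit argument size, 2-bit direction, packed from the low bits upward.
IOC_NRBITS, IOC_TYPEBITS, IOC_SIZEBITS = 8, 8, 14
IOC_NRSHIFT = 0
IOC_TYPESHIFT = IOC_NRSHIFT + IOC_NRBITS
IOC_SIZESHIFT = IOC_TYPESHIFT + IOC_TYPEBITS
IOC_DIRSHIFT = IOC_SIZESHIFT + IOC_SIZEBITS

# Assumed DRM constants: all DRM ioctls use type 'd'; driver-private
# commands (such as the radeon CP ones) start at number 0x40.
DRM_IOCTL_BASE = ord("d")  # 0x64
DRM_COMMAND_BASE = 0x40


def decode_ioctl(request: int) -> dict:
    """Split a raw ioctl request number into its direction/size/type/nr fields."""
    return {
        "dir": (request >> IOC_DIRSHIFT) & 0x3,
        "size": (request >> IOC_SIZESHIFT) & ((1 << IOC_SIZEBITS) - 1),
        "type": chr((request >> IOC_TYPESHIFT) & 0xFF),
        "nr": (request >> IOC_NRSHIFT) & 0xFF,
    }


def drm_driver_command(request: int):
    """Return the driver-private DRM command index, or None for other ioctls."""
    fields = decode_ioctl(request)
    if fields["type"] != chr(DRM_IOCTL_BASE) or fields["nr"] < DRM_COMMAND_BASE:
        return None
    return fields["nr"] - DRM_COMMAND_BASE


if __name__ == "__main__":
    req = 0x6444  # the request seen repeatedly in the strace output
    print(decode_ioctl(req))             # {'dir': 0, 'size': 0, 'type': 'd', 'nr': 68}
    print(hex(drm_driver_command(req)))  # 0x4
```

A direction and size of 0 mean the call carries no argument, which fits an "idle" request that simply asks the kernel whether the CP has drained.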

I defined RADEON_FIFO_DEBUG as 1 in radeon_cp.c, loaded drm.ko with debug=1, and
captured some /proc/kmsg output
(http://www.student.oulu.fi/~arirahik/xorg_problem/kmsg.txt, 35k) while starting
Xorg. It's pretty opaque to me, though, so I don't know how to proceed from here.

I've tried various settings in xorg.conf: SWCursor True, BusType PCI, BusType
PCIE, and AccelMethod EXA. Apparently the same bug manifested in every case (I
only checked that top showed 100% CPU usage in system time), though with EXA it
didn't appear until I actually opened a window, while with XAA it happens soon
after the X server has started.

Comment 1 Ari Rahikkala 2006-10-03 02:33:37 UTC

I asked around about this on IRC, and as far as I can tell the problem is that
the command processor (CP) locks up, but nobody could tell *why* that would
happen from this information. I updated to current DRM and xf86-video-ati from
git yesterday and am still seeing the same bug. Nobody on #freedesktop knew how
to investigate why the CP would lock up... is there any chance that anyone
*would* know?

Comment 2 Jerome Glisse 2006-10-03 07:42:12 UTC

It seems you are using 16-bit depth; could you try 24-bit depth? In fact, try
removing all radeon-related options from your config and report back if that
doesn't help. By the way, tracking down such lockups isn't easy, so you likely
won't find anybody who can tell you why this happens. Also try disabling
write-combining.

Comment 3 Ari Rahikkala 2006-10-03 09:11:42 UTC

Well, it seems the thing that fixed *this* bug was setting AGPMode 4X instead
of leaving it at the default, which I think was 8X. I did that because I
couldn't figure out where to disable write-combining, and it seemed like the
next "gentle-to-the-GPU" thing to try...

So, yes, at least it starts now, so I guess this bug is invalid. Now, though,
rendering anything through DRI seems to freeze the server, and exiting a
DRI-using program immediately crashes it, but I'll investigate that and file a
new bug shortly. :P

Comment 4 Ari Rahikkala 2006-10-03 09:14:39 UTC

(Sorry for the bugspam, but I forgot to explain this little discrepancy: this
isn't my system, and I only learned today that this graphics card is AGP, not
PCIe, at least in the sense that it goes in an AGP slot. lspci identifies it as
PCIe, though...)

Comment 5 Michel Dänzer 2006-10-03 09:36:17 UTC

(In reply to comment #3)
> Well, seems that the thing that fixed *this* bug was setting AGPMode 4X instead
> of leaving it at the default which I think was 8X. 

The default isn't 8x; see the radeon manpage. Check the log files for what it
was actually using.
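A minimal way to follow that advice is to pull the AGP-related lines out of the server log. This sketch assumes the log lives at /var/log/Xorg.0.log (the usual default for display :0) and deliberately makes no assumption about the exact wording the radeon driver uses; `agp_log_lines` is a hypothetical helper that keeps any line mentioning AGP.

```python
import re
from pathlib import Path

# Match "AGP" at the start of a word, case-insensitively, so both
# "[agp] ..." style messages and "AGPMode" option lines are kept.
AGP_LINE = re.compile(r"\bAGP", re.IGNORECASE)


def agp_log_lines(log_text: str) -> list[str]:
    """Return every line of an X server log that mentions AGP."""
    return [line for line in log_text.splitlines() if AGP_LINE.search(line)]


if __name__ == "__main__":
    log = Path("/var/log/Xorg.0.log")  # assumed default log location
    if log.exists():
        for line in agp_log_lines(log.read_text(errors="replace")):
            print(line)
```

Reading the matching lines by eye, rather than parsing a specific message format, avoids depending on log text that may differ between driver versions.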

> So, yes, at least it starts now, so I guess this bug is invalid. 

If you say so. :)

> Now it does seem that rendering anything through DRI freezes the server, and
> exiting a DRI-using program immediately crashes the server, but I'll be
> investigating that and post a new bug shortly. :P

If you're using the DRM from git, make sure it's up-to-date.
