Bug 18097

Summary: EXA + Composite causes GLX apps to hang
Product: xorg
Reporter: Yang Zhao <yang>
Component: Driver/radeonhd
Assignee: Luc Verhaegen <lverhaegen>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: normal
Priority: medium
CC: ahabig, karel.podvolecky, sparkybluefang
Version: 7.4 (2008.09)
Hardware: Other
OS: All
Attachments (description, flags):
  - X logfile, XAA crashing in a very similar way to this EXA bug (flags: none)
  - Log with one stack trace related to this bug (flags: none)
  - screenshot of freeze error (flags: none)
  - backtrace of softlock (flags: none)
  - first attempt to fix (flags: none)
  - Partial EXA changes port from radeon (flags: none)

Description Yang Zhao 2008-10-16 20:17:49 UTC
Running glxgears under a composite-enabled DE (in my case, Gnome) causes X to hang after a second or two. Screen no longer redraws, load is increased significantly (7~8 on an otherwise idle system) but HW cursor still functions. Not reproducible with just a plain X server.

radeonhd from git master: efaebb70294055f371cd328124b23a343cea6a68

Alex Deucher and I narrowed it down to bad EXA/Composite interaction on IRC. Turning on EXANoComposite seems to work around it.

Neither dmesg nor Xorg.0.log shows anything.


I've observed this behaviour before CS branch merge, but only after the GLX app has run for significantly longer, or if I move around the window the app is in. Did not report earlier as I was unsure if it was due to my power management code.

The new CS code seems to expose the bug more consistently and quickly.
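
For anyone wanting to try the EXANoComposite workaround mentioned above, the following is a minimal sketch that renders an xorg.conf "Device" section carrying that option. The option name and the "TRUE" value are taken from this report; the identifier "Card0" and the helper name render_device_section are hypothetical, chosen only for illustration, and the output should be merged by hand into an existing configuration rather than used as-is.

```python
def render_device_section(identifier, driver, options):
    """Return the text of an xorg.conf "Device" section.

    identifier and the contents of options are supplied by the caller;
    nothing here checks that the driver actually understands them.
    """
    lines = [
        'Section "Device"',
        f'    Identifier "{identifier}"',
        f'    Driver     "{driver}"',
    ]
    for name, value in options.items():
        lines.append(f'    Option     "{name}" "{value}"')
    lines.append("EndSection")
    return "\n".join(lines)


if __name__ == "__main__":
    # "Card0" is a placeholder identifier (an assumption, not from the report).
    print(render_device_section("Card0", "radeonhd", {"EXANoComposite": "TRUE"}))
```

This only disables accelerated Composite in EXA; it is a workaround for the hang, not a fix.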
Comment 1 Yang Zhao 2008-10-17 07:00:00 UTC
Forgot to mention what the affected hardware is:

ATI Technologies Inc M52 [Mobility Radeon X1300] rev 0, Mem @ 0xd8000000/0, 0xee100000/0, I/O @ 0x00002000/0, BIOS @ 0x????????/131072

xorg-server-1.5.2


rmh3093 on IRC seems to be seeing the same issue on an M56, but this is unconfirmed.
Comment 2 Steven She 2008-10-17 08:57:33 UTC
I just wanted to confirm that this happens to me too on an M56 with RadeonHD 1.2.2 and 1.2.3.

(--) PCI:*(0@1:0:0) ATI Technologies Inc M56GL [Mobility FireGL V5200] rev 0, Mem @ 0xd0000000/0, 0xee000000/0, I/O @ 0x00002000/0, BIOS @ 0x????????/131072

When I execute glxgears with compositing enabled, X becomes unresponsive after a second or two.
Comment 3 Marc Dietrich 2008-11-10 02:23:10 UTC
Any news here?
My RS690 also hangs, though it seems that this has nothing to do with compositing:
(II) RADEONHD(0): Attaching EXA Composite hooks for R5xx.
(**) RADEONHD(0): Option "EXANoComposite" "TRUE"
(**) RADEONHD(0): Option "EXANoUploadToScreen" "TRUE"
(**) RADEONHD(0): Option "EXANoDownloadFromScreen" "TRUE"
(**) RADEONHD(0): EXA: Disabling Composite operation (RENDER acceleration)
(**) RADEONHD(0): EXA: Disabling UploadToScreen
(**) RADEONHD(0): EXA: Disabling DownloadFromScreen

Xorg still locks (some infinite loop), but I can login via network. Xorg.0.log says:
(WW) RADEONHD(0): DRMCPIdle: DRM CP IDLE returned BUSY!

I'm using drm/radeonhd from git. 

Maybe this is a different bug, but it fits the subject ;-)

What can I do to debug it further?
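
One small check that can help when comparing reports like this one: confirming from Xorg.0.log which driver options were actually picked up. The sketch below assumes the log uses the same `(**) RADEONHD(n): Option "name" "value"` line format as the excerpt quoted above; the function name configured_options is made up for illustration.

```python
import re

# Matches lines such as:
#   (**) RADEONHD(0): Option "EXANoComposite" "TRUE"
# The value part is optional because some options are written without one.
OPTION_RE = re.compile(r'\(\*\*\) RADEONHD\(\d+\): Option "([^"]+)"(?: "([^"]*)")?')


def configured_options(log_text):
    """Return {option name: value or None} for radeonhd options set in the config."""
    found = {}
    for line in log_text.splitlines():
        match = OPTION_RE.search(line)
        if match:
            found[match.group(1)] = match.group(2)
    return found


if __name__ == "__main__":
    import sys

    # The log path is passed on the command line, e.g. the usual Xorg.0.log.
    with open(sys.argv[1], errors="replace") as handle:
        for name, value in configured_options(handle.read()).items():
            print(name, value)
```

Lines that merely report the effect of an option (such as "EXA: Disabling Composite operation") are deliberately not matched; only the `Option` lines are.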
Comment 4 Egbert Eich 2008-12-13 01:32:36 UTC
*** Bug 19008 has been marked as a duplicate of this bug. ***
Comment 5 Marc Dietrich 2008-12-15 03:38:55 UTC
Does not happen anymore with v1.2.4+ here. Will try to crash it by stress testing later ;-)
Comment 6 Alec Habig 2008-12-15 14:43:25 UTC
Still happens to me (w/ 1.2.4), with the

(WW) RADEONHD(0): DRMCPIdle: DRM CP IDLE returned BUSY!

Xorg log messages.  Doesn't happen right away though, took some cube spinning and window wobbling.  Didn't have glxgears running at the time though.
Comment 7 Alec Habig 2008-12-30 09:17:14 UTC
Just wanted to update this bug.  As of commit 003325a56684649171b2c1af50aa490b1461ee16 (i.e., just before the R6xx stuff hit), running on F10's kernel 2.6.27.9-159.fc10 (which includes a bunch of commits from Dave Airlie in early December fixing drm and radeon kernel module stuff), this bug is still present, although I didn't get the DRMCPIdle log message this time (likely because I didn't start the X server with logverbose 7).

The possibly related Bug 19008 is still there too, no log messages in that one of course since the machine locks up hard after a few tens of seconds of glxgears under Windowmaker instead of gnome.
Comment 8 Alec Habig 2009-02-25 14:47:26 UTC
Created attachment 23296 [details]
X logfile, XAA crashing in a very similar way to this EXA bug

Two more data points here.

1) It happens with XAA as well as EXA.  glxgears can run, till you start dragging it around on a desktop with compositing enabled.  
2) KDE 4.2 with EXA behaves like Windowmaker did.  Simply running glxgears for 5 seconds causes a hard lockup that sometimes sticks the case speaker on, completely bricking the computer.

The attachment is from case 1, in case it sheds light on the EXA bug this bug is about, even though it's XAA.  There's a possibly interesting backtrace there, followed by:

[mi] mieqEnequeue: out-of-order valuator event; dropping.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
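
Since the soft lockups in this report leave a few recognizable messages in Xorg.0.log, a quick scan for them can help tell a soft lockup apart from the hard lockups that log nothing. This is only a sketch: the three message strings are copied from the comments above, while the function name find_hang_signatures and the command-line handling are assumptions for illustration.

```python
import sys

# Messages quoted in this bug report that accompanied soft lockups.
HANG_SIGNATURES = (
    "DRMCPIdle: DRM CP IDLE returned BUSY!",
    "EQ overflowing. The server is probably stuck in an infinite loop.",
    "mieqEnequeue: out-of-order valuator event; dropping.",
)


def find_hang_signatures(log_lines):
    """Return (line number, signature) pairs for every known message found."""
    hits = []
    for lineno, line in enumerate(log_lines, start=1):
        for signature in HANG_SIGNATURES:
            if signature in line:
                hits.append((lineno, signature))
    return hits


if __name__ == "__main__":
    # Pass the path of the X log to inspect as the first argument.
    with open(sys.argv[1], errors="replace") as handle:
        for lineno, signature in find_hang_signatures(handle):
            print(f"line {lineno}: {signature}")
```

An empty result does not rule out this bug: several reporters above saw hangs with nothing logged at all.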
Comment 9 Alec Habig 2009-03-10 09:23:07 UTC
For what it's worth, I tried the plain old radeon (not radeonhd) as well, which has experimental r5xx 3D support.

Same bug with that driver.  Which might help isolate the bug, as it's in code common to the two drivers.
Comment 10 Alex Deucher 2009-03-10 09:31:59 UTC
Would it be possible to try the drm-next branch of Dave's drm-2.6 kernel tree?

http://git.kernel.org/?p=linux/kernel/git/airlied/drm-2.6.git;a=summary

I've also pulled some of the fixes into the r6xx-r7xx-support branch in the fdo drm tree.  So you could try that if it's easier.
http://cgit.freedesktop.org/mesa/drm/?h=r6xx-r7xx-support

this may be a dup of bugs 20348, 16198.
Comment 11 Yang Zhao 2009-03-10 17:53:02 UTC
I've just finished trying the latest r6xx-r7xx-support of drm and drm-next of drm-2.6. No luck with either.
Comment 12 Marc Dietrich 2009-03-12 10:26:53 UTC
The promised stress test from comment #5 (playing chromium) crashed as before; somehow glxgears wasn't enough. Today I checked out drm-master (and radeonhd-master) and tried again. No crashes so far.
Yang: Can you please also check with drm-master? I think the r6xx-r7xx branch may not have all updates.
Comment 13 Yang Zhao 2009-03-12 10:34:18 UTC
(In reply to comment #12)
> The promised stress test in comment #5 (playing chromium) crashed as before.
> Somehow glxgears wasn't enough. Today, I checkout drm-master (and
> radeonhd-master) and tried again. No crashes since now. 
> Yang: Can you please also check with drm-master? I think the r6xx-7xx branch
> may not have all updates. 

drm master has no r6xx-r7xx support AFAICS. The log doesn't show any merging that's occurred.
Comment 14 Marc Dietrich 2009-03-12 10:39:28 UTC
The r6xx branch shouldn't be required as the X1300 you mentioned in comment 1 is a r5xx chip. Are you still using it?
Comment 15 Yang Zhao 2009-03-12 10:44:32 UTC
Oh, oops. Neglected to context switch my brain to r5xx when I ran those tests.

I'll run the tests again later today, with the proper branch.
Comment 16 Yang Zhao 2009-03-12 16:57:51 UTC
Persists with drm-master and latest radeonhd.
Comment 17 Alec Habig 2009-05-14 10:01:08 UTC
Current git dd287015 (running on an F10 system, so perhaps not the bleeding edge drm) gives me a different error message when it locks up:

[mi] EQ overflowing. The server is probably stuck in an infinite loop.
[mi] mieqEnequeue: out-of-order valuator event; dropping.
Comment 18 Karel Podvolecky 2009-05-18 00:56:59 UTC
Created attachment 25954 [details]
Log with one stack trace related to this bug

Attached log file with one stack trace.
I started GDM and logged in, then ran glxgears. One frame was drawn, then everything graphical except the mouse cursor froze. Linux itself was still working (HDD activity, ping responses, mouse moving at ~1 fps). This log was taken after 2 or 4 minutes of this freeze.
Comment 19 Karel Podvolecky 2009-05-18 01:03:38 UTC
(In reply to comment #18)
> Created an attachment (id=25954) [details]
> Log with one stack trace related to this bug
> 
> Attached log file with one stack trace.
> I started GDM, logged in. Then I run glxgears. One frame has been drawn. Then
> everything graphical except mouse cursor freezed. Linux was working (hdd
> activity (updatedb), ping responses, mouse moving at ~1 fps). This log is after 2 or 4
> minutes of this freeze.
> 

I forgot to mention my system:
Gentoo up-to-date, 32-bit, Mobility X1300 64MB, current git master; for versions of other software see the log file.
Comment 20 Karel Podvolecky 2009-05-26 10:31:50 UTC
Created attachment 26224 [details]
screenshot of freeze error

Hello there,
  I tried the latest git commit 08acc05a4c252ff5c55ed1c8f55f106aa6c68546 with EXA acceleration.
  First run of Google Earth:
  - only the mouse cursor was moving (same cursor shape, not responding to buttons)
  - nothing else was responding
  - I was able to ssh into the machine. Google Earth was using 100% CPU and the system was responsive.
  - I killed Google Earth and everything was fine (even the GUI was working like a charm).

  Then I ran Google Earth a second time. Almost the same scenario:
  - mouse moves
  - ssh worked, but Google Earth was using only a few percent CPU
  - system was barely responsive
  - and within a few seconds a hard freeze; only a hard power-off worked

The attached picture shows the message from the second run. I hope it points you to a better place to look.

Cheers
kapo
Comment 21 Alec Habig 2009-06-11 17:17:13 UTC
Still here after upgrading to Fedora 11, with its updated X and kernel, with the current radeonhd git.
Comment 22 Pierpaolo Follia 2009-07-06 07:58:29 UTC
I'm using the latest mesa 7.6 (radeon rewrite) with a 2.6.30 kernel: for a couple of weeks now the problem seems to be totally gone.
Comment 23 Matthias Hopf 2009-07-06 09:08:23 UTC
(In reply to comment #22)
> I'm using the latest mesa 7.6 (radeon rewrite) with a 2.6.30 kernel: from a
> couple of week the problem seems to be totally gone.

That's really good news.
You are running radeonhd (not radeon)? Have you tried both XAA and EXA?

Any ideas whether this is related to 2.6.30 DRM update, or to the radeon rewrite branch in mesa?

Thanks

Comment 24 Pierpaolo Follia 2009-07-07 00:51:27 UTC
(In reply to comment #23)
> That's really good news.
> You are running radeonhd (not radeon)? Have you tried both XAA and EXA?

Running radeon driver (not hd) with EXA mode active. Before my upgrade, there was no way to run a gl screensaver or glxgears.

> 
> Any ideas whether this is related to 2.6.30 DRM update, or to the radeon
> rewrite branch in mesa?

Well, I upgraded all of this together (everything I listed), and after the first upgrade I still had some hard locks (though not as frequently as before). Now, as I wrote, the system has been perfect for a couple of weeks (running compiz, GL screen savers, and every kind of accelerated app). Sorry for not being more careful (I can't tell you exactly when this bug went away for me), but I'm upgrading my system (Ubuntu) using tormold's PPA, and I only occasionally tried GL apps to see whether something had changed.
To be sure whether it was DRM or Mesa, I can downgrade the kernel and tell you the result.


Comment 25 Jerome Glisse 2009-07-07 06:46:54 UTC
Yang, can you confirm that the latest kernel and mesa fix the issue for you too?
Comment 26 Matthias Hopf 2009-07-07 07:13:23 UTC
(In reply to comment #24)
> (In reply to comment #23)
> > That's really good news.
> > You are running radeonhd (not radeon)? Have you tried both XAA and EXA?
> 
> Running radeon driver (not hd) with EXA mode active. Before my upgrade, there
> was no way to run a gl screensaver or glxgears.

OK. Just one question: could you try with the latest radeonhd as well (no other changes)? I have no idea whether radeonhd currently works with the rewrite branch, even though it should theoretically be compatible.

> To be sure about the DRM or Mesa, I can downgrade the kernel and tell you the
> result. 

IMHO that's not worth the pain.
Comment 27 Yang Zhao 2009-07-07 14:25:26 UTC
Still freezes with latest libdrm, mesa, and radeonhd from git, plus 2.6.31-rc2 kernel, but different symptom.

No longer freezes after a certain time, but only when the window with glxgears is moved around. Cursor position input remains responsive for some time, and the cursor even changes its image as it's moved over window decorations. No other inputs are responsive.

EXANoComposite still functions as workaround. No messages in Xorg.0.log nor dmesg.
Comment 28 Pierpaolo Follia 2009-07-08 01:00:49 UTC
(In reply to comment #26)
> Ok. Just one question: could you try with latest radeonhd as well (no change
> otherwise)? I have no idea whether radeonhd currently works with the rewrite
> branch, even it should theoretically be compatible.

Radeonhd driver seems better than the radeon one: with radeonhd bug 20470 is gone for me too :-)
Anyway, I'm trying it now and seems to work well. But I can't be sure...I have to use it for a while. I'll let you know.
Comment 29 Matthew Turnbull 2009-07-08 08:06:45 UTC
Using kernel 2.6.30, libdrm 2.4.11 (and git head), mesa git head, xorg-server 1.6.1.902, and radeonhd git head on a Mobility x1400

Running glxgears causes my screen to go completely black except for the cursor. If I'm also using Composite then the keyboard/mouse also becomes unresponsive. Though now I'm not seeing any of the DRM CP or IRQ errors in the log.
Comment 30 Pierpaolo Follia 2009-07-08 23:39:40 UTC
(In reply to comment #28)
> Radeonhd driver seems better than the radeon one: with radeonhd bug 20470 is
> gone for me too :-)
> Anyway, I'm trying it now and seems to work well. But I can't be sure...I have
> to use it for a while. I'll let you know.

Update: radeonhd driver still causes X to hang after a second or two when running gl apps (like glxgears). This happens only using EXA for me. Using XAA the system seems stable.
With radeon driver, both XAA and EXA work well and I have no hangs.
I'm using an ATI Radeon x1400 and the only difference between my system configuration and the one on comment #29 from Matthew Turnbull is that my xorg-server is version 1.6.2+git20090707 (server-1.6-branch)

Comment 31 Matthias Hopf 2009-07-09 08:40:54 UTC
(In reply to comment #27)
> No longer freezes after a certain time, but only when window with glxgears is
> moved around. Cursor position input remains responsive for sometime and even
> reacts with changing image as it's moved around window decorations. No other
> inputs are responsive.

Sounds like there is still a 2d/3d locking issue left in the EXA code - I think there are other bug reports indicating the same. Thanks for testing!

(In reply to comment #30)
> Update: radeonhd driver still causes X to hang after a second or two when
> running gl apps (like glxgears). This happens only using EXA for me. Using XAA
> the system seems stable.
> With radeon driver, both XAA and EXA work well and I have no hangs.

Also thanks for testing!
Comment 32 Marc Dietrich 2009-08-28 06:25:23 UTC
OK, retested with the latest drm/mesa/radeonhd/kernel/xorg 1.6.3. glxgears still softlocks on RS690. I was able to get a backtrace from a remote machine and will attach it later. It showed that the hang had something to do with ExaDownloadFromScreen, so I disabled it. Now glxgears and chromium run fine.
Comment 33 Marc Dietrich 2009-08-28 06:54:03 UTC
Created attachment 28983 [details]
backtrace of softlock

It seems the softlock is not very reproducible. As reported before, glxgears does not always trigger the bug. Really strange.
Comment 34 Marc Dietrich 2009-08-28 12:15:15 UTC
Created attachment 28985 [details] [review]
first attempt to fix

This patch seems to fix it here (at least I can play chromium, and glxgears does not crash). Please try it out!
Comment 35 Matthias Hopf 2009-09-08 06:33:57 UTC
Good catch!

R5xx2DIdle(pScrn) emits commands that have to be flushed() to be actually processed...

Submitted.
Comment 36 Matthias Hopf 2009-09-08 08:07:37 UTC
According to Luc this is actually wrong - but as it fixes your symptoms, the code stays this way until somebody comes up with a real solution.
Comment 37 Alec Habig 2009-09-08 08:22:44 UTC
This lets glxgears run for about twice as long as it used to (say, 10 seconds instead of 5), and eliminates the class of soft lockups which actually spit something to Xorg.0.log.  So, progress!

But whichever problem I started with in bug 19008 is still there.  The laptop (Thinkpad T60p with FireGL 5200) locks up so hard that nothing is logged at all and I can't even ssh back in.  The r5xx is supposed to have worked for a long time, but not this particular one (although it is a rather common setup) :(

Plus, I picked up a new regression where my PANEL display gets blacked out in dualhead mode, possibly due to the recent power management features?  It does say

  (II) RADEONHD(0): LVDSSetBacklight: trying to set BL_MOD_LEVEL to: 0

although xrandr claims no displays have backlight settings.  But I guess that's a separate bug.

Anyway, could the hard lock be a consequence of the "not quite right" handling of the flush which Luc described?
Comment 38 Yang Zhao 2009-09-27 19:13:06 UTC
Created attachment 29913 [details] [review]
Partial EXA changes port from radeon

Give this patch a try.

It doesn't prevent freezes completely, but seems to be a step in the right direction.
Comment 39 Yang Zhao 2009-09-28 20:30:21 UTC
This seems to definitely be fixed in drm, and even old versions of radeon work fine.

Seems to me there's something fundamentally wrong in the way r5xx DRM is used in radeonhd.  Since the code has diverged so much, it may be easier to do a re-port of the r5xx accel code instead of trying to find the bug.
Comment 40 Yang Zhao 2009-11-06 13:02:17 UTC
Fixed as of 40fa33f05df863ed98cb43539f237cb90a3a5e38

Unless someone is still experiencing the failure, I'll close this in a couple days.
