Bug 24587

Summary: HD3470 mobility hangs (more or less randomly) with KMS
Product: xorg
Reporter: Stefano Carignano <scary.moo>
Component: Driver/Radeon
Assignee: xf86-video-ati maintainers <xorg-driver-ati>
Status: RESOLVED DUPLICATE
QA Contact: Xorg Project Team <xorg-team>
Severity: major
Priority: medium
CC: joakim, reinouts
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Attachments:
Description    Flags
dmesg          none
Xorg.0.log     none

Description Stefano Carignano 2009-10-17 03:24:49 UTC
Created attachment 30498
dmesg

Latest (2009-10-16) drm-next over 2.6.31.y, latest xf86-video-ati, mesa and libdrm, all from git.

When KMS is enabled, X (and the whole laptop with it) sometimes hangs. Sometimes the screen suddenly goes black; sometimes the image on screen remains and I can still move the mouse, but the keyboard, clicking, scrolling and everything else are unresponsive (and X takes 100% CPU). Sometimes I can still ssh in or use SysRq, sometimes neither works. When it hangs and I can ssh to the laptop, the logs usually show no particular error.

The hang usually happens a few (10-15) seconds after I've started X, and it usually follows the startup of a GTK app (I'm using Xfce, with no desktop compositing/compiz stuff whatsoever).

The hang happens 100% of the time if I'm using a preemptive kernel. If I remove the preemptive option, sometimes it... survives :) As far as I can see, it either happens immediately or doesn't happen at all.
Building drm as a module /seems/ to improve things, but I cannot say for certain; it's just an impression.
Loading drm with the debug flag via modprobe /seems/ to improve things even more. I'm not sure of that either.
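
Since the hang correlates with kernel preemption, it may help to confirm how the running kernel was built before comparing results. The following is only a minimal sketch: it assumes the kernel puts the string PREEMPT into its version banner (the output of uname -v) when built with preemption, and the helper name kernel_is_preempt is made up for illustration.

```shell
# Hypothetical helper: prints "yes" if a kernel version banner mentions PREEMPT.
# Assumption: preemptive kernels advertise PREEMPT in the `uname -v` string.
kernel_is_preempt() {
    case "$1" in
        *PREEMPT*) echo yes ;;
        *) echo no ;;
    esac
}

# Check the running kernel.
kernel_is_preempt "$(uname -v)"
```

The function takes the banner as an argument, so it can also be run against a banner copied out of an attached dmesg from another machine.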
Comment 1 Stefano Carignano 2009-10-17 03:25:28 UTC
Created attachment 30499
Xorg.0.log
Comment 2 Alex Deucher 2009-10-17 17:13:21 UTC
Does your kernel have ASPM (Active State Power Management) enabled? If so, try booting with pcie_aspm=off on the kernel command line.
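
One way to answer the ASPM question from a running system is sketched below. This is an illustration, not an official procedure: the helper name aspm_disabled_on_cmdline is hypothetical, and the sysfs policy file is assumed to be present only on kernels built with PCIe ASPM support.

```shell
# Hypothetical helper: prints "yes" if a kernel command line contains pcie_aspm=off.
# Pass the command line (e.g. the contents of /proc/cmdline) as the first argument.
aspm_disabled_on_cmdline() {
    case " $1 " in
        *" pcie_aspm=off "*) echo yes ;;
        *) echo no ;;
    esac
}

# Was the running kernel booted with pcie_aspm=off?
if [ -r /proc/cmdline ]; then
    aspm_disabled_on_cmdline "$(cat /proc/cmdline)"
fi

# Show the active ASPM policy, if the kernel exposes it
# (assumption: the file is absent when ASPM support is not compiled in).
if [ -r /sys/module/pcie_aspm/parameters/policy ]; then
    cat /sys/module/pcie_aspm/parameters/policy
fi
```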
Comment 3 Stefano Carignano 2009-10-19 00:10:01 UTC
Nope, it has always been disabled.

Comment 4 Stefano Carignano 2009-10-19 07:21:47 UTC
I might add this: another 100% certain way of making it hang, if it survives the first few seconds, is suspending and then resuming. Suspend works fine, resume works fine, and 20-30 seconds after resume it hangs the same way as before. I have tried it only two or three times, and it has happened every time.
Of course I am willing to post whatever other info you need. Just ask.



Comment 5 Reinout van Schouwen 2009-10-19 16:01:19 UTC
I'm having a similar problem and as requested on the mailing list I attached my Xorg log and dmesg to bug 21643. But this bug is probably closer to my experience.
Comment 6 Rafał Miłecki 2009-10-19 16:08:04 UTC
Do you use Gnome? Is it possible that its window manager tries to use OpenGL? What about removing the r600 DRI driver for a moment?
mv /usr/lib64/dri/r600_dri.so /usr/lib64/dri/r600_dri.so.backup

If this does not help, try disabling Composite, DownloadFromScreen (DFS) and UploadToScreen (UTS) with the following options in xorg.conf:
Option "EXANoComposite" "true"
Option "EXANoUploadToScreen" "true"
Option "EXANoDownloadFromScreen" "true"

If this still does not help, can you apply the following patch to xf86-video-ati?
diff --git a/src/r600_exa.c b/src/r600_exa.c
index a2ce5c9..cf381c8 100644
--- a/src/r600_exa.c
+++ b/src/r600_exa.c
@@ -173,6 +173,7 @@ R600PrepareSolid(PixmapPtr pPix, int alu, Pixel pm, Pixel fg)
     uint32_t a, r, g, b;
     float ps_alu_consts[4];

+    return FALSE;
     if (!R600CheckBPP(pPix->drawable.bitsPerPixel))
        RADEON_FALLBACK(("R600CheckDatatype failed\n"));
     if (!R600ValidPM(pm, pPix->drawable.bitsPerPixel))
Just put "return FALSE;" after the variable definitions.
Comment 7 Stefano Carignano 2009-10-20 03:18:05 UTC
@Rafał: was your comment directed to Reinout or to me? I'm currently using Xfce, with compositing disabled in the WM options. I tried removing r600_dri.so and also adding the EXANo... options to xorg.conf, then suspended and resumed, and I got exactly the same behavior as before, i.e. I can resume, play around for ~30-60 seconds, and then it hangs. For the record, with the EXANo... options the screen hung but I could still move the mouse (but couldn't click or type), while with r600_dri.so removed I got a total hang and the screen turned completely black. Shall I go on with recompiling the driver? Or would that be useless for me?

@Reinout: I speak without knowing things, but I'm not sure our bugs are really related. From what I've understood from your report, you cannot get a usable screen at all, am I right? In my case I /always/ get a usable screen, with no corruption whatsoever; it's just that after a few seconds X (and the whole system) sometimes becomes totally unresponsive. I never got anything like stripes or such... well, it's up to the devs to decide :)
Comment 8 Rafał Miłecki 2009-10-20 09:54:27 UTC
(In reply to comment #7)
> @Rafał: was your comment directed to Reinout or to me ?
> (...)
> Shall I go on with recompiling the driver ? Or would that be useless for me?

Yes, this was directed at you; I don't really know what Reinout's issue is.

Stefano: I own a very similar GPU.

Mine:
(--) PCI:*(0:1:0:0) 1002:95c4:104d:9035 rev 0

Yours:
(--) PCI:*(0:1:0:0) 1002:95c4:1179:ff50 ATI Technologies Inc Mobility Radeon HD 3400 Series rev 0

And I have a very similar issue, described in bug #24535. As you can see in my bug report, disabling Solid with the xf86-video-ati hack removed the lockups for me. So I would really like to hear if this works for you as well.
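
The two log lines quoted above share the same vendor:device pair (1002:95c4) and differ only in the subsystem IDs. A minimal sketch for pulling those fields out of such a line follows; the helper names pci_chip_id and pci_subsystem_id are hypothetical, and the pattern assumes the "(--) PCI:" line layout exactly as quoted in this comment.

```shell
# Hypothetical helper: prints the vendor:device pair from an Xorg log "(--) PCI:" line.
pci_chip_id() {
    printf '%s\n' "$1" |
        sed -n 's/^(--) PCI:[^ ]* \([0-9a-f]\{4\}:[0-9a-f]\{4\}\):.*/\1/p'
}

# Hypothetical helper: prints the subsystem vendor:device pair from the same line.
pci_subsystem_id() {
    printf '%s\n' "$1" |
        sed -n 's/^(--) PCI:[^ ]* [0-9a-f]\{4\}:[0-9a-f]\{4\}:\([0-9a-f]\{4\}:[0-9a-f]\{4\}\) .*/\1/p'
}

# The two lines quoted in this comment.
mine='(--) PCI:*(0:1:0:0) 1002:95c4:104d:9035 rev 0'
yours='(--) PCI:*(0:1:0:0) 1002:95c4:1179:ff50 ATI Technologies Inc Mobility Radeon HD 3400 Series rev 0'

# Same chip if the vendor:device pairs match; the subsystem IDs identify the laptop vendor/model.
if [ "$(pci_chip_id "$mine")" = "$(pci_chip_id "$yours")" ]; then
    echo "same chip: $(pci_chip_id "$mine")"
fi
echo "subsystems: $(pci_subsystem_id "$mine") vs $(pci_subsystem_id "$yours")"
```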
Comment 9 Stefano Carignano 2009-10-20 10:43:29 UTC
Oh! A fellow soul lost in the 3470 troubles too :)
Jokes aside, I did some /very/ quick testing with your patch (and erm, to be honest I also applied Alex's proposed patch from your #24535 :D)
And... it doesn't change a thing :( Well, to be honest I managed to boot and have the system survive /once/ with a preemptive kernel, but it could have been pure luck... I tried two more times (w/ preempt) and got the usual hang.
I also tried suspend/resume without preempt. No luck; same as before: it resumes just fine, then 30-60 seconds of GTK apps, no 3D etc., and it hangs.



Comment 10 Joakim Gebart 2009-10-22 06:33:04 UTC
I have the same trouble on an HD3470 mobility (RV620 or whatever it's called in the firmware).
If I enable preempt I usually get the same lockup as described in the OP at most a few minutes after login (Gnome 2.26): the mouse works but everything else seems to have stopped. I can still SSH to the machine. SysRq works. The Caps Lock LED doesn't respond. This is with a 2.6.32-rc kernel; I haven't tried .31.
When not running with preempt, the system seems to be stable until I suspend to RAM and resume (just like comment #4). Glxgears also seems to cause it to hang after a few seconds.
Dist: Gentoo
Some package versions that might be relevant:
Gnome 2.26 (stable)
Mesa svn from 2009-10-21
xf86-video-ati svn from 2009-10-21
libdrm svn from 2009-10-21
kernel 2.6.32-rc5 with drm-next patches, git pull from 2009-10-21
Comment 11 Stefano Carignano 2009-10-23 04:08:55 UTC
> Glxgears also seems to cause it to
> hang after a few seconds.

Hm! Are you sure about that? I have tried running glxgears for a couple of minutes and have gotten no hang so far, although when I launch it I get a warning about IRQs not working correctly and I can see that both X and glxgears itself start sucking up all of the CPU (~50% glxgears, ~50% X). Is that to be expected? BTW, I'm getting ~1500-2000 fps.
If I recall correctly, I had a problem with OpenGL apps (say, mplayer -vo gl) hanging the system some time ago, but only when KMS was /not/ enabled.
Comment 12 Rafał Miłecki 2009-11-02 23:02:01 UTC
Stefano, can you test the patch from bug #24535 comment #25?
Comment 13 Stefano Carignano 2009-11-03 06:17:09 UTC
I'm currently testing the patch. So far I've managed to boot into a usable system twice with preemption enabled, and also to suspend and survive the resume. Seems to be working! Thanks Rafał, and thanks to the devs! Of course I will keep on testing and see if something goes wrong...

Comment 14 Rafał Miłecki 2009-11-04 15:23:18 UTC
Stefano: thanks for testing; hopefully we will still get that into .32.

Joakim: this sounds very much like our issue, so it should now be fixed in drm-next and hopefully in the next .32 RC.

*** This bug has been marked as a duplicate of bug 24535 ***
Comment 15 Joakim Gebart 2009-11-05 01:43:26 UTC
I've used the patch in Bug #24535 Comment #25 for two days now and it seems to be working without lockups. The kernel is compiled with CONFIG_PREEMPT, and suspend/resume is working too. My glxgears problem went away as well.
