Bug 1912 - Locks up w/ DynamicClocks
Locks up w/ DynamicClocks
Status: RESOLVED FIXED
Product: xorg
Classification: Unclassified
Component: Driver/Radeon
7.0.0
All All
: high normal
Assigned To: Xorg Project Team
:
Depends on:
Blocks: 2790
  Show dependency treegraph
 
Reported: 2004-11-23 11:13 UTC by Kristian Høgsberg
Modified: 2006-02-15 15:18 UTC (History)
6 users (show)

See Also:
i915 platform:
i915 features:


Attachments
Patch to special case RV100 in the dynamic clock setup (1.45 KB, patch)
2004-11-23 11:16 UTC, Kristian Høgsberg
no flags Details | Splinter Review
better patch (2.68 KB, patch)
2004-11-24 17:36 UTC, Alex Deucher
no flags Details | Splinter Review
updated patch (2.70 KB, patch)
2004-11-28 10:01 UTC, Alex Deucher
no flags Details | Splinter Review
new patch (3.21 KB, patch)
2004-11-30 16:45 UTC, Alex Deucher
no flags Details | Splinter Review
one more time... (1.62 KB, patch)
2004-11-30 21:43 UTC, Alex Deucher
roland.mainz: 6.8‑branch+
Details | Splinter Review

Note You need to log in before you can comment on or make changes to this bug.
Description Kristian Høgsberg 2004-11-23 11:13:47 UTC
I've been chasing a hard lockup that occurs on Radeon 7000 cards on SMP boxes. 
The lock up is typically triggered by high levels of drawing activity, the gnome
login process typically triggers it within 10 seconds, another way to reproduce
is the pixbuf part of gtk-demo.  The box locks up hard, it's not pingable and
there is no response on a serial console.  I was able to trace it down to the
dynamic clock feature, and if I remove the calls to RADEONSetDynamicClock in
lines 4511-4515 I can no longer reproduce the lockup.  I'm not sure what the
exact problem is, but I suspect some of the bits set in RADEONSetDynamicClock
could be undefined for RV100.  I don't have the specs, so I'm only guessing.

I'm using the attached patch in our RPMs to special case RV100s, but I'd like to
hear if others have better suggestions, and I would like to get this into head
and 6.8.2.
Comment 1 Kristian Høgsberg 2004-11-23 11:16:31 UTC
Created attachment 1361 [details] [review]
Patch to special case RV100 in the dynamic clock setup
Comment 2 Kristian Høgsberg 2004-11-23 11:18:33 UTC
Comment on attachment 1361 [details] [review]
Patch to special case RV100 in the dynamic clock setup

The patch basically reverts to 6.7.0 behaviour for RV100s.
Comment 3 Alex Deucher 2004-11-23 12:03:48 UTC
This patch will disable dynamicclocks on all RV100 hardware including
m6 which is what one of the chips on which is is most useful and
doesn't seem to cause a problem.  I've done extensive testing on my m6
and all is well there.  Unless we come up with a better fix, we should
probably change the patch to:

-            if ( !info->HasCRTC2 ) {
+           if ((info->ChipFamily == CHIP_FAMILY_RV100) || !(info->IsMobility)) {
+               break;
+           } else if ( !info->HasCRTC2 ) {
Comment 4 Alex Deucher 2004-11-23 12:06:45 UTC
ARGH!  sorry , I meant:

-            if ( !info->HasCRTC2 ) {
+           if ((info->ChipFamily == CHIP_FAMILY_RV100) && !(info->IsMobility)) {
+               break;
+           } else if ( !info->HasCRTC2 ) {
Comment 5 Alex Deucher 2004-11-24 17:36:53 UTC
Created attachment 1369 [details] [review]
better patch

This should take care of the ForceON case for desktop VE cards and also not
affect m6 users.  This also fixes a typo in in radeon_reg.h related to
dynclocks noticed by benh.
Comment 6 Alex Deucher 2004-11-28 10:01:37 UTC
Created attachment 1412 [details] [review]
updated patch

comment out rv200 section of last patch.  code only applies to rv100 in this
case, so no need for rv200 code in it.
Comment 7 Kristian Høgsberg 2004-11-29 18:51:02 UTC
I tested the patch in attachment 1412 [details] [review] on the SMP box in question.  I wasn't able
to reproduce the crash with this patch either.  I've updated our Fedora RPM to
use this patch, and I would say that it's material for HEAD and 6.8.2.
Comment 8 Alex Deucher 2004-11-30 07:48:31 UTC
looks like this needs to be applied to rv200 as well; see bug 1773.  I'll work
up a new patch to include rv200 and apply to HEAD.
Comment 9 Kristian Høgsberg 2004-11-30 08:11:02 UTC
What about just playing it safe and go with the 6.7.0 clock setting for all
chipsets when DynamicClocks are disabled?
Comment 10 Alex Deucher 2004-11-30 08:54:37 UTC
(In reply to comment #9)
> What about just playing it safe and go with the 6.7.0 clock setting for all
> chipsets when DynamicClocks are disabled?

I was considering the same thing.  The only potential problem I could see would
be where a user used dynamicclocks on one server startup and then turned it off
the next round without a reboot.  would the 6.7 behavior be enough?  I suppose
if it caused any problems a reboot would solve them.
Comment 11 Kristian Høgsberg 2004-11-30 11:30:40 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > What about just playing it safe and go with the 6.7.0 clock setting for all
> > chipsets when DynamicClocks are disabled?
> 
> I was considering the same thing.  The only potential problem I could see would
> be where a user used dynamicclocks on one server startup and then turned it off
> the next round without a reboot.  would the 6.7 behavior be enough?  I suppose
> if it caused any problems a reboot would solve them.

Ah, I see what you're saying, those dynamic clock bits wouldn't be set back to
static clock in that case, which would interfer with the 6.7 clock setup code
when starting the X server without dynamic clocks.

I think this is a reasonable tradeoff though, it's better to have a stable
server that could act a bit strange when changing the config, than a server
locks up hard on a number of cards.
Comment 12 Alex Deucher 2004-11-30 12:03:10 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > (In reply to comment #9)
> > > What about just playing it safe and go with the 6.7.0 clock setting for all
> > > chipsets when DynamicClocks are disabled?
> > 
> > I was considering the same thing.  The only potential problem I could see would
> > be where a user used dynamicclocks on one server startup and then turned it off
> > the next round without a reboot.  would the 6.7 behavior be enough?  I suppose
> > if it caused any problems a reboot would solve them.
> 
> Ah, I see what you're saying, those dynamic clock bits wouldn't be set back to
> static clock in that case, which would interfer with the 6.7 clock setup code
> when starting the X server without dynamic clocks.
> 
> I think this is a reasonable tradeoff though, it's better to have a stable
> server that could act a bit strange when changing the config, than a server
> locks up hard on a number of cards.

Agreed.  I'll post a new patch tonight.
Comment 13 Alex Deucher 2004-11-30 16:45:05 UTC
Created attachment 1432 [details] [review]
new patch

this patch should take care of all cases.  It reverts the forceON code back to
the 6.7.0 version, which seems to work in all cases.  It also adds a check, as
per benh's last patch, to only enable/disable dynamicclocks on the first
instance of the Xserver for the card.  I've tested on my radeons and I've had
no problems.  If this patch is acceptable, I'll commit it to HEAD.
Comment 14 Kristian Høgsberg 2004-11-30 17:50:56 UTC
(In reply to comment #13)
> Created an attachment (id=1432) [edit]
> new patch
> 
> this patch should take care of all cases.  It reverts the forceON code back to
> the 6.7.0 version, which seems to work in all cases.  It also adds a check, as
> per benh's last patch, to only enable/disable dynamicclocks on the first
> instance of the Xserver for the card.  I've tested on my radeons and I've had
> no problems.  If this patch is acceptable, I'll commit it to HEAD.

Looks good, I'll give this a spin tomorrow, thanks.
Comment 15 Alex Deucher 2004-11-30 21:03:51 UTC
going forward if you could try and narrow down which part is causing the
problem, it'd be much appreciated.  one thing you may want to try is increasing
the sleeps while waiting for the clocks to lock.  perhaps one of the clocks
isn't quite locked.  It'd be nice to get this code fully working rather than
just worked around.
Comment 16 Alex Deucher 2004-11-30 21:26:26 UTC
as a reference bug 1773 is related (similar issues on rv200)
Comment 17 Alex Deucher 2004-11-30 21:43:25 UTC
Created attachment 1433 [details] [review]
one more time...

this should limit the clock stuff to just mobility chips.
Comment 18 Kristian Høgsberg 2004-12-01 12:03:09 UTC
(In reply to comment #17)
> Created an attachment (id=1433) [edit]
> one more time...
> 
> this should limit the clock stuff to just mobility chips.

Don't regular rv100 chips need the initialization in the DynamicClocks=0 case?
Comment 19 Alex Deucher 2004-12-01 12:22:16 UTC
(In reply to comment #18)
> (In reply to comment #17)
> > Created an attachment (id=1433) [edit] [edit]
> > one more time...
> > 
> > this should limit the clock stuff to just mobility chips.
> 
> Don't regular rv100 chips need the initialization in the DynamicClocks=0 case?

I'm not 100% sure, but I don't think they do. After some discussion, I suspect
that code was mostly around for mobility chips.  I suspect it's safer not to
mess with those things on non-mobility chips.  If you have problems without it
let me know, otherwise we should be ok.  
Comment 20 Kristian Høgsberg 2004-12-03 13:52:26 UTC
Comment on attachment 1361 [details] [review]
Patch to special case RV100 in the dynamic clock setup

Removing request for the first iteration of the patch.
Comment 21 Kristian Høgsberg 2004-12-03 13:54:19 UTC
Comment on attachment 1433 [details] [review]
one more time...

Requesting approval for the latest spin of the patch, which I have confirmed to
fix an solid lock up on a SMP box with a Radeon 7000 card.
Comment 22 Roland Mainz 2004-12-08 05:59:42 UTC
Comment on attachment 1433 [details] [review]
one more time...

Approved in the 2004-12-06 release-wranglers conference call.
Please do not commit it yourself, I'll do that later today...
Comment 23 Alex Deucher 2004-12-08 06:01:41 UTC
(In reply to comment #22)
> (From update of attachment 1433 [details] [review] [edit])
> Approved in the 2004-12-06 release-wranglers conference call.
> Please do not commit it yourself, I'll do that later today...
> 

I can still commit it to the trunk though or are you going to take care of that too?
Comment 24 Kristian Høgsberg 2004-12-08 06:04:16 UTC
(In reply to comment #23)
> (In reply to comment #22)
> > (From update of attachment 1433 [details] [review] [edit] [edit])
> > Approved in the 2004-12-06 release-wranglers conference call.
> > Please do not commit it yourself, I'll do that later today...
> > 
> 
> I can still commit it to the trunk though or are you going to take care of
that too?

Please go ahead, thanks.
Comment 25 Alex Deucher 2004-12-11 16:15:33 UTC
committed to HEAD
Comment 26 ajax at nwnk dot net 2005-04-03 14:35:48 UTC
mass update: the fix for this bug was applied to both HEAD and the stable 6.8 branch, but the bug was 
never closed.  closing now.
Comment 27 Kristian Høgsberg 2005-04-14 13:41:31 UTC
I'm reopening this bug as we're seeing regressions as we moved FC-3 from 6.8.1
with the patch in comment #13 applied in our RPM to 6.8.2 to which the patch in
comment #17 was applied.

Regression bug:

  https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=15264
Comment 28 Alex Deucher 2005-04-14 14:01:21 UTC
(In reply to comment #27)
> I'm reopening this bug as we're seeing regressions as we moved FC-3 from 6.8.1
> with the patch in comment #13 applied in our RPM to 6.8.2 to which the patch in
> comment #17 was applied.
> 
> Regression bug:
> 
>   https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=15264

Kristian,

   That redhat bug comes back as invalid.  There are a few mistakes in the
dynamicclocks code that benh found (outregs that should be outpll, etc.); we
should probably fix those up.  additionally, there are some hw bugs in the PLL
code that benh recently fixed that might be causing problems with the
dynamicclocks code.   you may want to pull in the PLL access errata patch from
cvs HEAD.
Comment 29 Kristian Høgsberg 2005-04-14 15:29:02 UTC
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=152648(In reply to comment #28)
> > Regression bug:
> > 
> >   https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=15264
> 
> Kristian,
> 
>    That redhat bug comes back as invalid.  

Oops, I cut off the last digit from the URL:

   https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=152648

> There are a few mistakes in the
> dynamicclocks code that benh found (outregs that should be outpll, etc.); we
> should probably fix those up.  additionally, there are some hw bugs in the PLL
> code that benh recently fixed that might be causing problems with the
> dynamicclocks code.   you may want to pull in the PLL access errata patch from
> cvs HEAD.

That sounds good.  For FC-3 I think I'll just revert to the patch we used before, but it would make sense 
to get these patches nominated for 6.8.3 and try them out in rawhide.
Comment 30 Benjamin Herrenschmidt 2005-04-14 16:40:16 UTC
I have a set of patches fixing issues in STABLE at

http://gate.crashing.org/~benh/xorg/

I'm not sure the PLL/errata fixes will apply 'alone' as all patches touch the
same file, they pretty much have to be applied in order. They are all candidate
for 6.8.3 as far as I'm concerned as they are all bug fixes.
Comment 31 Frank Arnold 2005-06-28 13:07:10 UTC
(In reply to comment #29)
> That sounds good.  For FC-3 I think I'll just revert to the patch we used
before, but it would make sense 
> to get these patches nominated for 6.8.3 and try them out in rawhide.

Patch from comment #13 brakes something. See bug 3318 and related redhat bugs.
Comment 32 T. Hood 2005-09-26 03:22:45 UTC
*** Bug 2187 has been marked as a duplicate of this bug. ***
Comment 33 Alex Deucher 2005-10-04 09:04:57 UTC
This bug should be fixed in cvs.
Comment 34 ajax at nwnk dot net 2005-10-21 11:58:05 UTC
(In reply to comment #33)
> This bug should be fixed in cvs.

marking as fixed then.  reopen if it needs reopening.
Comment 35 roby 2005-10-22 08:39:23 UTC
Hi! Sorry, I am a newbie.. I think I unfortunately meet this bug in ubuntu
5.10.. Could you explain me how to patch it? Thank you in advance!
Comment 36 Alex Deucher 2005-10-22 09:45:16 UTC
(In reply to comment #35)
> Hi! Sorry, I am a newbie.. I think I unfortunately meet this bug in ubuntu
> 5.10.. Could you explain me how to patch it? Thank you in advance!

You need to build and install xorg from cvs HEAD:
http://dri.freedesktop.org/wiki/Building
or grab one of the xorg 6.9/7.0 snapshots
Comment 37 Martin Juergens 2006-02-16 07:31:59 UTC
Ubuntu Dapper Drake does have xorg 7 but the system freezes there, too. I have a
Radeon 7000, too and after the freeze the system is not pingable, either.

So I will reopen this bug. Please tell me which information you need to fix
this, I will then try my best to provide you with the required information.

You may also want to have a look at the "downstream" Ubuntu bugreport:
https://launchpad.net/distros/ubuntu/+source/xorg/+bug/16873
Comment 38 Alex Deucher 2006-02-16 08:09:40 UTC
(In reply to comment #37)
> Ubuntu Dapper Drake does have xorg 7 but the system freezes there, too. I have a
> Radeon 7000, too and after the freeze the system is not pingable, either.
> 
> So I will reopen this bug. Please tell me which information you need to fix
> this, I will then try my best to provide you with the required information.
> 
> You may also want to have a look at the "downstream" Ubuntu bugreport:
> https://launchpad.net/distros/ubuntu/+source/xorg/+bug/16873

This isn't the cause of your lockup as the DynamicClocks code is not called at
all on desktop boards.  Try disabling the DRI.

Comment 39 Martin Juergens 2006-02-16 08:50:37 UTC
ok, this works. Upstream or downstream issue?
Comment 40 Alex Deucher 2006-02-16 10:18:03 UTC
(In reply to comment #39)
> ok, this works. Upstream or downstream issue?

hard to say.