Bug 13853 - radeonhd: occasional blackouts
Summary: radeonhd: occasional blackouts
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/radeonhd
Version: 7.2 (2007.02)
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Egbert Eich
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-12-29 06:08 UTC by Aljaž Prusnik
Modified: 2008-02-04 23:52 UTC
CC List: 4 users

See Also:
i915 platform:
i915 features:


Attachments
xorg log (147.89 KB, text/x-log)
2007-12-29 06:10 UTC, Aljaž Prusnik
xorg configuration file (2.65 KB, application/octet-stream)
2007-12-29 06:11 UTC, Aljaž Prusnik
Fix. (625 bytes, patch)
2008-02-04 13:35 UTC, Egbert Eich

Description Aljaž Prusnik 2007-12-29 06:08:57 UTC
Since the 1.1.x series (or maybe even since 1.0.x), I have been seeing strange
behaviour: the screen occasionally goes blank all of a sudden,
without any trace in the available logs. What helps is switching to a text
console (CTRL-ALT-F1 or any other) and then back (CTRL-ALT-F7). This
simply restores the picture and I can continue where I left off (no restart of Xorg, no logging out and back in).

There is no pattern, so I cannot reproduce it at will. It just happens. :)
If nothing is found, I'll switch this bug to Resolved/Invalid once it stops occurring.

Configuration:
OS: Debian testing, up-to-date
HW: Asus M2N-E, AMD X2 4200, Gigabyte X1300PRO (RV515)
Comment 1 Aljaž Prusnik 2007-12-29 06:10:21 UTC
Created attachment 13411
xorg log
Comment 2 Aljaž Prusnik 2007-12-29 06:11:04 UTC
Created attachment 13412
xorg configuration file
Comment 3 Egbert Eich 2007-12-29 07:11:31 UTC
Hardware involved: 0x7142:0x1458:0x2148 - RV515
Mode:
"1600x1200"x60.0  162.00  1600 1664 1856 2160  1200 1201 1204 1250 +hsync +vsync (75.0 kHz)  (??)
I'm seeing a similar problem on the RV515 I have here. However, the dropouts are intermittent.
We assume there is a problem with the electrical values for TMDS. The RV515 seems to be notorious in this respect, as I'm also seeing problems with some BIOS-adjusted VESA modes.
I will look into this.
Comment 4 Egbert Eich 2007-12-29 07:17:13 UTC
You said that this has occurred since 1.1.0 (possibly since 1.0.0). Was there a previous version where this didn't occur? If so, could you identify which one it was?
Comment 5 Aljaž Prusnik 2007-12-29 07:42:55 UTC
Yes, there was, but I wouldn't know which one. I make a build every couple of days, so tracking this down would be a long-running task. But if it helps, I'll do it, although it will take some time, since I don't know when these blackouts will occur.
Comment 6 Aljaž Prusnik 2007-12-29 08:44:41 UTC
A quick question: how do I return to a certain revision of the driver tree? Let's say 13 days ago, or any other commit in the tree?
Comment 7 Egbert Eich 2007-12-29 09:45:24 UTC
You should be able to do something like git-checkout master@{"13 days ago"}.
This depends slightly on the version of your git. On older versions you needed to create a branch (git-checkout -b testbranch master@{"13 days ago"}).
Comment 8 Aljaž Prusnik 2007-12-31 05:36:35 UTC
The first commit after the 1.1.0 version bump causes the problems ("RV670 0x9501: TMDSA/B electrical values."). I fail to understand why, since the changes seem to apply only to the RV670.
I ran the 1.1.0 version for 2 days without a problem, then I moved one commit further and, voilà, blackouts.
Comment 9 Aljaž Prusnik 2007-12-31 10:37:40 UTC
I also updated the driver to the latest commit (3293c60... on 31.12., "Fix header inclusion order.") but left out the problematic commit (10c551027... on 22.12., "RV670 0x9501: TMDSA/B electrical values.") and again it works without blackouts.
Comment 10 Egbert Eich 2008-01-01 11:58:16 UTC
Could it be that you changed something else, or that you started the server differently with and without this patch (i.e. started the one without the patch after a fresh boot, or was a suspend/resume cycle involved at some point)?
This patch is entirely unrelated to your hardware.
Comment 11 Aljaž Prusnik 2008-01-02 01:42:03 UTC
No, it's a desktop machine, so I don't use suspend/resume, and when I change the driver I just restart gdm.

I have to say, however, that even without this patch a blackout occurred, but only once in 2 days, and it happened while no user was active. I noticed it when the other user wanted to resume and was greeted with a black screen. Switching terminals brought the picture back.

With the patch, however, the blackouts were more frequent (at least one an hour) and occurred while a user was active.
Comment 12 Aljaž Prusnik 2008-01-02 05:48:39 UTC
Apparently blackouts still happen. Not as often, but they do. So now I plan to walk the path from 1.0.0 to 1.1.0 and see when it begins to happen. A long path, but I really don't have a clue what sets it off...
Comment 13 Egbert Eich 2008-01-02 06:15:27 UTC
git-bisect can help you here. But are you sure blackouts didn't happen with 1.0.0?
Maybe you should check this first; then I can point you to potentially troublesome commits.
Comment 14 Aljaž Prusnik 2008-01-02 06:20:45 UTC
I'm sure that the blackouts are a recent thing; I just don't know how recent. But I think more than a month back would be too far in the past, which is why I decided to start from 1.0.0. I compiled 1.0.0 plus the following two commits, which I found harmless (hopefully). So I'm going to try this one for a couple of days to see if there are any blackouts.
Comment 15 Egbert Eich 2008-01-02 07:28:01 UTC
The LVTMA handling changed in ed9065a4288b92d4e3c286071b1a452bb1756a88.
Backing out this single commit won't be possible, as other commits depend on it. However, you can comment out the line with the electrical value for your chipset:
{ 0x7142, 0x00F2061C }, /* RV515 */
and see if this makes a difference. In that case the value in the register is left unaltered.
Comment 16 Aljaž Prusnik 2008-01-07 10:24:27 UTC
Just an update: 1.0.0 also resulted in blackouts, so I went back to 0.0.4. From there on no blackouts happened, and yesterday I updated to commit 122f8f8ec7019267dd79ab612c96b2c5a5bcac92. So far so good (i.e. no blackouts).
Today I moved one more ahead, to 78b763f97ab62cf8d504cc238bb4d59c5b9bf6fa.

Any hints regarding 1.0.0? ;)
Comment 17 Aljaž Prusnik 2008-01-07 10:26:47 UTC
And an answer to Comment #15: there were blackouts in 1.1.0 with that line commented out as well.
Comment 18 Aljaž Prusnik 2008-01-15 14:25:04 UTC
I think I may have found the problematic one. I'm currently using commit 1c07fbe6284f90c9b8967d680a2d558e6785d428 and it has now gone more than two days without blackouts.
The next commit (90247017...) already produced blackouts. I still do not know what triggers them, so I'm not sure how to reproduce them. But in case it helps, I can describe once again what happened and what I did when a blackout occurred.
The screen goes black at random, and I was never using the same application when it happened. After the blackout I tried switching to vt1 with CTRL-ALT-1 and then back to vt7, which resulted in the keyboard producing all sorts of weird characters, so I could neither issue any command nor switch back to vt1. I could, however, connect remotely and restart gdm.
Another time I decided to log off the users remotely before restarting gdm (remotely). When I pkilled a user, vt7 was back in action (I presume it restarts during log-offs).
I could also do both things locally if I stayed on vt1 and did not switch back to vt7 right away.
Comment 19 Aljaž Prusnik 2008-01-15 14:29:01 UTC
Some more info:
I have not restarted the system during this whole series of probing. Most of the commit searching was done during the weekend, when I was "lucky" enough to get a blackout quickly after each driver change, so I could dismiss commit after commit.
All I do is reset the tree to the next (previous) commit, restart gdm and then see when and if it occurs.
Comment 20 Matthias Hopf 2008-01-16 03:19:18 UTC
Luc, this is due to the Blanking hooks.
Apparently, there are some cases where the blanking bits aren't restored.

I assume you are using RandR modesetting, so there's probably still a bug lurking somewhere.
Comment 21 Luc Verhaegen 2008-01-16 03:59:20 UTC
Hrm, this is rather awkward. Why would the blanking bits influence this behaviour? It is pretty much on/off, and shouldn't influence other functionality of the CRTCs, as all it does is disable reading from memory and sending out black instead.

Aljaz, are you absolutely 100% certain that it is this commit that is causing the problems, because you have been claiming certainty about other commits before. We do not want to go and bark up the wrong tree.
Comment 22 Aljaž Prusnik 2008-01-16 04:32:32 UTC
Yes, I'm aware of the level of certainty; that's why I used the wording "I think..." ;)

Anyway, I'm not 100% sure; I only assumed this after watching the behaviour of the computer. During the weekend I spent more time on it than usual, hence the blackouts occurred more often, and I used every such event to move one commit back. During the last 2 days I tried doing everything I did during the weekend when I got the black screen, but did not succeed (yet).

Of course, to be certain, I would wait another week to confirm or overrule it.
Are there any other ideas which commit could potentially trigger this if it happens again (how far back should I go)?

Could this also be hardware related, as no one else is reporting the same problem?


About the mode setting: I hope I understand what you are asking. I don't have any custom mode settings anywhere. I use the same xorg.conf all the time and don't use xrandr commands at all.

Comment 23 TommyDrum 2008-01-16 06:13:13 UTC
I'd like to confirm I have the exact same problem with an RV670; it occurs randomly for a second user after logging in (on vt9 on Kubuntu). The first user doesn't have any problem whatsoever.
I've got Kubuntu 7.10 with the radeonhd git source fetched almost 4 days ago.
Comment 24 Aljaž Prusnik 2008-01-16 06:38:57 UTC
Well, in my case, I have two concurrent users and it happens to both of them.
Comment 25 Aljaž Prusnik 2008-01-16 06:40:24 UTC
Tommy, can you try going back to the commit I'm using now and try again (see comment #18)?
Comment 26 TommyDrum 2008-01-16 08:24:49 UTC
First, a small change in behaviour (still with the git revision I've got, not the one mentioned by Aljaž): this blackout just occurred once more with the first user, so it's not a matter of using the second vt. Second, it seems to occur only when I've got the second user logged in (active or not, since the other vt session is locked at the time).

Aljaž could you please tell me the exact command you used to get the revision you mentioned? (I'm not familiar with git).

Comment 27 Aljaž Prusnik 2008-01-16 09:10:51 UTC
Well, it's not really a command line, because I went for the GUI solution.
Get yourself the "git-gui" and "gitk" packages (they should be in the repository), then do the following:
- log out the second user
- go to your local driver git folder
- run "git-gui" on the command line
- in the menu, choose "Repository" > "Visualize All Branch History" (another window pops up)
- in the left-hand pane you will find the commits. Walk down to this one: "Handle non-branch case of git workdir". It precedes the commit that caused the blackouts for me, "CRTC: Implement a Blank hook." You will find both dated 26.11.2007.
- right-click on the commit and choose "Reset master branch to here"
- exit both windows and check that the source files changed (I check the dates, sometimes the code); it should be working
- do the complete compile/install procedure
- check that the drivers have changed (Debian keeps them in /usr/lib/xorg/modules/drivers/). List them with "ls -l | grep radeonhd" and check the dates and times to be sure you have the newer files.

Log out the first user, go to vt1 and run "/etc/init.d/gdm restart". Then try again and update this bug if you find anything new. If it happens, move one commit back until you find the one that does not cause blackouts. This is what I'm doing now...
Comment 28 TommyDrum 2008-01-16 10:16:59 UTC
Did all of it, but since I didn't have the "Reset master branch to here" option, I used git-reset --hard 1c07fbe6284f90c9b8967d680a2d558e6785d428 to get the same result, and then had to re-patch the git source to add RV670 support (it wasn't present at the time)...

Logged in with both users, and now all I have to do is... wait, I guess?!
Comment 29 TommyDrum 2008-01-16 10:20:46 UTC
Removed myself from the CC list; I'm already receiving comments through the radeonhd mailing list.
Comment 30 Aljaž Prusnik 2008-01-16 11:50:34 UTC
Yes. :) The waiting game. I suggest you wait a couple of days, or a couple of the intervals you are used to between blackouts. Then, if nothing happens, move one ahead. Let's see if we end up at the same one. :)
Comment 31 Aljaž Prusnik 2008-01-20 16:15:42 UTC
Well, it's been almost a week and still no blackouts. During this trial, however, I once again switched (on Thursday) to the next commit (CRTC blanking hooks), and within 20 minutes there was a black screen. This time I was not doing anything; I had just left the computer idle so the screensaver would start, and somewhere along the way the big black appeared.
I returned again to the current commit (the one preceding it) and no blackouts have appeared since.
Comment 32 Martin Seifert 2008-01-23 11:59:38 UTC
I'd just like to confirm I had these blackouts with my recently sold X1900XT and RadeonHD 1.0.0. After having a second user logged in (on vt9 in Ubuntu Gutsy via "User Switcher") and switching back to the first user, the problems occurred randomly. Most times CTRL-ALT-F1 brought the display back; otherwise a hard reset was unavoidable.

Comment 33 Aljaž Prusnik 2008-01-28 05:51:15 UTC
Well, it's been another week without blackouts, so I can fairly safely say that the commit I currently have is non-problematic. I hope others will also test it this way and either confirm or deny my finding, since there has been no additional info from them.

I'm using this opportunity to ask Tommy and Martin about their findings with my candidate for the last unproblematic commit.

So, what now? Should I try anything else?

Comment 34 TommyDrum 2008-01-31 09:03:39 UTC
Sorry for the long delay in responding; I had a cute little root partition corruption and had to reinstall everything. I confirm Aljaž's findings about the older commit, which stayed stable (no blackouts in about 4 days of continuous uptime), while the latest git checkout always shows the problem with two or more users logged in (random blackouts). In my case there is no problem when only one user is logged in.

I was running Kubuntu 7.10, now 8.04; I will start trying again with the latest commit and post accordingly, since 8.04 has xorg 7.3.
Comment 35 TommyDrum 2008-02-01 13:19:18 UTC
Tried again on Kubuntu 8.04 with xorg 7.3 and a recent git checkout (30 mins ago), to no avail; the screen turns off at random intervals, and I have to switch sessions to regain functionality (Ctrl+Alt+F8 then Ctrl+Alt+F7 in my case).

However, four lines are added to the xorg log when this occurs:

(II) Open ACPI successful (/var/run/acpid.socket)
(WW) RADEONHD(0): RandR: While switching off TV_7PIN_DIN: output DAC B is also used by DVI-I_1/analog - ignoring
(WW) RADEONHD(0): RandR: While switching off TV_7PIN_DIN: output DAC B is also used by DVI-I_1/analog - ignoring
(II) Configured Mouse: ps2EnableDataReporting: succeeded

I was wondering: does anyone know how to turn on more verbose xorg logging in Ubuntu when using the default kdm (not startx), to dig a little deeper into this?
Comment 36 Aljaž Prusnik 2008-02-01 15:25:26 UTC
Tommy, just to be certain: if you go one commit forward (not the latest git, but the one with the blanking hooks), do you get blackouts as well? I'm asking because I'm stuck on this one for now, and the next one gave me those blackouts.
Comment 37 Aljaž Prusnik 2008-02-01 15:26:51 UTC
Ah, the proper wording... "I'm stuck on this one" means I'm stuck on the commit prior to the blanking hooks commit...
Comment 38 TommyDrum 2008-02-01 18:43:42 UTC
Let me summarize everything that has been done on my side:

- The latest git (rev. caa10014d115a49a59b4a2aef6ce36a4e615556a) has been fetched and built from scratch and shows random blackouts.

- Revision 1c07fbe6284f90c9b8967d680a2d558e6785d428 (Handle non-branch case of git workdir), the one working for Aljaž I think, has been fetched after removing the *entire* xf86-video-radeonhd directory and does *not* show blackouts after several days of 24-hour use (so I confirm what Aljaž said).

- The culprit revision, as I found out about an hour ago (9024701762ea282e5f861e8399b194b224bf5d2b - CRTCs: Implement a Blank hook), has also been fetched after removing the entire xf86-video-radeonhd directory and shows the aforementioned blackout problem.

I think this also answers Pierre Pronchery on the mailing list, who had pointed out that it could have been a dependency issue and not a bug.

BTW, the warnings in the xorg log (the ones in my last post) seem to be related only to VT switching, but anyway, this is what revision 9024701762ea282e5f861e8399b194b224bf5d2b prints when I have to switch VTs to get the screen back:

(II) Open ACPI successful (/var/run/acpid.socket)
(WW) RADEONHD(0): RandR: While switching off DVI-I_1/DAC_B: output DAC B is also used by DVI-I_1/DAC_B - ignoring
(WW) RADEONHD(0): RandR: While switching off TV_7PIN_DIN/DAC_B: output DAC B is also used by DVI-I_1/DAC_B - ignoring
(WW) RADEONHD(0): RandR: While switching off DVI-I_2/TMDS_B: output TMDS B is also used by DVI-I_2/TMDS_B - ignoring
(EE) RADEONHD(0): TMDSBVoltageControl: unhandled chipset: 0x9505.
(II) RADEONHD(0): LVTMA_MACRO_CONTROL: 0x00330414
(II) RADEONHD(0): LVTMA_TRANSMITTER_ADJUST: 0x00000000
(II) RADEONHD(0): LVTMA_PREEMPHASIS_CONTROL: 0x00000000
(WW) RADEONHD(0): RandR: While switching off TV_7PIN_DIN/DAC_B: output DAC B is also used by DVI-I_1/DAC_B - ignoring
(II) Configured Mouse: ps2EnableDataReporting: succeeded

Comment 39 Egbert Eich 2008-02-03 11:52:29 UTC
If this commit is causing the problem, you should see a message "D1Blank"
(or "D2Blank") in the log file whenever the screen blanks, when starting X with -verbose 7. Of course, only with this commit applied.
Could someone monitor this and try to verify it, please (ideally from a second computer)?
If this is the case, the most likely culprit is rhdSaveScreen(), which gets called by the X screen saver, as this function was implemented with this patch.

Please try whether 'xset s off' makes the problem go away.
Comment 40 TommyDrum 2008-02-03 12:15:57 UTC
Egbert (or someone else), how can one change the xorg verbosity on Debian-based distros (Ubuntu in this case) while continuing to use the regular desktop (KDE)? The bug appears randomly and I cannot stop working on this machine.
Comment 41 Linus Walleij 2008-02-03 14:40:22 UTC
I've been having this problem for a while now; you're not alone.
However, I think I'm onto it now, because I now get blackouts all the
time and I know what I changed :-) I am getting blackouts every
minute while writing this, so compare your setup to mine:

1 I only have this problem when two users run sessions on the machine.
  If only one user is running, no problem at all. Each user runs his own
  X server; this is under Fedora 8 (x86_64).

2 It seems only the second user gets the blackouts after logging in and
  starting his/her secondary X session.

3 To replicate something really nasty here, switch to the first logged-in
  user's session, go into your screensaver settings and set it to consider
  the computer inactive after 1 minute. Then go into gnome-power-manager (or
  whatever you're using for this; perhaps only g-p-m has this problem)
  and set the time before the screen sleeps to the lowest possible value,
  which should be 2 minutes now. Switch back to the secondary user. If
  everything is as on my machines, you will start getting massive blackouts
  after 2 minutes.

So it is gnome-power-manager, running as the first logged-in user, that is
causing this. It uses DPMS to shut down the screen through the X server.

Now, I don't know which tool is making the mistake here. I believe that
even if g-p-m tells X to blank the screen, only the X instance that is
currently holding the display/monitor should actually be allowed to turn
off the screen with DPMS.

Can the other reporters confirm this root cause?
Comment 42 Egbert Eich 2008-02-04 00:14:39 UTC
It's possible that some client doing DPMS is causing this, or possibly a client connecting to the wrong X server.
However, in that case the problem should appear independently of commit 9024701762ea282e5f861e8399b194b224bf5d2b (CRTCs: Implement a Blank hook), as DPMS is also used without it.
It would be easy to distinguish: does the screen only go blank, or does it also go into power saving?
About the verbosity level:
this needs to be set on the command line, so if you are using x/k/gdm you will have to edit /etc/X11/xdm/Xservers (that's where it is on my non-Debian system).
This file contains the command line used to start the X server.
Add '-logverbose 7' (make sure to remove it after this test).
You can also try to disable DPMS completely here:
add '-dpms' to the X server command line.
Comment 43 Aljaž Prusnik 2008-02-04 12:38:12 UTC
Yup, Linus, great find! I can confirm this behaviour (tried twice with the same result).

I have tried since then to find a way to debug xorg on Debian but haven't succeeded yet. But since this is now reproducible, the developers (Egbert?) can reproduce it in verbose mode.
Comment 44 Egbert Eich 2008-02-04 13:35:35 UTC
Created attachment 14141
Fix.

RHDSaveScreen() is a hook that DIX calls directly; here it gets called straight from the DPMS extension. None of the DIX functions know anything about VT switching. Thus the protection of the hardware against accidental switches needs to come from the DDX (which includes the driver, if it provides functions that may be called from DIX directly).
Adding a check for vtSema seems to fix the problem.
Comment 45 Egbert Eich 2008-02-04 13:41:14 UTC
Pushed: cbbd54a2fa934ecbe8c6b93d5b063407c06d955f.
Please give it a try.
Comment 46 Egbert Eich 2008-02-04 13:50:24 UTC
Looks like I've seen confirmation on IRC that this did indeed fix the problem.
Guys, thanks for helping me find this. The hints in #18 and #41 pointed me in the right direction.
Developers can never be better than their testers :)
Comment 47 Aljaž Prusnik 2008-02-04 14:07:15 UTC
Yeeey, thank you! Well, it's been a struggle, but I am very happy we nailed it. Now I can catch up with the latest git again.
Comment 48 Linus Walleij 2008-02-04 15:31:04 UTC
As a side note, the insight into this root cause came to me
in a dream, after I had fallen asleep myself while putting my 4-year-old son
to sleep. It is really true... :-)
Comment 49 TommyDrum 2008-02-04 23:52:58 UTC
Gosh, someone stops watching his email for 2 days and the bug gets solved.
Tsktsktsk...

:)

Great work everyone!

