| Summary: | Using MergedFB freezes system reproducable | | |
|---|---|---|---|
| Product: | xorg | Reporter: | Michael <auslands-kv> |
| Component: | Driver/Radeon | Assignee: | Xorg Project Team <xorg-team> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | normal | | |
| Priority: | high | CC: | alexdeucher, benh, bo2hansen, jonas, marius |
| Version: | 7.0.0 | Keywords: | regression |
| Hardware: | x86 (IA32) | | |
| OS: | Linux (All) | | |

Description
Michael
2006-06-08 07:35:23 UTC
Does removing the dynamicclocks option help (make sure you power off the computer after removing the option to make sure the clocks come up as the bios programs them)?

Hi Alex,

I have tried it without the dynamicclocks option and also without the dri modules, but I'm not sure that I restarted the computer. Are you certain, that a restart is needed?

I did a test a few days ago. I started an X server with DynamicClocks enabled and measured battery consumption. Then I exited the server and restarted it with DynamicClocks disabled (the Xorg.0.log file stated "DynamicClocks disabled"). I again measured battery consumption and it seemed to be higher again, as it should be when dynamicclocks is off...

But I will try tomorrow with the posted xorg.conf file.

Cheers,
Michael

Confirming. X.org 7.0.0 often freezes my laptop (T42, Radeon Mobility 7500) if I use MergedFB and switch from dual-head mode to clone mode. I use xrandr for the switch rather than Ctrl+Alt+[+/-]. The freeze is solid -- even Alt+SysRq doesn't work. When the machine is frozen, I see a strange picture on the external monitor: http://mg.pov.lt/P6010010.JPG.

Not every xrandr call causes a lockup, but three or four tries are enough to reproduce the problem. Sometimes xrandr succeeds, but the laptop locks up suddenly after 10-15 minutes when I'm not doing anything in particular.

My xorg.conf is already linked from this bug report ;) I've used it in the past with X.org 6.9 for quite a long time with no problems (well, with only minor problems, no lockups).

I've also filed this bug in Ubuntu as https://launchpad.net/bugs/47775.

I'll try to disable DynamicClocks and see if it helps.

(In reply to comment #2)
> Are you certain, that a restart is needed?
>

A full shutdown and boot up would be preferred. Enabling and disabling dynamicclocks just enables and disables the bits in the appropriate registers. It doesn't save and restore those regs to how the bios set them up.
if it is causing problems you'll want to start up with a freshly initialized (by the bios) chip.

I disabled DynamicClocks, turned the laptop off and back on, and started clicking on my panel launchers that run xrandr. The system froze after a few clicks.

(In reply to comment #5)
> I disabled DynamicClocks, turned the laptop off and back on, and started
> clicking on my panel launchers that run xrandr. The system froze after a few
> clicks.

when you say disable, did you set the option to false or did you just remove the line? if you set the option to false, try again, but just remove the dynamicclocks line altogether.

O.K. so I have disabled DynamicClocks (by commenting out the line) and I have experienced four freezes since then. I had the feeling that it took a bit more switching between the modes until a freeze occurred (approx. 5 to 10 switches, while with DynamicClocks it often happened on the second or third switch), but this is probably not significant.

One time I saw a very, very nice color pattern on the external monitor, such as mentioned in comment 3. The other times the external monitor just showed that the video frequency had changed and was now "out of range". So no picture.

I used "xrandr -s X" to switch between the modes as well as CTRL-ALT-Keypad+/-. Both methods lead to freezes.

Hope this helps. Anything else I can test?

Michael

----------------------------------------
Two (more or less) relevant additions:

1.) I simplified the MetaModes to
Option "MetaModes" "1024x768 1280x1024 1024x768+1280x1024"
as these are the options that I really need: 1024x768 clone mode, when working on the laptop and maybe a beamer attached. 1280x1024 clone mode or 1024x768-1280x1024 when working on the docking station.

Strange I found, that
a.) X always started in 1280x1024 clone mode, however only showing 1024x768 on both monitors (with a virtual screen of 1280x1024)
b.) xrandr only gave me the possibility to switch between 1024x768 and 1280x1024 clone mode (which at least gave me the expected results), not the xinerama like setup. All windows were repositioned correctly.
c.) switching with CTRL-ALT-Keypad+/- switched through all three modes but in the clone modes there was always a 1280x1024 virtual screen area. No windows were repositioned (and the KDE kicker did not adapt either)

2.) What I always hate with such freezes is that data is lost on the harddisk. I'm using ext3 and in the boot process you first see the journal recovering with lots of inodes deleted, then the fsck yielding lots of errors, trying to correct them and then finally a reboot. Awful :(
I already tried remounting the disk read-only via ALT-SYSRQ-u before switching resolutions (and freezing the system), but I got the same disk errors on the next boot. I guess I should find a RAM disk based test environment. Does anybody have an idea?

One more test: I disabled the dri module (as this leads to some system freezes in other cases), but no change.

(In reply to comment #6)
> when you say disable, did you set the option to false or did you just remove
> the line?

I set the option to "off"

> if you set the option to false, try again, but just remove the
> dynamicclocks line altogether.

Michael tried that already (comment #7). I can also try if you wish.

(In reply to comment #7)
> ----------------------------------------
> Two (more or less) relevant additions:
>
> 1.) I simplified the MetaModes to
> Option "MetaModes" "1024x768 1280x1024 1024x768+1280x1024"
> as these are the options that I really need: 1024x768 clone mode, when working
> on the laptop and maybe a beamer attached. 1280x1024 clone mode or
> 1024x768-1280x1024 when working on the docking station.
>
> Strange I found, that
> a.) X always started in 1280x1024 clone mode, however only showing 1024x768 on
> both monitors (with a virtual screen of 1280x1024)

This is because your first metamode is 1024x768 and the largest metamode is 1280x1024.

> b.) xrandr only gave me the possibility to switch between 1024x768 and 1280x1024
> clone mode (which at least gave me the expected results), not the xinerama like
> setup. All windows were repositioned correctly.

You didn't specify a dualhead metamode. 1024x768+1280x1024 is a clone mode. 1024x768-1280x1024 is a dualhead mode.

> c.) switching with CTRL-ALT-Keypad+/- switched through all three modes but in
> the clone modes there was always a 1280x1024 virtual screen area. No windows
> were repositioned (and the KDE kicker did not adapt either)

only xrandr resizes the desktop. CTRL-ALT-Keypad+/- just changes the mode; the desktop remains the same size.

>
> 2.) What I always hate with such freezes is that data is lost on the harddisk.
> I'm using ext3 and in the boot process you first see the journal recovering with
> lots of inodes deleted, then the fsck yielding lots of errors, trying to correct
> them and then finally a reboot. Awful :(

save your data and type 'sync' to flush to the HD before you try to test something likely to crash.

What version of the radeon driver are you using?

(In reply to comment #10)
> > Strange I found, that
> > a.) X always started in 1280x1024 clone mode, however only showing 1024x768 on
> > both monitors (with a virtual screen of 1280x1024)
>
> This is because your first metamode is 1024x768 and the largest metamode is
> 1280x1024.
>

Uh, o.k. How can I start X then in 1024x768 clone mode without any virtual screen? This would be my default setup. Only when I connect to the docking station with the external monitor, I would like to switch to 1280x1024 clone mode (with the virtual screen on the laptop LCD) or the dual-head-setup.

> > b.) xrandr only gave me the possibility to switch between 1024x768 and 1280x1024
> > clone mode (which at least gave me the expected results), not the xinerama like
> > setup. All windows were repositioned correctly.
>
> You didn't specify a dualhead metamode. 1024x768+1280x1024 is a clone mode.
> 1024x768-1280x1024 is a dualhead mode.
>

Oh, sh... Typo. I will correct that.

> > c.) switching with CTRL-ALT-Keypad+/- switched through all three modes but in
> > the clone modes there was always a 1280x1024 virtual screen area. No windows
> > were repositioned (and the KDE kicker did not adapt either)
>
> only xrandr resizes the desktop. CTRL-ALT-Keypad+/- just changes the mode; the
> desktop remains the same size.
>

O.K. so one needs a combination of xrandr and the CTRL-ALT-Keypad in order to switch between these three modes?

> >
> > 2.) What I always hate with such freezes is that data is lost on the harddisk.
> > I'm using ext3 and in the boot process you first see the journal recovering with
> > lots of inodes deleted, then the fsck yielding lots of errors, trying to correct
> > them and then finally a reboot. Awful :(
>
> save your data and type 'sync' to flush to the HD before you try to test
> something likely to crash.
>

I thought ALT-SYSRQ-S + ALT-SYSRQ-U should do (sync and remount ro). But that did not work. Next time I'll try to issue "sync" and see what happens...

> What version of the radeon driver are you using?
>

6.5.8.0-1 from Debian unstable.
Xorg.0.log says:

(II) LoadModule: "radeon"
(II) Loading /usr/lib/xorg/modules/drivers/radeon_drv.so
(II) Module radeon: vendor="X.Org Foundation"
compiled for 7.0.0, module version = 4.0.3
Module class: X.Org Video Driver
ABI class: X.Org Video Driver, version 0.8
(II) LoadModule: "ati"
(II) Loading /usr/lib/xorg/modules/drivers/ati_drv.so
(II) Module ati: vendor="X.Org Foundation"
compiled for 7.0.0, module version = 6.5.8
Module class: X.Org Video Driver
ABI class: X.Org Video Driver, version 0.8

(II) ATI: ATI driver (version 6.5.8) for chipsets: ati, ativga

Does that help or do you need other info?

Michael

(In reply to comment #11)
> (In reply to comment #10)
> > > Strange I found, that
> > > a.) X always started in 1280x1024 clone mode, however only showing 1024x768 on
> > > both monitors (with a virtual screen of 1280x1024)
> >
> > This is because your first metamode is 1024x768 and the largest metamode is
> > 1280x1024.
> >
> Uh, o.k. How can I start X then in 1024x768 clone mode without any virtual
> screen? This would be my default setup. Only when I connect to the docking
> station with the external monitor, I would like to switch to 1280x1024 clone
> mode (with the virtual screen on the laptop LCD) or the dual-head-setup.

Unfortunately that is a limitation of mergedfb, which is a big hack to begin with. you'd either need to hack the driver to pre-reserve the additional desktop space at the beginning or add xrandr -s 0 to your .xinitrc to resize the desktop when you login. I think keithp is working on an improved version of mergedfb that may address this.

> > > c.) switching with CTRL-ALT-Keypad+/- switched through all three modes but in
> > > the clone modes there was always a 1280x1024 virtual screen area. No windows
> > > were repositioned (and the KDE kicker did not adapt either)
> >
> > only xrandr resizes the desktop. CTRL-ALT-Keypad+/- just changes the mode; the
> > desktop remains the same size.
> >
>
> O.K. so one needs a combination of xrandr and the CTRL-ALT-Keypad in order to
> switch between these three modes?
>

depending on what you want to do xrandr should be fine. CTRL-ALT-Keypad uses the vidmode extension to change the mode. It came along before xrandr could resize the desktop to match the mode.

> > >
> > > 2.) What I always hate with such freezes is that data is lost on the harddisk.
> > > I'm using ext3 and in the boot process you first see the journal recovering with
> > > lots of inodes deleted, then the fsck yielding lots of errors, trying to correct
> > > them and then finally a reboot. Awful :(
> >
> > save your data and type 'sync' to flush to the HD before you try to test
> > something likely to crash.
> >
>
> I thought ALT-SYSRQ-S + ALT-SYSRQ-U should do (sync and remount ro). But that
> did not work. Next time I'll try to issue "sync" and see what happens...

that won't work if your system is already locked solid ;) sync is a preemptive method.

>
> > What version of the radeon driver are you using?
> >
>
> 6.5.8.0-1 from Debian unstable.
>
> Xorg.0.log says:
>
> (II) LoadModule: "radeon"
> (II) Loading /usr/lib/xorg/modules/drivers/radeon_drv.so
> (II) Module radeon: vendor="X.Org Foundation"
> compiled for 7.0.0, module version = 4.0.3
> Module class: X.Org Video Driver
> ABI class: X.Org Video Driver, version 0.8
> (II) LoadModule: "ati"
> (II) Loading /usr/lib/xorg/modules/drivers/ati_drv.so
> (II) Module ati: vendor="X.Org Foundation"
> compiled for 7.0.0, module version = 6.5.8
> Module class: X.Org Video Driver
> ABI class: X.Org Video Driver, version 0.8
>
> (II) ATI: ATI driver (version 6.5.8) for chipsets: ati, ativga
>

That should be the most recent stable 7.0 release, IIRC.

(In reply to comment #12)
>
> Unfortunately that is a limitation of mergedfb, which is a big hack to begin
> with. you'd either need to hack the driver to pre-reserve the additional
> desktop space at the beginning or add xrandr -s 0 to your .xinitrc to resize the
> desktop when you login. I think keithp is working on an improved version of
> mergedfb that may address this.
>

Well, o.k., this will certainly do. I'd be happy to use MergedFB at all. :)

> depending on what you want to do xrandr should be fine. CTRL-ALT-Keypad uses
> the vidmode extension to change the mode. It came along before xrandr could
> resize the desktop to match the mode.

I'll try again with my typo corrected. That is if we can get rid of the freezing bug...

>
> That should be the most recent stable 7.0 release, IIRC.
>

O.K. What can I do to find the bug?

Cheers,
Michael

(In reply to comment #13)
>
> O.K. What can I do to find the bug?
>

If possible, using ErrorF() or GDB, try and find out what function in the radeon driver the crash is happening in.

O.K. I will do this. Will, however, take some time, as I have no idea how to do this. :-/

Is there somewhere a tutorial about using ErrorF() or GDB which could function as a start for me?

Thanks,
Michael

(In reply to comment #15)
> O.K. I will do this. Will, however, take some time, as I have no idea how to do
> this. :-/
>

ErrorF() is easy. just add them to the source code when you want to print something. so you might do something like:

in RADEONRestoreCrtc2()
ErrorF("RestoreCrtc2 called");
...
ErrorF("about to write crtc2 regs");
...
ErrorF("RestoreCrtc2 finished");
etc.

the messages will show up in your log. then you can look at your log and get an idea of where the problem is.

> Is there somewhere a tutorial about using ErrorF() or GDB which could function
> as a start for me?

Using GDB with the X server:
http://wiki.x.org/wiki/DebuggingTheXserver

O.K. Thanks for the info. I had a short look over it. I will try whatever I can, but I guess it will take some time... I'm not absolutely sure that I can come up with some results.
Both methods seem to have some hurdles that I might not overcome:

1.) ErrorF(): Necessary to understand the source code. I guess that's beyond my capabilities. :( But I will have a look at it (not even sure that I can successfully compile a complete xserver. Might work more easily with apt-src install though...). I guess I need to find the code that gets executed when the modes are switched. uh, uh...

2.) GDB: A typical debugger it seems. However, if I am not mistaken, this approach only works with crashed xservers, but not with a completely frozen system. The relevant data is taken after the server gets a signal, e.g. SIGSEGV. But when the whole system is frozen, even a remote gdb session via ssh won't work, will it?

So, I had a longer look at GDB. To my mind, this won't work, if I'm not overlooking something.

What really might be a good idea is to insert the ErrorF() calls in some prominent functions, just to identify in what function the freeze happens.

With a reconfigured syslogd, one could broadcast the messages and log them on another computer on the net, so that all messages are logged for sure and won't get lost if there is no time left for a sync.

What would be necessary, would be some educated guess from the developers what functions are good candidates, in which such errorF() calls should be inserted. I have no chance to understand the code and to identify places where such calls would make sense for this debug approach.

Would anyone who understands the code be interested in helping in this approach?

Cheers,
Michael

(In reply to comment #18)
>
> With a reconfigured syslogd, one could broadcast the messages and log them on
> another computer on the net, so that all messages are logged for sure and won't
> get lost if there is no time left for a sync.

As Roland pointed out in another bug, re-mounting the filesystem containing the log file with -o sync should be easier and do the trick.

> What would be necessary, would be some educated guess from the developers what
> functions are good candidates, in which such errorF() calls should be inserted.

I'd start in RADEONSwitchMode().

(In reply to comment #19)
>
> I'd start in RADEONSwitchMode().

O.K. I'll see if I can find this, insert some meaningful ErrorF() messages and hopefully get some sensible information back :-)

(However, not this week, I'm on business travel. Next week... )

Cheers,
Michael

Created attachment 5994
Xorg.log file of freeze
So, here are the first results. (I'm abroad at the moment, so I have limited
possibilities for testing here.)
Setup: The Radeon driver was modified with ErrorF() function calls as shown
here:
_X_EXPORT Bool RADEONSwitchMode(int scrnIndex, DisplayModePtr mode, int flags)
{
    ScrnInfoPtr pScrn = xf86Screens[scrnIndex];
    RADEONInfoPtr info = RADEONPTR(pScrn);
    Bool tilingOld = info->tilingEnabled;
    Bool ret;
#ifdef XF86DRI
    Bool CPStarted = info->CPStarted;

    if (CPStarted) {
        DRILock(pScrn->pScreen, 0);
        RADEONCP_STOP(pScrn, info);
    }
#endif

    RADEONTRACE(("RADEONSwitchMode() !n"));
    ErrorF ("RADEONSwitchMode entered\n");

    if (info->allowColorTiling) {
        if (info->MergedFB) {
            if ((((RADEONMergedDisplayModePtr)mode->Private)->CRT1->Flags &
                 (V_DBLSCAN | V_INTERLACE)) ||
                (((RADEONMergedDisplayModePtr)mode->Private)->CRT2->Flags &
                 (V_DBLSCAN | V_INTERLACE)))
                info->tilingEnabled = FALSE;
            else info->tilingEnabled = TRUE;
        }
        else {
            info->tilingEnabled = (mode->Flags & (V_DBLSCAN | V_INTERLACE)) ?
                FALSE : TRUE;
        }
#ifdef XF86DRI
        if (info->directRenderingEnabled && (info->tilingEnabled != tilingOld))
        {
            RADEONSAREAPrivPtr pSAREAPriv;
            drmRadeonSetParam radeonsetparam;

            memset(&radeonsetparam, 0, sizeof(drmRadeonSetParam));
            radeonsetparam.param = RADEON_SETPARAM_SWITCH_TILING;
            radeonsetparam.value = info->tilingEnabled ? 1 : 0;
            if (drmCommandWrite(info->drmFD, DRM_RADEON_SETPARAM,
                                &radeonsetparam, sizeof(drmRadeonSetParam)) < 0)
                xf86DrvMsg(pScrn->scrnIndex, X_ERROR,
                           "[drm] failed changing tiling status\n");
            pSAREAPriv = DRIGetSAREAPrivate(pScrn->pScreen);
            info->tilingEnabled = pSAREAPriv->tiling_enabled ? TRUE : FALSE;
        }
#endif
    }
    ErrorF ("RADEON tiling finished\n");

    if (info->accelOn)
        RADEON_SYNC(info, pScrn);
    ErrorF ("RADEON_SYNC finished\n");

    if (info->FBDev) {
        RADEONSaveFBDevRegisters(pScrn, &info->ModeReg);
        ret = fbdevHWSwitchMode(scrnIndex, mode, flags);
        RADEONRestoreFBDevRegisters(pScrn, &info->ModeReg);
    } else {
        info->IsSwitching = TRUE;
        ret = RADEONModeInit(xf86Screens[scrnIndex], mode);
        info->IsSwitching = FALSE;
    }
    ErrorF ("RADEON info->FBdev finished\n");

    if (info->tilingEnabled != tilingOld) {
        /* need to redraw front buffer, I guess this can be considered a hack ? */
        xf86EnableDisableFBAccess(scrnIndex, FALSE);
        RADEONChangeSurfaces(pScrn);
        xf86EnableDisableFBAccess(scrnIndex, TRUE);
        /* xf86SetRootClip would do, but can't access that here */
    }
    ErrorF ("RADEON ENABLEDISABLE FB Access finished\n");

    if (info->accelOn) {
        RADEON_SYNC(info, pScrn);
        RADEONEngineRestore(pScrn);
    }
    ErrorF ("RADEON Engine Restore finished\n");

#ifdef XF86DRI
    if (CPStarted) {
        RADEONCP_START(pScrn, info);
        DRIUnlock(pScrn->pScreen);
    }
    ErrorF ("RADEON DRIUnlock finished\n");
#endif

    /* Since RandR (indirectly) uses SwitchMode(), we need to
     * update our Xinerama info here, too, in case of resizing
     */
    if(info->MergedFB) {
        RADEONUpdateXineramaScreenInfo(pScrn);
    }
    ErrorF ("RADEON UpdateXinerama finished\n");

    return ret;
}
The disk was remounted with the sync option (and verified via /proc/mounts).
Screen Depth was changed to 24 bits and X restarted.
Lbreakout2 was started and - using the "f" key - resolution was switched appr.
5-6 times between fullscreen (640x480) and window mode (1024x768). Switching
was stopped when the system froze. The Xorg.log file is attached. (Sorry,
output is not nicely ordered, as I forgot the carriage returns first...)
Interestingly, the Xorg.log file does not show any indication that the freeze
happens between any of the function calls in RADEONSwitchMode(). To my mind,
there are four possible explanations:
1.) Even with the "sync" option, ErrorF messages are written to disk with a
short delay, resulting in lost messages when the freeze occurs.
or
2.) The freeze "takes some time", e.g. one millisecond, so that, although
initiated in RADEONSwitchMode, the function may finish before the machine
actually locks up.
or
3.) RADEONSwitchMode does something wrong, so that registers or some settings
are screwed up, but the freeze only happens when some other function somewhere
else tries to use these screwed-up values.
or
4.) The freeze has nothing to do with RADEONSwitchMode().
Any ideas? How should I continue best?
I have done the lbreakout test as Michel suggested that it's probably due to
the same bug and it's easier to test. If it makes a big difference I can repeat
the test but use the MergedFB setup and try switching resolutions with xrandr.
Cheers,
Michael
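
Explanation 1.) above turns on whether a trace line has actually reached the disk by the time the machine locks up. As a minimal sketch of that idea, and not something taken from the X server or the radeon driver, the standalone C helper below writes one checkpoint line and calls fsync() before returning. The name trace_checkpoint and the choice to log through a raw file descriptor are assumptions made for illustration; ErrorF() itself may buffer its output differently.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not part of the X server or the radeon driver:
 * append one checkpoint line to an already-open file descriptor and ask
 * the kernel to flush it to the storage device before returning.
 * Returns 0 on success, -1 if a write or the sync fails. */
static int trace_checkpoint(int fd, const char *msg)
{
    size_t len = strlen(msg);
    size_t done = 0;

    /* write() may accept fewer bytes than requested, so loop until done. */
    while (done < len) {
        ssize_t n = write(fd, msg + done, len - done);
        if (n < 0)
            return -1;
        done += (size_t)n;
    }
    if (write(fd, "\n", 1) != 1)
        return -1;

    /* fsync() blocks until the kernel reports the file data as written
     * out, so the line no longer sits only in the page cache. */
    return fsync(fd);
}
```

Called between the steps of a suspect function, the last line found in the file after a reboot bounds where execution stopped. The cost is one slow synchronous write per checkpoint, and a drive with its own volatile write cache could in principle still lose the final line.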
I suspect the log writes may get buffered somewhere, so we're missing the last ones.

As for where to go from here, the best suggestion I can make is for someone to do a git bisect, as someone on https://launchpad.net/distros/ubuntu/+source/xserver-xorg-driver-ati/+bug/47775 claims this only started happening recently. Michael, you could try an older version of xserver-xorg-video-ati first, e.g. from http://snapshot.debian.net/archive/2006/05/03/debian/pool/main/x/xserver-xorg-video-ati/

PS: Please try not to clutter up your comments. Attaching a diff would be more useful than pasting code, for example. Also, it's not clear for someone who just reads this bug how the 'lbreakout test' relates to this bug, and until we've confirmed that other bug and this one are indeed one and the same, it's better not to mix up comments between them.

Well, having read Marius' bug entry I actually thought that he meant with xorg 6.8.2 there were no problems, but now, after I have tried 6.5.7.3-3, I am happy to say:

Yes, it is a regression! Works flawlessly with MergedFB -> In an extensive test last night I did not manage to freeze the system at all.

And yes, bug #7251 seems to have the same cause.

*** Bug 7251 has been marked as a duplicate of this bug. ***

Would be really nice if someone could do a git bisect. It's most likely related to Ben's memmap changes though - Ben, any other ideas for tracking this down?

Same here with Thinkpad X31 (Radeon M6) and MergedFB (clone mode), 1280x1024 virtual desktop size, 1280x1024 resolution on CRT and 1024x768 res on LCD.

With xserver-xorg-video-ati 6.5.8.0-1 as of Debian xorg 7.0.22 I consistently get random system freezes (even no SysRq anymore) and I even don't have to switch modes to trigger it. Just use the system and after random time it freezes. I already disabled DRI to sort that out.

With xserver-xorg-video-ati 6.5.7.3-3 as suggested above, it runs stable again (thanks for that link!).

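
For readers unfamiliar with the bisect suggestion: the idea is a binary search over the history between a known-good and a known-bad revision. The sketch below shows that search in plain C over a linear, numbered history. It only illustrates the principle and is not how git implements it; first_bad_revision, is_bad_fn and bad_from_threshold are hypothetical names, and in practice the predicate is a manual step (build the driver at that revision, install it, and try to provoke the freeze).

```c
#include <stddef.h>

/* Hypothetical predicate type: returns nonzero if the build at history
 * index `rev` shows the problem. */
typedef int (*is_bad_fn)(size_t rev, void *ctx);

/* Binary search for the first bad revision in a linear history.
 * Preconditions (assumed, not checked): good < bad, revision `good` is
 * known good, revision `bad` is known bad, and every revision after the
 * first bad one is bad as well. */
static size_t first_bad_revision(size_t good, size_t bad,
                                 is_bad_fn is_bad, void *ctx)
{
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad(mid, ctx))
            bad = mid;   /* the regression is at mid or earlier */
        else
            good = mid;  /* the regression was introduced after mid */
    }
    return bad;
}

/* Stand-in predicate for demonstration only: treats every revision at or
 * after the threshold pointed to by ctx as bad. */
static int bad_from_threshold(size_t rev, void *ctx)
{
    return rev >= *(const size_t *)ctx;
}
```

Each test halves the remaining range, so about seven builds are enough to search a hundred revisions. With an intermittent freeze, a revision should only be called good after a generous amount of testing, since one wrong answer sends the search into the wrong half.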
(In reply to comment #26)
> With xserver-xorg-video-ati 6.5.8.0-1 as of Debian xorg 7.0.22 I consistently
> get random system freezes (even no SysRq anymore) and I even don't have to
> switch modes to trigger it. Just use the system and after random time it freezes.

Please try current xf86-video-ati git. Both the master and ati-1-0-branch branches have a fix that might help with this.

Other than that, a git bisect might still be useful.

(In reply to comment #27)
>
> Please try current xf86-video-ati git. Both the master and ati-1-0-branch
> branches have a fix that might help with this.
>

Hmm, how to use git? What address to use? I guess, it is something like
git clone XXX ??

I like to try it. I'm using xorg 7.0, though, as this is the latest in Debian.

> Other than that, a git bisect might still be useful.

How can that be done?

Thanks,
Michael

(In reply to comment #28)
> Hmm, how to use git? What address to use? I guess, it is something like
> git clone XXX ??

Yes,

git clone git://git.freedesktop.org/git/xorg/driver/xf86-video-ati

> I like to try it. I'm using xorg 7.0, though, as this is the latest in Debian.

Then you may have to do

git checkout ati-1-0-branch

in the xf86-video-ati directory before building.

> > Other than that, a git bisect might still be useful.
>
> How can that be done?

man git-bisect, or google for a HOWTO.

O.K. done that. The git drivers, however, still freeze the system when switching resolution via xrandr.

Now I am going to have a look at git-bisect and see if I understand enough to make any use of it.

Cheers,
Michael

O.K. did the git-bisect, although I now ask myself if this was really necessary at all, as
1.) the culprit is really quite obvious, if one knows the code and the changes that have happened between 6.5.7.3 and 6.5.8.0 and
2.) it is such a big patch that we're more or less at the beginning again.
The patch is:
http://gitweb.freedesktop.org/?p=xorg-driver-xf86-video-ati;a=commit;h=5c141bb15d1163e04c012a0cdf0699d534f0be37

So, how do we continue?

Ben, any ideas why the memmap changes could cause freezes on setting a mode? AFAICT the only major difference is that RADEONRestoreMemMapRegisters() gets called in the process, maybe the unconditional writes to DISPLAY_BASE_ADDR are problematic?

I meant to say 'DISPLAY_BASE_ADDR and friends', of course.

(In reply to comment #32)
> Ben, any ideas why the memmap changes could cause freezes on setting a mode?
> AFAICT the only major difference is that RADEONRestoreMemMapRegisters() gets
> called in the process, maybe the unconditional writes to DISPLAY_BASE_ADDR are
> problematic?

The patch also explicitly disables crtc memory access when updating those registers, which one would assume would be the right thing to do. However, IIRC, benh had some hangs with regard to this, which were worked around at the time via some usleeps (which we later worked around some other way). Perhaps it needs to be revisited.

(In reply to comment #34)
> The patch also explicitly disables crtc memory access when updating those
> registers [...]

Only when updating MC_FB_LOCATION and friends, which should only happen on server startup, not when switching modes during a generation.

So, is there nothing that can be done?

Is there a chance that this megapatch can be divided into two halves that can both be checked to identify the code responsible for the freeze? Or anything else?

(Seems not too many people use mergedfb on an ATI setup anyway, considering that not really a lot of people are complaining about these crashes ... )

Well, it looks like a lot of people using Ubuntu Dapper are having this problem (seems to be especially bad using IBM Thinkpads with ATI cards).
https://launchpad.net/distros/ubuntu/+source/xserver-xorg-driver-ati/+bug/47775

Being one of the sufferers, I find the bug very critical as it causes total freeze and data loss.
Please don't give up guys! If there is something I can do to help, please let me know.

We shouldn't hit any of that mem map change code in that case since FB_LOCATION and AGP_LOCATION aren't changed... Not sure what's being hit. I suspect it may be some "safety" bits I added that trigger one of the 18902734897324 bugs in the radeon chips ....

Try commenting out that bit in RestoreCrtc2Registers and let me know:

/* We prevent the CRTC from hitting the memory controller until
 * fully programmed
 */
OUTREG(RADEON_CRTC2_GEN_CNTL, crtc2_gen_cntl | RADEON_CRTC2_DISP_REQ_EN_B);

No, pity, that did not help. Still freezes.

Any other idea? What's for sure is that the error was newly introduced in this one megapatch. So can we divide that somehow into two or more parts for testing?

Cheers,
Michael

Btw. I've upgraded to xserver-xorg 1.1.0 (as it is now in Debian experimental).

(In reply to comment #38)
> We shouldn't hit any of that mem map change code in that case since FB_LOCATION
> and AGP_LOCATION aren't changed...

As I pointed out in comment #34, it always writes to *some* registers. I wouldn't expect those to cause problems, but...

> I suspect it may be some "safety" bits I added

Were those already in your very first memmap commit?

No. The whole thing to properly stop CRTCs (and wait for them to be stopped) etc... is more recent and did actually fix lockups on some machines here.

I'm not sure what's going on at this point. There is something specific to 7000's it seems (and possibly M6 and M7). I'm getting loads of reports from ubuntu that they lock up randomly, and I have a co-worker here where it will lock up right away at startup if trying to enable mergedfb (before anything gets displayed, the whole machine is down, hard locked).

(In reply to comment #41)

So Ben, any idea what in your very first memmap commit could cause this, or how to track it down?

Well, there were issues with the first mmap commit, typically related to the chip still fetching from a mixed bag of the old and new locations, thus causing crazy PCI accesses etc... That's what my subsequent commits have been attempting to fix by shutting down as much as possible of the things that hit the MC (like CRTCs etc...).

At this point, the best would be to take a snapshot of all the relevant values (MC*, *OFFSET, *BASE_ADDR...) around the mode settings, and that might light a bulb... though the problem seems to be quite specific to M7s and earlier, so I wonder if we might be hitting some other (but related) issue (like some bit of the chip caching the old address or whatever).

At this point, I'm desperate for some help from ATI and/or somebody who has a reproducible lockup and a PCI analyzer between the card and the host.

(In reply to comment #43)

I suspect you're still looking at this too broadly, and not taking into account all the information we have. E.g., we're talking about runtime mode changes, so the values the memmap changes were about shouldn't change.

So, keeping this in mind and looking only at the very first memmap commit, which part(s) do you think could lead to significantly different behaviour (different order of register access, ...) during a runtime modeswitch?

Created attachment 6862
Restore some checks before calling RADEONRestoreCommonRegisters()

Michael, can you try this patch? It looks like we lost some checks before calling RADEONRestoreCommonRegisters().

Marking as valid bug, per http://wiki.x.org/wiki/XorgTriage .

(In reply to comment #45)
> Created an attachment (id=6862)
> Restore some checks before calling RADEONRestoreCommonRegisters()
>
> Michael, can you try this patch? It looks like we lost some checks before
> calling RADEONRestoreCommonRegisters().

Sorry, took some time. I first tried with git, but somehow the driver did not work. I then used the 6.6.2 driver code from debian experimental.
However, I'm really sorry. System still freezes when changing resolution in MergedFB mode :( :(

can you try xf86-video-ati git head for this?

Dave, this looks VERY promising!

I don't know what you have done (there are quite a lot of changes in git after 6.6.2), but whatever it was, it seems to be exactly what was needed!!

So far no freezes using MergedFB on a Radeon M6 (IBM Thinkpad X31), and I have tried quite a lot of resolution changes!

Now,..., next step will be to get the KDE guys to eliminate some annoying bugs on the desktop when MergedFB is enabled :-D ;-)

Greetings from Switzerland
Michael

I'm not sure if I can/should change the bug status, but I'll try to put it to "Resolved" and "Fixed".

Cheers,
Michael

I'd like to test xf86-video-ati git head too. Could someone gently point me to the relevant documentation for building it, as I can't seem to get past autogen.sh? Also, can I build just the driver, or do I need to build the whole server? I currently use X.org 7.0.0 from Ubuntu Dapper.

Hi Marius,

I haven't found much documentation. Everything I know is basically described by Michel in comment 29. This only builds the ati driver. After the build you find the driver files somewhere in src/.libs/ . So, it's a hidden directory. The files are all named *.so, like ati_drv.so. I searched for them in my root dir, found them in /lib/xorg/drivers/modules and simply exchanged them with the newly built ones.

If you are having problems with autogen, check whether you have all the corresponding packages (I think they are called autoconf and automake; automake1.7 if I remember correctly). And you need all the relevant xorg dev packages.

It takes some time to set this up. If you don't like to install all these packages on your system (e.g. because it's a production system like mine), then you can also set up a virtual machine with vmware and use this as a build system (this is what I do).

Hope this helps a bit.

Cheers,
Michael