Summary: | Intermittent fail due to improper memory access, SERR generated when starting XWindow in Linux RH4 | |
---|---|---|---
Product: | xorg | Reporter: | jon chaplick <chaplick>
Component: | Driver/Radeon | Assignee: | Benjamin Herrenschmidt <benh>
Status: | RESOLVED FIXED | QA Contact: |
Severity: | normal | |
Priority: | high | CC: | alexdeucher, bd, benh, diegocg, hyu, michel
Version: | git | |
Hardware: | x86 (IA32) | |
OS: | Linux (All) | |

Description
jon chaplick
2005-07-30 03:53:25 UTC
Please attach the patch to this entry, Jon. I'm going to commit it to HEAD shortly.

Created attachment 3191
SSH key for Andrew Cowie

CVSROOT: /cvs/xorg
Module name: xc
Changes by: daenzer@gabe.freedesktop.org 05/07/29 12:45:14

Log message:
* programs/Xserver/hw/xfree86/drivers/ati/radeon_driver.c: (RADEONSetFBLocation): bugzilla #3911 (https://bugs.freedesktop.org/show_bug.cgi?id=3911) attachment #3191 (http://bugs.freedesktop.org/attachment.cgi?id=3191) Disable bus mastering while updating MC_FB_LOCATION and friends to prevent the X server from hanging on startup every now and then under some circumstances. (ATI Technologies Inc.)

Modified files:
./: ChangeLog
xc/programs/Xserver/hw/xfree86/drivers/ati/: radeon_driver.c

Revision Changes Path
1.1161 +10 -0 xc/ChangeLog
1.64 +9 -1 xc/programs/Xserver/hw/xfree86/drivers/ati/radeon_driver.c

Comment on attachment 3191
SSH key for Andrew Cowie
Note that the patch was made against and tested on 6.8 in the first place.
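
As a rough illustration of the ordering the commit log describes (set the bus-master-disable bit, update MC_FB_LOCATION and MC_AGP_LOCATION, then restore BUS_CNTL), here is a minimal C sketch. It is not the code from attachment 3191: the register file is simulated in an array, and every `sim_`/`SIM_` name, the enum indices, and the bit position used for the bus-master-disable flag are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated register file. The indices are illustrative only and are
 * NOT real MMIO offsets of any Radeon chip. */
enum sim_reg {
    SIM_BUS_CNTL,
    SIM_MC_FB_LOCATION,
    SIM_MC_AGP_LOCATION,
    SIM_REG_COUNT
};

/* Assumed bit position for "bus master disable"; check the real
 * register headers before relying on it. */
#define SIM_BUS_MASTER_DIS (1u << 6)

uint32_t sim_regs[SIM_REG_COUNT];

/* Counts memory-map writes made while bus mastering was still enabled. */
unsigned sim_unsafe_mc_writes;

uint32_t sim_inreg(enum sim_reg r)
{
    return sim_regs[r];
}

void sim_outreg(enum sim_reg r, uint32_t v)
{
    if (r != SIM_BUS_CNTL && !(sim_regs[SIM_BUS_CNTL] & SIM_BUS_MASTER_DIS))
        sim_unsafe_mc_writes++;
    sim_regs[r] = v;
}

/* Update the memory map with bus mastering disabled, then put BUS_CNTL
 * back to whatever it was before. */
void sim_set_fb_location(uint32_t fb_loc, uint32_t agp_loc)
{
    uint32_t bus_cntl = sim_inreg(SIM_BUS_CNTL);

    sim_outreg(SIM_BUS_CNTL, bus_cntl | SIM_BUS_MASTER_DIS);
    assert(sim_inreg(SIM_BUS_CNTL) & SIM_BUS_MASTER_DIS);

    sim_outreg(SIM_MC_FB_LOCATION, fb_loc);
    sim_outreg(SIM_MC_AGP_LOCATION, agp_loc);

    sim_outreg(SIM_BUS_CNTL, bus_cntl);
}
```

Calling `sim_set_fb_location()` with arbitrary sample values leaves `sim_unsafe_mc_writes` at zero and `SIM_BUS_CNTL` at its previous value, which is the ordering the commit message is after. Later comments in this report question whether touching BUS_CNTL is the right fix at all.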

Gentlemen, please have a look at Bug #4324. It seems that the above fix may in fact trigger a similar hang-on-startup problem on other systems.

Ok, I'm not sure this fix is correct. I have a slightly different analysis of the problem. I think there is no such thing as a memory request being "blocked waiting for a valid MC_FB_LOCATION"; the value of that register is always "valid" as far as the chip is concerned (unless your top is below your bottom, but that should never happen unless your chip was really in bad shape in the first place). I think what happens is that you have a scanout in progress via a CRTC at the point where you change MC_FB_LOCATION. At this point, the scanout continues from the old address and thus generates bus master reads, until you change DISPLAY_BASE_ADDRESS (and all other registers that may be relevant, like DISP2_BASE_* etc...). Imho, the proper fix is to disable CRTCs, not disable bus mastering. In fact, there is even a bit in the CRTC registers (and in one of the LVDS ones as well iirc) to prevent them from doing any memory access.

Created attachment 3620
Rework setup of the memory map

This patch reworks how the memory map is initialized to do it as part of the mode setting and to properly disable CRTCs before moving things around. We don't touch BUS_CNTL anymore as this caused lockups with PCI GART. We might want to add a bit more safety there in the future, like disabling the capture engines too. In the UseFBDev case, it changes the behaviour as we no longer try to move things around (we use whatever the fbdev driver had set up). This appears to work on the few machines I tested so far. Please test for regressions as I intend to commit before 7.0 is final.

(In reply to comment #7)
> Created an attachment (id=3620)
> Rework setup of the memory map

I like the approach of this patch.
The only cosmetic comment I have is that

+ /* Default to existing values */
+ save->mc_fb_location = INREG(RADEON_MC_FB_LOCATION);
+ save->mc_agp_location = INREG(RADEON_MC_AGP_LOCATION);

can be removed in RADEONInitMemMapRegisters() because those fields are always initialized later on in that function.

Adding Hui to the CC list, maybe he has other comments.

This seems to fix a hang I had been having when loading the serverworks AGP kernel module (not loading it made the system work fine). With this patch, I can run with the serverworks agp module loaded (Radeon 9200 SE graphics card) and no hangs (they were easily reproducible when enabling kompmgr and I can't reproduce them anymore).

This may be unrelated, but I see this after setting Option "AGPMode" "2" in my xorg.conf file:

agpgart: X tried to set rate=x12. Setting to AGP3 x8 mode.
agpgart: X requested AGPx8 but bridge not capable.
agpgart: Putting AGP V2 device at 0000:00:00.1 into 1x mode
agpgart: Putting AGP V2 device at 0000:01:00.0 into 1x mode

The Xorg.0.log file confirms I'm not configuring it wrong:

(**) RADEON(0): Option "AGPMode" "2"
(**) RADEON(0): Option "AGPFastWrite" "True"
(**) RADEON(0): Option "EnablePageFlip" "True"
(**) RADEON(0): Option "MonitorLayout" "CRT, CRT"
(**) RADEON(0): Option "RenderAccel" "True"
(**) RADEON(0): Option "AccelMethod" "EXA"
[...]
(**) RADEON(0): AGP 2x mode is configured
(**) RADEON(0): Enabling AGP Fast Write

Ok, forget what I said about this patch killing the hangs I was having - they still happen, they're just more difficult to reproduce :/

Hrm.. I'm not sure those hangs are related to the bug I'm trying to fix then...

I have a new patch, but I'm still facing a few regressions.
I'll attach it to this bug once I've figured those out.

New patch is in upstream, please test.

This seems to work fine for me (Debian's 6.9 ati driver hangs my box after using it for some minutes), except that apparently I can't enable EXA - if I do, X.org will not say anything about exa (grep -i exa returns nothing), it uses XAA and everything feels much slower than when I comment out the EXA line in xorg.conf (I can't say if I did something wrong when I compiled it though :P)

Can you attach your log and config files?

Created attachment 4697
X.org log
Of course not! How you dare.... :P
Created attachment 4698
xorg.conf when Accelmethod = EXA
The startup log when I comment out accelmethod = EXA is *exactly* the same (no diff) as when it's enabled, except that everything redraws much slower (i.e. as if I were using the vesa driver or something).

Diff between the accelmethod=exa and no accelmethod specified:

--- /var/log/Xorg.0.log.old 2006-02-21 01:29:15.000000000 +0100
+++ /var/log/Xorg.0.log 2006-02-21 01:29:26.000000000 +0100
@@ -12,7 +12,7 @@
 Markers: (--) probed, (**) from config file, (==) default setting,
 (++) from command line, (!!) notice, (II) informational,
 (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
-(==) Log file: "/var/log/Xorg.0.log", Time: Tue Feb 21 01:29:10 2006
+(==) Log file: "/var/log/Xorg.0.log", Time: Tue Feb 21 01:29:23 2006
 (==) Using config file: "/root/xorg.conf"
 (==) ServerLayout "Default Layout"
 (**) |-->Screen "Default Screen" (0)
@@ -735,35 +735,30 @@
 (==) RADEON(0): Write-combining range (0xf0000000,0x8000000)
 (II) RADEON(0): BIOS HotKeys Disabled
 drmOpenDevice: node name is /dev/dri/card0
-drmOpenDevice: open result is -1, (No such device or address)
-drmOpenDevice: open result is -1, (No such device or address)
-drmOpenDevice: Open failed
+drmOpenDevice: open result is 8, (OK)
 drmOpenDevice: node name is /dev/dri/card0
-drmOpenDevice: open result is -1, (No such device or address)
-drmOpenDevice: open result is -1, (No such device or address)
-drmOpenDevice: Open failed
+drmOpenDevice: open result is 8, (OK)
 drmOpenByBusid: Searching for BusID pci:0000:01:00.0
 drmOpenDevice: node name is /dev/dri/card0
 drmOpenDevice: open result is 8, (OK)
 drmOpenByBusid: drmOpenMinor returns 8
 drmOpenByBusid: drmGetBusid reports pci:0000:01:00.0
-(II) RADEON(0): [drm] loaded kernel module for "radeon" driver
 (II) RADEON(0): [drm] DRM interface version 1.2
 (II) RADEON(0): [drm] created "radeon" driver at busid "pci:0000:01:00.0"
 (II) RADEON(0): [drm] added 8192 byte SAREA at 0xf091e000
-(II) RADEON(0): [drm] mapped SAREA 0xf091e000 to 0xa7f2e000
+(II) RADEON(0): [drm] mapped SAREA 0xf091e000 to 0xa7f53000
 (II) RADEON(0): [drm] framebuffer handle = 0xf0000000
 (II) RADEON(0): [drm] added 1 reserved context for kernel
 (II) RADEON(0): [agp] Mode 0x1f00021b [AGP 0x1166/0x0007; Card 0x1002/0x5960]
 (II) RADEON(0): [agp] 8192 kB allocated with handle 0x00000001
 (II) RADEON(0): [agp] ring handle = 0xd8000000
-(II) RADEON(0): [agp] Ring mapped at 0x9f872000
+(II) RADEON(0): [agp] Ring mapped at 0x9f897000
 (II) RADEON(0): [agp] ring read ptr handle = 0xd8101000
-(II) RADEON(0): [agp] Ring read ptr mapped at 0x9f871000
+(II) RADEON(0): [agp] Ring read ptr mapped at 0x9f896000
 (II) RADEON(0): [agp] vertex/indirect buffers handle = 0xd8102000
-(II) RADEON(0): [agp] Vertex/indirect buffers mapped at 0x9f671000
+(II) RADEON(0): [agp] Vertex/indirect buffers mapped at 0x9f696000
 (II) RADEON(0): [agp] GART texture map handle = 0xd8302000
-(II) RADEON(0): [agp] GART Texture map mapped at 0x9f191000
+(II) RADEON(0): [agp] GART Texture map mapped at 0x9f1b6000
 (II) RADEON(0): [drm] register handle = 0xfe6f0000
 (II) RADEON(0): [dri] Visual configs initialized
 (II) RADEON(0): Depth moves disabled by default
@@ -848,5 +843,5 @@
 Warning: font renderer for ".pmf" already registered at priority 0
 Could not init font path element /usr/lib/X11/fonts/Speedo, removing from list!
 (II) RADEON(0): [drm] removed 1 reserved context for kernel
-(II) RADEON(0): [drm] unmapping 8192 bytes of SAREA 0xf091e000 at 0xa7f2e000
+(II) RADEON(0): [drm] unmapping 8192 bytes of SAREA 0xf091e000 at 0xa7f53000
 FreeFontPath: FPE "/usr/lib/X11/fonts/misc" refcount is 2, should be 1; fixing.

Forget this last comment, this was using the wrong config file (/root/xorg.conf) because it was being run as root.

Created attachment 4699
X.org log when accelmethod = EXA
My setup writes the log in /opt/var/log not in /var/log. Sorry for all the
noise and for my stupidity
Created attachment 4700
log diff between accelmethod=exa (-) and without it (+)
This log really shows that exa is being used (but the slow-redraw problem
persists of course)

I've been running this for days with zero problems except for performance - it's solid as a rock. I hope 6.9/7.0 is updated with this when the patch is ready, it's definitely an improvement!

It looks like current CVS makes my system hang again - it started working well on 21-2 and I've been using CVS for a while, but the latest changes make it behave the same way as before. Sadly I don't know what change started to cause this again, I'll try to go back in time and do a manual bisection search - I wish X.org used git :/

By current CVS, do you mean actually current as of today or maybe a couple of days ago? I found another cause for these hangs and committed a fix yesterday... Just make sure you are really testing the latest CVS. If it still hangs, then yes, it would be useful to know what specific change committed over the past few days is causing the hang.

Hm, right now I'm using current CVS... but the problem doesn't seem to be the CVS. Apparently, my box only hangs when I have the agp and radeon kernel modules loaded (I've tried 2.6.16-git right now). When I remove them, everything seems to work fine. I'm not doing anything which uses 3D, just a regular kde 3.5 session using firefox 1.5.

The version of the ddx driver in CVS does matter a lot. There are some very subtle and incestuous interactions going on between the X side driver and the kernel DRM around the way the memory map is set up. I would expect the DDX that is currently in CVS HEAD or ati-1-0-branch to not cause bogus bus master accesses any more, I took all sorts of precautions against it. If it still happens, then I suppose there is some more weird voodoo going on with the card and we'll need ATI to shed some light on the matter. In the meantime, please test what happens when running the current top of tree X ati driver and the current DRM CVS kernel module and tell me. Then test downgrading the kernel module to what is in 2.6.15 or 2.6.16 (doesn't matter).

(In reply to comment #26)
> Hm, right now I'm using current CVS...but the problem doesn't seems to be the
> CVS. Apparently, My box only hangs when I've the agp and radeon kernel modules
> loaded (i've tried 2.6.16-git right now).

Not that there are many possible causes for hangs; in this entry, we're only interested in SERR conditions, which can probably only be diagnosed on server type machines.

(In reply to comment #28)
> Not that there are many possible causes for hangs; [...]

Whoops, that was supposed to say 'Note that...'.

(In reply to comment #27)
> I would expect the DDX that is currently in CVS HEAD or ati-1-0-branch to not

I'm using kernel 2.6.17 git and CVS HEAD from the ati driver, and I still get hangs (which go away without loading the radeon kernel module).

(I know nothing about "SERR" conditions, I just get "hangs". Sorry, for some reason I reported my bug in this report. Is there another bug where I should take this?)

(In reply to comment #30)
> I'm using kernel 2.6.17 git and CVS HEAD from the ati driver, and i still get
> hangs (which go away without loading the radeon kernel module).

I'm not 100% sure, but IIRC the problem described in this bug happens regardless of whether the DRM is loaded.

> is there other bug where I should take this?)

Yes, e.g. bug 6271.

The original problem was fixed, all the patches have been committed and people having other problems redirected to other bugs. Resolving as fixed.