Summary: | Radeon 9250 (R280) 128 bit 256 Mbyte system lockup during DRI init with recent DRM | ||
---|---|---|---|
Product: | xorg | Reporter: | Brian Beardall <brian> |
Component: | Driver/Radeon | Assignee: | Benjamin Herrenschmidt <benh> |
Status: | CLOSED FIXED | QA Contact: | |
Severity: | normal | ||
Priority: | high | CC: | alexdeucher, benh, michel |
Version: | git | ||
Hardware: | x86 (IA32) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
Attachments: |
Description
Brian Beardall
2006-02-23 14:18:15 UTC
I have tested the card in three different computers with exactly the same results:
1x: VIA 686A northbridge (HP computer)
2x: AMD 751 northbridge (MSI 6195 and MSI 6167)

Are you using the latest X.org ati driver from CVS? If not, can you try it and send the server log on startup? I need to check the memory map.
Ben.

I have checked out the latest CVS snapshot of the xf86-video-ati driver with the new memory management patches. I knew that I would need both a new DRM snapshot and a new Xorg driver snapshot to make it work. This bug was occurring before the patches were submitted. I first experienced the radeon lockup with 256 meg of ram on the December 25, 2005 snapshot.

I will be attaching an Xorg.0.log file soon.

Created attachment 4729 [details]
Radeon 9250 128 Meg of Ram 64 bit ram interface (no lockup)
This is the card that didn't lock up the computer.
Created attachment 4730 [details]
Radeon 9250 256 Meg of Ram 128 bit ram interface (lockup)
This log isn't very long nor very helpful because the computer froze during
initialization of the video card. :(
(In reply to comment #5)
>
> This log isn't very long nor very helpful because the computer froze during
> initialization of the video card. :(

Indeed, please try mounting the filesystem containing the log file with -o remount,sync.

Created attachment 4732 [details]
Radeon 9250 256 Meg of Ram 128 bit ram interface (lockup) again
I added the option sync to the mount, and I got a good long log file.
Looks like it might be locking up when applying the memory map... I need you to test a few things:

- First, does it lock up without the DRM & AGP?
- In RADEONInitMemoryMap(), can you try hacking after these lines:
      mem_size = INREG(RADEON_CONFIG_MEMSIZE);
      if (mem_size == 0)
          mem_size = 0x800000;
  Add this line:
      mem_size /= 2;
- Separately (that is, without the above change), in RADEONRestoreMemMapRegisters(), can you add various message output with ErrorF("message"); to try to verify that the crash happens in that function, and where precisely in that function.
- Also try, in that same function, changing the order of these two lines:
      OUTREG(RADEON_MC_FB_LOCATION, restore->mc_fb_location);
      OUTREG(RADEON_MC_AGP_LOCATION, restore->mc_agp_location);
- Also in that same function again, try commenting out the call to RADEONEngineReset(pScrn).

Thanks!
Ben.

Hrm... in addition to the mem_size /= 2 change you should also do pScrn->videoRam /= 2, or the server will be out of sync with the memory map setting...

Ok, I did the debugging in the function RADEONRestoreMemMapRegisters(). I added an ErrorF(""); on every line for the last 8 lines of the function, and the function completed according to the log. I tried all of the suggestions for RADEONRestoreMemMapRegisters(), and none of them worked. The hack mem_size /= 2; worked: Xorg loaded with the hack and there was no system lockup.

The computer locks up even if I use the option Option NoAccel "1".

OK, you didn't reply about the DRM question, that is, does the latest X driver work when not loading the DRM? Can you try adding ErrorF's to RADEONScreenInit() after the call to RADEONModeInit() to see if we reach that? Also after the call to RADEONRestoreMemoryMap from RADEONModeInit() and inside RADEONRestoreMode() in case it crashes in there? (Basically trying to locate the crash more precisely.)
I would expect the DRM to make no difference at this point though, but according to your log, it does seem to make one (unless the problem is just that the log didn't get the last bits before the crash). Next would be to try hacking in the kernel DRM to find out what's going on.
Ben.

The r128 doesn't build if I opt out of the dri. I removed the kernel modules for the drm and the radeon. The computer still locks up. I'll have to do more debugging tomorrow.

The server loads the modules itself. To test without DRM, you can afaik either comment out the Load "dri" line in the Modules section or move the kernel modules out of the way (out of /lib/modules/*) so that X can't find them.

I already tested the DRM. Without the drm the computer still freezes. So right now the bug is in the X11 driver itself.

RADEONRestoreMode() {
    } else {
        RADEONRestoreMemMapRegisters(pScrn, restore);
        ErrorF("under the else\n");
        if (info->MergedFB) {
            ErrorF("MergedFB RADEONRestoreCrtc2Registers()\n");
            RADEONRestoreCrtc2Registers(pScrn, restore);
            ErrorF("MergedFB RADEONRestorePLL2Registers()\n");
            RADEONRestorePLL2Registers(pScrn, restore);
        }
        if (!pRADEONEnt->HasSecondary || pRADEONEnt->IsSecondaryRestored ||
            info->IsSwitching) {
            pRADEONEnt->IsSecondaryRestored = FALSE;
            ErrorF("HasSecondary RADEONRestoreCommonRegisters()\n");
            RADEONRestoreCommonRegisters(pScrn, restore);
            ErrorF("HasSecondary RADEONRestoreCrtcRegisters()\n");
Died here ->
            RADEONRestoreCrtcRegisters(pScrn, restore);
            ErrorF("HasSecondary RADEONRestoreFPRegisters()\n");
            RADEONRestoreFPRegisters(pScrn, restore);
            ErrorF("HasSecondary RADEONRestorePLL2Registers()\n");
            RADEONRestorePLLRegisters(pScrn, restore);

static void RADEONRestoreCrtcRegisters(ScrnInfoPtr pScrn,
                                       RADEONSavePtr restore)
{
    RADEONInfoPtr info = RADEONPTR(pScrn);
    unsigned char *RADEONMMIO = info->MMIO;
    ErrorF("Die location 1\n");
    OUTREG(RADEON_CRTC_GEN_CNTL, restore->crtc_gen_cntl);
    ErrorF("Die location 2\n");
Died at this register setting ->
    OUTREGP(RADEON_CRTC_EXT_CNTL, restore->crtc_ext_cntl,
            RADEON_CRTC_VSYNC_DIS |
            RADEON_CRTC_HSYNC_DIS |
            RADEON_CRTC_DISPLAY_DIS);

This is what I found. I learned that by replacing this line in the function RADEONRestoreMemMapRegisters()
    if (INREG(RADEON_MC_FB_LOCATION) != restore->mc_fb_location ||
        INREG(RADEON_MC_AGP_LOCATION) != restore->mc_agp_location) {
with
    if (0) {
the computer didn't lock up until much later in the card initialization.

This is the additional log, relative to the original lockup log file, with the if (0) { hack:
(**) RADEON(0): GRPH_BUFFER_CNTL from 200d7c7c to 20135c5c
(II) RADEON(0): Direct rendering enabled
(**) RADEON(0): Setting up final surfaces
(**) RADEON(0): Initializing Acceleration
(II) RADEON(0): Render acceleration enabled
(**) RADEON(0): EngineInit (32/32)
I hope this helps.

In RADEONRestoreMemMapRegisters(), can you add this bit of code at the beginning:
    ErrorF("Dump for ben:\n");
    ErrorF("%x %x %x %x %x %x %x %x %x\n",
           INREG(RADEON_MC_FB_LOCATION),
           INREG(RADEON_MC_AGP_LOCATION),
           INREG(RADEON_DISPLAY_BASE_ADDR),
           INREG(RADEON_DISPLAY2_BASE_ADDR),
           INREG(RADEON_OV0_BASE_ADDR),
           INREG(RADEON_AIC_CNTL),
           INREG(0x1d8), INREG(0x1dc), INREG(0x1e0));
And tell me what it says before the hang.

Also, if you have a bit of time to investigate, it would be useful to know which bit in CRTC_EXT_CNTL is causing the hang... you can print the previous value of the register and the value about to be written. OUTREGP as used in that code will basically write the value in crtc_ext_cntl, except that the 3 bits given as a mask are kept at whatever value was there. You can try blasting values manually yourself with OUTREG to check which actual bit set or cleared by that line is causing the lockup. If you don't know how to proceed, just add an
    ErrorF("CRTC_EXT_CNTL is: %x want: %x\n", INREG(RADEON_CRTC_EXT_CNTL), restore->crtc_ext_cntl);
and tell me what it says and I'll come up with more stuff to test.

While we are at it ...
does it work with fglrx, and in that case, do you have access to all 256Mb? (Can you send me a log with fglrx?) It would be interesting to compare how fglrx sets some registers. I'll upload an updated radeontool later today or tomorrow that dumps the stuff I'm interested in.

Re-assign to myself and mark as xorg bug, as it's really not a DRI issue.

The ATI fglrx driver only detects 128 Meg of ram on the video card. The computer locks up when I enable drm support in the fglrx driver. Xorg will load if I disable the drm for the fglrx. I haven't had very good luck with the fglrx and AGP. I have had a lot better luck with the xorg driver.

Can you post the complete fglrx log? If it detects only 128Mb of RAM, that may mean that either the card really only has that, or those cards have some weird feature/bug that makes 256Mb configs not usable without major trickery. I've asked ATI; we may or may not get an answer. I'll wait a couple of days (and check what the fglrx log says), but without a good answer from them, I think I'll limit the internal ram mapping to 128Mb, at least on pre-r300 generation chips. Sad, but at this point there is nothing else I can do.

Created attachment 4750 [details]
fglrx log
These are the options that I used when starting the X server.
Option "no_dri" "yes"
Option "no_accel" "no"
The computer locked up.
Created attachment 4751 [details]
fglrx log no accel
I started the X server with these options:
Option "no_dri" "yes"
Option "no_accel" "yes"
X completely loaded, and the computer was stable. It is like ATI doesn't even
have the driver initializing correctly.
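
For anyone retracing the CRTC_EXT_CNTL question raised earlier in this report: the masked write was described as keeping the mask bits at their current value and taking every other bit from the new value. A minimal sketch of that arithmetic follows, assuming exactly that description. The helper names are made up for illustration and are not part of the driver; the values in the usage note are arbitrary, not read from this card.

```c
#include <stdint.h>

/*
 * Value a masked register write would leave behind, following the
 * description in this report: bits set in `mask` keep their current
 * value, all other bits are taken from `val`.
 * (Hypothetical helper, not a driver function.)
 */
uint32_t masked_write_result(uint32_t current, uint32_t val, uint32_t mask)
{
    return (current & mask) | (val & ~mask);
}

/*
 * Bits that the masked write would actually flip. Each set bit is a
 * candidate to toggle on its own with a plain register write when
 * hunting for the one that triggers the hang.
 * (Hypothetical helper, not a driver function.)
 */
uint32_t changed_bits(uint32_t current, uint32_t val, uint32_t mask)
{
    return current ^ masked_write_result(current, val, mask);
}
```

With a current value of 0x700, a wanted value of 0x8048 and a mask of 0x700, the register would end up as 0x8748 and the candidate bits would be 0x8048. Feeding in the two numbers printed by the "CRTC_EXT_CNTL is: ... want: ..." message gives the short list of bits worth testing one at a time.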
This is the output of rovclock, and unless the video card has been programmed incorrectly it has 256 meg of ram.

Radeon overclock 0.6b by Hasw (hasw@hasw.net)
Found ATI card on 01:05, device id: 0x5960
I/O base address: 0xd000
Video BIOS shadow found @ 0xc0000
Reference clock from BIOS: 27.0 MHz
Memory size: 262144 kB
Memory channels: 1, CD,CH only: 0
tRcdRD: 8
tRcdWR: 4
tRP: 8
tRAS: 16
tRRD: 3
tR2W-CL: 3
tWR: 4
tW2R: 2
tW2Rsb: 1
tR2R: 2
tRFC: 18
tWL(0.5): 2
tCAS: 3
tCMD: 0
tSTR: 1
XTAL: 27.0 MHz, RefDiv: 2
Core: 240.75 MHz, Mem: 200.25 MHz

I am going to plug it into a computer that has Microsoft Windows to see what it does. The Windows driver gets a lot more support than does fglrx.

Please do so... I wonder if the card might have busted memory or something like that...

Tested in Windows.
Result: PASS
RAM 256 Meg
GPU RV280
AGP 4x
The computer ran just fine with the video card.

It is my opinion that ATI is using some sort of trickery. It doesn't make sense when the 128 meg version inits perfectly and runs perfectly, and the 256 meg version freezes the computer. I will however mod the code to read the register values so that we can get more information on what is happening. There may have been a couple of new registers added to the rv280 to properly init the upper 128 meg of ram.

Interesting. I don't know what's up at this point. We'll see if ATI can provide a clue; if not, I'll limit the vram on those. BTW, do you have some way of reading the card registers from Windows?

I don't have a utility to read the register values of the Radeon video card in Windows. What email address did you use to email ATI?

I contacted some folks I know directly.

fglrx will always restrict the amount of CPU accessible video RAM to 128 MB for all the cards it currently supports. The amount of 'invisible' video RAM it's using can be seen in the kernel output after starting the X server with DRI enabled.

This is the output of fglrx after it is loaded.

fglrx: module license 'Proprietary. (C) 2002 - ATI Technologies, Starnberg, GERMANY' taints kernel.
[fglrx] Maximum main memory to use for locked dma buffers: 372 MBytes.
ACPI: PCI Interrupt 0000:01:05.0[A] -> Link [LNKB] -> GSI 10 (level, low) -> IRQ 10
[fglrx] module loaded - fglrx 8.22.5 [Feb 7 2006] on minor 0

The problem is that the machine locks up when fglrx enables its DRI....

The "invisible" is what I'm looking for, since my lockup seems to be due to setting MC_FB_LOCATION, not the accessible vram (though I can try to change that too and see if it makes a difference, but it doesn't appear so).

fglrx locking up is most likely AGP related. It should use the second 128 MB of video RAM as invisible without problems.

Heh, well, ok, but do you have any clue why the open source driver is locking up when setting CRTC_EXT_CNTL after setting MC_FB_LOCATION to cover the whole 256M, while it works if it covers only 128M? :) Is there some black magic I've been missing here?

No idea, unfortunately. FWIW, does restricting the amount of video RAM to 128 MB via the VideoRam directive make a difference with fglrx?

Ok, got a reply: it's a chip erratum. I'll commit a workaround later today or tomorrow. On those chips, the aperture must be aligned to the aperture size (that is, FB_START in MC_FB_LOCATION must be aligned to the aperture size). Can you also send me an lspci -vv output (as root)? Thanks!

Created attachment 4773 [details]
The output of lspci -vv
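
To make the alignment rule from the erratum concrete, here is a minimal sketch of the check, assuming the aperture size is a power of two. The function names and the addresses in the usage note are hypothetical; they are not taken from the driver, the patch, or this card's lspci output.

```c
#include <assert.h>
#include <stdint.h>

/* Non-zero when x is a power of two (1, 2, 4, ...). */
int is_power_of_two(uint32_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Non-zero when fb_start sits on a multiple of aper_size, which is
 * the condition the erratum is said to require.
 * (Hypothetical helper; aper_size must be a power of two.)
 */
int fb_start_is_aligned(uint32_t fb_start, uint32_t aper_size)
{
    assert(is_power_of_two(aper_size));
    return (fb_start & (aper_size - 1)) == 0;
}

/*
 * Round fb_start down to the nearest multiple of aper_size.
 * (Hypothetical helper; aper_size must be a power of two.)
 */
uint32_t fb_start_align_down(uint32_t fb_start, uint32_t aper_size)
{
    assert(is_power_of_two(aper_size));
    return fb_start & ~(aper_size - 1);
}
```

As an illustration with made-up numbers, a start address of 0xE8000000 is aligned for a 128 MB aperture (0x08000000) but not for a 256 MB one (0x10000000), where the nearest aligned address below it is 0xE0000000. That would be consistent with a 128 MB mapping working while a 256 MB mapping at the same start address does not.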
(In reply to comment #39)
> Created an attachment (id=4773) [edit]
> The output of dmesg -vv
>
lspci -vv

Created attachment 4774 [details] [review]
Add rv280 workaround and limit vram to 128M again
Please test this and tell me if it helps. I have to limit the CPU accessible VRAM to 128Mb again for now (though the full 256Mb should be accessible by the engine at least).

Created attachment 4775 [details]
Updated Xorg log with patch
The patch did keep the computer from freezing, but DRI doesn't work with the
patch. DRI is reported as initialized, but there is no acceleration.
Hrm... DRI looks properly enabled in the server. What do you mean by no acceleration? 3d is sluggish? I think that's a totally different problem, possibly because you don't have the DRI user driver in the right place or something like that... What happens if you do LD_DEBUG=files glxinfo?

I just forgot to move back from fglrx. I forgot to run eselect opengl set xorg-x11. DRI works.

Fixed in CVS.

I synced with CVS, but info->PciInfo->size[0] doesn't seem to be reading the correct amount of ram for the PCI at Region 0. lspci -vv shows 128M of ram for Region 0, and info->PciInfo->size[0] reads 27k. Is this because I need to update my Xorg server to cvs, or is it because it really isn't working?

Damn, I should have tested it ... I have to investigate, not sure what's up.

Ok, I just committed a fix to CVS (it may take a little while to reach anoncvs though). Let me know if it works.

This bug is fixed. |
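
A note on the 27 reading above. One plausible explanation, stated here as an assumption rather than as what the committed fix did, is that the size[] entries of the server's PCI record hold the base-2 logarithm of the region length rather than a byte or kilobyte count. Under that assumption, 27 and the 128M shown by lspci agree. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a region size stored as log2(bytes) into bytes.
 * ASSUMPTION: the stored value is a base-2 exponent, as the reading
 * of 27 against a 128M region suggests.
 * (Hypothetical helper, not a server or driver function.)
 */
uint64_t region_bytes_from_log2(unsigned size_log2)
{
    assert(size_log2 < 64);
    return (uint64_t)1 << size_log2;
}

/* Same size expressed in kilobytes (assumes size_log2 >= 10). */
uint64_t region_kb_from_log2(unsigned size_log2)
{
    assert(size_log2 >= 10);
    return region_bytes_from_log2(size_log2) / 1024;
}
```

Here region_bytes_from_log2(27) is 134217728 bytes, which is 131072 kB or 128 MB.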