Bug 6011

Summary: Radeon 9250 (R280) 128 bit 256 Mbyte system lockup during DRI init with recent DRM
Product: xorg Reporter: Brian Beardall <brian>
Component: Driver/Radeon    Assignee: Benjamin Herrenschmidt <benh>
Status: CLOSED FIXED QA Contact:
Severity: normal    
Priority: high CC: alexdeucher, benh, michel
Version: git   
Hardware: x86 (IA32)   
OS: Linux (All)   
Whiteboard:
Attachments:
Description                                                        Flags
Radeon 9250 128 Meg of Ram 64 bit ram interface (no lockup)        none
Radeon 9250 256 Meg of Ram 128 bit ram interface (lockup)          none
Radeon 9250 256 Meg of Ram 128 bit ram interface (lockup) again    none
fglrx log                                                          none
fglrx log no accel                                                 none
The output of lspci -vv                                            none
Add rv280 workaround and limit vram to 128M again                  none
Updated Xorg log with patch                                        none

Description Brian Beardall 2006-02-23 14:18:15 UTC
With newer snapshots of the DRM I have been getting system lockups with the Radeon
9250 with 256 Mbyte of RAM. The card has 128-bit DDR RAM. In Xorg.0.log
the DRI is reported as initialized, and that is where it locks up. I know
that only 128 Mbyte of RAM is used, but with the DRM that ships in the kernel this
lockup does not occur. I have seen this bug in the December 25, 2005 snapshot and in
the current February 21, 2006 snapshot. When I used the February 21, 2006
snapshot I also grabbed the snapshot of the xf86-video-ati driver to get the
newest memory management patches for the radeon. I have tested these same
snapshots on three other ATI Radeon cards with no system lockup. These
other cards are:
ATI Radeon 7000VE 64 Meg
ATI Radeon 9000 128 Meg 128 bit ddr ram
ATI Radeon 9250 128 Meg 64 bit ddr ram

No system instability was experienced with those three cards. The system
lockup occurs during initialization of the ATI Radeon 9250 256 Meg 128-bit DDR
RAM card. Xorg.0.log to follow shortly. I can't get the dmesg output after
loading X because the system is frozen with a blank screen: no keyboard, no
network, no mouse, no interrupts. :(
Comment 1 Brian Beardall 2006-02-23 14:21:48 UTC
I have tested the card in three different computers with the exact same results:
1x: VIA 686A northbridge (HP computer)
2x: AMD 751 Northbridge (MSI 6195, and MSI 6167)
Comment 2 Benjamin Herrenschmidt 2006-02-23 20:53:36 UTC
Are you using the latest X.org ati driver from CVS? If not, can you try it and
send the server log from startup? I need to check the memory map.

Ben.
 
Comment 3 Brian Beardall 2006-02-24 02:28:40 UTC
I have checked out the latest CVS snapshot of the xf86-video-ati driver with
the new memory management patches. I knew that I would need both a new DRM
snapshot and a new Xorg driver snapshot to make it work. This bug was occurring
before the patches were submitted. I first experienced the radeon lockup with
256 meg of RAM on the December 25, 2005 snapshot. I will be attaching an
Xorg.0.log file soon.
Comment 4 Brian Beardall 2006-02-24 03:02:56 UTC
Created attachment 4729
Radeon 9250 128 Meg of Ram 64 bit ram interface (no lockup)

This is the card that didn't lockup the computer.
Comment 5 Brian Beardall 2006-02-24 03:04:59 UTC
Created attachment 4730
Radeon 9250 256 Meg of Ram 128 bit ram interface (lockup)

This log isn't very long or very helpful because the computer froze during
initialization of the video card. :(
Comment 6 Michel Dänzer 2006-02-24 03:27:19 UTC
(In reply to comment #5)
> 
> This log isn't very long nor very helpful because the computer froze during
> initization of the video card. :(

Indeed, please try mounting the filesystem containing the log file with -o
remount,sync.
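Remounting with sync is the simplest route. As a hedged alternative when you control the debug code, a minimal C sketch of the same idea forces each breadcrumb to disk with fsync() before continuing, so the last message written before a hard lock should survive. trace_open() and trace_sync() are hypothetical names, not part of the X server or the radeon driver, and nothing here assumes anything about how ErrorF buffers its output.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: open (or create) a breadcrumb file in append mode. */
int trace_open(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
}

/* Hypothetical helper: write one line, then fsync() it before returning,
 * so the message should already be on disk if the machine hard-locks
 * right after this call. Returns 0 on success, -1 on error. */
int trace_sync(int fd, const char *msg)
{
    size_t len = strlen(msg);

    while (len > 0) {
        ssize_t n = write(fd, msg, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry the write */
            return -1;
        }
        msg += n;               /* handle short writes */
        len -= (size_t)n;
    }
    if (write(fd, "\n", 1) != 1)
        return -1;
    return fsync(fd);
}
```

If "Die location 2" is the last line found on disk after a reboot, the lockup happened after that point, for example in the register write that follows it. The per-line fsync() is slow, which is only acceptable for this kind of lockup hunting.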
Comment 7 Brian Beardall 2006-02-24 03:58:59 UTC
Created attachment 4732
Radeon 9250 256 Meg of Ram 128 bit ram interface (lockup) again

I added the option sync to the mount, and I got a good long log file.
Comment 8 Benjamin Herrenschmidt 2006-02-24 10:04:04 UTC
Looks like it might be locking up when applying the memory map... I need you to
test a few things:

 - First, does it lock up without the DRM & AGP?
 
 - In RADEONInitMemoryMap(), can you try adding a hack after these lines:

 
    mem_size = INREG(RADEON_CONFIG_MEMSIZE);
    if (mem_size == 0)
	    mem_size = 0x800000;

   Add this line:

    mem_size /= 2;

 - Separately (that is, without the above change), in
RADEONRestoreMemMapRegisters(), can you add various message outputs with
ErrorF("message"); to verify that the crash happens in that function, and
where precisely in that function?

 - Also try, in that same function, swapping the order of these two lines:

	OUTREG(RADEON_MC_FB_LOCATION, restore->mc_fb_location);
	OUTREG(RADEON_MC_AGP_LOCATION, restore->mc_agp_location);

 - Also, in that same function, try commenting out the call to
RADEONEngineReset(pScrn).


Thanks !

Ben.


Comment 9 Benjamin Herrenschmidt 2006-02-24 13:19:03 UTC
Hrm... in addition to the mem_size /= 2 change, you should also do
pScrn->videoRam /= 2, or the server will be out of sync with the memory map
setting...
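A minimal C model of the hack from comments 8 and 9, assuming that pScrn->videoRam is in kilobytes (as in the Xorg ScrnInfoRec) while mem_size is in bytes. vram_config and halve_vram() are hypothetical names; the point is only that both values are derived from the same halved size, so the server and the memory map agree.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two values that must stay in sync. */
typedef struct {
    uint32_t mem_size;   /* bytes, as used to build the memory map */
    uint32_t video_ram;  /* kilobytes, stand-in for pScrn->videoRam */
} vram_config;

/* config_memsize models the value read from RADEON_CONFIG_MEMSIZE. */
vram_config halve_vram(uint32_t config_memsize)
{
    vram_config c;
    uint32_t mem_size = config_memsize;

    if (mem_size == 0)
        mem_size = 0x800000;        /* same 8 MB fallback as the driver snippet */
    mem_size /= 2;                  /* the debugging hack */

    c.mem_size = mem_size;
    c.video_ram = mem_size / 1024;  /* keep videoRam consistent with mem_size */
    return c;
}
```

With a 256 MB card (0x10000000), this yields a 128 MB memory map and videoRam of 131072 kB, which is the configuration that did not lock up in comment 10.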
Comment 10 Brian Beardall 2006-02-24 13:23:29 UTC
OK, I did the debugging in RADEONRestoreMemMapRegisters(). I added
ErrorF(""); after each of the last 8 lines of the function, and according to
the log the function completed. I tried all of the suggestions for
RADEONRestoreMemMapRegisters(), and none of them worked.

The mem_size /= 2; hack worked. Xorg loaded with the hack and no system lockup.

The computer locks up even if I use Option "NoAccel" "1".
Comment 11 Benjamin Herrenschmidt 2006-02-24 13:41:21 UTC
OK, you didn't reply to the DRM question, that is, does the latest X driver work
when the DRM is not loaded? Can you try adding ErrorF's to RADEONScreenInit()
after the call to RADEONModeInit() to see if we reach that? Also after the call to
RADEONRestoreMemoryMap from RADEONModeInit(), and inside RADEONRestoreMode() in
case it crashes in there? (Basically, try to locate the crash more precisely.)

I would expect the DRM to make no difference at this point, but according
to your log it does seem to make one (unless the problem is just that the log
didn't get the last bits before the crash).

Next would be to try hacking in the kernel DRM to find out what's going on.

Ben.
Comment 12 Brian Beardall 2006-02-24 14:09:55 UTC
The r128 doesn't build if I opt out of DRI. I removed the kernel modules
for drm and radeon. The computer still locks up. I'll have to do more
debugging tomorrow.
Comment 13 Benjamin Herrenschmidt 2006-02-24 14:19:53 UTC
The server loads the modules itself. To test without DRM, you can afaik either
comment out the Load "dri" line in the Modules section or move the kernel
modules out of the way (out of /lib/modules/*) so that X can't find them.
Comment 14 Brian Beardall 2006-02-24 15:55:33 UTC
I already tested without the DRM. Without the DRM the computer still freezes. So right
now the bug is in the X11 driver itself.
Comment 15 Brian Beardall 2006-02-25 13:22:03 UTC
RADEONRestoreMode() {

    } else {
        RADEONRestoreMemMapRegisters(pScrn, restore);
ErrorF("under the else\n");
        if (info->MergedFB) {
ErrorF("MergedFB RADEONRestoreCrtc2Registers()\n");
            RADEONRestoreCrtc2Registers(pScrn, restore);
ErrorF("MergedFB RADEONRestorePLL2Registers()\n");
            RADEONRestorePLL2Registers(pScrn, restore);
        }

        if (!pRADEONEnt->HasSecondary || pRADEONEnt->IsSecondaryRestored ||
            info->IsSwitching) {
            pRADEONEnt->IsSecondaryRestored = FALSE;

ErrorF("HasSecondary RADEONRestoreCommonRegisters()\n");
            RADEONRestoreCommonRegisters(pScrn, restore);
ErrorF("HasSecondary RADEONRestoreCrtcRegisters()\n");
Died here ->            RADEONRestoreCrtcRegisters(pScrn, restore);
ErrorF("HasSecondary RADEONRestoreFPRegisters()\n");
            RADEONRestoreFPRegisters(pScrn, restore);
ErrorF("HasSecondary RADEONRestorePLL2Registers()\n");
            RADEONRestorePLLRegisters(pScrn, restore);




static void RADEONRestoreCrtcRegisters(ScrnInfoPtr pScrn,
                                       RADEONSavePtr restore)
{
    RADEONInfoPtr  info       = RADEONPTR(pScrn);
    unsigned char *RADEONMMIO = info->MMIO;
ErrorF("Die location 1\n");
    OUTREG(RADEON_CRTC_GEN_CNTL, restore->crtc_gen_cntl);

ErrorF("Die location 2\n");
Died at this register setting ->    OUTREGP(RADEON_CRTC_EXT_CNTL,
            restore->crtc_ext_cntl,
            RADEON_CRTC_VSYNC_DIS |
            RADEON_CRTC_HSYNC_DIS |
            RADEON_CRTC_DISPLAY_DIS);

This is what I found. I learned that if I replace this condition in
RADEONRestoreMemMapRegisters()

    if (INREG(RADEON_MC_FB_LOCATION) != restore->mc_fb_location ||
        INREG(RADEON_MC_AGP_LOCATION) != restore->mc_agp_location) {

with

    if (0) {

then the computer doesn't lock up until much later in the card initialization.

These are the log lines that appear with the if (0) { hack, in addition to
what the original lockup log file contained:
(**) RADEON(0): GRPH_BUFFER_CNTL from 200d7c7c to 20135c5c
(II) RADEON(0): Direct rendering enabled
(**) RADEON(0): Setting up final surfaces
(**) RADEON(0): Initializing Acceleration
(II) RADEON(0): Render acceleration enabled
(**) RADEON(0): EngineInit (32/32)

I hope this helps.
Comment 16 Benjamin Herrenschmidt 2006-02-25 15:35:16 UTC
In RADEONRestoreMemMapRegisters(), can you add this bit of code at the beginning:

ErrorF("Dump for ben:\n");
ErrorF("%x %x %x %x %x %x %x %x %x\n",
  INREG(RADEON_MC_FB_LOCATION),
  INREG(RADEON_MC_AGP_LOCATION),
  INREG(RADEON_DISPLAY_BASE_ADDR),
  INREG(RADEON_DISPLAY2_BASE_ADDR),
  INREG(RADEON_OV0_BASE_ADDR),
  INREG(RADEON_AIC_CNTL),
  INREG(0x1d8),
  INREG(0x1dc),
  INREG(0x1e0));

And tell me what it says before the hang.
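To interpret the first value of that dump (and the second, if MC_AGP_LOCATION uses the same layout), a small C sketch can unpack a location register into an address range. The field layout here (start in bits 15:0, inclusive top in bits 31:16, both in 64 KB units) is an assumption based on public Radeon driver code, not something stated in this thread, and decode_mc_location() is a hypothetical helper.

```c
#include <stdint.h>

/* Address range covered by an MC_*_LOCATION value (assumed layout). */
typedef struct {
    uint32_t start;  /* first byte address */
    uint32_t end;    /* last byte address, inclusive */
} mc_range;

mc_range decode_mc_location(uint32_t reg)
{
    mc_range r;
    r.start = (reg & 0xffffu) << 16;       /* bits 15:0, 64 KB units */
    r.end = (reg & 0xffff0000u) | 0xffffu; /* bits 31:16 give the last 64 KB block */
    return r;
}

/* Size in bytes; wraps to 0 if the range spans the full 4 GB space. */
uint32_t mc_range_size(mc_range r)
{
    return r.end - r.start + 1u;
}
```

For example, a made-up value of 0xefffe800 would decode to 0xe8000000..0xefffffff, a 128 MB range.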

Comment 17 Benjamin Herrenschmidt 2006-02-25 15:39:19 UTC
Also, if you have a bit of time to investigate, it would be useful to know which
bit in CRTC_EXT_CNTL is causing the hang. You can print the previous value of
the register and the value about to be written. OUTREGP, as used in that code,
basically writes the value in crtc_ext_cntl, except for the 3 bits given as a
mask, which keep whatever value was already there. You can try blasting
values manually yourself with OUTREG to check which bit set or cleared by
that line is causing the lockup. If you don't know how to proceed, just add

ErrorF("CRTC_EXT_CNTL is: %x want: %x\n", INREG(RADEON_CRTC_EXT_CNTL),
       restore->crtc_ext_cntl);

and tell me what it says, and I'll come up with more things to test.
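The masked write described above can be modelled in a few lines of C, which also gives a quick way to list the bits the OUTREGP line would actually flip. This is a sketch of the behavior as described in this comment, not the driver's macro; masked_write(), changed_bits() and print_changed_bits() are hypothetical names operating on plain integers instead of MMIO.

```c
#include <stdint.h>
#include <stdio.h>

/* Bits set in keep_mask keep their current value; all other bits
 * are taken from val. */
uint32_t masked_write(uint32_t current, uint32_t val, uint32_t keep_mask)
{
    return (current & keep_mask) | (val & ~keep_mask);
}

/* Bits whose value would differ after the masked write. */
uint32_t changed_bits(uint32_t current, uint32_t val, uint32_t keep_mask)
{
    return current ^ masked_write(current, val, keep_mask);
}

/* Print each flipped bit with its old and new value, as candidates
 * to test one at a time with OUTREG. */
void print_changed_bits(uint32_t current, uint32_t val, uint32_t keep_mask)
{
    uint32_t diff = changed_bits(current, val, keep_mask);

    for (int bit = 0; bit < 32; bit++) {
        if (diff & (1u << bit))
            printf("bit %2d: %u -> %u\n", bit,
                   (unsigned)((current >> bit) & 1u),
                   (unsigned)((val >> bit) & 1u));
    }
}
```

Feeding it the "is" and "want" values from the ErrorF above, together with the mask of the three sync/display-disable bits, narrows the search to the bits that really change.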
Comment 18 Benjamin Herrenschmidt 2006-02-25 15:43:22 UTC
While we are at it... does it work with fglrx, and in that case, do you have
access to all 256Mb? (Can you send me a log with fglrx?) It would be
interesting to compare how fglrx sets some registers. I'll upload an updated
radeontool later today or tomorrow that dumps the stuff I'm interested in.
Comment 19 Benjamin Herrenschmidt 2006-02-25 15:58:45 UTC
Reassigning to myself and marking as an xorg bug, since it's really not a DRI issue.
Comment 20 Brian Beardall 2006-02-25 17:58:01 UTC
The ATI fglrx driver only detects 128 Meg of ram on the video card. The computer
locks up when I enable drm support in the fglrx driver.  Xorg will load if I
disable the drm for the fglrx. I haven't had very good luck with the fglrx and
AGP. I have had a lot better luck with the xorg driver.
Comment 21 Benjamin Herrenschmidt 2006-02-26 07:48:41 UTC
Can you post the complete fglrx log ?

If it detects only 128Mb of RAM, that may mean that either the card really only
has that, or those cards have some weird feature/bug that makes 256Mb configs
unusable without major trickery.

I've asked ATI, we may or may not get an answer. I'll wait a couple of days (and
check what the fglrx log says) but without a good answer from them, I think I'll
limit the internal ram mapping to 128Mb, at least on pre-r300 generation chips.
Sad, but at this point there is nothing else I can do.
Comment 22 Brian Beardall 2006-02-26 08:16:17 UTC
Created attachment 4750
fglrx log 

These are the options that I used when starting the X server.
	Option "no_dri" "yes"
	Option "no_accel" "no"

The computer locked up.
Comment 23 Brian Beardall 2006-02-26 08:18:29 UTC
Created attachment 4751
fglrx log no accel

I started the X server with these options:
	Option "no_dri" "yes"
	Option "no_accel" "yes"

X loaded completely, and the computer was stable. It's as if ATI can't even
get its own driver to initialize the card correctly.
Comment 24 Brian Beardall 2006-02-26 08:21:55 UTC
This is the output of rovclock, and unless the video card has been programmed
incorrectly it has 256 meg of ram.

Radeon overclock 0.6b by Hasw (hasw@hasw.net)

Found ATI card on 01:05, device id: 0x5960
I/O base address: 0xd000
Video BIOS shadow found @ 0xc0000
Reference clock from BIOS: 27.0 MHz
Memory size: 262144 kB
Memory channels: 1, CD,CH only: 0
tRcdRD:   8
tRcdWR:   4
tRP:      8
tRAS:     16
tRRD:     3
tR2W-CL:  3
tWR:      4
tW2R:     2
tW2Rsb:   1
tR2R:     2
tRFC:     18
tWL(0.5): 2
tCAS:     3
tCMD:     0
tSTR:     1
XTAL: 27.0 MHz, RefDiv: 2

Core: 240.75 MHz, Mem: 200.25 MHz

I am going to plug it into a computer that has Microsoft Windows to see what it
does. The Windows driver gets a lot more support than does fglrx.
Comment 25 Benjamin Herrenschmidt 2006-02-26 08:22:38 UTC
Please do so... I wonder if the card might have busted memory or something like
that...
Comment 26 Brian Beardall 2006-02-26 09:51:22 UTC
Tested in Windows.  Result: PASS
RAM 256 Meg
GPU RV280
AGP 4x

The computer ran just fine with the video card. In my opinion ATI is
using some sort of trickery. It doesn't make sense that the 128 meg version
initializes and runs perfectly while the 256 meg version freezes the computer.
I will, however, modify the code to read the register values so that we can get
more information on what is happening. There may be a couple of new registers
added to the rv280 to properly initialize the upper 128 meg of RAM.
Comment 27 Benjamin Herrenschmidt 2006-02-26 10:08:42 UTC
Interesting. I don't know what's up at this point. We'll see if ATI can provide
a clue; if not, I'll limit the VRAM on those chips.
Comment 28 Benjamin Herrenschmidt 2006-02-26 10:25:08 UTC
BTW. Do you have some way of reading the card registers from windows ?
Comment 29 Brian Beardall 2006-02-26 10:39:03 UTC
I don't have a utility to read the register values of the Radeon video card in
Windows. What email address did you use to email ATI?
Comment 30 Benjamin Herrenschmidt 2006-02-26 10:41:06 UTC
I contacted some folks I know directly
Comment 31 Michel Dänzer 2006-02-26 23:51:49 UTC
fglrx will always restrict the amount of CPU-accessible video RAM to 128 MB for
all the cards it currently supports. The amount of 'invisible' video RAM it's
using can be seen in the kernel output after starting the X server with DRI enabled.
Comment 32 Brian Beardall 2006-02-27 04:05:46 UTC
This is the output of fglrx after it is loaded.

fglrx: module license 'Proprietary. (C) 2002 - ATI Technologies, Starnberg,
GERMANY' taints kernel.
[fglrx] Maximum main memory to use for locked dma buffers: 372 MBytes.
ACPI: PCI Interrupt 0000:01:05.0[A] -> Link [LNKB] -> GSI 10 (level, low) -> IRQ 10
[fglrx] module loaded - fglrx 8.22.5 [Feb  7 2006] on minor 0
Comment 33 Benjamin Herrenschmidt 2006-02-27 07:16:29 UTC
The problem is that the machine locks up when fglrx enables its DRI... The
"invisible" part is what I'm looking for, since my lockup seems to be caused by
setting MC_FB_LOCATION, not by the amount of accessible VRAM (though I can try
changing that too and see if it makes a difference, but it doesn't appear so).
Comment 34 Michel Dänzer 2006-02-27 21:18:22 UTC
fglrx locking up is most likely AGP related. It should use the second 128 MB of
video RAM as invisible without problems.
Comment 35 Benjamin Herrenschmidt 2006-02-27 23:43:03 UTC
Heh, well, OK, but do you have any clue why the open source driver locks up
when setting CRTC_EXT_CNTL after setting MC_FB_LOCATION to cover the whole 256M,
while it works if it covers only 128M? :) Is there some black magic I've been
missing here?
Comment 36 Michel Dänzer 2006-02-28 02:52:02 UTC
No idea, unfortunately. FWIW, does restricting the amount of video RAM to 128 MB
via the VideoRam directive make a difference with fglrx?
Comment 37 Benjamin Herrenschmidt 2006-02-28 08:24:13 UTC
OK, got a reply: it's a chip erratum. I'll commit a workaround later today or
tomorrow. On those chips, the aperture must be aligned to the aperture size
(that is, FB_START in MC_FB_LOCATION must be aligned to the aperture size).
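The alignment rule can be sketched in C as a check plus a round-down, assuming the aperture size is a power of two. The helper names are hypothetical and the MC_FB_LOCATION packing (start in bits 15:0, inclusive top in bits 31:16, 64 KB units) is an assumption from public Radeon driver code; this is not the committed workaround. Rounding down is just one choice, and a real driver would also have to keep the new range clear of the AGP aperture.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if fb_start is a multiple of aper_size (power of two assumed). */
bool fb_start_ok(uint32_t fb_start, uint32_t aper_size)
{
    return (fb_start & (aper_size - 1u)) == 0;
}

/* Round fb_start down to the nearest multiple of aper_size. */
uint32_t align_fb_start(uint32_t fb_start, uint32_t aper_size)
{
    return fb_start & ~(aper_size - 1u);
}

/* Pack start and inclusive end, in 64 KB units, into MC_FB_LOCATION
 * (assumed layout: start in bits 15:0, top in bits 31:16). */
uint32_t encode_mc_fb_location(uint32_t fb_start, uint32_t size)
{
    uint32_t end = fb_start + size - 1u;
    return (end & 0xffff0000u) | (fb_start >> 16);
}
```

With a made-up start of 0xe8000000, the rule holds for a 128 MB aperture but fails for 256 MB, so it would be moved to 0xe0000000. That pattern would be consistent with the mem_size /= 2 hack avoiding the lockup, although the actual addresses on this card are not given in the thread.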
Comment 38 Benjamin Herrenschmidt 2006-02-28 14:59:23 UTC
Can you also send me an lspci -vv output (as root) ? Thanks !
Comment 39 Brian Beardall 2006-02-28 15:14:29 UTC
Created attachment 4773
The output of lspci -vv
Comment 40 Brian Beardall 2006-02-28 15:19:09 UTC
(In reply to comment #39)
> Created an attachment (id=4773)
> The output of dmesg -vv
> 

lspci -vv
Comment 41 Benjamin Herrenschmidt 2006-02-28 16:45:53 UTC
Created attachment 4774
Add rv280 workaround and limit vram to 128M again

Please test this and tell me if it helps. I have to limit the CPU-accessible
VRAM to 128Mb again for now (though the full 256Mb should be accessible by the
engine at least).
Comment 42 Brian Beardall 2006-02-28 17:34:47 UTC
Created attachment 4775
Updated Xorg log with patch

The patch did keep the computer from freezing, but DRI doesn't work with the
patch. DRI is reported as initialized, but there is no acceleration.
Comment 43 Benjamin Herrenschmidt 2006-02-28 17:42:10 UTC
Hrm... DRI looks properly enabled in the server. What do you mean by no
acceleration? Is 3D sluggish? I think that's a totally different problem,
possibly because you don't have the DRI user-space driver in the right place or
something like that...

What happens if you do

LD_DEBUG=files glxinfo

Comment 44 Brian Beardall 2006-03-01 01:30:07 UTC
I had just forgotten to switch back from fglrx: I forgot to run eselect opengl set
xorg-x11. DRI works.
Comment 45 Benjamin Herrenschmidt 2006-03-01 10:38:33 UTC
Fixed in CVS
Comment 46 Brian Beardall 2006-03-02 12:24:18 UTC
I synced with CVS, but info->PciInfo->size[0] doesn't seem to be reading
the correct amount of RAM for PCI Region 0. lspci -vv shows 128M of RAM
for Region 0, and info->PciInfo->size[0] reads 27k. Is this because I need to
update my Xorg server to CVS, or because it really isn't working?
Comment 47 Benjamin Herrenschmidt 2006-03-02 12:35:51 UTC
Damn, I should have tested it ... I have to investigate, not sure what's up 
Comment 48 Benjamin Herrenschmidt 2006-03-02 13:05:45 UTC
OK, I just committed a fix to CVS (it may take a little while to reach anoncvs,
though). Let me know if it works.
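One reading of comment 46, offered as an assumption rather than a statement of what the fix changed: if the PCI region size field holds the base-2 logarithm of the region size, then 27 means 1 << 27 bytes, exactly the 128M lspci reports for Region 0. A hedged C sketch of turning such a value into bytes and combining it with the 128 MB cap from the workaround patch; region_bytes() and cpu_visible_vram() are hypothetical helpers, not driver functions.

```c
#include <stdint.h>

/* Hypothetical: convert a log2-encoded region size to bytes.
 * Out-of-range inputs return 0. */
uint64_t region_bytes(int log2_size)
{
    if (log2_size < 0 || log2_size > 63)
        return 0;
    return (uint64_t)1 << log2_size;
}

/* CPU-visible VRAM: no more than the PCI aperture, and capped at
 * 128 MB as in the workaround patch. */
uint64_t cpu_visible_vram(uint64_t vram_bytes, int log2_aperture)
{
    const uint64_t cap = (uint64_t)128 << 20;
    uint64_t limit = region_bytes(log2_aperture);

    if (limit > cap)
        limit = cap;
    return vram_bytes < limit ? vram_bytes : limit;
}
```

Under this assumption, a 256 MB card behind a size field of 27 ends up with 128 MB of CPU-visible VRAM, while a 64 MB card keeps all 64 MB.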
Comment 49 Brian Beardall 2006-03-02 13:55:23 UTC
This bug is fixed. 
