Summary: | [r600] SUMO2 GPU lockup CP stall (kernel 3.2.47, 3.4, 3.8, 3.9, 3.10) | |
---|---|---|---
Product: | DRI | Reporter: | wojtek <wojtask9>
Component: | DRM/Radeon | Assignee: | Default DRI bug account <dri-devel>
Status: | RESOLVED FIXED | QA Contact: |
Severity: | normal | |
Priority: | medium | CC: | alpha_one_x86, manowar
Version: | XOrg git | |
Hardware: | x86-64 (AMD64) | |
OS: | Linux (All) | |

Attachments:

Created attachment 78070 [details]
dmesg
Created attachment 78071 [details]
Xorg.0.log without crash (but GPU lockup CP stall appears)
Created attachment 78072 [details]
Xorg.0.log with crash log
I've just installed Gentoo. The issue still exists.

radeon-ucode-20130513
mesa-9.2-20130515 (without llvm)
kernel-3.9.2 (UVD disabled because KMS with UVD doesn't work)
libdrm-2.4.44
xf86-video-ati-7.1.0 (without glamor)

Any ideas or patches to test?

Is this a regression with kernels 3.8, 3.9? I.e., did 3.7 work ok?

This is the first Linux installation on that machine, so I cannot confirm whether this is a regression. When I first installed Arch Linux the kernel version was 3.7, and the GPU lockup occurred.

Short summary (Alex's suggestions on IRC):
remove r600g_dri.so -> NOT OK
"ColorTiling2D" "false" -> NOT OK
kernel-3.10-rc1 -> NOT OK
"NoAccel" "true" -> OK

Created attachment 80492 [details]
reg_dump_radeon_kernel39
Created attachment 80493 [details]
reg_dump_fglrx_kernel39
What is the motherboard and CPU reference? AMD A4-3400?

Motherboard: Gigabyte Technology Co., Ltd. GA-A75M-UD2H/GA-A75M-UD2H, BIOS F5 11/03/2011
CPU: AMD A4-3400 APU with Radeon(tm) HD Graphics (fam: 12, model: 01, stepping: 00)

Probably a duplicate of
https://bugs.freedesktop.org/show_bug.cgi?id=56081
and almost the same issue as
http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg40024.html

On my system, with the tree from http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.11-wip-4, the GPU lockup is still present (tested on X11 and Wayland).

Try comparing the following registers between fglrx and radeon:
0x98FC 0x98F8 0x8950 0x98F4 0x3F88 0x9B7C 0x3F90 0x9148
0x3F94 0x914C 0x8954 0x2004 0x2008 0x2768 0x8B24 0xA008
0xA020 0xA02C 0x9100 0x913C 0x960c 0x9610 0x88C4

You can use radeonreg (as root):
./radeonreg regmatch 0x98FC

See if changing any of them to what fglrx programs, before starting X with tiling disabled, helps.

It doesn't help :/

xorg.conf:
ColorTilling "false"
ColorTilling2D "false"

Maybe I'm doing something wrong?

modprobe fglrx
radeonreg reg radeon > fglrx_reg_dump_console.log
start x
radeonreg reg radeon > fglrx_reg_dump_x11.log
restart computer
modprobe radeon
radeonreg reg radeon > radeon_reg_dump_console.log
./diff_and_select select_registers.txt > selected_registers.txt  # script that generates the diff and selects only the registers from comment #13
./set_registers selected_registers.txt  # script that reads selected_registers.txt and executes radeonreg regset "register" "register_value"

Diff between the registers from comment #13, before startx:

diff fglrx_registers_c.log radeon_registers_c.log
2c2
< 0x98F8 0x00000000 (0)
---
> 0x98F8 0x02010002 (33619970)
7,8c7,8
< 0x3F90 0xffff0000 (-65536)
< 0x9148 0xffff0000 (-65536)
---
> 0x3F90 0x00000000 (0)
> 0x9148 0x00000000 (0)
15,17c15,17
< 0x8B24 0x00000000 (0)
< 0xA008 0x00030000 (196608)
< 0xA020 0x00158011 (1409041)
---
> 0x8B24 0x00ff0fff (16715775)
> 0xA008 0x00010000 (65536)
> 0xA020 0x00020009 (131081)
20,21c20,21
< 0x913C 0x01000000 (16777216)
< 0x960c 0x76543210 (1985229328)
---
> 0x913C 0x00000004 (4)
> 0x960c 0x54763210 (1417032208)
23c23
< 0x88C4 0x00000000 (0)
---
> 0x88C4 0x000000c1 (193)

Hi, there. I'm not absolutely sure I have the same issue, but at least my problem is very close to the one reported here.

System:
00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Sumo [Radeon HD 6480G]
xorg-server-1.14.2-alt1
xorg-drv-radeon-7.1.0-alt2
Linux 3.9.10-std-def-alt1

Could you please test the following: try

modprobe radeon vramlimit=32

On my system it helps, but only a little: the X screen doesn't freeze, it is just highly corrupted, and VTs can be switched. The vramlimit=16 option works the same way, but any other value I've tried (64, 128) doesn't! Does this give us something we can use to track down this bug?

Created attachment 82523 [details]
dmesg output, vramlimit=16, Xorg started
Attached the dmesg | grep 'radeon' after modprobe radeon vramlimit=16 and Xorg start (kdm).
Created attachment 82524 [details]
dmesg output, vramlimit=32, Xorg started
Attached the dmesg | grep 'radeon' after modprobe radeon vramlimit=32 and Xorg start (kdm).
Created attachment 82525 [details]
dmesg output, vramlimit=128, Xorg started
Attached the dmesg | grep 'radeon' after modprobe radeon vramlimit=128 and Xorg start (kdm).
It freezes as with no vramlimit option. Used ...; sleep 10; dmesg | grep 'radeon' >file to catch the output.
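
The capture step above (wait, then keep only the `radeon` lines of the kernel log) can be approximated offline on a saved log. This is a minimal Python sketch, not the commenter's actual procedure: the function names `radeon_lines` and `reports_lockup` are hypothetical, and the sample log text is illustrative rather than copied from the attachments. The only string it relies on is the "GPU lockup" phrase from the bug title.

```python
def radeon_lines(dmesg_text):
    """Keep only lines mentioning 'radeon', like `dmesg | grep 'radeon'`."""
    return [line for line in dmesg_text.splitlines() if "radeon" in line]


def reports_lockup(dmesg_text):
    """True if any radeon line contains the 'GPU lockup' phrase from the bug title."""
    return any("GPU lockup" in line for line in radeon_lines(dmesg_text))


# Illustrative text only; these lines are NOT taken from the attached logs.
SAMPLE = """\
[   10.000000] [drm] radeon kernel modesetting enabled.
[   11.000000] usb 1-1: new high-speed USB device
[   42.000000] radeon 0000:00:01.0: GPU lockup CP stall (illustrative line)
"""

if __name__ == "__main__":
    for line in radeon_lines(SAMPLE):
        print(line)
    print("lockup reported:", reports_lockup(SAMPLE))
```

Running the same check on each saved file (vramlimit=16, 32, 128) would show at a glance which runs logged a lockup and which froze silently.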
yeah :) with vramlimit=16 xeyes works perfect :). KDM (4.10.5) is working (without artifacts).

kernel-3.11-rc1 from (http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-fixes-3.11)

I'll do more tests later.

That just disables the use of vram for exa pixmaps. You can accomplish the same thing by adding:
Option "EXAPixmaps" "false"
to the device section of your xorg.conf

(In reply to comment #21)
> That just disables the use of vram for exa pixmaps. You can accomplish the
> same thing by adding:
> Option "EXAPixmaps" "false"
> to the device section of your xorg.conf

Yes, the option helps a little: I can see the KDM screen without any artifacts. However, it is dead, i.e. frozen. Feels like a GPU lockup, but no error messages in Xorg log or dmesg. How can I get more info?

(In reply to comment #13)
> Try comparing the following registers between fglrx and radeon:
>

difference:

left (fglrx) right(rad)

GB_ADDR_CONFIG
-0x98F8 0x02010002 (33619970)
+0x98F8 0x02010001 (33619969)
CGTS_SYS_TCC_DISABLE
-0x3F90 0x00000000 (0)
+0x3F90 0xff000000 (-16777216)
CGTS_TCC_DISABLE
-0x9148 0x00000000 (0)
+0x9148 0xff000000 (-16777216)
CGTS_USER_SYS_TCC_DISABLE
-0x3F94 0x00000000 (0)
+0x3F94 0xff000000 (-16777216)
CGTS_USER_TCC_DISABLE
-0x914C 0x00000000 (0)
+0x914C 0xff000000 (-16777216)
PA_SC_FORCE_EOV_MAX_CNTS
-0x8B24 0x00ff0fff (16715775)
+0x8B24 0x00ff3fff (16728063)
SMX_DC_CTL0
-0xA020 0x00158009 (1409033)
+0xA020 0x00020009 (131081)
SPI_CONFIG_CNTL_1
-0x913C 0x00000004 (4)
+0x913C 0x00000000 (0)
VGT_CACHE_INVALIDATION
-0x88C4 0x000000c1 (193)
+0x88C4 0x000000c2 (194)

> left (fglrx) right(rad)
> GB_ADDR_CONFIG
> -0x98F8 0x02010002 (33619970)
> +0x98F8 0x02010001 (33619969)
My mistake, the labels were swapped:
- lines are the left side (radeon)
+ lines are the right side (fglrx)

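
The register comparison asked for in comment #13 can be sketched in Python as follows. This is a hedged illustration, not the `diff_and_select` script mentioned earlier (which was never posted): the function names are hypothetical, and the only assumption about the input is the line format visible in the dumps in this thread, `0x98F8 0x02010002 (33619970)`. The sample values are taken from the console-mode diffs posted here.

```python
import re

# One dump line looks like: "0x98F8 0x02010002 (33619970)"
LINE_RE = re.compile(r"^(0x[0-9A-Fa-f]+)\s+(0x[0-9A-Fa-f]+)")


def parse_dump(text):
    """Map register offset -> 32-bit value, both as integers."""
    regs = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            regs[int(m.group(1), 16)] = int(m.group(2), 16)
    return regs


def to_signed32(value):
    """Reproduce the signed decimal column of the dumps, e.g. 0xffff0000 -> -65536."""
    return value - (1 << 32) if value & 0x80000000 else value


def diff_dumps(fglrx_text, radeon_text, wanted=None):
    """Return sorted (offset, fglrx_value, radeon_value) for registers that differ.

    If `wanted` is given, only those offsets are considered.
    """
    fglrx = parse_dump(fglrx_text)
    radeon = parse_dump(radeon_text)
    offsets = set(fglrx) & set(radeon)
    if wanted is not None:
        offsets &= set(wanted)
    return [(o, fglrx[o], radeon[o]) for o in sorted(offsets) if fglrx[o] != radeon[o]]


FGLRX = """\
0x98FC 0x00000000 (0)
0x98F8 0x00000000 (0)
0x3F90 0xffff0000 (-65536)
"""
RADEON = """\
0x98FC 0x00000000 (0)
0x98F8 0x02010002 (33619970)
0x3F90 0x00000000 (0)
"""

if __name__ == "__main__":
    for off, f, r in diff_dumps(FGLRX, RADEON):
        print(f"0x{off:04X}  fglrx=0x{f:08x} ({to_signed32(f)})  radeon=0x{r:08x} ({to_signed32(r)})")
```

Keying the result by which driver produced each value avoids the left/right mix-up corrected above, since the tuple order is fixed as (offset, fglrx, radeon).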
Hi. I'm having a similar GPU lockup CP stall on a REDWOOD card, but it doesn't happen at login. With kernels 3.9 and 3.10 I can run Xorg and use KDE KWin effects without any issue; but when I start a game like 0 A.D. or Need for Speed Most Wanted under wine, a lockup happens after some time. With kernel 3.11-rc3, lockups occur on the desktop shortly after login, with or without radeon.dpm set to 1.

Here are two attachments of dmesg (both from runs on kernel 3.10), one with the R600_HYPERZ=0 env var set and the other without (as suggested on another bug).

Is there anything else I can help with?

Created attachment 83488 [details]
dmesg output from ssh (hyperz enabled)
Created attachment 83489 [details]
dmesg output from ssh (hyperz disabled)
(In reply to comment #25)
> Hi. I'm having a similar GPU lockup CP stall on a REDWOOD card. But it
> doesn't happen at login. With kernels 3.9, 3.10 I can run Xorg, use KDE
> Kwin effects without any issue; but when I start a game like 0a.d. or Need
> for Speed Most Wanted under wine, after some time a lockup happens. With
> kernel 3.11-rc3 lockups occurs while in desktop, little time after login
> -with or without radeon.dpm set to 1-.
>
> Here are two attachments (the two running a kernel 3.10) of dmesg, one with
> R600_HYPERZ=0 env var set, and the other not (as suggested on another bug).
>
> Is there anything else I can help?

Please open a different bug. Your issues are not related to this one.

Created attachment 86938 [details]
sumo2.patch
A simple patch that fixes the problem on my system :)

(In reply to comment #13)
> Try comparing the following registers between fglrx and radeon:
>
> ...
>
> You can use radeonreg (as root):
> ./radeonreg regmatch 0x98FC
>

Here is the console mode comparison (radeon installs the framebuffer console, though):

$ diff -u fglrx-console.data radeon-console.data
--- fglrx-console.data 2013-12-21 01:38:37.952290647 +0400
+++ radeon-console.data 2013-12-21 01:42:45.844348784 +0400
@@ -1,23 +1,23 @@
 0x98FC 0x00000000 (0)
-0x98F8 0x00000000 (0)
+0x98F8 0x02010002 (33619970)
 0x8950 0xfffcf001 (-200703)
 0x98F4 0x00fe0001 (16646145)
 0x3F88 0x00fe0001 (16646145)
 0x9B7C 0x00000000 (0)
-0x3F90 0xffff0000 (-65536)
-0x9148 0xffff0000 (-65536)
+0x3F90 0x00000000 (0)
+0x9148 0x00000000 (0)
 0x3F94 0x00000000 (0)
 0x914C 0x00000000 (0)
 0x8954 0x00000000 (0)
 0x2004 0x00000210 (528)
 0x2008 0x00fac688 (16434824)
 0x2768 0x00007000 (28672)
-0x8B24 0x00000000 (0)
-0xA008 0x00030000 (196608)
-0xA020 0x00158011 (1409041)
+0x8B24 0x00ff0fff (16715775)
+0xA008 0x00010000 (65536)
+0xA020 0x00158009 (1409033)
 0xA02C 0x0000001b (27)
 0x9100 0x00000000 (0)
-0x913C 0x01000000 (16777216)
-0x960c 0x76543210 (1985229328)
+0x913C 0x00000004 (4)
+0x960c 0x54763210 (1417032208)
 0x9610 0x0000ba98 (47768)
-0x88C4 0x00000000 (0)
+0x88C4 0x000000c1 (193)

> See if changing any of them to what fglrx programs before starting X with
> tiling disabled.

I set every register to its value from fglrx-console.data and then called xinit. Lockup again, nothing changes.

(In reply to comment #30)
>
> I've set each register values from fglrx-console.data and call xinit.
> Lockup again, nothing changes.

Does the patch in comment 29 fix the issue for you? That patch is upstream now and should be in most stable kernels as well.

(In reply to comment #31)
> Does the patch in comment 29 fix the issue for you? That patch is upstream
> now and should be in most stable kernels as well.

Unfortunately, no. I've tested Linux 3.12 with the patch already applied and it changes nothing. My chip is:

$ lspci | grep 'VGA'
00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Sumo [Radeon HD 6480G]

Is it really a SUMO2 chip, or should I open a new bug?

(In reply to comment #32)
> To the pity, no. I've tested Linux 3.12 with the patch already applied and
> it changes nothing. My chip is:
>
> $ lspci | grep 'VGA'
> 00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> Sumo [Radeon HD 6480G]
>
> Is it really a SUMO2 chip or I should open a new bug?

What's the numeric pci id (lspci -nn)? You should also see a line in the dmesg output like this:
[ 1.758014] [drm] initializing kernel modesetting (SUMO 0x1002:0x9640 0x1458:0xD000).

It will say SUMO or SUMO2 depending on which one you have.

(In reply to comment #33)
> What's the numeric pci id (lspci -nn)? You should also see a line in the
> dmesg output like this:
> [ 1.758014] [drm] initializing kernel modesetting (SUMO 0x1002:0x9640
> 0x1458:0xD000).
>
> It will say SUMO or SUMO2 depending on which one you have.

$ lspci -nn | grep 'VGA'
00:01.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Sumo [Radeon HD 6480G] [1002:9649]

[ 188.599127] [drm] initializing kernel modesetting (SUMO 0x1002:0x9649 0x17AA:0x21EA).

So it's SUMO, isn't it? Should I try to use "max_hw_contexts = 4" for the SUMO case, or is that unreasonable?

Created attachment 91154 [details] [review]
possible fix

(In reply to comment #34)
>
> $ lspci -nn | grep 'VGA'
> 00:01.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc.
> [AMD/ATI] Sumo [Radeon HD 6480G] [1002:9649]
>
> [ 188.599127] [drm] initializing kernel modesetting (SUMO 0x1002:0x9649
> 0x17AA:0x21EA).
>
>
> Thus, is's SUMO isn't it? Should I try to use "max_hw_contexts = 4" for the
> SUMO case or that's unreasonable?

Your chip was misclassified. The attached patch should fix it.

(In reply to comment #35)
> Created attachment 91154 [details] [review]
> possible fix
>
> Your chip was misclassified. The attached patch should fix it.

Great! Now, it works! :)))

Thank you so much.

(In reply to comment #36)
> (In reply to comment #35)
> >
> > Your chip was misclassified. The attached patch should fix it.
>
> Great! Now, it works! :)))
>
> Thank you so much.

However, the fun didn't last long: suspend/resume gives a black screen, and VT switching doesn't work. There are a number of bugs on that subject, including

https://bugs.freedesktop.org/show_bug.cgi?id=66940
https://bugs.freedesktop.org/show_bug.cgi?id=23103
https://bugs.freedesktop.org/show_bug.cgi?id=40935
https://bugs.freedesktop.org/show_bug.cgi?id=42162
https://bugs.freedesktop.org/show_bug.cgi?id=50805
https://bugs.freedesktop.org/show_bug.cgi?id=72710

Which one would you advise me to join?

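
Since the fix hinged on which family the driver reported for PCI ID 1002:9649, the identification step from comment #33 can be sketched in Python. This is a minimal sketch under one assumption: the dmesg line has exactly the shape quoted in this thread. The function name `parse_drm_init` and the returned dictionary keys are hypothetical, and the code only reports what the kernel printed; it does not know which family a given device ID should belong to.

```python
import re

# Matches e.g.:
# "[drm] initializing kernel modesetting (SUMO 0x1002:0x9649 0x17AA:0x21EA)."
DRM_INIT_RE = re.compile(
    r"\[drm\] initializing kernel modesetting \("
    r"(\w+)\s+"
    r"0x([0-9A-Fa-f]{4}):0x([0-9A-Fa-f]{4})\s+"
    r"0x([0-9A-Fa-f]{4}):0x([0-9A-Fa-f]{4})\)"
)


def parse_drm_init(line):
    """Extract the reported chip family and PCI IDs, or None if the line does not match."""
    m = DRM_INIT_RE.search(line)
    if m is None:
        return None
    family, vendor, device, sub_vendor, sub_device = m.groups()
    return {
        "family": family,
        "vendor": int(vendor, 16),
        "device": int(device, 16),
        "subsystem_vendor": int(sub_vendor, 16),
        "subsystem_device": int(sub_device, 16),
    }


# The two lines quoted in this thread.
EXAMPLE_LINE = "[ 1.758014] [drm] initializing kernel modesetting (SUMO 0x1002:0x9640 0x1458:0xD000)."
REPORTED_LINE = "[ 188.599127] [drm] initializing kernel modesetting (SUMO 0x1002:0x9649 0x17AA:0x21EA)."

if __name__ == "__main__":
    for line in (EXAMPLE_LINE, REPORTED_LINE):
        info = parse_drm_init(line)
        print(f"{info['family']}: {info['vendor']:04x}:{info['device']:04x}")
```

Comparing the family string across kernels, before and after the patch from comment #35, would confirm whether the reclassification actually took effect on a given machine.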
Created attachment 78069 [details]
lspci

This always happens with KDM and LightDM/XDM (xdm-archlinux package). The login screen freezes (but I can move the mouse pointer). At first I thought it was a configuration error on my side, but the same issue occurs with the Knoppix live CD (7.0.5, kernel 3.6.10, mesa-9.0.1).

Without a display manager (startx from the console) the result is the same: the screen appears but is frozen (tested with xfce4 and kde). Without a DM I don't see any GPU lockup, but Xorg.0.log shows errors.

Setting R600_HYPERZ=0 or R600_DEBUG=nohyperz doesn't help.

System:
kernel 3.8.5 or 3.9.0-rc7
mesa 9.1.1
other packages are up to date from the Arch repository