Bug 44800 - Radeon HD 6450 CAICOS screen corruption and kernel crashes
Summary: Radeon HD 6450 CAICOS screen corruption and kernel crashes
Status: RESOLVED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium critical
Assignee: Default DRI bug account
Reported: 2012-01-15 05:53 UTC by Marko Kohtala
Modified: 2014-06-16 04:47 UTC

Attachments
Artefacts in X (283.05 KB, image/jpeg)
2012-01-15 05:53 UTC, Marko Kohtala
Artefacts in text console (292.54 KB, image/jpeg)
2012-01-15 05:55 UTC, Marko Kohtala
A kernel panic (441.90 KB, image/jpeg)
2012-01-15 05:57 UTC, Marko Kohtala
lspci -vvnn (45.10 KB, text/plain)
2012-01-15 05:58 UTC, Marko Kohtala
dmesg output to text console (80.29 KB, text/plain)
2012-01-15 06:00 UTC, Marko Kohtala
X server log, before panic if I remember right (40.18 KB, text/plain)
2012-01-15 06:13 UTC, Marko Kohtala
dmesg with enable_mtrr_cleanup (79.67 KB, text/plain)
2012-01-16 11:26 UTC, Marko Kohtala
Artefacts in X, screencapture (19.73 KB, image/png)
2012-01-16 11:43 UTC, Marko Kohtala

Description Marko Kohtala 2012-01-15 05:53:33 UTC
Created attachment 55597 [details]
Artefacts in X

I got a Radeon HD 6450 1GB DDR3 last fall and have been unable to use it.

I currently run a fairly recent 32-bit Debian userspace on top of a 64-bit vanilla 3.2.1 kernel (with a minor patch to the makefile and a debug output patch https://bugs.freedesktop.org/attachment.cgi?id=53428). I have also tried a 32-bit kernel and some older kernels without noticing any difference.

I installed Windows and ran some OpenCL tests. Everything worked fine, so I expect the hardware is working correctly.

I tested whether the kernel options iomem=off mem=2G would help. They did not. mem=2G was needed because the jmicron ATA driver failed without iomem.

I am attaching some more files that I collected while checking whether the problem had been fixed.
I'm now back on a borrowed nVidia card so that I can report this.
Comment 1 Marko Kohtala 2012-01-15 05:55:53 UTC
Created attachment 55598 [details]
Artefacts in text console

The artefacts seem to appear during scrolling. When scrolling a lot, as in a long directory listing, they flash on screen and sometimes some of them stay there after the scrolling stops. They are horizontal lines, not blocks like in X.
Comment 2 Alex Deucher 2012-01-15 05:56:22 UTC
Please attach your xorg log and dmesg output.
Comment 3 Marko Kohtala 2012-01-15 05:57:38 UTC
Created attachment 55599 [details]
A kernel panic

To test the card, I usually booted with "text" on the kernel command line to prevent gdm from starting. This time I forgot that and gdm started. I switched to a text console immediately and tried to continue in text mode, but got this panic while reading the dmesg output.
Comment 4 Marko Kohtala 2012-01-15 05:58:36 UTC
Created attachment 55600 [details]
lspci -vvnn
Comment 5 Marko Kohtala 2012-01-15 06:00:41 UTC
Created attachment 55601 [details]
dmesg output to text console
Comment 6 Marko Kohtala 2012-01-15 06:13:16 UTC
Created attachment 55604 [details]
X server log, before panic if I remember right
Comment 7 Michel Dänzer 2012-01-16 03:23:43 UTC
The panic backtrace doesn't look obviously related to the radeon driver — all the symptoms sound like something might be scribbling more or less randomly over memory.

I wonder if the line

 mtrr: type mismatch for e0000000,10000000 old: write-back new: write-combining

in dmesg might be relevant. Can you try resolving that, e.g. using https://linuxindetails.wordpress.com/2010/06/27/mtrr-type-mismatch-for-e000000010000000-old-write-back-new-write-combining/ and see if that helps? Might be interesting seeing the contents of /proc/mtrr before and afterwards (if it changes).
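
As a rough aid for reading that message, here is a minimal Python sketch that decodes the line. It assumes the two hexadecimal fields are the base address and the size in bytes, which is consistent with the write-combining entry at base=0x0e0000000, size=256MB reported later in this thread. The names MISMATCH_RE and decode_mismatch are made up for this illustration and are not part of any kernel or driver tool.

```python
import re

# Pattern for the "mtrr: type mismatch" line quoted in this comment.
# Assumption: the two hex fields are base and size in bytes.
MISMATCH_RE = re.compile(
    r"mtrr: type mismatch for ([0-9a-fA-F]+),([0-9a-fA-F]+) "
    r"old: (\S+) new: (\S+)"
)

MB = 1024 * 1024


def decode_mismatch(line):
    """Return the fields of a mismatch line as a dict, or None if no match."""
    m = MISMATCH_RE.search(line)
    if m is None:
        return None
    base = int(m.group(1), 16)
    size = int(m.group(2), 16)
    return {
        "base": base,
        "size": size,
        "base_mb": base // MB,
        "size_mb": size // MB,
        "old": m.group(3),  # memory type already covering the range
        "new": m.group(4),  # memory type that was requested
    }


info = decode_mismatch(
    " mtrr: type mismatch for e0000000,10000000 "
    "old: write-back new: write-combining"
)
print(info["base_mb"], info["size_mb"], info["old"], info["new"])
```

Under that assumption the request is for a 256MB write-combining range starting at 3584MB, inside memory already marked write-back.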
Comment 8 Marko Kohtala 2012-01-16 11:26:00 UTC
Created attachment 55650 [details]
dmesg with enable_mtrr_cleanup

Attached the dmesg with enable_mtrr_cleanup. Without the option, /proc/mtrr is

reg00: base=0x000000000 (    0MB), size= 8192MB, count=1: write-back
reg01: base=0x200000000 ( 8192MB), size=  512MB, count=1: write-back
reg02: base=0x0e0000000 ( 3584MB), size=  512MB, count=1: uncachable
reg03: base=0x21f800000 ( 8696MB), size=    8MB, count=1: uncachable

and with the enable_mtrr_cleanup mtrr_spare_reg_nr=1 options it is

reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
reg01: base=0x080000000 ( 2048MB), size= 1024MB, count=1: write-back
reg02: base=0x0c0000000 ( 3072MB), size=  512MB, count=1: write-back
reg03: base=0x100000000 ( 4096MB), size= 4096MB, count=1: write-back
reg04: base=0x200000000 ( 8192MB), size=  512MB, count=1: write-back
reg05: base=0x21f800000 ( 8696MB), size=    8MB, count=1: uncachable
reg06: base=0x0e0000000 ( 3584MB), size=  256MB, count=1: write-combining

It makes no difference.
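
For anyone comparing the two listings, the following is a minimal Python sketch, not an official tool, that parses text in the /proc/mtrr format shown above and lists the registers that overlap a requested range with a different memory type. The helper names parse_mtrr and conflicts are hypothetical, and the requested range (base 0xe0000000, 256MB, write-combining) is taken from the mismatch message under the assumption that its fields are base and size in bytes.

```python
import re

MB = 1024 * 1024

# Matches lines such as:
# reg00: base=0x000000000 (    0MB), size= 8192MB, count=1: write-back
MTRR_RE = re.compile(
    r"reg(\d+): base=0x([0-9a-fA-F]+) \(\s*\d+MB\), "
    r"size=\s*(\d+)([KM])B, count=\d+: (\S+)"
)

BEFORE = """\
reg00: base=0x000000000 (    0MB), size= 8192MB, count=1: write-back
reg01: base=0x200000000 ( 8192MB), size=  512MB, count=1: write-back
reg02: base=0x0e0000000 ( 3584MB), size=  512MB, count=1: uncachable
reg03: base=0x21f800000 ( 8696MB), size=    8MB, count=1: uncachable
"""

AFTER = """\
reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
reg01: base=0x080000000 ( 2048MB), size= 1024MB, count=1: write-back
reg02: base=0x0c0000000 ( 3072MB), size=  512MB, count=1: write-back
reg03: base=0x100000000 ( 4096MB), size= 4096MB, count=1: write-back
reg04: base=0x200000000 ( 8192MB), size=  512MB, count=1: write-back
reg05: base=0x21f800000 ( 8696MB), size=    8MB, count=1: uncachable
reg06: base=0x0e0000000 ( 3584MB), size=  256MB, count=1: write-combining
"""


def parse_mtrr(text):
    """Return a list of (register, start, end, type) tuples, in bytes."""
    regions = []
    for line in text.splitlines():
        m = MTRR_RE.search(line)
        if m is None:
            continue
        unit = MB if m.group(4) == "M" else 1024
        start = int(m.group(2), 16)
        end = start + int(m.group(3)) * unit
        regions.append((int(m.group(1)), start, end, m.group(5)))
    return regions


def conflicts(regions, base, size, wanted):
    """Registers overlapping [base, base + size) whose type is not `wanted`."""
    end = base + size
    return [r[0] for r in regions
            if r[1] < end and base < r[2] and r[3] != wanted]


# Range from the mismatch message (assumed to be base and size in bytes).
request = (0xE0000000, 256 * MB, "write-combining")

print(conflicts(parse_mtrr(BEFORE), *request))  # [0, 2]
print(conflicts(parse_mtrr(AFTER), *request))   # []
```

With the original layout the requested range overlaps reg00 (write-back) and reg02 (uncachable). After the cleanup nothing of a different type overlaps it, so the MTRR layout looks consistent even though the corruption remains.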

I installed the 32-bit Debian 3.1.0 kernel, compiled fglrx for it, and am now reporting this with that setup. It does not show the artefacts, but it is not quite perfect either: gnome-shell 3.2.1 animations flicker, as if scaling of windows does not work as expected. That is possibly of no interest to you.
Comment 9 Marko Kohtala 2012-01-16 11:43:20 UTC
Created attachment 55651 [details]
Artefacts in X, screencapture

This capture was taken with enable_mtrr_cleanup mtrr_spare_reg_nr=1 on the kernel command line. Here I started the X server from the command line with only an xterm and a window manager.

The artefacts appear in the window while I scroll it by pressing Enter.

There were some constantly running horizontal artefacts, but they flickered and were erased immediately, leaving no mark. It could be the RAMDAC reading some scanlines badly, but more likely the corruption is in the framebuffer. The blue part lower in the window could be one of those artefacts caught in the capture. I wonder if the squares in the xterm window are a result of that flickering which, because of the scrolling, does not get erased.

The blue and green spots appeared slowly, one by one, while I was doing something else in the xterm. They stayed.

This happens only with this CAICOS card. I have an older Radeon X300 that has no problems, and an nVidia card also works without these problems. So I'd expect that whatever memory corruption there is, it is somehow in the CAICOS support in the kernel.
Comment 10 Alexandre Demers 2012-08-10 20:07:23 UTC
May or may not be related, but I've seen similar corruption with CAYMAN related to bug 45018 from time to time. But about comment 3, I was tempted to point at bug 42373.

Marko, if you are willing to try the patches proposed in both mentioned bugs to see if they help in any way, that could be interesting.
Comment 11 Reartes Guillermo 2012-08-14 13:16:04 UTC
@Marko Kohtala

Currently you are booting in UEFI mode.
Can you boot in BIOS mode to test?

Your board is a P8P67-M with BIOS 1701.

Is your board "Rev B"?
Also check the ASUS website, since BIOS 3602 is available there.
Comment 12 Marko Kohtala 2012-08-14 15:44:40 UTC
I have also tried booting in BIOS mode using Ubuntu 12.04 live USB. It does not help.

The board is "New P67 B3 Revision".

I have since upgraded to BIOS 3602. Same problem is still there.

I have kept following kernels up to 3.5. I will try the patches mentioned in comment 10 once I get the spare time to upgrade to 3.5.1.

Thank you for your comments.
Comment 13 Marko Kohtala 2012-08-19 18:09:19 UTC
I applied "drm/radeon: cleanup and fix crtc while programming mc" and "drm/radeon: fence virtual address and free it once idle [3.5] v2" on top of 3.5.2.

This did not change anything. I saw the usual corruption in the fb console as well as in an xinit session running only xterm.

This time I ran a 64-bit kernel under my 32-bit userspace. I had problems getting the 32-bit kernel to boot on my EFI.
Comment 14 Marko Kohtala 2014-06-15 08:13:55 UTC
With BIOS 3703, kernel 3.13.2, Debian xorg-server 2:1.15.1-1, and ati_drv.so and radeon_drv.so at module version 7.3.0, I reinstalled the HD 6450 CAICOS and it seems the problem no longer occurs.
Comment 15 Marko Kohtala 2014-06-16 04:47:05 UTC
I tried with Ubuntu 13.10 live, and the screen corruption appeared. So it was not a BIOS fix.

