Bug 40103

Summary: Striped screen with CAICOS chip (1002:6779)
Product: xorg
Reporter: Timo Aaltonen <tjaalton>
Component: Driver/Radeon
Assignee: xf86-video-ati maintainers <xorg-driver-ati>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: normal
Priority: medium
CC: benh, kunal.gangakhedkar, patrik.kullman
Version: 7.6 (2010.12)
Hardware: Other
OS: All
Whiteboard:
Attachments:
Description                                       Flags
X log                                             none
screenshot                                        none
full dmesg                                        none
dmesg                                             none
dmesg with ttm dma pool 1.7 patchset              none
Don't read from CP ring write pointer registers   none
dmesg with the patched kernel                     none
workaround debugged on my caicos                  none
channel remap fix                                 none

Description Timo Aaltonen 2011-08-15 07:28:18 UTC
Created attachment 50234 [details]
X log

I've got a striped screen with a CAICOS 6779 (HD6450), running kernel 3.0.1 or 3.1-rc2; both behave the same. Userland is the Ubuntu Oneiric development series, with xserver 1.10.3 and an -ati snapshot from last week.

Here's a call trace from dmesg (one of several identical ones):

[   10.633032] WARNING: at /home/tjaalton/linux/drivers/gpu/drm/radeon/radeon_gart.c:177 radeon_gart_bind+0x1ac/0x1c0 [radeon]()
[   10.633034] Hardware name: P5K
[   10.633034] trying to bind memory to unitialized GART !
[   10.633036] Modules linked in: snd_hda_codec_hdmi hid_apple bnep rfcomm bluetooth snd_hda_codec_realtek binfmt_misc snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event serio_raw snd_seq snd_timer radeon asus_atk0110 snd_seq_device snd ttm drm_kms_helper drm i2c_algo_bit soundcore snd_page_alloc lp parport usbhid hid firewire_ohci ahci firewire_core pata_jmicron crc_itu_t libahci atl1
[   10.633056] Pid: 241, comm: plymouthd Tainted: G        W   3.1.0-0301rc1g72fa599-generic #201108121445
[   10.633057] Call Trace:
[   10.633060]  [<ffffffff8105e79f>] warn_slowpath_common+0x7f/0xc0
[   10.633063]  [<ffffffff8105e896>] warn_slowpath_fmt+0x46/0x50
[   10.633067]  [<ffffffffa00dddc4>] ? ttm_mem_global_alloc_page+0x54/0x60 [ttm]
[   10.633078]  [<ffffffffa0155dec>] radeon_gart_bind+0x1ac/0x1c0 [radeon]
[   10.633089]  [<ffffffffa01537d5>] radeon_ttm_backend_bind+0x35/0xb0 [radeon]
[   10.633094]  [<ffffffffa00de6b0>] ttm_tt_bind+0x50/0x80 [ttm]
[   10.633098]  [<ffffffffa00e0409>] ttm_bo_handle_move_mem+0x169/0x380 [ttm]
[   10.633103]  [<ffffffffa00e14fa>] ttm_bo_move_buffer+0x13a/0x150 [ttm]
[   10.633110]  [<ffffffffa00b209c>] ? drm_mm_kmalloc+0x3c/0xe0 [drm]
[   10.633113]  [<ffffffff8116793f>] ? alloc_file+0x9f/0xd0
[   10.633117]  [<ffffffffa00e15f7>] ttm_bo_validate+0xe7/0xf0 [ttm]
[   10.633122]  [<ffffffffa00e17b8>] ttm_bo_init+0x1b8/0x260 [ttm]
[   10.633133]  [<ffffffffa0154bb6>] radeon_bo_create+0x176/0x2a0 [radeon]
[   10.633144]  [<ffffffffa01548e0>] ? radeon_create_ttm_backend_entry+0x40/0x40 [radeon]
[   10.633147]  [<ffffffff810329b9>] ? default_spin_lock_flags+0x9/0x10
[   10.633160]  [<ffffffffa016d39a>] radeon_gem_object_create+0x5a/0x100 [radeon]
[   10.633173]  [<ffffffffa016d7f8>] radeon_gem_create_ioctl+0x58/0xd0 [radeon]
[   10.633176]  [<ffffffff8127db7a>] ? security_capable+0x2a/0x30
[   10.633182]  [<ffffffffa00a6574>] drm_ioctl+0x3e4/0x4c0 [drm]
[   10.633195]  [<ffffffffa016d7a0>] ? radeon_gem_pwrite_ioctl+0x30/0x30 [radeon]
[   10.633197]  [<ffffffff811a0902>] ? fsnotify+0x1c2/0x2a0
[   10.633199]  [<ffffffff81391abe>] ? tty_ldisc_deref+0xe/0x10
[   10.633202]  [<ffffffff81177d2a>] do_vfs_ioctl+0x8a/0x340
[   10.633204]  [<ffffffff81166200>] ? vfs_write+0x110/0x180
[   10.633207]  [<ffffffff81178071>] sys_ioctl+0x91/0xa0
[   10.633209]  [<ffffffff815f2282>] system_call_fastpath+0x16/0x1b
[   10.633211] ---[ end trace d4bd37cfb6de1754 ]---
Comment 1 Michel Dänzer 2011-08-15 07:39:14 UTC
(II) RADEON(0): GPU accel disabled or not working, using shadowfb for KMS

Check the drm/radeon initialization messages in dmesg. Usually this is due to failing to load a firmware file. I guess we should handle the failure more gracefully though...
Comment 2 Timo Aaltonen 2011-08-15 07:39:22 UTC
Created attachment 50236 [details]
screenshot

I should probably add that the mouse cursor is a "box" which moves, so the server appears to be working fine. A photo is attached to prove it.
Comment 3 Timo Aaltonen 2011-08-15 07:53:04 UTC
Created attachment 50237 [details]
full dmesg
Comment 4 Timo Aaltonen 2011-08-23 23:59:10 UTC
The interesting bits from dmesg:

[    9.455331] [drm] Initialized drm 1.1.0 20060810
[    9.562808] [drm] radeon defaulting to kernel modesetting.
[    9.562811] [drm] radeon kernel modesetting enabled.
[    9.562865] radeon 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[    9.562870] radeon 0000:01:00.0: setting latency timer to 64
[    9.563019] [drm] initializing kernel modesetting (CAICOS 0x1002:0x6779).
[    9.563062] [drm] register mmio base: 0xFE8C0000
[    9.563064] [drm] register mmio size: 131072
[    9.563377] ATOM BIOS: 6779.13.12.0.0.AS03
[    9.563569] radeon 0000:01:00.0: VRAM: 512M 0x0000000000000000 - 0x000000001FFFFFFF (512M used)
[    9.563572] radeon 0000:01:00.0: GTT: 512M 0x0000000020000000 - 0x000000003FFFFFFF
[    9.564306] [drm] Detected VRAM RAM=512M, BAR=256M
[    9.564309] [drm] RAM width 32bits DDR
[    9.568873] [TTM] Zone  kernel: Available graphics memory: 3062390 kiB.
[    9.568876] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB.
[    9.568878] [TTM] Initializing pool allocator.
[    9.568898] [drm] radeon: 512M of VRAM memory ready
[    9.568899] [drm] radeon: 512M of GTT memory ready.
[    9.568915] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[    9.568917] [drm] Driver supports precise vblank timestamp query.
[    9.568956] radeon 0000:01:00.0: irq 44 for MSI/MSI-X
[    9.568961] radeon 0000:01:00.0: radeon: using MSI.
[    9.568998] [drm] radeon: irq initialized.
[    9.569002] [drm] GART: num cpu pages 131072, num gpu pages 131072
[    9.579091] [drm] Loading CAICOS Microcode
[   10.026510] radeon 0000:01:00.0: WB enabled
[   10.204837] [drm:r600_ring_test] *ERROR* radeon: ring test failed (scratch(0x8504)=0xCAFEDEAD)
[   10.204843] radeon 0000:01:00.0: disabling GPU acceleration
[   10.206115] radeon 0000:01:00.0: ffff8801a29c4400 unpin not necessary
[   10.206447] failed to evaluate ATIF got AE_BAD_PARAMETER
<connector info>
[   10.290516] [drm] radeon: power management initialized
[   10.376185] [drm] fb mappable at 0xD0141000
[   10.376187] [drm] vram apper at 0xD0000000
[   10.376189] [drm] size 9216000
[   10.376190] [drm] fb depth is 24
[   10.376191] [drm]    pitch is 7680
[   10.376309] fbcon: radeondrmfb (fb0) is primary device
[   10.376942] Console: switching to colour frame buffer device 240x75
[   10.376993] fb0: radeondrmfb frame buffer device
[   10.376994] drm: registered panic notifier
[   10.377001] [drm] Initialized radeon 2.10.0 20080528 for 0000:01:00.0 on minor 0
<calltrace >
[   10.609455] [drm:radeon_ttm_backend_bind] *ERROR* failed to bind 2250 pages at 0x00000000
[   10.609800] radeon 0000:01:00.0: object_init failed for (9216000, 0x00000002)
[   10.609802] [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (9216000, 2, 4096, -22)
<rinse,repeat>
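
The sizes in this log are consistent with each other, which makes the failing allocation easier to identify. A minimal sketch of the arithmetic, assuming 4 KiB pages (suggested by the 4096 in the GEM error line, but still an assumption); `pages_for` and `lines_for` are made-up helper names, not driver functions:

```c
#include <stdint.h>

/* Assumption: 4 KiB pages for both CPU and GPU page tables. */
#define PAGE_BYTES 4096u

/* Number of whole pages needed to hold `bytes` (rounds up). */
uint64_t pages_for(uint64_t bytes)
{
    return (bytes + PAGE_BYTES - 1) / PAGE_BYTES;
}

/* Scanout lines implied by a buffer size and its pitch (bytes per line). */
uint64_t lines_for(uint64_t bytes, uint64_t pitch)
{
    return bytes / pitch;
}
```

With these, `pages_for(512 MiB)` gives 131072, matching the "GART: num cpu pages" line, and `pages_for(9216000)` gives 2250, matching "failed to bind 2250 pages". `lines_for(9216000, 7680)` gives 1200, so the object that fails to allocate is the same size as the fbcon framebuffer reported earlier in the log.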
Comment 5 Patrik Kullman 2011-09-06 03:20:02 UTC
I get the exact same behaviour, with pretty much the same setup.

64-bit Ubuntu Oneiric Beta1, 3.0 kernel, xorg-edgers PPA for latest drivers.
Comment 6 Patrik Kullman 2011-09-06 04:59:20 UTC
Created attachment 50935 [details]
dmesg
Comment 7 Patrik Kullman 2011-09-07 05:21:25 UTC
Created attachment 50959 [details]
dmesg with ttm dma pool 1.7 patchset

After a discussion with Dave Airlie, he concluded that the chipset (Intel G31 in my case) doesn't have DMAR / IOMMU, and that systems with 4GB or more of memory enable SWIOTLB in those cases, which was the problem.

However, booting with mem=3G or with physical memory below 4GB did not solve the problem.

Disabling the IOMMU with iommu=off only disabled SATA support, without solving the Radeon issue.

Booting with less than 4GB of memory and iommu=off did not disable SATA support, but did not solve the issue either.

I finally tried compiling the Oneiric kernel with the TTM DMA Pool 1.7 patchset from Konrad for Xen, but the ring test still failed. Reading the source, I conclude that the test writes "0xDEADBEEF" but reads back "0xCAFEDEAD".
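
One way to read that result: the test appears to first store 0xCAFEDEAD in a scratch register from the CPU, then ask the command processor to overwrite it with 0xDEADBEEF through the ring. Reading back 0xCAFEDEAD would then mean the ring packet was never executed, not that data was corrupted. A toy model of that logic, not the driver code; `fake_gpu`, `fake_ring_submit_write` and `fake_ring_test` are hypothetical names:

```c
#include <stdbool.h>
#include <stdint.h>

#define SCRATCH_SENTINEL 0xCAFEDEADu  /* written directly by the CPU */
#define SCRATCH_EXPECTED 0xDEADBEEFu  /* written by the GPU via the ring */

/* Hypothetical stand-in for the GPU: one scratch register plus a flag
 * saying whether the command processor actually executes ring packets. */
struct fake_gpu {
    uint32_t scratch;
    bool cp_running;
};

/* Queue a "write value to scratch" packet; a stalled CP never runs it. */
void fake_ring_submit_write(struct fake_gpu *gpu, uint32_t value)
{
    if (gpu->cp_running)
        gpu->scratch = value;
}

/* Returns 0 on success, -1 if the sentinel was never overwritten. */
int fake_ring_test(struct fake_gpu *gpu, unsigned max_polls)
{
    gpu->scratch = SCRATCH_SENTINEL;            /* CPU-side register write */
    fake_ring_submit_write(gpu, SCRATCH_EXPECTED);

    for (unsigned i = 0; i < max_polls; i++) {
        if (gpu->scratch == SCRATCH_EXPECTED)
            return 0;
    }
    return -1;  /* scratch still holds the sentinel */
}
```

In this model a GPU with `cp_running` set to false fails the test with the scratch register still holding 0xCAFEDEAD, which is the pattern seen in the dmesg.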

Not sure how to proceed.

Attaching dmesg from boot with TTM DMA Pool 1.7 patchset.
Comment 8 Alex Deucher 2011-09-07 23:07:10 UTC
Does this patch:
http://lists.freedesktop.org/archives/dri-devel/2011-August/013591.html
adapted for evergreen/ni chips help?
Comment 9 Patrik Kullman 2011-09-08 01:09:05 UTC
I'm not sure how to adapt it for evergreen/ni.

If you'd send me such a patch I'd be happy to try it out and report back.
Comment 10 Timo Aaltonen 2011-09-08 03:06:17 UTC
I tried this:

diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c
index 15bd047..1630423 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -1404,6 +1404,7 @@ int evergreen_cp_resume(struct radeon_device *rdev)

        rdev->cp.rptr = RREG32(CP_RB_RPTR);
        rdev->cp.wptr = RREG32(CP_RB_WPTR);
+       rdev->cp.wptr &= rdev->cp.ptr_mask;

        evergreen_cp_start(rdev);
        rdev->cp.ready = true;

but it just hung the machine when X had started (before it got to write any
logs).
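
For context on the added line: masking a ring pointer with `ptr_mask` keeps it inside the ring even if the register read returns garbage. A minimal sketch, assuming the ring size is a power of two in dwords and the mask is the size minus one; `toy_ring` and its helpers are illustrative names, not the radeon structures:

```c
#include <stdint.h>

/* Illustrative ring state; field names loosely mirror the diff above. */
struct toy_ring {
    uint32_t ring_dwords;  /* must be a power of two */
    uint32_t ptr_mask;     /* ring_dwords - 1 */
    uint32_t wptr;         /* write pointer, in dwords */
};

void toy_ring_init(struct toy_ring *r, uint32_t ring_dwords)
{
    r->ring_dwords = ring_dwords;
    r->ptr_mask = ring_dwords - 1;
    r->wptr = 0;
}

/* Sanitize a pointer value read back from hardware. */
uint32_t toy_ring_mask(const struct toy_ring *r, uint32_t raw)
{
    return raw & r->ptr_mask;
}

/* Advance the write pointer by n dwords, wrapping at the end of the ring. */
void toy_ring_advance(struct toy_ring *r, uint32_t n)
{
    r->wptr = (r->wptr + n) & r->ptr_mask;
}
```

Note that masking only guarantees the pointer is in range. It does not make a bogus value point at the right place in the ring, so a change in behaviour after applying the mask is not by itself evidence that the underlying problem is fixed.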
Comment 11 Michel Dänzer 2011-09-08 03:21:53 UTC
(In reply to comment #10)
> I tried this:
> 
> diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c
> index 15bd047..1630423 100644
> --- a/drivers/gpu/drm/radeon/evergreen.c
> +++ b/drivers/gpu/drm/radeon/evergreen.c
> @@ -1404,6 +1404,7 @@ int evergreen_cp_resume(struct radeon_device *rdev)
> 
>         rdev->cp.rptr = RREG32(CP_RB_RPTR);
>         rdev->cp.wptr = RREG32(CP_RB_WPTR);
> +       rdev->cp.wptr &= rdev->cp.ptr_mask;
> 
>         evergreen_cp_start(rdev);
>         rdev->cp.ready = true;
> 
> but it just hung the machine when X had started (before it got to write any
> logs).

Sounds like the change did make a difference though... can you attach dmesg for that?
Comment 12 Michel Dänzer 2011-09-08 03:32:59 UTC
Created attachment 50969 [details] [review]
Don't read from CP ring write pointer registers

This patch goes further and removes any reliance on reads from those registers. Does it help?
Comment 13 Timo Aaltonen 2011-09-08 05:53:58 UTC
Created attachment 50974 [details]
dmesg with the patched kernel

Unfortunately no, the VTs still look corrupted, and starting X hangs the machine.
Comment 14 Dave Airlie 2011-09-30 12:14:21 UTC
Created attachment 51806 [details] [review]
workaround debugged on my caicos

Please test this patch and tell me if it works for you.

If it does, AMD will have to debug things from here; I just got this using old-fashioned #if 0 bisection.
Comment 15 Patrik Kullman 2011-10-02 02:06:34 UTC
This patch, applied to latest Ubuntu Oneiric kernel, works perfectly!

Performance blows fglrx out of the water and the display management is way better than the AMD CCCLE.

Finally a usable desktop, thank you so much!
Comment 16 Patrik Kullman 2011-10-02 02:20:16 UTC
I have some strange issue with XVideo, though.

I started playing a video with Totem and the colors were off and way too dark.
Then I tried playing it with VLC and mplayer with the same result.
But then I tried to play the video with the x11 output (mplayer -vo x11) which showed the correct colors, and the next try with mplayer -vo xv also had the right colors.

This can be better compared in gstreamer-properties, where the test-stream for x11 looks like it should, but for xv, everything white or black is black and the rest of the colors are too dark.

I had similar issues with XVideo on fglrx, where XVideo could turn completely black from time to time.
Comment 17 Alex Deucher 2011-10-02 05:54:22 UTC
(In reply to comment #16)
> I have some strange issue with XVideo, though.

Make sure your Xv attributes aren't adjusted strangely.  Some apps set strange default values.
Comment 18 Alex Deucher 2011-10-04 06:57:20 UTC
Created attachment 51957 [details] [review]
channel remap fix

This patch should fix the issue.
Comment 19 Florian Mickler 2011-10-06 05:49:27 UTC
A patch referencing this bug report has been merged in Linux v3.1-rc9:

commit 12d5180bd7e683a4ae80830b82ba67e7b7fac7b2
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Tue Oct 4 10:46:34 2011 -0400

    drm/radeon/kms: fix channel_remap setup (v2)
Comment 20 Alex Deucher 2011-10-30 12:13:22 UTC
*** Bug 42373 has been marked as a duplicate of this bug. ***
