43191 – Radeons needs 2D (MACRO) color tiling for optimal performance

Summary: Radeons needs 2D (MACRO) color tiling for optimal performance

Status:	RESOLVED FIXED

Alias:	None

Product:	xorg
Classification:	Unclassified
Component:	Driver/Radeon
Version:	unspecified
Hardware:	Other All

Importance:	medium enhancement
Assignee:	xf86-video-ati maintainers
QA Contact:	Xorg Project Team

Reported:	2011-11-23 02:53 UTC by Simon Farnsworth
Modified:	2012-04-04 06:25 UTC

Attachments
Patch to enable color tiling on Evergreen. (7.05 KB, patch) 2011-11-23 02:53 UTC, Simon Farnsworth
A program to draw red/green rectangles using raw X11. Compile as gcc -o rectangle rectangle.c -lX11 (1.93 KB, text/plain) 2011-11-23 11:07 UTC, Simon Farnsworth
possible fix (4.83 KB, patch) 2011-11-23 15:33 UTC, Alex Deucher
possible fix (10.21 KB, patch) 2011-11-23 15:34 UTC, Alex Deucher
possible fix (10.17 KB, patch) 2011-11-23 15:37 UTC, Alex Deucher
avoid infinite loops in pageflip code (5.01 KB, patch) 2011-11-28 09:35 UTC, Alex Deucher
CS dumping (from the kernel) for the lockup case (49.29 KB, text/plain) 2011-11-28 10:44 UTC, Simon Farnsworth
Program to demonstrate misrendering issue (928 bytes, text/plain) 2011-12-01 09:50 UTC, Simon Farnsworth
Kernel patch setting tile shape (5.72 KB, patch) 2011-12-13 07:18 UTC, Simon Farnsworth
DDX patch sorting out alignment (4.67 KB, patch) 2011-12-13 07:22 UTC, Simon Farnsworth
Mesa patch sorting out alignment (6.79 KB, patch) 2011-12-13 07:52 UTC, Simon Farnsworth

Description Simon Farnsworth 2011-11-23 02:53:07 UTC

Created attachment 53797
Patch to enable color tiling on Evergreen.

I'm trying to get AMD E-350 graphics performance up to the level I experience on Intel Pineview (open source drivers in both cases), and I hit performance issues with blits.

On #radeon IRC, rak_adam confirmed via compute shaders that I should be able to achieve the blit speed I achieve on Intel (he measured around 7 GBit/s; I need around 4 GBit/s).

At agd5f's suggestion, I've enabled Mesa-side tiling with the R600_TILING environment variable. I've used the attached patch (written by Alex, but completely untested by him) to get the DDX to do tiling as well, but this results in corrupted scanout.

I'm using Linux kernel 3.1.0. I'm going to dig into this myself, and see if I can find a fix.

Comment 1 Alex Deucher 2011-11-23 06:29:56 UTC

All radeons can benefit from 2D tiling.

Comment 2 Alex Deucher 2011-11-23 06:30:11 UTC

All radeons can benefit from 2D tiling.

Comment 3 Simon Farnsworth 2011-11-23 11:07:44 UTC

Created attachment 53817
A program to draw red/green rectangles using raw X11. Compile as gcc -o rectangle rectangle.c -lX11

I've started trying to analyze what's going on, using the attached program - my next step is going to be working out how to get at the scanout buffer directly.

I'm running this program against an otherwise unused (no other clients) X server, started as:

Xorg :0 -noreset -nolisten tcp vt1

and set to low resolution with:

xrandr --output DisplayPort-0 --mode 640x480

In the meantime, I have the following unexpected result: When I run the attached program as "./rectangle 16 1 1 1", which should give me one red pixel above one green pixel, 16 pixels from the left edge of the screen, I see 4 pixels light up. It *looks* like I get the 2 pixels I expect, plus a pair at 8 pixels in, 16 pixels down.

I have also determined that I see the same unexpected pair light up if I run it as "./rectangle 0 17 1 1", requesting 0 pixels offset, 17 pixels wide, and 1 pixel height of each colour. Final oddity is that "./rectangle 0 16 2 1" gives me two lines of 16 wide red pixels, no green, while "./rectangle 0 16 1 2" gives me one line 16 wide red, one line 16 wide green.

Comment 4 Alex Deucher 2011-11-23 11:47:12 UTC

You'll need to play with the tiling attributes (TILE_SPLIT, MACRO_TILE_ASPECT, BANK_WIDTH, BANK_HEIGHT, NUM_BANKS) in CB_COLORn_ATTRIB, SQ_TEX_RESOURCE_WORD6_0 and SQ_TEX_RESOURCE_WORD7_0 in the DDX and Mesa, and in GRPH_CONTROL in the kernel.

Comment 5 Alex Deucher 2011-11-23 15:33:38 UTC

Created attachment 53820
possible fix

The two attached drm patches may fix the issue, or at least lay the groundwork for a proper fix.

Comment 6 Alex Deucher 2011-11-23 15:34:05 UTC

Created attachment 53821
possible fix

second patch.

Comment 7 Alex Deucher 2011-11-23 15:37:51 UTC

Created attachment 53822
possible fix

Better second patch.

Comment 8 Simon Farnsworth 2011-11-24 07:32:12 UTC

With the two attached patches from Alex and the DDX patch, plain X11 rendering works fine. I have some rendering issues in OpenGL to track down, though (a rectangle at the bottom right of the screen swaps with a rectangle at the top left, whether I'm page-flipping or not).

Performance is much improved, however.

Comment 9 Simon Farnsworth 2011-11-24 08:37:24 UTC

More checking shows that I only see the wraparound flicker at 1920x1200, not at 1920x1080.

Comment 10 Simon Farnsworth 2011-11-24 09:45:44 UTC

Serial console tells me that my random crashes at 1920x1200 are CP stalls:

[  101.638081] radeon 0000:00:01.0: GPU lockup CP stall for more than 10000msec
[  101.659447] GPU lockup (waiting for 0x0000000E last fence id 0x00000006)
[  101.681108] radeon 0000:00:01.0: GPU softreset 
[  101.694826] radeon 0000:00:01.0:   GRBM_STATUS=0xF5702828
[  101.711170] radeon 0000:00:01.0:   GRBM_STATUS_SE0=0xFC000005
[  101.728566] radeon 0000:00:01.0:   GRBM_STATUS_SE1=0x00000007
[  101.745962] radeon 0000:00:01.0:   SRBM_STATUS=0x20000040
[  101.762316] radeon 0000:00:01.0:   GRBM_SOFT_RESET=0x00007F6B
[  101.779824] radeon 0000:00:01.0:   GRBM_STATUS=0x00003828
[  101.796166] radeon 0000:00:01.0:   GRBM_STATUS_SE0=0x00000007
[  101.813568] radeon 0000:00:01.0:   GRBM_STATUS_SE1=0x00000007
[  101.830964] radeon 0000:00:01.0:   SRBM_STATUS=0x20000040
[  101.848324] radeon 0000:00:01.0: GPU reset succeed
[  101.872392] radeon 0000:00:01.0: WB enabled
[  101.901741] [drm] ring test succeeded in 0 usecs
[  101.915784] [drm] ib test succeeded in 1 usecs

Comment 11 Simon Farnsworth 2011-11-24 09:54:54 UTC

Aha - they're followed by CPU hard lockups, which is likely to be the actual crash.

[  123.549279] ------------[ cut here ]------------
[  123.549279] WARNING: at kernel/watchdog.c:241 watchdog_overflow_callback+0x9e/0xa8()
[  123.549279] Hardware name: D3003-S2
[  123.549279] Watchdog detected hard LOCKUP on cpu 1
[  123.549279] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ip6table_filter ip6_tables sch5627 sch56xx_common saa7134_alsa tda10048 saa7134_dvb videobuf_dvb dvb_core tda18271 tda8290 tuner snd_hda_codec_realtek ir_lirc_codec lirc_dev snd_hda_intel snd_hda_codec snd_hwdep ir_mce_kbd_decoder snd_seq snd_seq_device snd_pcm ir_sony_decoder rc_hauppauge ir_jvc_decoder ir_rc6_decoder snd_timer snd soundcore ir_rc5_decoder ir_nec_decoder saa7134 rc_core videobuf_dma_sg videobuf_core v4l2_common videodev media tveeprom r8169 sp5100_tco tpm_infineon serio_raw snd_page_alloc mii k10temp microcode i2c_piix4 radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
[  123.549279] Pid: 713, comm: rs:main Q:Reg Not tainted 3.1.0-91.fc15.i686.PAE #1
[  123.549279] Call Trace:
[  123.549279]  [<c0447969>] warn_slowpath_common+0x7c/0x91
[  123.549279]  [<c049532a>] ? watchdog_overflow_callback+0x9e/0xa8
[  123.549279]  [<c049532a>] ? watchdog_overflow_callback+0x9e/0xa8
[  123.549279]  [<c049528c>] ? touch_nmi_watchdog+0x57/0x57
[  123.549279]  [<c0447a09>] warn_slowpath_fmt+0x33/0x35
[  123.549279]  [<c049532a>] watchdog_overflow_callback+0x9e/0xa8
[  123.549279]  [<c04b7e0a>] __perf_event_overflow+0x14c/0x1bd
[  123.549279]  [<c0418350>] ? x86_perf_event_set_period+0x190/0x19b
[  123.549279]  [<c04b83b3>] perf_event_overflow+0x15/0x17
[  123.549279]  [<c0419055>] x86_pmu_handle_irq+0xbb/0xed
[  123.549279]  [<c0412167>] ? read_tsc+0x9/0x26
[  123.549279]  [<c05d990a>] ? div_s64_rem+0x3a/0x4b
[  123.549279]  [<c044bc04>] ? ns_to_timespec+0x27/0x3d
[  123.549279]  [<c044bc2e>] ? ns_to_timeval+0x14/0x25
[  123.549279]  [<c082844c>] perf_event_nmi_handler+0x3a/0x7c
[  123.549279]  [<c0829ab5>] notifier_call_chain+0x2b/0x4d
[  123.549279]  [<c0829b1f>] atomic_notifier_call_chain+0x22/0x24
[  123.549279]  [<c0829b4e>] notify_die+0x2d/0x2f
[  123.549279]  [<c0827a82>] do_nmi+0x6e/0x280
[  123.549279]  [<c08276a4>] nmi_stack_correct+0x2f/0x34
[  123.549279]  [<f80211d6>] ? r100_mm_rreg+0x19/0x36 [radeon]
[  123.549279]  [<f8021bd3>] evergreen_page_flip+0xbe/0x11f [radeon]
[  123.549279]  [<f7ea1afa>] ? drm_handle_vblank+0x19b/0x1a5 [drm]
[  123.549279]  [<c0435e09>] ? __wake_up+0x40/0x47
[  123.549279]  [<f7ff9056>] radeon_crtc_handle_flip+0x7c/0x19a [radeon]
[  123.549279]  [<c0435e09>] ? __wake_up+0x40/0x47
[  123.549279]  [<f8025132>] evergreen_irq_process+0x1bb/0x7f2 [radeon]
[  123.549279]  [<f7ffd85a>] radeon_driver_irq_handler_kms+0x17/0x19 [radeon]
[  123.549279]  [<c04959e5>] handle_irq_event_percpu+0x45/0x184
[  123.549279]  [<c042478a>] ? __io_apic_modify_irq+0x52/0x59
[  123.549279]  [<c0497791>] ? handle_fasteoi_irq+0x81/0x81
[  123.549279]  [<c0495b4d>] handle_irq_event+0x29/0x40
[  123.549279]  [<c0497791>] ? handle_fasteoi_irq+0x81/0x81
[  123.549279]  [<c0497818>] handle_edge_irq+0x87/0xa1
[  123.549279]  <IRQ>  [<c040eed4>] ? do_IRQ+0x3c/0x92
[  123.549279]  [<c04f85c4>] ? sys_write+0x5a/0x63
[  123.549279]  [<c082ca70>] ? common_interrupt+0x30/0x38
[  123.549279] ---[ end trace 5858d793ddb85566 ]---

Comment 12 Simon Farnsworth 2011-11-28 06:00:43 UTC

Applying some debugger knowledge to the stack traces tells me that I'm stalled in evergreen_page_flip, in the infinite loop commented as "Wait for update_pending to go high".

It looks like the chip never asserts EVERGREEN_GRPH_SURFACE_UPDATE_PENDING, which results in that loop becoming infinite. I then lose my machine, because RCU is waiting for the CPU that's looping to become idle.
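A wait of this kind can be bounded so that a bit which never goes high cannot hang the CPU. The following is a minimal userspace sketch, not the driver code: `poll_bit_with_timeout`, `read_reg_fn` and the two test doubles are hypothetical names, and the function pointer stands in for the driver's register read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a hardware register read. */
typedef uint32_t (*read_reg_fn)(void *ctx);

/*
 * Poll until any bit in `mask` is set, or until `max_polls` reads have been
 * made. Returns true if the bit was seen, false if the poll budget ran out.
 */
static bool poll_bit_with_timeout(read_reg_fn read_reg, void *ctx,
                                  uint32_t mask, unsigned int max_polls)
{
    assert(read_reg != NULL);

    for (unsigned int i = 0; i < max_polls; i++) {
        if (read_reg(ctx) & mask)
            return true;
        /* A kernel driver would typically insert a short delay here. */
    }
    return false;
}

/* Test double: a register whose bits never go high. */
static uint32_t stuck_register(void *ctx)
{
    (void)ctx;
    return 0;
}

/* Test double: bit 2 goes high on the third read and stays high. */
static uint32_t late_register(void *ctx)
{
    unsigned int *reads = ctx;
    return (++*reads >= 3) ? 0x4u : 0u;
}
```

With `stuck_register` the call returns false after `max_polls` reads instead of spinning forever; with `late_register` it returns true on the third read. The caller still has to decide what a timeout means for the flip.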

Comment 13 Simon Farnsworth 2011-11-28 06:44:35 UTC

Looks like the pageflip hang is a red herring; I changed the existing DRM_DEBUG in evergreen_page_flip to DRM_ERROR, and added a new one just before the loop to tell me what crtc_base should be once the flip has completed. I get:


[   79.049935] [drm:evergreen_page_flip] *ERROR* Updating to 0x24d0000
[   79.052905] [drm:evergreen_page_flip] *ERROR* Update pending now high. Unlocking vupdate_lock.
[   99.889075] radeon 0000:00:01.0: GPU lockup CP stall for more than 10000msec
[   99.896241] GPU lockup (waiting for 0x0000000F last fence id 0x00000007)
[   99.904379] radeon 0000:00:01.0: GPU softreset
[   99.908977] radeon 0000:00:01.0:   GRBM_STATUS=0xF5702828
[   99.914447] radeon 0000:00:01.0:   GRBM_STATUS_SE0=0xFC000005
[   99.920264] radeon 0000:00:01.0:   GRBM_STATUS_SE1=0x00000007
[   99.926083] radeon 0000:00:01.0:   SRBM_STATUS=0x20000040
[   99.931558] radeon 0000:00:01.0:   GRBM_SOFT_RESET=0x00007F6B
[   99.937479] radeon 0000:00:01.0:   GRBM_STATUS=0x00003828
[   99.942953] radeon 0000:00:01.0:   GRBM_STATUS_SE0=0x00000007
[   99.948771] radeon 0000:00:01.0:   GRBM_STATUS_SE1=0x00000007
[   99.954593] radeon 0000:00:01.0:   SRBM_STATUS=0x20000040
[   99.961065] radeon 0000:00:01.0: GPU reset succeed
[   99.984272] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[   99.991375] radeon 0000:00:01.0: WB enabled
[   99.995897] [drm:evergreen_page_flip] *ERROR* Updating to 0x1b6a000
[  100.012243] [drm] ring test succeeded in 0 usecs
[  100.016950] [drm] ib test succeeded in 1 usecs

It looks probable that the page flip is half-before and half-after the GPU reset, causing the lockup.

Comment 14 Simon Farnsworth 2011-11-28 09:02:14 UTC

I've spent the day digging to understand this, so I might be completely and utterly wrong, but, if I understand correctly, I've got the following:

drivers/gpu/drm/radeon/evergreen_cs.c validates command streams coming from userspace, aiming to reject ones that will obviously lock up the GPU.

Something in my userspace is sending a command stream that triggers a lockup, and the validator isn't rejecting it. I need to investigate the validator, and make sure it's OK.

Comment 15 Alex Deucher 2011-11-28 09:07:42 UTC

(In reply to comment #14)
> I've spent the day digging to understand this, so I might be completely and
> utterly wrong, but, if I understand correctly, I've got the following:
> 
> drivers/gpu/drm/radeon/evergreen_cs.c validates command streams coming from
> userspace, aiming to reject ones that will obviously lock up the GPU.
> 
> Something in my userspace is sending a command stream that triggers a lockup,
> and the validator isn't rejecting it. I need to investigate the validator, and
> make sure it's OK.

The CS checker is mostly there to patch the GPU physical addresses into the command stream.  The userspace drivers only get handles.  There is some basic sanity checking, but it's still possible to lock up the GPU.  Your best bet is to identify a test case that can easily reproduce the lock up and then examine the register state generated in that case to try and narrow down what's causing the hang.
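The address patching described here can be sketched as follows. This is a simplified illustration modelled on the `ib[idx] += ... gpu_offset >> 8` line visible in the evergreen_cs.c hunk quoted in comment 23; `struct reloc` and `patch_base_dword` are hypothetical names, and the 256-byte alignment assert is an assumption added for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified relocation entry: where the kernel placed the buffer. */
struct reloc {
    uint64_t gpu_offset; /* byte address of the buffer in GPU memory */
};

/*
 * Patch one command-stream dword that holds a buffer base in 256-byte units.
 * Userspace only knows an offset relative to the start of the buffer; the
 * kernel adds the buffer's real GPU address, shifted into the same units.
 */
static void patch_base_dword(uint32_t *ib, unsigned int idx,
                             const struct reloc *r)
{
    /* Assumption for this sketch: the buffer starts on a 256-byte boundary. */
    assert((r->gpu_offset & 0xffu) == 0);
    ib[idx] += (uint32_t)((r->gpu_offset >> 8) & 0xffffffffu);
}
```

For a buffer placed at byte address 0x21f0000 and a userspace offset of 0, the patched dword becomes 0x21f00, the same form of value that the debug output in comment 23 prints for COLOR_BASE.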

Comment 16 Simon Farnsworth 2011-11-28 09:26:32 UTC

From advice on IRC, I'm uncommenting the CS dumping code in evergreen_cs.c, and I'm going to examine the CS dumps just before the failure.

Comment 17 Alex Deucher 2011-11-28 09:35:53 UTC

Created attachment 53906
avoid infinite loops in pageflip code

This patch should avoid the infinite loops in the pageflip code.

Comment 18 Simon Farnsworth 2011-11-28 10:44:16 UTC

Created attachment 53912
CS dumping (from the kernel) for the lockup case

I'm about to go home for the day, and will come back to this tomorrow.

I've patched r600_fence_ring_emit to print fence->seq every time it emits a fence, and uncommented the CS dumper in evergreen_cs.c. I'm assuming that the CP is ordered w.r.t. fences, so that it's the 487 instructions after fence 0x00000016 that I need to interpret to determine what's caused the CP to lock up.

My test program, running against Mesa master as of commit c5012c1d56dfbf11cd631b3b37890b40d56ac884 is very simple:


#include <stdio.h>
#include <stdlib.h>
#include <GL/glut.h>

static void draw( void )
{
    static float color = 1.0f;

    glClearColor( color, color, color, 1.0 );
    glClear( GL_COLOR_BUFFER_BIT );

    color = 1.0f - color;
    glutSwapBuffers();
    glutPostRedisplay();
}

int main( int argc, char *argv[] )
{
    glutInit( &argc, argv );

    glutInitDisplayMode( GLUT_RGB | GLUT_DOUBLE );
    glutCreateWindow( "Flasher" );
    glutFullScreen();
    glutDisplayFunc( draw );

    glutMainLoop();
    return 0;
}

I'm running it with no command line arguments - on some runs it works, on others it locks up.

Comment 19 Jerome Glisse 2011-11-28 16:18:20 UTC

What you need to print is the last fence the hardware signaled each time you emit a fence. Once you know that, you know which CS is guilty; the culprit could then be any instruction inside that CS. There are some additional registers that give a clue as to which dword inside the IB the CP parsed last, though that is not necessarily the one causing trouble, given how deep the whole pipeline is.

so if you have:
emit fence 17, last fence 13
emit fence 18, last fence 16
emit fence 19, last fence 17
emit fence 20, last fence 17
...          , last fence 17

It means that the CS submitted after fence 17 was emitted is the guilty one.

Note that a lockup is not caused by a packet as such but by the outcome of the packet. For rendering, it could be that the vertices lead to a degenerate case, or the memory is invalid, or the shader program is wrong, or there is some cache issue that leads the GPU to use wrong or invalid data or shader programs ...
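The bookkeeping described above can be automated over a captured log. The sketch below is a hypothetical helper, not kernel code: `struct fence_sample` and `find_stalled_fence` are invented names, `min_repeats` is an arbitrary threshold for treating a run as a stall, and fence-counter wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One log line: the fence just emitted and the last fence the hardware signaled. */
struct fence_sample {
    uint32_t emitted;
    uint32_t last_signaled;
};

/*
 * If the last `min_repeats` (or more) samples all report the same signaled
 * fence, store that fence id in *stalled and return true. The suspect CS is
 * then the one submitted right after that fence.
 */
static bool find_stalled_fence(const struct fence_sample *samples, size_t n,
                               size_t min_repeats, uint32_t *stalled)
{
    assert(stalled != NULL);

    if (n == 0 || min_repeats == 0)
        return false;

    /* Count how many trailing samples share the final signaled value. */
    size_t run = 1;
    for (size_t i = n - 1;
         i > 0 && samples[i - 1].last_signaled == samples[n - 1].last_signaled;
         i--)
        run++;

    if (run < min_repeats)
        return false;

    *stalled = samples[n - 1].last_signaled;
    return true;
}
```

Fed the four samples from the example log with `min_repeats` set to 2, this reports fence 17. A short run of identical values can also just mean the GPU is busy, so the threshold needs to be chosen against the lockup timeout.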

Comment 20 Simon Farnsworth 2011-11-29 06:27:37 UTC

I cannot reproduce the lockup now that I've added the "avoid infinite loops in pageflip code" patch; I'm guessing that the udelay(1) in there keeps the chip happy.

I have, however, noticed that I've got rendering to the back buffer partially overwriting the front buffer. If I put code in radeon_dri2.c to force buffers obtained via the DDX to be properly aligned, I see Mesa asking for buffers of size 300x300, but proper alignment being 512x304 for this hardware.

I think this implies that Mesa isn't coping with tiling properly, and am going to dive into that code base.
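For reference, rounding a dimension up to a power-of-two alignment looks like the sketch below. The alignment values used with it (512 for width, 16 for height) are assumptions chosen only because they reproduce the 512x304 figure; the thread does not state the actual units, and 8 would give the same height.

```c
#include <assert.h>
#include <stdint.h>

/* Round `value` up to the next multiple of `alignment` (a power of two). */
static uint32_t align_up(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}
```

Under those assumed alignments, `align_up(300, 512)` gives 512 and `align_up(300, 16)` gives 304, so a buffer allocated for 300x300 would be smaller than the surface the tiled layout actually touches.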

Comment 21 Simon Farnsworth 2011-12-01 09:50:56 UTC

Created attachment 54028
Program to demonstrate misrendering issue

Thanks to Jerome's hint, I took some time to try and spot misrendering, as I'm struggling to see other problems. The attached program shows that I'm getting basically correct rendering, but each time I draw to the backbuffer, something else gets partly overwritten - the front buffer if I haven't attempted a modeset after running something that uses OpenGL, something unidentified otherwise.

At 2560x1440, the rendering to the back buffer partially overwrites the front buffer - I see a block at the bottom right (approximately - judged by tape measure - 1280 pixels wide by 45 high) being overwritten.

At 1920x1200, the block at the bottom right is around 400 pixels wide, 45 high.

At 1280x800, the block is still around 45 high, but is now around 640 pixels wide.

If I modeset via xrandr after running a test, the next run renders perfectly. If I repeat without modesetting, I'm fine. When a run has rendered perfectly, 

I should also note that I don't see issues with plymouth, or when rendering via X11, implying that my remaining problem with tiled scanout is getting Mesa to work properly.

Comment 22 Simon Farnsworth 2011-12-01 10:01:28 UTC

A bit of checking by drawing rectangles in different places shows that it's the top-left of the backbuffer that overwrites the front buffer.

Comment 23 Simon Farnsworth 2011-12-05 10:32:59 UTC

I had the kernel print the size and gpu_offset of each BO assigned to CB_COLOR0_BASE:

$ git diff
diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c
index 5e00d16..0794987 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -84,6 +84,7 @@ u32 evergreen_page_flip(struct radeon_device *rdev, int crtc_id, u64 crtc_base)
        u32 tmp = RREG32(EVERGREEN_GRPH_UPDATE + radeon_crtc->crtc_offset);
        int i;
 
+        printk( KERN_INFO "Setting CRTC %d to 0x%llx\n", crtc_id, crtc_base);
        /* Lock the graphics update lock */
        tmp |= EVERGREEN_GRPH_UPDATE_LOCK;
        WREG32(EVERGREEN_GRPH_UPDATE + radeon_crtc->crtc_offset, tmp);
diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c b/drivers/gpu/drm/radeon/evergreen_cs.c
index b53d1c6..8cd9446 100644
--- a/drivers/gpu/drm/radeon/evergreen_cs.c
+++ b/drivers/gpu/drm/radeon/evergreen_cs.c
@@ -833,6 +833,7 @@ static int evergreen_cs_check_reg(struct radeon_cs_parser *p, u32 reg, u32 idx)
                tmp = (reg - CB_COLOR0_BASE) / 0x3c;
                track->cb_color_bo_offset[tmp] = radeon_get_ib_value(p, idx);
                ib[idx] += (u32)((reloc->lobj.gpu_offset >> 8) & 0xffffffff);
+                printk(KERN_INFO "Setting COLOR_BASE register 0x%x to 0x%x size 0x%x\n", reg, ib[idx], radeon_bo_size(reloc->robj));
                track->cb_color_base_last[tmp] = ib[idx];
                track->cb_color_bo[tmp] = reloc->robj;
                break;

This showed me that when the back buffer drawing appears in the front buffer, my front buffer is immediately before the back buffer in GPU memory:

[   81.319704] Setting COLOR_BASE register 0x28c60 to 0x21f00 size 0xe10000
[   82.338498] Setting CRTC 0 to 21f0000
[   84.337323] Setting COLOR_BASE register 0x28c60 to 0x13e00 size 0xe10000
[   85.363962] Setting CRTC 0 to 13e0000
[   87.354612] Setting COLOR_BASE register 0x28c60 to 0x21f00 size 0xe10000
[   88.372710] Setting CRTC 0 to 21f0000
[   90.371962] Setting COLOR_BASE register 0x28c60 to 0x13e00 size 0xe10000
[   91.398175] Setting CRTC 0 to 13e0000
[   93.389461] Setting COLOR_BASE register 0x28c60 to 0x21f00 size 0xe10000
[   94.406925] Setting CRTC 0 to 21f0000
[   95.444609] Setting COLOR_BASE register 0x28c60 to 0x21f00 size 0xe10000
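The adjacency can be checked directly from these numbers. A small sketch, assuming (as the `gpu_offset >> 8` in the patch suggests) that the COLOR_BASE value is in 256-byte units; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert a logged COLOR_BASE value (256-byte units) to a byte address. */
static uint64_t color_base_to_bytes(uint32_t reg_value)
{
    return (uint64_t)reg_value << 8;
}

/* True if the second buffer starts exactly where the first one ends. */
static bool buffers_adjacent(uint32_t first_base_reg, uint64_t first_size,
                             uint32_t second_base_reg)
{
    return color_base_to_bytes(first_base_reg) + first_size ==
           color_base_to_bytes(second_base_reg);
}
```

Here 0x13e00 becomes 0x13e0000, and 0x13e0000 + 0xe10000 = 0x21f0000, which is exactly the other buffer's base, so `buffers_adjacent(0x13e00, 0xe10000, 0x21f00)` is true and there is no gap between the two buffers.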

I therefore hacked around with the DDX, and made drmmode_get_base_align always return 0x100000 for 2D tiled surfaces. This appears to fix the misrendering and the crash bugs, with the following output from the kernel:


[  118.863087] Setting COLOR_BASE register 0x28c60 to 0x23000 size 0xe10000
[  119.886856] Setting CRTC 0 to 2300000
[  121.880499] Setting COLOR_BASE register 0x28c60 to 0x14800 size 0xe10000
[  122.895613] Setting CRTC 0 to 1480000
[  124.897752] Setting COLOR_BASE register 0x28c60 to 0x23000 size 0xe10000
[  125.921075] Setting CRTC 0 to 2300000
[  127.914984] Setting COLOR_BASE register 0x28c60 to 0x14800 size 0xe10000
[  128.946542] Setting CRTC 0 to 1480000
[  130.932377] Setting COLOR_BASE register 0x28c60 to 0x23000 size 0xe10000
[  131.955295] Setting CRTC 0 to 2300000
[  132.987772] Setting COLOR_BASE register 0x28c60 to 0x23000 size 0xe10000

I now have no idea what's going on - visually, it looks like the rendering engine writes to parts of the color buffer *before* the gpu_offset set in CB_COLOR0_BASE.

Comment 24 Michel Dänzer 2011-12-06 08:17:51 UTC

(In reply to comment #23)
> I now have no idea what's going on - visually, it looks like the rendering
> engine writes to parts of the color buffer *before* the gpu_offset set in
> CB_COLOR0_BASE.

That makes sense if the CB_COLOR0_BASE address isn't properly aligned: the unaligned least significant bits will be ignored.
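A rough model of that effect, as a sketch only: if the hardware drops the low bits of the base, rendering starts at the rounded-down address. The 0x100000 alignment used below is an assumption borrowed from the value tried in comment 23; the thread does not say how many bits the hardware actually ignores, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Address the hardware would use if it ignores the bits below `alignment`. */
static uint64_t effective_base(uint64_t requested, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return requested & ~(alignment - 1);
}

/* How far before the requested base the effective base lies, in bytes. */
static uint64_t bytes_before_requested(uint64_t requested, uint64_t alignment)
{
    return requested - effective_base(requested, alignment);
}
```

With that assumed alignment, a base of 0x21f0000 is treated as 0x2100000, which is 0xf0000 bytes earlier and therefore inside the tail of whatever buffer precedes it. A base of 0x2300000 is unchanged.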

Comment 25 Simon Farnsworth 2011-12-07 06:49:01 UTC

Alex Deucher has given me data on Evergreen tiling, which has allowed me to get to the bottom of this:

Evergreen doesn't have one fixed 2D tiling pattern; it has configurable per-surface 2D tiling. There are restrictions on which tiling options work together; the DDX and the kernel currently ignore those.

Doing the math, the worst case alignment required for 2D tiling is 2 megabyte alignment on my hardware. The settings used for the scanout buffer are marked in Alex's information as illegal - this will be why I need excessive alignment to make it appear to function.

It looks like fixing this is going to require changes across the stack - rather than simply attaching patches to this bug, I will send them to appropriate mailing lists. (xorg-driver-ati for DDX patches, dri-devel for the kernel, mesa-dev for Mesa).

Comment 26 Simon Farnsworth 2011-12-13 07:18:25 UTC

Created attachment 54396
Kernel patch setting tile shape

This is the patch I sent to dri-devel, setting the kernel's tile shape to more reasonable values.

Comment 27 Simon Farnsworth 2011-12-13 07:22:40 UTC

Created attachment 54397
DDX patch sorting out alignment

The DDX patch I sent to xorg-driver-ati, making alignment match docs from Alex.

Comment 28 Simon Farnsworth 2011-12-13 07:52:17 UTC

Created attachment 54398
Mesa patch sorting out alignment

And the Mesa patch, for future reference.

Comment 29 Florian Mickler 2012-01-12 14:14:55 UTC

A patch referencing this bug report has been merged in Linux v3.2-rc5:

commit f64964796dedca340608fb1075ab6baad5625851
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Mon Nov 28 14:49:26 2011 -0500

    drm/radeon/kms: add some loop timeouts in pageflip code

Comment 30 Michal Suchanek 2012-01-18 05:06:42 UTC

That's not really related; it fixes an issue that you might hit when the tiling is wrong and the GPU locks up.

The current patches at http://people.freedesktop.org/~glisse/tiling/ don't lock up for me, they just cause tiled rendering of some stuff (kernel v3, libdrm v4 mesa v3, ddx v5).

Comment 31 Florian Mickler 2012-01-21 08:47:57 UTC

A patch referencing this bug report has been merged in Linux v3.2-rc5:

commit 392e37229f0d6358dcc7b43641df776e9f62a6e6
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Mon Nov 28 14:49:27 2011 -0500

    drm/radeon/kms: fix scanout of 2D tiled buffers on EG/CM

Comment 32 Florian Mickler 2012-01-21 08:50:33 UTC

A patch referencing this bug report has been merged in Linux v3.2-rc5:

commit f3a71df05082c84d1408129084736c5f742a6165
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Mon Nov 28 14:49:28 2011 -0500

    drm/radeon/kms: fix 2D tiling CS support on EG/CM

Comment 33 Alex Deucher 2012-04-04 06:25:23 UTC

2D tiling support is upstream now.
