Created attachment 117462 [details]
Since my patches from a few months ago, Wine can now use depth renderbuffers instead of textures for FBOs if GL_ARB_depth_texture is not supported. This makes FBOs work on r200, but there appears to be some rendering corruption. The best way to describe it is that some parts of the depth buffer seem to be read from or written to the wrong place inside the buffer.
The attached screenshots show the issue. The application I ran here is the StencilMirror.exe sample from the DirectX 8 SDK. In fbo.png you can see incorrect draws at the edges of the window. nofbo.png uses the GLX drawable (GL_BACK) instead of rendering to an FBO, which doesn't show the corruption.
The corruption seems to be resolution dependent. I don't see it when I am running the application at 1400x1050 for example.
Created attachment 117463 [details]
Screenshot without FBOs
It seems the corruption has something to do with the FBO width, with the behaviour changing at multiples of 16: widths 112 to 128 work OK, 129 to 143 are broken, 144 to 160 work, 161 to 175 are broken, and so on.
On r200, IIRC, linear depth buffers are not supported (only tiled). I suspect the cases where it works are when the width aligns to a tile multiple.
As a quick hunch based on your observations and a quick look at the code, it seems depth buffers need to be 128-byte aligned on r200 (and 64-byte on r100, though that's just based on the tile/untile code), but radeon_alloc_renderbuffer_storage() only aligns to 64 bytes: (pitch = ((cpp * width + 63) & ~63) / cpp;)
Could you try whether increasing that to 128-byte alignment helps?
For the heck of it, I can't figure out where the window depth/color buffers get their alignment from, so I've no idea how that's calculated there...
Not sure about color buffers either...
(In reply to Roland Scheidegger from comment #4)
> As a quick hunch based on your observations and a quick look at the code,
> it seems depth buffers need to be 128-byte aligned on r200 (and 64-byte on
> r100, though that's just based on the tile/untile code), but
> radeon_alloc_renderbuffer_storage() only aligns to 64 bytes: (pitch = ((cpp
> * width + 63) & ~63) / cpp;)
> Could you try whether increasing that to 128-byte alignment helps?
> For the heck of it, I can't figure out where the window depth/color buffers
> get their alignment from, so I've no idea how that's calculated there...
> Not sure about color buffers either...
Actually, if I see that right, the xorg ati driver aligns things to 64 pixels for non-tiled surfaces and 256 / cpp pixels for tiled ones (so still 64 pixels for z24s8). That would be twice what I suggested above, though it may not really be required.

The docs seem to say 8 pixels for color buffers on both r100 and r200, but tile-sized if tiling is used (whatever the tile size is; it seems to be small, so this has to be a micro-tile). For depth buffers they say 8 pixels on r100 (or tile-sized if tiling is enabled, though on some chips you can't even disable it) and 32 pixels on r200 (though given the tiling code I have some doubts that's enough for z16).

In any case, I stick to my 128-byte recommendation for r200 depth buffers...
FWIW, the (micro-)tile size is always 8x8. With macro-tiling (called 2D tiling with current GPUs) enabled, the pitch (and height, for calculating the memory allocation size) must usually be aligned to a macro-tile boundary. Not sure offhand how to calculate the macro-tile size on those old GPUs though.
(In reply to Michel Dänzer from comment #6)
> FWIW, the (micro-)tile size is always 8x8. With macro-tiling (called 2D
> tiling with current GPUs) enabled, the pitch (and height, for calculating
> the memory allocation size) must usually be aligned to a macro-tile
> boundary. Not sure offhand how to calculate the macro-tile size on those old
> GPUs though.
I'm not sure micro/macro-tiling really applies to r200 tiled depth buffers; the pattern is quite different from color tiling.
Actually the docs say for rv200:
DEPTHOFFSET: "...128 bit aligned address. When tiling the offset must be tile-aligned (2KB)"
DEPTHPITCH: "Pitch is specified in multiples of 8 pixels. When tiling the pitch must be tile-aligned"
And for R200:
DEPTHOFFSET: "Z Buffer Offset must be aligned to 4KB"
DEPTHPITCH: "Pitch is specified in multiples of 32 pixels"
So, for r200, the DEPTHOFFSET wording takes into account that the buffer is always tiled; the DEPTHPITCH wording, however, doesn't mention tiling at all, making it sound like no tiling-specific alignment would be required. Of course I don't know how true that really is; in particular for z16 it seems it can't be true (because the driver's tiling formula uses (pitch >> 7) for both z16 and z32).
Even if those 128 bytes were sufficient, you're probably quite right that we're also missing a height alignment adjustment (if we used 128 bytes for the width and 32-line alignment for the height, that would give us the 4KB-aligned blocks overall too).
Oh, and based on the doc wording the depth pitch would clearly be wrongly aligned on r100 too (even though, based on the formula the driver uses there, (pitch >> 6), it looks like it could work).