Bug 28800 - [r300c, r300g] Texture corruption with World of Warcraft
Status: RESOLVED FIXED
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/r300
Version: git
Hardware: x86 (IA32) Linux (All)
Importance: medium normal
Assigned To: Default DRI bug account
Duplicates: 28993 30960
Depends on:
Blocks:
Reported: 2010-06-28 12:56 UTC by Chris Rankin
Modified: 2010-12-19 13:37 UTC (History)

Attachments
Screenshot of corrupt textures (549.19 KB, image/jpeg)
2010-06-28 12:56 UTC, Chris Rankin
Similarly broken textures with r300g (513.42 KB, image/jpeg)
2010-06-30 15:34 UTC, Chris Rankin
Corruption from first angle (407.61 KB, image/jpeg)
2010-07-12 14:08 UTC, Chris Rankin
Corruption from the second angle (429.51 KB, image/jpeg)
2010-07-12 14:10 UTC, Chris Rankin
Corruption from a third angle (428.23 KB, image/jpeg)
2010-07-12 14:12 UTC, Chris Rankin
possible workaround (1.09 KB, patch)
2010-08-01 18:19 UTC, Marek Olšák
r300g_multitexturing.diff (1.22 KB, patch)
2010-12-16 06:12 UTC, almos
r300g_multitexturing_v2.diff (2.29 KB, patch)
2010-12-16 15:18 UTC, almos
possible fix (2.35 KB, patch)
2010-12-17 00:11 UTC, Marek Olšák

Description Chris Rankin 2010-06-28 12:56:36 UTC
Created attachment 36583
Screenshot of corrupt textures

A large number of textures are rendered incorrectly in World of Warcraft. (The game appears otherwise stable and moderately playable, although the frame rate isn't particularly high.)

This is with an AGP RV350, running Linux 2.6.33.5, xorg-xf86-drv-ati from git and Fedora 13.

My theory is that this is related to bug 28033, which has been fixed for r300g. However it is impossible to confirm this due to WoW crashing with r300g before I can enter the game. (See bug 28630.)
Comment 1 Chris Rankin 2010-06-30 15:34:47 UTC
Created attachment 36647
Similarly broken textures with r300g

Now that I can run Warcraft under Gallium, I can see that some of the textures are broken in the same way as with r300c. However, the breakage doesn't seem so extreme with r300g as it does with r300c.
Comment 2 Chris Rankin 2010-07-06 12:33:59 UTC
I tried running WoW with the RADEON_DEBUG="tex" option, and noticed that many textures were in formats:

dxt1_rgba
dxt3_rgba
dxt5_rgba

Because I am also using libtxc_dxtn070518.tar.gz, I moved the libtxc_dxtn.so object out of /usr/lib and tried running WoW again. This time the textures were all fine... right up until WoW crashed a few seconds later.

From this I conclude that:
a) Both r300c and r300g have difficulty with compressed textures, and
b) r300g also has a more serious problem with b8g8r8a8_unorm, b4g4r4a4_unorm, b5g6r5_unorm textures (and possibly others), because these are what WoW uses when compressed texture support isn't available.
Comment 3 Marek Olšák 2010-07-06 14:33:08 UTC
Can you obtain a backtrace of the crash?
Comment 4 Chris Rankin 2010-07-06 14:40:14 UTC
(In reply to comment #3)
> Can you obtain a backtrace of the crash?

The crash seems to happen within WoW itself, which suggests to me that WoW *needs* S3TC textures to run correctly. I get the exact same crash when I remove libtxc_dxtn.so from r300c too.

r300: texture_create: Macro:  NO, Micro:  NO, Pitch: 128, Dim: 128x8x0, LastLevel: 0, Size: 4096, Format: b8g8r8a8_unorm
r300: texture_create: Macro:  NO, Micro:  NO, Pitch: 8, Dim: 8x64x0, LastLevel: 0, Size: 2048, Format: b8g8r8a8_unorm
wine: Unhandled page fault on read access to 0x00000000 at address (nil) (thread 0009), starting debugger...
Unhandled exception: page fault on read access to 0x00000000 in 32-bit code (0x00000000).

Register dump:
 CS:0073 SS:007b DS:007b ES:007b FS:0033 GS:003b
 EIP:00000000 ESP:0196fd04 EBP:0196fd64 EFLAGS:00210246(  R- --  I  Z- -P- )
 EAX:000083f1 EBX:00008000 ECX:00000de1 EDX:0cf0ece0
 ESI:00000000 EDI:0c37c558
Stack dump:
0x0196fd04:  0069de43 00000de1 00000000 000083f1
0x0196fd14:  00000100 00000100 00000000 00008000
0x0196fd24:  0cf0ece0 00000100 0c37c558 00000100
0x0196fd34:  00000000 00000000 00000100 00000100
0x0196fd44:  0000813d 00000008 00000000 00000200
0x0196fd54:  0cf16ce0 00008000 0cf0ece0 000083f1
Backtrace:
=>0 0x00000000 (0x0196fd64)
  1 0x0069e078 in wow (+0x29e077) (0x0196fdbc)

Ouch.
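
The numbers in this log can be cross-checked with a little arithmetic. The sketch below assumes that Pitch is given in pixels and that b8g8r8a8_unorm occupies 4 bytes per pixel, which reproduces both Size values above. It also assumes that the stack words after the return address are the arguments of a glCompressedTexImage2D-style call: 0x0de1 is GL_TEXTURE_2D, 0x83f1 is GL_COMPRESSED_RGBA_S3TC_DXT1_EXT, and 0x8000 matches the byte size of a 256x256 DXT1 image. That reading is an interpretation, not something the log states, and the helper names are made up for illustration.

```c
#include <assert.h>

/* Bytes for one untiled, uncompressed texture level.
 * Assumption: the "Pitch" field in the log is measured in pixels. */
static unsigned linear_size(unsigned pitch_px, unsigned height,
                            unsigned bytes_per_pixel)
{
    return pitch_px * height * bytes_per_pixel;
}

/* Bytes for one S3TC level. S3TC stores 4x4 pixel blocks:
 * 8 bytes per block for DXT1, 16 bytes per block for DXT3 and DXT5. */
static unsigned s3tc_size(unsigned width, unsigned height,
                          unsigned block_bytes)
{
    assert(block_bytes == 8 || block_bytes == 16);
    return ((width + 3) / 4) * ((height + 3) / 4) * block_bytes;
}
```

linear_size(128, 8, 4) gives 4096 and linear_size(8, 64, 4) gives 2048, matching the two texture_create lines; s3tc_size(256, 256, 8) gives 0x8000, matching the stack dump. If the reading is right, the jump to address 0 would be a call through a compressed-texture entry point that was never resolved.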
Comment 5 Chris Rankin 2010-07-11 16:17:50 UTC
(In reply to comment #3)
> Can you obtain a backtrace of the crash?

With r300c, I can remove the libtxc_dxtn.so library but set a flag in my .drirc file instead:

<option name="force_s3tc_enable" value="true" />

This adds the GL_EXT_texture_compression_s3tc and GL_S3_s3tc extensions back, and is enough to stop WoW crashing with r300c. However, this <option/> does not work with r300g. Is there an equivalent option with a different name for r300g instead, please?

Note that the texture corruption is still present with r300c, even with the libtxc_dxtn.so library removed. However, it would be interesting to test r300g as well.
Comment 6 Marek Olšák 2010-07-11 17:33:51 UTC
There is nothing I can do about a crash in WoW and not in the driver, though I believe WoW without S3TC has not been tested by Blizzard QA because there are no Windows drivers that do not advertise S3TC.

I don't think there is an equivalent to force_s3tc_enable in Gallium, and I think the presence of libtxc_dxtn has nothing to do with the texture corruption issue.

Now let's get back on topic.

There are two issues regarding the corrupted textures on r3xx-r4xx.

1) Assigning texture cache regions. This was fixed in r300g some time ago.

2) Occasional texture corruption which seems to be triggered by some other state (e.g. a soldier entering the view). This is a bug in r300g, at least. It's a real mystery to me.
Comment 7 Chris Rankin 2010-07-12 01:15:18 UTC
(In reply to comment #6)
> There is nothing I can do about a crash in WoW and not in the driver, though I
> believe WoW without S3TC has not been tested by Blizzard QA because there are
> no Windows drivers that do not advertise S3TC.

The WoWWiki documents WoW as *needing* S3TC textures:
http://www.wowwiki.com/BLP_files

The crash is almost certainly to do with WoW using one of the S3TC extensions without checking whether the driver supports it first.

> I don't think there is an equivalent to force_s3tc_enable in Gallium,

Then I'll raise that as a separate bug, because WoW is going to need the S3TC extensions to be advertised, and some people might not be able to use the IP-encumbered libtxc_dxtn library.

> and I think the presence of libtxc_dxtn has nothing to do with the texture
> corruption issue.

I'm not so sure about that. The remaining texture corruption only happens in certain *locations*. It has nothing to do with NPCs appearing on screen. More interestingly, none of these locations is in "The Burning Crusade" expansion, where DXT5 compression was apparently used for the first time. (See link above.)

The final interesting observation was that when I uninstalled libtxc_dxtn and tried playing WoW with r300g, the expected texture corruption was not present for those few seconds before the game crashed. This is another reason why I am interested in a force_s3tc_enable for Gallium: to see whether the corruption reappears if I can avoid the crash. (Failing that, at least to eliminate a blob of unmaintained code as the cause of the problem.)
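
For reference, the check that the game is suspected of skipping can be sketched as follows. This is a minimal illustration, not WoW's or Mesa's code: has_extension is a made-up helper, and the extension list is passed in as a plain string so the example runs without a GL context. In a real program the list would come from glGetString(GL_EXTENSIONS), which returns the names separated by spaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if `name` appears as a whole token in the space-separated
 * extension list `exts`. A plain strstr() is not enough, because one
 * extension name can be a prefix of another. */
static bool has_extension(const char *exts, const char *name)
{
    size_t len = strlen(name);
    const char *p = exts;

    assert(len > 0);
    while ((p = strstr(p, name)) != NULL) {
        bool starts = (p == exts) || (p[-1] == ' ');
        bool ends = (p[len] == '\0') || (p[len] == ' ');
        if (starts && ends)
            return true;
        p += len;   /* partial match: keep searching after it */
    }
    return false;
}
```

A caller would test for "GL_EXT_texture_compression_s3tc" before uploading DXT data and fall back to an uncompressed format when the function returns false.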
Comment 8 Chris Rankin 2010-07-12 14:08:46 UTC
Created attachment 36972
Corruption from first angle

Have I mentioned that this texture corruption varies with the player's "camera angle"? You'll notice that the corruption here has very linear boundaries, and square patterns within those boundaries. But those boundaries change as I rotate my character around.
Comment 9 Chris Rankin 2010-07-12 14:10:28 UTC
Created attachment 36973
Corruption from the second angle

After a quick turn to my right...
Comment 10 Chris Rankin 2010-07-12 14:12:18 UTC
Created attachment 36974
Corruption from a third angle

A further turn to the right, and "eyes down" a bit too.
Comment 11 Marek Olšák 2010-08-01 18:19:22 UTC
Created attachment 37512
possible workaround

Could you try this patch?
Comment 12 Chris Rankin 2010-08-02 13:48:41 UTC
(In reply to comment #11)
> Could you try this patch?

Sorry, that made no difference.
Comment 13 Chris Rankin 2010-08-02 17:25:43 UTC
I have just upgraded this machine from 2.6.33.6 to 2.6.34.2 and have discovered that the texture corruption is now *massively* reduced. There are still artifacts here and there if I look closely, but the large blocks of mangled textures have gone!

Switching back to 2.6.33.6 makes the corruption reappear.

I'm stunned. I should probably also mention now that I am using KMS.
Comment 14 Marek Olšák 2010-08-05 16:39:34 UTC
*** Bug 28993 has been marked as a duplicate of this bug. ***
Comment 15 Marek Olšák 2010-08-05 16:44:32 UTC
I have tried everything in r300g: flushing and invalidating all caches, flushing the command stream after every draw operation, and re-emitting all state. Nothing helps.

Chris, considering what you say, I think this is a kernel issue, not an r300g issue.
Comment 16 Marek Olšák 2010-10-23 18:38:37 UTC
*** Bug 30960 has been marked as a duplicate of this bug. ***
Comment 17 Tomasz Czapiewski 2010-10-24 14:15:19 UTC
(In reply to comment #16)
> *** Bug 30960 has been marked as a duplicate of this bug. ***

I've made some comments in this duplicate... 
Should I comment later here or in my duplicate?

WoW is closed source and cannot be easily debugged, but Xonotic is an open source game, and its developers might be able to provide useful information from the game code, too.
I now know that in Xonotic the problem appears only with lightmaps combined with S3TC textures; without lightmaps the S3TC textures work just fine.
Comment 18 Chris Rankin 2010-11-29 15:25:54 UTC
(In reply to comment #15)
> Chris, considering what you say, I think this is a kernel issue, not an r300g
> issue.

I have a T60p with an internal M66 (~ RV530) chip, and have tested it against the exact same version of Mesa as my RV350. The RV350 still has this issue, but the M66 does not.

(Not a perfect test, mind. The M66 is using Linux 2.6.36.1 whereas the RV350 is still using 2.6.35.9, but both are running Fedora 14 userspace with xorg-drv-ati from git.)
Comment 19 almos 2010-12-03 13:27:04 UTC
I don't know if this information helps, but this corruption also appears on the water in Google Earth if there is a semi-transparent icon over it (like the one for a shipwreck). If I set the view so that only one such icon is visible and move the cursor over it, the icon becomes solid and the glitch disappears. The viewport change indicator cursor (which appears while holding down the right or middle mouse button) also triggers this glitch if the cursor is over an area where water is drawn. Checking/unchecking the texture compression option doesn't change anything; maybe it's a no-op if s3tc is available...

To Chris: I believe it was already known that this glitch is only present with pre-r500 cards (maybe due to vram sizes), but now that I have checked the other bug reports about it, I have discovered that all reporters have an rv350. Coincidence?
Comment 20 almos 2010-12-16 06:12:18 UTC
Created attachment 41170
r300g_multitexturing.diff

I played around with the tmu texture cache region assignment code, read the corresponding part of the r3xx register manual, and deduced the following:
- you must assign a different region to each texture
- assigned regions must not overlap
The existing code guarantees neither of these, especially if there are partial updates (which are quite common). The texture corruption is the result of the tmu loading different textures into the same cache area (possibly in different formats), so that only one of them won't be garbage. This only applies to r3xx and r4xx, because r5xx ignores this parameter and assigns cache automatically.

My patch doesn't guarantee anything either; it is meant as a proof of concept (or more like a hackload of hacky hacks) that almost completely fixes texturing in ut2004, vastly improves the situation in etqw (the corruption occurs only on strogg architecture and vehicles instead of almost everywhere), and changes nothing in googleearth. Doom3 would also be a good test, but it crashes since I increased GART from 64MB to 256MB.
Comment 21 Marek Olšák 2010-12-16 07:00:56 UTC
(In reply to comment #20)
> Created an attachment (id=41170)
> r300g_multitexturing.diff
> 
> I played around with the tmu texture cache region assignment code, read the
> corresponding part of the r3xx register manual, and deduced the following:
> - you must assign different region for different textures
> - assigned regions must not overlap
> The existing code guarantees none of these, especially if there are partial
> updates (which are quite common). The texture corruption is the result of the
> tmu loading different textures into the same cache area (possibly in different
> formats), and only one of them won't be garbage.

Thanks a lot for looking into this, but let's discuss it a bit.

Could you give me proof that the current code is wrong? At least one case where the assignment of texture cache regions is done wrong.

Currently it should assign the regions as follows:

1 texture:
R300_TX_CACHE_WHOLE

2 textures:
R300_TX_CACHE_HALF_0
R300_TX_CACHE_HALF_1

3 textures:
R300_TX_CACHE_HALF_1  
R300_TX_CACHE_FOURTH_0
R300_TX_CACHE_FOURTH_1

4 textures:
R300_TX_CACHE_FOURTH_0
R300_TX_CACHE_FOURTH_1
R300_TX_CACHE_FOURTH_2
R300_TX_CACHE_FOURTH_3

5 textures:
R300_TX_CACHE_FOURTH_1
R300_TX_CACHE_FOURTH_2
R300_TX_CACHE_FOURTH_3
R300_TX_CACHE_EIGHTH_0
R300_TX_CACHE_EIGHTH_1

6 textures:
R300_TX_CACHE_FOURTH_2
R300_TX_CACHE_FOURTH_3
R300_TX_CACHE_EIGHTH_0
R300_TX_CACHE_EIGHTH_1
R300_TX_CACHE_EIGHTH_2
R300_TX_CACHE_EIGHTH_3

7 textures:
R300_TX_CACHE_FOURTH_3
R300_TX_CACHE_EIGHTH_0
R300_TX_CACHE_EIGHTH_1
R300_TX_CACHE_EIGHTH_2
R300_TX_CACHE_EIGHTH_3
R300_TX_CACHE_EIGHTH_4
R300_TX_CACHE_EIGHTH_5

8 textures:
R300_TX_CACHE_EIGHTH_0
R300_TX_CACHE_EIGHTH_1
R300_TX_CACHE_EIGHTH_2
R300_TX_CACHE_EIGHTH_3
R300_TX_CACHE_EIGHTH_4
R300_TX_CACHE_EIGHTH_5
R300_TX_CACHE_EIGHTH_6
R300_TX_CACHE_EIGHTH_7

And so on. I can't see any overlapping regions. It always divides the whole texture cache between all the textures. The code assumes each region can be divided into two regions, each half the size of the original. E.g. the case with 2 textures is:

R300_TX_CACHE_HALF_0
R300_TX_CACHE_HALF_1

And when you add the 3rd texture, the first region will get divided evenly like this:

R300_TX_CACHE_FOURTH_0
R300_TX_CACHE_FOURTH_1
R300_TX_CACHE_HALF_1

Does it look wrong to you? Do you think that the hardware doesn't like such a _tight_ configuration of the regions?
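
The listing above can be reproduced with a small sketch. It assumes a FIFO of regions in which the oldest region is repeatedly split into its two halves until there is one region per texture. This is only a model that yields the same layout as the listing, not the r300g source; struct region, assign_regions, region_name and tiles_cache are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_TEXTURES 8   /* the listing above stops at eighths */

/* Hypothetical region descriptor: the cache is cut into (1 << level)
 * equal parts and this region is part number `index`. */
struct region { unsigned level, index; };

/* Fill out[0..n-1] for n bound textures by repeatedly splitting the
 * oldest region in a FIFO into its two halves. */
static void assign_regions(unsigned n, struct region *out)
{
    struct region queue[2 * MAX_TEXTURES];
    unsigned head = 0, tail = 0, i;

    assert(n >= 1 && n <= MAX_TEXTURES);
    queue[tail++] = (struct region){0, 0};          /* whole cache */
    while (tail - head < n) {
        struct region r = queue[head++];
        queue[tail++] = (struct region){r.level + 1, 2 * r.index};
        queue[tail++] = (struct region){r.level + 1, 2 * r.index + 1};
    }
    for (i = 0; i < n; i++)
        out[i] = queue[head + i];
}

/* Write the register-style name used in the listing into buf. */
static void region_name(struct region r, char *buf, size_t size)
{
    static const char *const names[] = {"WHOLE", "HALF", "FOURTH", "EIGHTH"};

    assert(r.level < 4);
    if (r.level == 0)
        snprintf(buf, size, "R300_TX_CACHE_WHOLE");
    else
        snprintf(buf, size, "R300_TX_CACHE_%s_%u", names[r.level], r.index);
}

/* Return 1 if the n regions tile the cache exactly once, 0 otherwise.
 * Each bit of the mask stands for one eighth of the cache. */
static int tiles_cache(const struct region *regs, unsigned n)
{
    unsigned mask = 0, i;

    for (i = 0; i < n; i++) {
        unsigned width = 8u >> regs[i].level;
        unsigned bits = ((1u << width) - 1u) << (regs[i].index * width);
        if (mask & bits)
            return 0;   /* two regions overlap */
        mask |= bits;
    }
    return mask == 0xffu;
}
```

For every count from 1 to 8, tiles_cache reports that the assigned regions cover the cache without overlap, which agrees with the claim that a full reassignment is sound as long as every texture is reassigned whenever the count changes.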
Comment 22 almos 2010-12-16 15:18:11 UTC
Created attachment 41192
r300g_multitexturing_v2.diff

Yes, your calculations are OK. The problem is the
if (&state->sampler_views[i]->base != views[i]) {}
part, which leaves some samplers untouched, and their cache regions get reused sometimes.

My attached patch is IMHO significantly better than the previous one. It completely fixes texturing in etqw, but there are occasional glitches in ut2004 and the water in googleearth is still bad. See the TODO and XXX comments in the patch for more explanation.
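
The effect of that conditional can be modelled in a few lines. The sketch below is a simplified, hypothetical model rather than the driver code: views are plain integer ids, region_for is a closed form that yields the same layout as the listing in comment 21, and set_views takes a skip_unchanged flag that stands in for the conditional.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SLOTS 8

/* The cache is cut into (1 << level) parts; this is part `index`. */
struct region { unsigned level, index; };

struct slot {
    int view;               /* hypothetical view id; 0 = nothing bound */
    struct region region;
};

static struct slot slots[MAX_SLOTS];

/* Region for sampler i when n samplers are bound. */
static struct region region_for(unsigned i, unsigned n)
{
    unsigned level = 0, splits, unsplit;

    assert(n >= 1 && n <= MAX_SLOTS && i < n);
    while ((2u << level) <= n)          /* level = floor(log2(n)) */
        level++;
    splits = n - (1u << level);         /* regions already halved */
    unsplit = (1u << level) - splits;   /* regions still at full size */
    if (i < unsplit)
        return (struct region){level, splits + i};
    return (struct region){level + 1, i - unsplit};
}

/* Compare two regions as ranges measured in eighths of the cache. */
static bool overlaps(struct region a, struct region b)
{
    unsigned a_width = 8u >> a.level, b_width = 8u >> b.level;
    unsigned a_start = a.index * a_width, b_start = b.index * b_width;

    return a_start < b_start + b_width && b_start < a_start + a_width;
}

static void set_views(const int *views, unsigned n, bool skip_unchanged)
{
    unsigned i;

    assert(n >= 1 && n <= MAX_SLOTS);
    for (i = 0; i < n; i++) {
        if (skip_unchanged && slots[i].view == views[i])
            continue;   /* region from the previous call stays in place */
        slots[i].view = views[i];
        slots[i].region = region_for(i, n);
    }
}

static bool any_overlap(unsigned n)
{
    unsigned i, j;

    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if (overlaps(slots[i].region, slots[j].region))
                return true;
    return false;
}
```

Binding views {1, 2} and then {1, 2, 3} with skip_unchanged set leaves the first two samplers on the two halves while the third is given a fourth that lies inside the first half, so any_overlap(3) becomes true. Repeating the second call with skip_unchanged cleared reassigns every sampler and the overlap disappears.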
Comment 23 Chris Rankin 2010-12-16 16:37:34 UTC
(In reply to comment #22)
> My attached patch is IMHO significantly better than the previous one.

The patch may be flawed, but it greatly improves WoW as well. I've just run around a couple of "texture hotspots" without noticing any problems, although the console log has been warning me of "trouble ahead" a fair bit as well.

I would say that the patch is definitely heading in the correct general direction. (Although it doesn't apply cleanly to git. Perhaps it was made against altered sources?)
Comment 24 Marek Olšák 2010-12-17 00:11:49 UTC
Created attachment 41199
possible fix

(In reply to comment #22)
> Yes, your calculations are OK. The problem is the
> if (&state->sampler_views[i]->base != views[i]) {}
> part, which leaves some samplers untouched, and their cache regions get reused
> sometimes.

Well spotted! Could you guys please test this patch? I've just removed the conditional.
Comment 25 almos 2010-12-17 03:07:50 UTC
The 'possible fix' patch fixes everything, and is far superior to my attempts (somehow I thought that the 'if' was important). Please commit it.
Comment 26 Marek Olšák 2010-12-17 04:20:44 UTC
The 'if' was an optimization from the days before the texture cache partitioning was implemented.

Again, thanks a lot.

Committed to master and closing the bug.. (finally!)
Comment 27 Chris Rankin 2010-12-19 13:37:27 UTC
(In reply to comment #26)
> Committed to master and closing the bug.. (finally!)

Yes, this fixes Warcraft too. Thanks, everyone :-) !!