Bug 65968 - Massive memory corruption in Planetary Annihilation Alpha
Summary: Massive memory corruption in Planetary Annihilation Alpha
Status: RESOLVED INVALID
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/r300
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-06-20 12:55 UTC by Andreas Ringlstetter
Modified: 2017-02-14 17:50 UTC
CC List: 1 user

See Also:
i915 platform:
i915 features:


Attachments
Example of corruption in PA. The skybox texture has been completely overwritten, partly with textures from other programs; corruption in other textures is already starting. (359.25 KB, image/jpeg)
2013-06-20 12:55 UTC, Andreas Ringlstetter
Xorg log (47.29 KB, text/plain)
2013-06-20 12:56 UTC, Andreas Ringlstetter
glxinfo log (46.79 KB, text/plain)
2013-06-20 12:56 UTC, Andreas Ringlstetter
Example of corruption in PA. The skybox texture has been completely overwritten, partly with textures from other programs; corruption in other textures is already starting. (359.25 KB, image/jpeg)
2013-06-20 12:58 UTC, Andreas Ringlstetter

Description Andreas Ringlstetter 2013-06-20 12:55:22 UTC
Created attachment 81105 [details]
Example of corruption in PA. The skybox texture has been completely overwritten, partly with textures from other programs; corruption in other textures is already starting.

Using the R300 driver (git version from 2013-06-19) on a Mobility Radeon X1400 (128MB dedicated ???), I get massive memory corruption which can be seen in the attached screenshot when running the Planetary Annihilation Alpha.

The game makes use of virtual texturing, that is, a mega texture which cannot possibly fit into RAM in one piece.

However, it appears like textures which are NOT part of the mega texture have been mapped into the same address space. I could see other textures, and even bitmaps from other applications.

In the screenshot, there are large grey stripes for example, however there is no such texture in the game. The color does match the color of the window border though. Performing further tests, I even managed to get parts of album covers from Banshee into PA.


This issue is not limited to Planetary Annihilation, though, and the corruption also works the other way around, where applications overwrite the bitmaps of other applications.

The effects of the corruption are clearly visible in PA due to the large textures. They are not deterministic, but appear very reliably, most likely due to the high memory usage.

Using other applications which frequently allocate new textures (like Banshee with album covers) speeds up the corruption and even makes it visible in other applications like Firefox, Cinnamon etc., although not reliably.

Attached are:
Screenshot of corruption
Xorg-log
glxinfo output
Comment 1 Andreas Ringlstetter 2013-06-20 12:56:14 UTC
Created attachment 81106 [details]
Xorg log
Comment 2 Andreas Ringlstetter 2013-06-20 12:56:56 UTC
Created attachment 81107 [details]
glxinfo log
Comment 3 Andreas Ringlstetter 2013-06-20 12:58:06 UTC
Created attachment 81108 [details]
Example of corruption in PA. The skybox texture has been completely overwritten, partly with textures from other programs; corruption in other textures is already starting.
Comment 4 Andreas Ringlstetter 2013-06-20 13:10:50 UTC
PS:
I will not be able to test with 9.0 or 9.1 since one of the shaders causes a segfault while compiling in these versions. This has only recently (within the last 1-2 months) been fixed in git.

This was caused by a faulty implementation of peephole_mul_omod() in compiler/radeon_optimize.c; the SIGSEGV was raised in rc_variable_list_get_writers_one_reader due to writer_list being NULL.
Comment 5 Andreas Boll 2013-06-22 11:47:26 UTC
You could try setting the env var RADEON_DEBUG=noopt, maybe it helps.
Additionally you should be able to test 9.0 and 9.1 with this env var.

RADEON_DEBUG=help prints some other debug flags you could try.
E.g. disable hyper-z or MSAA.
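
As a rough illustration of the suggestion above, the following sketch launches a child process with RADEON_DEBUG set while leaving the rest of the environment untouched. The helper name run_with_debug_flags is hypothetical, and the child command is only a stand-in that echoes the variable, since the game's launcher path is not given in this report.

```python
import os
import subprocess
import sys


def run_with_debug_flags(command, radeon_debug="noopt"):
    """Run `command` with RADEON_DEBUG set (hypothetical helper, not part of Mesa)."""
    env = dict(os.environ)              # copy, so the parent environment is unchanged
    env["RADEON_DEBUG"] = radeon_debug  # e.g. "noopt", or "help" to list the flags
    return subprocess.run(command, env=env, capture_output=True, text=True)


# Stand-in child process: it only prints the variable it received.
# Replace this list with the real game command line when testing.
result = run_with_debug_flags(
    [sys.executable, "-c", "import os; print(os.environ['RADEON_DEBUG'])"]
)
print(result.stdout.strip())
```

Setting the variable per process rather than exporting it globally keeps the rest of the desktop running with the default driver settings.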
Comment 6 Andreas Ringlstetter 2013-06-22 12:34:28 UTC
RADEON_DEBUG=noopt is not possible; the pixel shader programs are too big to be loaded without size optimizations.
Hard limit of 512 instruction slots per pixel shader: http://developer.amd.com/wordpress/media/2012/10/Radeon_X1x00_Programming_Guide.pdf page 13
This limit is exceeded by far due to all the virtual texturing code; the optimized shader barely fits.

I did try it in 9.0 and 9.1 with noopt and got past the segfault in peephole_mul_omod() this way, but it then failed because the resulting shader program was too big.

Deactivating hyper-z has no measurable impact, and it didn't prevent the corruption either.

Antialiasing hasn't even been enabled in the application by default, so turning it off makes no difference at all.
Comment 7 Timothy Arceri 2017-02-10 03:04:43 UTC
Planetary Annihilation is using compat profile. When I override the Mesa version
with MESA_GL_VERSION_OVERRIDE=3.1COMPAT the corruptions are fixed but it later crashes.
Comment 8 Timothy Arceri 2017-02-10 03:11:12 UTC
Actually no I take that back it is using core profile.
Comment 9 Timothy Arceri 2017-02-10 03:51:09 UTC
(In reply to Timothy Arceri from comment #8)
> Actually no I take that back it is using core profile.

It's requesting a core profile and using compat features.
Comment 10 Timothy Arceri 2017-02-14 05:21:48 UTC
(In reply to Timothy Arceri from comment #9)
> (In reply to Timothy Arceri from comment #8)
> > Actually no I take that back it is using core profile.
> 
> It's requesting a core profile and using compat features.

Actually I'm not sure that's true either. Anyway, here is a trace (warning: it's 3.6GB).

https://drive.google.com/open?id=0B-f68fD4PtpBenBiekxITllIbzg
Comment 11 Timothy Arceri 2017-02-14 12:03:16 UTC
The game runs (mostly) fine on i965, and a trace from i965 seems to run without issue on radeonsi.

However running the radeonsi trace on the nvidia blob results in the same corruptions.
Comment 12 Andreas Ringlstetter 2017-02-14 17:50:21 UTC
It's a bug in PA itself, not in Mesa.

The root cause is a race condition on the shared buffer which is used to transfer the rendered HTML UI from the Coherent host process back to PA.

There is a missing mutex inside PA when the buffer gets reallocated as a result of a window resize event. Effectively, this results in a use-after-free by the render thread of the PA process.

The faster the realloc, the lower the chance of this bug occurring.
It also depends on whether there are protections against use-after-free conditions on previously shared buffers, and on the memory allocation strategy: reusing the same memory region without clearing it produces the most visible effect.
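
The kind of locking that is missing can be sketched as follows. This is a minimal model and not PA's or Coherent's actual code: the class SharedFrame and all of its methods are hypothetical names, and Python cannot reproduce a real use-after-free, so the sketch only shows how a mutex keeps the reader's view of size and storage consistent while another thread reallocates the buffer.

```python
import threading


class SharedFrame:
    """Hypothetical stand-in for the UI buffer read by the render thread."""

    def __init__(self, width, height):
        self._lock = threading.Lock()
        self._width, self._height = width, height
        self._pixels = bytearray(width * height * 4)  # RGBA, 4 bytes per pixel

    def resize(self, width, height):
        # Allocate outside the lock, then swap size and storage together
        # inside it, so a reader never sees new dimensions with old storage.
        new_pixels = bytearray(width * height * 4)
        with self._lock:
            self._width, self._height = width, height
            self._pixels = new_pixels

    def snapshot(self):
        # The reader takes a consistent (width, height, data) triple.
        with self._lock:
            return self._width, self._height, bytes(self._pixels)


def check_consistency(frame, rounds, errors):
    """Reader loop: record every snapshot whose size does not match its data."""
    for _ in range(rounds):
        w, h, data = frame.snapshot()
        if len(data) != w * h * 4:
            errors.append((w, h, len(data)))


frame = SharedFrame(64, 64)
errors = []
reader = threading.Thread(target=check_consistency, args=(frame, 2000, errors))
reader.start()
for i in range(2000):  # simulated stream of window resize events
    frame.resize(32 + i % 64, 32 + i % 48)
reader.join()
print("inconsistent snapshots:", len(errors))
```

Without the lock in resize() and snapshot(), the reader could pair the new dimensions with the old storage; in a native process the equivalent mistake is reading memory that has already been freed.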

Unfortunately, various Mesa drivers do not wipe the video memory after a buffer has been returned to the global pool!
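
Why an unwiped pool makes the bug so visible can be shown with a toy model. This is an illustrative assumption, not Mesa's allocator: BufferPool, its wipe_on_release flag and leaked_bytes are hypothetical names, and the 16-byte block stands in for a texture.

```python
class BufferPool:
    """Toy model of a pool that recycles released blocks of the same size."""

    def __init__(self, wipe_on_release):
        self._free = []
        self._wipe_on_release = wipe_on_release

    def acquire(self, size):
        # Prefer a recycled block of matching size over a fresh allocation.
        for i, block in enumerate(self._free):
            if len(block) == size:
                return self._free.pop(i)
        return bytearray(size)

    def release(self, block):
        if self._wipe_on_release:
            block[:] = bytes(len(block))  # overwrite with zeros before reuse
        self._free.append(block)


def leaked_bytes(pool):
    """Return what a second user sees in a block released by the first user."""
    first = pool.acquire(16)
    first[:] = b"album-cover-data"  # exactly 16 bytes written by "another application"
    pool.release(first)
    second = pool.acquire(16)       # gets the same block back
    return bytes(second)


print(leaked_bytes(BufferPool(wipe_on_release=False)))
print(leaked_bytes(BufferPool(wipe_on_release=True)))
```

In the unwiped case the second user reads the first user's data, which matches the album covers and window borders showing up inside PA; wiping on release hides the stale content but does not fix the underlying use-after-free.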

