Bug 91889

Summary: Planetary Annihilation: Titans displays contents of other processes' buffers
Product: Mesa
Reporter: Krzysztof A. Sobiecki <sobkas>
Component: Mesa core
Assignee: mesa-dev
Status: RESOLVED NOTOURBUG
QA Contact: mesa-dev
Severity: blocker
Priority: highest
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments: Warning messages from apitrace replay before "export MESA_EXTENSION_OVERRIDE=GL_ARB_gpu_shader5"
A texture that breaks things
Correct framebuffer
Faulty framebuffer

Description Krzysztof A. Sobiecki 2015-09-05 22:02:18 UTC
Planetary Annihilation: Titans displays the contents of other processes' buffers.
This is a very serious bug that can cause big problems.

For some reason Mesa allows a process to grab random data from graphics memory and display it, including passwords, private emails and so on.
This turns Mesa and PA:T into a potential leaker of personal data.

Bug #65968 shows the same corruption, but it doesn't highlight the potential security problems.

I will produce an apitrace of PA:T.
Comment 1 Krzysztof A. Sobiecki 2015-09-05 22:26:39 UTC
My specs:
OpenGL renderer string: Gallium 0.4 on AMD JUNIPER (DRM 2.43.0, LLVM 3.8.0)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 11.1.0-devel
VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Juniper XT [Radeon HD 6770]
Comment 2 Krzysztof A. Sobiecki 2015-09-05 23:42:42 UTC
Apitrace: https://drive.google.com/file/d/0B3J0Mg89izcbMVgzMDdWYkxoY3M/view?usp=sharing
Comment 3 Albert Freeman 2015-09-06 00:30:49 UTC
The Mesa developers are well aware of this and see it all the time in a myriad of different environments. It isn't only a problem in Mesa, but also in other parts of the Linux graphics stack (e.g. X). Not many people consider it an issue that needs an immediate fix. For several decades this kind of thing has been easy to achieve and hasn't caused many issues. Nevertheless, it is still an issue and will likely be fixed once other things have been accomplished.

But the issue will likely be present in X for eternity. I am not sure whether Wayland has a similar flaw. If it does, then it will almost certainly be fixed. GIMP, for example, depends on the flaw in X to pick a color from another window onto its canvas.
Comment 4 Krzysztof A. Sobiecki 2015-09-09 21:21:21 UTC
This is about one process accessing the graphics memory of another process. How is that even possible? Doesn't the kernel have some kind of memory management facility to prevent that?
Comment 5 Albert Freeman 2015-09-10 09:50:24 UTC
The GPU has its own physical RAM, which is managed by a completely different system than system RAM.

There are actually two issues here. One is the bug in the game causing strange behavior. The other is the security issue that just so happens to (sometimes) arise due to that bug.

This is probably not going to work, but try:
export MESA_EXTENSION_OVERRIDE=GL_ARB_gpu_shader5
then run the game in the same console window.

That seems to get rid of a rather critical warning when I replay the trace.

The strange thing is, when I replay this on Mesa and Catalyst, the same visual corruption occurs (in the areas I can remember, exactly the same), though Catalyst shows no warnings/errors while Mesa does. This shouldn't happen, as apitrace simply records GL commands/data before they reach the driver. When replayed, they get sent to my driver for display.

Someone on IRC commented that apitrace does not always capture all the data needed to display the replay flawlessly. However, as far as I know, the corruption is in areas which shouldn't normally be affected by missing data (e.g. empty sky, or big things crosscutting many different bits of geometry/UI, like rectangles).
Comment 6 Albert Freeman 2015-09-10 10:01:16 UTC
I even get the same visual corruption on an Intel Sandy Bridge laptop with Mesa git drivers.
Comment 7 Albert Freeman 2015-09-10 10:01:35 UTC
*with the replay
Comment 8 Albert Freeman 2015-09-10 10:36:33 UTC
Created attachment 118184 [details]
Warning messages from apitrace replay before "export MESA_EXTENSION_OVERRIDE=GL_ARB_gpu_shader5"
Comment 9 Albert Freeman 2015-09-10 11:06:30 UTC
When I said "bug in the game", I meant "bug with the game".

Can you upload another trace? Please try to reproduce the bug as quickly as possible: parts of the trace can't be skipped, since the game could be uploading resources to the GPU or setting state at any time (likely for use later on). Can you also take screenshots of the actual issue, so I can compare them to my apitrace replay?
Comment 10 Krzysztof A. Sobiecki 2015-09-10 16:08:24 UTC
Of course I will make another trace. But it looks like the corrupted textures were saved into the trace.

It looks as if part of memory (VRAM) is used as a texture: it should be initialized to some state (probably a transparent texture), but that silently fails and the texture ends up with random memory data.

Screenshot:
http://orig10.deviantart.net/7234/f/2015/253/3/a/zrzut_ekranu_z_2015_09_10_18_01_38_by_fboxnf-d992p38.png

Thanks for your help.
Comment 11 Albert Freeman 2015-09-10 16:57:37 UTC
Does the screenshot you posted look [around about]/exactly the same as actually playing the game does?
Comment 12 Krzysztof A. Sobiecki 2015-09-10 19:55:34 UTC
(In reply to Albert Freeman from comment #11)
> Does the screenshot you posted look [around about]/exactly the same as
> actually playing the game does?

Yes it does

I have made two apitraces:
1. A shorter one showing the problems: https://drive.google.com/file/d/0B3J0Mg89izcbZERFZ0xmQzRaTDA/view?usp=sharing
2. A shorter one using the --software-ui option, which doesn't have the problems (but after a match the game displays only a black screen in the menu): https://drive.google.com/file/d/0B3J0Mg89izcbWGJtcVZ1TUFsZlE/view?usp=sharing
Comment 13 Albert Freeman 2015-09-12 09:38:42 UTC
Well the warning/error messages are the same in both traces so it can't be them.

Possible workaround:
{
wget -O - https://github.com/pamods/pamm-atom/raw/stable/install.sh | bash

It will install PAMM in $HOME/.local/pamm
Then search for:
mouse cursor fix
in PAMM

If you can't find that in the mod manager, manually install: https://forums.uberent.com/threads/no-cursor-block-cursor-ubuntu-12-04-13-04-13-10.53019/page-2#post-816410
}

That is an alternative to the --software-ui trick that (hopefully) should stop the blackness issue you have, although I am not certain that it resolves the same issue as yours.

It does seem, though, that a library "PA: Titans" uses, Coherent UI, causes issues with Mesa...
Comment 14 Krzysztof A. Sobiecki 2015-09-12 21:43:32 UTC
So I was looking through PA.2.trace in qapitrace and found that in frame 3001 the framebuffer somehow gets corrupted.
It happens like this:
glTexImage2D(GL_TEXTURE_2D,...., binary data) < binary data is a pointer that should point to image data, but it apparently points to the wrong data
glGenBuffers
glBindBuffer
glBufferData
glDrawElements < here the correct framebuffer gets overwritten by the faulty texture

It's close to the end of the frame, so it should be easy to find.

So how glTexImage2D gets wrong data?
Comment 15 Krzysztof A. Sobiecki 2015-09-12 21:46:53 UTC
Created attachment 118227 [details]
A texture that breaks things

This is the texture that is written over the correct framebuffer.
Comment 16 Krzysztof A. Sobiecki 2015-09-12 21:49:02 UTC
Created attachment 118228 [details]
Correct framebuffer
Comment 17 Krzysztof A. Sobiecki 2015-09-12 21:49:45 UTC
Created attachment 118229 [details]
Faulty framebuffer
Comment 18 Michel Dänzer 2015-09-13 15:19:13 UTC
(In reply to Krzysztof A. Sobiecki from comment #14)
> So how glTexImage2D gets wrong data?

The data passed to glTexImage2D is controlled by the application. The question is where it's getting the bad data from.
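
A minimal sketch of why the driver cannot catch this, written as a toy model in plain C rather than real OpenGL. The names `toy_texture`, `toy_tex_image_2d` and `upload_wrong_buffer` are hypothetical and exist only for illustration; they are not Mesa or OpenGL API. The upload step copies whatever bytes the application's pointer refers to, so a wrong pointer produces a wrong texture without any error being raised.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A tiny 2x2 RGBA8 image; sizes are arbitrary for the illustration. */
enum { TEX_W = 2, TEX_H = 2, TEX_BYTES = TEX_W * TEX_H * 4 };

/* Hypothetical stand-in for a texture object's storage. */
typedef struct {
    unsigned char texels[TEX_BYTES];
} toy_texture;

/* Hypothetical stand-in for the upload step of glTexImage2D: it copies
 * exactly the bytes the caller points at and has no way to tell whether
 * they are the image the application meant to upload. */
void toy_tex_image_2d(toy_texture *tex, const void *pixels)
{
    assert(tex != NULL && pixels != NULL);
    memcpy(tex->texels, pixels, TEX_BYTES);
}

/* Application-side bug: the UI was rendered into `ui_image`, but the
 * upload passes `stale` instead. Returns 1 if the texture ends up
 * holding the stale bytes, 0 otherwise. */
int upload_wrong_buffer(void)
{
    unsigned char ui_image[TEX_BYTES];
    unsigned char stale[TEX_BYTES];
    toy_texture tex;

    memset(ui_image, 0xFF, sizeof ui_image); /* the intended image */
    memset(stale, 0x5A, sizeof stale);       /* leftover, unrelated data */

    toy_tex_image_2d(&tex, stale);           /* wrong pointer passed */
    return memcmp(tex.texels, stale, TEX_BYTES) == 0;
}
```

In this model the "driver" side behaves correctly in both cases; only the caller can know which buffer was the right one.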
Comment 19 Krzysztof A. Sobiecki 2015-09-13 19:41:47 UTC
(In reply to Michel Dänzer from comment #18)
> (In reply to Krzysztof A. Sobiecki from comment #14)
> > So how glTexImage2D gets wrong data?
> 
> The data passed to glTexImage2D is controlled by the application. The
> question is where it's getting the bad data from.

I have asked the devs about it:
https://forums.uberent.com/threads/a-simple-question.70433/
Maybe they will answer and there will be no need for a more painful debugging session.
Comment 20 Krzysztof A. Sobiecki 2015-09-13 21:13:43 UTC
(In reply to Michel Dänzer from comment #18)
> (In reply to Krzysztof A. Sobiecki from comment #14)
> > So how glTexImage2D gets wrong data?
> 
> The data passed to glTexImage2D is controlled by the application. The
> question is where it's getting the bad data from.

No matter what they shouldn't be able to get that data. I'm starting to think if it's possible to grab that data with OpenGL would it be also possible with WebGL...
Comment 21 Eirik Byrkjeflot Anonsen 2015-09-14 16:06:33 UTC
(In reply to Krzysztof A. Sobiecki from comment #20)
> (In reply to Michel Dänzer from comment #18)
> > (In reply to Krzysztof A. Sobiecki from comment #14)
> > > So how glTexImage2D gets wrong data?
> > 
> > The data passed to glTexImage2D is controlled by the application. The
> > question is where it's getting the bad data from.
> 
> No matter what they shouldn't be able to get that data. I'm starting to
> think if it's possible to grab that data with OpenGL would it be also
> possible with WebGL...

It should not be possible with WebGL. One of the things that delayed WebGL's general availability was that the specification required that all data shall be initialized. This was explicitly mentioned as being a difference from other OpenGL specifications due to the different threat models.
Comment 22 almos 2015-09-14 16:42:13 UTC
(In reply to Eirik Byrkjeflot Anonsen from comment #21)
> > No matter what they shouldn't be able to get that data. I'm starting to
> > think if it's possible to grab that data with OpenGL would it be also
> > possible with WebGL...
> 
> It should not be possible with WebGL. One of the things that delayed WebGL's
> general availability was that the specification required that all data shall
> be initialized. This was explicitly mentioned as being a difference from
> other OpenGL specifications due to the different threat models.

Well, this bug might be an indication that Mesa doesn't initialize all data.
Comment 23 Eirik Byrkjeflot Anonsen 2015-09-14 17:43:42 UTC
(In reply to almos from comment #22)
> (In reply to Eirik Byrkjeflot Anonsen from comment #21)
> > > No matter what they shouldn't be able to get that data. I'm starting to
> > > think if it's possible to grab that data with OpenGL would it be also
> > > possible with WebGL...
> > 
> > It should not be possible with WebGL. One of the things that delayed WebGL's
> > general availability was that the specification required that all data shall
> > be initialized. This was explicitly mentioned as being a difference from
> > other OpenGL specifications due to the different threat models.
> 
> Well, this bug might be an indication that Mesa doesn't initialize all data.

Mesa is an OpenGL implementation, not a WebGL implementation, so there is nothing wrong with that.

That is, there is nothing in the OpenGL specifications (as far as I know) that says Mesa is doing something wrong. Arguably, this is based on an outdated threat model where local applications are trusted. A more modern understanding of local application security would probably indicate that the threat model should be changed to protect against such information leaks.
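
The two threat models can be contrasted with a toy model in C. Everything here is hypothetical (`toy_vram`, `toy_alloc`, `toy_free`, the `alloc_policy` values) and is not how Mesa or the kernel actually manage memory. It only shows that handing out recycled memory without clearing it exposes the previous owner's data, while clearing on allocation, as WebGL requires, does not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for video memory: one fixed pool shared by every "process". */
enum { POOL_SIZE = 64 };

typedef enum {
    ALLOC_RAW,    /* hand memory out as-is (contents left undefined) */
    ALLOC_ZEROED  /* clear before handing out (WebGL-style guarantee) */
} alloc_policy;

typedef struct {
    unsigned char bytes[POOL_SIZE];
    int in_use;
} toy_vram;

/* Give the whole pool to a caller, optionally clearing it first. */
unsigned char *toy_alloc(toy_vram *vram, alloc_policy policy)
{
    assert(!vram->in_use);
    vram->in_use = 1;
    if (policy == ALLOC_ZEROED)
        memset(vram->bytes, 0, POOL_SIZE);
    return vram->bytes;
}

/* Release the pool without scrubbing it, so old contents stay behind. */
void toy_free(toy_vram *vram)
{
    vram->in_use = 0;
}

/* "Process A" writes a secret and frees; "process B" then allocates.
 * Returns 1 if B can still read A's secret, 0 otherwise. */
int secret_leaks(alloc_policy policy)
{
    static const char secret[] = "hunter2";
    toy_vram vram;
    memset(&vram, 0, sizeof vram);

    unsigned char *a = toy_alloc(&vram, ALLOC_ZEROED);
    memcpy(a, secret, sizeof secret);
    toy_free(&vram);

    unsigned char *b = toy_alloc(&vram, policy);
    return memcmp(b, secret, sizeof secret) == 0;
}
```

With `ALLOC_RAW` the second allocation still sees the first one's bytes; with `ALLOC_ZEROED` it does not, at the cost of a clear on every allocation.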
Comment 24 Albert Freeman 2015-09-15 13:06:55 UTC
Yep, you are absolutely right, I just walked through every single gl call in the application (skipping redundant frames and shader compilation). The entire UI is drawn outside the apitrace capture and just uploaded to TEXTURE0. This occurs whenever the UI is updated.

So basically the apitraces are useless as far as revealing any further information goes.

Given that the issue apparently only occurs with Mesa (and most likely only r600) and only when --software-ui is missing, the UI probably draws with OpenGL calls outside the scope of apitrace.

Note: This rendering engine sure does some strange things.

Regarding Security:
{
It is not a matter of which specification requires what, but rather of which people/organisations require a secure platform. Ultimately, if Mesa can cause an issue, someone could hack up an application that directly accesses the DRM drivers in the kernel...

So the security thing is a kernel problem.

I am not sure if the average X application through X can access the entirety of graphics memory, but it certainly can access what every application has in its window at the time of access.
}
Comment 25 Albert Freeman 2015-09-15 13:08:13 UTC
Correction: I didn't walk through every call in the application, only until I realised what was happening with the UI.
Comment 26 Krzysztof A. Sobiecki 2015-09-24 09:21:55 UTC
Trace from wayland/Xwayland: https://drive.google.com/file/d/0B3J0Mg89izcbR1BHWXEwREJhY00/view?usp=sharing

It shows only dark screen in menu.
Comment 27 Albert Freeman 2015-09-25 12:25:10 UTC
(In reply to Krzysztof A. Sobiecki from comment #26)
> Trace from wayland/Xwayland:
> https://drive.google.com/file/d/0B3J0Mg89izcbR1BHWXEwREJhY00/view?usp=sharing
> 
> It shows only dark screen in menu.

The problem is: apitrace isn't capturing the data that is needed to solve this problem. I just looked through the apitrace flags, none of them would be able to fix it (assuming there are OpenGL calls on another thread for the UI).
Comment 28 Krzysztof A. Sobiecki 2015-09-30 00:26:54 UTC
So there was an additional trace file hidden in a subfolder:
Main: https://drive.google.com/file/d/0B3J0Mg89izcbYnpzVjhSeVRmNEk/view?usp=sharing
Hidden one: https://drive.google.com/file/d/0B3J0Mg89izcbdk01UHgyS0s0OFk/view?usp=sharing

I haven't had time to look at it yet.
Comment 29 Rokas Kupstys 2017-07-09 08:32:41 UTC
I noticed this as well. It started happening after I upgraded my GPU to an AMD RX 580 and started using the amdgpu driver. Could this be an amdgpu issue?
Comment 30 Timothy Arceri 2018-08-20 05:28:32 UTC
Closing as not our bug as per: https://bugs.freedesktop.org/show_bug.cgi?id=65968#c12
