Bug 97022 - Garbage in windows while running a game windowed
Summary: Garbage in windows while running a game windowed
Status: VERIFIED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
Reported: 2016-07-21 13:19 UTC by Peter Mulholland
Modified: 2016-08-09 12:02 UTC

Attachments
Photograph of screen when the problem occurs (3.02 MB, image/jpeg)
2016-07-21 13:19 UTC, Peter Mulholland
Part of kern.log that was written at lockup (5.89 KB, text/plain)
2016-07-21 13:58 UTC, Peter Mulholland
Latest kern.log from crash described (812.32 KB, application/x-xz)
2016-07-22 15:40 UTC, Peter Mulholland
Xorg log from same crash (64.16 KB, text/plain)
2016-07-22 15:41 UTC, Peter Mulholland
Part of kern.log written during forcing a GPU reset (7.18 KB, text/x-log)
2016-07-22 19:43 UTC, Peter Mulholland

Description Peter Mulholland 2016-07-21 13:19:31 UTC
Created attachment 125226 [details]
Photograph of screen when the problem occurs

This is an odd one. As of posting, I have only managed to reproduce it with a Virtual Programming OpenGL 3.2 based game, such as The Witcher 2. I have not been able to reproduce it with Valve's Source games.

For example, if you run The Witcher 2 in windowed mode and let it reach the menu screen, doing anything with another window (such as resizing it) turns the display to junk. Eventually the whole X11 display becomes corrupt, and only rebooting the system clears it. Text mode TTYs are unaffected.

Occasionally, rather than corruption, the system will lock up, requiring a hard reset or magic sysrq reset. I could not find a way to consistently reproduce this though, whereas I can consistently reproduce the corruption.

Specs:
Xubuntu 16.04.1 LTS
Kernel 4.0.0-31-generic, x86_64
Mesa 12.1.0-devel (Padoka PPA)
DRM 2.43.0
GPU Radeon HD 7750 (CAPE VERDE PCI 1002:683F), 1GB VRAM
Xorg 1.18.3
Desktop Xfce4 4.12, using builtin compositor

Disabling the compositor makes no difference. Restarting Xorg when the fault occurs makes no difference.

The bug is not present in the 11.2.0 release of Mesa shipped with Ubuntu 16.04 by default.
Comment 1 Peter Mulholland 2016-07-21 13:58:04 UTC
Created attachment 125228 [details]
Part of kern.log that was written at lockup
Comment 2 Peter Mulholland 2016-07-21 13:59:35 UTC
So far I have only produced this bug with two of our games - The Witcher 2, and Overlord (due to be released today).

Witcher 2 causes the display corruption seen in the attached photograph. Overlord produces corruption quickly followed by a system lockup. Alt-SysRq-B rebooted the system, and I have attached the last messages written to kern.log.
Comment 3 Michel Dänzer 2016-07-22 01:56:51 UTC
Can you bisect Mesa? Note that Xorg needs to be restarted before testing each commit, as glamor might contribute to the problem.
Comment 4 Peter Mulholland 2016-07-22 09:38:29 UTC
Not easily, as I am not currently compiling Mesa myself but using the version built by the Padoka PPA for Ubuntu.
Comment 5 Peter Mulholland 2016-07-22 15:39:31 UTC
Today I gave the latest Padoka (git1600721124400.4f89cf4) a try while debugging Overlord. I got screen corruption followed by a hang.

Just as I was about to reach for the SysRq keys, Xorg restarted and seemed to operate as normal, except that I was now on llvmpipe instead. A large amount of kernel log output was generated, and there is some relevant-looking information in Xorg.log too. Both are attached here.
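One way to confirm the fallback to llvmpipe described above is to look at the "OpenGL renderer string" line printed by glxinfo. The following is a minimal sketch, assuming the glxinfo output has already been captured as text; the function names are hypothetical and are not part of any Mesa tool.

```python
from typing import Optional

RENDERER_PREFIX = "OpenGL renderer string:"


def renderer_string(glxinfo_output: str) -> Optional[str]:
    """Return the renderer name from glxinfo text, or None if the line is absent."""
    for line in glxinfo_output.splitlines():
        if line.startswith(RENDERER_PREFIX):
            return line[len(RENDERER_PREFIX):].strip()
    return None


def is_software_rendering(glxinfo_output: str) -> bool:
    """True if the renderer line names llvmpipe, Mesa's software rasterizer."""
    renderer = renderer_string(glxinfo_output)
    return renderer is not None and "llvmpipe" in renderer.lower()
```

The text could be captured with something like subprocess.run(["glxinfo"], capture_output=True, text=True).stdout, assuming glxinfo is installed and an X display is available.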
Comment 6 Peter Mulholland 2016-07-22 15:40:46 UTC
Created attachment 125257 [details]
Latest kern.log from crash described
Comment 7 Peter Mulholland 2016-07-22 15:41:14 UTC
Created attachment 125258 [details]
Xorg log from same crash
Comment 8 Peter Mulholland 2016-07-22 19:43:34 UTC
Created attachment 125261 [details]
Part of kern.log written during forcing a GPU reset

I've found that force resetting the GPU clears the corruption - using cat /sys/kernel/debug/dri/0/radeon_gpu_reset

This does not take effect straight away, however. I often have to kill Xorg before it happens. Some errors are written to the kernel log; I have attached those.
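The reset above is triggered simply by reading the debugfs node, which is all that cat does. A minimal sketch of the same read from a script, assuming debugfs is mounted at /sys/kernel/debug, the card is DRI device 0, and the script runs as root; the function name is hypothetical.

```python
from pathlib import Path

# Path quoted in this comment. Device index 0 is an assumption and may
# differ on systems with more than one GPU.
RESET_NODE = Path("/sys/kernel/debug/dri/0/radeon_gpu_reset")


def request_gpu_reset(node: Path = RESET_NODE) -> str:
    """Read the debugfs node (the equivalent of `cat`) and return what it printed.

    Raises FileNotFoundError if the node is missing and PermissionError
    if the caller is not allowed to read debugfs.
    """
    return node.read_text()


# Usage (as root):
#     print(request_gpu_reset())
```

As noted, the reset may not complete until Xorg has been killed, so a successful read does not by itself mean the corruption is gone.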

Here is lspci output for the card in case it's relevant:

01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] [1002:683f] (prog-if 00 [VGA controller])
	Subsystem: ASUSTeK Computer Inc. Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] [1043:0427]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 26
	Region 0: Memory at d0000000 (64-bit, prefetchable) [size=256M]
	Region 2: Memory at fde80000 (64-bit, non-prefetchable) [size=256K]
	Region 4: I/O ports at de00 [size=256]
	[virtual] Expansion ROM at fde00000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: radeon
	Kernel modules: radeon
Comment 9 Chernovsky Oleg 2016-07-30 10:21:41 UTC
Confirmed.

ArchLinux: 4.6.4 x86_64
Mesa: 12.0.1
GPU: AMD Curacao PRO [Radeon R7 370 / R9 270/370 OEM]
Xorg: 1.18.4

Game: Overlord

What makes you think it's a regression in Mesa?
Comment 10 Peter Mulholland 2016-07-30 14:33:29 UTC
I am not sure exactly what this comment means, so I will answer as thoroughly as I can.

I am not 100% sure that "mesa" is the component responsible, it could be drm or radeonsi. I am not familiar enough with the Mesa components to be sure.

The problem described does not happen with the Mesa 11.2.0 distribution that Ubuntu Xenial has packaged by default. The problem only occurs when Mesa 12.1.0 is installed using a PPA such as Oibaf or Padoka.

It also does not occur on Mesa 12.0.1 that is part of an installation of Manjaro (Arch Linux) that I have on the same machine.

The problem does not occur with the nvidia or fglrx binary drivers.

The problem is not just an app crash, but display corruption or a kernel panic. A userspace app should not be able to cause this: even if our apps were doing something "wrong", being able to crash the whole machine is a bug.
Comment 11 Chernovsky Oleg 2016-07-31 20:49:00 UTC
No, I was actually asking Michel, but thanks for the explanation, it gives a clue.

A regression is a situation where something worked before an update and stopped working afterwards. In your case it worked in Mesa 11 and broke in Mesa 12. Most likely something was broken between the releases, which is why Michel asked for a bisect.
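The idea behind a bisect can be sketched as a binary search over the commit history between a known-good and a known-bad version. This is only an illustration of the principle, not how git bisect is implemented; the function name, the commit list, and the is_bad callback (standing in for "build Mesa at this commit, restart Xorg, run the game") are all hypothetical.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad is true.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and every commit after the first bad one is
    also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the breakage is at mid or earlier
        else:
            lo = mid  # the breakage is after mid
    return commits[hi]
```

Each test halves the remaining range, so even a few thousand commits between two releases need only about a dozen build-and-test rounds.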
Comment 12 Marek Olšák 2016-08-09 00:09:09 UTC
I bisected and reverted the problematic commit:
https://cgit.freedesktop.org/mesa/mesa/commit/?id=1ebf3c4b6741a3a3a9d46048abe3996cb9a86334
Comment 13 Peter Mulholland 2016-08-09 12:02:38 UTC
I can confirm the problem is now fixed. Great job :D
