Bug 89980 - [Regression] Graphical corruption after resuming from suspend (w/ dual monitor configuration)
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
Reported: 2015-04-11 02:28 UTC by Furkan
Modified: 2015-04-21 19:02 UTC

Attachments
Xorg log (73.70 KB, text/plain)
2015-04-11 02:28 UTC, Furkan
dmesg (233.95 KB, text/plain)
2015-04-11 02:29 UTC, Furkan
Mesa bisect log (2.61 KB, text/plain)
2015-04-14 20:58 UTC, Furkan
Different "checkerboard" corruption (2.71 MB, application/x-tar)
2015-04-16 02:38 UTC, Furkan

Description Furkan 2015-04-11 02:28:15 UTC
Created attachment 115013
Xorg log

This is a regression. The bug is reproducible in Ubuntu 15.04 Beta (Xorg 1.17.1, linux 3.19) and Fedora 22 Alpha (Xorg 1.17.1, linux 4.0) with a Radeon R7 260X video card.

I first noticed the issue in Ubuntu 14.04 after I upgraded from Xorg 1.16 to 1.17.

The problem does not occur with fglrx.

Symptoms: A checkerboard tearing pattern appears in approximately the top 1/8 of the display after resuming from suspend, and it persists until a reboot.

Demonstration of bug in Ubuntu w/ Unity (highlighting menu entries with the mouse): https://www.dropbox.com/s/ez2v03oetppecgx/VID_20150324_020612.mp4?dl=0

Demonstration of bug in Fedora w/ Gnome 3 (maximizing/restoring a window): https://www.dropbox.com/s/85n2iq27zm00dlo/VID_20150410_033406.mp4?dl=0

Steps to reproduce:

I can only reproduce this when I have 2 displays connected. My primary screen is set to 2560x1440, and the secondary screen, in portrait mode, is set to 1200x1920 on the left-hand side. I have the landscape monitor centered with respect to the portrait one, so y = 240 in ~/.config/monitors.xml.

I cannot observe the bug when both screens are aligned at the top, i.e., with y=0 in ~/.config/monitors.xml.
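
For reference, the y = 240 offset follows from the two screen heights. The snippet below is only a sketch of that arithmetic; the x position assumes the portrait screen starts at x = 0, which the report implies but does not state, and `centered_offset` is a hypothetical helper name.

```python
def centered_offset(tall_height, short_height):
    """Vertical offset that centers the shorter screen against the taller one."""
    return (tall_height - short_height) // 2

# Screen sizes taken from the report.
portrait_w, portrait_h = 1200, 1920    # rotated secondary screen, on the left
landscape_w, landscape_h = 2560, 1440  # primary screen

# Assumption: the portrait screen sits at x = 0, so the landscape screen
# begins where the portrait screen ends.
landscape_x = portrait_w
landscape_y = centered_offset(portrait_h, landscape_h)

print(landscape_x, landscape_y)  # 1200 240
```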

I also cannot observe the bug with a single monitor connected, or with both monitors in landscape mode.

After setting up the monitor configuration, all that needs to be done to reproduce the corruption is to suspend the system, resume, and observe the top portion of the primary (landscape) display when the screen is changing, e.g., it is apparent when watching full-screen movies or minimizing/maximizing windows as demonstrated in my demo video.

Attached: Xorg log and dmesg (w/ kernel parameter drm.debug=14) saved after a suspend/resume cycle
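
For anyone reading the dmesg attachment: drm.debug is a bitmask, so 14 enables several message categories at once. The sketch below decodes it using the category bits as I understand them from the kernel DRM headers of that era (DRM_UT_CORE, DRM_UT_DRIVER, DRM_UT_KMS, DRM_UT_PRIME); later kernels define additional bits that are not listed here, and `decode_drm_debug` is a hypothetical helper name.

```python
# Debug category bits (assumed from the DRM headers of the 3.19/4.0 era).
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
}

def decode_drm_debug(mask):
    """Return the names of the categories enabled by a drm.debug mask."""
    return [name for bit, name in DRM_DEBUG_BITS.items() if mask & bit]

print(decode_drm_debug(14))  # ['DRIVER', 'KMS', 'PRIME']
```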
Comment 1 Furkan 2015-04-11 02:29:05 UTC
Created attachment 115014
dmesg
Comment 2 Alex Deucher 2015-04-13 14:48:02 UTC
Since this is a regression can you narrow down the component (kernel, mesa, ddx) and bisect?
Comment 3 Furkan 2015-04-14 06:35:55 UTC
Based on my debugging so far, it seems that mesa is the likely culprit.

Step #1: Fresh install of Ubuntu 14.04 (installed on a new, clean partition):
Linux 3.13 kernel -> Manually upgraded to Linux 4.0 mainline
Xorg 1.15.1
Mesa 10.1.3
xf86-video-ati 7.3
libdrm 2.4.56

Status: The bug is not present, which confirms that the issue is not with the kernel.

----------

Step #2: Installed Ubuntu 14.04.2 hardware enablement stack:
Linux 4.0 (mainline)
Xorg 1.16.0
Mesa 10.3.2
xf86-video-ati 7.4
libdrm 2.4.56

Status: The bug is present.

----------

Step #3: I reverted to the original 14.04 packages, but compiled xf86-video-ati 7.4 from git.

Status: The bug is not present, which seems to confirm that the issue is not with the ddx.

----------

Step #4: I had trouble getting Ubuntu to work with Mesa compiled from git (whenever I try to log in, I just get kicked back to the lightdm greeter), and I couldn't upgrade Mesa from the Ubuntu repo without also upgrading Xorg, so I upgraded Mesa from Oibaf PPA:

Linux 4.0 (mainline)
Xorg 1.15.1
Mesa 10.6 (oibaf-ppa)
xf86-video-ati 7.4 (git) (also tested 7.5.99 from oibaf-ppa)
libdrm 2.4.60 (oibaf-ppa)

Status: The bug is present. So it seems likely that the bug was introduced somewhere between Mesa 10.1.3 and 10.3.2.
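
The elimination logic behind these four steps can be sketched as follows. This is only an illustration under assumptions: a component is treated as cleared when the same version of it was seen both with and without the bug, and the components are assumed not to interact. For step #3 I read "original 14.04 packages" as the step #1 versions and leave the kernel out because it is not stated. `suspects` is a hypothetical helper name.

```python
# (configuration, bug_present) for steps #1 to #4 above.
RESULTS = [
    ({"kernel": "4.0", "xorg": "1.15.1", "mesa": "10.1.3", "ddx": "7.3", "libdrm": "2.4.56"}, False),
    ({"kernel": "4.0", "xorg": "1.16.0", "mesa": "10.3.2", "ddx": "7.4", "libdrm": "2.4.56"}, True),
    # Step #3: kernel version not stated, so it is omitted here.
    ({"xorg": "1.15.1", "mesa": "10.1.3", "ddx": "7.4", "libdrm": "2.4.56"}, False),
    ({"kernel": "4.0", "xorg": "1.15.1", "mesa": "10.6", "ddx": "7.4", "libdrm": "2.4.60"}, True),
]

def suspects(results):
    """Components for which no single version produced both outcomes."""
    outcomes = {}  # (component, version) -> set of observed outcomes
    for config, bug in results:
        for component, version in config.items():
            outcomes.setdefault((component, version), set()).add(bug)
    components = {component for component, _ in outcomes}
    cleared = {component for (component, _), seen in outcomes.items() if len(seen) == 2}
    return sorted(components - cleared)

print(suspects(RESULTS))  # ['mesa']
```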

If I can figure out how to get Ubuntu to play nice with mainline Mesa compiled from git (maybe if I figure out how to apply the Ubuntu patches), I can do a bisect, but that's where I'm stuck as of now.
Comment 4 Furkan 2015-04-14 20:57:14 UTC
I have found the bad commit. I will attach my bisect log.

I bisected between 10.1-branchpoint and 10.2-branchpoint, and here's the final result:

4a5519f1e019dbf1103e4f3abe0a695637a87518 is the first bad commit
commit 4a5519f1e019dbf1103e4f3abe0a695637a87518
Author: Marek Olšák <marek.olsak@amd.com>
Date:   Mon Feb 10 01:25:54 2014 +0100

    r600g,radeonsi: set correct initial domain for shared resources

:040000 040000 eafa3cdc6eea908c6ba8861f3d063f6a3161217b 7938f0ed0cdf8c677af35f1b2e67739dc210bda8 M	src
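
For readers unfamiliar with bisecting, the idea is a binary search for the first bad commit. The sketch below is a simplification that assumes a linear history in which every commit after the culprit is also bad; real git bisect works on the full commit graph. The commit names and the `first_bad` helper are hypothetical.

```python
def first_bad(commits, is_bad):
    """Return the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and that every
    commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]

# Hypothetical linear history of 1000 commits with the regression at c613.
history = [f"c{i}" for i in range(1000)]
print(first_bad(history, lambda c: int(c[1:]) >= 613))  # c613
```

With about 1000 commits this takes roughly 10 test builds, which is why bisecting between two branch points is practical.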
Comment 5 Furkan 2015-04-14 20:58:22 UTC
Created attachment 115071
Mesa bisect log
Comment 6 Michel Dänzer 2015-04-15 03:22:41 UTC
(In reply to falaca from comment #4)
> 4a5519f1e019dbf1103e4f3abe0a695637a87518 is the first bad commit
> commit 4a5519f1e019dbf1103e4f3abe0a695637a87518
> Author: Marek Olšák <marek.olsak@amd.com>
> Date:   Mon Feb 10 01:25:54 2014 +0100
> 
>     r600g,radeonsi: set correct initial domain for shared resources

Weird. Marek, any ideas?
Comment 7 Furkan 2015-04-16 02:38:52 UTC
Created attachment 115102
Different "checkerboard" corruption

This might be totally unrelated, but since I don't know enough to make that judgement, I thought I should share just in case (otherwise I could make a separate bug report for it).

Basically, I get intermittent checkerboard patterns on my screen, as seen in the 2 screenshots I'm attaching (some areas blacked out by me for privacy). I don't know how to reproduce the patterns: they appear intermittently and seem unrelated to suspend/resume. They either look like "noise", as in the first screenshot, or they are remnants of a previously open window, as in the second screenshot.

Both of those screenshots are from the portrait display (unlike the behaviour from the video I posted in the original bug report, which only happens on the landscape display). I can't remember if I've seen this happen on the landscape display so far. I can keep collecting screenshots to see if it's confined to specific areas of the screen.

For what it's worth, I have used Catalyst 14.12 for a couple of months with this card, and didn't observe this type of behaviour.
Comment 8 Michel Dänzer 2015-04-16 06:27:11 UTC
(In reply to falaca from comment #0)
> I can only reproduce this with when I have 2 displays connected. My primary
> screen is set to 2560x1440, and the secondary screen in portrait mode is set
> to 1200x1920 on the left-hand side. I have the landscape monitor centered
> with respect to the portrait one, so y = 240 in ~/.config./monitors.xml.
> 
> I cannot observe the bug when both screens are aligned at the top, i.e.,
> with y=0 in ~/.config/monitors.xml.

Have you tried moving the landscape monitor to y = 0 and back to y = 240 after suspend/resume, while the session is up? Does that fix the problem, or does it stay corrupted?
Comment 9 Furkan 2015-04-16 07:34:28 UTC
(In reply to Michel Dänzer from comment #8)
> (In reply to falaca from comment #0)
> > I can only reproduce this with when I have 2 displays connected. My primary
> > screen is set to 2560x1440, and the secondary screen in portrait mode is set
> > to 1200x1920 on the left-hand side. I have the landscape monitor centered
> > with respect to the portrait one, so y = 240 in ~/.config./monitors.xml.
> > 
> > I cannot observe the bug when both screens are aligned at the top, i.e.,
> > with y=0 in ~/.config/monitors.xml.
> 
> Have you tried moving the landscape monitor to y = 0 and back to y = 240
> after suspend/resume, while the session is up? Does that fix the problem, or
> does it stay corrupted?

I just tried right now, and it doesn't make a difference. But you know what, it turns out that it still *does* happen when y=0, it's just that it's a little less noticeable to me, e.g., I'm having trouble seeing it when maximizing a window, but I'm still seeing it happen in the menus. This is purely based on my eyesight, so it's hardly scientific, but I could make more videos if desired.

I tried to move my landscape screen further up (above the portrait one, but still overlapping, so I presume that would be y=0 for the landscape screen, but y= positive for the portrait screen). That resulted in X becoming unusable. My landscape screen turned white, and restarting X didn't make things much better - I just got an unusable tiled pattern: https://www.dropbox.com/s/sfrxv4owqchyq75/tiledpattern.jpg?dl=0

I rebooted and tried with linux 3.16, and also with Arch Linux + Gnome 3 + linux 3.19 (or maybe it was 4.0). Same result (white screen). So unfortunately I wasn't able to test out what would happen in that scenario.

Is there anybody else who can test this configuration (dual monitors with a portrait display)? It seems like it doesn't take much effort to break something.
Comment 10 Furkan 2015-04-18 21:44:25 UTC
I wanted to add that I built Mesa 10.1 from git and installed it on Ubuntu 15.04. With that build, Xorg 1.17.1, and the latest DDX compiled from git, I can't observe the bug.

Is there anything else that I can do to help this along? I tried cloning the master branch and just reverting Marek's commit (the one that I narrowed the bug down to with my git bisect), but of course that didn't work since there is other newer code which now depends on that.

I also tried disabling hyperz (since I believe 10.2 turned hyperz on by default), and that had no effect.
Comment 11 Marek Olšák 2015-04-18 23:38:56 UTC
(In reply to Michel Dänzer from comment #6)
> (In reply to falaca from comment #4)
> > 4a5519f1e019dbf1103e4f3abe0a695637a87518 is the first bad commit
> > commit 4a5519f1e019dbf1103e4f3abe0a695637a87518
> > Author: Marek Olšák <marek.olsak@amd.com>
> > Date:   Mon Feb 10 01:25:54 2014 +0100
> > 
> >     r600g,radeonsi: set correct initial domain for shared resources
> 
> Weird. Marek, any ideas?

Sorry, no. The commit just obtains the initial domain from the kernel, so that it can use it for command submission. The idea of the commit is that the driver shouldn't move imported buffers to a domain that is different from the domain where the buffer was originally created.
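
As a toy model of that idea (not the actual Mesa code), the rule can be sketched like this. The two domain values are the RADEON_GEM_DOMAIN_* flags from the kernel's radeon UAPI header; the function name and its parameters are hypothetical and exist only for illustration.

```python
# Memory domain flags from the kernel's radeon UAPI header.
RADEON_GEM_DOMAIN_GTT = 0x2
RADEON_GEM_DOMAIN_VRAM = 0x4

def submission_domain(is_imported, initial_domain, preferred_domain):
    """Pick the domain to request for a buffer at command submission.

    Hypothetical helper: an imported (shared) buffer keeps the domain it
    was created in, as reported by the kernel; a buffer the driver
    allocated itself uses the driver's own preference.
    """
    if is_imported:
        return initial_domain
    return preferred_domain

# A shared buffer created in GTT stays in GTT even if VRAM is preferred.
print(submission_domain(True, RADEON_GEM_DOMAIN_GTT, RADEON_GEM_DOMAIN_VRAM))  # 2
```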
Comment 12 Furkan 2015-04-19 04:11:17 UTC
Is there any sort of debugging trace that I can collect, to objectively compare the difference in behaviour before and after a suspend?
Comment 13 Furkan 2015-04-21 19:02:31 UTC
Good news! I saw this today: http://lists.x.org/archives/xorg-driver-ati/2015-April/027345.html

So I built and installed Michel's xf86-video-ati repo and enabled TearFree in xorg.conf. The corruption is now gone - so I suppose it was simply some manifestation of tearing, but only after a suspend/resume cycle.

To test it, I installed the module, enabled TearFree, then did a suspend/resume cycle, and I couldn't observe any tearing in the global menus. So then I commented out the TearFree option in xorg.conf and restarted lightdm, and immediately started seeing the really obvious tearing in the menus like in the video I posted. As a final check, I uncommented the TearFree option and restarted lightdm again, and the tearing was gone.
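
For anyone wanting to try the same thing, the xorg.conf stanza would look roughly like what the sketch below prints. This is a sketch under assumptions: the Identifier value is a placeholder label, "radeon" is the driver name provided by xf86-video-ati, and the exact stanza used in this test is not shown above. `device_section` is a hypothetical helper that just renders the text.

```python
def device_section(identifier, driver, options):
    """Render an xorg.conf Device section as text."""
    lines = [
        'Section "Device"',
        f'    Identifier "{identifier}"',
        f'    Driver "{driver}"',
    ]
    for name, value in options.items():
        lines.append(f'    Option "{name}" "{value}"')
    lines.append("EndSection")
    return "\n".join(lines)

# "Radeon" is a placeholder identifier; any label works.
print(device_section("Radeon", "radeon", {"TearFree": "on"}))
```

Commenting out the Option line and restarting the display manager gives the comparison run described above.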

Thanks Michel! And I hope the TearFree feature will eventually be extended to support rotated displays as well!

