Bug 99120 - VERDE 7770 - glxdemo, vlc/glx, weston fail with garbled screen. Elsewhere blurry text rendering when focus lost
Status: CLOSED NOTOURBUG
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: 13.0
Hardware: Other All
Importance: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-12-17 08:55 UTC by Larry
Modified: 2017-01-04 15:28 UTC
CC List: 1 user

See Also:
i915 platform:
i915 features:


Attachments
glxinfo dump (99.63 KB, text/plain)
2016-12-17 08:55 UTC, Larry
blender shot, blurred fonts (250.42 KB, image/png)
2016-12-17 08:56 UTC, Larry
glxdemo shot after resize (26.73 KB, image/png)
2016-12-17 08:56 UTC, Larry
xorg log (36.86 KB, text/plain)
2016-12-17 08:57 UTC, Larry
terminal output from launching weston under X (5.74 KB, text/plain)
2016-12-17 08:57 UTC, Larry
shot of weston window under X (87.52 KB, image/png)
2016-12-17 08:57 UTC, Larry
dmesg | grep -i radeon (1.93 KB, text/plain)
2016-12-17 08:58 UTC, Larry
shot of vlc playing video using openGL backend (223.32 KB, image/png)
2016-12-17 08:59 UTC, Larry

Description Larry 2016-12-17 08:55:22 UTC
Created attachment 128510 [details]
glxinfo dump

OpenGL renderer string: Gallium 0.4 on AMD CAPE VERDE (DRM 2.46.0 / 4.8.13-300.fc25.x86_64, LLVM 3.8.0)

01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde XT [Radeon HD 7770/8760 / R7 250X] [1002:683d]

Launching weston from a vconsole freezes the display; switching to another vconsole and killing the process returns to the command line.

Launching weston under X opens a window with garbled contents. (attached shot)

Launching glxdemo under X opens a window with a yellow square, but the background contains garbled noise. Resizing the window causes the yellow square to resize and render, but with more and more garbage appearing in the background (attached shots).

Using vlc to play a video after selecting the OpenGL backend produces a fixed garbled screen while the audio plays fine. (Shot attached)

I've experienced similar issues with other X programs which rely on GL(X?), where the main screen appears but is completely corrupted.

Another issue (related or not) is blurry text rendering in various programs.
I've experienced this with blender (see #38070) and with gnome-shell.
Unlike in #38070 (as I understand it), the text is initially rendered properly, but when focus is lost it is blurred, and as the cursor hovers over other elements in the UI it gets worse and worse. Occasionally the whole UI is redrawn and all the text snaps back, but it's largely unusable. Using software GL instead has no such issues. Shot attached; note the properly rendered text in the left pane (which has focus) vs. the blurred right-hand pane (which doesn't have focus).

I experienced the exact same issues on fc23, which uses mesa 11.x, and had (painfully) switched to catalyst, which didn't have these issues. A recent upgrade to fc25 means I can't use catalyst anymore, and despite several years passing, none of these issues have been resolved in the open source drivers.

Attaching everything I can think of, and glad to provide additional info as needed.
Comment 1 Larry 2016-12-17 08:56:12 UTC
Created attachment 128511 [details]
blender shot, blurred fonts
Comment 2 Larry 2016-12-17 08:56:49 UTC
Created attachment 128512 [details]
glxdemo shot after resize
Comment 3 Larry 2016-12-17 08:57:09 UTC
Created attachment 128513 [details]
xorg log
Comment 4 Larry 2016-12-17 08:57:39 UTC
Created attachment 128514 [details]
terminal output from launching weston under X
Comment 5 Larry 2016-12-17 08:57:59 UTC
Created attachment 128515 [details]
shot of weston window under X
Comment 6 Larry 2016-12-17 08:58:18 UTC
Created attachment 128516 [details]
dmesg | grep -i radeon
Comment 7 Larry 2016-12-17 08:59:06 UTC
Created attachment 128517 [details]
shot of vlc playing video using openGL backend
Comment 8 Larry 2016-12-17 09:06:45 UTC
fc25 currently uses mesa-dri-drivers-13.0.2-2.fc25.x86_64

And I should also point out that glxgears runs fine, both initially when launched and even after resizing.
Comment 9 Michel Dänzer 2016-12-17 09:19:09 UTC
Does not enabling Option "ShadowPrimary" in xorg.conf help for any of the issues under X?
Comment 10 Larry 2016-12-17 11:50:59 UTC
> Does not enabling Option "ShadowPrimary" in xorg.conf help for any of the issues under X?

I disabled "ShadowPrimary" and verified the change in Xorg.0.log. This resolves neither issue: not the blurring, nor the display garbage. If anything, it introduces some new misbehaviors in blender.
Comment 11 Larry 2016-12-22 21:55:26 UTC
Switching to the latest amdgpu driver (drm-next-4.10 branch on ~agd5f/linux) and using the current fc25 xorg-amdgpu server and mesa (13.0.2) still exhibits all the issues described. The commit (f8d9422ef80c512611228449) is just a few days old, with ongoing work on experimental support for Verde (gcn 1.0) in amdgpu.

X did come up without a hitch, which makes me doubt whether future work on amdgpu is likely to address these issues.

What I know so far is that:
glxgears works.
glxdemo doesn't work, and neither does weston (which uses EGL).
the problem manifests on both the radeonsi driver and the amdgpu driver.

Are those data points sufficient to at least triangulate the problem to a specific component in the stack?
Comment 12 Larry 2016-12-31 22:30:55 UTC
According to Phoronix (April 20th), the new experimental support for HD 7000 in amdgpu is basically inherited from the radeonsi driver. If true, that explains why they share the same critical issues with respect to SI/Verde.

CCing Tom St Denis from AMD, whose amdgpu commits seem to deal with that generation.

I understand that the amdgpu support for HD7700 is experimental (and, lower priority). Please just be aware that the existing drivers you're basing off of have critical issues (I'm posting this on https://bugs.freedesktop.org/show_bug.cgi?id=99120). 

As AMD's cutoff for amdgpu means that the HD7700 should eventually be supported, I hope whatever engineering resources are presently allocated to improving Linux support for AMD GPU products will deal with these problems (i.e. utterly unusable GL) while the momentum is strong.

Thanks, and Happy new year!
Comment 13 Tom St Denis 2017-01-01 11:56:21 UTC
I wonder if it's a tile mode array problem?  Looking at the code for radeon/si.c and amdgpu/gfx_v6_0.c the array configs are different (entries 18-20 don't exist in the radeon side).  Briefly looking at a few other entries though they seem to line up textually (assuming the defines map to the same values).
Comment 14 Larry 2017-01-01 15:05:46 UTC
> I wonder if it's a tile mode array problem?  Looking at the code for 
> radeon/si.c and amdgpu/gfx_v6_0.c the array configs are different (entries 
> 18-20 don't exist in the radeon side).  Briefly looking at a few other entries
> though they seem to line up textually (assuming the defines map to the same
> values).

There *is* a "tiled" effect to the corrupted graphics sometimes (see for example the weston shot attached to bug), so maybe.

But as I mentioned, the same problems exist in both drivers, and that makes it seem unlikely that a comparison of the two would highlight bugs common to both.

That said, I'm glad to test any branch or solid suggestion and report back. 
What do you think?
Comment 15 Tom St Denis 2017-01-01 16:50:03 UTC
Is there a version of mesa for which the radeon KMD works?

Also, if you load amdgpu with cg_mask=0, does that help?
Comment 16 Larry 2017-01-01 19:51:48 UTC
(In reply to Tom St Denis from comment #15)
> Is there a version of mesa for which the radeon KMD works?
> 
> Also if you load amdgpu with cg_mask=0 does that help

I know that mesa 12 (fc24) and mesa 13.0.2 (fc25) do not work. Haven't tried mesa git.

Tried booting with cg_mask, no joy.
Comment 17 Larry 2017-01-01 19:52:07 UTC
cg_mask=0, that is.
Comment 18 Tom St Denis 2017-01-01 23:28:06 UTC
So there is no known point for which this worked even with radeonsi+radeon KMD?

That surprises me.
Comment 19 Larry 2017-01-02 01:19:32 UTC
(In reply to Tom St Denis from comment #18)
> So there is no known point for which this worked even with radeonsi+radeon
> KMD?
> 
> That surprises me.

What can I tell ya? I'd love to be wrong. :)
I had the same issues in fc23 about 18 months ago and was using catalyst until fc25 recently came out, forcing fc23 into EOL, at which point I had to upgrade for security updates. All the issues are still there, only now I can't use catalyst to sidestep the problem.
Comment 20 Tom St Denis 2017-01-03 00:10:20 UTC
So Catalyst works but none of the open source KMD/UMD pairs work? Hmm...

I'll see if I can dig up a VERDE board and install an old kernel/catalyst and see what they're doing diff.
Comment 21 Larry 2017-01-03 02:16:27 UTC
(In reply to Tom St Denis from comment #20)
> So Catalyst works but none of the open source KMD/UMD pairs work? Hmm...
> 
> I'll see if I can dig up a VERDE board and install an old kernel/catalyst
> and see what they're doing diff.

That about sums it up. Thanks for looking into it, much appreciated.

VDPAU also works (mostly), while video GLX gives blocky garbage (in vlc).

In the meantime, I've compiled mesa git and tried it - no change.

Perhaps irrelevant, but the radeon KMD puts this error in dmesg:

[    4.980751] [drm] fb mappable at 0xE05D9000
[    4.980753] [drm] vram apper at 0xE0000000
[    4.980753] [drm] size 8294400
[    4.980754] [drm] fb depth is 24
[    4.980755] [drm]    pitch is 7680
[    4.980850] fbcon: radeondrmfb (fb0) is primary device
[    5.161636] [drm:si_dpm_set_power_state [radeon]] *ERROR* si_restrict_performance_levels_before_switch failed
[    5.184528] Console: switching to colour frame buffer device 240x67
[    5.188443] radeon 0000:01:00.0: fb0: radeondrmfb frame buffer device
[    5.212818] [drm] Initialized radeon 2.46.0 20080528 for 0000:01:00.0 on minor 0
 
Please let me know if there's anything else I can do...
Comment 22 Tom St Denis 2017-01-03 18:28:32 UTC
Chatting internally one of our developers also has a VERDE with a DID of 0x683D.  He claims that with mesa + radeon kmd it works.  

So at this point the possibilities are 

1.  Different mesa/kmd combination from what he uses.
2.  User error (your install is broken)
3.  Board damaged (analogue/vbios/etc).

I doubt it's anything analogue and you probably haven't flashed the vbios.  

I'm sure our developer is using a combo of mesa git and staging kernel.  So if you haven't already could you try our staging (which is a couple of weeks out of date internally but should be fine since there aren't really any significant radeon/si changes pending) 

https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-4.7

and a recent mesa (or mesa git)?

It would help to get the full lspci info for the board too.  See if the revision/subsystem id's match up with our developer.
Comment 23 Larry 2017-01-03 22:49:24 UTC
Thank you for going to the trouble of checking internally.

(In reply to Tom St Denis from comment #22)
> Chatting internally one of our developers also has a VERDE with a DID of
> 0x683D.  He claims that with mesa + radeon kmd it works.  
> 
> So at this point the possibilities are 
> 
> 1.  Different mesa/kmd combination from what he uses.

Ok, I'll try the combination you suggested.

> 2.  User error (your install is broken)

Ok, I'll try a fedora livecd.

> 3.  Board damaged (analogue/vbios/etc).
> 
> I doubt it's anything analogue and you probably haven't flashed the vbios.  
> 

I haven't, and remember catalyst works... it must be a software thing.

> <...>
> 
> It would help to get the full lspci info for the board too.  See if the
> revision/subsystem id's match up with our developer.

Here it is again:

01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde XT [Radeon HD 7770/8760 / R7 250X] [1002:683d]

This and other details/logs/shots are posted in the original bug report:
https://bugs.freedesktop.org/show_bug.cgi?id=99120
Comment 24 Larry 2017-01-04 02:48:25 UTC
Switching to amd-staging-4.7 with the fc25-provided mesa-13.0.2 didn't change anything. However, booting into the fc25 liveusb (and installing some stuff) shows that weston and glxdemo work just fine. Come to think of it, I was in gnome-shell and there wasn't any font blurring either.

fc25 liveusb uses kernel 4.8.3 and mesa 12.0.3, so at least there's a known-good system.

I downgraded mesa to 12.0.3 on my box and nothing changed. There's obviously something wrong with my installation as you've suggested. And that means it's not really up to you to help debug this further. Thank you very much for your help.

But, I know that both X and mesa have been updated multiple times and can't think what else on my filesystem could be implicated. Any ideas on how I could hunt this down (short of a fresh install)? Can I leverage the liveusb system for deltas somehow? I'd appreciate any suggestions, before I close this as RESOLVED.
Comment 25 Arek Ruśniak 2017-01-04 11:12:47 UTC
Hi, before you start reinstall whole system you can always try:

*) check all of these things with a completely new user account
*) try switching to the modesetting DDX
*) clean out xorg.conf.d/* (make a backup first)
Comment 26 Larry 2017-01-04 13:41:30 UTC
(In reply to Arek Ruśniak from comment #25)
> Hi, before you start reinstall whole system you can always try:
> 
> 1) check this all things with completely new user account

That worked. 

Among the 100s of cruft dot files in my $HOME there's a little XML file called ".drirc", which I don't remember ever touching. It may have been there since before I even got this card (the previous one was an ancient nvidia). I remember trying PlayOnLinux years ago; maybe it put it there.

I finally narrowed the trouble down to the following two entries:

<driconf>
    <device screen="0" driver="radeonsi">
        <application name="Default">
            <option name="pp_jimenezmlaa" value="7" />
            <option name="pp_jimenezmlaa_color" value="8" />
        </application>
    </device>
</driconf>

"pp_jimenezmlaa" causes block-corrupted garbage, and "pp_jimenezmlaa_color" screws up the fonts with blurring. I've spent 18 months thinking the OSS driver was broken because of this. Is this a bug?
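For anyone hunting a similar problem (the question raised in comment 24), here is a minimal Python sketch that lists every enabled pp_* option in a driconf file. It assumes that Mesa of this era reads a system-wide /etc/drirc followed by a per-user ~/.drirc, and that pp_jimenezmlaa and pp_jimenezmlaa_color select Gallium's MLAA post-processing filters, where "0" means disabled. Matching on the "pp_" prefix is our own heuristic, not a Mesa tool, and the helper name find_pp_overrides is hypothetical.

```python
import os
import xml.etree.ElementTree as ET

# Assumed lookup order: system-wide file first, then the per-user override.
DRIRC_PATHS = ["/etc/drirc", os.path.expanduser("~/.drirc")]


def find_pp_overrides(xml_text):
    """Return (driver, application, option, value) for each enabled pp_* option."""
    root = ET.fromstring(xml_text)
    hits = []
    for device in root.iter("device"):
        # A <device> without a driver attribute is treated as matching any driver.
        driver = device.get("driver", "*")
        for app in device.iter("application"):
            app_name = app.get("name", "?")
            for opt in app.iter("option"):
                name = opt.get("name", "")
                value = opt.get("value", "0")
                # Heuristic: post-processing options are integers, "0" = disabled.
                if name.startswith("pp_") and value.strip() != "0":
                    hits.append((driver, app_name, name, value))
    return hits


# The .drirc contents quoted in this bug, used when no real file is found.
SAMPLE = """<driconf>
    <device screen="0" driver="radeonsi">
        <application name="Default">
            <option name="pp_jimenezmlaa" value="7" />
            <option name="pp_jimenezmlaa_color" value="8" />
        </application>
    </device>
</driconf>"""


def main():
    sources = [p for p in DRIRC_PATHS if os.path.isfile(p)]
    texts = []
    for path in sources:
        with open(path, encoding="utf-8") as f:
            texts.append((path, f.read()))
    if not texts:
        print("no drirc found; checking the sample from this bug instead")
        texts = [("sample", SAMPLE)]
    for label, text in texts:
        for driver, app, name, value in find_pp_overrides(text):
            print(f"{label}: driver={driver} app={app} {name}={value}")


if __name__ == "__main__":
    main()
```

Run against the file quoted above, it should report both pp_jimenezmlaa=7 and pp_jimenezmlaa_color=8 for radeonsi. Testing with a fresh user account, as suggested in comment 25, remains the quicker way to confirm that a per-user file is to blame.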


Now the hardware/driver does work, FOSS wins again and I'm \o/ about it.

Thanks again for your help.
Comment 27 Christian König 2017-01-04 15:28:29 UTC
Mhm, interesting bug. I didn't know that .drirc config options could cause such problems.

Figuring out what put the .drirc file in the home directory might be a good idea, but certainly not a job for the devs.

Anyway let's close this bug report and move on.

