Bug 83998 - Oopses on R9270X using UVD since radeon/uvd: use PIPE_USAGE_STAGING for msg&fb buffers
Status: RESOLVED DUPLICATE of bug 91009
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Default DRI bug account
Reported: 2014-09-17 14:14 UTC by Andy Furniss
Modified: 2015-07-30 12:38 UTC

Attachments
Oops uvd (11.96 KB, text/plain)
2014-09-17 14:23 UTC, Andy Furniss
oom-killer uvd (31.37 KB, text/plain)
2014-09-17 14:23 UTC, Andy Furniss
kernel log with new oops at end (71.36 KB, text/plain)
2014-11-19 22:47 UTC, Andy Furniss
different trace with newer kernel (4.02 KB, text/plain)
2014-12-01 14:07 UTC, Andy Furniss
Another different trace (3.82 KB, text/plain)
2014-12-04 21:36 UTC, Andy Furniss

Description Andy Furniss 2014-09-17 14:14:07 UTC
On an R9 270X I am getting rare-ish Oopses, and once the oom-killer, since -

commit 6327b584155d040ae089e65fd6747186bdd9666b
Author: Christian König <christian.koenig@amd.com>
Date:   Thu Sep 11 09:50:00 2014 +0200

    radeon/uvd: use PIPE_USAGE_STAGING for msg&fb buffers
    
    That better matches the actual userspace use case, the
    kernel will force it to VRAM if the hardware requires it.

To trigger this I need to repeatedly start mplayer using uvd - it takes some time, so I ended up making a script to do it for me while AFK.

I am 99% sure the above is it - I spent a day and a half trying on the commit before (radeon/video: use the hw to initial clear the buffers) with no crash.

The Oopses don't mention radeon or uvd.

They hit as soon as mplayer launches, before it renders anything - the screen locks, with no mouse cursor or vt switch, but SysRq works and the box stays up in some ways for a while (one time I had music playing for 30s-1min after).

One time oomkiller put on its blindfold and ran around killing :-)
Comment 1 Andy Furniss 2014-09-17 14:23:01 UTC
Created attachment 106433 [details]
Oops uvd
Comment 2 Andy Furniss 2014-09-17 14:23:35 UTC
Created attachment 106434 [details]
oom-killer uvd
Comment 3 Aaron B 2014-09-17 20:44:46 UTC
If you use Chrom(e/ium)/Firefox, can you make sure hardware accel is enabled and see if it crashes randomly using them? I wanna see if you have other same symptoms I have with random crashing on my system.
Comment 4 Michel Dänzer 2014-09-18 01:34:48 UTC
(In reply to comment #3)
> I wanna see if you have other same symptoms I have with random crashing on my
> system.

Aaron, I don't think this bug is related to yours.
Comment 5 Andy Furniss 2014-11-19 22:44:41 UTC
This bug is still alive for me.

I have tested on and off since reporting: I can still reproduce it, and I have failed to reproduce it with 6327b584155d040ae089e65fd6747186bdd9666b reverted.

Just retried with the new agd5f drm-next-319-wip and it's still there - though this time I got an oops which mentions radeon/ttm, so I am uploading it.

To produce I am running a script running mplayer on various blu-rays on hard disk.

It's not aggressive - run for 10 seconds, sleep for three, and repeat.

I added a counter and it got up to 29 before oopsing.
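
The loop described above could be sketched roughly as follows. This is not the reporter's actual script, which was not attached. The mplayer command line (`-vo vdpau`), the function names and the file paths are assumptions for illustration only.

```python
import subprocess
import time


def run_once(cmd, run_seconds):
    """Start the player, let it run for run_seconds, then stop it."""
    proc = subprocess.Popen(cmd)
    try:
        proc.wait(timeout=run_seconds)
    except subprocess.TimeoutExpired:
        proc.terminate()
        proc.wait()


def stress(files, cycles, run_seconds=10, sleep_seconds=3,
           player_cmd=("mplayer", "-vo", "vdpau"),  # assumed command line
           launch=run_once, pause=time.sleep):
    """Start the player `cycles` times, cycling through `files`."""
    if not files:
        raise ValueError("need at least one file to play")
    count = 0
    for i in range(cycles):
        path = files[i % len(files)]
        count += 1
        # Print before launching so the last number shown is the start
        # that was in progress if the machine locks up.
        print(f"start {count}: {path}", flush=True)
        launch(list(player_cmd) + [path], run_seconds)
        pause(sleep_seconds)
    return count
```

A call such as `stress(["/path/to/a.m2ts", "/path/to/b.m2ts"], cycles=500)` (hypothetical paths) would then mirror the "run for 10 seconds, sleep for three" pattern with a visible counter.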
Comment 6 Andy Furniss 2014-11-19 22:47:40 UTC
Created attachment 109744 [details]
kernel log with new oops at end
Comment 7 Andy Furniss 2014-12-01 14:07:39 UTC
Created attachment 110298 [details]
different trace with newer kernel

This is getting harder to trigger; I had started to think it was fixed.

I haven't triggered it with mpv yet, but I still can with just mplayer.

On the third attempt it hit 115 starts in, with the CPUs set to perf (which may help trigger it, but I'm not sure).

Uploading because current drm-next-319-wip has some fence changes and the trace starts with -

radeon_fence_signaled
Comment 8 Andy Furniss 2014-12-04 21:36:31 UTC
Created attachment 110470 [details]
Another different trace

Newer kernel, different trace - a BUG this time.

Slightly different from normal in both trace and behavior, in that it froze on quit and the screen was still showing the video.

Because it was a bit different I tried again - same result, both died within 5 minutes. I then reverted the suspect commit in case it was a different issue, went out, and when I came back it was up to 500 and still going OK, so it does seem to be the same issue.
Comment 9 Christian König 2014-12-05 10:46:57 UTC
That seems to be some kind of random kernel memory corruption, triggered by using PIPE_USAGE_STAGING for your case.

Does anybody have any good idea how to figure out what this is?
Comment 10 Michel Dänzer 2014-12-08 06:11:55 UTC
If there's an older kernel where this doesn't happen, it might be possible to bisect.
Comment 11 Andy Furniss 2015-07-30 12:38:07 UTC
I did try older kernels, but after thinking one was OK and then finding that it also had the issue, I decided bisecting was not really going to work.
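
The difficulty here is that an intermittent failure makes a "good" verdict unreliable. As a rough sketch, assuming each start fails independently with a fixed probability (an idealization, not something established in this report), the number of clean starts needed before calling a kernel good can be estimated from 1 - (1 - p)^N >= c. The function name is hypothetical.

```python
import math


def clean_runs_needed(p_fail, confidence):
    """Smallest N such that 1 - (1 - p_fail)**N >= confidence.

    p_fail:     assumed probability that a single start triggers the bug
    confidence: desired probability of having seen the bug if it is present
    """
    if not 0.0 < p_fail < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("p_fail and confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))
```

With an assumed rate of about one failure per 115 starts (the count from comment 7) and 95% confidence, this comes out at a few hundred clean starts per bisect step, which is why a kernel that merely survives a short test can be mislabelled as good.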

I don't have the 270X anymore so marking as a dup of another report.

As noted in that report, it seems easier to provoke this with current kernels.

It's possible the mesa commit just changed some timing that exposed the issue more.

Currently on Tonga I can provoke a similar issue (not oops - just ring stall) and reverting the mesa commit in this bug doesn't help, but playing around with cpufreq can change how easy it is to provoke.
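
Since cpufreq settings appear to affect how easily this triggers, it may help to record the active governor alongside each test run. The following is a minimal sketch assuming the usual Linux sysfs cpufreq layout (`cpuN/cpufreq/scaling_governor`); the function name is hypothetical.

```python
from pathlib import Path


def read_governors(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU that exposes cpufreq."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(sysfs_root).glob(pattern)):
        # gov_file is .../cpuN/cpufreq/scaling_governor, so the CPU
        # name is two directory levels up.
        governors[gov_file.parent.parent.name] = gov_file.read_text().strip()
    return governors
```

Logging the result of `read_governors()` next to the start counter would make it easier to compare runs made under different cpufreq settings.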

*** This bug has been marked as a duplicate of bug 91009 ***

