Bug 98271

Summary: [radeonsi] Playing videos with vdpau or vaapi hardware acceleration crashes my pc
Product: Mesa
Reporter: snpidek <snpidek>
Component: Drivers/Gallium/radeonsi
Assignee: Default DRI bug account <dri-devel>
Status: CLOSED FIXED
QA Contact: Default DRI bug account <dri-devel>
Severity: normal
Priority: medium
CC: john.ettedgui
Version: 12.0
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments: A quick script I wrote to trigger the issue.
dmesg written by the script before I restart the machine
another dmesg from another run
Xorg log

Description snpidek 2016-10-15 08:13:21 UTC
I am using Arch Linux, and ever since I updated mesa to 12.0 when it first came out, my pc has been crashing while watching videos.

My radeon card is Kaveri.

I tested both vaapi and vdpau and both are affected. I also tried different kernels, from 4.4 LTS to 4.6 and 4.7, and it makes no difference. I also tried both smplayer/mpv and vlc, and both are affected.

When I downgraded to mesa 11.2.2-1, the problem went away.

So I thought I would wait to see whether it gets fixed, but even though 12.0.3 has come out, the problem is still present.


The problem is not very frequent: if I watched videos for a whole day, my pc would crash only once.

When the crash happens, the screen goes blank and I can't hear my fans spinning, even though the lights are still on, indicating that the pc is still powered on. Pressing buttons does nothing; the only thing that works is restarting it.
Comment 1 John 2016-10-17 04:24:51 UTC
I may have the same problem, and I know how to trigger it *easily*:

When watching videos with mpv using vdpau's output, if I quickly jump back and forth in the video, eventually the system will freeze, go blank, and I have to reset it manually.
If I wait maybe a second between jumps, I get no issue.
I tried it with xv, and no issue no matter how many jumps.

I'm on mesa-git and have had that issue for a while, but not sure how long anymore, so it may or may not be the same as here. The card is 280x.

Since snpidek gave an entry point, I guess we should try bisecting this.
Comment 2 Christian König 2016-10-17 12:03:14 UTC
(In reply to John from comment #1)
> I may have the same problem, and I know how to trigger it *easily*:

Thanks, that is very valuable information. I'm going to try to reproduce this, because previously it sounded like a bug we would never get a grip on.

> Since snpidek gave an entry point, I guess we should try bisecting this.

Yeah, completely agree. If you can reproduce it more or less reliably, please try to bisect the issue between 11.2.2-1 and 12.0.3.
Comment 3 John 2016-10-17 14:48:44 UTC
Well, I tried bisecting it today, assuming 11.2.2 was good, and got nowhere, so I tried at commit 3a9f6283f435f90ca1a2901be39ec9d629c95bb6 and it still froze.

Because of that I am not sure if that is the same problem or not...

I'll attach a few things in case.
Comment 4 John 2016-10-17 14:53:52 UTC
Created attachment 127357 [details]
A quick script I wrote to trigger the issue.

It takes a video file as an input (I used an X264 mkv movie file if it matters).

It doesn't happen as quickly as I thought originally, as I've had runs up to 25 minutes (and some in seconds..).
I added the 2nd sleep to simulate better the speed at which I would usually press keys, but maybe it just delays the whole thing, not sure.
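The attached script itself is not reproduced in this thread. As a rough illustration only, the kind of seek loop described above could be sketched in Python like this. It assumes an mpv build that supports `--input-ipc-server` and the JSON IPC `seek` command; the socket path, step size and delay are made-up values, and decoding and output options are left to the mpv config. It is not the attachment.

```python
import itertools
import json
import socket
import subprocess
import sys
import time

# Hypothetical socket path, chosen for this sketch only.
IPC_SOCKET = "/tmp/mpv-seek-stress.sock"


def seek_commands(step=30):
    """Yield mpv JSON IPC lines that alternately seek forward and backward."""
    for offset in itertools.cycle((step, -step)):
        yield json.dumps({"command": ["seek", offset, "relative"]}) + "\n"


def connect(path, attempts=50, delay=0.1):
    """Retry until mpv has created its IPC socket."""
    for _ in range(attempts):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            return sock
        except OSError:
            sock.close()
            time.sleep(delay)
    raise RuntimeError("mpv IPC socket did not appear: " + path)


def run(video, pause=0.2):
    """Play `video` in mpv and keep seeking until mpv exits."""
    player = subprocess.Popen(["mpv", "--input-ipc-server=" + IPC_SOCKET, video])
    sock = connect(IPC_SOCKET)
    sock.settimeout(1.0)
    try:
        for line in seek_commands():
            if player.poll() is not None:
                break  # mpv has exited
            sock.sendall(line.encode())
            try:
                sock.recv(65536)  # drain replies and events from mpv
            except socket.timeout:
                pass
            time.sleep(pause)  # the delay between jumps
    except (BrokenPipeError, ConnectionResetError):
        pass  # mpv closed the socket while we were sending
    finally:
        sock.close()
        player.terminate()


if __name__ == "__main__" and len(sys.argv) == 2:
    run(sys.argv[1])
```

Run it with the video file as the only argument. A shorter `pause` makes the jumps more aggressive, a longer one mimics slower key presses.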
Comment 5 John 2016-10-17 14:54:57 UTC
Created attachment 127358 [details]
dmesg written by the script before I restart the machine

Since there are quite a few lines in dmesg about the issue, the computer is obviously not fully dead.
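For anyone who wants to capture the same kind of log, here is a minimal Python sketch of saving dmesg so that the latest snapshot survives a hard reset. It is an assumption about how this could be done, not the code from the attachment: the file name and interval are made up, and reading dmesg may need root when `kernel.dmesg_restrict` is set.

```python
import os
import subprocess
import time


def write_durably(path, text):
    """Write text to a temp file, force it to disk, then rename it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as handle:
        handle.write(text)
        handle.flush()
        os.fsync(handle.fileno())  # make sure the data reaches the disk
    os.replace(tmp_path, path)  # atomic rename: the old snapshot stays until now


def snapshot_dmesg(path):
    """Store the current kernel log in `path`."""
    result = subprocess.run(
        ["dmesg"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    write_durably(path, result.stdout)


def snapshot_forever(path="dmesg-latest.txt", interval=1.0):
    """Keep refreshing the snapshot; meant to run alongside the seek loop."""
    while True:
        snapshot_dmesg(path)
        time.sleep(interval)
```

Writing to a temporary file and renaming it means a freeze in the middle of a write leaves the previous complete snapshot in place instead of a truncated file.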
Comment 6 John 2016-10-17 14:55:33 UTC
Created attachment 127359 [details]
another dmesg from another run

Not sure if it helps, but attaching it in case its information is a bit different.
Comment 7 John 2016-10-17 14:56:07 UTC
Created attachment 127360 [details]
Xorg log
Comment 8 John 2016-10-17 15:00:04 UTC
I'll try today to go a bit further back than 11.2. If anything in the logs gives you an idea of a good starting point, please do share.
Comment 9 Andy Furniss 2016-10-17 16:09:20 UTC
(In reply to John from comment #4)
> Created attachment 127357 [details]
> A quick script I wrote to trigger the issue.

For me this would use s/w dec + --vo=opengl with current mpv.

I guess you have a config or something that changes the mpv defaults? If so maybe specify what they are, though I don't think I can reproduce with TONGA using amdgpu anyway.
Comment 10 John 2016-10-17 23:41:50 UTC
Correct, I have an mpv config with:

hwdec=vdpau
hwdec-codecs=all
vo=opengl-hq

The rest shouldn't matter I believe.
Comment 11 John 2016-10-18 00:52:24 UTC
I tried going back to the commit of 11.0 (so with llvm 3.7) but I still got the issue.

I'd guess the bug is in the kernel, not mesa, because I don't think I've had the issue for that long. I could be wrong though.
Comment 12 John 2016-10-18 06:52:52 UTC
Thankfully, having amdgpu working with SI gave me something else to try.
So I ran the same script with amdgpu instead of radeon (and, alas, a 4.9 kernel instead of a 4.8...), back on the latest code from mesa's git: the script ran for 2 hours before I killed it.

Since 2 hours is not that much longer than the maximum of half an hour before a crash that I've seen so far, I won't say that's it yet. I'll run the script overnight again, and if it still doesn't crash, that should be good enough to know.
Comment 13 Christian König 2016-10-18 08:57:43 UTC
John, please double check that you are actually correctly installing VDPAU.

E.g. add something like "while(1);" into the VDPAU driver create function or something like this. vdp_imp_device_create_x11() would be a good place for that.

That it's a kernel issue came to my mind as well, but we haven't changed anything on UVD in the radeon module in quite a while. So this is a bit unlikely.
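A less invasive alternative to the "while(1);" trick is to look at which libvdpau files the running player has mapped. The Python sketch below is only an illustration of that different technique, not the check proposed above. It assumes a Linux /proc filesystem and that the driver file name starts with "libvdpau". It only shows which file is mapped, so the path still has to be compared by hand against where the test build was installed.

```python
import os


def vdpau_libraries(maps_text):
    """Return the libvdpau* shared objects named in /proc/<pid>/maps content."""
    found = set()
    for line in maps_text.splitlines():
        # Columns: address, perms, offset, dev, inode, pathname (optional).
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if os.path.basename(path).startswith("libvdpau"):
                found.add(path)
    return sorted(found)


def vdpau_libraries_of(pid):
    """Read the maps of a running process, e.g. the mpv started for the test."""
    with open("/proc/%d/maps" % pid) as handle:
        return vdpau_libraries(handle.read())
```

Calling `vdpau_libraries_of()` with the PID of the running mpv while a video is decoding lists the loaded libvdpau files. An empty list suggests VDPAU is not in use at all.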
Comment 14 John 2016-10-18 10:08:58 UTC
> John, please double check that you are actually correctly installing VDPAU.
> E.g. add something like "while(1);" into the VDPAU driver create function or something like this. vdp_imp_device_create_x11() would be a good place for that.

Alright, I've just tried that and mpv seems to be waiting, no error in output nor in dmesg, which should be as expected I guess.
Please tell me if you can think of any other thing I can test.


> That it's a kernel issue came to my mind as well, but we haven't changed anything on UVD in the radeon module in quite a while. So this is a bit unlikely.

I looked at radeon_uvd.c's history quickly and there were a few changes in the time period I'd think of.
Based on the date, I'd guess possibly either of the kernel commits on 2016-05-05, probably nothing later, and anything earlier is too far back.

Is amdgpu using the same firmware as radeon for SI? If not, maybe that's another option for the culprit.
Comment 15 Andy Furniss 2016-10-18 13:19:44 UTC
(In reply to John from comment #10)
> Correct, I have an mpv config with:
> 
> hwdec=vdpau
> hwdec-codecs=all
> vo=opengl-hq
> 
> The rest shouldn't matter I believe.

If your mpv is not too old then there is an issue with vo=opengl-hq + hwdec that means you only get half vrez.

May or may not affect this issue - I don't know.

https://bugs.freedesktop.org/show_bug.cgi?id=97988

Do you crash with vo=vdpau with radeon?
Comment 16 John 2016-10-18 13:27:02 UTC
I actually had to use -vo vdpau when I tried against mesa 11.0 (somehow mpv didn't work with ogl-hq on that version) so I know it is problematic.

But the bug you link is still interesting to me, as I was wondering why my movies looked aliased lately, so thanks for that!
Comment 17 Alex Deucher 2016-10-18 13:32:53 UTC
amdgpu does not support UVD or VCE on SI parts yet.
Comment 18 John 2016-10-18 13:35:22 UTC
> amdgpu does not support UVD or VCE on SI parts yet.
oh, I should have verified in dmesg.
Sorry about that.

What should I try next?
Comment 19 Christian König 2016-10-18 15:30:28 UTC
(In reply to John from comment #18)
> What should I try next?

Installing an older kernel, see if that works with 12.0 mesa.

If yes we have narrowed it down to the kernel, if not we need to stick a bit more into mesa.

Another possibility which came to my mind is that this might not be an issue with UVD decoding, but rather presenting it.

E.g. install both VDPAU and OpenGL from a certain Mesa version *AND* make sure that you restart X after that so that the X acceleration uses the new library versions as well.
Comment 20 John 2016-10-19 00:51:13 UTC
> Installing an older kernel, see if that works with 12.0 mesa.
> If yes we have narrowed it down to the kernel, if not we 
> need to stick a bit more into mesa.
I've tried with a 3.18 kernel and still got the issue, so the issue is not in the kernel. I used the firmware files from that date as well, to eliminate that possibility.

> Another possibility which came to my mind is that this might not
> be an issue with UVD decoding, but rather presenting it.
> E.g. install both VDPAU and OpenGL from a certain Mesa version
> *AND* make sure that you restart X after that so that the
> X acceleration uses the new library versions as well.
Now this is interesting, as my reboots so far were only post-freeze, never done specifically to test a certain mesa version.
I've rolled back to 11 and restarted the computer, and will try.

Since you mentioned presenting, could it be the DDX?


New information: I don't need to have the video on screen for the issue to happen. I can alt-tab or switch to another virtual desktop while the script runs and it still freezes.
Comment 21 John 2016-10-21 01:21:20 UTC
Sorry for the late update, I wanted to test a few more things first.

So I went back to mesa 11 and rebooted before testing: no difference.
I tried reverting to the DDX from back then, and disabling DRI3 (which I don't think the DDX supported anyway), and still no difference.

Then I thought a bit more about what Andy wrote and updated the mesa stuff back to latest git and a 4.8 kernel, but downgraded mpv to 0.10.0 (about the same date as mesa 11). Now I was able to have a 4-hour run without any issue and then a 7-hour run, still without issues. So maybe this is it after all.
I'll try one last, longer run, and then maybe try bisecting mpv.
Should I keep posting results here or would that be more of an mpv issue?
Comment 23 John 2016-10-29 22:24:38 UTC
Took me a while but it seems to come from commit 6b22b216514ee2eb784711f4539410d3b312a4fd

Author: wm4 <wm4@nowhere>
Date:   Mon Nov 16 16:22:23 2015 +0100

    vo_opengl: attempt to improve GLX vs. EGL backend detection
    
    For the sake of vaapi interop, we want to use EGL, but on the other
    hand, but because driver developers are full of shit, vdpau interop will
    not work on EGL (even if the driver supports EGL). The latter happens
    with both nvidia and AMD Mesa drivers.
    
    Additionally, EGL vaapi interop support can apparently only detected at
    runtime by actually using it. While hwdec_vaegl.c already does this, it
    would require initializing libva on _every_ system, which will cause
    libav to print an unpreventable bullshit message to the terminal.
    
    Try to counter these huge loads of bullshit by adding more fucking
    bullshit.
Comment 24 John 2017-05-08 08:55:01 UTC
Well now I get the same problem in Kodi as well :/
Comment 25 GitLab Migration User 2019-09-25 17:55:10 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1238.
