Bug 85320

Summary: [RV620][RV630][RS880] GPU hangs using UVD hardware acceleration
Product: Mesa
Reporter: Eugene <ken20001>
Component: Drivers/Gallium/r600
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
QA Contact:
Severity: major
Priority: high
CC: arthur.marsh, ckoenig.leichtzumerken, daniele.rogora, freedesktop.jim-j, ken20001, nicolamori, russianneuromancer, tanertas, zima
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments: Xorg.log
dmesg
Xorg.log with backtrace
dmesg
syslog
dmesg log, gpu hangs
UVD hang on RV620
journalctl dump for uvd/vdpau crash
Failed UVD playback session with RS780 mainboard
Something to test
More to test.

Description Eugene 2014-10-22 09:31:36 UTC
With the new kernel 3.18RC1, I discovered GPU hangs when trying hardware acceleration.
Running 'mpv -vo vdpau --hwdec vdpau filename' starts playing the movie and all seems OK. But if we try to seek forward/backward, first the playback hangs, then the whole system hangs. After switching to another VT, the following appears in dmesg:

[drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).
[drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).

Switching back to VT7, the image appears and disappears. Restarting the lightdm service allowed me to log in to my account, but the whole system stays switched to XRender with no OpenGL until I fully reboot.
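
For anyone collecting similar logs, the two DRM error lines quoted above can be pulled out of a saved dmesg dump with a small filter. This is only a sketch: the regular expression and the helper name `drm_errors` are my own, modelled on the two lines in this report, and may not match every radeon error format.

```python
import re

# Pattern modelled on the two error lines quoted in this report (an assumption,
# not an exhaustive description of kernel DRM log formats).
DRM_ERROR = re.compile(r"\[drm:(?P<func>\w+)\] \*ERROR\* (?P<msg>.*)")


def drm_errors(log_text):
    """Return (function, message) pairs for every DRM error line in log_text."""
    matches = (DRM_ERROR.search(line) for line in log_text.splitlines())
    return [(m["func"], m["msg"]) for m in matches if m]


sample = """\
[drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).
[drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).
"""

for func, msg in drm_errors(sample):
    print(func, "->", msg)
```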

mpv --version
mpv git-f5a19f6 (C) 2000-2014 mpv/MPlayer/mplayer2 projects
 built on 2014-10-18T11:58:42
ffmpeg library versions:
   libavutil       54.7.100
   libavcodec      56.1.100
   libavformat     56.4.101
   libswscale      3.0.100
   libavfilter     5.1.100
   libavresample   2.1.0

vdpauinfo 0.1-1
mesa-vdpau-drivers 10.4~git1410211930.ef280c~gd~t
libgl1-mesa-glx 10.4~git1410211930.ef280c~gd~t

Kubuntu 14.04.1
Linux: 3.18RC1 x86_64
Graphics: Radeon HD2600 XT
Comment 1 Alex Deucher 2014-10-22 17:26:07 UTC
If this is a regression can you narrow down which component (kernel, mesa, etc.) caused the problem and bisect?  Please also attach your xorg log and dmesg output.
Comment 2 Eugene 2014-10-22 17:30:43 UTC
I suspect this is the same as: https://bugs.freedesktop.org/show_bug.cgi?id=85323
Comment 3 Eugene 2014-10-22 17:31:59 UTC
(In reply to Alex Deucher from comment #1)
> If this is a regression can you narrow down which component (kernel, mesa,
> etc.) caused the problem and bisect?  Please also attach your xorg log and
> dmesg output.

And yes, I would bisect if somebody explained how to, but I don't know how.
Comment 4 Eugene 2014-10-22 17:32:58 UTC
Created attachment 108252 [details]
Xorg.log
Comment 5 Eugene 2014-10-22 17:33:28 UTC
Created attachment 108253 [details]
dmesg
Comment 6 Alex Deucher 2014-10-22 17:35:03 UTC
Google for "git bisect howto".  There are lots of good tutorials.
Comment 7 Eugene 2014-10-22 17:41:07 UTC
(In reply to Alex Deucher from comment #6)
> Google for "git bisect howto".  There are lots of good tutorials.
What exactly should I bisect, mesa? Where do I get it?
Comment 8 Alex Deucher 2014-10-22 17:43:55 UTC
(In reply to Eugene from comment #7)
> (In reply to Alex Deucher from comment #6)
> > Google for "git bisect howto".  There are lots of good tutorials.
> What exactly should I bisect, mesa? Where do I get it?

Can you narrow down whether it was a mesa update or a kernel update that caused the regression?  You'll need to do that first.  Once you've figured that out, you can bisect the appropriate component (mesa or kernel).  Mesa git info is here:
http://cgit.freedesktop.org/mesa/mesa/
kernel git info:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/
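
For readers unfamiliar with bisecting, the search that `git bisect` automates is essentially a binary search between a known-good and a known-bad commit. The following is a minimal sketch of that idea, not git's actual implementation: the commit names, the `is_bad` check, and the linear history are hypothetical, and in practice the "test" step means building and booting the kernel (or building Mesa) and trying to reproduce the hang.

```python
def find_first_bad(commits, is_bad):
    """Return the index of the first bad commit using binary search.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is bad as well.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the culprit is at mid or earlier
        else:
            good = mid  # the culprit is after mid
    return bad


# Hypothetical linear history of 16 commits; the problem starts at "c11".
commits = [f"c{i}" for i in range(16)]
tested = []


def is_bad(commit):
    tested.append(commit)  # record each commit we had to build and test
    return int(commit[1:]) >= 11


first_bad = find_first_bad(commits, is_bad)
print(commits[first_bad], "found after", len(tested), "tests")  # c11 found after 4 tests
```

The point of the sketch is the cost: 16 candidate commits need only 4 test runs, which is why bisecting is practical even across a whole kernel release cycle.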
Comment 9 Eugene 2014-10-22 17:47:52 UTC
The thing is that Linux 3.18 is the first kernel with which hardware acceleration for the HD2600 became possible, so I don't think this is a regression.
Comment 10 Eugene 2014-10-22 22:37:17 UTC
I just checked Mesa 10.1.3 stable. Same result: the system hangs and the display goes dark, with no possibility to switch to any VT.
So, as I understand it, this is not a regression. This is just a new feature (for my graphics adapter) that comes with Linux 3.18, and it is very unstable.
Comment 11 Christian König 2014-10-23 08:24:12 UTC
You need the bleeding-edge Mesa code to get video acceleration working on the HD2600.

But even then the hardware on the HD2600 is so buggy that it is really tricky to get this working right.
Comment 12 Eugene 2014-10-23 18:43:52 UTC
Anyway, I'm ready to test anything I can.
Comment 13 Eugene 2014-11-01 21:40:16 UTC
I was able to save logs after playing a video in VLC with VDPAU and hardware decoding turned on. There is also a backtrace in the Xorg log file. Please see the attachments.
Comment 14 Eugene 2014-11-01 21:41:50 UTC
Created attachment 108772 [details]
Xorg.log with backtrace
Comment 15 Eugene 2014-11-01 21:43:37 UTC
Created attachment 108773 [details]
dmesg
Comment 16 Eugene 2014-11-01 21:44:10 UTC
Created attachment 108774 [details]
syslog
Comment 17 Daniele 2014-11-21 17:09:37 UTC
I see the same happening here with a HD4200 (RS880) GPU.

I tried with:

- kernel 3.18rc3
- latest radeon ucode firmware files from the 23rd of August
- updated mesa, xserver and libs from the oibaf ubuntu ppa

I observed the same behavior: the video starts well for a second, then the image freezes (but the mouse cursor is still alive) and I can't do anything, not even switch to another VT. After a while the screen becomes black.

In the meantime I can hear the sound of the video correctly, and indeed if I log in with ssh from another pc everything is working well; still xserver doesn't work until a reboot is performed.

I attach a piece of the dmesg log taken from the ssh session right after the video was played where the GPU hang is reported. Xorg doesn't log any error.
Comment 18 Daniele 2014-11-21 17:10:33 UTC
Created attachment 109807 [details]
dmesg log, gpu hangs
Comment 19 Eugene 2014-11-21 18:21:34 UTC
This is quite strange, because my Xorg log shows a lot of error messages.
Comment 20 D Rhodes 2014-11-22 14:15:59 UTC
There was a VDPAU fix in MPV 0.6.1 that fixed this kind of lockup on my hardware: RS880G.

My videos are all 1080i50 H.264 recordings from DVB-S2 broadcasts of various channels. Since they are just chunks of transport stream, I think that lockups when starting a video and when seeking would be similar.

Also I have ~oibaf MESA from 4th October, the first with the VAAPI state tracker, and I tested that too. Here is a summary of what I found, with MPV 0.6.1

--hwdec=vdpau --vo=vdpau does not lock up the GPU but...

--hwdec=vdpau --vo=vdpau --deinterlace locks the GPU immediately.

I assume deinterlace will not work until there is a proper workaround for the frame based output. UVD is not useful for me without deinterlace.

--hwdec=vaapi --vo=vaapi locks up the GPU quite often when starting to play a video, similar to vdpau on MPV 0.6.0

--hwdec=vaapi --vo=opengl does not lock up the GPU but there is quite a lot of coloured or speckled picture corruption on my video.
Comment 21 Eugene 2014-11-22 14:20:05 UTC
There is also VLC, and using it with HW acceleration turned on locks up the GPU as well.
Comment 22 Daniele 2014-11-23 15:11:27 UTC
My previous tests were done with VLC and Gstreamer.

I tried with MPV 0.6.2 as well but GPU still hangs for me. 

The video I use is 1080p, so no deinterlacing is needed; I never tried to use it.
Comment 23 Eugene 2014-11-23 15:24:02 UTC
Also tried mpv 0.6.2 and also GPU hangs.
Comment 24 Daniele 2014-11-24 12:00:36 UTC
OK, I've just learnt something interesting: suspending kwin desktop effects makes everything work flawlessly here.

I tried mpv, vlc and even the flash plugin and hw accel works well; I tried also seeking in the video without any problem.

Now we should find out whether the problem is only there with kwin or also with other desktop environments.
Comment 25 Daniele 2014-11-24 12:14:28 UTC
Update: there are still some videos (taken with my smartphone) causing the GPU to hang as it did before, both with VLC and mpv, but I confirm that I can now play youtube with hw acceleration.
Comment 26 Eugene 2014-12-21 20:45:04 UTC
Linux 3.19RC1. Nothing's changed.
Comment 27 Elvis Fox 2014-12-22 17:40:09 UTC
Created attachment 111184 [details]
UVD hang  on RV620

Can confirm the same thing on RV620.
dmesg log attached
Comment 28 Elvis Fox 2014-12-22 18:04:53 UTC
As for me, disabling kwin effects does not change anything.

I also looked briefly through the attachments; it seems that Daniele and I are reporting a different issue than the OP.
Comment 29 Christian König 2014-12-23 10:09:47 UTC
(In reply to Eugene from comment #26)
> Linux 3.19RC1. Nothing's changed.

You don't need to test every new kernel version. I'm going to leave a note here if I find time to work on this issue.

On the other hand, if you want to get your hands dirty and try a few things, then I can give you directions on what it could be (but you need to get into the code yourself).
Comment 30 Eugene 2014-12-23 12:17:11 UTC
(In reply to Christian König from comment #29)
> (In reply to Eugene from comment #26)
> > Linux 3.19RC1. Nothing's changed.
> 
> You don't need to test every new kernel version. I'm going to leave a note
> here if I find time to work on this issue.
> 
> On the other hand, if you want to get your hands dirty and try a few things,
> then I can give you directions on what it could be (but you need to get
> into the code yourself).

Thanks, it would be great if you decide to work on this issue. If you need any additional info or any tests, I'm ready to help. Unfortunately I'm not a programmer, so I can't write code, but I'll do everything else I can to help.
Comment 31 Maksim Kachur 2015-01-06 00:24:18 UTC
I think I have the same on RS880 here (HD4290).
Mesa 10.3.5, libdrm 2.4.58, kernel 3.18.1, R600_rlm.bin + RS780_uvd.bin firmware from August 2014.
Testing with mplayer + vdpau.
Comment 32 Maksim Kachur 2015-01-06 00:44:35 UTC
Created attachment 111799 [details]
journalctl dump for uvd/vdpau crash
Comment 33 russianneuromancer 2015-03-02 10:30:20 UTC
Same issue with RS880M. Sometimes it hangs at the very beginning (before the first frame appears on the screen) and sometimes after one, a few, or many attempts to seek.

Tested with Linux 3.19, latest Mesa snapshot from Oibaf PPA, and mpv 0.8.
Comment 34 Erik 2015-03-03 04:54:22 UTC
Same issue with an HD4290 (RS880). My motherboard has an AM3+ socket, so it is not obsolete hardware; it supports UVD2 and the latest AMD processors. It would be nice to fix it. If you need some help, I can get my hands dirty too :)
Comment 35 Taner 2015-03-25 20:46:31 UTC
Created attachment 114623 [details]
Failed UVD playback session with RS780 mainboard
Comment 36 Taner 2015-03-25 20:47:20 UTC
Comment on attachment 114623 [details]
Failed UVD playback session with RS780 mainboard

I can confirm the same happening on my bleeding-edge Arch Linux system with my RS780/HD3200 mainboard.

Kernel 3.19.2
Mesa 10.5.1
libdrm 2.4.60
xf86-video-ati 7.5.0

I tried UVD acceleration with mpv, vdr/softhddevice and flashplugin with vdpau output enabled. The screen freezes if I seek forward/back and if I adjust the playback window size (e.g. going to fullscreen).

I can switch VT consoles after the X session freezes. I can create a new working X session after killing the frozen one, but video playback using the GPU is not possible anymore. I have to reboot the system.

dmesg attached.
Comment 37 Christian König 2015-04-28 10:51:01 UTC
Created attachment 115398 [details] [review]
Something to test

Just an idea I had recently about what this issue could be. Please test the attached patch and see if it works or not.
Comment 38 Erik 2015-04-30 01:33:21 UTC
I patched a 4.0.1 kernel on Debian 8 for testing. I will test it with several movies :)
Comment 39 Nicola Mori 2015-04-30 12:34:39 UTC
(In reply to Christian König from comment #37)
> Created attachment 115398 [details] [review]
> Something to test
> 
> Just an idea I had recently about what this issue could be. Please test the
> attached patch and see if it works or not.

The patch made things on my system even worse. Before, playing a movie with VLC and VDPAU enabled resulted in random screen freezes after some time, while with the patched kernel the freeze happens immediately when I start playing, every time.

Tested with Mobility Radeon HD3400 (RV620) on ArchLinux 64 bit with linux-ck 4.0.1, mesa 10.5.4, mesa-vdpau 10.5.4 and libvdpau 1.1.
Comment 40 Erik 2015-04-30 12:44:38 UTC
I got a black screen at the beginning, and the console shows this:

Radeon 0000:01:05.0 ring 5 stalled for more than .....secs
Comment 41 Christian König 2015-04-30 12:56:48 UTC
Created attachment 115475 [details]
More to test.

Interesting, attached is another patch you could test.

It just disables using UVD semaphores for now.
Comment 42 Erik 2015-04-30 19:40:52 UTC
Works like a charm :)
Comment 43 Daniele 2015-04-30 20:33:07 UTC
I confirm that your last patch makes things work here too
Comment 44 Nicola Mori 2015-05-01 14:39:36 UTC
The new patch works also for me. A couple of questions, Christian: does the patch remove some features? Do you think to mainline it or rather implement a different fix now that the problem seems to be better defined? Thanks.
Comment 45 Alex Deucher 2015-05-01 14:46:29 UTC
(In reply to Nicola Mori from comment #44)
> The new patch works also for me. A couple of questions, Christian: does the
> patch remove some features? Do you think to mainline it or rather implement
> a different fix now that the problem seems to be better defined? Thanks.

It disables hw semaphores for UVD1, but it's likely they were buggy on that early hw anyway.  It shouldn't affect UVD functionality.  The driver just uses a different method for synchronizing between rings.
Comment 46 Christian König 2015-05-01 16:23:19 UTC
(In reply to Nicola Mori from comment #44)
> The new patch works also for me. A couple of questions, Christian: does the
> patch remove some features? Do you think to mainline it or rather implement
> a different fix now that the problem seems to be better defined? Thanks.

Instead of submitting the commands to the hardware directly, with semaphores to sync between the GFX and UVD engines, we block until the dependent task is completed.

That's rather bad in a couple of different cases, for example doing 3D gaming and video playback at the same time.

What essentially happens is that instead of keeping UVD and GFX busy at the same time (and only occasionally blocking one engine while it waits for the other), you do it more like this:

1. Run UVD job.
2. Wait for UVD to finish.
3. Run GFX.
4. Wait for GFX to finish.
5. Run UVD
6. Wait for UVD to finish.
....
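
The difference between the two schedules described above can be illustrated with a toy timing model. This is only a sketch under stated assumptions: the per-frame costs (4 ms for UVD, 6 ms for GFX), the function names, and the assumption that UVD may decode arbitrarily far ahead of GFX are all made up for illustration and are not taken from the radeon driver.

```python
def serialized_time(uvd_ms, gfx_ms, frames):
    """Total time when each engine blocks until the other has finished."""
    return frames * (uvd_ms + gfx_ms)


def overlapped_time(uvd_ms, gfx_ms, frames):
    """Total time when UVD decodes the next frame while GFX works.

    GFX can start on frame n only once UVD has finished frame n and
    GFX itself has finished frame n-1 (the job a semaphore would wait on).
    """
    uvd_done = 0
    gfx_done = 0
    for _ in range(frames):
        uvd_done += uvd_ms
        gfx_done = max(gfx_done, uvd_done) + gfx_ms
    return gfx_done


# Hypothetical costs: 4 ms of UVD work and 6 ms of GFX work per frame.
print(serialized_time(4, 6, 100))  # 1000
print(overlapped_time(4, 6, 100))  # 604
```

In this model the serialized schedule costs the sum of both engines per frame, while the overlapped one is bounded by the slower engine, which matches the explanation that the workaround mainly hurts when GFX is heavily loaded.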
Comment 47 Nicola Mori 2015-05-01 16:44:02 UTC
Thanks for the clarification, Christian. Could it also impact other cases, e.g. a desktop environment with hardware-accelerated visual effects? Playing a movie while resizing or dragging a window should result in a usage pattern of GFX and UVD that is similar to your example with a movie and a 3D game. I did some experiments with my KDE desktop with OpenGL 2.0 and 3.1 visual effects, but I didn't notice any lag (maybe the workload is too light to show any issue).

Given the comment by Alex about the possibly buggy UVD1 hardware semaphores, the overall satisfactory performance of the patch and the old hardware affected by the bug I would guess that this patch likely is the final fix for this bug. If so, when will it be mainlined (approx.)? Thanks.
Comment 48 Christian König 2015-05-01 17:34:03 UTC
(In reply to Nicola Mori from comment #47)
> Thanks for the clarification, Christian. Could it also impact other cases,
> e.g. a desktop environment with hardware accelerated visual effects?

Not really, that is way too little load to cause any real trouble. 3D games, on the other hand, are a different story.

> Given the comment by Alex about the possibly buggy UVD1 hardware semaphores,
> the overall satisfactory performance of the patch and the old hardware
> affected by the bug I would guess that this patch likely is the final fix
> for this bug. If so, when will it be mainlined (approx.)?

Alex merged it into his drm-fixes-4.1 branch and I put a CC stable on it. So if everything works well it will show up in 4.1 and will then be backported to the stable kernel versions used by distributions.
Comment 49 Petr Zima 2015-05-08 16:00:13 UTC
Hello, I have the same problem with RV730.  The "disable semaphores ..." patch makes it better, but not completely; UVD is still not usable.  Please follow me to bug #67994, which seems more appropriate for RV730.
Comment 50 Christian König 2015-05-11 07:46:57 UTC
*** Bug 88152 has been marked as a duplicate of this bug. ***
Comment 51 Christian König 2015-05-11 07:56:37 UTC
(In reply to zimous from comment #49)
> Hello, I have the same problem with RV730.  The "disable semaphores ..."
> patch makes it better, but not completely; UVD is still not usable.  Please
> follow me to bug #67994, which seems more appropriate for RV730.

Please open a new bug report for this, because as you already wrote on bug #67994, your bug has different symptoms than this one.

Closing this bug as the problem seems to be solved now.
