Bug 88152

Summary: 720p and 1080p H.264 videos lock-up on playback with vlc / vdpau on Radeon 3850HD
Product: Mesa Reporter: Arthur Marsh <arthur.marsh>
Component: Drivers/Gallium/r600 Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED DUPLICATE QA Contact:
Severity: major    
Priority: medium CC: freedesktop.jim-j
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments: dmesg output
dmesg output
2015010719dmesg.txt
20150107dmesg.txt
20150112dmesg.txt
2015011216dmesg.txt - dmesg output with 3.19.0-rc4+
20150113dmesg.txt - video run under kernel 3.19.0-rc4+
vlcdebug2.log, output from running vlc with VLC_VERBOSE=2
2015011322dmesg.txt dmesg output with GPU lockup when VLC_VERBOSE=2
2015011404dmesg.txt dmesg output with mesa 10.4.2
2015011421dmesg.txt - GPU lock-up less than 2 minutes into video play-back.
20150119dmesg.txt 3.19.0-rc5 dmesg
2015011907dmesg.txt - lockup with vlc and 3.19.0-rc5
2015011922dmesg.txt - lockup with 3.19.0-rc5 and 848x480 resolution video
20150122dmesg.txt lock-up with same video after radeon updates to 3.19.0-rc5
2015012222dmesg.txt test after upgrading vlc.
20150127dmesg.txt with 3.19.0-rc6 - lock-up after a few seconds of video play
2015012718dmesg.txt lock-up with first post 3.19.0-rc6 patches applied
2015012814dmesg.txt - lockup after updating kernel to latest radeon patches
20150130dmesg.txt - lock-up with the latest git head patches
20150207dmesg.txt - lock-up about 5 and a half minutes into video playback
20150209dmesg.txt didn't lock up on usual video but did on another with 3.19 kernel
2015021218dmesg.txt - lockup of screen except for mouse, was able to restart kdm
multiple lockup errors immediately after starting video 2015031613dmesg.txt

Description Arthur Marsh 2015-01-07 11:35:08 UTC
Created attachment 111903 [details]
dmesg output

Running Debian unstable x86/64 on AMD64 (4 cores) using a Radeon 3850HD graphics card with Linus git head kernel.

H.264 videos at 720p or 1080p resolution cause a lock-up with VLC (but not with mpv --vo vdpau).

Using: mesa-vdpau-drivers:amd64               10.3.2-1
libdrm-radeon1:amd64                   2.4.58-2
xserver-xorg-video-radeon              1:7.5.0-1
vlc                                    2.2.0~rc2-1

I don't have any short non-commercial videos to demonstrate the problem, but it does appear more likely with encodes from broadcast sources than from Blu-rays. That suggests corrupt video gets fed through to a point where it causes a GPU lock-up, while audio playback and other processes continue.

$ lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] RS780 Host Bridge
00:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] RS780 PCI to PCI bridge (ext gfx port 0)
00:06.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] RS780 PCI to PCI bridge (PCIE port 2)
00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode]
00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:12.1 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0 USB OHCI1 Controller
00:12.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:13.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:13.1 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0 USB OHCI1 Controller
00:13.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 SMBus Controller (rev 3a)
00:14.1 IDE interface: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 IDE Controller
00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 Azalia (Intel HDA)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 LPC host controller
00:14.4 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 PCI to PCI Bridge
00:14.5 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Address Map
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Link Control
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV670 [Radeon HD 3690/3850]
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] RV670/680 HDMI Audio [Radeon HD 3690/3800 Series]
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)

# vdpauinfo
display: :0   screen: 0
API version: 1
Information string: G3DVL VDPAU Driver Shared Library version 1.0

Video surface:

name   width height types
-------------------------------------------
420     8192  8192  NV12 YV12 
422     8192  8192  UYVY YUYV 
444     8192  8192  Y8U8V8A8 V8U8Y8A8 

Decoder capabilities:

name               level macbs width height
-------------------------------------------
MPEG1                 0  9216  2048  1152
MPEG2_SIMPLE          3  9216  2048  1152
MPEG2_MAIN            3  9216  2048  1152
H264_BASELINE        41  9216  2048  1152
H264_MAIN            41  9216  2048  1152
H264_HIGH            41  9216  2048  1152
VC1_ADVANCED          4  9216  2048  1152

Output surface:

name              width height nat types
----------------------------------------------------
B8G8R8A8          8192  8192    y  NV12 YV12 UYVY YUYV Y8U8V8A8 V8U8Y8A8 
R8G8B8A8          8192  8192    y  NV12 YV12 UYVY YUYV Y8U8V8A8 V8U8Y8A8 
R10G10B10A2       8192  8192    y  NV12 YV12 UYVY YUYV Y8U8V8A8 V8U8Y8A8 
B10G10R10A2       8192  8192    y  NV12 YV12 UYVY YUYV Y8U8V8A8 V8U8Y8A8 

Bitmap surface:

name              width height
------------------------------
B8G8R8A8          8192  8192
R8G8B8A8          8192  8192
R10G10B10A2       8192  8192
B10G10R10A2       8192  8192
A8                8192  8192

Video mixer:

feature name                    sup
------------------------------------
DEINTERLACE_TEMPORAL             y
DEINTERLACE_TEMPORAL_SPATIAL     -
INVERSE_TELECINE                 -
NOISE_REDUCTION                  y
SHARPNESS                        y
LUMA_KEY                         -
HIGH QUALITY SCALING - L1        -
HIGH QUALITY SCALING - L2        -
HIGH QUALITY SCALING - L3        -
HIGH QUALITY SCALING - L4        -
HIGH QUALITY SCALING - L5        -
HIGH QUALITY SCALING - L6        -
HIGH QUALITY SCALING - L7        -
HIGH QUALITY SCALING - L8        -
HIGH QUALITY SCALING - L9        -

parameter name                  sup      min      max
-----------------------------------------------------
VIDEO_SURFACE_WIDTH              y        48     2048
VIDEO_SURFACE_HEIGHT             y        48     1152
CHROMA_TYPE                      y  
LAYERS                           y         0        4

attribute name                  sup      min      max
-----------------------------------------------------
BACKGROUND_COLOR                 y  
CSC_MATRIX                       y  
NOISE_REDUCTION_LEVEL            y      0.00     1.00
SHARPNESS_LEVEL                  y     -1.00     1.00
LUMA_KEY_MIN_LUMA                y  
LUMA_KEY_MAX_LUMA                y  

I am happy to supply additional information or try things to narrow down the source of the problem.
Comment 1 Arthur Marsh 2015-01-07 11:35:48 UTC
Created attachment 111904 [details]
dmesg output
Comment 2 Arthur Marsh 2015-01-07 11:36:39 UTC
Created attachment 111905 [details]
2015010719dmesg.txt
Comment 3 Arthur Marsh 2015-01-07 11:37:29 UTC
Created attachment 111906 [details]
20150107dmesg.txt
Comment 4 Arthur Marsh 2015-01-11 13:12:20 UTC
The commit:

https://github.com/torvalds/linux/commit/dd5a74f2f982193620cfa1ef609df1ee805781d4

appears to at least reduce the problem.

Is there any (semi-)automated way to check for any more occurrences of signed variables that should be unsigned?
Comment 5 Arthur Marsh 2015-01-11 14:47:55 UTC
Created attachment 112092 [details]
20150112dmesg.txt

After updating to the Linus git head kernel with 
https://github.com/torvalds/linux/commit/dd5a74f2f982193620cfa1ef609df1ee805781d4 applied, and applying the patch at http://article.gmane.org/gmane.linux.kernel.mm/127052 I still had vlc lock-up with one video with 20150112dmesg.txt dump.

mediainfo reports the video format as:

Format                                   : MPEG-4
Format profile                           : Base Media
Codec ID                                 : isom
File size                                : 2.54 GiB
Duration                                 : 48mn 0s
Overall bit rate                         : 7 579 Kbps
Encoded date                             : UTC 2010-01-10 03:49:24
Tagged date                              : UTC 2010-01-10 03:49:24

Video
ID                                       : 1
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : High@L4.0
Format settings, CABAC                   : Yes
Format settings, ReFrames                : 3 frames
Codec ID                                 : avc1
Codec ID/Info                            : Advanced Video Coding
Duration                                 : 48mn 0s
Source duration                          : 47mn 59s
Bit rate                                 : 7 110 Kbps
Width                                    : 1 440 pixels
Height                                   : 1 080 pixels
Display aspect ratio                     : 16:9
Frame rate mode                          : Constant
Frame rate                               : 29.970 fps
Standard                                 : NTSC
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive
Bits/(Pixel*Frame)                       : 0.153
Stream size                              : 2.38 GiB (94%)
Source stream size                       : 2.49 GiB (98%)
Language                                 : English
Encoded date                             : UTC 2010-01-10 03:49:24
Tagged date                              : UTC 2010-01-10 03:49:24
Color primaries                          : BT.709
Transfer characteristics                 : BT.709
Matrix coefficients                      : BT.709
mdhd_Duration                            : 2880936
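
For reference, the stream described above appears to fit within the H264_HIGH limits that vdpauinfo reports in the description (level 41, 9216 macroblocks, 2048x1152). That suggests, but does not prove, that the lock-up is not simply an out-of-range stream. A minimal sketch of the check in Python, assuming 16x16 macroblocks and a height rounded up to a whole macroblock row; the function name is only illustrative:

```python
import math

# Limits reported by vdpauinfo for H264_HIGH in the description of this bug.
MAX_LEVEL, MAX_MACROBLOCKS, MAX_WIDTH, MAX_HEIGHT = 41, 9216, 2048, 1152


def fits_decoder(width, height, level):
    """Return (fits, macroblocks) for a stream, assuming 16x16 macroblocks."""
    macroblocks = math.ceil(width / 16) * math.ceil(height / 16)
    fits = (
        width <= MAX_WIDTH
        and height <= MAX_HEIGHT
        and level <= MAX_LEVEL
        and macroblocks <= MAX_MACROBLOCKS
    )
    return fits, macroblocks


# The test video from mediainfo: 1440x1080, High@L4.0 (level 40).
print(fits_decoder(1440, 1080, 40))  # (True, 6120)
```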
Comment 6 Christian König 2015-01-11 15:09:32 UTC
(In reply to Arthur Marsh from comment #4)
> The commit:
> 
> https://github.com/torvalds/linux/commit/
> dd5a74f2f982193620cfa1ef609df1ee805781d4
> 
> appears to at least reduce the problem.

The patch can't affect the issue, because it only applies to UMS (non-KMS) mode, which is completely deprecated and doesn't support UVD at all.
Comment 7 Arthur Marsh 2015-01-12 06:48:42 UTC
Created attachment 112109 [details]
2015011216dmesg.txt - dmesg output with 3.19.0-rc4+

problem further reduced but not eliminated.
Comment 8 Arthur Marsh 2015-01-12 15:11:37 UTC
The previous 2015011216dmesg.txt was inadvertently taken with 3.19.0-rc3+.

When playing the same video under kernel 3.19.0-rc4+ it played for longer before locking up.
Comment 9 Arthur Marsh 2015-01-12 15:12:56 UTC
Created attachment 112129 [details]
20150113dmesg.txt - video run under kernel 3.19.0-rc4+
Comment 10 Arthur Marsh 2015-01-13 12:23:11 UTC
When I ran vlc on the file with VLC_VERBOSE=3 I had no GPU lockup. 

When I ran vlc on the file with VLC_VERBOSE=2 the GPU locked up again around the same time as the previous test with kernel 3.19.0-rc4+. Might that suggest a timing issue?
Comment 11 Arthur Marsh 2015-01-13 12:25:08 UTC
Created attachment 112166 [details]
vlcdebug2.log, output from running vlc with VLC_VERBOSE=2
Comment 12 Arthur Marsh 2015-01-13 12:26:30 UTC
Created attachment 112167 [details]
2015011322dmesg.txt dmesg output with GPU lockup when VLC_VERBOSE=2
Comment 13 Arthur Marsh 2015-01-13 18:30:26 UTC
Upgraded mesa-related packages to 10.4.2-1.

Still seeing a lockup a few minutes into video playback.
Comment 14 Arthur Marsh 2015-01-13 18:31:51 UTC
Created attachment 112175 [details]
2015011404dmesg.txt dmesg output with mesa 10.4.2
Comment 15 Arthur Marsh 2015-01-14 11:40:35 UTC
Upgraded to current Linus git head and tried again.

This time there was no GPU reset associated with starting kdm (which had happened over the last few days), but the lock-up when playing the same video came less than 2 minutes into the video, much sooner than before.
Comment 16 Arthur Marsh 2015-01-14 11:41:34 UTC
Created attachment 112211 [details]
2015011421dmesg.txt - GPU lock-up less than 2 minutes into video play-back.
Comment 17 Arthur Marsh 2015-01-18 16:42:30 UTC
With kernel 3.19.0-rc5 the same video played right through with vlc without locking up.
Comment 18 Arthur Marsh 2015-01-18 16:43:35 UTC
Created attachment 112426 [details]
20150119dmesg.txt 3.19.0-rc5 dmesg
Comment 19 Arthur Marsh 2015-01-19 07:54:43 UTC
A further run of the same video with kernel 3.19.0-rc5, doing some skipping through the video.
Comment 20 Arthur Marsh 2015-01-19 07:56:12 UTC
Created attachment 112447 [details]
2015011907dmesg.txt - lockup with vlc and 3.19.0-rc5
Comment 21 Arthur Marsh 2015-01-19 11:50:04 UTC
Created attachment 112458 [details]
2015011922dmesg.txt - lockup with 3.19.0-rc5 and 848x480 resolution video

First lock-up with lower than 720p resolution video playback
Comment 22 Arthur Marsh 2015-01-21 16:10:30 UTC
Created attachment 112606 [details]
20150122dmesg.txt lock-up with same video after radeon updates to 3.19.0-rc5

Rebuilt the kernel after the latest Radeon updates to Linus' 3.19.0-rc5, lock-up occurred sooner.
Comment 23 Arthur Marsh 2015-01-22 11:55:37 UTC
Created attachment 112660 [details]
2015012222dmesg.txt test after upgrading vlc.

I upgraded vlc to 2.2.0~rc2-2 and re-tested against the same video running under the current Linus' git head kernel.

There was a gpu lock-up again - each different test seems to have the lock-up happen at a different stage in play-back, as if there is a non-deterministic event leading to the lock-up.
Comment 24 Arthur Marsh 2015-01-27 02:20:58 UTC
Created attachment 112868 [details]
20150127dmesg.txt with 3.19.0-rc6 - lock-up after a few seconds of video play

with kernel 3.19.0-rc6, the gpu locked up after a few seconds of playing the same video.
Comment 25 Arthur Marsh 2015-01-27 20:28:20 UTC
Created attachment 112889 [details]
2015012718dmesg.txt lock-up with first post 3.19.0-rc6 patches applied

The lock-up occurred within the first 20 seconds of playing the video, but slightly later than with plain 3.19.0-rc6.
Comment 26 Arthur Marsh 2015-01-28 06:12:36 UTC
Created attachment 112901 [details]
2015012814dmesg.txt - lockup after updating kernel to latest radeon patches

With the latest radeon patches in the 3.19.0-rc6+ kernel, I still experienced a lockup.
Comment 27 Arthur Marsh 2015-01-30 02:26:38 UTC
Created attachment 112955 [details]
20150130dmesg.txt - lock-up with the latest git head patches

vlc behaved differently - going green and stalling before finally causing a gpu lock-up.
Comment 28 Arthur Marsh 2015-02-07 06:09:43 UTC
Created attachment 113239 [details]
20150207dmesg.txt - lock-up about 5 and a half minutes into video playback

After latest radeon and mm updates to Linus git head, the same video played back fine until about 5 and a half minutes into playback, then locked-up all video.
Comment 29 Arthur Marsh 2015-02-09 07:22:15 UTC
Created attachment 113270 [details]
20150209dmesg.txt didn't lock up on usual video but did on another with 3.19 kernel

With the 3.19 kernel, I didn't get a lock-up with the usual test video but did eventually with another video.
Comment 30 Arthur Marsh 2015-02-12 08:25:25 UTC
Created attachment 113394 [details]
2015021218dmesg.txt - lockup of screen except for mouse, was able to restart kdm

For the first time after a lock-up caused by running vlc with vdpau, I was able to switch to a console with control-alt-F1 and restart kdm successfully, even though the desktop was locked up apart from the mouse cursor.

Using current Linus' git head.
Comment 31 Arthur Marsh 2015-02-12 11:03:42 UTC
After the last lock-up, although I could restart kdm, vdpau didn't work until I'd powered off and restarted the machine (vdpau failed even after a kexec restart):

Failed to open VDPAU backend libvdpau_nvidia.so: cannot open shared object file: No such file or directory
[vo/vdpau] Error when calling vdp_device_create_x11: 1
Error opening/initializing the selected video_out (-vo) device.
Video: no video

Ironically I was getting that message, even though I've only had Radeon hardware in this machine.
Comment 32 Michel Dänzer 2015-02-13 01:45:41 UTC
(In reply to Arthur Marsh from comment #31)
> Failed to open VDPAU backend libvdpau_nvidia.so: cannot open shared object
> file: No such file or directory
> [vo/vdpau] Error when calling vdp_device_create_x11: 1
> Error opening/initializing the selected video_out (-vo) device.
> Video: no video
> 
> Ironically I was getting that message, even though I've only had Radeon
> hardware in this machine.

I guess the Xorg radeon driver couldn't initialize hardware acceleration, so it didn't advertise the VDPAU driver name, and libvdpau fell back to its hardcoded default 'nvidia'.
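
The fallback described above can be sketched roughly as follows. This is a simplified Python model, not the libvdpau source: the function name is hypothetical, `advertised_name` stands in for whatever driver name the X server reports, and "r600" is assumed to be the name advertised for this card. The `VDPAU_DRIVER` environment variable is, as far as I know, consulted first.

```python
import os


def pick_vdpau_backend(advertised_name=None, environ=os.environ):
    """Return the backend library name libvdpau would try to load.

    advertised_name models the driver name reported by the X server
    (None when hardware acceleration failed to initialise).
    Simplified model only; the real lookup has more steps.
    """
    name = environ.get("VDPAU_DRIVER") or advertised_name or "nvidia"
    return "libvdpau_%s.so" % name


# Acceleration working: the advertised name is used.
print(pick_vdpau_backend("r600", environ={}))  # libvdpau_r600.so
# Nothing advertised: falls back to the hardcoded default.
print(pick_vdpau_backend(None, environ={}))    # libvdpau_nvidia.so
```

Under this model, exporting `VDPAU_DRIVER` would override both the advertised name and the default, which may be a useful workaround while the advertised name is missing.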

As for the hangs, I suspect they just happen randomly regardless of kernel version. Attaching even more dmesg files just clutters up this report and makes it harder for anyone to make sense of it.
Comment 33 Arthur Marsh 2015-02-17 09:17:30 UTC
Where to from here? I'm happy to build kernels and other packages from source, test and bisect, but I'm not a C programmer.
Comment 34 Christian König 2015-02-17 13:20:55 UTC
(In reply to Arthur Marsh from comment #33)
> Where to from here? I'm happy to build kernels and other packages from
> source, test and bisect, but I'm not a C programmer.

Unfortunately we don't really have time to take care of the older hardware generations. Getting the new generations working usually has priority.

I have a couple of ideas about what could cause this, but you clearly need to hack into the code to figure out what it is.

Sorry that I can't help here much,
Christian.
Comment 35 Arthur Marsh 2015-03-16 03:40:12 UTC
Created attachment 114333 [details]
multiple lockup errors immediately after starting video 2015031613dmesg.txt

After upgrading the kernel to 4.0.0-rc4, and vdpau / libdrm:
libvdpau1:amd64                        0.9-1
libdrm-radeon1:amd64                   2.4.59-1

I saw 4 GPU lockup error messages in dmesg (attached) within about half a second. The video itself locked up a few seconds into playback. (I could supply the first minute of the video if anyone is willing to look at it.)
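
A count like the one above can be pulled out of a saved dmesg file with a few lines of Python. This is only a sketch: it assumes the usual `[seconds.micros]` timestamp prefix and matches on the substring "GPU lockup"; the sample lines are invented placeholders, not copied from the attachments.

```python
import re

# Matches the "[seconds.micros]" prefix that dmesg puts on each line.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def lockup_times(dmesg_text, needle="GPU lockup"):
    """Return the timestamps (in seconds) of lines containing `needle`."""
    times = []
    for line in dmesg_text.splitlines():
        match = TIMESTAMP.match(line)
        if match and needle in line:
            times.append(float(match.group(1)))
    return times


# Invented sample lines, NOT taken from the attached dmesg files.
sample = """\
[  100.000000] example: unrelated message
[  812.100000] example: GPU lockup (placeholder text)
[  812.350000] example: GPU lockup (placeholder text)
[  812.600000] example: GPU lockup (placeholder text)
"""
times = lockup_times(sample)
span = round(times[-1] - times[0], 3)
print(len(times), "lockup messages spanning", span, "s")
```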
Comment 36 Arthur Marsh 2015-04-09 10:41:17 UTC
With the first post-4.0.0-rc7 drm update, I am no longer seeing the error, but have been unable to git-bisect to find the commit that fixed the problem.
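
One likely reason the bisect is hard: the lock-up is intermittent (see comment 23), so a kernel that still has the bug can pass a test run by chance. If such a kernel locks up in a given playback with probability p, and runs are independent, then n clean playbacks still leave a (1 - p)^n chance of wrongly marking it good. A small sketch, where the 50% per-run lock-up rate is purely an assumed figure, not a measurement from this report:

```python
import math


def false_good_probability(p_lockup, runs):
    """Chance that a buggy kernel survives `runs` playbacks without a lock-up."""
    return (1.0 - p_lockup) ** runs


def runs_needed(p_lockup, max_false_good):
    """Smallest number of clean playbacks before trusting a 'good' verdict."""
    return math.ceil(math.log(max_false_good) / math.log(1.0 - p_lockup))


# Assumed figure: a buggy kernel locks up in half of all playbacks.
print(false_good_probability(0.5, 1))  # 0.5
print(runs_needed(0.5, 0.01))          # 7
```

With a lower per-run lock-up rate the number of runs per bisect step grows quickly, so a single clean playback says little about whether a given commit is good.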
Comment 37 Manuel Ullmann 2015-04-13 13:30:36 UTC
Are you sure this bug is not related to bug #85320? Also, are you certain it is fixed in 4.0.0-rc7+ (linux git)? In the mentioned report, users of RV620/630 and RS780/880 (3450/2600 and 3200/4200 respectively) report GPU resets and lockups when using vdpau hardware decoding.
Do you also use mesa git, and might the fix rather have been introduced by a mesa git pull? That would explain why you could not bisect it in linux git.
I for my part have a Radeon HD 3200 Mobility (RS780M) and could still reproduce it with linux git. Did you test the fix thoroughly? For example, I could start a video with hardware-accelerated video decoding in mpv 40 times without a GPU reset, but seeking in the video or disabling and re-enabling the video track could cause one, while normal playback usually did not trigger it.
Stable VLC, however, caused the GPU reset on the first try using vaapi decoding with the vdpau wrapper. It would at least be a good sign if VLC can't reproduce this anymore for you, and maybe for the others at bug #85320 as well.

So basically I'm asking whether the described methods still cause a GPU reset, and which libraries you use in their git versions.
Comment 38 Arthur Marsh 2015-04-16 13:34:13 UTC
Sorry for the delay in replying, I tried a few more tests first.

The bad news is that even a kernel build that played the video through once without locking up, including while skipping around in the video, would lock up on a subsequent reboot and run.

The good news is that with current git head, even though I can experience lock-ups, I can restart kdm successfully following a lock-up. Prior to the 4.0.0 kernel I needed to power down and restart to recover from a lock-up.

Besides Linus git head kernel built with current Debian experimental gcc-5, the other packages installed include:

libdrm - related packages at version 2.4.60-2
libvdpau1:amd64 0.9-1
mesa packages 10.4.2-2
xserver-xorg packages 1:7.5.0-1

At one stage I wanted to be sure that it wasn't a problem with the 3850HD video card so I removed it and used the onboard Radeon 3200HD and experienced the same problems.
Comment 39 Arthur Marsh 2015-05-09 16:14:20 UTC
Somehow, for the most part I'm no longer experiencing lock-ups.

Kernel is Linus' current git head:

Linux version 4.1.0-rc2+ (root@am64) (gcc version 5.1.1 (Debian 5.1.1-4) ) #1700 SMP PREEMPT Sat May 9 14:01:46 ACST 2015

DDX is 1:7.5.0-1+b1
libdrm is 2.4.60-3
mesa is 10.4.2-2
libc6 is 2.19-18

vlc is 2.2.1-1+b1
Comment 40 Arthur Marsh 2015-05-10 10:57:26 UTC
The "disable semaphores" patch:
http://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-fixes-4.1&id=013ead48a843442e63b9426e3bd5df18ca5d054a

appears to stop the lock-ups from happening.

It also appears to prevent multiple vdpau-enabled vlc sessions from working at the same time, and a video sometimes shows a black screen when its resolution differs from that of the previous video played. Playing a few different videos with vlc appears to reset things so that a video that previously showed a black screen plays fine again.
Comment 41 Christian König 2015-05-11 07:46:57 UTC

*** This bug has been marked as a duplicate of bug 85320 ***
