Bug 81447

Summary: GPU Hang on Haswell with VAAPI acceleration on XBMC (reproducible)
Product: libva
Component: intel
Status: RESOLVED FIXED
Severity: major
Priority: medium
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Reporter: MattDevo <matt.devillier>
Assignee: Lizhong <zhong.li>
QA Contact: Sean V Kelley <seanvk>
CC: fernetmenta, fritsch, gb.devel, zhixinx.liu
Whiteboard:
i915 platform:
i915 features:
Attachments: XBMC Log
kernel log (dmesg)
GPU dump log (zipped for size)
libva-intel-driver patch to fix bug 81447

Description MattDevo 2014-07-16 23:44:36 UTC
Created attachment 102949
XBMC Log

When playing back certain content on XBMC using VAAPI-accelerated video decoding, either the GPU locks up completely or video playback pauses for ~5 s (audio continues as normal).

Issue is reproducible using test file: https://dl.dropboxusercontent.com/u/55728161/Joe_sample.mkv
Comment 1 MattDevo 2014-07-16 23:45:06 UTC
Created attachment 102950
kernel log (dmesg)
Comment 2 MattDevo 2014-07-16 23:47:24 UTC
Created attachment 102951
GPU dump log (zipped for size)
Comment 3 haihao 2014-07-18 14:35:22 UTC
Is this a duplicate of https://bugs.freedesktop.org/show_bug.cgi?id=78960 ? Could you provide the kernel dmesg?
Comment 4 MattDevo 2014-07-18 14:49:02 UTC
(In reply to comment #3)
> Is this a duplicate of https://bugs.freedesktop.org/show_bug.cgi?id=78960 ?
> Could you provide the kernel dmesg?

The XBMC developers have told me they believe these are two separate issues.

I did attach the kernel dmesg; is it insufficient somehow?
Comment 5 Rainer Hochecker 2014-07-18 15:02:52 UTC
@haihao We observe this issue without the VAAPI renderer, meaning that we copy the video surface to system memory.
Bug 78960 happens only with VAAPI rendering.
Comment 6 Rainer Hochecker 2014-07-18 15:07:41 UTC
When running this sample on Windows with DXVA, the hardware decoder shows errors, which result in a barely noticeable glitch. No doubt this sample is corrupted somehow, but that must not make the GPU hang.
Comment 7 Rainer Hochecker 2014-07-18 15:25:50 UTC
I got the information that the sample is most likely NOT corrupted. So I tried again on Windows with DXVA, but now with NVidia graphics. All fine. This seems to be an Intel problem on all platforms.

Windows: small glitch and the hardware decoder shows errors
Linux: GPU hang

NVidia: all fine.
Comment 8 Peter Frühberger 2014-07-26 17:29:57 UTC
I invested some more time today to get proper backtraces. It seems that for this bug vaSyncSurface never returns. This could be a libdrm bug, as the i965 driver only calls:

    if(obj_surface->bo)
        drm_intel_bo_wait_rendering(obj_surface->bo);

That really looks like a threading issue in the driver in combination with GL Output.

What "real-life" use cases can make drm_intel_bo_wait_rendering wait forever?
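
One general way to keep a caller from blocking indefinitely in a situation like this is to poll for completion against a deadline rather than wait unconditionally. The following is only a minimal sketch of that pattern: it is not the i965 driver's code and makes no libdrm calls. The names `busy_fn`, `wait_idle_with_timeout`, `countdown_busy`, `always_busy` and the `WAIT_*` results are all hypothetical stand-ins, and the two callbacks merely simulate a buffer that finishes and one that never does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callback: returns true while the buffer is still busy. */
typedef bool (*busy_fn)(void *ctx);

enum wait_result { WAIT_IDLE = 0, WAIT_TIMED_OUT = 1 };

/* Monotonic clock reading in nanoseconds. */
static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Poll is_busy() about once per millisecond until it reports idle
 * or until timeout_ns has elapsed, whichever comes first. */
enum wait_result wait_idle_with_timeout(busy_fn is_busy, void *ctx,
                                        int64_t timeout_ns)
{
    assert(is_busy != NULL);

    const int64_t deadline = now_ns() + timeout_ns;
    const struct timespec pause = { 0, 1000000L }; /* 1 ms between polls */

    while (is_busy(ctx)) {
        if (now_ns() >= deadline)
            return WAIT_TIMED_OUT;
        nanosleep(&pause, NULL);
    }
    return WAIT_IDLE;
}

/* Simulated buffer that stays busy for *ctx more polls, then goes idle. */
bool countdown_busy(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return true;
    }
    return false;
}

/* Simulated hung buffer: never becomes idle. */
bool always_busy(void *ctx)
{
    (void)ctx;
    return true;
}
```

With these stand-ins, `wait_idle_with_timeout(always_busy, NULL, 5000000LL)` gives up after roughly 5 ms instead of blocking. A timeout like this only keeps the calling thread responsive; it does not address whatever keeps the buffer busy.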
Comment 9 Lizhong 2014-07-29 07:45:26 UTC
I used mplayer-vaapi to hardware-decode this video; the GPU also hangs when decoding frame 327.
Then I tried decoding the video in software, and mplayer also shows errors, as follows:
mplayer -vo x11 /root/Joe_sample.mkv -fps 30
error:[h264 @ 0xf64894c0]concealing 1285 DC, 1285 AC, 1285 MV errors
A:  11.9 V:  13.7 A-V: -1.742 ct: -1.064 327/327 63%  7%  1.1% 0 0 
[h264 @ 0xf64894c0]top block unavailable for requested intra4x4 mode -1 at 50 17
[h264 @ 0xf64894c0]error while decoding MB 50 17, bytestream (21448)
[h264 @ 0xf64894c0]top block unavailable for requested intra mode at 10 34
[h264 @ 0xf64894c0]error while decoding MB 10 34, bytestream (20429)
[h264 @ 0xf64894c0]top block unavailable for requested intra mode at 37 51
[h264 @ 0xf64894c0]error while decoding MB 37 51, bytestream (9038)
[h264 @ 0xf64894c0]concealing 8159 DC, 8159 AC, 8159 MV errors

This means there are some erroneous macroblocks (MBs) in frame 327, which cause the GPU hang during hardware decoding.
It seems it's hard to decode this frame correctly since it's an error frame.
But maybe we can avoid the GPU hang or drop this frame.
I'll check further.
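
To locate damaged frames in a software-decode log like the one above, the "concealing" lines can be picked out and their counts extracted. The sketch below assumes the line format is exactly the one shown in this comment (other FFmpeg or mplayer versions may word it differently); `struct concealment` and `parse_concealment` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Counts reported on one "concealing ..." log line. */
struct concealment {
    int dc;
    int ac;
    int mv;
};

/* Returns true and fills *out if the line contains a concealment report
 * of the form "concealing <n> DC, <n> AC, <n> MV errors". */
bool parse_concealment(const char *line, struct concealment *out)
{
    assert(line != NULL && out != NULL);

    /* The report may be preceded by a prefix such as "[h264 @ 0x...]". */
    const char *p = strstr(line, "concealing ");
    if (p == NULL)
        return false;

    struct concealment c;
    if (sscanf(p, "concealing %d DC, %d AC, %d MV errors",
               &c.dc, &c.ac, &c.mv) != 3)
        return false;

    *out = c;
    return true;
}
```

Feeding each log line to `parse_concealment` and printing the lines for which it returns true gives a quick list of the frames where the software decoder had to conceal errors.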
Comment 10 Rainer Hochecker 2014-07-29 17:58:34 UTC
Yes, you are right. Software decoding triggers this error as well.
Thanks very much for looking into this!
Comment 11 Lizhong 2014-08-05 08:10:45 UTC
Created attachment 104061
libva-intel-driver patch to fix bug 81447
Comment 12 Lizhong 2014-08-05 08:14:05 UTC
Hi Rainer Hochecker,
   Could you verify whether my attached patch fixes this bug?
   According to my analysis, frame 327 is missing a slice's data and some MB data.
Thanks
Comment 13 Peter Frühberger 2014-08-05 08:15:26 UTC
Rainer is currently on holiday. I will try it tonight on my Haswell (hsw) hardware and report back. It could very well be that someone tests it before me, via: http://forum.xbmc.org/showthread.php?tid=165707&page=54

Thanks for looking into this.
Comment 14 Peter Frühberger 2014-08-05 12:04:46 UTC
For the record: the second line of your patch comment has one "/" too many, which will break compilation.

Fixed version: http://paste.ubuntu.com/7960572/
Comment 15 Peter Frühberger 2014-08-05 17:28:04 UTC
The patch is working as expected. I see a short stutter at that scene, as if a frame is dropped, and afterwards playback continues.

Thanks very much. It would be nice if the patch were applied to master before 1.3.3 or 1.4.0 is released.
Comment 16 Lizhong 2014-08-06 01:25:25 UTC
Thanks for testing and for fixing the typo in the patch.
Yes, I dropped the error frame by checking slice parameters. As I said, "It seems it's hard to decode this frame correctly since it's an error frame. But maybe we can avoid the GPU hang or drop this frame." The software player also shows decoding errors.
We will apply this bug fix to the master branch.
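
The idea of dropping an undecodable frame by checking its slice parameters before submitting it to the hardware can be sketched roughly as follows. This is an illustration under stated assumptions, not the attached patch: `struct slice_info`, its fields, and `frame_slices_look_valid` are hypothetical, and the particular checks (non-empty slice data, first-macroblock index inside the frame and strictly increasing) are only plausible examples of such sanity checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one slice of a frame. */
struct slice_info {
    uint32_t first_mb;   /* index of the first macroblock in the slice */
    uint32_t data_size;  /* bytes of slice data supplied by the parser */
};

/* Returns true if the slices look consistent enough to submit for
 * hardware decoding; false means the caller should drop the frame.
 * Assumes slices are listed in raster order, which is an assumption
 * of this sketch and not something stated in the thread. */
bool frame_slices_look_valid(const struct slice_info *slices, size_t count,
                             uint32_t total_mbs)
{
    assert(slices != NULL || count == 0);

    if (count == 0)
        return false;                       /* no slice data at all */

    for (size_t i = 0; i < count; i++) {
        if (slices[i].data_size == 0)
            return false;                   /* slice header without data */
        if (slices[i].first_mb >= total_mbs)
            return false;                   /* starts outside the frame */
        if (i > 0 && slices[i].first_mb <= slices[i - 1].first_mb)
            return false;                   /* overlapping or out of order */
    }
    return true;
}
```

A caller would skip submission and report a decode error whenever the function returns false, trading one dropped frame for not handing inconsistent data to the GPU.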
Comment 17 Bernd Kuhls 2014-08-07 18:22:50 UTC
(In reply to comment #0)

> Issue is reproducible using test file:
> https://dl.dropboxusercontent.com/u/55728161/Joe_sample.mkv

Hi,

using libva-intel-driver 1.3.2 I could reproduce the bug.

Using latest git master
http://cgit.freedesktop.org/vaapi/intel-driver/commit/?id=82d2ed8d7da3619c0ea467c06604f5626fc0b901
and this patch
https://github.com/OpenELEC/OpenELEC.tv/blob/master/packages/multimedia/libva-intel-driver/patches/libva-intel-driver-FD81447.patch

the bug is fixed.
Comment 18 Peter Frühberger 2014-08-07 18:26:13 UTC
The patch you reference does not fix the bug we see with the above sample: it still hangs, although there is no longer a kernel hang.

The real fix was sent to the ML yesterday, see: http://lists.freedesktop.org/archives/libva/2014-August/002565.html
Comment 19 Peter Frühberger 2014-08-09 04:36:20 UTC
@bkuhls:
Sorry, I did not read your comment correctly. Yes, we picked this patch into OpenELEC just after it was released. It is also included in the OpenELEC 4.1.3 beta release. We will ship it until a new libva-intel-driver release that includes the fix is available. For Ubuntu, we provide a fixed driver that is easy to install via the wsnipex VAAPI PPA.
Comment 20 Lizhong 2014-08-11 07:29:49 UTC
Updated patches have been sent to the mailing list. This bug will be marked as fixed.
Comment 21 Gwenole Beauchesne 2014-08-26 17:56:29 UTC
(In reply to comment #20)
> Updated patches have been sent to the mailing list. This bug will be marked as
> fixed.

Bugs are marked as fixed only once proper fixes have reached the git repository and, I would say, the "master" branch. Otherwise we end up in a situation where the bug is marked as fixed but the actual fix got lost on the mailing list, which is precisely the situation here.

Thus reopening the bug.
Comment 22 haihao 2015-11-23 16:55:39 UTC
Patch was merged into master branch years ago.
