Bug 76363

Summary: [SNB] Stuck on bsd ring while playing video using vaapi
Product: libva
Reporter: ValdikSS <iam>
Component: intel
Assignee: ykzhao <yakui.zhao>
Status: CLOSED FIXED
QA Contact: Sean V Kelley <seanvk>
Severity: normal
Priority: medium
CC: intel-gfx-bugs, simon
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
  96051: /sys/class/drm/card0/error
  96052: /sys/class/drm/card0/error gstreamer
  96058: all disabled
  96059: With fastboot=1 and lvds_downclock=1
  96060: i915.fastboot=1 i915.lvds_downclock=1 i915.i915_enable_fbc=1 i915.i915_enable_rc6=1
  96061: semaphores=1
  103268: add the restrict check of H264 slice_param
  103313: add the restrict check of H264 slice_param
  103321: Artifacts

Comment 1 ValdikSS 2014-03-19 15:10:55 UTC
Created attachment 96051 [details]
/sys/class/drm/card0/error

Played with cmplayer.
i915.fastboot=1 i915.lvds_downclock=1 i915.i915_enable_fbc=1 i915.i915_enable_rc6=7 i915.semaphores=1

[drm] stuck on bsd ring
[drm] GPU crash dump saved to /sys/class/drm/card0/error
[drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[drm:i915_set_reset_status] *ERROR* bsd ring hung inside bo (0xac0f000 ctx 0) at 0xac0f450
Comment 2 ValdikSS 2014-03-19 15:12:47 UTC
Created attachment 96052 [details]
/sys/class/drm/card0/error gstreamer

Played with gstreamer with default i915 module parameters.

[drm] stuck on bsd ring
[drm] GPU crash dump saved to /sys/class/drm/card0/error
[drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[drm:i915_set_reset_status] *ERROR* bsd ring hung inside bo (0x64a2000 ctx 0) at 0x64a2450
[drm] stuck on render ring
[drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x6660000 ctx 1) at 0x6660220
[drm] stuck on render ring
[drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x6473000 ctx 1) at 0x6473220
[drm:i915_context_is_banned] *ERROR* context hanging too fast, declaring banned!
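
The hang lines in the two logs above share a fixed layout, so the ring name, context, and buffer object address can be pulled out mechanically. The following is a minimal Python sketch, assuming only the line format visible in this report; the `parse_hangs` helper and the interpretation of the address difference as an offset into the buffer object are assumptions for illustration, not part of the i915 tooling.

```python
import re

# Matches lines such as:
# [drm:i915_set_reset_status] *ERROR* bsd ring hung inside bo (0xac0f000 ctx 0) at 0xac0f450
HANG_RE = re.compile(
    r"\*ERROR\* (?P<ring>\w+) ring hung inside bo "
    r"\((?P<bo>0x[0-9a-fA-F]+) ctx (?P<ctx>\d+)\) at (?P<addr>0x[0-9a-fA-F]+)"
)


def parse_hangs(log_text):
    """Return one dict per 'ring hung inside bo' line found in log_text."""
    hangs = []
    for match in HANG_RE.finditer(log_text):
        bo = int(match.group("bo"), 16)
        addr = int(match.group("addr"), 16)
        hangs.append({
            "ring": match.group("ring"),
            "ctx": int(match.group("ctx")),
            "bo": bo,
            # Assumption: the difference is the byte offset inside the bo.
            "offset": addr - bo,
        })
    return hangs


SAMPLE = """\
[drm:i915_set_reset_status] *ERROR* bsd ring hung inside bo (0x64a2000 ctx 0) at 0x64a2450
[drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x6660000 ctx 1) at 0x6660220
[drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x6473000 ctx 1) at 0x6473220
"""

for hang in parse_hangs(SAMPLE):
    print(f"{hang['ring']:>6} ring, ctx {hang['ctx']}, "
          f"bo {hang['bo']:#x}, offset {hang['offset']:#x}")
```

Under that reading, both bsd ring hangs in this report stop at the same offset (0x450) and both render ring hangs at 0x220, which hints at a repeatable failure rather than a random one.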
Comment 3 ValdikSS 2014-03-19 15:14:06 UTC
ArchLinux x86_64
Kernel 3.13.6
xf86-video-intel 2.99.910
libva 1.2.1
libva-intel-driver 1.2.2
Comment 4 Rodrigo Vivi 2014-03-19 15:37:06 UTC
Hi ValdikSS,

Could you please try to disable all these flags and let us know if you still face the issue?
i915.fastboot=0 i915.lvds_downclock=0 i915.i915_enable_fbc=0 i915.i915_enable_rc6=0 i915.semaphores=0

Then, if the issue goes away, could you re-enable them one by one? My guess is that you will hit the issue again when enabling RC6.

Also, please boot with drm.debug=0xe, so that when you hit the issue again you can paste the dmesg output and attach /sys/kernel/debug/dri/<n>/i915_error_state.

Thanks,
Rodrigo.
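
The one-by-one procedure requested above can be written down as a small test matrix. This is a minimal Python sketch that only builds the kernel command-line strings; the parameter names are the ones quoted in this report, while the `one_at_a_time` helper and the choice of `1` as the "enabled" value are assumptions for illustration.

```python
# Parameter names are taken from the kernel command lines quoted in this report.
FLAGS = [
    "i915.fastboot",
    "i915.lvds_downclock",
    "i915.i915_enable_fbc",
    "i915.i915_enable_rc6",
    "i915.semaphores",
]


def one_at_a_time(flags, debug="drm.debug=0xe"):
    """Yield (label, cmdline): an all-off baseline, then each flag enabled alone."""
    yield "baseline", " ".join([f"{name}=0" for name in flags] + [debug])
    for enabled in flags:
        # Assumption: 1 means "enabled" for every flag in this sketch.
        params = [f"{name}={1 if name == enabled else 0}" for name in flags]
        yield enabled, " ".join(params + [debug])


for label, cmdline in one_at_a_time(FLAGS):
    print(f"{label}: {cmdline}")
```

Each printed line is meant to be appended to the kernel command line for one boot, so a hang can be attributed to a single flag.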
Comment 5 ValdikSS 2014-03-19 16:46:05 UTC
Created attachment 96058 [details]
all disabled

Test #1.
i915.fastboot=0 i915.lvds_downclock=0 i915.i915_enable_fbc=0 i915.i915_enable_rc6=0 i915.semaphores=0 drm.debug=0xe

It hung for 3 seconds at 02:15 and then continued normally.
Comment 6 ValdikSS 2014-03-19 16:54:43 UTC
Created attachment 96059 [details]
With fastboot=1 and lvds_downclock=1

Test #2.
i915.fastboot=1 i915.lvds_downclock=1 i915.i915_enable_fbc=0 i915.i915_enable_rc6=0 i915.semaphores=0 drm.debug=0xe

Same behavior as in test #1. I should add that in both test #1 and test #2 the sound stops together with the video.
Comment 7 ValdikSS 2014-03-19 16:59:39 UTC
Created attachment 96060 [details]
i915.fastboot=1 i915.lvds_downclock=1 i915.i915_enable_fbc=1 i915.i915_enable_rc6=1

Test #3.
i915.fastboot=1 i915.lvds_downclock=1 i915.i915_enable_fbc=1 i915.i915_enable_rc6=1 i915.semaphores=0

Nothing changed.
Comment 8 Chris Wilson 2014-03-19 17:05:22 UTC
I don't see any indication that this is anything but a libva-intel bug.
Comment 9 ValdikSS 2014-03-19 17:10:38 UTC
Created attachment 96061 [details]
semaphores=1

Test #4.
i915.fastboot=0 i915.lvds_downclock=0 i915.i915_enable_fbc=0 i915.i915_enable_rc6=0 i915.semaphores=1

The graphics hang for ~30 seconds while the audio keeps playing. After 30 seconds the graphics unfreeze and the video has jumped 30 seconds forward.
Comment 10 ValdikSS 2014-03-19 17:17:28 UTC
Video with similar issues:
http://www.youtube.com/watch?v=x6gPODb2TFc (01:42)
Comment 11 ykzhao 2014-07-21 08:20:51 UTC
Hi, ValdikSS
    Could you please describe the environment that can be used to reproduce the issue?
    >Kernel version
    >libva/libva-intel-driver version
    
It would also help if you could attach the output of "lspci -vxxx -s 0:02.0" on this machine.

Thanks.
Comment 12 ValdikSS 2014-07-21 08:25:00 UTC
Hello, ykzhao.

Sure.
I'm running ArchLinux x86_64 with:
* kernel 3.15.5
* libva 1.3.1
* libva-intel-driver 1.3.2

% lspci -vxxx -s 0:02.0
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
        Subsystem: Lenovo Device 21da
        Flags: bus master, fast devsel, latency 0, IRQ 40
        Memory at f0000000 (64-bit, non-prefetchable) [size=4M]
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        I/O ports at 4000 [size=64]
        Expansion ROM at <unassigned> [disabled]
        Capabilities: <access denied>
        Kernel driver in use: i915
        Kernel modules: i915
00: 86 80 16 01 07 04 90 00 09 00 00 03 00 00 00 00
10: 04 00 00 f0 00 00 00 00 0c 00 00 e0 00 00 00 00
20: 01 40 00 00 00 00 00 00 00 00 00 00 aa 17 da 21
30: 00 00 00 00 90 00 00 00 00 00 00 00 0b 01 00 00
Comment 13 ValdikSS 2014-07-21 08:26:29 UTC
% sudo lspci -vxxx -s 0:02.0
[sudo] password for valdikss: 
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
        Subsystem: Lenovo Device 21da
        Flags: bus master, fast devsel, latency 0, IRQ 40
        Memory at f0000000 (64-bit, non-prefetchable) [size=4M]
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        I/O ports at 4000 [size=64]
        Expansion ROM at <unassigned> [disabled]
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [d0] Power Management version 2
        Capabilities: [a4] PCI Advanced Features
        Kernel driver in use: i915
        Kernel modules: i915
00: 86 80 16 01 07 04 90 00 09 00 00 03 00 00 00 00
10: 04 00 00 f0 00 00 00 00 0c 00 00 e0 00 00 00 00
20: 01 40 00 00 00 00 00 00 00 00 00 00 aa 17 da 21
30: 00 00 00 00 90 00 00 00 00 00 00 00 0b 01 00 00
40: 09 00 0c 01 9e 61 80 e2 90 00 08 14 00 00 00 00
50: 11 02 00 00 11 00 00 00 00 00 00 00 01 00 a0 db
60: 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 05 d0 01 00 0c f0 e0 fe 71 41 00 00 00 00 00 00
a0: 00 00 00 00 13 00 06 03 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 01 a4 22 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 01 00 00 00 00 80 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 06 00 18 60 ef da
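
The hex rows in the dump above are the raw PCI configuration space, and the first 64 bytes follow the standard type 0 header layout. As a cross-check against the text lspci printed, here is a minimal Python sketch that decodes a few of those fields; the helper names `dump_to_bytes` and `decode_header` are made up for this example.

```python
import struct

# First 64 bytes of the configuration space, copied from the lspci dump above.
DUMP = """\
00: 86 80 16 01 07 04 90 00 09 00 00 03 00 00 00 00
10: 04 00 00 f0 00 00 00 00 0c 00 00 e0 00 00 00 00
20: 01 40 00 00 00 00 00 00 00 00 00 00 aa 17 da 21
30: 00 00 00 00 90 00 00 00 00 00 00 00 0b 01 00 00
"""


def dump_to_bytes(dump):
    """Convert 'offset: xx xx ...' lines from lspci -x output into raw bytes."""
    data = bytearray()
    for line in dump.splitlines():
        _, _, hex_part = line.partition(":")
        data.extend(bytes.fromhex(hex_part))
    return bytes(data)


def decode_header(cfg):
    """Decode a few fields of a type 0 PCI configuration header (little-endian)."""
    vendor, device = struct.unpack_from("<HH", cfg, 0x00)
    revision, prog_if, subclass, base_class = struct.unpack_from("<BBBB", cfg, 0x08)
    sub_vendor, sub_device = struct.unpack_from("<HH", cfg, 0x2C)
    return {
        "vendor": vendor,
        "device": device,
        "revision": revision,
        "prog_if": prog_if,
        "subclass": subclass,
        "base_class": base_class,
        "subsystem_vendor": sub_vendor,
        "subsystem_device": sub_device,
    }


for name, value in decode_header(dump_to_bytes(DUMP)).items():
    print(f"{name}: {value:#06x}")
```

The decoded values agree with the human-readable part of the output: vendor 0x8086 (Intel), revision 0x09, base class 0x03 with subclass 0x00 (VGA compatible controller), and subsystem device 0x21da.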
Comment 14 ykzhao 2014-07-21 08:31:47 UTC
Thank you for such a quick response.

I will check it.

BTW: which player is used during your test? mplayer or others?

Thanks.
Comment 15 ValdikSS 2014-07-21 08:34:40 UTC
(In reply to comment #14)
> BTW: which player is used during your test? mplayer or others?

I used mpv and cmplayer for the first link (chuunibyou), and Adobe Flash Player in Firefox with the libvdpau-va-gl VDPAU-VAAPI bridge (https://github.com/i-rinat/libvdpau-va-gl) for the YouTube link.
Comment 16 ykzhao 2014-07-22 08:58:47 UTC
Created attachment 103268 [details] [review]
add the restrict check of H264 slice_param

Will you please try the attached patch and see whether the GPU hang is gone?

Thanks.
Comment 17 ValdikSS 2014-07-22 10:17:09 UTC
ykzhao, tested this on chuunibyou, works as expected, thanks! Can't test on youtube video at the moment.
Comment 18 ykzhao 2014-07-23 00:49:57 UTC
Thanks for verifying.

I am glad that the GPU hang is gone after applying the workaround patch.
I will try to push the workaround patch.

Of course, the main issue is that the mentioned bit-stream contains errors and does not follow the H264 spec. As a result, the player (mplayer) does not parse the correct parameters, and incorrect parameters are programmed into the GPU. That is what triggers the GPU hang.

Anyway, we can push the patch to work around the GPU hang.

Thanks.
    Yakui
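
The attached patches are not quoted in this thread, so the following is not the actual driver change. It is only a minimal Python sketch of the general idea of sanity-checking slice parameters before they reach the hardware. `first_mb_in_slice` is a real H.264 slice header field; the `SliceParam` class, the `check_slices` helper, and the specific rule (each slice starts inside the picture and after the previous accepted slice) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SliceParam:
    # Hypothetical stand-in for a decoded slice header; only the H.264
    # syntax element first_mb_in_slice is modelled here.
    first_mb_in_slice: int


def check_slices(slices, width_in_mbs, height_in_mbs):
    """Split slices into (accepted, problems).

    Assumed rule for this sketch: each slice must start inside the picture
    and after the start of the previously accepted slice.
    """
    total_mbs = width_in_mbs * height_in_mbs
    accepted, problems = [], []
    last_start = -1
    for index, param in enumerate(slices):
        start = param.first_mb_in_slice
        if not 0 <= start < total_mbs:
            problems.append(
                f"slice {index}: first_mb_in_slice {start} "
                f"outside picture ({total_mbs} MBs)"
            )
        elif start <= last_start:
            problems.append(
                f"slice {index}: first_mb_in_slice {start} "
                f"does not advance past {last_start}"
            )
        else:
            accepted.append(param)
            last_start = start
    return accepted, problems


# A 1280x720 picture is 80x45 macroblocks, i.e. 3600 in total.
slices = [SliceParam(0), SliceParam(1200), SliceParam(1200), SliceParam(5000)]
accepted, problems = check_slices(slices, width_in_mbs=80, height_in_mbs=45)
for line in problems:
    print("warning:", line)
```

A check of this kind trades picture quality for robustness: rejected slices show up as artifacts, but inconsistent values never get programmed into the decoder.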
Comment 19 ykzhao 2014-07-23 04:59:57 UTC
Created attachment 103313 [details] [review]
add the restrict check of H264 slice_param

Could you please try the updated patch and see whether it still works for you?

Thanks.
Comment 20 ValdikSS 2014-07-23 06:57:35 UTC
The newer patch works, but gives more artifacts than the previous one.
Comment 21 ValdikSS 2014-07-23 07:17:45 UTC
Created attachment 103321 [details]
Artifacts

First run is with old patch, second with the newer one.
Comment 22 ykzhao 2014-07-23 07:24:40 UTC
Thanks for testing the patch and for the feedback.

In fact, the key problem is that the bit-stream contains errors and does not follow the H264 spec, so the upper middleware passes incorrect parameters to the driver.

The updated patch is mainly meant to work around the GPU hang. At the same time it prints a prompt indicating that the error should preferably be fixed in the upper middleware.

(Of course, the previous patch does a smarter fix-up, but it gives no prompt that the issue should be fixed in the upper middleware. After internal discussion, we think the updated patch is more reasonable.)

What do you think?
Comment 23 ValdikSS 2014-07-23 07:42:08 UTC
I think both of the solutions are fine. If you think the latest patch is better, let it be so.
In fact, this issue is extremely rare. I know of only these 2 videos with this issue, so I suppose that as long as the GPU doesn't hang, any solution is acceptable.

Thanks for your effort!
Comment 24 ykzhao 2014-07-23 07:59:39 UTC
OK. The second patch has now been pushed to libva-intel-driver, so this bug will be marked as resolved.
Comment 25 haihao 2014-07-30 08:41:05 UTC
*** Bug 80720 has been marked as a duplicate of this bug. ***