Bug 76426 - GPU hang on IvyBridge when doing specific decoding->vpp operations
Summary: GPU hang on IvyBridge when doing specific decoding->vpp operations
Status: RESOLVED FIXED
Alias: None
Product: libva
Classification: Unclassified
Component: intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium critical
Assignee: haihao
QA Contact: Sean V Kelley
Reported: 2014-03-21 03:30 UTC by Long Bu
Modified: 2014-07-21 08:11 UTC

Description Long Bu 2014-03-21 03:30:29 UTC
The GPU hangs on IvyBridge when decoding multiple streams and putting the decoded frames into one surface. A test case is attached.
Just unzip, make, and run it. You will see the GPU hang without fail.
Comment 1 Long Bu 2014-03-21 03:40:19 UTC
The test case is too big to be uploaded to Bugzilla.
You can download it at this url:

http://pan.baidu.com/s/1kTsjBoF
Comment 2 ykzhao 2014-03-24 05:41:54 UTC
Which version of libva/libva-intel-driver is used in the testing?

Thanks.
   Yakui
Comment 3 Clear 2014-03-31 01:27:45 UTC
The version is the 2013 Q4 release on 01.org. Thanks. --Clear
Comment 4 ykzhao 2014-03-31 02:12:20 UTC
Hi, Long/Clear

    Thanks for the version info.
    I downloaded the test case, but it fails to build.
    Could you please double-check the test case so that we can run it?

Thanks.
   Yakui
Comment 5 Clear 2014-03-31 14:05:07 UTC
Just an update: ykzhao/Yakui can now build the test program successfully. Thanks. --Clear
Comment 6 ykzhao 2014-04-01 08:17:56 UTC
I downloaded the test case and can build it.
Yes, the GPU hang can be reproduced by running the test case.
But after some analysis I don't think that this is a driver issue. Instead, it is caused by an incorrect usage scenario.

From the test case I gathered the following:
   1. Three videos are decoded.
   2. But only one decode context is created to decode the three video streams.

This is not a correct usage scenario, because the driver needs to keep some state required by the hardware during decoding. When one context switches between different video streams, that hardware state is lost or mixed up, which triggers unexpected behaviour.

A better solution is to create one decode context for each video stream.
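
To illustrate the point about per-stream state, here is a minimal toy model in C. It does not call libva: `ToyContext`, `toy_decode_frame`, `decode_with_shared_context`, and `decode_with_context_per_stream` are hypothetical names, and the single `stream_id` field merely stands in for whatever state the real driver keeps for the hardware.

```c
#include <assert.h>

/* Toy stand-in for the state a driver keeps per decode context. Not libva. */
typedef struct {
    int stream_id; /* stream this context's state belongs to, -1 if unused */
    int mixed;     /* set to 1 once a frame from another stream arrives */
} ToyContext;

static void toy_context_init(ToyContext *ctx)
{
    ctx->stream_id = -1;
    ctx->mixed = 0;
}

/* "Decode" one frame: the context state is only valid for a single stream. */
static void toy_decode_frame(ToyContext *ctx, int stream_id)
{
    if (ctx->stream_id == -1)
        ctx->stream_id = stream_id;
    else if (ctx->stream_id != stream_id)
        ctx->mixed = 1;
}

/* All streams share one context. Returns 1 if state got mixed. */
static int decode_with_shared_context(int num_streams, int frames_each)
{
    ToyContext shared;
    toy_context_init(&shared);
    for (int s = 0; s < num_streams; s++)
        for (int f = 0; f < frames_each; f++)
            toy_decode_frame(&shared, s);
    return shared.mixed;
}

/* Each stream gets its own context. Returns 1 if any state got mixed. */
static int decode_with_context_per_stream(int num_streams, int frames_each)
{
    int any_mixed = 0;
    for (int s = 0; s < num_streams; s++) {
        ToyContext own;
        toy_context_init(&own);
        for (int f = 0; f < frames_each; f++)
            toy_decode_frame(&own, s);
        any_mixed |= own.mixed;
    }
    return any_mixed;
}
```

In this model, three streams pushed through one context end up with mixed state, while three streams with one context each do not.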

Could you please ask the customer to create one decode context for every decoded video stream and check whether the issue can still be reproduced?

Thanks.
     Yakui
Comment 7 Clear 2014-04-15 01:33:53 UTC
Put information in mail here.

Hi Yakui,

There is stream switching in the demo application, and only one stream is decoded at a time. The stream is switched every 50 frames. When switching, the va-context is destroyed and created again; according to the log information, the va-context value just happens to be the same each time. So every stream has its own context.

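Thanks,
Clear

"GPU hang issue: the demo switches between streams. Only one stream is decoded at a time, and the stream is switched every 50 frames. When the stream is switched, the va-context is switched as well.
After each destroy and re-create, the va-context that is printed has the same value.

For the three different streams, the context is destroyed and re-created at every switch, so nothing looks wrong at the usage level. However, according to the information Mai Wenhao (买文豪) found with gdb, the context id after each re-creation is the same as the previous one, which may be why yakui.zhao thought that all three videos use the same id. In summary, this bug looks more like a problem in the implementation of the libva library itself."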
Comment 8 Sean V Kelley 2014-04-15 01:34:10 UTC
Away for two days at conference.  Expect delays in response.
Comment 9 haihao 2014-05-04 05:37:37 UTC
(In reply to comment #7)
> Put information in mail here.
> 
> Hi Yakui,
> 
> There is stream switch in demo application and only one stream is decoding
> at the same time. Stream will be switched per 50 frames. When switching,
> va-context will be destroyed and created again,

This is incorrect usage of libva. The application should create only one VA context for each stream, and the lifetime of that VA context should span from the first frame to the last frame.
Comment 10 Long Bu 2014-05-05 02:19:01 UTC
(In reply to comment #9)
> (In reply to comment #7)
> > Put information in mail here.
> > 
> > Hi Yakui,
> > 
> > There is stream switch in demo application and only one stream is decoding
> > at the same time. Stream will be switched per 50 frames. When switching,
> > va-context will be destroyed and created again,
> 
> This is the wrong usage of libva. The APP should only create a VA context
> for each stream,  and the life cycle of the VA context should be from the
> first frame to the last frame.

By "stream", the customer means a network stream that contains several video clips.
The usage looks good to me: they create one VA context for one clip, then destroy this context and create another VA context for the next video clip in the same network stream.
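
A rough sketch of that per-clip lifecycle in C is shown below. It is only an illustration, not the customer's code: `ClipPlayer`, `stub_create_context`, `stub_destroy_context`, and `play_clips` are hypothetical names, and the two stubs only count calls where a real application would call vaCreateContext and vaDestroyContext.

```c
#include <assert.h>

/* Hypothetical bookkeeping for one network stream made of several clips. */
typedef struct {
    int context_open; /* 1 while a context exists */
    int creates;      /* number of stubbed "create context" calls */
    int destroys;     /* number of stubbed "destroy context" calls */
    int frames;       /* total frames "decoded" */
} ClipPlayer;

static void player_init(ClipPlayer *p)
{
    p->context_open = 0;
    p->creates = 0;
    p->destroys = 0;
    p->frames = 0;
}

/* Stub standing in for context creation; never creates a second context. */
static void stub_create_context(ClipPlayer *p)
{
    assert(!p->context_open);
    p->context_open = 1;
    p->creates++;
}

/* Stub standing in for context destruction; requires an open context. */
static void stub_destroy_context(ClipPlayer *p)
{
    assert(p->context_open);
    p->context_open = 0;
    p->destroys++;
}

/* One context per clip, alive from the clip's first frame to its last. */
static void play_clips(ClipPlayer *p, int num_clips, int frames_per_clip)
{
    for (int clip = 0; clip < num_clips; clip++) {
        stub_create_context(p);          /* before the clip's first frame */
        for (int f = 0; f < frames_per_clip; f++) {
            assert(p->context_open);     /* every frame uses this clip's context */
            p->frames++;
        }
        stub_destroy_context(p);         /* after the clip's last frame */
    }
}
```

With three clips of 50 frames each, this pattern gives three create calls and three destroy calls, which is what a call trace of such an application would be expected to show.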
Comment 11 Clear 2014-05-05 02:26:10 UTC
The first frame will always be an IDR frame. Haihao thinks this is reasonable usage so far. Haihao/Yakui will look into it further and respond later. Thanks. --Clear
Comment 12 ykzhao 2014-05-05 02:40:23 UTC
Hi, Clear/Long

    Could you please double-check whether the usage scenario in the test case is the same as what the customer is debugging? Or is it perhaps not the latest test case?
    
    I captured a trace log in my test and found that vaCreateContext is called only once. This means that the three video streams are still sharing the same context, which differs from the usage described in comments #10 and #11.
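
For reference, this kind of check can be sketched as a small C helper that counts how often a call name appears in the trace text. It assumes a plain-text trace in which each call is logged by name; `count_occurrences` is a hypothetical helper, not part of libva, and the sample log in the usage note is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of `name` in the text `log`. */
static int count_occurrences(const char *log, const char *name)
{
    int count = 0;
    size_t len = strlen(name);

    if (len == 0)
        return 0; /* an empty name would match everywhere; treat as none */

    for (const char *p = strstr(log, name); p != NULL; p = strstr(p + len, name))
        count++;

    return count;
}
```

For example, `count_occurrences(trace_text, "vaCreateContext")` on a trace of three clips decoded with one context per clip should give 3, whereas a result of 1 would mean a single context was shared.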

Thanks.
Comment 13 ykzhao 2014-07-21 07:30:24 UTC
Hi, Long/Clear
    Does the issue still exist after the customer updates the usage model?

Thanks.
    Yakui
Comment 14 Clear 2014-07-21 08:11:36 UTC
Hi Yakui,

Yes, it's solved. Thanks!

Thanks,
Clear

