Bug 110450 - Stability issue in i915 with transcode operation
Summary: Stability issue in i915 with transcode operation
Status: CLOSED NOTOURBUG
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged
Keywords:
Depends on:
Blocks:
 
Reported: 2019-04-16 13:47 UTC by Emelianova Svetlana
Modified: 2019-08-28 12:09 UTC
CC List: 4 users

See Also:
i915 platform: BDW
i915 features: GPU hang


Attachments
i915_error_state (14.23 KB, application/x-gzip-compressed)
2019-04-16 14:03 UTC, Emelianova Svetlana
binary for reproducing (6.45 MB, application/octet-stream)
2019-04-18 15:08 UTC, Emelianova Svetlana
script for reproducing (331 bytes, application/x-shellscript)
2019-04-18 15:10 UTC, Emelianova Svetlana
log, output stream and dmesg (419.41 KB, application/x-zip-compressed)
2019-04-23 09:45 UTC, Emelianova Svetlana

Description Emelianova Svetlana 2019-04-16 13:47:33 UTC

    
Comment 1 Emelianova Svetlana 2019-04-16 14:03:50 UTC
Created attachment 143991 [details]
i915_error_state

KERNEL RELEASE: 4.19.5
KERNEL VERSION: #1 SMP Thu Nov 29 10:58:58 UTC 2018
PLATFORM: Ubuntu 18.04.1LTS BDW
GPU GT  : GT3 (0x1622)
CPU model name : Intel(R) Core(TM) i7-5850HQ CPU @ 2.70GHz

cat /proc/cmdline:
\boot\vmlinuz-4.19.5 root=LABEL=TARGET_OS ro vconsole.font=latarcyrheb-sun16 crashkernel=128M vconsole.keymap=us biosdevname=0 LANG=en_US.UTF-8 systemd.debug modprobe.blacklist=ast,mgag200 intel_pstate=disable i915.enable_rc6=0 intel_idle.max_cstate=1 initrd=boot\initrd.img-4.19.5

GPU hang doesn't reproduce on Kernel 4.14.20.
Comment 2 Chris Wilson 2019-04-16 16:29:19 UTC
(In reply to Emelianova Svetlana from comment #1)
> Created attachment 143991 [details]
> i915_error_state

Looks like an ordinary userspace hang.
Comment 3 Emelianova Svetlana 2019-04-17 11:43:20 UTC
I added the drm.debug=0xe parameter:
>> cat /proc/cmdline
\boot\vmlinuz-4.19.5 root=LABEL=TARGET_OS ro vconsole.font=latarcyrheb-sun16 crashkernel=128M vconsole.keymap=us biosdevname=0 LANG=en_US.UTF-8 systemd.debug modprobe.blacklist=ast,mgag200 intel_pstate=disable i915.enable_rc6=0 intel_idle.max_cstate=1 drm.debug=0xe initrd=boot\initrd.img-4.19.5
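
One way to confirm that the parameter took effect is to read it back from the kernel command line. The sketch below is illustrative only: the helper name get_cmdline_param is hypothetical, and it reads the command line from stdin so that it works on a saved copy as well as on /proc/cmdline. For reference, 0xe enables the DRM debug categories for driver (0x2), KMS (0x4) and PRIME (0x8) messages.

```shell
# get_cmdline_param NAME: print the value of the first NAME=VALUE token
# found in a kernel command line read from stdin.
# (Hypothetical helper for illustration; not part of any existing tool.)
get_cmdline_param() {
    # Put one token per line, then match the text before the first '='.
    tr ' ' '\n' |
        awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2); exit }'
}

# Example on a shortened copy of the command line from this report:
printf '%s\n' 'ro i915.enable_rc6=0 intel_idle.max_cstate=1 drm.debug=0xe' |
    get_cmdline_param drm.debug
```

On a live system the same helper can be fed directly: get_cmdline_param drm.debug < /proc/cmdline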

dmesg after GPU hang
>> dmesg -e 
[Apr17 11:24] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[  +0.001032] [drm:intel_gpu_reset [i915]] rcs0: timed out on STOP_RING
[  +0.000034] [drm:i915_gem_reset_engine [i915]] client mfx_transcoder[6254]/2: gained 1 ban score, now 1
[  +4.031095] i915 0000:00:02.0: Resetting vcs0 for hang on vcs0
[  +0.001028] [drm:intel_gpu_reset [i915]] vcs0: timed out on STOP_RING
[  +0.000022] [drm:i915_gem_reset_engine [i915]] client mfx_transcoder[6254]/2: gained 1 ban score, now 2
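
When the reproducer runs for a long time, it can help to summarize how often each engine was reset. The following is a minimal sketch, assuming the reset message keeps the "Resetting <engine> for hang" wording seen in the log above (other kernel versions may word it differently); the helper name count_engine_resets is hypothetical.

```shell
# count_engine_resets: read dmesg text from stdin and print one
# "<engine> <count>" line per engine that was reset for a hang.
# (Hypothetical helper; the pattern is taken from the log in this comment.)
count_engine_resets() {
    sed -n 's/.*Resetting \([a-z0-9]*\) for hang.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage on a live system:
#   dmesg | count_engine_resets
```

For the excerpt above this reports one reset each for rcs0 and vcs0.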
Comment 4 Emelianova Svetlana 2019-04-18 15:08:28 UTC
Created attachment 144035 [details]
binary for reproducing
Comment 5 Emelianova Svetlana 2019-04-18 15:10:25 UTC
Created attachment 144036 [details]
script for reproducing
Comment 6 Emelianova Svetlana 2019-04-18 15:11:55 UTC
I built the latest drm kernel from https://anongit.freedesktop.org/git/drm/drm.git (commit f06ddb5). The GPU hang appears there too.
I attached sample_encode and a bash script that runs it. In the script, replace "(path/to/stream)" with the real stream path and set the correct resolution (-w -h). The stream should be in YUV format with a resolution of at least 4K. The Media SDK environment needs to be built from https://github.com/Intel-Media-SDK/MediaSDK.
Reproducing the hang requires several concurrent launches; I ran it with this command line: "./catch_GPU_HANG.sh & ./catch_GPU_HANG.sh & ./catch_GPU_HANG.sh & ./catch_GPU_HANG.sh & ./catch_GPU_HANG.sh & ./catch_GPU_HANG.sh & ./catch_GPU_HANG.sh & ./catch_GPU_HANG.sh & ./catch_GPU_HANG.sh & ./catch_GPU_HANG.sh & ./catch_GPU_HANG.sh & ./catch_GPU_HANG.sh & ./catch_GPU_HANG.sh"
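
The 13-way launch can also be written as a loop, which makes the instance count easy to vary. This is a minimal sketch under the assumption that every instance is started with the same command; the helper name run_parallel is hypothetical, and the script name is the one used in this comment.

```shell
# run_parallel N CMD [ARGS...]: start N copies of CMD in the background,
# wait for all of them, and return the number of copies that exited non-zero.
# (Hypothetical helper for illustration.)
run_parallel() {
    n=$1
    shift
    pids=""
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" &
        pids="$pids $!"
        i=$((i + 1))
    done
    failed=0
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    return "$failed"
}

# Equivalent of the command line in this comment:
#   run_parallel 13 ./catch_GPU_HANG.sh
```

Unlike the chained "&" form, the loop waits for every instance, so the return value shows how many of them failed.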
Comment 7 Lakshmi 2019-04-23 07:35:05 UTC
(In reply to Emelianova Svetlana from comment #6)
> I built the latest drm kernel from
> https://anongit.freedesktop.org/git/drm/drm.git f06ddb5 commit. GPU HANG
> appears too. 
> I attached sample_encode and bash script which runs it. Need to replace
> "(path/to/stream)" to real stream path and replace correct resolution (-w
> -h) in script. Stream should be YUV format and has not less 4k resolution.
> Need to build mediasdk environment from
> https://github.com/Intel-Media-SDK/MediaSDK. 
> For reproducing it is necessary a multiple launch, I ran with command line:
> "./catch_GPU_HANG.sh & ./catch_GPU_HANG.sh & ./catch_GPU_HANG.sh &
> ./catch_GPU_HANG.sh & ./catch_GPU_HANG.sh & ./catch_GPU_HANG.sh &
> ./catch_GPU_HANG.sh & ./catch_GPU_HANG.sh & ./catch_GPU_HANG.sh &
> ./catch_GPU_HANG.sh & ./catch_GPU_HANG.sh & ./catch_GPU_HANG.sh &
> ./catch_GPU_HANG.sh"

Can you please attach the error file and the dmesg from boot, both taken on the latest drm-tip?
Comment 8 Emelianova Svetlana 2019-04-23 09:45:57 UTC
Created attachment 144076 [details]
log, output stream and dmesg
Comment 9 Lakshmi 2019-05-02 13:43:10 UTC
(In reply to Emelianova Svetlana from comment #8)
> Created attachment 144076 [details]
> log, output stream and dmesg

The attached logs are from kernel 4.19. Can you please verify the issue with drm-tip (kernel 5.1) (https://cgit.freedesktop.org/drm-tip)?
Comment 10 Emelianova Svetlana 2019-05-13 08:47:03 UTC
(In reply to Lakshmi from comment #9)
> (In reply to Emelianova Svetlana from comment #8)
> > Created attachment 144076 [details]
> > log, output stream and dmesg
> 
> Attached logs are from kernel 4.19, Can you please verify the issue with
> drmtip (Kernel 5.1) (https://cgit.freedesktop.org/drm-tip) ?

The latest attachment 144076 [details] has the logs and dmesg from the latest drm-tip (the build is based on commit f06ddb5). See "Linux version 5.1.0-rc5+" in dmessg_27128.txt.
Comment 11 Lakshmi 2019-05-29 12:37:44 UTC
(In reply to Emelianova Svetlana from comment #10)
> (In reply to Lakshmi from comment #9)
> > (In reply to Emelianova Svetlana from comment #8)
> > > Created attachment 144076 [details]
> > > log, output stream and dmesg
> > 
> > Attached logs are from kernel 4.19, Can you please verify the issue with
> > drmtip (Kernel 5.1) (https://cgit.freedesktop.org/drm-tip) ?
> 
> The latest attachment 144076 [details] has logs and dmesg from the latest
> drmtip (build is based on f06ddb5 commit). "Linux version 5.1.0-rc5+" from
> dmessg_27128.txt

Can you please report this bug against the VA-API driver?
https://github.com/intel/intel-vaapi-driver/issues/new

Closing this as NOTOURBUG.
Comment 12 Dmitry Ermilov 2019-06-13 09:53:16 UTC
Hi Lakshmi,

Yes, we can submit a ticket against the media driver (but at https://github.com/intel/media-driver/, not https://github.com/intel/intel-vaapi-driver).
But can you please say what makes you think it's a media driver issue? I mean, the GPU hangs appeared once we moved to 4.19.5 (the user stack remained the same). So technically it now looks like a kernel regression (although of course it's possible that kernel changes revealed an issue on the media driver side).
Comment 13 Dmitry Ermilov 2019-06-17 08:02:53 UTC
Lakshmi,

Can you please reply?
Comment 14 Lakshmi 2019-06-25 08:15:03 UTC
Dmitry, Sorry for the late response.

There is no evidence indicating that this is a kernel issue. I would recommend debugging the userspace.
Comment 15 ashutosh.dixit 2019-07-02 22:01:22 UTC
I do agree with Dmitry that it is indeed strange that a kernel update (with the user space being the same) has resulted in these hangs. However, I do think that we should keep this ticket open and file another ticket against the media driver (with a link to this ticket) and have them suggest any ideas about what may have gone wrong.

These hangs don't appear to be driver related; rather, the GPU HW itself has hung. I don't know enough, but one thing that probably changed with the kernel update is the HuC firmware, so that definitely seems to me to be something that should be looked into. Perhaps the media team can help with that?
Comment 16 Dmitry Ermilov 2019-08-28 11:46:04 UTC
> There is no clue that indicates this is a kernel issue. I would recommend to debug the userspace.
Okay. Then let's close this one.
Svetlana,
please file a bug against https://github.com/intel/media-driver and add cross-links here and in the future GitHub ticket.

