Bug 110297 - GPU HANG when transcoding to H.264 using VAAPI on drm-tip
Summary: GPU HANG when transcoding to H.264 using VAAPI on drm-tip
Status: RESOLVED DUPLICATE of bug 110394
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
Importance: high major
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged, ReadyForDev
Keywords:
Duplicates: 102465
Depends on:
Blocks:
 
Reported: 2019-04-01 07:17 UTC by Andy Nicholas
Modified: 2019-06-03 18:14 UTC

See Also:
i915 platform: SKL
i915 features: GPU hang


Attachments
GPU hang after transcoding with VAAPI (193.42 KB, text/plain)
2019-04-01 07:17 UTC, Andy Nicholas
error file (222.34 KB, text/plain)
2019-04-03 15:07 UTC, Lakshmi

Description Andy Nicholas 2019-04-01 07:17:43 UTC
Created attachment 143827
GPU hang after transcoding with VAAPI

Hi, I used the drm-tip kernel to reproduce a serious problem we have when transcoding video on an Intel Compute Stick (STK2MV64CC). Our product experiments transcode continuously, so a GPU hang or crash is exceptionally bad for us. We get sporadic reports of kernel crashes from our testing group, so I set up a test rig to reproduce the issue.

I reproduced the problem after running approximately 2000 transcodes of a 1920x1080 MP4 (Big Buck Bunny) from H.264 back to H.264 using GStreamer on Ubuntu 18.04.2, but the kernel was drm-tip from kernel 5.1-rc6 (about 2 weeks ago). I'm assuming the issue is reproducible and will continue to try to reproduce it. In the meantime, I'm filing the bug because this is urgent for me.


[96339.653213] i915 0000:00:02.0: GPU HANG: ecode 9:0:0x00000000, hang on vcs0, vecs0
[96339.653215] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[96339.653216] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[96339.653217] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[96339.653218] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[96339.653220] [drm] GPU crash dump saved to /sys/class/drm/card0/error

The full dmesg output and the error state from /sys/class/drm/card0/error are attached. The script I used to reproduce the bug is included below.



My DRM-TIP kernel is from:

commit 00cb3798a5d008c3f824fe7c89c663dba66155c3 (HEAD -> drm-tip, origin/drm-tip, origin/HEAD)
Author: Rodrigo Vivi <rodrigo.vivi@intel.com>
Date:   Fri Mar 22 12:52:43 2019 -0700


These config switches were ADDED to DRM-TIP so I could boot from eMMC and configure for lower kernel latency and see serial output when the GPU goes bonkers:

CONFIG_USB_SERIAL=y
CONFIG_USB_SERIAL_CONSOLE=y
CONFIG_USB_SERIAL_FTDI_SIO=y
CONFIG_USB_PL2303=y
CONFIG_FRAME_POINTER=y
CONFIG_LATENCYTOP=y
CONFIG_MMC=y
CONFIG_MMC_BLOCK=y
CONFIG_MMC_BLOCK_MINORS=8
CONFIG_MMC_SDHCI=y
CONFIG_MMC_SDHCI_PCI=y
CONFIG_MMC_RICOH_MMC=y
CONFIG_MMC_SDHCI_ACPI=y
CONFIG_DEBUG_INFO=y
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KEXEC_FILE=y
CONFIG_ARCH_HAS_KEXEC_PURGATORY=y
CONFIG_KEXEC_JUMP=y
CONFIG_CPU_FREQ_STAT=y
CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
CONFIG_DRM_I915_DEBUG=y
CONFIG_DRM_I915_DEBUG_RUNTIME_PM=y
CONFIG_USB_RTL8152=y
CONFIG_USB_NET_DRIVERS=y


Transcoding loop is just this below:

#!/usr/bin/env bash

set -ex

tcount=0
while true; do
    echo "Transcode: iteration $tcount"

    # remove old output (same path the filesink below writes to)
    rm -f /tmp/gst-output.mp4

    # transcode big-buck-bunny.mp4 using gstreamer
    time gst-launch-1.0 filesrc location=big-buck-bunny.mp4 ! qtdemux ! queue ! vaapidecodebin ! vaapih264enc ! qtmux ! filesink location=/tmp/gst-output.mp4

    tcount=$((tcount+1))
done
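
As written, the loop keeps running after a hang. One possible way to stop at the first hang and keep the crash dump is sketched below. This is only a sketch and not part of the original test rig: the helper names gpu_hang_seen and save_error_state, the /tmp destination, and the output file name are all hypothetical. The only things taken from the log above are the "GPU HANG" marker and the /sys/class/drm/card0/error path. Reading dmesg and the error file may require root on some systems.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (assumed names, not from the original report).

# Return 0 if the given kernel log text file contains an i915 "GPU HANG" line.
gpu_hang_seen() {
    local log_file="$1"
    grep -q 'GPU HANG' "$log_file"
}

# Copy the i915 error state to a timestamped file and print the new path.
# $1: source path (defaults to the path named in the dmesg output)
# $2: destination directory (defaults to /tmp, an arbitrary choice)
save_error_state() {
    local src="${1:-/sys/class/drm/card0/error}"
    local dest_dir="${2:-/tmp}"
    local dest
    dest="$dest_dir/i915-error-$(date +%Y%m%d-%H%M%S).txt"
    cat "$src" > "$dest" && echo "$dest"
}
```

Inside the loop, after the gst-launch-1.0 line, this could be used roughly as: dump the kernel log with "dmesg > /tmp/dmesg.txt", then "if gpu_hang_seen /tmp/dmesg.txt; then save_error_state; break; fi". Because the kernel log accumulates, a hang that predates the test run would also match, so the check assumes a clean log at start.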
Comment 1 Andy Nicholas 2019-04-01 07:43:42 UTC
This is on the Ubuntu Server version, without an Xorg desktop running. Text console only.

If anyone has any suggestions to gather more data or better settings, let me know.
Comment 2 Chris Wilson 2019-04-01 09:08:25 UTC
*** Bug 102465 has been marked as a duplicate of this bug. ***
Comment 3 Lakshmi 2019-04-03 15:07:54 UTC
Created attachment 143853
error file
Comment 4 Francesco Balestrieri 2019-06-03 05:03:29 UTC
Is this only related to Bug 110394 or is it the same bug? Unless there is a clear difference (I couldn't tell) I'd like to resolve it as duplicate.
Comment 5 Andy Nicholas 2019-06-03 16:37:04 UTC
Yes, this is the same issue as https://bugs.freedesktop.org/show_bug.cgi?id=110394.

You can close this one as a duplicate. Thanks!
Comment 6 Francesco Balestrieri 2019-06-03 18:14:48 UTC

*** This bug has been marked as a duplicate of bug 110394 ***

