Created attachment 143827 [details]
GPU hang after transcoding with VAAPI

Hi,

I used the drm-tip kernel to reproduce a serious problem we see when transcoding video on an Intel Compute Stick (STK2MV64CC). Our product experiments transcode continuously, so a GPU hang or crash is exceptionally bad for us. Our testing group sporadically reports kernel crashes, so I set up a test rig to reproduce the issue.

I reproduced the problem after approximately 2000 transcodes of a 1920x1080 MP4 (Big Buck Bunny) from H.264 back to H.264 using GStreamer on Ubuntu 18.04.2. The kernel was drm-tip from kernel 5.1-rc6 (about 2 weeks ago).

I'm assuming the issue is reproducible and will keep trying to reproduce it. In the meantime, I'm filing the bug now because this is urgent for me.

[96339.653213] i915 0000:00:02.0: GPU HANG: ecode 9:0:0x00000000, hang on vcs0, vecs0
[96339.653215] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[96339.653216] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[96339.653217] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[96339.653218] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[96339.653220] [drm] GPU crash dump saved to /sys/class/drm/card0/error

The full dmesg output and the log from /sys/class/drm/card0 are attached. The script I used to reproduce the bug is also attached.
My DRM-TIP kernel is from:

commit 00cb3798a5d008c3f824fe7c89c663dba66155c3 (HEAD -> drm-tip, origin/drm-tip, origin/HEAD)
Author: Rodrigo Vivi <rodrigo.vivi@intel.com>
Date:   Fri Mar 22 12:52:43 2019 -0700

These config switches were ADDED to DRM-TIP so I could boot from eMMC and configure for lower kernel latency and see serial output when the GPU goes bonkers:

CONFIG_USB_SERIAL=y
CONFIG_USB_SERIAL_CONSOLE=y
CONFIG_USB_SERIAL_FTDI_SIO=y
CONFIG_USB_PL2303=y
CONFIG_FRAME_POINTER=y
CONFIG_LATENCYTOP=y
CONFIG_MMC=y
CONFIG_MMC_BLOCK=y
CONFIG_MMC_BLOCK_MINORS=8
CONFIG_MMC_SDHCI=y
CONFIG_MMC_SDHCI_PCI=y
CONFIG_MMC_RICOH_MMC=y
CONFIG_MMC_SDHCI_ACPI=y
CONFIG_DEBUG_INFO=y
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KEXEC_FILE=y
CONFIG_ARCH_HAS_KEXEC_PURGATORY=y
CONFIG_KEXEC_JUMP=y
CONFIG_CPU_FREQ_STAT=y
CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
CONFIG_DRM_I915_DEBUG=y
CONFIG_DRM_I915_DEBUG_RUNTIME_PM=y
CONFIG_USB_RTL8152=y
CONFIG_USB_NET_DRIVERS=y

Transcoding loop is just this below:

#!/usr/bin/env bash
set -ex

tcount=0
while true; do
    echo "Transcode: iteration $tcount"

    # remove old output
    rm -f /tmp/transcode-output.mp4

    # transcode big-buck-bunny.mp4 using gstreamer
    time gst-launch-1.0 filesrc location=big-buck-bunny.mp4 ! qtdemux ! queue ! vaapidecodebin ! vaapih264enc ! qtmux ! filesink location=/tmp/gst-output.mp4

    tcount=$((tcount+1))
done
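
The loop above runs until interrupted, so a hang at iteration 2000 can go unnoticed for a while. A minimal sketch of two helpers that could stop the loop at the first hang and keep a copy of the error state is below. It is not part of the script I actually ran: the function names `has_gpu_hang` and `save_error_state` and the destination directory are hypothetical, and the match string is taken from the dmesg line quoted in this report.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the destination directory are illustrative
# assumptions, not part of the original reproduction script.

# Return 0 if the kernel log text on stdin contains an i915 hang message
# of the form seen in this report ("GPU HANG: ecode ...").
has_gpu_hang() {
    grep -q 'GPU HANG: ecode'
}

# Copy the error state to a timestamped file and print the new file's path.
#   $1 = error state path (on this machine: /sys/class/drm/card0/error)
#   $2 = destination directory (created if missing)
save_error_state() {
    local src=$1 dest_dir=$2
    local out
    out="$dest_dir/gpu-error-$(date +%Y%m%d-%H%M%S).txt"
    mkdir -p "$dest_dir" && cat "$src" > "$out" && echo "$out"
}

# Possible use inside the transcoding loop, after each gst-launch-1.0 run
# (reading dmesg may need root, depending on kernel.dmesg_restrict):
#
#   if dmesg | has_gpu_hang; then
#       save_error_state /sys/class/drm/card0/error /var/tmp/gpu-hangs
#       break
#   fi
```

Checking after every iteration also records roughly how many transcodes it took to hit the hang, since `tcount` is still available when the loop breaks.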
I'm using Ubuntu Server with no Xorg desktop running, only a text console. If anyone has suggestions for gathering more data or for better settings, let me know.
*** Bug 102465 has been marked as a duplicate of this bug. ***
Created attachment 143853 [details] error file
Is this only related to Bug 110394 or is it the same bug? Unless there is a clear difference (I couldn't tell) I'd like to resolve it as duplicate.
Yes, this is the same issue as https://bugs.freedesktop.org/show_bug.cgi?id=110394. You can close this one as a duplicate. Thanks!
*** This bug has been marked as a duplicate of bug 110394 ***