84172 – [BYT] GPU Hang when using GStreamer for decoding with VAAPI

Summary: [BYT] GPU Hang when using GStreamer for decoding with VAAPI

Status:	CLOSED INVALID

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	unspecified
Hardware:	Other, All

Importance:	medium, normal
Assignee:	Intel GFX Bugs mailing list
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2014-09-22 08:50 UTC by Stuart
Modified:	2018-10-22 01:18 UTC
CC List:	3 users

See Also:
i915 platform:
i915 features:

Attachments
Trimmed gpu crash dump (216.08 KB, text/plain) 2014-09-22 09:50 UTC, Stuart	no flags
Compressed GPU crash log (608.58 KB, application/x-gzip) 2014-09-24 09:41 UTC, Stuart	no flags
lspci output (1.17 KB, text/plain) 2014-09-24 09:42 UTC, Stuart	no flags
Second GPU crash log (137 bytes, text/plain) 2014-09-24 11:33 UTC, Stuart	no flags

Description Stuart 2014-09-22 08:50:25 UTC

Hi All,

I'm using GStreamer and VA-API to decode 1080p30 streams. However, after leaving it running over the weekend, I found that the application had failed after ~130 minutes with the following:

DRM:
Reports that the GPU is stuck on the BSD ring and asks for a new bug report to be filed. The last line states: '*ERROR* bsd ring hung inside bo (0x7141c000 ctx0) at 0x7141c004'

GENERAL FAIL:
The error output is: libva-intel-driver-1.3.2/src/intel_batchbuffer.c:54: intel_batchbuffer_reset: Assertion `batch->buffer->virtual' failed.

I've attached the GPU crash dump, as suggested in the DRM error output.

Best Regards,
Stuart

Comment 1 Stuart 2014-09-22 09:50:39 UTC

Created attachment 106666
Trimmed gpu crash dump

The log didn't attach because it's bigger than the upload limit. I've attached a trimmed log file instead, with anything cut replaced by a line stating "--CUT OUT--". Most of what has been cut is memory addresses with the value 00000000.

Comment 2 Rodrigo Vivi 2014-09-23 00:44:00 UTC

What platform is that?

Can you also reproduce this behaviour with drm-intel-nightly?

Can you reproduce this issue with i915.enable_rc6=0?

Also, please try to compress the dump instead of trimming it.

Comment 3 Stuart 2014-09-23 13:01:26 UTC

Hi Rodrigo,

I'm using an Atom E38XX, with Yocto to create the image.

I'm a bit unsure how to test what you suggested: the only DRM-related package in my image is libdrm, and i915.enable_rc6=0 doesn't appear in any of its files.

I updated libdrm, libva and libva-intel-driver and ran the test a couple more times. The issue seems to be caused by a memory leak: tracking the running application with 'top' shows its memory usage growing very gradually over time.
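To turn the gradual growth seen in 'top' into numbers, the resident memory of the decoding process can be sampled from /proc/<pid>/status. The following is a minimal Python sketch, assuming a Linux /proc filesystem; all helper names are hypothetical. A steadily rising VmRSS is only a hint of a leak, not proof, since caches and allocators can also grow memory for a while.

```python
import time
from pathlib import Path
from typing import List


def parse_vmrss_kib(status_text: str) -> int:
    """Return VmRSS (resident set size, in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     123456 kB"
            return int(line.split()[1])
    raise ValueError("no VmRSS line (kernel thread, or the process has exited)")


def sample_rss(pid: int, interval_s: float = 60.0, samples: int = 10) -> List[int]:
    """Hypothetical helper: read VmRSS for `pid` every `interval_s` seconds, `samples` times."""
    status_path = Path(f"/proc/{pid}/status")
    readings: List[int] = []
    for i in range(samples):
        readings.append(parse_vmrss_kib(status_path.read_text()))
        if i < samples - 1:
            time.sleep(interval_s)
    return readings


def grows_monotonically(readings: List[int]) -> bool:
    """True if no sample is lower than the one before and the last exceeds the first."""
    return (
        len(readings) >= 2
        and all(b >= a for a, b in zip(readings, readings[1:]))
        and readings[-1] > readings[0]
    )
```

For example, sample_rss(pid, interval_s=300, samples=12) takes one reading every five minutes for an hour, where pid stands for the decoder's process ID.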

Would this suggest that the libva-intel-driver component is at fault, given that the DRM message says the problem could be anywhere in the stack?

Either way, let me know where and how to apply i915.enable_rc6=0 and I'll test it out!

Many Thanks,
Stuart

Comment 4 Rodrigo Vivi 2014-09-23 23:56:45 UTC

Ok, ignore drm-intel-nightly for now.

i915.enable_rc6=0 is a Linux kernel parameter. You need to add it to the kernel command line at boot time, on the "linux" line of whatever bootloader your image uses.
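After rebooting, one way to confirm that the parameter actually reached the kernel is to look for it in /proc/cmdline. A minimal Python sketch, assuming Linux; the helper names are hypothetical, and quoted values containing spaces are not handled:

```python
from pathlib import Path
from typing import Optional


def kernel_param(cmdline: str, name: str) -> Optional[str]:
    """Return the value of `name` on a kernel command line, or None if absent.

    A bare flag (no '=') yields "". If the name appears more than once,
    this sketch returns the last occurrence.
    """
    value = None
    for token in cmdline.split():  # simplified: ignores quoting
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


def rc6_setting_on_running_kernel() -> Optional[str]:
    """Hypothetical helper: look up i915.enable_rc6 on the current boot's command line."""
    return kernel_param(Path("/proc/cmdline").read_text(), "i915.enable_rc6")
```

On kernels of this era, the value the i915 driver is actually using may also be readable from /sys/module/i915/parameters/enable_rc6.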

What platform is that? What is the output of lspci -nn?

Anyway, please compress and paste your latest i915_error_state here.

Thanks,
Rodrigo
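The error state can be compressed before attaching it, rather than trimmed. A minimal Python sketch, assuming the sysfs path /sys/class/drm/card0/error (card0 may differ on other systems) and enough privileges to read it; the function name and output filename are hypothetical:

```python
import gzip
import shutil
from pathlib import Path


def compress_error_state(src: str = "/sys/class/drm/card0/error",
                         dst: str = "i915_error_state.gz") -> Path:
    """Hypothetical helper: gzip the i915 error state into a file for attaching to a bug."""
    # Copy in chunks until EOF; the error state can be large.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return Path(dst)
```

The result is a single gzip-compressed file, not a tarball, so it is unpacked with gunzip or zcat rather than tar xf.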

Comment 5 Stuart 2014-09-24 09:41:33 UTC

Created attachment 106775
Compressed GPU crash log

Comment 6 Stuart 2014-09-24 09:42:08 UTC

Created attachment 106776
lspci output

Comment 7 Stuart 2014-09-24 11:32:45 UTC

Attached are the original GPU crash log (compressed) and the output from lspci.

After updating the above packages and running the application overnight, it ran for over 400 minutes before failing.

When booting with GRUB, I pressed 'e' and changed the linux line to the following:

linux /vmlinuz LABEL=install-efi root=/dev/ram0 acpi_enforce_resources=lax video=efifb:off vga=0x318 i915.enable_rc6=0

before pressing F10. Could you confirm this is the correct way to do it, and that it is what you were looking for?

After booting this way, the application failed after 65 minutes. Interestingly, I could only see the `batch->buffer->virtual' assertion failure. However, /sys/class/drm/card0/error did contain output, which is in the next attachment (compressed).

Best Regards,
Stuart

Comment 8 Stuart 2014-09-24 11:33:33 UTC

Created attachment 106786
Second GPU crash log

Comment 9 Rodrigo Vivi 2014-10-15 19:37:26 UTC

$ tar xf ../gpu_hang.tar.gz 
tar: This does not look like a tar archive
The second one is empty.
Could you please attach valid error_states?
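The tar message is consistent with the attachment being a single gzip-compressed file rather than a gzipped tarball: tar can strip the gzip layer but then finds no tar header inside. A minimal Python sketch, with a hypothetical helper name, that tells these cases apart by their magic bytes (gzip starts with 0x1f 0x8b; a ustar/GNU tar header carries "ustar" at offset 257):

```python
import gzip

TAR_MAGIC_OFFSET = 257  # offset of the "ustar" magic in a tar header block


def _is_tar_header(block: bytes) -> bool:
    return block[TAR_MAGIC_OFFSET:TAR_MAGIC_OFFSET + 5] == b"ustar"


def archive_kind(path: str) -> str:
    """Hypothetical helper: classify a file as tar.gz, plain gzip, tar, or unknown."""
    with open(path, "rb") as f:
        head = f.read(2)
    if head == b"\x1f\x8b":
        # Gzip stream: peek at the first decompressed block for a tar header.
        with gzip.open(path, "rb") as g:
            inner = g.read(512)
        return "tar.gz" if _is_tar_header(inner) else "gzip (not a tar archive)"
    with open(path, "rb") as f:
        block = f.read(512)
    return "tar" if _is_tar_header(block) else "unknown"
```

A file reported as "gzip (not a tar archive)" can be read with zcat or gunzip instead of tar xf.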

Also, please try to get it with latest drm-intel-nightly branch from cgit.freedesktop.org/drm-intel.

Comment 10 Jesse Barnes 2015-03-30 20:56:27 UTC

Timing out. If it still happens, I'll pull in the media folks; maybe they've made changes in the libva driver that could help.
