|Summary:||GPU hang on Haswell while encoding and decoding video|
|Product:||DRI||Reporter:||Michael Olbrich <m.olbrich>|
|Component:||DRM/Intel||Assignee:||Intel GFX Bugs mailing list <intel-gfx-bugs>|
|Status:||CLOSED NOTOURBUG||QA Contact:||Intel GFX Bugs mailing list <intel-gfx-bugs>|
|i915 platform:||HSW||i915 features:||GPU hang|
Description Michael Olbrich 2019-02-14 11:05:05 UTC
Created attachment 143378 [details]
/sys/class/drm/card0/error for drm-intel-fixes-2019-02-13

I'm getting GPU hangs while encoding and decoding H.264 video with vaapi. I'm not quite sure what triggers the issue. It might be just starting or stopping, resolution changes, or corruption in the H.264 stream while decoding.

I've reproduced this with the following kernel versions:
- 4.20.x
- 5.0-rc6
- drm-intel-next-2019-02-07
- drm-intel-fixes-2019-02-13

It does not occur with 4.19.x and older. Git bisect gives me 79556df293b2efbb3ccebb6db02120d62e348b44 as the first bad commit. On Haswell this changes the default for ppgtt from 1 (aliasing) to 2 (full). If I set enable_ppgtt=1 on 4.20.x then the problem is gone.

I've attached the content of /sys/class/drm/card0/error for drm-intel-fixes-2019-02-13.
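To confirm which ppgtt mode a given boot is actually using, the module parameter can be read back from sysfs. The following is only a minimal sketch under stated assumptions: the helper name `ppgtt_mode` is hypothetical, the path is the usual sysfs location for i915 module parameters, and the value meanings other than 1 (aliasing) and 2 (full) are taken from the parameter's description in kernels that still expose it. Kernels that no longer have `enable_ppgtt` will not have the file, and reading it may require root.

```python
from pathlib import Path

# Assumed meanings of i915.enable_ppgtt; values 1 and 2 are the ones
# discussed in this report, the others come from the parameter description.
PPGTT_MODES = {
    -1: "auto (driver default)",
    0: "disabled",
    1: "aliasing ppgtt",
    2: "full ppgtt",
    3: "full ppgtt with extended address space",
}


def ppgtt_mode(param_path="/sys/module/i915/parameters/enable_ppgtt"):
    """Return (value, description), or None if the parameter cannot be read.

    `ppgtt_mode` is a hypothetical helper name, not part of any library.
    """
    try:
        raw = Path(param_path).read_text().strip()
    except (FileNotFoundError, PermissionError):
        # Parameter missing on this kernel, or not readable by this user.
        return None
    value = int(raw)
    return value, PPGTT_MODES.get(value, "unknown")
```

Calling `ppgtt_mode()` on a 4.20.x boot with `i915.enable_ppgtt=1` on the kernel command line would be expected to report the aliasing mode. A result of `None` only means the file could not be read.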
Comment 1 Chris Wilson 2019-02-14 11:11:32 UTC
Looks like a userspace bug though; it doesn't match the typical error expected for an invalid TLB (due to a bad mm switch). The batch was submitted by vaapi, so double check you've pulled the latest libva, and raise a bug with them (they have historically forgotten to set up their SBA correctly, and had such use-after-free bugs in their cmdbuffers...). One other thing to double check kernel side is disabling the iommu. The error state doesn't indicate that to be a problem, but it is useful to rule it out as a source of memory latency / incoherency / missed flush or invalidate.
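When ruling out the iommu, it helps to record which `intel_iommu=` options the kernel was actually booted with. The sketch below assumes the iommu is controlled through the documented `intel_iommu=` kernel parameter (for example `off`, or `igfx_off` for the graphics device only). The function names are hypothetical, and it only inspects the command line, so it does not prove what state the hardware ended up in.

```python
def iommu_cmdline_options(cmdline):
    """Collect the comma-separated values of every intel_iommu= token.

    Hypothetical helper: it parses text only and does not query the kernel.
    """
    options = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "intel_iommu" and sep:
            options.extend(v for v in value.split(",") if v)
    return options


def read_cmdline(path="/proc/cmdline"):
    """Return the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()
```

On the affected machine, `iommu_cmdline_options(read_cmdline())` returns the list of options given at boot. An empty list means no `intel_iommu=` option was passed and the kernel's built-in default applies.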
Comment 2 Lakshmi 2019-02-22 11:27:47 UTC
Michael, have you tried disabling the iommu?
Comment 3 Lakshmi 2019-03-07 13:17:42 UTC
Michael, any updates here?
Comment 4 Michael Olbrich 2019-03-07 15:14:13 UTC
Sorry for the delay. Disabling the iommu was not that simple because it was needed elsewhere in the system, so it was a bit more work than just disabling it via the kernel command line. Anyway, I still get the GPU hangs with the iommu disabled.
Comment 5 Michael Olbrich 2019-03-07 15:15:14 UTC
Created attachment 143571 [details]
/sys/class/drm/card0/error with iommu disabled

This is the error dump for 4.20.x. Anything else I should test?
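Since several kernels and configurations are being compared, each error dump can be saved under a name that records the kernel it came from. This is a rough sketch: `save_error_state` and the file naming scheme are hypothetical, the source path is the one used in this report, and reading it typically requires root.

```python
import platform
from pathlib import Path


def save_error_state(dest_dir, src="/sys/class/drm/card0/error", label=None):
    """Copy the GPU error state to dest_dir and return the new file's path.

    Hypothetical helper. `label` defaults to the running kernel release so
    that dumps from different kernels do not overwrite each other.
    """
    label = label or platform.release()
    dest = Path(dest_dir) / f"i915-error-{label}.txt"
    # Read the whole dump first, then write it out in one go.
    dest.write_bytes(Path(src).read_bytes())
    return dest
```

For example, `save_error_state("/tmp", label="4.20-iommu-off")` would write `/tmp/i915-error-4.20-iommu-off.txt`, ready to attach to the bug.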