Created attachment 143378
/sys/class/drm/card0/error for drm-intel-fixes-2019-02-13
I'm getting GPU hangs while encoding and decoding H.264 video with vaapi. I'm not sure exactly what triggers the issue. Possible triggers are starting or stopping a stream, resolution changes, or corruption in the H.264 stream while decoding.
I've reproduced this with the following kernel versions:
It does not occur with 4.19.x and older.
Git bisect identifies 79556df293b2efbb3ccebb6db02120d62e348b44 as the first bad commit. On Haswell, this commit changes the default PPGTT mode from 1 (aliasing) to 2 (full). If I set enable_ppgtt=1 on 4.20.x, the problem goes away.
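To confirm which PPGTT mode a given boot actually requested, a small helper like the one below can read the i915 module parameter from sysfs. This is a minimal sketch, assuming a 4.x-era kernel where `enable_ppgtt` is still exposed at `/sys/module/i915/parameters/enable_ppgtt` (the parameter was removed in later kernels) and that the value follows the documented mapping -1=auto, 0=disabled, 1=aliasing, 2=full, 3=full with extended address space. The helper names are hypothetical, and depending on the kernel version the sysfs value may reflect the requested rather than the effective mode, so the kernel log remains the authoritative source. The workaround itself is usually applied as `i915.enable_ppgtt=1` on the kernel command line.

```python
from pathlib import Path

# Documented values of the i915 enable_ppgtt module parameter (4.x era).
PPGTT_MODES = {
    0: "disabled",
    1: "aliasing",
    2: "full",
    3: "full 48-bit",
}

# Assumed sysfs location; reading it usually requires root.
PARAM_PATH = Path("/sys/module/i915/parameters/enable_ppgtt")


def describe_ppgtt(raw: str) -> str:
    """Map the raw sysfs text of enable_ppgtt to a readable mode name."""
    value = int(raw.strip())
    if value < 0:
        # -1 means "let the driver pick the platform default".
        return "auto (driver default)"
    return PPGTT_MODES.get(value, f"unknown ({value})")


def read_ppgtt_mode(path: Path = PARAM_PATH) -> str:
    """Read enable_ppgtt from sysfs and describe it (hypothetical helper)."""
    return describe_ppgtt(path.read_text())


if __name__ == "__main__":
    try:
        print(read_ppgtt_mode())
    except (FileNotFoundError, PermissionError) as err:
        print(f"cannot read {PARAM_PATH}: {err}")
```

On an affected 4.20.x Haswell system booted without the override, this would be expected to report "full", and "aliasing" once the workaround is in place.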
I've attached the content of /sys/class/drm/card0/error for drm-intel-fixes-2019-02-13.
This looks like a userspace bug, though; it doesn't match the typical error expected for an invalid TLB (due to a bad mm switch). The batch was submitted by vaapi, so double-check that you've pulled the latest libva, and raise a bug with them (they have historically forgotten to set up their SBA (STATE_BASE_ADDRESS) correctly, and have had use-after-free bugs in their command buffers...).
One other thing to double-check on the kernel side is disabling the iommu. Although the error state doesn't indicate that it is a problem, it is useful to rule it out as a source of memory latency, incoherency, or missed flushes/invalidations.
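When testing with the iommu disabled, it helps to verify from the running system that the intended options were actually passed. The sketch below parses a kernel command line (for example the contents of `/proc/cmdline`) and reports whether the Intel IOMMU is bypassed for graphics. It assumes the standard `intel_iommu=` option with comma-separated values, where `off` disables the IOMMU entirely and `igfx_off` disables it only for the integrated GPU; the function names are hypothetical. It only checks what was requested on the command line: the IOMMU can also be off because of firmware settings or the kernel configuration, and quoted values containing spaces are not handled.

```python
from pathlib import Path
from typing import Dict


def parse_cmdline(cmdline: str) -> Dict[str, str]:
    """Split a kernel command line into key -> value; bare flags map to ""."""
    params: Dict[str, str] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # A later occurrence of the same key overrides an earlier one.
        params[key] = value if sep else ""
    return params


def gpu_iommu_bypassed(params: Dict[str, str]) -> bool:
    """True if intel_iommu= requests "off" or "igfx_off" (hypothetical helper)."""
    options = params.get("intel_iommu", "").split(",")
    return "off" in options or "igfx_off" in options


if __name__ == "__main__":
    current = parse_cmdline(Path("/proc/cmdline").read_text())
    print("iommu bypassed for GPU:", gpu_iommu_bypassed(current))
    print("i915.enable_ppgtt:", current.get("i915.enable_ppgtt", "<not set>"))
```

The `igfx_off` case may be relevant when the iommu is still needed by other devices, since it leaves DMA remapping active for everything except the graphics device.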
Michael, have you tried by disabling iommu?
Michael, any updates here?
Sorry for the delay. Disabling the iommu was not that simple because it was needed elsewhere in the system, so it took more work than just disabling it via the kernel command line.
Anyways, I still get the GPU hangs with the iommu disabled.
Created attachment 143571
/sys/class/drm/card0/error with iommu disabled
This is the error dump for 4.20.x. Anything else I should test?
Please create a libva issue here
Closing this issue as NOTOURBUG.