Created attachment 145878
Dump of /sys/class/drm/card0/error

Hi,

I've got a desktop and a laptop (both Haswell) that use VA-API to decode JPEGs as part of a larger video pipeline. This used to work fine around March-April or so, but recently (November), I've started seeing GPU hangs on both. The kernel says:

[515715.657023] DMAR: DRHD: handling fault status reg 3
[515715.657030] DMAR: [DMA Write] Request device [00:02.0] fault addr f72e5000 [fault reason 05] PTE Write access is not set
[515789.233234] DMAR: DRHD: handling fault status reg 3
[515789.233240] DMAR: [DMA Write] Request device [00:02.0] fault addr ed5df000 [fault reason 05] PTE Write access is not set
[515809.358568] i915 0000:00:02.0: Resetting chip for hang on rcs0
[515817.358404] i915 0000:00:02.0: Resetting chip for hang on rcs0
[515825.358428] i915 0000:00:02.0: Resetting chip for hang on rcs0

I rebooted with intel_iommu=igfx_off, and ran the program again. After ~10 minutes of running, it hung again, with:

[ 792.028358] i915 0000:00:02.0: GPU HANG: ecode 7:1:0xfffffffe, in futatabi [2319], hang on rcs0
[ 792.028361] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 792.028361] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 792.028362] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 792.028363] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 792.028364] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 792.028654] i915 0000:00:02.0: Resetting chip for hang on rcs0
[ 799.996625] i915 0000:00:02.0: Resetting chip for hang on rcs0
[ 807.996604] i915 0000:00:02.0: Resetting chip for hang on rcs0

If it's interesting, you can find the decoder source at https://git.sesse.net/?p=nageru;a=blob;f=futatabi/vaapi_jpeg_decoder.cpp;h=d18a8735c11a23853ea6109b340c031dfee2a19c;hb=HEAD .
I'll be filing a copy of /sys/class/drm/card0/error as an attachment.
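
For anyone trying to correlate the IOMMU faults with the resets, here is a minimal Python sketch that pulls the DMAR write faults and the i915 reset messages out of kernel log text. The regular expressions are written against the exact lines quoted above and are an assumption about the format; other kernel versions may word these messages differently. The names `summarize` and `SAMPLE_LOG` are made up for this illustration.

```python
import re

# Patterns written against the log lines quoted in this report (assumed format).
DMAR_FAULT = re.compile(
    r"DMAR: \[DMA (?P<op>Read|Write)\] Request device \[(?P<dev>[0-9a-f:.]+)\] "
    r"fault addr (?P<addr>[0-9a-f]+) \[fault reason (?P<reason>\d+)\] (?P<text>.*)"
)
RESET = re.compile(r"i915 (?P<dev>\S+): Resetting chip for hang on (?P<engine>\w+)")

# A few lines copied from the report, one message per line.
SAMPLE_LOG = """\
[515715.657023] DMAR: DRHD: handling fault status reg 3
[515715.657030] DMAR: [DMA Write] Request device [00:02.0] fault addr f72e5000 [fault reason 05] PTE Write access is not set
[515789.233240] DMAR: [DMA Write] Request device [00:02.0] fault addr ed5df000 [fault reason 05] PTE Write access is not set
[515809.358568] i915 0000:00:02.0: Resetting chip for hang on rcs0
[515817.358404] i915 0000:00:02.0: Resetting chip for hang on rcs0
[515825.358428] i915 0000:00:02.0: Resetting chip for hang on rcs0
"""


def summarize(log_text):
    """Return (faults, resets) as lists of dicts extracted from log_text."""
    faults = [m.groupdict() for m in DMAR_FAULT.finditer(log_text)]
    resets = [m.groupdict() for m in RESET.finditer(log_text)]
    return faults, resets


if __name__ == "__main__":
    faults, resets = summarize(SAMPLE_LOG)
    for f in faults:
        print(f"DMA {f['op']} fault from {f['dev']} at 0x{f['addr']}: {f['text']}")
    print(f"{len(resets)} reset(s) on engine(s): {sorted({r['engine'] for r in resets})}")
```

This only groups the messages; it does not say anything about why the faults happen.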
Can you please report this issue under https://github.com/intel/libva/issues. Closing this issue as NOTOURBUG.

rcs0 command stream:
  IDLE?: no
  START: 0x0012d000
  HEAD: 0x3b41bd60 [0x0001bca8]
  TAIL: 0x0001c258 [0x0001bd60, 0x0001bd78]
  CTL: 0x0001f001
  MODE: 0x00004000
  HWS: 0x7fffe000
  ACTHD: 0x00000000 020299b0
  IPEIR: 0x00000000
  IPEHR: 0x71000014
  INSTDONE: 0xffdcfffd
  SC_INSTDONE: 0xffffffff
  SAMPLER_INSTDONE[0][0]: 0xffffffff
  ROW_INSTDONE[0][0]: 0xfc00ffff
  batch: [0x00000000_0030f000, 0x00000000_0038f000]
  BBADDR: 0x00000000_020299b1
  BB_STATE: 0x00000120
  INSTPS: 0x80000209
  INSTPM: 0x00006080
  FADDR: 0x00000000 02029b80
  RC PSMI: 0x00000010
  FAULT_REG: 0x00000000
  GFX_MODE: 0x00002a00
  PP_DIR_BASE: 0x05410000
  ring->head: 0x0001bc90
  ring->tail: 0x0001c258
  hangcheck timestamp: 0ms (4295088753; epoch)
  engine reset count: 0
  Active context: futatabi[2319] hw_id 0, prio 0, guilty 0 active 0
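
If it helps when comparing several dumps, the register lines above can be read into a dictionary with a short Python sketch like the one below. It assumes the `NAME: 0x...` layout seen in this dump, including the two-word 64-bit values written as `0x00000000 020299b0` or `0x00000000_020299b1`. The helper name `parse_registers` and the sample constant are made up for this illustration, and lines that do not start with a plain hex value (such as `batch:` or `IDLE?:`) are skipped.

```python
import re

# Matches "KEY: 0xHHHHHHHH" with an optional second 32-bit word joined by
# a space or an underscore; anything after that on the line is ignored.
REG_LINE = re.compile(
    r"^[ \t]*([^:\n]+):[ \t]+0x([0-9a-fA-F]+)(?:[ _]([0-9a-fA-F]{8})\b)?",
    re.MULTILINE,
)

# A few lines copied from the dump in this comment.
SAMPLE_DUMP = """\
rcs0 command stream:
  IDLE?: no
  HEAD: 0x3b41bd60 [0x0001bca8]
  ACTHD: 0x00000000 020299b0
  IPEHR: 0x71000014
  batch: [0x00000000_0030f000, 0x00000000_0038f000]
  BBADDR: 0x00000000_020299b1
  ring->head: 0x0001bc90
  ring->tail: 0x0001c258
"""


def parse_registers(dump_text):
    """Return {register name: integer value} for simple hex-valued lines."""
    regs = {}
    for key, high, low in REG_LINE.findall(dump_text):
        # When a second word is present, treat the pair as one 64-bit value.
        regs[key.strip()] = int(high + low, 16)
    return regs


if __name__ == "__main__":
    regs = parse_registers(SAMPLE_DUMP)
    for name, value in regs.items():
        print(f"{name:12s} 0x{value:x}")
    # Distance between the two software ring pointers, in bytes.
    print("ring->tail - ring->head =", regs["ring->tail"] - regs["ring->head"])
```

This is only a convenience for diffing dumps; it makes no attempt to decode what the register values mean.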