Bug 112198 - GPU hang when decoding MJPEGs via VA-API
Status: RESOLVED NOTOURBUG
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: not set normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-11-02 16:59 UTC by Steinar H. Gunderson
Modified: 2019-11-05 12:48 UTC

See Also:
i915 platform: HSW
i915 features: GPU hang


Attachments
Dump of /sys/class/drm/card0/error (28.38 KB, text/plain)
2019-11-02 16:59 UTC, Steinar H. Gunderson

Description Steinar H. Gunderson 2019-11-02 16:59:21 UTC
Created attachment 145878
Dump of /sys/class/drm/card0/error

Hi,

I've got a desktop and a laptop (both Haswell) that use VA-API to decode JPEGs as part of a larger video pipeline. This used to work fine around March-April or so, but recently (November), I've started seeing GPU hangs on both. The kernel says:

[515715.657023] DMAR: DRHD: handling fault status reg 3
[515715.657030] DMAR: [DMA Write] Request device [00:02.0] fault addr f72e5000 [fault reason 05] PTE Write access is not set
[515789.233234] DMAR: DRHD: handling fault status reg 3
[515789.233240] DMAR: [DMA Write] Request device [00:02.0] fault addr ed5df000 [fault reason 05] PTE Write access is not set
[515809.358568] i915 0000:00:02.0: Resetting chip for hang on rcs0
[515817.358404] i915 0000:00:02.0: Resetting chip for hang on rcs0
[515825.358428] i915 0000:00:02.0: Resetting chip for hang on rcs0
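
For anyone triaging similar logs, the DMAR fault lines above can be pulled apart mechanically. The following is only a sketch: the regular expression is inferred from the two fault lines quoted in this report, and the names `DMAR_FAULT`, `parse_dmar_faults` and `SAMPLE` are made up for illustration, not part of any kernel tooling.

```python
import re

# Pattern inferred from the two DMAR fault lines in this report only;
# other kernel versions may format these messages differently.
DMAR_FAULT = re.compile(
    r"DMAR: \[DMA (?P<op>Read|Write)\] Request device \[(?P<dev>[0-9a-f:.]+)\] "
    r"fault addr (?P<addr>[0-9a-f]+) \[fault reason (?P<reason>\d+)\] (?P<text>.*)"
)


def parse_dmar_faults(log_text):
    """Return one dict per DMAR fault line found in log_text."""
    faults = []
    for line in log_text.splitlines():
        m = DMAR_FAULT.search(line)
        if m:
            faults.append({
                "op": m.group("op"),
                "device": m.group("dev"),
                "addr": int(m.group("addr"), 16),
                "reason": int(m.group("reason")),
                "text": m.group("text").strip(),
            })
    return faults


# The two fault lines from the report, copied verbatim.
SAMPLE = """\
[515715.657030] DMAR: [DMA Write] Request device [00:02.0] fault addr f72e5000 [fault reason 05] PTE Write access is not set
[515789.233240] DMAR: [DMA Write] Request device [00:02.0] fault addr ed5df000 [fault reason 05] PTE Write access is not set
"""

for fault in parse_dmar_faults(SAMPLE):
    print(fault["op"], fault["device"], hex(fault["addr"]), fault["text"])
```

Both faults are writes from device 00:02.0 (the integrated GPU) to different addresses, which is all the sketch is meant to surface.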

I rebooted with intel_iommu=igfx_off, and ran the program again. After ~10 minutes of running, it hung again, with:

[  792.028358] i915 0000:00:02.0: GPU HANG: ecode 7:1:0xfffffffe, in futatabi [2319], hang on rcs0
[  792.028361] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  792.028361] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  792.028362] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  792.028363] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  792.028364] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  792.028654] i915 0000:00:02.0: Resetting chip for hang on rcs0
[  799.996625] i915 0000:00:02.0: Resetting chip for hang on rcs0
[  807.996604] i915 0000:00:02.0: Resetting chip for hang on rcs0
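
As a rough aid for reading the log above, the sketch below extracts the hanging process, the engine and the spacing between the repeated chip resets. The patterns are inferred from the lines in this report alone, the three `ecode` fields are kept as raw numbers without interpretation, and `parse_hang`, `reset_times` and `SAMPLE` are hypothetical names.

```python
import re

# Patterns inferred from the log lines quoted in this report; they are
# not guaranteed to match other kernel versions.
HANG = re.compile(
    r"GPU HANG: ecode (\d+):(\d+):(0x[0-9a-fA-F]+), "
    r"in (\S+) \[(\d+)\], hang on (\w+)"
)
RESET = re.compile(r"\[\s*(\d+\.\d+)\] i915 \S+ Resetting chip for hang on (\w+)")


def parse_hang(log_text):
    """Return details of the first GPU HANG line, or None if absent."""
    m = HANG.search(log_text)
    if not m:
        return None
    return {
        # Raw ecode fields, left uninterpreted on purpose.
        "ecode": (int(m.group(1)), int(m.group(2)), int(m.group(3), 16)),
        "process": m.group(4),
        "pid": int(m.group(5)),
        "engine": m.group(6),
    }


def reset_times(log_text):
    """Return the timestamps (seconds) of all chip-reset lines."""
    return [float(m.group(1)) for m in RESET.finditer(log_text)]


SAMPLE = """\
[  792.028358] i915 0000:00:02.0: GPU HANG: ecode 7:1:0xfffffffe, in futatabi [2319], hang on rcs0
[  792.028654] i915 0000:00:02.0: Resetting chip for hang on rcs0
[  799.996625] i915 0000:00:02.0: Resetting chip for hang on rcs0
[  807.996604] i915 0000:00:02.0: Resetting chip for hang on rcs0
"""

hang = parse_hang(SAMPLE)
times = reset_times(SAMPLE)
intervals = [b - a for a, b in zip(times, times[1:])]
print(hang["process"], hang["pid"], hang["engine"])
print([round(i, 2) for i in intervals])
```

With the lines from this report, the resets come out roughly eight seconds apart, i.e. the GPU hangs again shortly after every reset.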

If it's interesting, you can find the decoder source at https://git.sesse.net/?p=nageru;a=blob;f=futatabi/vaapi_jpeg_decoder.cpp;h=d18a8735c11a23853ea6109b340c031dfee2a19c;hb=HEAD .
I'll be filing a copy of /sys/class/drm/card0/error as an attachment.
Comment 1 Lakshmi 2019-11-05 12:48:53 UTC
Can you please report this issue under https://github.com/intel/libva/issues.
Closing this issue as NOTOURBUG.
rcs0 command stream:
  IDLE?: no
  START: 0x0012d000
  HEAD:  0x3b41bd60 [0x0001bca8]
  TAIL:  0x0001c258 [0x0001bd60, 0x0001bd78]
  CTL:   0x0001f001
  MODE:  0x00004000
  HWS:   0x7fffe000
  ACTHD: 0x00000000 020299b0
  IPEIR: 0x00000000
  IPEHR: 0x71000014
  INSTDONE: 0xffdcfffd
  SC_INSTDONE: 0xffffffff
  SAMPLER_INSTDONE[0][0]: 0xffffffff
  ROW_INSTDONE[0][0]: 0xfc00ffff
  batch: [0x00000000_0030f000, 0x00000000_0038f000]
  BBADDR: 0x00000000_020299b1
  BB_STATE: 0x00000120
  INSTPS: 0x80000209
  INSTPM: 0x00006080
  FADDR: 0x00000000 02029b80
  RC PSMI: 0x00000010
  FAULT_REG: 0x00000000
  GFX_MODE: 0x00002a00
  PP_DIR_BASE: 0x05410000
  ring->head: 0x0001bc90
  ring->tail: 0x0001c258
  hangcheck timestamp: 0ms (4295088753; epoch)
  engine reset count: 0
  Active context: futatabi[2319] hw_id 0, prio 0, guilty 0 active 0
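
A small sketch of the arithmetic behind the ring fields in the dump above. It assumes that the CTL register stores the ring length as (pages - 1) in bits 20:12 with 4 KiB pages and bit 0 as the enable flag, and that `ring->head` and `ring->tail` are byte offsets into the ring; both assumptions should be checked against the i915 sources for the kernel in question. The function names are made up for illustration.

```python
PAGE_SIZE = 4096  # assumed GPU page size in bytes


def ring_size_from_ctl(ctl):
    """Ring size in bytes, assuming bits 20:12 of CTL hold (pages - 1)."""
    return (((ctl >> 12) & 0x1FF) + 1) * PAGE_SIZE


def pending_bytes(head, tail, size):
    """Bytes between head and tail, allowing for wrap-around."""
    return (tail - head) % size


# Values copied from the dump in this report.
CTL = 0x0001F001
RING_HEAD = 0x0001BC90   # ring->head
RING_TAIL = 0x0001C258   # ring->tail

size = ring_size_from_ctl(CTL)
pending = pending_bytes(RING_HEAD, RING_TAIL, size)
print(size)     # 131072 (128 KiB) under the assumption above
print(pending)  # 1480 (0x5c8) bytes not yet consumed
```

Under these assumptions the ring is 128 KiB and about 1.4 KiB of commands were still queued behind the hanging batch.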

