Created attachment 125222 [details] screenshot
I'm attaching a screenshot and the relevant files.
Created attachment 125223 [details] dmesg output
Created attachment 125224 [details] /sys/class/drm/card0/error
A GPU hang will result in rendering errors, so they may well just be a victim.
Could it be linked to GEM/GTT? The kernel log shows several messages about alignment:

[143743.427883] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[143743.427932] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[143743.428074] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment

and in the dump (if I am correct about the current render ring HEAD) we have IPEHR 0x0c000000:

0x0080dc34: 0x11000005: MI_LOAD_REGISTER_IMM
0x0080dc38: 0x00012050: dword 1
0x0080dc3c: 0x00010001: dword 2
0x0080dc40: 0x00022050: dword 3
0x0080dc44: 0x00010001: dword 4
0x0080dc48: 0x0001a050: dword 5
0x0080dc4c: 0x00010001: dword 6
0x0080dc50: 0x00000000: MI_NOOP
0x0080dc54: 0x0c000000: MI_SET_CONTEXT
0x0080dc58: 0x798dd10c: gtt offset = 0x798dd000
0x0080dc5c: HEAD 0x00000000: MI_NOOP
Bad length (7) in MI_LOAD_REGISTER_IMM, [3, 3]
0x0080dc60: 0x11000005: MI_LOAD_REGISTER_IMM
0x0080dc64: 0x00012050: dword 1
0x0080dc68: 0x00010000: dword 2
0x0080dc6c: 0x00022050: dword 3
0x0080dc70: 0x00010000: dword 4
0x0080dc74: 0x0001a050: dword 5
0x0080dc78: 0x00010000: dword 6
0x0080dc7c: 0x04000001: MI_ARB_ON_OFF
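For anyone reading the dump by hand, here is a rough sketch of how the command headers above can be decoded. It assumes the usual MI command layout (command type in bits 31:29, opcode in bits 28:23, and for multi-dword commands a length field in bits 7:0 that is the total dword count minus 2). The opcode table only covers the four commands that appear in this dump, and the function names are made up for illustration; this is not the decoder that produced the output above.

```python
# Minimal sketch of MI command header decoding, under the assumptions
# stated above. Only the commands seen in this dump are listed.
MI_OPCODES = {
    0x00: "MI_NOOP",
    0x08: "MI_ARB_ON_OFF",
    0x18: "MI_SET_CONTEXT",
    0x22: "MI_LOAD_REGISTER_IMM",
}

# Commands whose low byte is a length field (assumption: total = field + 2).
MULTI_DWORD = {0x18, 0x22}


def decode_mi_header(dword):
    """Return (name, total_dwords) for an MI command header dword."""
    cmd_type = (dword >> 29) & 0x7
    if cmd_type != 0:
        raise ValueError("not an MI command: type %d" % cmd_type)
    opcode = (dword >> 23) & 0x3F
    name = MI_OPCODES.get(opcode, "unknown MI opcode 0x%02x" % opcode)
    if opcode in MULTI_DWORD:
        total = (dword & 0xFF) + 2
    else:
        total = 1  # single-dword command, low bits are flags
    return name, total


def context_address(dword):
    """Strip the flag bits from the second dword of MI_SET_CONTEXT."""
    return dword & 0xFFFFF000


if __name__ == "__main__":
    for header in (0x11000005, 0x0C000000, 0x04000001):
        print("0x%08x -> %s, %d dword(s)" % ((header,) + decode_mi_header(header)))
    print("context at 0x%08x" % context_address(0x798DD10C))
```

Under these assumptions 0x11000005 comes out as a 7-dword MI_LOAD_REGISTER_IMM (header plus three register/value pairs), which matches the layout in the dump, so the "Bad length (7)" message looks like the decoding tool expecting a fixed 3-dword form rather than a problem in the batch itself.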
The "bogus alignment" errors are themselves bogus (self-inflicted by the kernel and don't affect anything). The hang is on processing the MI_SET_CONTEXT. This kernel has the workaround for the PSMI issue, so hopefully it is not a repeat of the last known context hangs, but I did find that this was a fresh context of interest. Could be some state that hasn't been cleared etc.
Created attachment 126031 [details] Another dump
This keeps happening; here's another /sys/class/drm/card0/error dump. This one happened when the computer was resuming from suspend. In dmesg:

[215029.932383] [drm] stuck on render ring
[215029.933531] [drm] GPU HANG: ecode 7:0:0x84dfbffe, in chrome [7937], reason: Ring hung, action: reset
[215029.933533] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[215029.933534] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[215029.933536] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[215029.933537] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[215029.933538] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[215029.935896] drm/i915: Resetting chip after gpu hang
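If it helps when comparing hangs across several logs, the "GPU HANG" line can be pulled apart with a small script. This is only a sketch: the regular expression is written against the message format shown in this report, the function name is hypothetical, and reading the three ecode fields as generation, ring and hash is my assumption rather than something stated in the log.

```python
import re

# Pattern matched against the message format seen in this report only.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<ring>\d+):(?P<hash>0x[0-9a-fA-F]+), "
    r"in (?P<comm>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return a dict of fields from a GPU HANG dmesg line, or None."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Field meanings (gen, ring, hash) are assumed, see the note above.
    fields["gen"] = int(fields["gen"])
    fields["ring"] = int(fields["ring"])
    fields["pid"] = int(fields["pid"])
    return fields


if __name__ == "__main__":
    sample = ("[215029.933531] [drm] GPU HANG: ecode 7:0:0x84dfbffe, "
              "in chrome [7937], reason: Ring hung, action: reset")
    print(parse_gpu_hang(sample))
```

Lines that do not contain a hang message simply return None, so the function can be run over a whole dmesg capture.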
Nicolas - We seem to have neglected this bug quite a bit, apologies. Do you still see this problem with the latest kernel (preferably the drm-tip branch from git://anongit.freedesktop.org/drm-tip)? Mark this as REOPENED if you can reproduce it (and attach the kernel log and card0/error), or RESOLVED if you cannot.
commit 5d4bac5503fcc67dd7999571e243cee49371aef7
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Mar 22 20:59:30 2017 +0000

    drm/i915: Restore marking context objects as dirty on pinning
Weird. The error state matches the expected pattern for the fix, but its date is much, much older than the regression that patch fixes. Could just be an older version of the same bug...