Created attachment 119335 [details]
dmesg - [drm:i915_reset [i915]] *ERROR* Failed to reset chip: -5

We're tracking drm-intel-nightly from freedesktop.org and running IGT (intel-gpu-tools/piglit) against each merge. A large number of debug options are turned on for this kernel.

On one machine the tests can cause a GPU hang, but the GPU reset then fails, and the rest of the tests report the same error code. Below is the interesting part; the full dmesg is attached.

Hardware is an Intel NUC5CPYH (Braswell Celeron N3050).

Mika Kuoppala <mika.kuoppala@intel.com> knows about this issue.

[ 206.105691] kms_pipe_crc_basic: executing
[ 206.368453] kms_pipe_crc_basic: starting subtest hang-read-crc-pipe-A
[ 211.785045] [drm] stuck on render ring
[ 211.798824] [drm] GPU HANG: ecode 8:0:0xfffffffe, in kms_pipe_crc_ba [5601], reason: Ring hung, action: reset
[ 211.799199] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 211.799209] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 211.799215] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 211.799221] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 211.799227] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 211.799639] kobject: 'card0' (ffff88007906a530): kobject_uevent_env
[ 211.799791] kobject: 'card0' (ffff88007906a530): fill_kobj_path: path = '/devices/pci0000:00/0000:00:02.0/drm/card0'
[ 211.802857] kobject: 'card0' (ffff88007906a530): kobject_uevent_env
[ 211.803104] kobject: 'card0' (ffff88007906a530): fill_kobj_path: path = '/devices/pci0000:00/0000:00:02.0/drm/card0'
[ 212.509176] [drm:gen8_do_reset [i915]] *ERROR* render ring: reset request timeout
[ 212.509244] [drm] Simulated gpu hang, resetting stop_rings
[ 212.509248] drm/i915: Resetting chip after gpu hang
[ 212.509275] [drm:i915_reset [i915]] *ERROR* Failed to reset chip: -5
[ 212.641248] kms_pipe_crc_basic: exiting, ret=0
[ 212.656806] [drm:intel_lr_context_deferred_alloc [i915]] *ERROR* ring create req: -5
[ 212.853766] gem_ctx_param_basic: executing
[ 212.857279] [drm:intel_lr_context_deferred_alloc [i915]] *ERROR* ring create req: -5
[ 212.861674] gem_ctx_param_basic: exiting, ret=99
[ 213.050754] kms_addfb_basic: executing
[ 213.053785] [drm:intel_lr_context_deferred_alloc [i915]] *ERROR* ring create req: -5
[ 213.061222] kms_addfb_basic: exiting, ret=99
This has happened twice, about one week apart. Latest commit on which this happened:

86ba603f327626055fe1436112b3786eaaaf7fb1
2015-10-31_08-27-21 drm-intel-nightly: 2015y-10m-31d-08h-26m-39s UTC integration manifest
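For anyone triaging similar test runs, the failure signature in the dmesg excerpt (a failed chip reset followed by tests exiting non-zero) can be picked out mechanically. The following is only a minimal sketch: the function name `summarize` and both regular expressions are hypothetical helpers written against the log lines quoted in this report, not part of IGT or the kernel, and they assume the message formats stay as shown here.

```python
import errno
import re

# Patterns modelled on the dmesg lines quoted in this report (assumption:
# other kernels or IGT versions may format these messages differently).
RESET_FAILED = re.compile(r"\*ERROR\* Failed to reset chip: (-?\d+)")
TEST_EXIT = re.compile(r"\] (\S+): exiting, ret=(\d+)")


def summarize(dmesg_text):
    """Return (errno name of the failed reset or None,
    names of tests that exited non-zero after that failure)."""
    reset_error = None
    failed_after = []
    for line in dmesg_text.splitlines():
        m = RESET_FAILED.search(line)
        if m:
            # The kernel logs a negative errno value, e.g. -5 for EIO.
            code = abs(int(m.group(1)))
            reset_error = errno.errorcode.get(code, str(code))
            continue
        m = TEST_EXIT.search(line)
        if m and reset_error is not None and m.group(2) != "0":
            failed_after.append(m.group(1))
    return reset_error, failed_after


sample = """\
[ 212.509275] [drm:i915_reset [i915]] *ERROR* Failed to reset chip: -5
[ 212.641248] kms_pipe_crc_basic: exiting, ret=0
[ 212.861674] gem_ctx_param_basic: exiting, ret=99
[ 213.061222] kms_addfb_basic: exiting, ret=99
"""
print(summarize(sample))
```

On the sample lines this reports the reset error as EIO (errno 5) and lists gem_ctx_param_basic and kms_addfb_basic as the tests that failed afterwards.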
http://patchwork.freedesktop.org/patch/msgid/1446216229-26474-1-git-send-email-mika.kuoppala@intel.com
Created attachment 119340 [details] SKL dmesg with same problem
Created attachment 119376 [details] [review] drm/i915: Request for resets under forcewake
Comment on attachment 119376 [details] [review]
drm/i915: Request for resets under forcewake

Review of attachment 119376 [details] [review]:
-----------------------------------------------------------------

Tested on BSW NUC hardware, where the problem was easily reproduced. With this patch applied, the test runs no longer triggered a GPU reset failure.

Tested-by: Tomi Sarvela <tomix.p.sarvela@intel.com>
Fixed by

commit 99106bc17e667989b4c0af0a6afcbd6ddbada8fb
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Thu Nov 5 13:11:38 2015 +0200

    drm/i915: Do graphics device reset under forcewake

in drm-intel-next-fixes.
I've just seen

[ 163.979728] drm/i915: Resetting chip after gpu hang
[ 164.695335] [drm:gen8_do_reset] *ERROR* blitter ring: reset request timeout
[ 164.695342] [drm:i915_reset] *ERROR* Failed to reset chip: -5

on bdw with

commit 99106bc17e667989b4c0af0a6afcbd6ddbada8fb
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Thu Nov 5 13:11:38 2015 +0200

    drm/i915: Do graphics device reset under forcewake

applied. I do not expect it to be easily reproducible.
(In reply to Chris Wilson from comment #7)
> I've just seen
>
> [ 163.979728] drm/i915: Resetting chip after gpu hang
> [ 164.695335] [drm:gen8_do_reset] *ERROR* blitter ring: reset request timeout
> [ 164.695342] [drm:i915_reset] *ERROR* Failed to reset chip: -5
>
> on bdw with
>
> commit 99106bc17e667989b4c0af0a6afcbd6ddbada8fb
> Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Date: Thu Nov 5 13:11:38 2015 +0200
>
> drm/i915: Do graphics device reset under forcewake
>
> applied. I do not expect it to be easily reproducible.

Is there any update on this? Given all the improvements made recently in the kernel, I would propose closing this bug and filing a new one if the problem occurs again.
It is not impossible for us to kill the GPU in such a way that recovery fails; that seems to be out of our control.