https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3514/fi-blb-e6850/igt@gem_sync@basic-all.html

(gem_sync:3296) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:482:
(gem_sync:3296) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-all failed.

and

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3514/fi-pnv-d510/igt@gem_sync@basic-all.html

(gem_sync:3075) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:482:
(gem_sync:3075) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-all failed.

Same test, same run, two machines. This looks suspicious, so I am creating a new bug instead of piling it onto bug 102848.
It was anticipated.

<7>[ 228.586379] [IGT] gem_sync: executing
<4>[ 228.606909] Setting dangerous option reset - tainting kernel
<7>[ 228.609394] [IGT] gem_sync: starting subtest basic-all
<7>[ 230.752959] missed_breadcrumb rcs0 missed breadcrumb at intel_breadcrumbs_hangcheck+0x5a/0x80 [i915]
<7>[ 230.752988] missed_breadcrumb current seqno 2854cd, last 2854ce, hangcheck 2854cd [0 ms], inflight 1
<7>[ 230.752995] missed_breadcrumb Reset count: 0 (global 4)
<7>[ 230.753004] missed_breadcrumb Requests:
<7>[ 230.753013] missed_breadcrumb first 2854ce [2:2854d8] prio=-2147483648 @ 2137ms: [global]
<7>[ 230.753020] missed_breadcrumb last 2854ce [2:2854d8] prio=-2147483648 @ 2137ms: [global]
<7>[ 230.753028] missed_breadcrumb active 2854ce [2:2854d8] prio=-2147483648 @ 2137ms: [global]
<7>[ 230.753039] missed_breadcrumb [head 9238, postfix 9250, tail 9260, batch 0x00000000_00325000]
<7>[ 230.753044] missed_breadcrumb RING_START: 0x00004000 [0x00004000]
<7>[ 230.753048] missed_breadcrumb RING_HEAD: 0x00009220 [0x00009210]
<7>[ 230.753053] missed_breadcrumb RING_TAIL: 0x00009260 [0x00009260]
<7>[ 230.753058] missed_breadcrumb RING_CTL: 0x0001f001
<7>[ 230.753063] missed_breadcrumb RING_MODE: 0x00000000
<7>[ 230.753068] missed_breadcrumb ACTHD: 0x00000000_155f7f94
<7>[ 230.753073] missed_breadcrumb BBADDR: 0x00000000_00000000
<7>[ 230.753078] missed_breadcrumb E 2854ce [2:2854d8] prio=-2147483648 @ 2137ms: [global]
<7>[ 230.753083] missed_breadcrumb gem_sync [3298] waiting for 2854ce
<7>[ 230.753088] missed_breadcrumb IRQ? 0x1 (breadcrumbs? yes) (execlists? no)
<7>[ 230.753092] missed_breadcrumb Idle? no
<7>[ 230.753097] missed_breadcrumb
<6>[ 236.783524] [drm] GPU HANG: ecode 3:0:0x7dffffc1, in gem_sync [3296], reason: Hang on rcs0, action: reset

Shows a TLB miss and walking off into the empty GTT.
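For anyone comparing several of these dumps, the interesting fields can be pulled out mechanically. The following is only a rough sketch: it assumes the missed_breadcrumb line format seen in this log (which is debug output and may change between kernel versions), and the helper names parse_hex and parse_dump are made up for illustration.

```python
import re

# Excerpt copied from the dmesg output in this report (timestamps dropped).
LOG = """\
missed_breadcrumb current seqno 2854cd, last 2854ce, hangcheck 2854cd [0 ms], inflight 1
missed_breadcrumb [head 9238, postfix 9250, tail 9260, batch 0x00000000_00325000]
missed_breadcrumb RING_HEAD: 0x00009220 [0x00009210]
missed_breadcrumb RING_TAIL: 0x00009260 [0x00009260]
missed_breadcrumb ACTHD: 0x00000000_155f7f94
missed_breadcrumb BBADDR: 0x00000000_00000000
"""


def parse_hex(text):
    """Convert a value such as 0x00000000_00325000 to an int."""
    return int(text.replace("_", ""), 16)


def parse_dump(log):
    """Hypothetical helper: collect seqnos and a few registers from the dump."""
    state = {}
    m = re.search(r"current seqno ([0-9a-f]+), last ([0-9a-f]+)", log)
    if m:
        # Seqnos are printed in hex without a 0x prefix.
        state["seqno_current"] = int(m.group(1), 16)
        state["seqno_last"] = int(m.group(2), 16)
    m = re.search(r"batch (0x[0-9a-f_]+)", log)
    if m:
        state["batch"] = parse_hex(m.group(1))
    for name in ("RING_HEAD", "RING_TAIL", "ACTHD", "BBADDR"):
        m = re.search(name + r": (0x[0-9a-f_]+)", log)
        if m:
            state[name] = parse_hex(m.group(1))
    return state


state = parse_dump(LOG)
for key, value in state.items():
    print(f"{key:14s} {value:#x}")
print("requests outstanding:", state["seqno_last"] - state["seqno_current"])
```

The script only extracts the values; it does not try to decide whether a given dump is a real hang.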
commit 7b6da818d86fddfc88ddb523d6539c1bf7fc6302
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Dec 16 00:03:34 2017 +0000

    drm/i915: Restore the kernel context after a GPU reset on an idle engine

    As part of the system requirement for powersaving is that we always have
    a context loaded. Upon boot and resume, we load the kernel_context to
    ensure that some valid state is set before powersaving kicks in, we
    should do so after a full GPU reset as well. We only need to do so for
    an idle engine, as any active engines will restart by executing the
    stuck request, loading its context. For the idle engine, we create a
    new request to load the kernel_context instead.

    For whatever reason, perfoming a dummy execute on the idle engine after
    reset papers over a subsequent GPU hang in rare circumstances, even on
    machines not using contexts (e.g. Pineview).

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104259
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104261
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Reviewed-by: Michel Thierry <michel.thierry@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20171216000334.8197-1-chris@chris-wilson.co.uk
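The rule described in the commit message (active engines replay their stuck request, idle engines get a fresh request that loads the kernel context) can be illustrated with a toy model. This is not the i915 code: Engine, restart_after_reset and the string placeholders for requests are all invented here purely to show the decision being made per engine.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Engine:
    """Toy stand-in for an engine; not the kernel's data structure."""
    name: str
    stuck_request: Optional[str] = None  # None means the engine was idle
    submitted: List[str] = field(default_factory=list)


def restart_after_reset(engines):
    """Hypothetical model of the post-reset restart rule from the commit."""
    for engine in engines:
        if engine.stuck_request is not None:
            # Active engine: replaying the stuck request loads its context.
            engine.submitted.append(engine.stuck_request)
        else:
            # Idle engine: submit a new request that loads the kernel context,
            # so that some valid state is in place before powersaving.
            engine.submitted.append("kernel_context")


engines = [
    Engine("rcs0", stuck_request="request 2854ce"),
    Engine("bcs0"),  # idle at the time of the reset (made-up example)
]
restart_after_reset(engines)
for engine in engines:
    print(engine.name, "->", engine.submitted)
```

In this model every engine ends up with exactly one request submitted after the reset, which is the property the commit is after: no engine is left without a context loaded.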
The fix was integrated in CI_DRM_3526; results have been green since then.