https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4685/fi-byt-n2820/igt@gem_exec_basic@readonly-blt.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4734/fi-byt-n2820/igt@gem_exec_parallel@basic.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_97/fi-byt-j1900/igt@gem_exec_reuse@baggage.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_99/fi-byt-n2820/igt@gem_exec_reuse@baggage.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_100/fi-byt-j1900/igt@gem_exec_reuse@contexts.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_101/fi-byt-j1900/igt@gem_exec_reuse@contexts.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_101/fi-byt-n2820/igt@gem_exec_reuse@contexts.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_105/fi-byt-j1900/igt@gem_exec_parallel@fds.html

(gem_exec_parallel:1195) igt_aux-CRITICAL: Test assertion failure function sig_abort, file ../lib/igt_aux.c:500:
(gem_exec_parallel:1195) igt_aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest fds failed.
105 is after

commit 06348d3086a3b34f2db6c7692b4327fb7fc0b6c7
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Sep 4 07:38:02 2018 +0100

    drm/i915/ringbuffer: Move double invalidate to after pd flush

so a strong indication (plus the other weird byt hang) that we still aren't quite reloading the pd on a context switch in time. At least the frequency is reducing...
It seems to be a choice between either full-ppgtt or bcs!
The hilarity

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 472939f5c18f..0f4e6b29c800 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1677,9 +1677,12 @@ static int switch_context(struct i915_request *rq)
        GEM_BUG_ON(HAS_EXECLISTS(rq->i915));
 
        if (ppgtt) {
+               int loops = (engine->id == BCS && IS_VALLEYVIEW(engine->i915)) ? 32 : 1;
+               do {
                ret = load_pd_dir(rq, ppgtt);
                if (ret)
                        goto err;
+               } while (--loops);
 
                if (intel_engine_flag(engine) & ppgtt->pd_dirty_rings) {
                        unwind_mm = intel_engine_flag(engine);
@@ -1707,6 +1710,9 @@ static int switch_context(struct i915_request *rq)

is the most effective w/a so far. Pray I don't raise loops further.
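For readers without the driver tree to hand, the control flow of the workaround in the diff can be sketched in isolation. This is only an illustration under stated assumptions: `struct fake_engine`, `fake_load_pd_dir()` and `reload_pd()` are hypothetical stand-ins invented here, not i915 code, and the stub merely counts calls instead of emitting ring commands. The 32-versus-1 loop count and the do/while shape are taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver types; NOT the real i915 definitions. */
enum engine_id { RCS, BCS, VCS };

struct fake_engine {
    enum engine_id id;
    bool is_valleyview;  /* stands in for IS_VALLEYVIEW(engine->i915) */
    int pd_loads;        /* counts how many times the PD load was requested */
};

/* Stub for load_pd_dir(): records the call and reports success (0). */
static int fake_load_pd_dir(struct fake_engine *engine)
{
    engine->pd_loads++;
    return 0;
}

/* Same loop shape as the diff: 32 loads on byt/bcs, one load everywhere else. */
static int reload_pd(struct fake_engine *engine)
{
    int loops = (engine->id == BCS && engine->is_valleyview) ? 32 : 1;
    int ret;

    assert(loops > 0);  /* a do/while with --loops must start above zero */
    do {
        ret = fake_load_pd_dir(engine);
        if (ret)
            return ret;  /* the driver jumps to its err label at this point */
    } while (--loops);

    return 0;
}
```

Calling `reload_pd()` on an engine with `id == BCS` and `is_valleyview == true` leaves `pd_loads` at 32. Any other combination leaves it at 1, so only the byt/bcs case pays for the extra loads.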
(In reply to Chris Wilson from comment #3)
> The hilarity
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 472939f5c18f..0f4e6b29c800 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1677,9 +1677,12 @@ static int switch_context(struct i915_request *rq)
>         GEM_BUG_ON(HAS_EXECLISTS(rq->i915));
>
>         if (ppgtt) {
> +               int loops = (engine->id == BCS && IS_VALLEYVIEW(engine->i915)) ? 32 : 1;
> +               do {
>                 ret = load_pd_dir(rq, ppgtt);
>                 if (ret)
>                         goto err;
> +               } while (--loops);
>
>                 if (intel_engine_flag(engine) & ppgtt->pd_dirty_rings) {
>                         unwind_mm = intel_engine_flag(engine);
> @@ -1707,6 +1710,9 @@ static int switch_context(struct i915_request *rq)
>
> is the most effective w/a so far. Pray I don't raise loops further.

Oh my :D
*** Bug 107902 has been marked as a duplicate of this bug. ***
*** Bug 107903 has been marked as a duplicate of this bug. ***
gem_exec_parallel/fds survived 48 hours on my byt-j1900 with

commit e2a13d1b24073fe183ebe858ebb4fc827faded56 (HEAD -> drm-intel-next-queued, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Sep 10 14:08:08 2018 +0100

    drm/i915/ringbuffer: Reload PDs harder on byt/bcs

    Baytrail takes a little more convincing that it needs to actually
    reload its Page Directory (ppGTT) before the context switch, so repeat
    it until it gets the message. Once again the arbitrary values here are
    empirically derived.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107861
    Testcase: igt/gem_exec_parallel/fds
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180910130808.10809-1-chris@chris-wilson.co.uk

hopefully the others fare just as well.
*** Bug 107852 has been marked as a duplicate of this bug. ***
This issue was seen in drmtip_107 and drmtip_108 and has not been seen since. Closing this bug as fixed.