Bug 107861

Summary: [CI][BAT] [byt full-ppgtt] igt@gem_exec* - fail - Failed assertion: !"GPU hung"
Product: DRI Reporter: Martin Peres <martin.peres>
Component: DRM/Intel Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: medium CC: intel-gfx-bugs
Version: XOrg git   
Hardware: Other   
OS: All   
Whiteboard: ReadyForDev
i915 platform: BYT i915 features: GEM/PPGTT

Comment 1 Chris Wilson 2018-09-07 17:41:55 UTC
105 is after

commit 06348d3086a3b34f2db6c7692b4327fb7fc0b6c7
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Sep 4 07:38:02 2018 +0100

    drm/i915/ringbuffer: Move double invalidate to after pd flush

so a strong indication (plus the other weird byt hang) that we still aren't quite reloading the pd on a context switch in time. At least the frequency is reducing...
Comment 2 Chris Wilson 2018-09-10 08:38:25 UTC
It seems to be a choice between either full-ppgtt or bcs!
Comment 3 Chris Wilson 2018-09-10 10:31:23 UTC
The hilarity

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 472939f5c18f..0f4e6b29c800 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1677,9 +1677,12 @@ static int switch_context(struct i915_request *rq)
        GEM_BUG_ON(HAS_EXECLISTS(rq->i915));
 
        if (ppgtt) {
+               int loops = (engine->id == BCS && IS_VALLEYVIEW(engine->i915)) ? 32 : 1;
+               do {
                ret = load_pd_dir(rq, ppgtt);
                if (ret)
                        goto err;
+               } while (--loops);
 
                if (intel_engine_flag(engine) & ppgtt->pd_dirty_rings) {
                        unwind_mm = intel_engine_flag(engine);
@@ -1707,6 +1710,9 @@ static int switch_context(struct i915_request *rq)

is the most effective w/a so far. Pray I don't raise loops further.
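
For readers outside the driver, the shape of the workaround in the diff above can be sketched in standalone C. This is only an illustration under stated assumptions: `struct fake_engine`, `fake_load_pd_dir()`, `pd_reload_loops()` and `reload_pd()` are hypothetical stand-ins, not i915 functions; only the condition (BCS on Valleyview) and the counts 32 and 1 are taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's engine ids; not the i915 definitions. */
enum engine_id { RCS, BCS, VCS };

struct fake_engine {
	enum engine_id id;
	bool is_valleyview; /* stand-in for IS_VALLEYVIEW(engine->i915) */
	int pd_loads;       /* counts how many PD loads were emitted */
};

/* Stand-in for load_pd_dir(): records the call and reports success (0). */
static int fake_load_pd_dir(struct fake_engine *engine)
{
	engine->pd_loads++;
	return 0;
}

/* Repeat count from the diff: 32 on byt/bcs, otherwise a single load. */
static int pd_reload_loops(const struct fake_engine *engine)
{
	return (engine->id == BCS && engine->is_valleyview) ? 32 : 1;
}

/* Emit the PD load 'loops' times, bailing out on the first error. */
static int reload_pd(struct fake_engine *engine)
{
	int loops = pd_reload_loops(engine);
	int ret;

	assert(loops >= 1); /* do/while with --loops requires at least one pass */
	do {
		ret = fake_load_pd_dir(engine);
		if (ret)
			return ret;
	} while (--loops);

	return 0;
}
```

With these stand-ins, calling `reload_pd()` on a Valleyview BCS engine emits the load 32 times, and on any other engine exactly once. The counts are empirical, as in the diff, and the sketch says nothing about why the hardware needs the repeats.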
Comment 4 Martin Peres 2018-09-10 10:39:50 UTC
(In reply to Chris Wilson from comment #3)

Oh my :D
Comment 5 Chris Wilson 2018-09-11 10:30:33 UTC
*** Bug 107902 has been marked as a duplicate of this bug. ***
Comment 6 Chris Wilson 2018-09-11 10:30:50 UTC
*** Bug 107903 has been marked as a duplicate of this bug. ***
Comment 7 Chris Wilson 2018-09-12 10:06:22 UTC
gem_exec_parallel/fds survived 48 hours on my byt-j1900 with

commit e2a13d1b24073fe183ebe858ebb4fc827faded56 (HEAD -> drm-intel-next-queued, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Sep 10 14:08:08 2018 +0100

    drm/i915/ringbuffer: Reload PDs harder on byt/bcs
    
    Baytrail takes a little more convincing that it needs to actually reload
    its Page Directory (ppGTT) before the context switch, so repeat it until
    it gets the message. Once again the arbitrary values here are
    empirically derived.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107861
    Testcase: igt/gem_exec_parallel/fds
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180910130808.10809-1-chris@chris-wilson.co.uk

hopefully the others fare just as well.
Comment 8 Chris Wilson 2018-09-14 20:41:36 UTC
*** Bug 107852 has been marked as a duplicate of this bug. ***
Comment 9 Lakshmi 2018-10-15 11:56:17 UTC
This issue was seen in drmtip_107 and drmtip_108, and has not been seen since.
Closing this bug as fixed.
