Bug 111427 - [CI][BAT] igt@gem_sync@basic-store-each - incomplete - Kernel panic - not syncing: Fatal exception in interrupt
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: highest normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Reported: 2019-08-19 12:00 UTC by Martin Peres
Modified: 2019-08-21 16:59 UTC

See Also:
i915 platform: CFL
i915 features: GEM/Other


Description Martin Peres 2019-08-19 12:00:32 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6733/fi-cfl-8109u/igt@gem_sync@basic-store-each.html

<6> [111.322307] Console: switching to colour dummy device 80x25
<6> [111.322367] [IGT] gem_sync: executing
<5> [111.340235] Setting dangerous option reset - tainting kernel
<6> [111.346021] [IGT] gem_sync: starting subtest basic-store-each
Comment 2 Chris Wilson 2019-08-19 12:25:42 UTC
Based on the age and addr hints, this has something to do with need_timeslice():

static bool
need_timeslice(struct intel_engine_cs *engine, const struct i915_request *rq)
{
        int hint;

        if (!intel_engine_has_semaphores(engine))
                return false;

        if (list_is_last(&rq->sched.link, &engine->active.requests))
                return false;

        hint = max(rq_prio(list_next_entry(rq, sched.link)),
                   engine->execlists.queue_priority_hint);

        return hint >= effective_prio(rq);
}

Likely suspect is *list_next_entry(rq, sched.link)
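
For illustration only, here is a userspace sketch of the two list helpers that need_timeslice() relies on. The list_head stand-ins mirror the kernel's include/linux/list.h, but struct toy_request and next_prio_or() are hypothetical names invented for this example and are not part of i915. It shows that list_is_last() reads list->next, so it is only safe when the request pointer itself is valid, and that the next entry is only meaningful when the request is not the last one on the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's <linux/list.h> helpers. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

static void INIT_LIST_HEAD(struct list_head *head)
{
        head->next = head;
        head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
        entry->prev = head->prev;
        entry->next = head;
        head->prev->next = entry;
        head->prev = entry;
}

/* Same test as the kernel helper: note that it dereferences list. */
static bool list_is_last(const struct list_head *list,
                         const struct list_head *head)
{
        return list->next == head;
}

/* Hypothetical, much reduced request: a priority and a list link. */
struct toy_request {
        int prio;
        struct list_head link;
};

/*
 * Return the priority of the request queued after rq, or fallback if rq
 * is the last entry. Without the list_is_last() guard, container_of()
 * would be applied to the list head, which is not a request at all.
 */
static int next_prio_or(struct toy_request *rq, struct list_head *head,
                        int fallback)
{
        assert(rq != NULL); /* a stale rq would fault inside list_is_last() */

        if (list_is_last(&rq->link, head))
                return fallback;

        return container_of(rq->link.next, struct toy_request, link)->prio;
}
```

With two requests a and b queued in that order, next_prio_or(&a, &head, -1) yields b's priority, while next_prio_or(&b, &head, -1) yields the fallback.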
Comment 3 Chris Wilson 2019-08-19 12:29:15 UTC
13:28              tsa : 256     static inline int list_is_last(const struct list_head *list,
13:28              tsa : 0x5a215 is in __execlists_submission_tasklet (./include/linux/list.h:259).
13:29              tsa : 259             return list->next == head;
Comment 4 Chris Wilson 2019-08-19 12:34:39 UTC
(In reply to CI Bug Log from comment #1)
> The CI Bug Log issue associated to this bug has been updated.
> 
> ### New filters associated
> 
> * CFL: igt@gem_sync@basic-store-each - incomplete
>   - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14069/fi-cfl-8109u/igt@gem_sync@basic-store-each.html
>   - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14048/fi-cfl-8109u/igt@gem_sync@basic-store-each.html
>   - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14066/fi-cfl-8109u/igt@gem_sync@basic-store-each.html
>   - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4794/fi-cfl-8109u/igt@gem_sync@basic-store-each.html
>   - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4803/fi-cfl-8109u/igt@gem_sync@basic-store-each.html
>   - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4805/fi-cfl-8109u/igt@gem_sync@basic-store-each.html
>   - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6733/fi-cfl-8109u/igt@gem_sync@basic-store-each.html
>   - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6734/fi-cfl-8109u/igt@gem_sync@basic-store-each.html
>   - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4762/fi-cfl-8109u/igt@gem_sync@basic-store-each.html
>   - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4762/fi-cfl-8700k/igt@gem_sync@basic-store-each.html
>   - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4762/fi-cfl-guc/igt@gem_sync@basic-store-each.html

They are not all the same bug -- or at least appear not to have the same symptom.
Comment 5 Chris Wilson 2019-08-21 16:59:52 UTC
I think I understand...

commit a20ab592d1a87218229109d109b8e2feae6f598d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Aug 21 15:23:36 2019 +0100

    drm/i915/execlists: Set priority hint prior to submission
    
    Since we now run process_csb() outside of the engine->active.lock, we
    can process a CS-event immediately upon our ELSP write. As we currently
    inspect the pending queue *after* the ELSP write, there is an
    opportunity for a CS-event to update the pending queue before we can
    read it, making ourselves chase an invalid pointer.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111427
    Fixes: df403069029d ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190821142336.21609-1-chris@chris-wilson.co.uk
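
The ordering problem described in the commit message can be modelled in a few lines of single-threaded C. This is only a sketch under loud assumptions: struct toy_engine, toy_elsp_write() and the other toy_* names are hypothetical and do not correspond to i915 functions, and the "CS-event" is simulated by a direct call that always fires immediately after the write, which is the worst case of the race. It is not the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much reduced model; none of these names are i915 API. */
struct toy_request {
        int prio;
};

struct toy_engine {
        struct toy_request *pending[2]; /* requests about to be submitted */
        int priority_hint;
};

/* Stand-in for processing a CS-event: it consumes the pending queue. */
static void toy_process_event(struct toy_engine *e)
{
        e->pending[0] = NULL;
        e->pending[1] = NULL;
}

/* Stand-in for the ELSP write; here the event always follows at once. */
static void toy_elsp_write(struct toy_engine *e)
{
        toy_process_event(e);
}

/*
 * Racy ordering: inspect the pending queue after the write. The model
 * reports the stale read with -1; real code without such a check would
 * follow an invalid pointer instead.
 */
static int submit_then_read(struct toy_engine *e)
{
        toy_elsp_write(e);
        if (e->pending[0] == NULL)
                return -1;
        e->priority_hint = e->pending[0]->prio;
        return 0;
}

/* Safe ordering: take what is needed from pending[] before the write. */
static int read_then_submit(struct toy_engine *e)
{
        assert(e->pending[0] != NULL); /* still owned by the submitter here */
        e->priority_hint = e->pending[0]->prio;
        toy_elsp_write(e);
        return 0;
}
```

In this model submit_then_read() always observes an emptied queue and leaves the hint untouched, while read_then_submit() records the priority before the simulated event can run.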

