Bug 107399 - [CI][BAT] igt@drv_selftest@live_hangcheck - incomplete, hangs during others-priority
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: highest normal
Assignee: Paulo Zanoni
QA Contact: Intel GFX Bugs mailing list
Whiteboard: ReadyForDev
Depends on:
Reported: 2018-07-27 11:09 UTC by Martin Peres
Modified: 2018-08-21 15:25 UTC

See Also:
i915 platform: ICL
i915 features: GEM/Other

dmesg (13.58 KB, text/plain)
2018-08-08 23:41 UTC, Paulo Zanoni

Comment 1 Martin Peres 2018-07-27 11:11:18 UTC
Setting the highest priority since it blocks the execution of the rest of the self tests.
Comment 2 Chris Wilson 2018-07-27 11:14:51 UTC
Is it truly ready for dev when the debug logs are absent?
Comment 3 Martin Peres 2018-07-27 11:33:57 UTC
Fair point... not sure what to do here...
Comment 4 James Ausmus 2018-08-08 17:49:12 UTC
Paulo was already looking at this internally, so I am assigning it to him.
Comment 5 Paulo Zanoni 2018-08-08 23:41:19 UTC
Created attachment 141015

The bug seems to happen during others-priority. If I comment out the tests that include TEST_PRIORITY, live_hangcheck succeeds (although it still prints occasional messages about ring heads not parking).

I added a DRM_DEBUG_KMS to print the values of the GEM_BUG_ON condition.
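
As a rough user-space illustration of that debugging step, the sketch below logs both operands of a bit check before reporting whether it would pass. The names check_bits_logged and CTX_BITS are hypothetical stand-ins, not i915 code; in the kernel the print would be a DRM_DEBUG_KMS() call placed just before the GEM_BUG_ON().

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bit positions, chosen only for this example. */
#define CTX_BITS 0x000cu

/*
 * Log the register value and the expected bits, then report whether the
 * check would pass. Returning a bool instead of aborting means the values
 * are visible even when the condition fails.
 */
static bool check_bits_logged(const char *what, uint32_t reg, uint32_t expected)
{
	bool ok = (reg & expected) == expected;

	fprintf(stderr, "%s: reg=0x%08x expected=0x%08x -> %s\n",
		what, (unsigned int)reg, (unsigned int)expected,
		ok ? "ok" : "would BUG");
	return ok;
}
```

Printing before the assertion matters here because a BUG() can take the machine down before any later message reaches the log.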

It may be worth investigating why the CI is not able to capture this output. I'm using the serial console and never see the unprintable bytes that show up in the CI logs.
Comment 6 Paulo Zanoni 2018-08-09 23:59:34 UTC
RFC patch submitted: https://patchwork.freedesktop.org/series/47976/
Comment 7 Chris Wilson 2018-08-10 16:19:44 UTC
commit ee435831ec83344dba5ccddd4ffcc6ca95d1cf77 (HEAD -> drm-intel-next-queued, hsw/hsw, drm-intel/drm-intel-next-queued)
Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
Date:   Thu Aug 9 16:58:52 2018 -0700

    drm/i915/icl: account for context save/restore removed bits

    The RS_CTX_ENABLE and CTX_SAVE_INHIBIT bits are not present on ICL
    anymore, but we still try to set them and then check them with
    GEM_BUG_ON, resulting in a BUG() call. The bug can be reproduced by
    igt/drv_selftest/live_hangcheck/others-priority and our CI was able
    to catch it.

    It is worth noticing that commit 05f0addd9b10 ("drm/i915/icl: Enhanced
    execution list support") already tried to avoid the save bits
    on ICL, but only inside populate_lr_context().

    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Testcase: igt/drv_selftest/live_hangcheck/others-priority
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107399
    References: 05f0addd9b10 ("drm/i915/icl: Enhanced execution list support")
    Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180809235852.24516-1-paulo.r.zanoni@intel.com
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
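
The failure mode described in the commit message can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions, not the actual patch: the bit positions, the helper names (hw_masked_write, driver_bits, bits_set) and the idea that removed bits read back as zero are all assumptions made for illustration. The masked-write encoding (high 16 bits select which low bits are written) follows the usual i915 masked-register convention.

```c
#include <stdbool.h>
#include <stdint.h>

/* Masked-register encoding: the high 16 bits select which low bits to write. */
#define MASKED_FIELD(mask, value) (((uint32_t)(mask) << 16) | (uint32_t)(value))
#define MASKED_BIT_ENABLE(bits)   MASKED_FIELD((bits), (bits))

/* Hypothetical bit positions, for this sketch only. */
#define RS_CTX_ENABLE    (1u << 1)
#define CTX_SAVE_INHIBIT (1u << 2)
#define LEGACY_BITS      (RS_CTX_ENABLE | CTX_SAVE_INHIBIT)

/* Assumption: both bits exist before gen 11 and are gone from gen 11 (ICL) on. */
static uint32_t hw_implemented_bits(int gen)
{
	return gen < 11 ? LEGACY_BITS : 0;
}

/* Model of a masked write; bits the hardware lacks are assumed to stay 0. */
static uint32_t hw_masked_write(int gen, uint32_t old, uint32_t masked)
{
	uint32_t mask = masked >> 16;
	uint32_t value = masked & 0xffffu;
	uint32_t next = (old & ~mask) | (value & mask);

	return next & hw_implemented_bits(gen);
}

/* Gated selection: only touch, and only assert on, bits that exist. */
static uint32_t driver_bits(int gen)
{
	return gen < 11 ? LEGACY_BITS : 0;
}

/* Stand-in for the condition that would be wrapped in GEM_BUG_ON(). */
static bool bits_set(uint32_t reg, uint32_t bits)
{
	return (reg & bits) == bits;
}
```

In this model, writing MASKED_BIT_ENABLE(LEGACY_BITS) on gen 11 and then checking bits_set(reg, LEGACY_BITS) fails, which corresponds to the BUG() call. Deriving both the write and the check from driver_bits(gen) keeps the two in step on every generation.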
Comment 8 James Ausmus 2018-08-21 15:25:36 UTC
The test is now green on CI when run (or dmesg-warn for unrelated issues), so closing.
