Bug 103936 - [BAT] [igt@*- timeout/system hang 4.15.0-rc1
Summary: [BAT] [igt@*- timeout/system hang 4.15.0-rc1
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Petri Latvala
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2017-11-27 11:08 UTC by Marta Löfstedt
Modified: 2018-04-20 11:37 UTC

See Also:
i915 platform: ALL
i915 features: power/suspend-resume


Attachments

Description Marta Löfstedt 2017-11-27 11:08:50 UTC

    
Comment 1 Marta Löfstedt 2017-11-27 11:14:01 UTC
Note: this is the first run on 4.15.0-rc1.
The issue was already seen on:
https://intel-gfx-ci.01.org/tree/linus/
Comment 2 Marta Löfstedt 2017-11-27 11:18:14 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-bdw-5557u/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-bdw-gvtdvm/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-blb-e6850/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-bsw-n3050/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-bxt-dsi/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-bxt-j4205/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-byt-j1900/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-byt-n2820/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-cfl-s2/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-cnl-y/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-elk-e7500/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-glk-1/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-hsw-4770/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-hsw-4770r/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-ilk-650/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-ilk-m540/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-ivb-3520m/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-ivb-3770/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-kbl-7500u/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-kbl-7560u/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-kbl-7567u/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-pnv-d510/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-skl-6260u/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-skl-6600u/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-skl-6700hq/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-skl-6700k/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-skl-6770hq/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-skl-gvtdvm/igt@gem_exec_suspend@basic-s3.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-snb-2520m/igt@gem_exec_suspend@basic-s3.html
Comment 3 Marta Löfstedt 2017-11-27 11:20:18 UTC
Also, BWR and GDG skip igt@gem_exec_suspend@basic-s3, so they hit the issue on igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a instead:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-bwr-2160/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3393/fi-gdg-551/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html
Comment 4 Marta Löfstedt 2017-11-27 13:23:55 UTC
Results are coming in from shards CI_DRM_3393. I will just file all new incompletes on this bug, since there is too much data to go through.
Comment 5 Marta Löfstedt 2017-11-27 13:41:00 UTC
I decided to file all new dmesg-warn and dmesg-fail results on this bug as well; I don't think the results can be trusted for these runs.
Comment 6 Marta Löfstedt 2017-11-30 06:40:26 UTC
Fixed by:

commit 79d79155ff788ce3202130fe3a3085052c6bc439
Author: Martin Peres <martin.peres@linux.intel.com>
Date:   Mon Nov 27 17:43:21 2017 +0200

    Revert "x86/entry/64: Add missing irqflags tracing to native_load_gs_index()"
    
    This reverts commit ca37e57bbe0cf1455ea3e84eb89ed04a132d59e1.
    
    Reported-by: Petri Latvala <petri.latvala@intel.com>
    Tested-by: Petri Latvala <petri.latvala@intel.com>
    Signed-off-by: Martin Peres <martin.peres@linux.intel.com>


Is there a kernel.org bugzilla for this?

Anyway, the issue is archived from cibuglog, but it should be kept open in fdo until a proper fix is available.
Comment 7 Marta Löfstedt 2017-11-30 06:55:21 UTC
Jani is supposed to file the kernel.org bugzilla
Comment 8 Jani Saarinen 2017-12-12 15:09:14 UTC
Petri to comment more on this I think. Might be already on rc3?
Comment 9 Jani Saarinen 2017-12-13 11:22:24 UTC
Any updates if resolved?
Comment 10 Marta Löfstedt 2017-12-13 11:25:57 UTC
(In reply to Jani Saarinen from comment #9)
> Any updates if resolved?

According to Petri's try-bot and discussions on IRC, this revert should be reverted.
Comment 11 Jani Saarinen 2017-12-20 11:25:06 UTC
Revert removed
Comment 12 Harry Wentland 2018-01-03 18:44:48 UTC
Are you still seeing this issue or did you find a workaround? Starting with that same commit ("x86/entry/64: Add missing irqflags tracing to native_load_gs_index()") we (AMD DC guys) also cannot resume from S3 on many systems.

I'm surprised this is not a bigger issue on lkml or elsewhere but I couldn't really find anything online other than this bug report and the fact that Greg K-H reverted it on the stable 4.14 tree.

Unfortunately even reverting this patch on our amd-staging-drm-next tree doesn't help, although git bisect clearly points at this being the root cause of the regression.
Comment 13 Marta Löfstedt 2018-01-04 07:21:52 UTC
(In reply to Harry Wentland from comment #12)
> Are you still seeing this issue or did you find a workaround? Starting with
> that same commit ("x86/entry/64: Add missing irqflags tracing to
> native_load_gs_index()") we (AMD DC guys) also cannot resume from S3 on
> many systems.
>

We are not seeing this specific S3-related issue anymore after the revert.

> 
> I'm surprised this is not a bigger issue on lkml or elsewhere but I couldn't
> really find anything online other than this bug report and the fact that
> Greg K-H reverted it on the stable 4.14 tree.
> 
> Unfortunately even reverting this patch on our amd-staging-drm-next tree
> doesn't help, although git bisect clearly points at this being the root
> cause of the regression.

That is too bad. Are you still having S3 issues if you run drm-tip? I believe we are still carrying some 4.14.0-rc1 related fixes on our core-for-CI branch. But honestly, I am not convinced that all "suspend/resume" related issues that started with 4.14.0-rc1 are smoked out yet. For example, we are seeing network adapter issues causing our system to lose contact with the machines; see bug 103878 and bug 103359. Also, there are still far too many unexplained system hangs related to suspend-resume tests.
Comment 14 Marta Löfstedt 2018-01-04 07:57:19 UTC
(In reply to Marta Löfstedt from comment #13)
> (In reply to Harry Wentland from comment #12)

I meant 4.15.0-rc1 not 4.14.0-rc1 in previous comment.
Comment 15 Jani Saarinen 2018-04-20 11:14:44 UTC
Is this still an issue?

