Bug 100004 - [BAT][ALL] Dmesg warning related to core dump while executing igt gem_exec_suspend@basic-s3
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
Importance: highest blocker
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Reported: 2017-02-28 12:26 UTC by Jari Tahvanainen
Modified: 2017-07-24 22:39 UTC

Description Jari Tahvanainen 2017-02-28 12:26:52 UTC
On CI_DRM_2249, igt@gem_exec_suspend@basic-s3 produces a dmesg-warn on several hardware platforms:
[  338.891616] WARNING: CPU: 3 PID: 24 at kernel/sched/sched.h:812 set_next_entity+0xb22/0xfe0
[  338.891617] rq->clock_update_flags < RQCF_ACT_SKIP
[  338.891617] Modules linked in: x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul snd_hda_codec_realtek snd_hda_codec_generic ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me mei lpc_ich i2c_designware_platform i2c_designware_core i915 e1000e ptp pps_core prime_numbers sdhci_acpi sdhci mmc_core i2c_hid
[  338.891632] CPU: 3 PID: 24 Comm: migration/3 Not tainted 4.10.0-CI-CI_DRM_2249+ #1
[  338.891633] Hardware name:                  /NUC5i7RYB, BIOS RYBDWi35.86A.0355.2016.0224.1501 02/24/2016
[  338.891633] Call Trace:
[  338.891635]  dump_stack+0x67/0x92
...

See
http://intel-gfx-ci.01.org/CI/CI_DRM_2249/fi-bdw-5557u/igt@gem_exec_suspend@basic-s3.html
http://intel-gfx-ci.01.org/CI/CI_DRM_2249/fi-bsw-n3050/igt@gem_exec_suspend@basic-s3.html
http://intel-gfx-ci.01.org/CI/CI_DRM_2249/fi-bxt-j4205/igt@gem_exec_suspend@basic-s3.html
http://intel-gfx-ci.01.org/CI/CI_DRM_2249/fi-byt-j1900/igt@gem_exec_suspend@basic-s3.html
http://intel-gfx-ci.01.org/CI/CI_DRM_2249/fi-hsw-4770/igt@gem_exec_suspend@basic-s3.html
http://intel-gfx-ci.01.org/CI/CI_DRM_2249/fi-hsw-4770r/igt@gem_exec_suspend@basic-s3.html
http://intel-gfx-ci.01.org/CI/CI_DRM_2249/fi-ilk-650/igt@gem_exec_suspend@basic-s3.html
http://intel-gfx-ci.01.org/CI/CI_DRM_2249/fi-ivb-3520m/igt@gem_exec_suspend@basic-s3.html
http://intel-gfx-ci.01.org/CI/CI_DRM_2249/fi-kbl-7500u/igt@gem_exec_suspend@basic-s3.html
http://intel-gfx-ci.01.org/CI/CI_DRM_2249/fi-skl-6700hq/igt@gem_exec_suspend@basic-s3.html
http://intel-gfx-ci.01.org/CI/CI_DRM_2249/fi-skl-6700k/igt@gem_exec_suspend@basic-s3.html
http://intel-gfx-ci.01.org/CI/CI_DRM_2249/fi-skl-6770hq/igt@gem_exec_suspend@basic-s3.html
http://intel-gfx-ci.01.org/CI/CI_DRM_2249/fi-snb-2520m/igt@gem_exec_suspend@basic-s3.html

The full dmesg from before and during the execution can also be fetched through the links above.
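
The result links above all appear to follow one pattern: /CI/<build>/<machine>/<test>.html. A minimal sketch for splitting such a link into its parts, assuming that layout holds for every result page; the function name parse_ci_result_url is hypothetical and not part of any CI tooling:

```python
from urllib.parse import urlsplit


def parse_ci_result_url(url):
    """Split a CI result link into build, machine and test name.

    Assumes the path layout /CI/<build>/<machine>/<test>.html seen in
    the links of this report; raises ValueError for anything else.
    """
    parts = urlsplit(url).path.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "CI" or not parts[3].endswith(".html"):
        raise ValueError("unexpected CI result URL layout: " + url)
    build, machine, page = parts[1], parts[2], parts[3]
    # Drop the trailing ".html" to recover the igt test name.
    test = page[: -len(".html")]
    return {"build": build, "machine": machine, "test": test}


if __name__ == "__main__":
    link = ("http://intel-gfx-ci.01.org/CI/CI_DRM_2249/"
            "fi-bdw-5557u/igt@gem_exec_suspend@basic-s3.html")
    print(parse_ci_result_url(link))
```

Grouping the parsed results by machine makes it easy to see which platforms still fail on a later build.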
Comment 1 Chris Wilson 2017-02-28 12:36:10 UTC
topic/core-for-CI commit 7925851af123091a2590110e28ea268840ebd177
Author: Wanpeng Li <wanpeng.li@hotmail.com>
Date:   Tue Feb 21 23:52:55 2017 -0800

    sched/fair: Update rq clock before changing a task's CPU affinity
Comment 2 Chris Wilson 2017-02-28 13:39:31 UTC
That wasn't the purported fix after all. Back to scanning lkml.
Comment 3 Chris Wilson 2017-02-28 14:27:46 UTC
8cb68b3 sched/core: Fix update_rq_clock() splat on hotplug (and suspend/resume)
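
For context, the splat in the description comes from a scheduler debug check: the condition rq->clock_update_flags < RQCF_ACT_SKIP fires when the runqueue clock is read without having been updated since the lock was last pinned. A rough userspace model of that bookkeeping in Python, assuming the flag values used in kernel/sched/sched.h at the time (RQCF_REQ_SKIP = 0x01, RQCF_ACT_SKIP = 0x02, RQCF_UPDATED = 0x04); the RunQueue class and its method names are hypothetical and only mirror the kernel logic loosely:

```python
# Flag values assumed from kernel/sched/sched.h of that era.
RQCF_REQ_SKIP = 0x01
RQCF_ACT_SKIP = 0x02
RQCF_UPDATED = 0x04


class RunQueue:
    """Toy model of the rq clock_update_flags bookkeeping (not kernel API)."""

    def __init__(self):
        self.clock_update_flags = 0
        self.warnings = []

    def pin_lock(self):
        # Taking the lock forgets earlier updates but keeps the skip flags.
        self.clock_update_flags &= RQCF_REQ_SKIP | RQCF_ACT_SKIP

    def update_clock(self):
        # While updates are being skipped, nothing is recorded.
        if self.clock_update_flags & RQCF_ACT_SKIP:
            return
        self.clock_update_flags |= RQCF_UPDATED

    def assert_clock_updated(self):
        # Mirrors the warning condition quoted in the dmesg output.
        ok = self.clock_update_flags >= RQCF_ACT_SKIP
        if not ok:
            self.warnings.append("rq->clock_update_flags < RQCF_ACT_SKIP")
        return ok


if __name__ == "__main__":
    rq = RunQueue()
    rq.pin_lock()
    print(rq.assert_clock_updated())  # clock read without an update
    rq.update_clock()
    print(rq.assert_clock_updated())  # clock read after an update
```

In this model, a code path that pins the lock and reads the clock without calling update_clock() first trips the check, which is the kind of path the hotplug and suspend/resume fix addresses.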
Comment 4 Martin Peres 2017-02-28 14:53:18 UTC
Let's keep it open until we are sure it did fix it ;)
Comment 5 Chris Wilson 2017-02-28 15:45:15 UTC
I waited for confirmation first!
Comment 6 Chris Wilson 2017-02-28 15:45:34 UTC
s/first/that time/
Comment 7 Martin Peres 2017-03-01 09:01:07 UTC
Ah ah, good! All the machines are fixed, .... except fi-kbl-7500u is still failing...: https://intel-gfx-ci.01.org/CI/CI_DRM_2254/fi-kbl-7500u/igt@gem_exec_suspend@basic-s4-devices.html

I will check if for some reason this machine ran an old kernel or not.
Comment 8 Martin Peres 2017-03-01 11:21:38 UTC
(In reply to Martin Peres from comment #7)
> Ah ah, good! All the machines are fixed, .... except fi-kbl-7500u is still
> failing...:
> https://intel-gfx-ci.01.org/CI/CI_DRM_2254/fi-kbl-7500u/igt@gem_exec_suspend@basic-s4-devices.html
> 
> I will check if for some reason this machine ran an old kernel or not.

Seems like it ran the right kernel, so there is more to this bug then :s Let's reopen the bug.
Comment 9 Chris Wilson 2017-03-01 11:42:45 UTC
That's a completely different bug. The nvme driver is using a mutex inside an rcu callback.
Comment 10 Jari Tahvanainen 2017-03-07 12:05:00 UTC
(In reply to Chris Wilson from comment #9)
> That's a completely different bug. The nvme driver is using a mutex inside
> an rcu callback.

See bug 100099.

