100004 – [BAT][ALL] Dmesg warning related to core dump while executing igt gem_exec_suspend@basic-s3

Status:	CLOSED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	DRI git
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	highest blocker
Assignee:	Intel GFX Bugs mailing list
QA Contact:	Intel GFX Bugs mailing list

Reported:	2017-02-28 12:26 UTC by Jari Tahvanainen
Modified:	2017-07-24 22:39 UTC
CC List:	1 user

Description Jari Tahvanainen 2017-02-28 12:26:52 UTC

On CI_DRM_2249, igt@gem_exec_suspend@basic-s3 produces a dmesg-warn on several hardware platforms:
[  338.891616] WARNING: CPU: 3 PID: 24 at kernel/sched/sched.h:812 set_next_entity+0xb22/0xfe0
[  338.891617] rq->clock_update_flags < RQCF_ACT_SKIP
[  338.891617] Modules linked in: x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul snd_hda_codec_realtek snd_hda_codec_generic ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me mei lpc_ich i2c_designware_platform i2c_designware_core i915 e1000e ptp pps_core prime_numbers sdhci_acpi sdhci mmc_core i2c_hid
[  338.891632] CPU: 3 PID: 24 Comm: migration/3 Not tainted 4.10.0-CI-CI_DRM_2249+ #1
[  338.891633] Hardware name:                  /NUC5i7RYB, BIOS RYBDWi35.86A.0355.2016.0224.1501 02/24/2016
[  338.891633] Call Trace:
[  338.891635]  dump_stack+0x67/0x92
...

See
http://intel-gfx-ci.01.org/CI/CI_DRM_2249/fi-bdw-5557u/igt@gem_exec_suspend@basic-s3.html
http://intel-gfx-ci.01.org/CI/CI_DRM_2249/fi-bsw-n3050/igt@gem_exec_suspend@basic-s3.html
http://intel-gfx-ci.01.org/CI/CI_DRM_2249/fi-bxt-j4205/igt@gem_exec_suspend@basic-s3.html
http://intel-gfx-ci.01.org/CI/CI_DRM_2249/fi-byt-j1900/igt@gem_exec_suspend@basic-s3.html
http://intel-gfx-ci.01.org/CI/CI_DRM_2249/fi-hsw-4770/igt@gem_exec_suspend@basic-s3.html
http://intel-gfx-ci.01.org/CI/CI_DRM_2249/fi-hsw-4770r/igt@gem_exec_suspend@basic-s3.html
http://intel-gfx-ci.01.org/CI/CI_DRM_2249/fi-ilk-650/igt@gem_exec_suspend@basic-s3.html
http://intel-gfx-ci.01.org/CI/CI_DRM_2249/fi-ivb-3520m/igt@gem_exec_suspend@basic-s3.html
http://intel-gfx-ci.01.org/CI/CI_DRM_2249/fi-kbl-7500u/igt@gem_exec_suspend@basic-s3.html
http://intel-gfx-ci.01.org/CI/CI_DRM_2249/fi-skl-6700hq/igt@gem_exec_suspend@basic-s3.html
http://intel-gfx-ci.01.org/CI/CI_DRM_2249/fi-skl-6700k/igt@gem_exec_suspend@basic-s3.html
http://intel-gfx-ci.01.org/CI/CI_DRM_2249/fi-skl-6770hq/igt@gem_exec_suspend@basic-s3.html
http://intel-gfx-ci.01.org/CI/CI_DRM_2249/fi-snb-2520m/igt@gem_exec_suspend@basic-s3.html

Full Dmesg before and during the execution can also be fetched through the links above.
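
For anyone triaging the full logs linked above, a minimal sketch of pulling the WARNING header out of a saved dmesg might look like the following. The regular expression is an assumption based only on the header format in the excerpt above, and the names WARNING_RE, find_warnings and SAMPLE are hypothetical; this is not part of the CI tooling.

```python
import re

# Matches kernel WARNING headers shaped like the one quoted in this report:
# [  338.891616] WARNING: CPU: 3 PID: 24 at kernel/sched/sched.h:812 set_next_entity+0xb22/0xfe0
# The pattern is an assumption derived from that single line.
WARNING_RE = re.compile(
    r"^\[\s*(?P<time>\d+\.\d+)\]\s+WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) "
    r"at (?P<file>\S+):(?P<line>\d+) (?P<symbol>[^\s+]+)\+"
)


def find_warnings(dmesg_text):
    """Return one dict per WARNING header found in dmesg_text."""
    hits = []
    for raw in dmesg_text.splitlines():
        match = WARNING_RE.match(raw.strip())
        if match:
            hits.append({
                "time": float(match.group("time")),
                "cpu": int(match.group("cpu")),
                "pid": int(match.group("pid")),
                "location": f"{match.group('file')}:{match.group('line')}",
                "symbol": match.group("symbol"),
            })
    return hits


# Sample input copied from the excerpt in this report.
SAMPLE = """\
[  338.891616] WARNING: CPU: 3 PID: 24 at kernel/sched/sched.h:812 set_next_entity+0xb22/0xfe0
[  338.891617] rq->clock_update_flags < RQCF_ACT_SKIP
[  338.891632] CPU: 3 PID: 24 Comm: migration/3 Not tainted 4.10.0-CI-CI_DRM_2249+ #1
"""

if __name__ == "__main__":
    for warning in find_warnings(SAMPLE):
        print(warning)
```

Grouping the results by location and symbol across machines would be one way to check whether every platform hits the same splat, though that step is not shown here.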

Comment 1 Chris Wilson 2017-02-28 12:36:10 UTC

topic/core-for-CI commit 7925851af123091a2590110e28ea268840ebd177
Author: Wanpeng Li <wanpeng.li@hotmail.com>
Date:   Tue Feb 21 23:52:55 2017 -0800

    sched/fair: Update rq clock before changing a task's CPU affinity

Comment 2 Chris Wilson 2017-02-28 13:39:31 UTC

That wasn't the purported fix after all. Back to scanning lkml.

Comment 3 Chris Wilson 2017-02-28 14:27:46 UTC

8cb68b3 sched/core: Fix update_rq_clock() splat on hotplug (and suspend/resume)

Comment 4 Martin Peres 2017-02-28 14:53:18 UTC

Let's keep it open until we are sure it did fix it ;)

Comment 5 Chris Wilson 2017-02-28 15:45:15 UTC

I waited for confirmation first!

Comment 6 Chris Wilson 2017-02-28 15:45:34 UTC

s/first/that time/

Comment 7 Martin Peres 2017-03-01 09:01:07 UTC

Ah ah, good! All the machines are fixed, .... except fi-kbl-7500u is still failing...: https://intel-gfx-ci.01.org/CI/CI_DRM_2254/fi-kbl-7500u/igt@gem_exec_suspend@basic-s4-devices.html

I will check if for some reason this machine ran an old kernel or not.

Comment 8 Martin Peres 2017-03-01 11:21:38 UTC

(In reply to Martin Peres from comment #7)
> Ah ah, good! All the machines are fixed, .... except fi-kbl-7500u is still
> failing...:
> https://intel-gfx-ci.01.org/CI/CI_DRM_2254/fi-kbl-7500u/igt@gem_exec_suspend@basic-s4-devices.html
> 
> I will check if for some reason this machine ran an old kernel or not.

Seems like it ran the right kernel, so there is more to this bug then :s Let's reopen the bug.

Comment 9 Chris Wilson 2017-03-01 11:42:45 UTC

That's a completely different bug. The nvme driver is using a mutex inside an rcu callback.

Comment 10 Jari Tahvanainen 2017-03-07 12:05:00 UTC

(In reply to Chris Wilson from comment #9)
> That's a completely different bug. The nvme driver is using a mutex inside
> an rcu callback.

See bug 100099.
