Bug 112001 - [CI][SHARDS]igt@gem_eio@kms - dmesg-warn - watchdog: BUG: soft lockup - CPU#2 stuck for 255s! [swapper/2:0]
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: medium minor
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-10-14 12:59 UTC by Lakshmi
Modified: 2019-11-29 19:39 UTC

See Also:
i915 platform: SNB
i915 features: display/Other

Description Lakshmi 2019-10-14 12:59:09 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7061/shard-snb6/igt@gem_eio@kms.html
<0> [1112.607315] watchdog: BUG: soft lockup - CPU#2 stuck for 255s! [swapper/2:0]
<4> [1112.607326] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal coretemp mei_hdcp broadcom crct10dif_pclmul bcm_phy_lib snd_hda_intel crc32_pclmul snd_intel_nhlt snd_hda_codec tg3 snd_hwdep ghash_clmulni_intel snd_hda_core ptp pps_core snd_pcm mei_me mei prime_numbers lpc_ich
<4> [1112.607370] irq event stamp: 654342
<4> [1112.607377] hardirqs last  enabled at (654341): [<ffffffff8115c59f>] tick_nohz_idle_enter+0x5f/0x90
<4> [1112.607387] hardirqs last disabled at (654342): [<ffffffff810ed2d9>] do_idle+0x79/0x250
<4> [1112.607395] softirqs last  enabled at (654318): [<ffffffff81c00385>] __do_softirq+0x385/0x47f
<4> [1112.607404] softirqs last disabled at (654311): [<ffffffff810b7eaa>] irq_exit+0xba/0xc0
<4> [1112.607412] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G     U            5.4.0-rc2-CI-CI_DRM_7061+ #1
<4> [1112.607419] Hardware name: Dell Inc. XPS 8300  /0Y2MRG, BIOS A06 10/17/2011
<4> [1112.607427] RIP: 0010:cpuidle_enter_state+0xb2/0x450
<4> [1112.607432] Code: 90 31 ff e8 d0 80 90 ff 80 7c 24 0b 00 74 12 9c 58 f6 c4 02 0f 85 6e 03 00 00 31 ff e8 b7 4d 97 ff e8 b2 62 9b ff fb 45 85 ed <0f> 88 37 03 00 00 4c 2b 24 24 48 ba cf f7 53 e3 a5 9b c4 20 49 63
<4> [1112.607446] RSP: 0018:ffffc9000007fe80 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
<4> [1112.607454] RAX: 0000000000000000 RBX: ffffffff8229d840 RCX: 000000000000001f
<4> [1112.607460] RDX: 000001030c902bad RSI: 0000000025bb8a45 RDI: ffffffff817e50ee
<4> [1112.607466] RBP: ffffe8ffffb00990 R08: 0000000000000002 R09: 0000000000038840
<4> [1112.607472] R10: ffffc9000007fe60 R11: 00000000000003e3 R12: 000001030c902bad
<4> [1112.607478] R13: 0000000000000003 R14: 0000000000000003 R15: 0000000000000000
<4> [1112.607484] FS:  0000000000000000(0000) GS:ffff888227900000(0000) knlGS:0000000000000000
<4> [1112.607491] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [1112.607496] CR2: 00007f39e6b30000 CR3: 0000000002210006 CR4: 00000000000606e0
<4> [1112.607502] Call Trace:
<4> [1112.607507]  cpuidle_enter+0x24/0x40
<4> [1112.607512]  do_idle+0x1e7/0x250
<4> [1112.607518]  cpu_startup_entry+0x14/0x20
<4> [1112.607523]  start_secondary+0x15f/0x1b0
<4> [1112.607528]  secondary_startup_64+0xa4/0xb0
<4> [1112.617380] clocksource: timekeeping watchdog on CPU3: Marking clocksource 'tsc' as unstable because the skew is too large:
<4> [1112.617402] clocksource:                       'hpet' wd_now: 49ca1a08 wd_last: 60cfbd6e mask: ffffffff
<4> [1112.617414] clocksource:                       'tsc' cs_now: 131bb120a98 cs_last: 5a1d6dd5d0 mask: ffffffffffffffff
<6> [1112.617429] tsc: Marking TSC unstable due to clocksource watchdog
<4> [1112.617471] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
<6> [1112.617484] sched_clock: Marking unstable (1112582598100, 34898301)<-(1112635113359, -17642400)
Comment 1 CI Bug Log 2019-10-14 12:59:45 UTC
The CI Bug Log issue associated with this bug has been updated.

### New filters associated

* SNB: igt@gem_eio@kms - dmesg-warn - watchdog: BUG: soft lockup - CPU#2 stuck for 255s! [swapper/2:0]
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7061/shard-snb6/igt@gem_eio@kms.html
Comment 2 Vanshidhar Konda 2019-11-04 19:35:54 UTC
This issue seems to be related to: https://bugs.freedesktop.org/show_bug.cgi?id=112000

In both of these issues, the kernel detects that the TSC is skewed on one of the CPUs and switches the clocksource from TSC to HPET. In both cases the issues occur on the SNB platform and started occurring 3 weeks, 3 days ago. The issue noted in this bug occurs less frequently than the one in the link noted above. The two issues may be manifestations of the same error.
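
To compare the skew between the two reports, the raw counter values in the `clocksource:` lines of the description can be turned into elapsed ticks. The following is only a minimal sketch: the regular expression, the `counter_deltas` helper, and the `LOG` constant are illustrative names, not part of any kernel or CI tooling, and the masked subtraction is an assumption about how wrapped counters should be handled.

```python
import re

# Matches the counter-dump lines printed by the clocksource watchdog, e.g.
# "'hpet' wd_now: 49ca1a08 wd_last: 60cfbd6e mask: ffffffff".
# (Pattern is an assumption based on the two lines quoted in this bug.)
COUNTER_RE = re.compile(
    r"'(?P<name>\w+)' (?:wd|cs)_now: (?P<now>[0-9a-f]+) "
    r"(?:wd|cs)_last: (?P<last>[0-9a-f]+) mask: (?P<mask>[0-9a-f]+)"
)


def counter_deltas(dmesg_text):
    """Return {clocksource name: elapsed ticks} for each counter-dump line."""
    deltas = {}
    for match in COUNTER_RE.finditer(dmesg_text):
        now = int(match["now"], 16)
        last = int(match["last"], 16)
        mask = int(match["mask"], 16)
        # Masking the difference accounts for counter wraparound; the
        # 32-bit 'hpet' value in this log is smaller than its previous reading.
        deltas[match["name"]] = (now - last) & mask
    return deltas


# The two counter lines copied from the description of this bug.
LOG = """\
<4> [1112.617402] clocksource:                       'hpet' wd_now: 49ca1a08 wd_last: 60cfbd6e mask: ffffffff
<4> [1112.617414] clocksource:                       'tsc' cs_now: 131bb120a98 cs_last: 5a1d6dd5d0 mask: ffffffffffffffff
"""

if __name__ == "__main__":
    for name, ticks in counter_deltas(LOG).items():
        print(f"{name}: {ticks:#x} ticks since the previous watchdog reading")
```

Converting these tick counts into time requires each counter's frequency, which the log does not include, so the sketch stops at raw ticks.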
Comment 3 Martin Peres 2019-11-29 19:39:45 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe to and participate in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/intel/issues/503.