https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3645/shard-apl4/igt@perf_pmu@busy-double-start-bcs0.html From pstore: <0>[ 128.657708] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [perf_pmu:1609] <4>[ 128.657748] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal coretemp crct10dif_pclmul i915 crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec lpc_ich snd_hwdep snd_hda_core r8169 snd_pcm mii mei_me mei prime_numbers pinctrl_broxton pinctrl_intel <4>[ 128.657820] irq event stamp: 289026 <4>[ 128.657830] hardirqs last enabled at (289025): [<00000000ff02e880>] restore_regs_and_return_to_kernel+0x0/0x1f <4>[ 128.657835] hardirqs last disabled at (289026): [<00000000a33c5663>] apic_timer_interrupt+0x9a/0xb0 <4>[ 128.657839] softirqs last enabled at (289024): [<00000000ecbdbd47>] __do_softirq+0x3a1/0x4aa <4>[ 128.657845] softirqs last disabled at (289017): [<00000000da21e243>] irq_exit+0xa4/0xb0 <4>[ 128.657850] CPU: 1 PID: 1609 Comm: perf_pmu Tainted: G U 4.15.0-rc8-CI-CI_DRM_3645+ #1 <4>[ 128.657853] Hardware name: /NUC6CAYB, BIOS AYAPLCEL.86A.0040.2017.0619.1722 06/19/2017 <4>[ 128.657859] RIP: 0010:smp_call_function_single+0x93/0x120 <4>[ 128.657862] RSP: 0018:ffffc90000437da0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff11 <4>[ 128.657869] RAX: 0000000000000003 RBX: 0000000000000000 RCX: 0000000000000006 <4>[ 128.657872] RDX: 0000000000000001 RSI: ffffffff82108511 RDI: ffffffff820b9866 <4>[ 128.657875] RBP: ffffc90000437df0 R08: 0000000000000000 R09: 0000000000000001 <4>[ 128.657878] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81160ac0 <4>[ 128.657881] R13: ffffc90000437e18 R14: 0000000000000001 R15: 0000000000000005 <4>[ 128.657885] FS: 00007ff11e7dfa40(0000) GS:ffff88027fc80000(0000) knlGS:0000000000000000 <4>[ 128.657888] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 128.657892] CR2: 00007ff11e8aa000 CR3: 000000026adb0000 CR4: 00000000003406e0 <4>[ 128.657895] Call Trace: <4>[ 128.657903] ? perf_duration_warn+0x40/0x40 <4>[ 128.657912] ? perf_install_in_context+0x132/0x140 <4>[ 128.657917] perf_install_in_context+0x132/0x140 <4>[ 128.657922] ? __perf_event_enable+0x1c0/0x1c0 <4>[ 128.657930] SyS_perf_event_open+0x779/0x11d0 <4>[ 128.657952] entry_SYSCALL_64_fastpath+0x22/0x8f <4>[ 128.657956] RIP: 0033:0x7ff11c9e29f9 <4>[ 128.657959] RSP: 002b:00007ffd261d6778 EFLAGS: 00000246 <4>[ 128.657975] Code: 01 73 07 9c 58 f6 c4 02 74 53 45 85 f6 74 5f 48 8d 75 b0 89 df 4c 89 e9 4c 89 e2 e8 98 fe ff ff 89 c3 8b 45 c8 a8 01 74 0a f3 90 <8b> 55 c8 83 e2 01 75 f6 bf 01 00 00 00 e8 db c3 f8 ff 65 8b 05 <0>[ 128.658157] Kernel panic - not syncing: softlockup: hung tasks <4>[ 128.658185] CPU: 1 PID: 1609 Comm: perf_pmu Tainted: G U L 4.15.0-rc8-CI-CI_DRM_3645+ #1 <4>[ 128.658222] Hardware name: /NUC6CAYB, BIOS AYAPLCEL.86A.0040.2017.0619.1722 06/19/2017 <4>[ 128.658261] Call Trace: <4>[ 128.658275] <IRQ> <4>[ 128.658288] dump_stack+0x5f/0x86 <4>[ 128.658307] panic+0xcf/0x20d <4>[ 128.658333] watchdog_timer_fn+0x278/0x280 <4>[ 128.658356] ? watchdog+0x30/0x30 <4>[ 128.658374] __hrtimer_run_queues+0xdf/0x4d0 <4>[ 128.658401] hrtimer_interrupt+0xb6/0x220 <4>[ 128.658424] ? perf_duration_warn+0x40/0x40 <4>[ 128.658446] smp_apic_timer_interrupt+0x6f/0x2d0 <4>[ 128.658469] apic_timer_interrupt+0x9f/0xb0 <4>[ 128.658490] </IRQ> <4>[ 128.658503] RIP: 0010:smp_call_function_single+0x93/0x120 <4>[ 128.658526] RSP: 0018:ffffc90000437da0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff11 <4>[ 128.658562] RAX: 0000000000000003 RBX: 0000000000000000 RCX: 0000000000000006 <4>[ 128.658592] RDX: 0000000000000001 RSI: ffffffff82108511 RDI: ffffffff820b9866 <4>[ 128.658622] RBP: ffffc90000437df0 R08: 0000000000000000 R09: 0000000000000001 <4>[ 128.658652] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81160ac0 <4>[ 128.658682] R13: ffffc90000437e18 R14: 0000000000000001 R15: 0000000000000005 <4>[ 128.658714] ? perf_duration_warn+0x40/0x40 <4>[ 128.658744] ? perf_duration_warn+0x40/0x40 <4>[ 128.658770] ? perf_install_in_context+0x132/0x140 <4>[ 128.658793] perf_install_in_context+0x132/0x140 <4>[ 128.658816] ? __perf_event_enable+0x1c0/0x1c0 <4>[ 128.658841] SyS_perf_event_open+0x779/0x11d0 <4>[ 128.658879] entry_SYSCALL_64_fastpath+0x22/0x8f <4>[ 128.658901] RIP: 0033:0x7ff11c9e29f9 <4>[ 128.658918] RSP: 002b:00007ffd261d6778 EFLAGS: 00000246 <0>[ 129.779249] Shutting down cpus with NMI <0>[ 129.779300] Dumping ftrace buffer: <0>[ 129.779446] --------------------------------- <0>[ 129.779557] modprobe-276 2..s1 4830458us : execlists_submission_tasklet: rcs0 in[0]: ctx=2.1, seqno=1
This looks related: https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4154/shard-apl3/igt@perf_pmu@busy-double-start-bcs0.html from pstore: <0>[ 124.781813] watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [thermald:578]
commit b2f78cda260bc6a1a2d382b1d85a29e69b5b3724 Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Date: Mon Feb 5 09:34:48 2018 +0000 drm/i915/pmu: Fix PMU enable vs execlists tasklet race Commit 99e48bf98dd0 ("drm/i915: Lock out execlist tasklet while peeking inside for busy-stats") added a tasklet_disable call in busy stats enabling, but we failed to understand that the PMU enable callback runs as an hard IRQ (IPI). Consequence of this is that the PMU enable callback can interrupt the execlists tasklet, and will then deadlock when it calls intel_engine_stats_enable->tasklet_disable. To fix this, I realized it is possible to move the engine stats enablement and disablement to PMU event init and destroy hooks. This allows for much simpler implementation since those hooks run in normal context (can sleep). v2: Extract engine_event_destroy. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: 99e48bf98dd0 ("drm/i915: Lock out execlist tasklet while peeking inside for busy-stats") Testcase: igt/perf_pmu/enable-race-* Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: intel-gfx@lists.freedesktop.org Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180205093448.13877-1-tvrtko.ursulin@linux.intel.com
Fix integrated to CI_DRM_3728.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.