https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4212/shard-apl4/igt@perf_pmu@rc6.html <3>[ 101.309358] BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:1034 <3>[ 101.309502] in_atomic(): 1, irqs_disabled(): 1, pid: 1513, name: perf_pmu <4>[ 101.309558] 3 locks held by perf_pmu/1513: <4>[ 101.309563] #0: (&cpuctx_mutex){+.+.}, at: [<00000000c3ec2708>] perf_event_ctx_lock_nested+0xbc/0x1d0 <4>[ 101.309598] #1: (&event->child_mutex){+.+.}, at: [<000000004cd02bc9>] __perf_event_read_value+0x32/0xf0 <4>[ 101.309624] #2: (&cpuctx_lock){....}, at: [<000000002343d2dc>] __perf_event_read+0x64/0x2e0 <4>[ 101.309653] irq event stamp: 25750 <4>[ 101.309662] hardirqs last enabled at (25749): [<00000000c4700020>] entry_SYSCALL_64_fastpath+0x5/0x8f <4>[ 101.309670] hardirqs last disabled at (25750): [<00000000e34beea3>] generic_exec_single+0x90/0xe0 <4>[ 101.309678] softirqs last enabled at (25466): [<00000000bebe2149>] __do_softirq+0x3a1/0x4aa <4>[ 101.309687] softirqs last disabled at (25443): [<00000000062fd6b7>] irq_exit+0xa4/0xb0 <3>[ 101.309692] Preemption disabled at: <4>[ 101.309699] [<000000005125cef8>] perf_event_read+0x98/0x2e0 <4>[ 101.309741] CPU: 0 PID: 1513 Comm: perf_pmu Tainted: G U 4.15.0-CI-CI_DRM_3717+ #1 <4>[ 101.309747] Hardware name: /NUC6CAYB, BIOS AYAPLCEL.86A.0040.2017.0619.1722 06/19/2017 <4>[ 101.309752] Call Trace: <4>[ 101.309764] dump_stack+0x5f/0x86 <4>[ 101.309774] ___might_sleep+0x1d9/0x240 <4>[ 101.309788] __pm_runtime_resume+0x75/0x80 <4>[ 101.309870] intel_runtime_pm_get+0x1c/0xa0 [i915] <4>[ 101.309938] __i915_pmu_event_read+0x54/0x1c0 [i915] <4>[ 101.310008] i915_pmu_event_read+0x1e/0x40 [i915] <4>[ 101.310019] __perf_event_read+0x2b8/0x2e0 <4>[ 101.310031] ? __perf_event_output_stop+0xa0/0xa0 <4>[ 101.310040] generic_exec_single+0x96/0xe0 <4>[ 101.310050] ? __perf_event_output_stop+0xa0/0xa0 <4>[ 101.310058] smp_call_function_single+0x88/0x120 <4>[ 101.310076] ? perf_event_read+0xed/0x2e0 <4>[ 101.310083] ? preempt_count_add+0x44/0x90 <4>[ 101.310090] perf_event_read+0xed/0x2e0 <4>[ 101.310106] __perf_event_read_value+0x3c/0xf0 <4>[ 101.310119] perf_read+0x15e/0x290 <4>[ 101.310130] ? lock_acquire+0xaf/0x200 <4>[ 101.310143] ? free_object+0x6e/0x90 <4>[ 101.310156] __vfs_read+0x1e/0x120 <4>[ 101.310166] ? _raw_spin_unlock_irqrestore+0x39/0x60 <4>[ 101.310174] ? free_object+0x6e/0x90 <4>[ 101.310184] ? entry_SYSCALL_64_fastpath+0x5/0x8f <4>[ 101.310195] vfs_read+0x9e/0x150 <4>[ 101.310206] SyS_read+0x40/0xa0 <4>[ 101.310219] entry_SYSCALL_64_fastpath+0x22/0x8f <4>[ 101.310226] RIP: 0033:0x7f361c4276d0 <4>[ 101.310232] RSP: 002b:00007ffcce301ee8 EFLAGS: 00000246 <3>[ 101.311022] BUG: scheduling while atomic: perf_pmu/1513/0x00000004 <4>[ 101.311082] 3 locks held by perf_pmu/1513: <4>[ 101.311088] #0: (&cpuctx_mutex){+.+.}, at: [<00000000c3ec2708>] perf_event_ctx_lock_nested+0xbc/0x1d0 <4>[ 101.311124] #1: (&event->child_mutex){+.+.}, at: [<000000004cd02bc9>] __perf_event_read_value+0x32/0xf0 <4>[ 101.311152] #2: (&cpuctx_lock){+.+.}, at: [<000000002343d2dc>] __perf_event_read+0x64/0x2e0 <4>[ 101.311180] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i915 lpc_ich r8169 mii snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm pinctrl_broxton pinctrl_intel mei_me prime_numbers mei <3>[ 101.311314] Preemption disabled at: <4>[ 101.311321] [<000000005125cef8>] perf_event_read+0x98/0x2e0 <4>[ 101.311362] CPU: 0 PID: 1513 Comm: perf_pmu Tainted: G U W 4.15.0-CI-CI_DRM_3717+ #1 <4>[ 101.311368] Hardware name: /NUC6CAYB, BIOS AYAPLCEL.86A.0040.2017.0619.1722 06/19/2017 <4>[ 101.311374] Call Trace: <4>[ 101.311385] dump_stack+0x5f/0x86 <4>[ 101.311394] ? perf_event_read+0x98/0x2e0 <4>[ 101.311402] __schedule_bug+0x7a/0xd0 <4>[ 101.311413] __schedule+0x84c/0xaf0 <4>[ 101.311424] ? mark_held_locks+0x64/0x90 <4>[ 101.311432] ? _raw_spin_unlock_irqrestore+0x4c/0x60 <4>[ 101.311446] schedule+0x37/0x90 <4>[ 101.311456] schedule_timeout+0x1da/0x520 <4>[ 101.311471] ? collect_expired_timers+0xa0/0xa0 <4>[ 101.311485] ? msleep+0x1d/0x40 <4>[ 101.311497] ? pci_restore_standard_config+0x40/0x40 <4>[ 101.311504] msleep+0x34/0x40 <4>[ 101.311514] pci_raw_set_power_state+0x18f/0x220 <4>[ 101.311526] pci_set_power_state+0x94/0x110 <4>[ 101.311537] pci_restore_standard_config+0x22/0x40 <4>[ 101.311545] pci_pm_runtime_resume+0x3d/0x90 <4>[ 101.311556] __rpm_callback+0xb1/0x1e0 <4>[ 101.311566] ? pci_restore_standard_config+0x40/0x40 <4>[ 101.311577] rpm_callback+0x1a/0x70 <4>[ 101.311586] ? pci_restore_standard_config+0x40/0x40 <4>[ 101.311594] rpm_resume+0x4d7/0x770 <4>[ 101.311615] __pm_runtime_resume+0x42/0x80 <4>[ 101.311700] intel_runtime_pm_get+0x1c/0xa0 [i915] <4>[ 101.311769] __i915_pmu_event_read+0x54/0x1c0 [i915] <4>[ 101.311840] i915_pmu_event_read+0x1e/0x40 [i915] <4>[ 101.311852] __perf_event_read+0x2b8/0x2e0 <4>[ 101.311864] ? __perf_event_output_stop+0xa0/0xa0 <4>[ 101.311874] generic_exec_single+0x96/0xe0 <4>[ 101.311883] ? __perf_event_output_stop+0xa0/0xa0 <4>[ 101.311891] smp_call_function_single+0x88/0x120 <4>[ 101.311910] ? perf_event_read+0xed/0x2e0 <4>[ 101.311916] ? preempt_count_add+0x44/0x90 <4>[ 101.311924] perf_event_read+0xed/0x2e0 <4>[ 101.311940] __perf_event_read_value+0x3c/0xf0 <4>[ 101.311954] perf_read+0x15e/0x290 <4>[ 101.311963] ? lock_acquire+0xaf/0x200 <4>[ 101.311975] ? free_object+0x6e/0x90 <4>[ 101.311989] __vfs_read+0x1e/0x120 <4>[ 101.311999] ? _raw_spin_unlock_irqrestore+0x39/0x60 <4>[ 101.312007] ? free_object+0x6e/0x90 <4>[ 101.312018] ? entry_SYSCALL_64_fastpath+0x5/0x8f <4>[ 101.312028] vfs_read+0x9e/0x150 <4>[ 101.312040] SyS_read+0x40/0xa0 <4>[ 101.312054] entry_SYSCALL_64_fastpath+0x22/0x8f <4>[ 101.312060] RIP: 0033:0x7f361c4276d0 <4>[ 101.312066] RSP: 002b:00007ffcce301ee8 EFLAGS: 00000246 <7>[ 101.322131] [drm:intel_runtime_resume [i915]] Resuming device <7>[ 101.322196] [drm:bxt_disable_dc9 [i915]] Disabling DC9 <7>[ 101.322243] [drm:gen9_set_dc_state [i915]] Setting DC state from 08 to 00 <7>[ 101.322303] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 00 <7>[ 101.322433] [drm:intel_power_well_enable [i915]] enabling power well 1 <7>[ 101.322587] [drm:intel_dump_cdclk_state [i915]] Current CDCLK 19200 kHz, VCO 0 kHz, ref 19200 kHz, bypass 19200 kHz, voltage level 1 <7>[ 101.322636] [drm:bxt_init_cdclk [i915]] Sanitizing cdclk programmed by pre-os <7>[ 101.323942] [drm:gen9_enable_dc5 [i915]] Enabling DC5 <7>[ 101.323990] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 01 <7>[ 101.325816] [drm:__bxt_hpd_detection_setup [i915]] Invert bit setting: hp_ctl:10001818 hp_port:38 <7>[ 101.326971] [drm:intel_dp_detect [i915]] [CONNECTOR:65:DP-1] <7>[ 101.327365] [drm:__bxt_hpd_detection_setup [i915]] Invert bit setting: hp_ctl:10001818 hp_port:30 <7>[ 101.327980] [drm:intel_runtime_resume [i915]] Device resumed <7>[ 101.328391] [drm:intel_power_well_enable [i915]] enabling always-on <7>[ 101.328437] [drm:intel_power_well_enable [i915]] enabling DC off <7>[ 101.328733] [drm:gen9_set_dc_state [i915]] Setting DC state from 01 to 00 <7>[ 101.328794] [drm:intel_power_well_enable [i915]] enabling power well 2 <7>[ 101.328931] [drm:intel_power_well_enable [i915]] enabling dpio-common-bc <4>[ 101.329049] ------------[ cut here ]------------ <4>[ 101.329065] DEBUG_LOCKS_WARN_ON(val > preempt_count()) <4>[ 101.329076] WARNING: CPU: 0 PID: 1513 at kernel/sched/core.c:3122 preempt_count_sub+0x55/0x90 <4>[ 101.329103] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i915 lpc_ich r8169 mii snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm pinctrl_broxton pinctrl_intel mei_me prime_numbers mei <4>[ 101.329189] CPU: 0 PID: 1513 Comm: perf_pmu Tainted: G U W 4.15.0-CI-CI_DRM_3717+ #1 <4>[ 101.329207] Hardware name: /NUC6CAYB, BIOS AYAPLCEL.86A.0040.2017.0619.1722 06/19/2017 <4>[ 101.329226] RIP: 0010:preempt_count_sub+0x55/0x90 <4>[ 101.329237] RSP: 0018:ffffc900003ffcc0 EFLAGS: 00010082 <4>[ 101.329250] RAX: 000000000000002a RBX: ffffe8ffffc01dd0 RCX: 0000000000000001 <4>[ 101.329264] RDX: 0000000080000001 RSI: 0000000000000001 RDI: ffffffff810eab88 <4>[ 101.329279] RBP: ffffc900003ffd00 R08: 0000000000000000 R09: 0000000000000000 <4>[ 101.329294] R10: 0000000000000000 R11: ffffffff810eab54 R12: ffffc900003ffd80 <4>[ 101.329308] R13: ffffffff8115ce60 R14: 0000000000000001 R15: ffffc900003ffe10 <4>[ 101.329323] FS: 00007f361df4fa40(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000 <4>[ 101.329340] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 101.329353] CR2: 00007f31efcea848 CR3: 000000026cc8a000 CR4: 00000000003406f0 <4>[ 101.329367] Call Trace: <4>[ 101.329379] _raw_spin_unlock+0x29/0x40 <4>[ 101.329391] generic_exec_single+0x96/0xe0 <4>[ 101.329405] ? __perf_event_output_stop+0xa0/0xa0 <4>[ 101.329417] smp_call_function_single+0x88/0x120 <4>[ 101.329432] ? perf_event_read+0xed/0x2e0 <4>[ 101.329442] ? preempt_count_add+0x44/0x90 <4>[ 101.329452] perf_event_read+0xed/0x2e0 <4>[ 101.329466] __perf_event_read_value+0x3c/0xf0 <4>[ 101.329479] perf_read+0x15e/0x290 <4>[ 101.329490] ? lock_acquire+0xaf/0x200 <4>[ 101.329503] ? free_object+0x6e/0x90 <4>[ 101.329515] __vfs_read+0x1e/0x120 <4>[ 101.329525] ? _raw_spin_unlock_irqrestore+0x39/0x60 <4>[ 101.329537] ? free_object+0x6e/0x90 <4>[ 101.329548] ? entry_SYSCALL_64_fastpath+0x5/0x8f <4>[ 101.329560] vfs_read+0x9e/0x150 <4>[ 101.329571] SyS_read+0x40/0xa0 <4>[ 101.329582] entry_SYSCALL_64_fastpath+0x22/0x8f <4>[ 101.329593] RIP: 0033:0x7f361c4276d0 <4>[ 101.329602] RSP: 002b:00007ffcce301ee8 EFLAGS: 00000246 <4>[ 101.329610] Code: 62 d5 f6 7e c3 e8 3c 59 3a 00 85 c0 74 20 8b 15 5a 57 62 02 85 d2 75 16 48 c7 c6 d0 4a 07 82 48 c7 c7 2b 5f 06 82 e8 ab 4c fd ff <0f> ff c3 f3 c3 84 d2 75 c7 e8 0d 59 3a 00 85 c0 74 f1 8b 05 2b <4>[ 101.329744] ---[ end trace ec00e489ea328d9c ]---
commit b2f78cda260bc6a1a2d382b1d85a29e69b5b3724 Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Date: Mon Feb 5 09:34:48 2018 +0000 drm/i915/pmu: Fix PMU enable vs execlists tasklet race Commit 99e48bf98dd0 ("drm/i915: Lock out execlist tasklet while peeking inside for busy-stats") added a tasklet_disable call in busy stats enabling, but we failed to understand that the PMU enable callback runs as an hard IRQ (IPI). Consequence of this is that the PMU enable callback can interrupt the execlists tasklet, and will then deadlock when it calls intel_engine_stats_enable->tasklet_disable. To fix this, I realized it is possible to move the engine stats enablement and disablement to PMU event init and destroy hooks. This allows for much simpler implementation since those hooks run in normal context (can sleep). v2: Extract engine_event_destroy. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: 99e48bf98dd0 ("drm/i915: Lock out execlist tasklet while peeking inside for busy-stats") Testcase: igt/perf_pmu/enable-race-* Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: intel-gfx@lists.freedesktop.org Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180205093448.13877-1-tvrtko.ursulin@linux.intel.com
Wrong bug, sorry.
commit 1fe699e30113ed6f6e853ff44710d256072ea627 Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Date: Tue Feb 6 18:33:11 2018 +0000 drm/i915/pmu: Fix sleep under atomic in RC6 readout We are not allowed to call intel_runtime_pm_get from the PMU counter read callback since the former can sleep, and the latter is running under IRQ context. To workaround this, we record the last known RC6 and while runtime suspended estimate its increase by querying the runtime PM core timestamps. Downside of this approach is that we can temporarily lose a chunk of RC6 time, from the last PMU read-out to runtime suspend entry, but that will eventually catch up, once device comes back online and in the presence of PMU queries. Also, we have to be careful not to overshoot the RC6 estimate, so once resumed after a period of approximation, we only update the counter once it catches up. With the observation that RC6 is increasing while the device is suspended, this should not pose a problem and can only cause slight inaccuracies due clock base differences. v2: Simplify by estimating on top of PM core counters. (Imre) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104943 Fixes: 6060b6aec03c ("drm/i915/pmu: Add RC6 residency metrics") Testcase: igt/perf_pmu/rc6-runtime-pm Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Imre Deak <imre.deak@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: intel-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180206183311.17924-1-tvrtko.ursulin@linux.intel.com
Fix included in CI_DRM_3735 I will monitor and close if applicable.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.