Bug 103036 - [BAT] igt@gem_exec_suspend@basic-s3 causing circular dependency
Summary: [BAT] igt@gem_exec_suspend@basic-s3 causing circular dependency
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: DRI git
Hardware: Other All
: high critical
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2017-09-29 07:47 UTC by Jani Saarinen
Modified: 2017-10-04 20:01 UTC (History)
1 user (show)

See Also:
i915 platform: CNL
i915 features: GEM/execlists


Attachments

Description Jani Saarinen 2017-09-29 07:47:15 UTC
On CI_DRM_3152 on cnl system in CI igt@gem_exec_suspend@basic-s3
causing error.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3152/fi-cnl-y/igt@gem_exec_suspend@basic-s3.html

Results for igt@gem_exec_suspend@basic-s3

Result: dmesg-warn
Out	
IGT-Version: 1.19-g3df22e0d (x86_64) (Linux: 4.14.0-rc2-CI-CI_DRM_3152+ x86_64)
rtcwake: wakeup from "mem" using /dev/rtc0 at Thu Sep 28 20:29:11 2017
Subtest basic-S3: SUCCESS (9.471s)

Environment	
PIGLIT_PLATFORM="mixed_glx_egl" PIGLIT_SOURCE_DIR="/opt/igt/piglit"
Command	
/opt/igt/libexec/intel-gpu-tools/gem_exec_suspend --run-subtest basic-S3

Dmesg	
[  277.886969] Setting dangerous option reset - tainting kernel
[  279.334108] Suspending console(s) (use no_console_suspend to debug)
[  279.890551] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[  283.912201] snd_hda_intel 0000:00:1f.3: azx_get_response timeout, switching to polling mode: last cmd=0x201f0500
[  284.914685] snd_hda_intel 0000:00:1f.3: No response from codec, disabling MSI: last cmd=0x201f0500
[  285.918189] snd_hda_intel 0000:00:1f.3: azx_get_response timeout, switching to single_cmd mode: last cmd=0x201f0500

[  286.158699] ======================================================
[  286.158700] WARNING: possible circular locking dependency detected
[  286.158703] 4.14.0-rc2-CI-CI_DRM_3152+ #1 Tainted: G     U         
[  286.158704] ------------------------------------------------------
[  286.158706] rtcwake/3262 is trying to acquire lock:
[  286.158708]  ((complete)&st->done){+.+.}, at: [<ffffffff81909e0d>] wait_for_completion+0x1d/0x20
[  286.158718] 
               but task is already holding lock:
[  286.158719]  (sparse_irq_lock){+.+.}, at: [<ffffffff810f22e7>] irq_lock_sparse+0x17/0x20
[  286.158726] 
               which lock already depends on the new lock.

[  286.158727] 
               the existing dependency chain (in reverse order) is:
[  286.158728] 
               -> #1 (sparse_irq_lock){+.+.}:
[  286.158734]        __mutex_lock+0x86/0x9b0
[  286.158736]        mutex_lock_nested+0x1b/0x20
[  286.158740]        irq_lock_sparse+0x17/0x20
[  286.158743]        irq_affinity_online_cpu+0x18/0xd0
[  286.158746]        cpuhp_invoke_callback+0xa3/0x840
[  286.158746] 
               -> #0 ((complete)&st->done){+.+.}:
[  286.158752]        check_prev_add+0x430/0x840
[  286.158755]        __lock_acquire+0x1420/0x15e0
[  286.158758]        lock_acquire+0xb0/0x200
[  286.158760]        wait_for_common+0x58/0x210
[  286.158762]        wait_for_completion+0x1d/0x20
[  286.158764]        takedown_cpu+0x89/0xf0
[  286.158766]        cpuhp_invoke_callback+0xa3/0x840
[  286.158769]        cpuhp_down_callbacks+0x42/0x80
[  286.158770]        _cpu_down+0xb9/0xf0
[  286.158773]        freeze_secondary_cpus+0xa3/0x390
[  286.158775]        suspend_devices_and_enter+0x2fd/0xce0
[  286.158778]        pm_suspend+0x4f0/0x9d0
[  286.158779]        state_store+0x82/0xf0
[  286.158783]        kobj_attr_store+0xf/0x20
[  286.158787]        sysfs_kf_write+0x45/0x60
[  286.158790]        kernfs_fop_write+0x124/0x1c0
[  286.158792]        __vfs_write+0x28/0x130
[  286.158794]        vfs_write+0xcb/0x1c0
[  286.158796]        SyS_write+0x49/0xb0
[  286.158799]        entry_SYSCALL_64_fastpath+0x1c/0xb1
[  286.158800] 
               other info that might help us debug this:

[  286.158800]  Possible unsafe locking scenario:

[  286.158801]        CPU0                    CPU1
[  286.158802]        ----                    ----
[  286.158803]   lock(sparse_irq_lock);
[  286.158805]                                lock((complete)&st->done);
[  286.158807]                                lock(sparse_irq_lock);
[  286.158809]   lock((complete)&st->done);
[  286.158811] 
                *** DEADLOCK ***

[  286.158813] 8 locks held by rtcwake/3262:
[  286.158814]  #0:  (sb_writers#5){.+.+}, at: [<ffffffff81220701>] vfs_write+0x171/0x1c0
[  286.158820]  #1:  (&of->mutex){+.+.}, at: [<ffffffff812a39d2>] kernfs_fop_write+0xf2/0x1c0
[  286.158826]  #2:  (kn->count#210){.+.+}, at: [<ffffffff812a39db>] kernfs_fop_write+0xfb/0x1c0
[  286.158832]  #3:  (pm_mutex){+.+.}, at: [<ffffffff810e60a9>] pm_suspend+0xa9/0x9d0
[  286.158837]  #4:  (acpi_scan_lock){+.+.}, at: [<ffffffff8153b8b7>] acpi_scan_lock_acquire+0x17/0x20
[  286.158844]  #5:  (cpu_add_remove_lock){+.+.}, at: [<ffffffff810811ce>] freeze_secondary_cpus+0x2e/0x390
[  286.158849]  #6:  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff810d676b>] percpu_down_write+0x2b/0x110
[  286.158855]  #7:  (sparse_irq_lock){+.+.}, at: [<ffffffff810f22e7>] irq_lock_sparse+0x17/0x20
[  286.158861] 
               stack backtrace:
[  286.158864] CPU: 2 PID: 3262 Comm: rtcwake Tainted: G     U          4.14.0-rc2-CI-CI_DRM_3152+ #1
[  286.158866] Hardware name: Intel Corporation CannonLake Client Platform/CannonLake Y LPDDR4 RVP, BIOS CNLSFWR1.R00.X095.A01.1707101424 07/10/2017
[  286.158867] Call Trace:
[  286.158872]  dump_stack+0x68/0x9f
[  286.158876]  print_circular_bug+0x235/0x3c0
[  286.158879]  ? lockdep_init_map_crosslock+0x20/0x20
[  286.158882]  check_prev_add+0x430/0x840
[  286.158888]  __lock_acquire+0x1420/0x15e0
[  286.158890]  ? __lock_acquire+0x1420/0x15e0
[  286.158893]  ? lockdep_init_map_crosslock+0x20/0x20
[  286.158897]  lock_acquire+0xb0/0x200
[  286.158900]  ? wait_for_completion+0x1d/0x20
[  286.158903]  wait_for_common+0x58/0x210
[  286.158905]  ? wait_for_completion+0x1d/0x20
[  286.158907]  ? cpuhp_invoke_callback+0x840/0x840
[  286.158912]  ? stop_machine_cpuslocked+0xc1/0xd0
[  286.158914]  ? cpuhp_invoke_callback+0x840/0x840
[  286.158916]  wait_for_completion+0x1d/0x20
[  286.158919]  takedown_cpu+0x89/0xf0
[  286.158921]  ? cpuhp_complete_idle_dead+0x20/0x20
[  286.158924]  cpuhp_invoke_callback+0xa3/0x840
[  286.158927]  cpuhp_down_callbacks+0x42/0x80
[  286.158930]  _cpu_down+0xb9/0xf0
[  286.158933]  freeze_secondary_cpus+0xa3/0x390
[  286.158936]  suspend_devices_and_enter+0x2fd/0xce0
[  286.158940]  pm_suspend+0x4f0/0x9d0
[  286.158943]  state_store+0x82/0xf0
[  286.158947]  kobj_attr_store+0xf/0x20
[  286.158950]  sysfs_kf_write+0x45/0x60
[  286.158953]  kernfs_fop_write+0x124/0x1c0
[  286.158956]  __vfs_write+0x28/0x130
[  286.158960]  ? rcu_read_lock_sched_held+0x7a/0x90
[  286.158962]  ? rcu_sync_lockdep_assert+0x2f/0x60
[  286.158965]  ? __sb_start_write+0x108/0x200
[  286.158968]  vfs_write+0xcb/0x1c0
[  286.158970]  SyS_write+0x49/0xb0
[  286.158974]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  286.158977] RIP: 0033:0x7f2b7e8378f0
[  286.158979] RSP: 002b:00007fffb950fe08 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  286.158981] RAX: ffffffffffffffda RBX: ffffffff81493023 RCX: 00007f2b7e8378f0
[  286.158983] RDX: 0000000000000004 RSI: 0000558d2b42e060 RDI: 0000000000000007
[  286.158984] RBP: ffffc90000293f88 R08: 0000558d2b42bdc0 R09: 00007f2b7ed0b700
[  286.158986] R10: 00007f2b7eb00b58 R11: 0000000000000246 R12: 0000558d2b42bce0
[  286.158987] R13: 0000000000000001 R14: 0000000000000004 R15: 0000000000000004
[  286.158993]  ? __this_cpu_preempt_check+0x13/0x20
[  286.169772] IRQ 32: no longer affine to CPU2
[  286.169781] IRQ 120: no longer affine to CPU2
[  286.180638] IRQ 1: no longer affine to CPU3
[  286.180646] IRQ 8: no longer affine to CPU3
[  286.180651] IRQ 9: no longer affine to CPU3
[  286.180672] IRQ 12: no longer affine to CPU3
[  286.180699] IRQ 16: no longer affine to CPU3
[  286.180726] IRQ 17: no longer affine to CPU3
[  286.180761] IRQ 122: no longer affine to CPU3
[  286.195291]  cache: parent cpu1 should not be sleeping
[  286.198990]  cache: parent cpu2 should not be sleeping
[  286.202167]  cache: parent cpu3 should not be sleeping
[  286.231235] ACPI: button: The lid device is not compliant to SW_LID.
[  286.406396] HDA: we are doing full chip reset now
[  286.423171] snd_hda_codec_hdmi hdaudioC0D2: Unable to sync register 0x2f0d00. -5
[  286.440300] CSR SSP Base Not fine
[  286.440315] ------------[ cut here ]------------
[  286.440334] WARNING: CPU: 2 PID: 5 at drivers/gpu/drm/i915/intel_runtime_pm.c:577 assert_csr_loaded+0x8e/0xc0 [i915]
[  286.440335] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 snd_hda_intel snd_hda_codec asix snd_hwdep usbnet snd_hda_core mii e1000e snd_pcm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ptp ghash_clmulni_intel pps_core prime_numbers i2c_hid
[  286.440354] CPU: 2 PID: 5 Comm: kworker/u8:0 Tainted: G     U          4.14.0-rc2-CI-CI_DRM_3152+ #1
[  286.440355] Hardware name: Intel Corporation CannonLake Client Platform/CannonLake Y LPDDR4 RVP, BIOS CNLSFWR1.R00.X095.A01.1707101424 07/10/2017
[  286.440358] Workqueue: events_unbound async_run_entry_fn
[  286.440360] task: ffff8802677d0040 task.stack: ffffc90000068000
[  286.440377] RIP: 0010:assert_csr_loaded+0x8e/0xc0 [i915]
[  286.440378] RSP: 0000:ffffc9000006bac0 EFLAGS: 00010282
[  286.440380] RAX: 0000000000000015 RBX: ffff880255510000 RCX: 0000000000000000
[  286.440381] RDX: 0000000080000001 RSI: 0000000000000001 RDI: ffffffff810eef26
[  286.440382] RBP: ffffc9000006bac8 R08: 0000000000000001 R09: 0000000000000000
[  286.440382] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880255510000
[  286.440383] R13: ffffffffa0258d38 R14: ffff880255510000 R15: 0000000400000000
[  286.440385] FS:  0000000000000000(0000) GS:ffff880271100000(0000) knlGS:0000000000000000
[  286.440386] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  286.440387] CR2: 0000000000000000 CR3: 0000000003e10001 CR4: 00000000006606e0
[  286.440387] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  286.440388] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  286.440389] PKRU: 00000000
[  286.440390] Call Trace:
[  286.440408]  skl_enable_dc6+0x40/0xb0 [i915]
[  286.440425]  gen9_dc_off_power_well_disable+0x31/0x40 [i915]
[  286.440442]  intel_power_well_disable+0x3e/0x50 [i915]
[  286.440459]  intel_display_power_put+0xb5/0x110 [i915]
[  286.440490]  intel_display_set_init_power+0x2a/0x40 [i915]
[  286.440516]  intel_modeset_setup_hw_state+0xc5b/0xda0 [i915]
[  286.440519]  ? ww_mutex_lock+0x42/0xb0
[  286.440544]  __intel_display_resume+0x1f/0xc0 [i915]
[  286.440568]  intel_display_resume+0xc7/0x100 [i915]
[  286.440585]  i915_pm_restore+0xf4/0x190 [i915]
[  286.440600]  i915_pm_resume+0xe/0x10 [i915]
[  286.440603]  pci_pm_resume+0x74/0xb0
[  286.440606]  dpm_run_callback+0x6f/0x310
[  286.440608]  ? pci_pm_suspend+0x140/0x140
[  286.440611]  device_resume+0xb4/0x1e0
[  286.440614]  ? dpm_watchdog_set+0x70/0x70
[  286.440617]  async_resume+0x1d/0x50
[  286.440619]  async_run_entry_fn+0x38/0x160
[  286.440622]  process_one_work+0x233/0x660
[  286.440625]  worker_thread+0x4e/0x3b0
[  286.440628]  kthread+0x152/0x190
[  286.440629]  ? process_one_work+0x660/0x660
[  286.440631]  ? kthread_create_on_node+0x40/0x40
[  286.440633]  ret_from_fork+0x27/0x40
[  286.440636] Code: 05 77 7d 15 00 01 e8 f3 83 fb e0 0f ff eb af 80 3d 66 7d 15 00 00 75 bd 48 c7 c7 f7 82 24 a0 c6 05 56 7d 15 00 01 e8 d3 83 fb e0 <0f> ff eb a6 80 3d 45 7d 15 00 00 75 b4 48 c7 c7 0e 83 24 a0 c6 
[  286.440674] ---[ end trace 623896c0d288792c ]---
[  286.440676] CSR HTP Not fine
[  286.440686] ------------[ cut here ]------------
[  286.440704] WARNING: CPU: 2 PID: 5 at drivers/gpu/drm/i915/intel_runtime_pm.c:578 assert_csr_loaded+0xae/0xc0 [i915]
[  286.440705] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 snd_hda_intel snd_hda_codec asix snd_hwdep usbnet snd_hda_core mii e1000e snd_pcm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ptp ghash_clmulni_intel pps_core prime_numbers i2c_hid
[  286.440720] CPU: 2 PID: 5 Comm: kworker/u8:0 Tainted: G     U  W       4.14.0-rc2-CI-CI_DRM_3152+ #1
[  286.440721] Hardware name: Intel Corporation CannonLake Client Platform/CannonLake Y LPDDR4 RVP, BIOS CNLSFWR1.R00.X095.A01.1707101424 07/10/2017
[  286.440724] Workqueue: events_unbound async_run_entry_fn
[  286.440725] task: ffff8802677d0040 task.stack: ffffc90000068000
[  286.440742] RIP: 0010:assert_csr_loaded+0xae/0xc0 [i915]
[  286.440743] RSP: 0000:ffffc9000006bac0 EFLAGS: 00010282
[  286.440744] RAX: 0000000000000010 RBX: ffff880255510000 RCX: 0000000000000000
[  286.440745] RDX: 0000000080000001 RSI: 0000000000000001 RDI: ffffffff810eef26
[  286.440746] RBP: ffffc9000006bac8 R08: 0000000000000001 R09: 0000000000000000
[  286.440747] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880255510000
[  286.440748] R13: ffffffffa0258d38 R14: ffff880255510000 R15: 0000000400000000
[  286.440749] FS:  0000000000000000(0000) GS:ffff880271100000(0000) knlGS:0000000000000000
[  286.440750] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  286.440751] CR2: 0000000000000000 CR3: 0000000003e10001 CR4: 00000000006606e0
[  286.440752] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  286.440753] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  286.440753] PKRU: 00000000
[  286.440754] Call Trace:
[  286.440771]  skl_enable_dc6+0x40/0xb0 [i915]
[  286.440787]  gen9_dc_off_power_well_disable+0x31/0x40 [i915]
[  286.440803]  intel_power_well_disable+0x3e/0x50 [i915]
[  286.440820]  intel_display_power_put+0xb5/0x110 [i915]
[  286.440836]  intel_display_set_init_power+0x2a/0x40 [i915]
[  286.440860]  intel_modeset_setup_hw_state+0xc5b/0xda0 [i915]
[  286.440861]  ? ww_mutex_lock+0x42/0xb0
[  286.440885]  __intel_display_resume+0x1f/0xc0 [i915]
[  286.440907]  intel_display_resume+0xc7/0x100 [i915]
[  286.440924]  i915_pm_restore+0xf4/0x190 [i915]
[  286.440939]  i915_pm_resume+0xe/0x10 [i915]
[  286.440941]  pci_pm_resume+0x74/0xb0
[  286.440944]  dpm_run_callback+0x6f/0x310
[  286.440946]  ? pci_pm_suspend+0x140/0x140
[  286.440949]  device_resume+0xb4/0x1e0
[  286.440951]  ? dpm_watchdog_set+0x70/0x70
[  286.440955]  async_resume+0x1d/0x50
[  286.440957]  async_run_entry_fn+0x38/0x160
[  286.440959]  process_one_work+0x233/0x660
[  286.440962]  worker_thread+0x4e/0x3b0
[  286.440964]  kthread+0x152/0x190
[  286.440965]  ? process_one_work+0x660/0x660
[  286.440967]  ? kthread_create_on_node+0x40/0x40
[  286.440969]  ret_from_fork+0x27/0x40
[  286.440972] Code: 05 56 7d 15 00 01 e8 d3 83 fb e0 0f ff eb a6 80 3d 45 7d 15 00 00 75 b4 48 c7 c7 0e 83 24 a0 c6 05 35 7d 15 00 01 e8 b3 83 fb e0 <0f> ff eb 9d 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 
[  286.441010] ---[ end trace 623896c0d288792d ]---
[  286.655622] atkbd serio0: Failed to deactivate keyboard on isa0060/serio0
[  286.686680] snd_hda_codec_realtek hdaudioC0D0: Unable to sync register 0x2b8000. -5
[  286.686799] snd_hda_codec_realtek hdaudioC0D0: Unable to sync register 0x2b8000. -5
[  287.039766] atkbd serio0: Failed to enable keyboard on isa0060/serio0
[  287.368209] Setting dangerous option reset - tainting kernel
Comment 1 Jani Saarinen 2017-09-29 07:48:22 UTC
Some relation to https://bugs.freedesktop.org/show_bug.cgi?id=102505?
Comment 2 Jani Saarinen 2017-09-29 08:15:05 UTC
Olso as reference: https://patchwork.kernel.org/patch/9937251/
Comment 3 krisman 2017-10-02 05:01:44 UTC
(In reply to Jani Saarinen from comment #0)
> On CI_DRM_3152 on cnl system in CI igt@gem_exec_suspend@basic-s3
> causing error.

Same as 103026.  Ok to dup?
Comment 4 Marta Löfstedt 2017-10-02 06:37:23 UTC
We have a bunch of these lockdep bugs open. Let's try-bot the Peter Zijlstra patch and see if it helps.
Comment 5 Marta Löfstedt 2017-10-02 06:51:55 UTC
https://patchwork.freedesktop.org/series/31251/
Comment 6 Marta Löfstedt 2017-10-02 10:39:57 UTC
The fixes from Peter Zijlstra are included in 4.14.0-rc3.
From CI_DRM_3959 drm-tip is 4.14.0-rc3. 
And we are no longer seeing the lockdep.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.