Bug 108676 - [CI][BAT] igt@*reload/suspend/hibernate* - dmesg-warn - WARNING: possible circular locking dependency detected
Summary: [CI][BAT] igt@*reload/suspend/hibernate* - dmesg-warn - WARNING: possible cir...
Status: CLOSED NOTOURBUG
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: XOrg git
Hardware: Other All
: highest normal
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-11-06 08:40 UTC by Martin Peres
Modified: 2018-12-28 08:57 UTC (History)
1 user (show)

See Also:
i915 platform: BDW, BSW/CHT, BXT, BYT, KBL
i915 features: power/runtime PM


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Peres 2018-11-06 08:40:02 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5089/fi-kbl-guc/igt@pm_rpm@module-reload.html

<7> [197.958145] [drm:intel_power_well_enable [i915]] enabling always-on
<7> [197.958778] [drm:do_gmbus_xfer [i915]] GMBUS [i915 gmbus dpb] NAK for addr: 0050 w(1)
<7> [197.958810] [drm:do_gmbus_xfer [i915]] GMBUS [i915 gmbus dpb] NAK on first message, retry
<7> [197.959385] [drm:do_gmbus_xfer [i915]] GMBUS [i915 gmbus dpb] NAK for addr: 0050 w(1)
<7> [197.959404] [drm:drm_do_probe_ddc_edid] drm: skipping non-existent adapter i915 gmbus dpb
<7> [197.959434] [drm:intel_hdmi_set_edid [i915]] HDMI GMBUS EDID read failed, retry using GPIO bit-banging
<4> [197.959701] 
<4> [197.959702] ======================================================
<4> [197.959702] WARNING: possible circular locking dependency detected
<4> [197.959703] 4.20.0-rc1-CI-CI_DRM_5089+ #1 Tainted: G     U  W        
<4> [197.959704] ------------------------------------------------------
<4> [197.959704] kworker/1:2/304 is trying to acquire lock:
<4> [197.959704] 00000000a039cdda (&p->pi_lock){-.-.}, at: try_to_wake_up+0x38/0x610
<4> [197.959706] 
<4> [197.959707] but task is already holding lock:
<4> [197.959707] 00000000ccf5062d ((console_sem).lock){-...}, at: up+0xd/0x50
<4> [197.959709] 
<4> [197.959709] which lock already depends on the new lock.
<4> [197.959709] 
<4> [197.959710] 
<4> [197.959710] the existing dependency chain (in reverse order) is:
<4> [197.959710] 
<4> [197.959711] -> #3 ((console_sem).lock){-...}:
<4> [197.959712]        down_trylock+0xa/0x30
<4> [197.959713]        __down_trylock_console_sem+0x20/0x80
<4> [197.959713]        console_trylock+0xe/0x60
<4> [197.959714]        vprintk_emit+0xf1/0x320
<4> [197.959714]        printk+0x4d/0x69
<4> [197.959714]        __debug_object_init+0x281/0x560
<4> [197.959715]        virt_efi_get_next_variable+0x9d/0x190
<4> [197.959715]        efivar_init+0x11f/0x3a0
<4> [197.959716]        efisubsys_init+0x155/0x2fc
<4> [197.959716]        do_one_initcall+0x58/0x2e0
<4> [197.959717]        kernel_init_freeable+0x2c0/0x352
<4> [197.959717]        kernel_init+0x5/0x100
<4> [197.959717]        ret_from_fork+0x3a/0x50
<4> [197.959718] 
<4> [197.959718] -> #2 (&obj_hash[i].lock){-.-.}:
<4> [197.959719]        __debug_object_init+0x74/0x560
<4> [197.959720]        hrtimer_init+0x1b/0x100
<4> [197.959721]        init_dl_task_timer+0x17/0x30
<4> [197.959721]        __sched_fork.isra.16+0xa4/0xf0
<4> [197.959721]        init_idle+0x53/0x270
<4> [197.959722]        sched_init+0x421/0x471
<4> [197.959722]        start_kernel+0x272/0x4c9
<4> [197.959722]        secondary_startup_64+0xa4/0xb0
<4> [197.959723] 
<4> [197.959723] -> #1 (&rq->lock){-.-.}:
<4> [197.959724]        task_fork_fair+0x36/0x160
<4> [197.959725]        sched_fork+0x118/0x220
<4> [197.959725]        copy_process.part.6+0x7d0/0x2250
<4> [197.959726]        _do_fork+0xe2/0x6b0
<4> [197.959726]        kernel_thread+0x20/0x30
<4> [197.959726]        rest_init+0x1d/0x250
<4> [197.959727]        start_kernel+0x4a9/0x4c9
<4> [197.959727]        secondary_startup_64+0xa4/0xb0
<4> [197.959727] 
<4> [197.959728] -> #0 (&p->pi_lock){-.-.}:
<4> [197.959729]        _raw_spin_lock_irqsave+0x33/0x50
<4> [197.959730]        try_to_wake_up+0x38/0x610
<4> [197.959730]        up+0x3b/0x50
<4> [197.959731]        __up_console_sem+0x2e/0x50
<4> [197.959731]        console_unlock+0x30b/0x5f0
<4> [197.959731]        vprintk_emit+0xfe/0x320
<4> [197.959732]        printk+0x4d/0x69
<4> [197.959732]        drm_dbg+0x7f/0x90
<4> [197.959732]        intel_hdmi_set_edid+0xd1/0x280 [i915]
<4> [197.959733]        intel_hdmi_detect+0x9f/0xc0 [i915]
<4> [197.959733]        drm_helper_probe_single_connector_modes+0xd4/0x6e0
<4> [197.959734]        drm_setup_crtcs+0x156/0xc90
<4> [197.959734]        __drm_fb_helper_initial_config_and_unlock+0x34/0x4a0
<4> [197.959734]        drm_kms_helper_hotplug_event+0x21/0x30
<4> [197.959735]        output_poll_execute+0x9d/0x1a0
<4> [197.959735]        process_one_work+0x262/0x630
<4> [197.959736]        worker_thread+0x37/0x380
<4> [197.959736]        kthread+0x119/0x130
<4> [197.959737]        ret_from_fork+0x3a/0x50
<4> [197.959737] 
<4> [197.959737] other info that might help us debug this:
<4> [197.959738] 
<4> [197.959738] Chain exists of:
<4> [197.959738]   &p->pi_lock --> &obj_hash[i].lock --> (console_sem).lock
<4> [197.959740] 
<4> [197.959741]  Possible unsafe locking scenario:
<4> [197.959741] 
<4> [197.959741]        CPU0                    CPU1
<4> [197.959742]        ----                    ----
<4> [197.959742]   lock((console_sem).lock);
<4> [197.959743]                                lock(&obj_hash[i].lock);
<4> [197.959744]                                lock((console_sem).lock);
<4> [197.959745]   lock(&p->pi_lock);
<4> [197.959746] 
<4> [197.959746]  *** DEADLOCK ***
<4> [197.959747] 
<4> [197.959747] 7 locks held by kworker/1:2/304:
<4> [197.959747]  #0: 000000002fc62b4e ((wq_completion)"events"){+.+.}, at: process_one_work+0x1d6/0x630
<4> [197.959749]  #1: 00000000e1969fe4 ((work_completion)(&(&dev->mode_config.output_poll_work)->work)){+.+.}, at: process_one_work+0x1d6/0x630
<4> [197.959751]  #2: 00000000a3406f56 (&helper->lock){+.+.}, at: drm_fb_helper_hotplug_event.part.24+0x16/0xb0
<4> [197.959753]  #3: 00000000e80d8032 (&dev->mode_config.mutex){+.+.}, at: drm_setup_crtcs+0x104/0xc90
<4> [197.959755]  #4: 000000000a5dcc0f (crtc_ww_class_acquire){+.+.}, at: drm_helper_probe_single_connector_modes+0x68/0x6e0
<4> [197.959757]  #5: 0000000013fa4faa (crtc_ww_class_mutex){+.+.}, at: drm_modeset_lock+0xb5/0x110
<4> [197.959759]  #6: 00000000ccf5062d ((console_sem).lock){-...}, at: up+0xd/0x50
<4> [197.959761] 
<4> [197.959761] stack backtrace:
<4> [197.959762] CPU: 1 PID: 304 Comm: kworker/1:2 Tainted: G     U  W         4.20.0-rc1-CI-CI_DRM_5089+ #1
<4> [197.959762] Hardware name: System manufacturer System Product Name/Z170M-PLUS, BIOS 3610 03/29/2018
<4> [197.959762] Workqueue: events output_poll_execute
<4> [197.959763] Call Trace:
<4> [197.959764]  dump_stack+0x67/0x9b
<4> [197.959764]  print_circular_bug.isra.16+0x1c8/0x2b0
<4> [197.959765]  __lock_acquire+0x183a/0x1b00
<4> [197.959765]  ? lock_acquire+0xa6/0x1c0
<4> [197.959765]  lock_acquire+0xa6/0x1c0
<4> [197.959766]  ? try_to_wake_up+0x38/0x610
<4> [197.959766]  _raw_spin_lock_irqsave+0x33/0x50
<4> [197.959767]  ? try_to_wake_up+0x38/0x610
<4> [197.959767]  try_to_wake_up+0x38/0x610
<4> [197.959768]  up+0x3b/0x50
<4> [197.959768]  __up_console_sem+0x2e/0x50
<4> [197.959768]  console_unlock+0x30b/0x5f0
<4> [197.959769]  vprintk_emit+0xfe/0x320
<4> [197.959769]  printk+0x4d/0x69
<4> [197.959769]  ? intel_hdmi_set_edid+0xd1/0x280 [i915]
<4> [197.959770]  drm_dbg+0x7f/0x90
<4> [197.959770]  intel_hdmi_set_edid+0xd1/0x280 [i915]
<4> [197.959770]  ? intel_hdmi_detect+0x87/0xc0 [i915]
<4> [197.959771]  ? rcu_read_lock_sched_held+0x6f/0x80
<4> [197.959772]  intel_hdmi_detect+0x9f/0xc0 [i915]
<4> [197.959772]  drm_helper_probe_single_connector_modes+0xd4/0x6e0
<4> [197.959772]  ? __mutex_lock+0x89/0x970
<4> [197.959773]  ? __mutex_lock+0x3f2/0x970
<4> [197.959773]  ? drm_setup_crtcs+0xce/0xc90
<4> [197.959773]  drm_setup_crtcs+0x156/0xc90
<4> [197.959774]  ? drm_fb_helper_hotplug_event.part.24+0x16/0xb0
<4> [197.959775]  ? lockdep_hardirqs_on+0xe0/0x1b0
<4> [197.959775]  __drm_fb_helper_initial_config_and_unlock+0x34/0x4a0
<4> [197.959775]  drm_kms_helper_hotplug_event+0x21/0x30
<4> [197.959776]  output_poll_execute+0x9d/0x1a0
<4> [197.959776]  process_one_work+0x262/0x630
<4> [197.959777]  worker_thread+0x37/0x380
<4> [197.959777]  ? process_one_work+0x630/0x630
<4> [197.959777]  kthread+0x119/0x130
<4> [197.959778]  ? kthread_park+0x80/0x80
<4> [197.959778]  ret_from_fork+0x3a/0x50
<7> [197.959804] [drm:intel_gmbus_force_bit [i915]] enabling bit-banging on i915 gmbus dpb. force bit now 1
Comment 1 Chris Wilson 2018-11-06 09:08:31 UTC
That's an artifact of hitting the debugobject spam early on during boot:

<6>[    1.353329] Registered efivars operations
<4>[    1.353329] ODEBUG: object 0000000015ac3fa7 is NOT on stack 00000000cd66b6c5, but annotated.
<4>[    1.353329] WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:369 __debug_object_init+0x281/0x560
<4>[    1.353329] Modules linked in:
<4>[    1.353329] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.20.0-rc1-CI-CI_DRM_5089+ #1
<4>[    1.353329] Hardware name: System manufacturer System Product Name/Z170M-PLUS, BIOS 3610 03/29/2018
<4>[    1.353329] RIP: 0010:__debug_object_init+0x281/0x560
<4>[    1.353329] Code: 84 55 fe ff ff 83 c0 01 48 89 de 48 c7 c7 50 6f 0d 82 89 05 1d 77 22 02 65 48 8b 04 25 00 4f 01 00 48 8b 50 18 e8 28 d4 c5 ff <0f> 0b 41 8b 44 24 10 83 f8 03 0f 85 2e fe ff ff 4c 89 e7 48 c7 c6
<4>[    1.353329] RSP: 0000:ffffc9000002bd10 EFLAGS: 00010082
<4>[    1.353329] RAX: 0000000000000050 RBX: ffffffff83ff98d0 RCX: 0000000000000000
<4>[    1.353329] RDX: 0000000000000000 RSI: ffffffff82128772 RDI: 00000000ffffffff
<4>[    1.353329] RBP: ffffffff8223c8a0 R08: 0000000000000000 R09: 0000000000000001
<4>[    1.353329] R10: ffffc9000002bc90 R11: 0000000000000000 R12: ffff88022c2982f8
<4>[    1.353329] R13: ffffffff83747dc8 R14: 0000000000000206 R15: 0000000000081900
<4>[    1.353329] FS:  0000000000000000(0000) GS:ffff88022ea00000(0000) knlGS:0000000000000000
<4>[    1.353329] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[    1.353329] CR2: 0000000000000000 CR3: 0000000005210001 CR4: 00000000003606f0
<4>[    1.353329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[    1.353329] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[    1.353333] Call Trace:
<4>[    1.353343]  virt_efi_get_next_variable+0x9d/0x190
<4>[    1.353352]  efivar_init+0x11f/0x3a0
<4>[    1.353361]  ? efivar_ssdt_setup+0x3a/0x3a
<4>[    1.353370]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
<4>[    1.353380]  ? lockdep_hardirqs_on+0xe0/0x1b0
<4>[    1.353389]  ? match_config_table.part.0+0x129/0x129
<4>[    1.353398]  ? efisubsys_init+0x155/0x2fc
<4>[    1.353405]  efisubsys_init+0x155/0x2fc
<4>[    1.353413]  ? match_config_table.part.0+0x129/0x129
<4>[    1.353422]  ? set_debug_rodata+0xc/0xc
<4>[    1.353430]  do_one_initcall+0x58/0x2e0
<4>[    1.353438]  ? set_debug_rodata+0xc/0xc
<4>[    1.353447]  kernel_init_freeable+0x2c0/0x352
<4>[    1.353455]  ? rest_init+0x250/0x250
<4>[    1.353463]  kernel_init+0x5/0x100
<4>[    1.353470]  ret_from_fork+0x3a/0x50
<4>[    1.353480] irq event stamp: 2233218
<4>[    1.353487] hardirqs last  enabled at (2233217): [<ffffffff81953d2c>] _raw_spin_unlock_irqrestore+0x4c/0x60
<4>[    1.353503] hardirqs last disabled at (2233218): [<ffffffff81953b9d>] _raw_spin_lock_irqsave+0xd/0x50
<4>[    1.353517] softirqs last  enabled at (2232116): [<ffffffff81c0033a>] __do_softirq+0x33a/0x4b9
<4>[    1.353532] softirqs last disabled at (2232107): [<ffffffff8108df29>] irq_exit+0xa9/0xc0
<4>[    1.353545] WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:369 __debug_object_init+0x281/0x560
<4>[    1.353564] ---[ end trace 982afa883b783f4b ]---
Comment 2 Chris Wilson 2018-11-06 09:10:50 UTC
Caused by

commit 9dbbedaa6171247c4c7c40b83f05b200a117c2e0
Author: Sai Praneeth <sai.praneeth.prakhya@intel.com>
Date:   Tue Sep 11 12:15:21 2018 -0700

    efi: Make efi_rts_work accessible to efi page fault handler
    
    After the kernel has booted, if any accesses by firmware causes a page
    fault, the efi page fault handler would freeze efi_rts_wq and schedules
    a new process. To do this, the efi page fault handler needs
    efi_rts_work. Hence, make it accessible.
    
    There will be no race conditions in accessing this structure, because
    all the calls to efi runtime services are already serialized.
    
    Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
    Suggested-by: Matt Fleming <matt@codeblueprint.co.uk>
    Based-on-code-from: Ricardo Neri <ricardo.neri@intel.com>
    Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
    Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Comment 3 Chris Wilson 2018-11-06 09:11:35 UTC
Fix en route: 1540304307-2806-1-git-send-email-longman@redhat.com
Comment 4 Daniel Vetter 2018-11-19 15:41:21 UTC
Full link for the lazy (like me):

https://lore.kernel.org/lkml/1540304307-2806-1-git-send-email-longman@redhat.com/


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.