Bug 109831

Summary: [CI][BAT] igt@i915_module_load@reload-with-fault-injection - incomplete - WARN_ON(!intel_irqs_enabled(dev_priv))
Product: DRI Reporter: Lakshmi <lakshminarayana.vudum>
Component: DRM/IntelAssignee: Madhumitha Tolakanahalli Pradeep <madhumitha.tolakanahalli.pradeep>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: high CC: intel-gfx-bugs
Version: DRI git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: KBL i915 features: display/DP MST

Description Lakshmi 2019-03-05 11:02:56 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5703/fi-kbl-7560u/igt@i915_module_load@reload-with-fault-injection.html

<4> [342.218287] CPU: 3 PID: 3536 Comm: i915_module_loa Tainted: G     U            5.0.0-CI-CI_DRM_5703+ #1
<4> [342.218368]  drm_fb_helper_fini.part.18+0xb3/0x100
<4> [342.218407]  intel_fbdev_fini+0x20/0x80 [i915]
<4> [342.218441]  intel_modeset_cleanup+0x9a/0x140 [i915]
<4> [342.218461]  i915_driver_unload+0xa7/0x110 [i915]
<4> [342.218481]  i915_pci_remove+0x19/0x30 [i915]
<4> [342.218539]  i915_exit+0x16/0x1c [i915]
<4> [342.594335] WARN_ON(!intel_irqs_enabled(dev_priv))
<4> [342.594362] WARNING: CPU: 1 PID: 627 at drivers/gpu/drm/i915/i915_irq.c:625 bdw_update_pipe_irq+0x108/0x120 [i915]
<4> [342.594455] Workqueue: events_long drm_dp_mst_link_probe_work
<4> [342.594596]  drm_vblank_enable+0x7c/0x120
<4> [342.594600]  drm_vblank_get+0x86/0xa0
<4> [342.594647]  intel_pipe_update_start+0xee/0x3f0 [i915]
<4> [342.594699]  intel_update_crtc+0xc7/0x3d0 [i915]
<4> [342.594777]  intel_atomic_commit_tail+0x207/0x12a0 [i915]
<4> [342.594834]  intel_atomic_commit+0x244/0x330 [i915]
<4> [342.594847]  drm_fb_helper_restore_fbdev_mode_unlocked+0x42/0x90
<4> [342.594852]  drm_fb_helper_set_par+0x24/0x50
<4> [342.594857]  drm_fb_helper_hotplug_event.part.24+0x9d/0xb0
<4> [342.594862]  drm_kms_helper_hotplug_event+0x21/0x30
<4> [342.594867]  drm_dp_send_link_address+0x16c/0x1d0
<4> [342.594872]  drm_dp_check_and_send_link_address+0x82/0xc0
<4> [342.594877]  drm_dp_mst_link_probe_work+0x54/0x70
<4> [342.595009] WARNING: CPU: 1 PID: 627 at drivers/gpu/drm/i915/i915_irq.c:625 bdw_update_pipe_irq+0x108/0x120 [i915]
<3> [353.098218] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:47:pipe A] flip_done timed out
<3> [363.338218] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:47:pipe A] flip_done timed out
<3> [373.578217] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [PLANE:30:plane 1A] flip_done timed out
<3> [373.630266] [drm:intel_pipe_update_start [i915]] *ERROR* PSR idle timed out 0x40010006, atomic update may fail
<3> [373.633251] [drm:intel_pipe_update_start [i915]] *ERROR* Potential atomic update failure on pipe A
<3> [373.633302] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=83 end=84) time 8 us, min 1772, max 1799, scanline start 1799, end 1799
<3> [383.818217] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:47:pipe A] flip_done timed out
<4> [383.818346] WARNING: CPU: 3 PID: 3536 at drivers/gpu/drm/i915/intel_fbc.c:1155 intel_fbc_global_disable+0x5f/0x70 [i915]
<4> [383.818392] CPU: 3 PID: 3536 Comm: i915_module_loa Tainted: G     U  W         5.0.0-CI-CI_DRM_5703+ #1
<4> [383.818434] RIP: 0010:intel_fbc_global_disable+0x5f/0x70 [i915]
<4> [383.818531]  intel_modeset_cleanup+0xa7/0x140 [i915]
<4> [383.818554]  i915_driver_unload+0xa7/0x110 [i915]
<4> [383.818577]  i915_pci_remove+0x19/0x30 [i915]
<4> [383.818638]  i915_exit+0x16/0x1c [i915]
<4> [383.818776] WARNING: CPU: 3 PID: 3536 at drivers/gpu/drm/i915/intel_fbc.c:1155 intel_fbc_global_disable+0x5f/0x70 [i915]
<4> [383.830816] WARNING: CPU: 0 PID: 3536 at drivers/gpu/drm/drm_mode_config.c:456 drm_mode_config_cleanup+0x283/0x2d0
<4> [383.830859] CPU: 0 PID: 3536 Comm: i915_module_loa Tainted: G     U  W         5.0.0-CI-CI_DRM_5703+ #1
<4> [383.830872] RIP: 0010:drm_mode_config_cleanup+0x283/0x2d0
<4> [383.830973]  intel_modeset_cleanup+0xc3/0x140 [i915]
<4> [383.830994]  i915_driver_unload+0xa7/0x110 [i915]
<4> [383.831015]  i915_pci_remove+0x19/0x30 [i915]
<4> [383.831073]  i915_exit+0x16/0x1c [i915]
<4> [383.831183] WARNING: CPU: 0 PID: 3536 at drivers/gpu/drm/drm_mode_config.c:456 drm_mode_config_cleanup+0x283/0x2d0
<3> [383.831222] [drm:drm_mode_config_cleanup] *ERROR* connector eDP-1 leaked!
<4> [383.861445] WARNING: CPU: 2 PID: 3536 at drivers/gpu/drm/i915/intel_runtime_pm.c:4509 intel_runtime_pm_cleanup+0x2a/0x40 [i915]
<4> [383.861496] CPU: 2 PID: 3536 Comm: i915_module_loa Tainted: G     U  W         5.0.0-CI-CI_DRM_5703+ #1
<4> [383.861535] RIP: 0010:intel_runtime_pm_cleanup+0x2a/0x40 [i915]
<4> [383.861628]  i915_pci_remove+0x19/0x30 [i915]
<4> [383.861691]  i915_exit+0x16/0x1c [i915]
<4> [383.861830] WARNING: CPU: 2 PID: 3536 at drivers/gpu/drm/i915/intel_runtime_pm.c:4509 intel_runtime_pm_cleanup+0x2a/0x40 [i915]
<4> [383.871419] WARNING: CPU: 3 PID: 3536 at drivers/gpu/drm/drm_vblank.c:413 drm_vblank_cleanup+0x3b/0x70
<4> [383.871478] CPU: 3 PID: 3536 Comm: i915_module_loa Tainted: G     U  W         5.0.0-CI-CI_DRM_5703+ #1
<4> [383.871496] RIP: 0010:drm_vblank_cleanup+0x3b/0x70
<4> [383.871621]  drm_dev_fini+0x9/0xd0
<4> [383.871661]  i915_driver_release+0x19/0x30 [i915]
<4> [383.871695]  i915_pci_remove+0x21/0x30 [i915]
<4> [383.871780]  i915_exit+0x16/0x1c [i915]
<4> [383.871948] WARNING: CPU: 3 PID: 3536 at drivers/gpu/drm/drm_vblank.c:413 drm_vblank_cleanup+0x3b/0x70
<4> [384.929283] RIP: 0010:drm_setup_crtcs+0x13c/0xc90
<4> [384.929366]  ? drm_fb_helper_is_bound.isra.12+0x71/0xa0
<4> [384.929371]  drm_fb_helper_hotplug_event.part.24+0x84/0xb0
<4> [384.929377]  drm_kms_helper_hotplug_event+0x21/0x30
<0> [384.929487] CPU:1 [LOST 640719 EVENTS]
Comment 1 CI Bug Log 2019-03-05 11:05:08 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* KBL: igt@i915_module_load@reload-with-fault-injection - incomplete - WARN_ON(!intel_irqs_enabled(dev_priv)) 
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5703/fi-kbl-7560u/igt@i915_module_load@reload-with-fault-injection.html
Comment 2 Chris Wilson 2019-03-05 11:08:29 UTC
It's the unflushed worker:

<4>[  342.594335] WARN_ON(!intel_irqs_enabled(dev_priv))
<4>[  342.594362] WARNING: CPU: 1 PID: 627 at drivers/gpu/drm/i915/i915_irq.c:625 bdw_update_pipe_irq+0x108/0x120 [i915]
<4>[  342.594372] Modules linked in: i915(-) vgem cdc_ether r8152 snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_hda_codec ghash_clmulni_intel snd_hwdep snd_hda_core mcs7830 usbnet mii snd_pcm thunderbolt mei_me mei prime_numbers i2c_hid btusb btrtl btbcm btintel bluetooth ecdh_generic pinctrl_sunrisepoint pinctrl_intel [last unloaded: i915]
<4>[  342.594425] CPU: 1 PID: 627 Comm: kworker/1:3 Tainted: G     U  W         5.0.0-CI-CI_DRM_5703+ #1
<4>[  342.594447] Hardware name: Dell Inc. XPS 13 9360/0823VW, BIOS 2.3.1 10/03/2017
<4>[  342.594455] Workqueue: events_long drm_dp_mst_link_probe_work
<4>[  342.594475] RIP: 0010:bdw_update_pipe_irq+0x108/0x120 [i915]
<4>[  342.594480] Code: a0 e8 7c 64 b4 e0 0f 0b 41 80 bc 24 e5 af 00 00 00 0f 85 41 ff ff ff 48 c7 c6 60 4a 70 a0 48 c7 c7 7a 1f 6f a0 e8 58 64 b4 e0 <0f> 0b 5b 5d 41 5c 41 5d c3 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00
<4>[  342.594492] RSP: 0018:ffffc900005239d0 EFLAGS: 00010082
<4>[  342.594496] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
<4>[  342.594501] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffffffff81123e20
<4>[  342.594524] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
<4>[  342.594529] R10: ffffc90000523970 R11: ffffffff82249a38 R12: ffff8881d5100000
<4>[  342.594535] R13: 0000000000000001 R14: ffff8881d51002b8 R15: ffff888271e3efd0
<4>[  342.594540] FS:  0000000000000000(0000) GS:ffff888276280000(0000) knlGS:0000000000000000
<4>[  342.594546] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  342.594551] CR2: 00005570b1d0a850 CR3: 0000000257b36004 CR4: 00000000003606e0
<4>[  342.594571] Call Trace:
<4>[  342.594590]  gen8_enable_vblank+0x33/0x60 [i915]
<4>[  342.594596]  drm_vblank_enable+0x7c/0x120
<4>[  342.594600]  drm_vblank_get+0x86/0xa0
<4>[  342.594647]  intel_pipe_update_start+0xee/0x3f0 [i915]
<4>[  342.594654]  ? wait_woken+0xa0/0xa0
<4>[  342.594699]  intel_update_crtc+0xc7/0x3d0 [i915]
<4>[  342.594729]  skl_update_crtcs+0x266/0x2b0 [i915]
<4>[  342.594777]  intel_atomic_commit_tail+0x207/0x12a0 [i915]
<4>[  342.594784]  ? flush_workqueue+0x236/0x530
<4>[  342.594788]  ? lockdep_hardirqs_on+0xe0/0x1b0
<4>[  342.594834]  intel_atomic_commit+0x244/0x330 [i915]
<4>[  342.594840]  restore_fbdev_mode_atomic+0x1da/0x1f0
<4>[  342.594847]  drm_fb_helper_restore_fbdev_mode_unlocked+0x42/0x90
<4>[  342.594852]  drm_fb_helper_set_par+0x24/0x50
<4>[  342.594857]  drm_fb_helper_hotplug_event.part.24+0x9d/0xb0
<4>[  342.594862]  drm_kms_helper_hotplug_event+0x21/0x30
<4>[  342.594867]  drm_dp_send_link_address+0x16c/0x1d0
<4>[  342.594872]  drm_dp_check_and_send_link_address+0x82/0xc0
<4>[  342.594877]  drm_dp_mst_link_probe_work+0x54/0x70
<4>[  342.594899]  process_one_work+0x245/0x610
<4>[  342.594905]  worker_thread+0x37/0x380
<4>[  342.594909]  ? process_one_work+0x610/0x610
<4>[  342.594913]  kthread+0x119/0x130
<4>[  342.594917]  ? kthread_park+0x80/0x80
<4>[  342.594922]  ret_from_fork+0x3a/0x50
<4>[  342.594927] irq event stamp: 1410618
<4>[  342.594932] hardirqs last  enabled at (1410617): [<ffffffff8197b80c>] _raw_spin_unlock_irqrestore+0x4c/0x60
<4>[  342.594955] hardirqs last disabled at (1410618): [<ffffffff8197445a>] __schedule+0xaa/0xb40
<4>[  342.594979] softirqs last  enabled at (1410284): [<ffffffff81c0033a>] __do_softirq+0x33a/0x4b9
<4>[  342.594987] softirqs last disabled at (1410277): [<ffffffff810b50f1>] irq_exit+0xd1/0xe0
<4>[  342.595009] WARNING: CPU: 1 PID: 627 at drivers/gpu/drm/i915/i915_irq.c:625 bdw_update_pipe_irq+0x108/0x120 [i915]

that appears to be the culprit.
Comment 3 Manasi 2019-06-06 19:52:25 UTC
This failure is seen on KBL machines and last seen 1 day ago.
Failure is in the igt@i915_module_load@reload-with-fault-injection test.
This test injects a failure at various points during the driver load to test whether the unload-load(reload) can happen correctly.

Failure analysis:
Looking through the dmesg logs, the failure occurs at the failure injection checkpoint 9. This injects failure at a point when the worker threads are already created.
Possible Cause:
When we unload the driver due to the failure, the driver is not cleaning up/killing the worker threads correctly. Due to this drm_dp_mst_link_probe_work() continues to do the modeset and expects the vblank irqs in crtc_enable() while the IRqs are already disabled due to the unload causing the warnings about IRqs not enabled.

Possible Fix:
This could be fixed by properly handling the worker thread cleanup/kill in intel_modeset_cleanup() or earlier.

Manasi
Comment 4 Chris Wilson 2019-07-03 20:24:39 UTC
commit 3b295cb1a411d9c82bbfaa66bc17a8508716ed07
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Jun 3 14:58:57 2019 +0100

    drm: Flush output polling on shutdown
    
    We need to mark the output polling as disabled to prevent concurrent
    irqs from queuing new work as shutdown the probe -- causing that work to
    execute after we have freed the structs:
Comment 5 swathi.dhanavanthri 2019-11-22 20:26:13 UTC
No failures seen on drm-tip, closing and archiving this
Comment 6 CI Bug Log 2019-11-22 20:26:44 UTC
The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.