Bug 99847

Summary: [BAT][IVB] Suspend tests fail due to e1000 driver failing to suspend
Product: DRI
Component: DRM/Intel
Status: CLOSED NOTOURBUG
Severity: normal
Priority: medium
Version: DRI git
Hardware: Other
OS: All
Whiteboard:
i915 platform: IVB
i915 features: power/suspend-resume
Reporter: Tvrtko Ursulin <tvrtko.ursulin>
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
CC: intel-gfx-bugs, jani.saarinen

Description Tvrtko Ursulin 2017-02-17 11:38:45 UTC
https://intel-gfx-ci.01.org/CI/Patchwork_3870/fi-ivb-3520m/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b.html

[  432.585002] ------------[ cut here ]------------
[  432.585013] WARNING: CPU: 3 PID: 8372 at kernel/irq/manage.c:1478 __free_irq+0x9f/0x280
[  432.585015] Trying to free already-free IRQ 20
[  432.585016] Modules linked in: cdc_ncm usbnet x86_pkg_temp_thermal intel_powerclamp coretemp mii crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep lpc_ich snd_hda_core snd_pcm mei_me mei sdhci_pci sdhci i915 mmc_core e1000e ptp pps_core prime_numbers
[  432.585042] CPU: 3 PID: 8372 Comm: kworker/u16:40 Tainted: G     U          4.10.0-rc8-CI-Patchwork_3870+ #1
[  432.585044] Hardware name: LENOVO 2356GCG/2356GCG, BIOS G7ET31WW (1.13 ) 07/02/2012
[  432.585050] Workqueue: events_unbound async_run_entry_fn
[  432.585051] Call Trace:
[  432.585058]  dump_stack+0x67/0x92
[  432.585062]  __warn+0xc6/0xe0
[  432.585065]  warn_slowpath_fmt+0x4a/0x50
[  432.585070]  ? _raw_spin_lock_irqsave+0x49/0x60
[  432.585072]  __free_irq+0x9f/0x280
[  432.585075]  free_irq+0x34/0x80
[  432.585089]  e1000_free_irq+0x65/0x70 [e1000e]
[  432.585098]  e1000e_pm_freeze+0x7a/0xb0 [e1000e]
[  432.585106]  e1000e_pm_suspend+0x21/0x30 [e1000e]
[  432.585113]  pci_pm_suspend+0x71/0x140
[  432.585118]  dpm_run_callback+0x6f/0x330
[  432.585122]  ? pci_pm_freeze+0xe0/0xe0
[  432.585125]  __device_suspend+0xea/0x330
[  432.585128]  async_suspend+0x1a/0x90
[  432.585132]  async_run_entry_fn+0x34/0x160
[  432.585137]  process_one_work+0x1f4/0x6d0
[  432.585140]  ? process_one_work+0x16e/0x6d0
[  432.585143]  worker_thread+0x49/0x4a0
[  432.585145]  kthread+0x107/0x140
[  432.585148]  ? process_one_work+0x6d0/0x6d0
[  432.585150]  ? kthread_create_on_node+0x40/0x40
[  432.585154]  ret_from_fork+0x2e/0x40
[  432.585156] ---[ end trace 6712df7f8c4b9124 ]---
[  433.531342] pci_pm_suspend(): e1000e_pm_suspend+0x0/0x30 [e1000e] returns -2
[  433.531345] dpm_run_callback(): pci_pm_suspend+0x0/0x140 returns -2
[  433.531349] PM: Device 0000:00:19.0 failed to suspend async: error -2
[  433.531439] PM: Some devices failed to suspend, or early wake event detected
[  433.542069] sd 0:0:0:0: [sda] Starting disk
Comment 1 Chris Wilson 2017-02-17 12:23:12 UTC
Note that an error during an earlier e1000e suspend attempt triggered the later failure:

[  429.994338] ACPI : EC: event blocked
[  429.994633] e1000e: EEE TX LPI TIMER: 00000011
[  430.955451] pci_pm_suspend(): e1000e_pm_suspend+0x0/0x30 [e1000e] returns -2
[  430.955454] dpm_run_callback(): pci_pm_suspend+0x0/0x140 returns -2
[  430.955458] PM: Device 0000:00:19.0 failed to suspend async: error -2
[  430.955581] PM: Some devices failed to suspend, or early wake event detected
[  430.957709] ACPI : EC: event unblocked
Comment 2 Chris Wilson 2017-02-17 12:30:03 UTC
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index eccf1da9356b..429a5210230d 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6615,12 +6615,19 @@ static int e1000e_pm_thaw(struct device *dev)
 static int e1000e_pm_suspend(struct device *dev)
 {
        struct pci_dev *pdev = to_pci_dev(dev);
+       int rc;
 
        e1000e_flush_lpic(pdev);
 
        e1000e_pm_freeze(dev);
 
-       return __e1000_shutdown(pdev, false);
+       rc = __e1000_shutdown(pdev, false);
+       if (rc) {
+               e1000e_pm_thaw(dev);
+               return rc;
+       }
+
+       return 0;
 }
 
 static int e1000e_pm_resume(struct device *dev)
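
For readers following the diff above, here is a minimal userspace sketch of why the rollback matters. It is a toy model, not driver code: every `toy_*` name is hypothetical, the "IRQ" is just a boolean, and the shutdown step is assumed to always fail with -2, as in the logs. Without the rollback, the first failed suspend leaves the IRQ freed, so the second attempt hits the "already-free IRQ" path; with the rollback, the thaw step reinstalls it.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for the adapter state; NOT the real struct e1000_adapter. */
struct toy_dev {
    bool irq_requested;  /* true while the IRQ handler is installed */
    int  double_frees;   /* times we tried to free an already-free IRQ */
};

/* Models the freeze step: frees the IRQ, complaining if it is already free. */
static void toy_freeze(struct toy_dev *d)
{
    if (!d->irq_requested) {
        d->double_frees++;
        printf("WARNING: trying to free already-free IRQ\n");
        return;
    }
    d->irq_requested = false;
}

/* Models the thaw step: reinstalls the IRQ handler. */
static void toy_thaw(struct toy_dev *d)
{
    d->irq_requested = true;
}

/* Models the shutdown step; returns -2 when asked to fail. */
static int toy_shutdown(bool fail)
{
    return fail ? -2 : 0;
}

/* One suspend attempt, with or without the rollback from the patch. */
static int toy_suspend(struct toy_dev *d, bool shutdown_fails, bool rollback)
{
    int rc;

    toy_freeze(d);
    rc = toy_shutdown(shutdown_fails);
    if (rc && rollback)
        toy_thaw(d);  /* undo the freeze so the next attempt starts clean */
    return rc;
}

/* Runs `attempts` failing suspends and reports how many double frees occurred. */
int toy_count_double_frees(bool rollback, int attempts)
{
    struct toy_dev d = { .irq_requested = true, .double_frees = 0 };

    for (int i = 0; i < attempts; i++)
        toy_suspend(&d, true, rollback);
    return d.double_frees;
}
```

Calling `toy_count_double_frees(false, 2)` models the two back-to-back failed suspends seen in the logs, while `toy_count_double_frees(true, 2)` models the same sequence with the patch applied.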
Comment 3 Jani Nikula 2017-02-20 08:06:20 UTC
Has this been reported to the e1000 maintainers? They won't hang out at fdo bugzilla looking at DRM/Intel bugs...
Comment 4 Chris Wilson 2017-02-20 17:22:18 UTC
topic/core-for-CI commit ce3000be4f666479e49a4e844bda2a469b0bbb4d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Feb 17 12:30:51 2017 +0000

    e1000e: Undo e1000e_pm_freeze if __e1000_shutdown fails
Comment 5 Chris Wilson 2017-03-02 15:24:52 UTC
Got a response from the e1000e maintainer; I believe an upstream fix is in progress.
Comment 6 Jani Saarinen 2017-03-02 15:41:23 UTC
Should we wait for the real fix, or whitelist this on CI now?
Comment 7 Jani Saarinen 2017-03-02 15:42:30 UTC
Based on the IRC discussion, I will close this now and whitelist it on CI.
Comment 8 Jani Saarinen 2017-03-08 15:22:06 UTC
*** Bug 100114 has been marked as a duplicate of this bug. ***
