Bug 85920

Summary: [BYT-T] suspend/resume regression from 3.17 kernel version and beyond
Product: DRI
Reporter: Glenn Williamson <glenn.p.williamson>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: critical
Priority: high
CC: intel-gfx-bugs
Version: unspecified
Hardware: x86 (IA32)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:

Description Glenn Williamson 2014-11-05 15:34:33 UTC
I verified this issue again as follows (first identified in 3.17-rc6):

OS: Ubuntu

Kernel: mainline Linux kernel 3.17 + patch (mmc: core: sdio: Fix unconditional wake_up_process() on sdio thread, commit dea67c4ec8218b301d7cac7ee6e63dac0bc566cb)
 This patch is not in the mainline kernel until 3.18-rc1, so it has to be applied on top of 3.17; otherwise an oops occurs.

Kernel config file: config_3.17(attached)

Machine: ASUS T100 (Baytrail-T platform)

Operations:
1. echo devices > /sys/power/pm_test; sleep 1; echo freeze > /sys/power/state
2. echo 0 > /sys/power/pm_async; sleep 1; echo devices > /sys/power/pm_test; sleep 1; echo freeze > /sys/power/state
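The two operations above can also be driven from C. This is a minimal sketch, assuming it runs as root and uses the same sysfs paths and values as the steps; `write_sysfs` and `run_freeze_test` are hypothetical helper names, not part of any kernel or library API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write a value to a sysfs attribute, the C
 * equivalent of `echo value > path`. Returns 0 on success, -1 on failure. */
int write_sysfs(const char *path, const char *value)
{
    assert(path != NULL && value != NULL);

    FILE *f = fopen(path, "w");
    if (f == NULL) {
        perror(path);
        return -1;
    }
    size_t len = strlen(value);
    int ok = fwrite(value, 1, len, f) == len;
    /* The data is only handed to the kernel when the stdio buffer is
     * flushed, so an error from the attribute may first show up here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: operation 1 (disable_async == 0) or operation 2
 * (disable_async != 0). Optionally turn off asynchronous suspend, select
 * the "devices" test level, then request "freeze". Must run as root. */
int run_freeze_test(int disable_async)
{
    if (disable_async) {
        if (write_sysfs("/sys/power/pm_async", "0") != 0)
            return -1;
        sleep(1);
    }
    if (write_sysfs("/sys/power/pm_test", "devices") != 0)
        return -1;
    sleep(1);
    return write_sysfs("/sys/power/state", "freeze");
}
```

Calling `run_freeze_test(1)` performs the same writes as operation 2, so on an affected T100 it would be expected to hang in the same way as the shell version.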

With the cmdline "linux /boot/vmlinuz-3.17.0-19-g71bc931 root=UUID=e69b20a7-3dfb-48f7-839d-8ae476da0ea5 no_console_suspend ro --", both operations above hang forever (black screen).

With the cmdline "linux /boot/vmlinuz-3.17.0-19-g71bc931 root=UUID=e69b20a7-3dfb-48f7-839d-8ae476da0ea5 no_console_suspend ro text nomodeset --", both operations work, and the system freezes and wakes up normally.

Depending on the cmdline, different drivers are bound at run time:

have "text nomodeset": device[0000:00:02.0] driver[pci]
 device[LNXVIDEO:00] driver[acpi]

have no "text nomodeset": device[0000:00:02.0] driver[i915]
 device[LNXVIDEO:00] driver[video]
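One way to check which driver is bound to the graphics device is to read the `driver` symlink in the device's sysfs directory; its last path component is the driver name. This is a minimal sketch, and `bound_driver` is a hypothetical helper. The report does not say which tool printed the `driver[...]` lines above, so its output for the nomodeset case may not match this sketch, which returns NULL whenever no `driver` link exists.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: given a device's sysfs directory (for example
 * /sys/bus/pci/devices/0000:00:02.0), copy the bound driver's name into
 * buf and return buf, or return NULL if no driver is bound. */
const char *bound_driver(const char *dev_dir, char *buf, size_t buflen)
{
    assert(dev_dir != NULL && buf != NULL && buflen > 0);

    char link[4096];
    int n = snprintf(link, sizeof link, "%s/driver", dev_dir);
    if (n < 0 || (size_t)n >= sizeof link)
        return NULL;

    char target[4096];
    ssize_t len = readlink(link, target, sizeof target - 1);
    if (len < 0)
        return NULL;            /* no "driver" link: nothing bound */
    target[len] = '\0';         /* readlink() does not NUL-terminate */

    /* The link points at .../drivers/<name>; keep only <name>. */
    const char *name = strrchr(target, '/');
    name = name ? name + 1 : target;
    snprintf(buf, buflen, "%s", name);
    return buf;
}
```

For example, `bound_driver("/sys/bus/pci/devices/0000:00:02.0", buf, sizeof buf)` would be expected to yield `i915` when booting without "text nomodeset".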

Once the graphics device is suspended, nothing more is printed on the console, even with the no_console_suspend cmdline argument. So I can't determine which driver, or which phase (freeze or wake-up), causes this hang. Nonetheless, the root cause of this issue should be in the "i915" or "video" driver.


1. echo freeze > /sys/power/state
 [ 63.204061] PM: Syncing filesystems ... done.
 [ 63.267909] Freezing user space processes ... (elapsed 0.001 seconds) done.
 [ 63.277521] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
 [ 133.267741] INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 3, t=15010 jiffies, g=1940, c=1939, q=1)
 [ 133.279673] Task dump for CPU 0:
 [ 133.283290] kworker/u8:0 R running task 0 6 2 0x00000008
 [ 133.291213] Workqueue: events_unbound async_run_entry_fn
 [ 133.297167] ffffffff81076284 0000000000000246 ffffffffa030c049 0000000000000246
 [ 133.305479] ffffffffa030d5f2 ffff88003c240000 000000003f367101 00000000ffff199e
 [ 133.313790] ffff88003713bc94 0000000000000246 0000000000000046 ffffffffa030c049
 [ 133.322101] Call Trace:
 [ 133.324855] [<ffffffff81076284>] ? lock_timer_base.isra.38+0x21/0x44
 [ 133.332189] [<ffffffffa030c049>] ? vlv_read32+0x101/0x110 [i915]
 [ 133.339121] [<ffffffffa030d5f2>] ? gen6_write32+0x3a/0x83 [i915]
 [ 133.346052] [<ffffffffa030c049>] ? vlv_read32+0x101/0x110 [i915]
 [ 133.352953] [<ffffffffa02de599>] ? vlv_display_power_well_disable+0x51/0x63 [i915]
 [ 133.361591] [<ffffffffa02de599>] ? vlv_display_power_well_disable+0x51/0x63 [i915]
 [ 133.370243] [<ffffffffa02e258b>] ? intel_display_power_put+0xd8/0x105 [i915]
 [ 133.378347] [<ffffffffa0319fcb>] ? intel_display_set_init_power+0x26/0x30 [i915]
 [ 133.386795] [<ffffffffa02d8973>] ? i915_drm_freeze+0x1b8/0x1c3 [i915]
 [ 133.394120] [<ffffffff811e80d9>] ? pci_pm_suspend+0x77/0xf4
 [ 133.400465] [<ffffffff811e8062>] ? pci_pm_freeze+0xa2/0xa2
 [ 133.406707] [<ffffffff812755f8>] ? dpm_run_callback+0x3a/0x72
 [ 133.413247] [<ffffffff8127613b>] ? __device_suspend+0x1d1/0x25e
 [ 133.419983] [<ffffffff812761dd>] ? async_suspend+0x15/0x4e
 [ 133.426232] [<ffffffff8104de91>] ? async_run_entry_fn+0x55/0x107
 [ 133.433064] [<ffffffff810486cb>] ? process_one_work+0x168/0x27b
 [ 133.439798] [<ffffffff81048c1e>] ? worker_thread+0x1de/0x2b3
 [ 133.446232] [<ffffffff81048a40>] ? cancel_delayed_work_sync+0xa/0xa
 [ 133.453357] [<ffffffff8104c144>] ? kthread+0x9e/0xa6
 [ 133.459023] [<ffffffff8104c0a6>] ? __kthread_parkme+0x55/0x55
 [ 133.465564] [<ffffffff813404ac>] ? ret_from_fork+0x7c/0xb0
 [ 133.471813] [<ffffffff8104c0a6>] ? __kthread_parkme+0x55/0x55

Also reported:
For the T100 (Baytrail-T), this issue still exists with mainline kernel 3.18-rc2. For the Flex 10 (Baytrail-M), the issue does not occur with mainline kernel 3.17.0 or 3.18-rc2.

My Flex 10 (Baytrail-M): VGA compatible controller: Intel Corporation ValleyView Gen7 (rev 0a)
My T100 (Baytrail-T): VGA compatible controller: Intel Corporation ValleyView Gen7 (rev 09)

Display interface on T100: MIPI_DSI
Display interface on Flex 10: eDP
Comment 1 Gordon Jin 2014-11-06 05:11:32 UTC
Changing the title, since the description says BYT, not BSW.
Comment 2 Imre Deak 2014-11-11 15:14:36 UTC
I can reproduce this with 3.18-rc2, but I haven't found any way to get the logs off the machine after the crash. With drm-intel-nightly from git://anongit.freedesktop.org/drm-intel, however, I'm not able to reproduce this issue. Could you give it a try? There are a few i915 suspend/resume fixes there that could explain this.
Comment 3 Shobhit 2014-11-13 08:32:10 UTC
I also confirmed that drm-intel-nightly does not have the issue.
Comment 4 Daniel Vetter 2014-11-18 09:12:09 UTC
A bisect (either of the commit introducing the regression, or of the bugfix in -nightly using a reverse bisect) is also needed here, I think.
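Both kinds of bisect are the same binary search; a reverse bisect only flips what counts as "bad", so the search finds the first commit where the hang no longer reproduces. This is a minimal C sketch of that logic, assuming a linear history where the searched-for state persists once it appears (real `git bisect` also handles merges); `bisect_first` and `simulated_history` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if commit i shows the state being searched for: the hang
 * for a normal bisect, or the absence of the hang for a reverse bisect. */
typedef bool (*commit_test_fn)(size_t i, void *ctx);

/* Commits are ordered oldest to newest. Assumes commit 0 does not show
 * the state and commit n-1 does. Returns the index of the first commit
 * that shows it, running the test about log2(n) times. */
size_t bisect_first(size_t n, commit_test_fn shows_state, void *ctx)
{
    assert(n >= 2);
    size_t good = 0, bad = n - 1;   /* invariant: !test(good) && test(bad) */
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (shows_state(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* Hypothetical simulated history for trying out the search: the state
 * first appears at the index stored in *ctx. */
bool simulated_history(size_t i, void *ctx)
{
    return i >= *(const size_t *)ctx;
}
```

For the regression, the predicate would be "the freeze test hangs" between a good pre-3.17 kernel and a bad 3.17; for the fix, it would be "the freeze test does not hang" between the broken 3.18-rc and the working -nightly.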
Comment 5 Imre Deak 2014-11-18 16:38:26 UTC
Could you try the commit below on top of 3.18-rc5? It got rid of the problem for me. Simply resetting -nightly to that commit doesn't work; there seem to be multiple issues fixed since then:

commit 950eabaf5a87257040e0c207be09487954113f54
Author: Imre Deak <imre.deak@intel.com>
Date:   Mon Sep 8 15:21:09 2014 +0300

    drm/i915: vlv: fix display IRQ enable/disable
    
    We want to enable/disable display IRQs only if global i915 IRQs are
    enabled. To check the latter it's not enough to consult the DRM
    dev->irq_enabled flag, since runtime PM can disable/enable IRQs
    and it won't adjust this flag only the i915 specific
    dev_priv->pm._irqs_disabled flag. Fix this by using the proper
    intel_irqs_enabled() helper instead.
    
    Fortunately this didn't cause an actual problem since even if we enabled
    display IRQs too early (before enabling global i915 IRQs) the
    VLV_MASTER_IER would still be clear masking all IRQs.
    
    This issue was caught by
    
    commit 920dd15a2b2fc60d054646a8a1ffd6aeb6090e05
    Author: Daniel Vetter <daniel.vetter@ffwll.ch>
    Date:   Wed Aug 27 10:43:37 2014 +0200
    
        drm/i915: WARN if interrupts aren't on in en/disable_pipestat
    
    Signed-off-by: Imre Deak <imre.deak@intel.com>
    Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 4847ed5..d22f870 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3723,7 +3723,7 @@ void valleyview_enable_display_irqs(struct drm_i915_private *dev_priv)
 
 	dev_priv->display_irqs_enabled = true;
 
-	if (dev_priv->dev->irq_enabled)
+	if (intel_irqs_enabled(dev_priv))
 		valleyview_display_irqs_install(dev_priv);
 }
 
@@ -3736,7 +3736,7 @@ void valleyview_disable_display_irqs(struct drm_i915_private *dev_priv)
 
 	dev_priv->display_irqs_enabled = false;
 
-	if (dev_priv->dev->irq_enabled)
+	if (intel_irqs_enabled(dev_priv))
 		valleyview_display_irqs_uninstall(dev_priv);
 }
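The patch changes which flag gates the hardware programming. The sketch below is a deliberately simplified, hypothetical model (the `model_*` type and functions are not the real i915 structures or API) of the situation the commit message describes: runtime PM disables IRQs and records that only in the i915-specific flag, so `dev->irq_enabled` can still read true.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model; not the real i915 code. */
struct model_i915 {
    bool drm_irq_enabled;        /* stands in for dev_priv->dev->irq_enabled */
    bool pm_irqs_disabled;       /* stands in for dev_priv->pm._irqs_disabled */
    bool display_irqs_enabled;   /* software request, as in the patch */
    bool display_irqs_installed; /* whether the "hardware" was programmed */
};

/* Stand-in for intel_irqs_enabled(). */
bool model_irqs_enabled(const struct model_i915 *p)
{
    return !p->pm_irqs_disabled;
}

/* Runtime suspend in this model: i915 turns its IRQs off and records
 * that only in its own flag; the DRM core flag keeps its old value. */
void model_runtime_suspend(struct model_i915 *p)
{
    p->pm_irqs_disabled = true;
    p->display_irqs_installed = false;
}

/* Before the patch: gated on the DRM core flag. */
void model_enable_display_irqs_old(struct model_i915 *p)
{
    p->display_irqs_enabled = true;
    if (p->drm_irq_enabled)
        p->display_irqs_installed = true;
}

/* After the patch: gated on the i915-specific state. */
void model_enable_display_irqs_new(struct model_i915 *p)
{
    p->display_irqs_enabled = true;
    if (model_irqs_enabled(p))
        p->display_irqs_installed = true;
}
```

In this model, enabling display IRQs while runtime suspended programs the hardware with the old check but only records the request with the new one; the disable path in the diff changes symmetrically, and re-installing on runtime resume is not modeled.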
Comment 6 Jani Nikula 2014-12-11 14:09:46 UTC
commit c352d1ba1e1e2c8a96af660944a58e86b12ac4af
Author: Imre Deak <imre.deak@intel.com>
Date:   Thu Nov 20 16:05:55 2014 +0200

    drm/i915: vlv: fix IRQ masking when uninstalling interrupts

in drm-intel-next-fixes, cc: stable.
