Bug 95166 - [KBL] Suspend to disk does not work after several iterations
Summary: [KBL] Suspend to disk does not work after several iterations
Status: CLOSED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: high normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-04-27 10:31 UTC by cprigent
Modified: 2016-10-28 11:17 UTC
CC List: 1 user

See Also:
i915 platform: KBL
i915 features: power/suspend-resume


Attachments
kern.log (2.46 MB, text/plain)
2016-04-27 10:31 UTC, cprigent
logs.jpg (2.65 MB, image/jpeg)
2016-04-27 10:55 UTC, cprigent
kblu_4.8-rc2_reboot-when-resuming from-s4_kern.log (3.26 MB, text/x-log)
2016-08-23 15:07 UTC, cprigent

Description cprigent 2016-04-27 10:31:59 UTC
Created attachment 123299
kern.log

Hardware
Platform: KABY LAKE-U
CPU : Intel(R) Core(TM) @ 2.60GHz
MCP : KBL-U G0 2+2 (or ULT-G0)
QDF : QYQ8
Chipset PCH: SPT-LP C1
CRB : KABY LAKE U DDR3L RVP7 CRB FAB1

Software
BIOS : KBLSE2R1.R00.X015.B01.1511271314
ME FW : 11.5.0.1008
Ksc (EC FW): 1.20
Linux distribution: Ubuntu 15.10 64 bits
Kernel: drm-intel-nightly 4.6.0-rc4 from http://cgit.freedesktop.org/drm-intel/
  with applied: https://lists.freedesktop.org/archives/intel-gfx/2016-April/094135.html
libdrm 2.4.67-25 cc9a53f from git://git.freedesktop.org/git/mesa/drm
mesa 11.1.2 7bcd827 from git://git.freedesktop.org/git/mesa/mesa
cairo 1.15.2 db8a7f1 from git://git.freedesktop.org/git/cairo
xorg/xserver 1.18.0-274 8437955 from git://git.freedesktop.org/git/xorg/xserver
xf86-video-intel 2.99.917-634 81029be from git://git.freedesktop.org/git/xorg/driver/xf86-video-intel
vaapi/libva 1.7.0-1 2339d10 from git://git.freedesktop.org/git/vaapi/libva
vaapi/intel-driver 1.7.0-8 2c1bec0 from git://git.freedesktop.org/git/vaapi/intel-driver
DMC 1.01 from https://01.org/linuxgraphics/downloads/kabylake-dmc-1.01

Pre-conditions
--------------
Disconnect external screens
Unplug unnecessary PCI cards like ethernet card ...
Boot in text mode (execute command: systemctl set-default multi-user.target and reboot)
Disconnect USB mouse

Steps:
------
1. Suspend to disk and resume:
   sudo -s
   echo disk > /sys/power/state
   Wait 60 seconds
   Resume with the keyboard
2. Wait 30 seconds
3. Repeat steps 1 and 2 100 times
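
For unattended runs, the steps above could be scripted roughly as follows. This is a minimal sketch, not the procedure used for this report: it assumes the RTC alarm (armed with rtcwake from util-linux) can stand in for the keyboard wake-up on this platform, and the names STATE_FILE, ITERATIONS, SUSPEND_SECS and SETTLE_SECS are illustrative.

```shell
#!/bin/sh
# Sketch of the manual steps as a loop. Assumptions (not from the report):
# the RTC alarm replaces the keyboard wake-up, and all variable names below
# are illustrative.
STATE_FILE="${STATE_FILE:-/sys/power/state}"
ITERATIONS="${ITERATIONS:-100}"
SUSPEND_SECS="${SUSPEND_SECS:-60}"   # time to stay hibernated (step 1)
SETTLE_SECS="${SETTLE_SECS:-30}"     # wait after resume (step 2)

hibernate_once() {
    # The write returns after resume, or fails if hibernation was refused.
    echo disk > "$STATE_FILE"
}

run_cycles() {
    i=1
    while [ "$i" -le "$ITERATIONS" ]; do
        echo "iteration $i/$ITERATIONS"
        # Arm the RTC alarm without suspending.
        rtcwake -m no -s "$SUSPEND_SECS" || return 1
        hibernate_once || { echo "hibernate failed at iteration $i" >&2; return 1; }
        sleep "$SETTLE_SECS"
        i=$((i + 1))
    done
}

# Start the loop only when asked explicitly; this needs root.
if [ "${1:-}" = "--run" ]; then
    run_cycles
fi
```

The last iteration number printed before a hang gives the failing cycle, provided the console output is captured somewhere that survives the hang (for example over a serial console).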

Actual result:
--------------
3. DUT stops responding after 8 to 12 iterations (black screen, keyboard and ssh not responding; the charger has to be unplugged to power it off)

Expected result:
----------------
3. System can suspend to DISK and resume 100 times
Comment 1 cprigent 2016-04-27 10:55:29 UTC
Created attachment 123303
logs.jpg

I ran 50 loops of each hibernation test mode:

Freezer mode:
It returned "write error: Device or resource busy" 3 times:
Apr 27 10:10:42 KBLU1 kernel: [  693.773538] Freezing user space processes ...
Apr 27 10:10:42 KBLU1 kernel: [  713.776165] Freezing of tasks failed after 20.001 seconds (1 tasks refusing to freeze, wq_busy=0):
Apr 27 10:10:42 KBLU1 kernel: [  713.776323] fstrim          D ffff8801ded4faa8     0  3757   3756 0x00080004
Apr 27 10:10:42 KBLU1 kernel: [  713.776334]  ffff8801ded4faa8 00000000fffffffb ffff8801e6373c80 ffff8801dfda1e40
Apr 27 10:10:42 KBLU1 kernel: [  713.776341]  ffff8801ded50000 ffff8801edcd6d00 7fffffffffffffff ffff8801dfda1e40
Apr 27 10:10:42 KBLU1 kernel: [  713.776347]  0000000016200000 ffff8801ded4fac0 ffffffff81795b25 0000000000000000
Apr 27 10:10:42 KBLU1 kernel: [  713.776353] Call Trace:
Apr 27 10:10:42 KBLU1 kernel: [  713.776367]  [<ffffffff81795b25>] schedule+0x35/0x80
Apr 27 10:10:42 KBLU1 kernel: [  713.776375]  [<ffffffff8179895f>] schedule_timeout+0x1af/0x260
Apr 27 10:10:42 KBLU1 kernel: [  713.776385]  [<ffffffff810ebffc>] ? ktime_get+0x3c/0xb0
Apr 27 10:10:42 KBLU1 kernel: [  713.776390]  [<ffffffff817950a4>] io_schedule_timeout+0xa4/0x110
Apr 27 10:10:42 KBLU1 kernel: [  713.776397]  [<ffffffff81796eb4>] wait_for_completion_io+0xa4/0x110
Apr 27 10:10:42 KBLU1 kernel: [  713.776403]  [<ffffffff810a27e0>] ? wake_up_q+0x70/0x70
Apr 27 10:10:42 KBLU1 kernel: [  713.776412]  [<ffffffff81389d10>] blkdev_issue_discard+0x1e0/0x230
Apr 27 10:10:42 KBLU1 kernel: [  713.776419]  [<ffffffff812bd169>] ext4_trim_fs+0x489/0x9e0
Apr 27 10:10:42 KBLU1 kernel: [  713.776427]  [<ffffffff8128a329>] ext4_ioctl+0xc59/0x1300
Apr 27 10:10:42 KBLU1 kernel: [  713.776436]  [<ffffffff8120ea92>] do_vfs_ioctl+0x92/0x580
Apr 27 10:10:42 KBLU1 kernel: [  713.776444]  [<ffffffff812009f5>] ? SYSC_newfstat+0x25/0x30
Apr 27 10:10:42 KBLU1 kernel: [  713.776451]  [<ffffffff8120eff9>] SyS_ioctl+0x79/0x90
Apr 27 10:10:42 KBLU1 kernel: [  713.776458]  [<ffffffff81003cc9>] do_syscall_64+0x69/0x110
Apr 27 10:10:42 KBLU1 kernel: [  713.776466]  [<ffffffff81799ba5>] entry_SYSCALL64_slow_path+0x25/0x25
Apr 27 10:10:42 KBLU1 kernel: [  713.776472]

Devices, Platform, Processors and Core modes: all 50 loops succeeded without error.

Then the first normal suspend to disk caused a crash.
I attach "test-mode-of-hibernation_kern.log.tar.gz" and logs.jpg (the screen shows more log output than kern.log).
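
For reference, the test modes can be looped roughly like this. It is only a sketch under assumptions: it relies on the kernel's /sys/power/pm_test interface (available with CONFIG_PM_DEBUG), and the helper name run_mode and the PM_TEST_FILE/STATE_FILE variables are illustrative, not the exact script used here.

```shell
#!/bin/sh
# Hypothetical helper: run hibernation in each pm_test mode and count the
# writes to /sys/power/state that fail (e.g. "Device or resource busy").
PM_TEST_FILE="${PM_TEST_FILE:-/sys/power/pm_test}"
STATE_FILE="${STATE_FILE:-/sys/power/state}"
MODES="freezer devices platform processors core"

run_mode() {
    # $1 = pm_test mode, $2 = number of loops
    mode=$1
    loops=$2
    failures=0
    n=0
    echo "$mode" > "$PM_TEST_FILE" || return 1
    while [ "$n" -lt "$loops" ]; do
        echo disk > "$STATE_FILE" || failures=$((failures + 1))
        n=$((n + 1))
    done
    echo "$mode: $loops loops, $failures failed"
}

# Start only when asked explicitly; this needs root.
if [ "${1:-}" = "--run" ]; then
    for m in $MODES; do
        run_mode "$m" 50
    done
    echo none > "$PM_TEST_FILE"   # back to normal hibernation
fi
```

In each test mode the kernel stops at the named stage and resumes after a short delay, so no wake-up event is needed between loops.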
Comment 2 Imre Deak 2016-04-27 11:31:13 UTC
(In reply to cprigent from comment #1)
> Created attachment 123303 [details]
> logs.jpg
> 
> I launched 50 loops of each test modes of hibernation:
> 
> Freezer mode:
> It returned 3 times "write error: Device or resource busy"
> 
> Devices, Platform, Processors, Core modes: 50 with success and without error.
> 
> Then the 1rst normal suspend to disk caused a crashed.
> I attach "test-mode-of-hibernation_kern.log.tar.gz" and logs.jpg (we see
> more logs in the screen than in kern.log)

The above looks like a filesystem or block device problem involving fstrim. Please make sure your filesystems and block devices are OK by running fsck and fstrim on them manually as necessary.

The screen capture log, on the other hand, shows a crash from the network device, so you may need to avoid using the network, or switch to another device/driver for the test.
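
To see which tasks block the freezer across a whole log, something like the following could be used. It is a rough sketch that assumes the kern.log format quoted above, where the first stuck task is printed on the line right after "Freezing of tasks failed"; the function name frozen_blockers is made up for illustration, and any further stuck tasks in the same failure are not picked up.

```shell
#!/bin/sh
# List the task names reported right after "Freezing of tasks failed".
# Assumes the syslog line layout shown in this report.
frozen_blockers() {
    # $1 = path to a kern.log-style file
    grep -A1 'Freezing of tasks failed' "$1" \
        | grep -v -e 'Freezing of tasks failed' -e '^--$' \
        | sed -E 's/^.*\] *//' \
        | awk '{print $1}' \
        | sort -u
}
```

For example, `frozen_blockers /var/log/kern.log` prints one task name per line; if only fstrim shows up, that supports treating the freezer failures as a storage-side issue.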
Comment 3 yann 2016-04-29 12:11:34 UTC
Milestone criteria blocker so increasing priority
Comment 4 yann 2016-05-10 17:16:09 UTC
Christophe, please re-run this test without the network card and check file system integrity. It appears that this is not linked to i915.
Comment 5 yann 2016-05-17 09:41:19 UTC
Reducing priority due to current milestone impact
Comment 6 cprigent 2016-08-23 13:50:48 UTC
Launched again with a fresh setup. The DUT rebooted after 30 iterations (it suspended successfully but rebooted instead of resuming; no image was restored).

Main errors from kernel log:
[ 2235.450106] Bluetooth: hci0: Failed to load Intel firmware file (-11)
[ 2235.454763] Bluetooth: hci0: Reading Intel version information failed (-4)
[ 2236.196956] [drm:intel_cpu_fifo_underrun_irq_handler] *ERROR* CPU pipe B FIFO underrun
[ 2300.989173] Bluetooth: hci0: Failed to load Intel firmware file (-11)
[ 2300.993571] Bluetooth: hci0: Reading Intel version information failed (-4)
[ 2301.760925] [drm:intel_cpu_fifo_underrun_irq_handler] *ERROR* CPU pipe B FIFO underrun
[ 2364.480257] Bluetooth: hci0: Failed to load Intel firmware file (-11)
[ 2364.485000] Bluetooth: hci0: Reading Intel version information failed (-4)
[ 2365.217483] [drm:intel_cpu_fifo_underrun_irq_handler] *ERROR* CPU pipe B FIFO underrun

I have launched the test again and will attach the full log if the issue reproduces.
Comment 7 cprigent 2016-08-23 15:07:07 UTC
Created attachment 125980
kblu_4.8-rc2_reboot-when-resuming from-s4_kern.log

I reproduced the reboot instead of resuming from S4 after 23 iterations.

Last successful S4:
[ 1568.563040] PM: restore of devices complete after 1737.328 msecs
Main errors:
[ 1434.641124] Bluetooth: hci0: Failed to load Intel firmware file (-11)
[ 1434.641388] Bluetooth: hci0: Reading Intel version information failed (-4)
[ 1435.348562] [drm:intel_cpu_fifo_underrun_irq_handler] *ERROR* CPU pipe B FIFO underrun
[ 1500.179487] Bluetooth: hci0: Failed to load Intel firmware file (-11)
[ 1500.180195] Bluetooth: hci0: Reading Intel version information failed (-4)
[ 1500.884606] [drm:intel_cpu_fifo_underrun_irq_handler] *ERROR* CPU pipe B FIFO underrun
[ 1565.717729] Bluetooth: hci0: Failed to load Intel firmware file (-11)
[ 1565.722352] Bluetooth: hci0: Reading Intel version information failed (-4)
[ 1566.426511] [drm:intel_cpu_fifo_underrun_irq_handler] *ERROR* CPU pipe B FIFO underrun
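
The counts behind a summary like the one above can be pulled from the attached log with a few greps. This is a hedged sketch: the function names and output keys are invented for illustration, and it treats each "PM: restore of devices complete" line as one completed resume, which is an assumption based on the "Last successful S4" line quoted here.

```shell
#!/bin/sh
# Count completed resumes and the recurring errors in a kernel log.
count_matches() {
    # $1 = fixed text to look for, $2 = log file.
    # grep -c prints 0 but exits non-zero when nothing matches.
    grep -c -F -- "$1" "$2" || true
}

summarize_log() {
    log=$1
    echo "resumes_completed=$(count_matches 'PM: restore of devices complete' "$log")"
    echo "bt_fw_load_failures=$(count_matches 'Failed to load Intel firmware file' "$log")"
    echo "fifo_underruns=$(count_matches 'FIFO underrun' "$log")"
}
```

Comparing resumes_completed with the number of iterations started shows whether the last cycle ever reached the restore stage.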

Platform: KABY LAKE-U
Processor : Genuine Intel(R) CPU 0000 @ 1.80GHz (cpu family: 6, model: 142, stepping: 9)
MCP : KBL-U J0 2+3e
QDF : QL9J
PCH: PCH-LP C1
CRB : KABY LAKE U DDR3L RVP7
Rework: O-16

Software
BIOS: 45.1 3KBLSE2R1.R00.X045.P01.1606291634 from https://ubit-artifactory-ba.intel.com/artifactory/owr-repos/Submissions/ifwi/KBL_ORANGE_IFWI_2016_WW27_3_03_SR'17/
ME FW: 11.6.0.1065
EC FW: 1.24
KSC: 1.24
Linux distribution: Ubuntu 16.04 64 bits
Kernel: 4.8.0-rc2 f53a8d1 from http://cgit.freedesktop.org/drm-intel/
    commit f53a8d1853e8a97ad4a6308ffa8a2011fbd80467
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Fri Aug 19 17:24:52 2016 +0100
    drm-intel-nightly: 2016y-08m-19d-16h-24m-21s UTC integration manifest
libdrm-2.4.70-2 b214b05 from git://anongit.freedesktop.org/mesa/drm
mesa: mesa-11.2.2 3a9f628 from git://anongit.freedesktop.org/mesa/mesa
cairo 1.15.2 db8a7f1 from git://anongit.freedesktop.org/cairo
xorg-server-1.18.0-532 6e5bec2 from git://git.freedesktop.org/git/xorg/xserver
xf86-video-intel 2.99.697 12c14de from git://git.freedesktop.org/git/xorg/driver/xf86-video-intel
libva-1.7.0-45 b27feb9 from git://git.freedesktop.org/git/vaapi/libva 
vaapi-intel-driver: 1.7.0-89 b53fad9 from git://git.freedesktop.org/git/vaapi/intel-driver
Intel-Gpu-Tools 1.15 a147ef2 from http://anongit.freedesktop.org/git/xorg/app/intel-gpu-tools.git
Comment 8 Imre Deak 2016-09-27 11:59:50 UTC
(In reply to cprigent from comment #7)
> Created attachment 125980 [details]
> kblu_4.8-rc2_reboot-when-resuming from-s4_kern.log
> 
> I reproduced the reboot instead of resuming from S4 after 23 iterations.

Please try the same test by booting with 'modprobe.blacklist=i915,snd_hda_intel'.
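
Before starting the loop it may be worth confirming that the parameter took effect. A minimal sketch, assuming the usual /proc/cmdline and /proc/modules layouts; the function name blacklist_active and the two *_FILE variables are illustrative:

```shell
#!/bin/sh
# Check that the blacklist parameter is on the kernel command line and that
# neither module ended up loaded anyway.
CMDLINE_FILE="${CMDLINE_FILE:-/proc/cmdline}"
MODULES_FILE="${MODULES_FILE:-/proc/modules}"

blacklist_active() {
    grep -q 'modprobe\.blacklist=i915,snd_hda_intel' "$CMDLINE_FILE" || return 1
    # Each /proc/modules line starts with the module name and a space.
    if grep -q -E '^(i915|snd_hda_intel) ' "$MODULES_FILE"; then
        return 1
    fi
    return 0
}
```

Running `blacklist_active && echo "blacklist in effect"` after boot gives a quick check; with i915 blacklisted the console stays on the firmware framebuffer, so the test has to be driven from text mode or ssh.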
Comment 9 cprigent 2016-10-28 11:17:29 UTC
It will be tested on a production device and a new bug will be reported if needed.

