Bug 84919 - [HSW+nvidia] Kernel 3.17: Resume from suspend to ram issues, 2 GPUs
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-10-11 19:40 UTC by tigrangab
Modified: 2017-07-24 22:51 UTC

See Also:
i915 platform:
i915 features:


Attachments
dmesg output of first boot (108.66 KB, text/plain)
2014-10-13 20:54 UTC, tigrangab
dmesg output of first resume from standby (44.46 KB, text/plain)
2014-10-13 20:54 UTC, tigrangab
dmesg output of second resume from standby (28.62 KB, text/plain)
2014-10-13 20:55 UTC, tigrangab
dmesg output of third resume from standby (71.40 KB, text/plain)
2014-10-13 20:56 UTC, tigrangab

Description tigrangab 2014-10-11 19:40:29 UTC
I have a Lenovo Z50 laptop with an Intel 4400 and an Nvidia 820M GPU. If I set the GPU mode to "UMA only" in the BIOS, the discrete card isn't actually powered off: the kernel cannot see it, but it is still drawing power (idle power usage is 13 W or more according to powertop). If I set the BIOS to "Discrete" and use bbswitch to power off the discrete card, my idle usage is around 8 to 9 W. Regardless of whether bbswitch is enabled, or whether the BIOS is set to UMA only or Discrete, suspend/resume always fails after 4-5 cycles.
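
For reference, a minimal sketch of how the bbswitch state mentioned above can be toggled and read back. It assumes the bbswitch module is loaded and exposes its usual /proc/acpi/bbswitch interface; the helper name bbswitch_state is hypothetical and only used for illustration.

```shell
# Print the power state (ON or OFF) reported by bbswitch.
# $1 = status file to read (default: /proc/acpi/bbswitch, whose single
#      line looks like "0000:03:00.0 OFF").
bbswitch_state() {
    awk '{ print $2 }' "${1:-/proc/acpi/bbswitch}"
}

# Usage, with the bbswitch module loaded:
#   echo OFF | sudo tee /proc/acpi/bbswitch   # ask bbswitch to power off the discrete GPU
#   bbswitch_state                            # prints the state bbswitch reports
```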

The following is some relevant information I could find in dmesg:

[ 5085.085835] ata1.00: exception Emask 0x10 SAct 0x7000000 SErr 0x50000 action 0xe frozen
[ 5085.085838] ata1.00: irq_stat 0x00400000, PHY RDY changed
[ 5085.085840] ata1: SError: { PHYRdyChg CommWake }
[ 5085.085842] ata1.00: failed command: WRITE FPDMA QUEUED
[ 5085.085846] ata1.00: cmd 61/00:c0:00:98:b4/04:00:00:00:00/40 tag 24 ncq 524288 out
         res 50/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x10 (ATA bus error)
[ 5085.085847] ata1.00: status: { DRDY }
[ 5085.085849] ata1.00: failed command: WRITE FPDMA QUEUED
[ 5085.085851] ata1.00: cmd 61/00:c8:00:9c:b4/04:00:00:00:00/40 tag 25 ncq 524288 out
         res 50/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x10 (ATA bus error)
[ 5085.085853] ata1.00: status: { DRDY }
[ 5085.085854] ata1.00: failed command: WRITE FPDMA QUEUED
[ 5085.085857] ata1.00: cmd 61/30:d0:00:a0:b4/02:00:00:00:00/40 tag 26 ncq 286720 out
         res 50/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x10 (ATA bus error)
[ 5085.085859] ata1.00: status: { DRDY }
[ 5085.085861] ata1: hard resetting link
[ 5085.806309] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 5085.810507] ata1.00: configured for UDMA/133
[ 5085.822996] ata1: EH complete
[ 5086.349991] ------------[ cut here ]------------
[ 5086.350024] WARNING: CPU: 0 PID: 4 at drivers/gpu/drm/i915/intel_pm.c:6317 intel_display_power_put+0x14c/0x160 [i915]()
[ 5086.350027] Modules linked in: bbswitch(O) msr cpufreq_stats ctr ccm fuse rtsx_usb_ms memstick uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev snd_hda_codec_hdmi ecb media btusb bluetooth joydev mousedev thinkpad_acpi nvram arc4 coretemp hwmon intel_rapl x86_pkg_temp_thermal intel_powerclamp iwlmvm mac80211 kvm_intel iwlwifi iTCO_wdt iTCO_vendor_support ppdev evdev cfg80211 r8169 kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd mac_hid psmouse pcspkr serio_raw microcode mii i2c_hid hid mei_me snd_hda_codec_conexant snd_hda_codec_generic mei i915 snd_hda_intel snd_hda_controller snd_hda_codec i2c_designware_platform parport_pc video snd_hwdep i2c_designware_core spi_pxa2xx_platform
[ 5086.350080]  parport ideapad_laptop sparse_keymap rfkill battery snd_pcm snd_timer 8250_dw gpio_lynxpoint drm_kms_helper dw_dmac dw_dmac_core drm snd soundcore intel_gtt i2c_algo_bit i2c_i801 i2c_core processor lpc_ich button shpchp ac wmi ext4 crc16 mbcache jbd2 sd_mod sr_mod crc_t10dif cdrom crct10dif_common rtsx_usb_sdmmc rtsx_usb atkbd libps2 ahci xhci_hcd libahci libata scsi_mod ehci_pci ehci_hcd usbcore usb_common i8042 serio sdhci_acpi sdhci led_class mmc_core [last unloaded: bbswitch]
[ 5086.350128] CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G        W  O   3.17.0-1-mainline #1
[ 5086.350131] Hardware name: LENOVO 20354/Lancer 5A5, BIOS 9BCN25WW 04/10/2014
[ 5086.350147] Workqueue: events edp_panel_vdd_work [i915]
[ 5086.350149]  0000000000000000 0000000050d553e6 ffff880243993d38 ffffffff815346e0
[ 5086.350154]  0000000000000000 ffff880243993d70 ffffffff8106e54d ffff880242f0002c
[ 5086.350158]  000000000000000b ffff880242f085d8 ffff880241c6f000 ffff880242f00000
[ 5086.350162] Call Trace:
[ 5086.350172]  [<ffffffff815346e0>] dump_stack+0x4d/0x6f
[ 5086.350178]  [<ffffffff8106e54d>] warn_slowpath_common+0x7d/0xa0
[ 5086.350183]  [<ffffffff8106e67a>] warn_slowpath_null+0x1a/0x20
[ 5086.350196]  [<ffffffffa03b22bc>] intel_display_power_put+0x14c/0x160 [i915]
[ 5086.350212]  [<ffffffffa041fc04>] edp_panel_vdd_off_sync+0xf4/0x1e0 [i915]
[ 5086.350227]  [<ffffffffa041fd54>] edp_panel_vdd_work+0x34/0x50 [i915]
[ 5086.350232]  [<ffffffff81086b85>] process_one_work+0x145/0x400
[ 5086.350236]  [<ffffffff8108714b>] worker_thread+0x6b/0x4a0
[ 5086.350241]  [<ffffffff810870e0>] ? init_pwq.part.22+0x10/0x10
[ 5086.350246]  [<ffffffff8108c06a>] kthread+0xea/0x100
[ 5086.350252]  [<ffffffff8108bf80>] ? kthread_create_on_node+0x1b0/0x1b0
[ 5086.350257]  [<ffffffff8153a5fc>] ret_from_fork+0x7c/0xb0
[ 5086.350262]  [<ffffffff8108bf80>] ? kthread_create_on_node+0x1b0/0x1b0
[ 5086.350264] ---[ end trace 4816bdb8abb63299 ]---

On other occasions, I get the following after resuming, together with the same stack trace as above:

[  107.985876] Restarting tasks ... done.
[  107.989697] pci 0000:03:00.0: Max Payload Size 16384, but upstream 0000:00:1c.4 set to 128; if necessary, use "pci=pcie_bus_safe" and report a bug
[  107.990579] pci_bus 0000:01: Allocating resources
[  107.990613] pci_bus 0000:02: Allocating resources
[  107.990641] pci_bus 0000:03: Allocating resources
[  107.990666] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[  107.990775] pci 0000:03:00.0: Max Payload Size 16384, but upstream 0000:00:1c.4 set to 128; if necessary, use "pci=pcie_bus_safe" and report a bug
[  107.991723] pci_bus 0000:01: Allocating resources
[  107.991751] pci_bus 0000:02: Allocating resources
[  107.991779] pci_bus 0000:03: Allocating resources
[  107.991801] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[  107.991914] pci 0000:03:00.0: Max Payload Size 16384, but upstream 0000:00:1c.4 set to 128; if necessary, use "pci=pcie_bus_safe" and report a bug

There are also always USB errors, which I'm not sure are relevant to the resume failure:

[  106.661972] xhci_hcd 0000:00:14.0: Setup ERROR: setup context command for slot 1.
[  106.661974] usb 2-4: hub failed to enable device, error -22
[  106.821928] usb 2-4: reset high-speed USB device number 2 using xhci_hcd
[  106.821952] xhci_hcd 0000:00:14.0: Setup ERROR: setup context command for slot 1.
[  106.821953] usb 2-4: hub failed to enable device, error -22
[  106.981981] usb 2-4: reset high-speed USB device number 2 using xhci_hcd
[  106.995743] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880242640c00
[  106.995746] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880242640c48
[  106.995747] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880242640c90
[  107.155586] usb 2-6: reset high-speed USB device number 3 using xhci_hcd
[  107.155722] xhci_hcd 0000:00:14.0: Setup ERROR: setup context command for slot 2.
[  107.155726] usb 2-6: hub failed to enable device, error -22
[  107.315515] usb 2-6: reset high-speed USB device number 3 using xhci_hcd
[  107.315576] xhci_hcd 0000:00:14.0: Setup ERROR: setup context command for slot 2.
[  107.315580] usb 2-6: hub failed to enable device, error -22
[  107.475747] usb 2-6: reset high-speed USB device number 3 using xhci_hcd
[  107.549317] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff8802436ddd20
[  107.649285] usb 2-7: reset full-speed USB device number 4 using xhci_hcd
[  107.649388] xhci_hcd 0000:00:14.0: Setup ERROR: setup context command for slot 3.
[  107.649394] usb 2-7: hub failed to enable device, error -22
[  107.809205] usb 2-7: reset full-speed USB device number 4 using xhci_hcd
[  107.809371] xhci_hcd 0000:00:14.0: Setup ERROR: setup context command for slot 3.
[  107.809377] usb 2-7: hub failed to enable device, error -22
[  107.969443] usb 2-7: reset full-speed USB device number 4 using xhci_hcd


Also, is it a BIOS bug that the discrete card does not power off when the BIOS is set to UMA only?
Comment 1 Paulo Zanoni 2014-10-13 18:13:17 UTC
Please boot with "drm.debug=0xe" as a kernel parameter (you can pass it through GRUB), reproduce the problem, then run "dmesg > dmesg.txt" and attach the file here.
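
A minimal sketch of the request above, assuming the parameter was appended to the kernel line in GRUB for that boot. The helper name drm_debug_value is hypothetical; it only confirms that the running kernel was actually booted with the parameter before the log is captured.

```shell
# Print the drm.debug value found in a kernel command line string.
# $1 = command line to inspect (default: the running kernel's /proc/cmdline).
# Prints nothing if the parameter is absent.
drm_debug_value() {
    cmdline="${1:-$(cat /proc/cmdline)}"
    for arg in $cmdline; do
        case "$arg" in
            drm.debug=*) printf '%s\n' "${arg#drm.debug=}" ;;
        esac
    done
}

# Usage after rebooting with drm.debug=0xe and reproducing the problem:
#   drm_debug_value      # should print 0xe
#   dmesg > dmesg.txt    # may need root if dmesg access is restricted
```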
Comment 2 tigrangab 2014-10-13 20:54:02 UTC
Created attachment 107790
dmesg output of first boot
Comment 3 tigrangab 2014-10-13 20:54:28 UTC
Created attachment 107791
dmesg output of first resume from standby
Comment 4 tigrangab 2014-10-13 20:55:51 UTC
Created attachment 107792
dmesg output of second resume from standby
Comment 5 tigrangab 2014-10-13 20:56:10 UTC
Created attachment 107793
dmesg output of third resume from standby
Comment 6 tigrangab 2014-10-13 20:57:44 UTC
I suspended and resumed three times. I cleared dmesg after each resume, so each attached file covers a single resume.

On the fourth resume I got a black screen and had to do a hard reset.
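
The per-resume capture described above could look roughly like this. It is a sketch, assuming util-linux dmesg and a systemd-based system (the exact commands used for the attachments are not stated in this report); count_warnings is a hypothetical helper for a quick check of each saved file.

```shell
# Count kernel WARNING splats in a saved dmesg file ($1).
count_warnings() {
    grep -c 'WARNING: CPU' "$1"
}

# Capture workflow, run as root:
#   dmesg --clear                        # start from an empty ring buffer
#   systemctl suspend                    # then wake the machine manually
#   dmesg --read-clear > resume-1.txt    # save this resume's log and clear the buffer
#   count_warnings resume-1.txt
```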
Comment 7 tigrangab 2014-10-16 21:31:37 UTC
Output of lspci, in case you need it:

00:00.0 Host bridge: Intel Corporation Haswell-ULT DRAM Controller (rev 0b)
00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 0b)
00:03.0 Audio device: Intel Corporation Haswell-ULT HD Audio Controller (rev 0b)
00:14.0 USB controller: Intel Corporation 8 Series USB xHCI HC (rev 04)
00:16.0 Communication controller: Intel Corporation 8 Series HECI #0 (rev 04)
00:1b.0 Audio device: Intel Corporation 8 Series HD Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 8 Series PCI Express Root Port 3 (rev e4)
00:1c.3 PCI bridge: Intel Corporation 8 Series PCI Express Root Port 4 (rev e4)
00:1c.4 PCI bridge: Intel Corporation 8 Series PCI Express Root Port 5 (rev e4)
00:1d.0 USB controller: Intel Corporation 8 Series USB EHCI #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation 8 Series LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 8 Series SATA Controller 1 [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 8 Series SMBus Controller (rev 04)
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 10)
02:00.0 Network controller: Intel Corporation Wireless 3160 (rev 93)
03:00.0 3D controller: NVIDIA Corporation GF117M [GeForce 610M/710M/820M / GT 620M/625M/630M/720M] (rev ff)
Comment 8 Jesse Barnes 2014-12-10 20:51:07 UTC
Can you try the drm-intel-nightly branch from git://anongit.freedesktop.org/drm-intel?  That has some power well fixes that might affect the backtrace and instability, though I doubt it will power off the discrete card automatically in either configuration.
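
One possible way to fetch just that branch, as a sketch: the URL is the one given in this comment and may have moved since, and the function name fetch_drm_intel_nightly is hypothetical.

```shell
REPO_URL="git://anongit.freedesktop.org/drm-intel"   # URL as given in this comment
BRANCH="drm-intel-nightly"

# Shallow-clone only the nightly branch into $1 (default: ./drm-intel).
# Drop --depth 1 if the full history is needed.
fetch_drm_intel_nightly() {
    git clone --depth 1 --branch "$BRANCH" "$REPO_URL" "${1:-drm-intel}"
}

# Usage:
#   fetch_drm_intel_nightly
#   cd drm-intel
```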
Comment 9 tigrangab 2014-12-11 03:27:50 UTC
I'm running Arch and trying to build it by modifying the linux-git AUR package to point to the git repo you linked.

It gets to this point and then fails:

  CC [M]  net/netfilter/ipset/ip_set_list_set.o
  LD [M]  net/wireless/cfg80211.o
  LD [M]  net/netfilter/ipset/ip_set.o
  LD      net/built-in.o
==> ERROR: A failure occurred in build().
    Aborting...

I also tried to build it manually with the following steps, but ran into an issue there as well:

1. zcat /proc/config.gz > .config (all I edited in here is CONFIG_LOCALVERSION="-DRM-INTEL-NIGHTLY")
2. make
3. sudo  make modules_install
4. make bzImage
5. sudo cp -v arch/x86/boot/bzImage /boot/vmlinuz-drm-intel-nightly
6. sudo mkinitcpio -k 3.18.0-DRM-INTEL-NIGHTLY-00849-ga243fbb -c /etc/mkinitcpio.conf -g /boot/initramfs-drm-intel-nightly.img

==> Starting build: 3.18.0-DRM-INTEL-NIGHTLY-00849-ga243fbb
  -> Running build hook: [base]
  -> Running build hook: [udev]
  -> Running build hook: [autodetect]
  -> Running build hook: [modconf]
  -> Running build hook: [block]
  -> Running build hook: [filesystems]
  -> Running build hook: [keyboard]
==> ERROR: module not found: `usbhid'
  -> Running build hook: [fsck]
==> WARNING: No modules were added to the image. This is probably not what you want.
==> Creating gzip-compressed initcpio image: /boot/initramfs-drm-intel-nightly.img
==> WARNING: errors were encountered during the build. The image may not be complete.
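
A "module not found" error from mkinitcpio can be narrowed down by checking that a module tree exists for the exact release string passed to -k. The sketch below is only a sanity check under that assumption, not a diagnosis of this particular failure; have_modules_for is a hypothetical helper name.

```shell
# Succeed only if a module tree exists for the given kernel release.
# $1 = kernel release string, $2 = modules root (default: /lib/modules).
have_modules_for() {
    [ -d "${2:-/lib/modules}/$1" ]
}

# Usage from the kernel source tree, after make and sudo make modules_install:
#   release=$(make -s kernelrelease)
#   have_modules_for "$release" || echo "no modules installed for $release" >&2
#   sudo mkinitcpio -k "$release" -c /etc/mkinitcpio.conf -g /boot/initramfs-drm-intel-nightly.img
```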
Comment 10 tigrangab 2015-01-28 20:29:32 UTC
I was missing a step, but I resolved that soon afterwards. I've been running the latest git build since then, a little over a month now, and I no longer have issues after resuming from standby.

