Summary: | [HSW Bisected]WARNING: SPLL already enabled | ||
---|---|---|---|
Product: | DRI | Reporter: | liulei <lei.a.liu> |
Component: | DRM/Intel | Assignee: | Daniel Vetter <daniel> |
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | normal | ||
Priority: | highest | CC: | intel-gfx-bugs, yi.sun |
Version: | unspecified | ||
Hardware: | Other | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Description
liulei
2014-05-08 07:08:49 UTC
I will append the result of the bisect and assign this bug to the author.
==Bisect results==
----------------------------
branch : drm-intel-next-queued
Bisect shows: 0882dae983707455e97479e5e904e37673517ebc is the first bad commit
commit 0882dae983707455e97479e5e904e37673517ebc
Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
AuthorDate: Wed Jan 8 11:12:27 2014 -0200
Commit: Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Wed Jan 8 15:54:09 2014 +0100
drm/i915: fix DDI PLLs HW state readout code
Properly zero the refcounts and crtc->ddi_pll_set so the previous HW state doesn't affect the result of reading the current HW state.
This fixes WARNs about WRPLL refcount if we have an HDMI monitor on HSW and then suspend/resume.
Cc: stable@vger.kernel.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64379
Tested-by: Qingshuai Tian <qingshuai.tian@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Probably fixed with my runtime pm dpms series which completely reworks the hsw ddi pll code. Now if someone would actually review that pile of crap ...

I am having the exact same problem (WARNING reported at intel_ddi_pll_mode_set) with Ubuntu 14.04 LTS, plus frequent crashes/freezes/spontaneous reboots upon resume from hibernation. This happens both with the 3.13 stock kernel (3.13.0-27-generic) and with the current "3.15.0-031500rc8-generic #201406012235" mainline kernel.
i915 version (?) from dmesg: [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
Detailed description, syslog and dmesg output, HW info: see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1326092

This bug causes several of my development machines to crash upon every second resume from S4 (hibernate). I can always suspend/resume once (then I get the warning logged above); if I try again, the machine reboots after resuming.
What is the progress on this issue? Is there anything I can do to help fix this? Thanks!
(In reply to comment #5)
> This bug causes several of my development machines to crash upon every
> second resume from S4 (hibernate). I can always suspend/resume once (then I
> get the warning logged above), if I try again, the machine reboots after
> resuming.
Lei, can you reproduce the crash after 2nd resume?

(In reply to comment #6)
> (In reply to comment #5)
> > This bug causes several of my development machines to crash upon every
> > second resume from S4 (hibernate). I can always suspend/resume once (then I
> > get the warning logged above), if I try again, the machine reboots after
> > resuming.
>
> Lei, can you reproduce the crash after 2nd resume?
I can't reproduce the crash after the 2nd resume. I ran S4 (hibernate) 4 times in a row and only got the warning logged above, no crash. In fact, we have opened a bug to track S4 (hibernate) sporadically causing a system hang: https://bugs.freedesktop.org/show_bug.cgi?id=65496

Can you reproduce the crash using the Ubuntu 14.04 stock kernel?
If not, can I help reproduce it by using some other kernel, or any other bootable image? I'd be happy to help.
Also, there's additional info at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1326092, including a log created with "drm.debug=0xe".
Thank you!

I have checked out the current drm-intel-next-queued branch as of 2 hours ago, built it and tested hibernate with this kernel.
I get a whole bunch of new WARNINGs, like
WARNING: CPU: 0 PID: 6 at drivers/gpu/drm/i915/i915_irq.c:154 ironlake_disable_display_irq+0x75/0x80 [i915]()
WARNING: CPU: 0 PID: 6 at drivers/gpu/drm/i915/i915_irq.c:423 ibx_display_interrupt_update+0x90/0xa0 [i915]()
WARNING: CPU: 3 PID: 6 at drivers/gpu/drm/i915/i915_irq.c:423 ibx_display_interrupt_update+0x90/0xa0 [i915]()
WARNING: CPU: 1 PID: 171 at drivers/gpu/drm/i915/intel_ddi.c:911 intel_ddi_pll_enable+0x248/0x250 [i915]()
WARNING: CPU: 0 PID: 6 at drivers/gpu/drm/i915/i915_irq.c:154 ironlake_disable_display_irq+0x75/0x80 [i915]()
WARNING: CPU: 0 PID: 6 at drivers/gpu/drm/i915/i915_irq.c:423 ibx_display_interrupt_update+0x90/0xa0 [i915]()
WARNING: CPU: 3 PID: 6 at drivers/gpu/drm/i915/i915_irq.c:423 ibx_display_interrupt_update+0x90/0xa0 [i915]()
WARNING: CPU: 1 PID: 171 at drivers/gpu/drm/i915/intel_ddi.c:911 intel_ddi_pll_enable+0x248/0x250 [i915]()
WARNING: CPU: 0 PID: 7177 at drivers/gpu/drm/i915/i915_irq.c:154 ironlake_disable_display_irq+0x75/0x80 [i915]()
WARNING: CPU: 0 PID: 7177 at drivers/gpu/drm/i915/i915_irq.c:423 ibx_display_interrupt_update+0x90/0xa0 [i915]()
WARNING: CPU: 0 PID: 7177 at drivers/gpu/drm/i915/i915_irq.c:423 ibx_display_interrupt_update+0x90/0xa0 [i915]()
WARNING: CPU: 1 PID: 7178 at drivers/gpu/drm/i915/intel_ddi.c:911 intel_ddi_pll_enable+0x248/0x250 [i915]()
but I do not get the spontaneous reboots any more.
Instead, after a couple of suspend/resume cycles (3 in my case) I get tons of these:
[ 665.971162] BUG: Bad page map in process lxpanel pte:dd000000dc0000 pmd:35d6b067
[ 665.971163] addr:00007f7fe682e000 vm_flags:08000070 anon_vma: (null) mapping:ffff880211963220 index:1de
[ 665.971164] vma->vm_ops->fault: filemap_fault+0x0/0x430
[ 665.971165] vma->vm_file->f_op->mmap: ext4_file_mmap+0x0/0x60
[ 665.971166] CPU: 3 PID: 3270 Comm: lxpanel Tainted: G B W OE 3.16.0-rc2+ #4
[ 665.971166] Hardware name: MSI MS-7817/CSM-B85M-E45 (MS-7817), BIOS V10.5 05/30/2014
[ 665.971167] ffff8800352b1000 ffff88003631bad8 ffffffff81739d4e 00007f7fe682e000
[ 665.971168] ffff88003631bb20 ffffffff8118704a 00dd000000dc0000 00000000000001de
[ 665.971169] ffff880035d6b170 00dd000000dc0000 00007f7fe6955000 00007f7fe682e000
[ 665.971171] Call Trace:
[ 665.971173] [<ffffffff81739d4e>] dump_stack+0x45/0x56
[ 665.971174] [<ffffffff8118704a>] print_bad_pte+0x1aa/0x250
[ 665.971175] [<ffffffff811883de>] unmap_single_vma+0x5de/0x8d0
[ 665.971176] [<ffffffff81189699>] unmap_vmas+0x49/0x90
[ 665.971177] [<ffffffff811920fc>] exit_mmap+0x9c/0x170
[ 665.971179] [<ffffffff8111b893>] ? __delayacct_add_tsk+0x153/0x170
[ 665.971180] [<ffffffff8106992c>] mmput+0x5c/0x120
[ 665.971182] [<ffffffff8106ecdc>] do_exit+0x26c/0xa60
[ 665.971183] [<ffffffff8173d62e>] ? schedule_timeout_killable+0x1e/0x20
[ 665.971185] [<ffffffff81161a8c>] ? out_of_memory+0x49c/0x4d0
[ 665.971186] [<ffffffff8106f54f>] do_group_exit+0x3f/0xa0
[ 665.971187] [<ffffffff8107ee40>] get_signal_to_deliver+0x1d0/0x6f0
[ 665.971189] [<ffffffff81012548>] do_signal+0x48/0x9d0
[ 665.971190] [<ffffffff8101c1d5>] ? native_sched_clock+0x35/0x90
[ 665.971192] [<ffffffff8101c239>] ? sched_clock+0x9/0x10
[ 665.971194] [<ffffffff8111ce9c>] ? acct_account_cputime+0x1c/0x20
[ 665.971195] [<ffffffff810a335b>] ? account_user_time+0x8b/0xa0
[ 665.971197] [<ffffffff810a3924>] ? vtime_account_user+0x54/0x60
[ 665.971198] [<ffffffff81012f39>] do_notify_resume+0x69/0xb0
[ 665.971199] [<ffffffff817432d8>] retint_signal+0x48/0x90
[ 665.971200] swap_free: Bad swap offset entry 37c000003780

(In reply to comment #8)
> Can you reproduce the crash using Ubuntu 14.04 stock kernel?
>
> If not, can I help reproducing it by using some other kernel, or any other
> bootable image? I'd be happy to help.
>
I would be impressed if you could offer me an image that reproduces the trouble.
> Also, there's additional info at
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1326092, including a
> log created with "drm.debug=0xe".
>
> Thank you!

Here you are:
https://rb-hosting.de/owncloud/public.php?service=files&t=7d4ea1d33dfae6f5d0868425601cfe44
This directory contains
- two .deb files (kernel-headers, kernel-image) built on Ubuntu 14.04
- appropriate syslog file showing the first resume with the WARNINGs and drm.debug=0xe
(available until 2014-07-31)
With the new kernel I have to suspend/resume 2..4 times to get a crash.
After each crash and subsequent reboot, there's an Ubuntu app asking "System problem detected, do you want to report it?" It's called 'whoopsie' and it will create a crash report at launchpad.net, AFAIK. Its logfile is also stored at the above URL.
Unfortunately I was not (yet) able to find the actual Oops message it complains about. In the past it was logged to syslog, but now I can't find it. I will post it when I get it to display.

(In reply to comment #11)
> Here you are:
>
> https://rb-hosting.de/owncloud/public.php?service=files&t=7d4ea1d33dfae6f5d0868425601cfe44
>
> This directory contains
> - two .deb files (kernel-headers, kernel-image) built on Ubuntu 14.04
> - appropriate syslog file showing the first resume with the WARNINGs and
> drm.debug=0xe
>
> (available until 2014-07-31)
>
> With the new kernel I have to suspend/resume 2..4 times to get a crash.
> After each crash and subsequent reboot, there's an Ubuntu app asking "System problem
> detected, do you want to report it?" It's called 'whoopsie' and it will
> create a crash report at launchpad.net, AFAIK. Its logfile is also stored at
> the above URL.
>
> Unfortunately I was not (yet) able to find the actual Oops message it
> complains about, in the past it was logged to syslog but now I can't find
> it. I will post it when I get it to display.
With the image you offered I didn't get a crash after suspending/resuming 5 times.

What hardware are you using? I have a
MSI MS-7817/CSM-B85M-E45 (MS-7817), BIOS V10.5 05/30/2014
see also http://www.msi.com/product/mb/B85ME45.html.
Maybe this is MSI (or BIOS setting) specific.
Can you get access to a comparable chipset to test this?
Do the errors logged to syslog make any sense to you at all?
Thank you!

Created attachment 102153 [details]
Oops on first resume with 3.16.0rc2+ on MSI B85M-E45 mainboard
This is an oops log with drm.debug=0xe after the first resume from hibernation. Note that resume from sleep is never an issue, just hibernation.
Also I have this problem on two machines, B81M chipset and B85 chipset.
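For anyone comparing logs between kernels in this thread, the WARNING headers can be tallied by source location to see which ones are new and which one (the PLL-related site in intel_ddi.c) persists. The following is only a minimal Python sketch, assuming the dmesg or syslog output has been saved as unwrapped text; the function name `tally_warnings` and the regular expression are illustrative and not part of any kernel or i915 tooling.

```python
import re
from collections import Counter

# Matches WARNING headers of the form quoted in this thread, e.g.
# "WARNING: CPU: 1 PID: 171 at drivers/gpu/drm/i915/intel_ddi.c:911 intel_ddi_pll_enable+0x248/0x250 [i915]()"
# Headers wrapped across lines (as in quoted replies) are not matched.
WARNING_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+):(\d+) (\w+)\+")


def tally_warnings(log_text):
    """Count WARNING headers per (source file, line number, function)."""
    counts = Counter()
    for match in WARNING_RE.finditer(log_text):
        path, line, func = match.groups()
        counts[(path, int(line), func)] += 1
    return counts
```

Feeding it the text of a saved log, for example `tally_warnings(open("dmesg.txt").read()).most_common()` with a hypothetical file name, lists the most frequent warning sites first.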
Here is another dmesg after first resume. There are some more warnings compared to the stock Ubuntu kernel, but the main one (PLL related) is the same:
[ 83.583002] [drm:intel_ddi_pll_select] Using SPLL on pipe A
[ 83.583003] ------------[ cut here ]------------
[ 83.583023] WARNING: CPU: 1 PID: 173 at drivers/gpu/drm/i915/intel_ddi.c:911 intel_ddi_pll_enable+0x248/0x250 [i915] ()
[ 83.583023] SPLL already enabled
(...)
Updated Git kernel image is building right now. I noticed there is an "intel-fixes-2014-07-03" branch in the Git repo, is this something worth trying out for me?
Anything else I can help with? Provide access to appropriate hardware perhaps?

No change with the 3.16.0rc2+ image (taken from Git) compiled today. Still the same WARNINGs after hibernate and a kernel Oops after resuming three times.
Date: Wed Jul 2 20:10:08 2014
Failure: oops
OopsText: general protection fault: 0000 [#3] SMP
Modules linked in: btrfs(E) xor(E) raid6_pq(E) ufs(E) qnx4(E) hfsplus(E) hfs(E) minix(E) ntfs(E) msdos(E) jfs(E) xfs(E) libcrc32c(E) pci_stub(E) vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) bnep(E) rfcomm(E) bluetooth(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) snd_hda_intel(E) snd_hda_controller(E) snd_hda_codec(E) snd_hwdep(E) snd_pcm(E) intel_rapl(E) snd_seq_midi(E) snd_seq_midi_event(E) snd_rawmidi(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) snd_seq(E) coretemp(E) snd_seq_device(E) snd_timer(E) kvm_intel(E) kvm(E) snd(E) mei_me(E) mei(E) soundcore(E) lpc_ich(E) serio_raw(E) shpchp(E) mac_hid(E) tpm_infineon(E) intel_smartconnect(E) parport_pc(E) ppdev(E) lp(E) parport(E) dm_crypt(E) hid_generic(E) usbhid(E) hid(E) mxm_wmi(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) i915(E) ahci(E) i2c_algo_bit(E) drm_kms_helper(E) libahci(E) r8169(E) mii(E) drm(E) wmi(E) video(E)
CPU: 1 PID: 9815 Comm: Xorg Tainted: G D W OE 3.16.0-rc2+ #4
Hardware name: MSI MS-7817/CSM-B85M-E45 (MS-7817), BIOS V10.5 05/30/2014
task: ffff8801eba80000 ti: ffff880210474000 task.ti: ffff880210474000
RIP: 0010:[<ffffffff811da5dd>] [<ffffffff811da5dd>] __inode_permission+0x5d/0xc0
RSP: 0018:ffff880210477cc8 EFLAGS: 00010246
RAX: 006f0000006e0000 RBX: ffff880036252f98 RCX: 0000000000000018
RDX: ffff8802130cf2e0 RSI: 0000000000000081 RDI: ffff880036252f98
RBP: ffff880210477ce0 R08: 647261632f697264 R09: ffff880210477cc4
R10: ffff8800d4a43025 R11: 0000000000000003 R12: 0000000000000081
R13: 0000000000000000 R14: 0000000000000000 R15: ffff880210477e50
FS: 00007f6d65a8e9c0(0000) GS:ffff88021ea80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fff16d3ee38 CR3: 00000000d4bc4000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
ffff8800d4a43029 ffff8801eba80000 0000000000000000 ffff880210477cf0
ffffffff811da658 ffff880210477d98 ffffffff811dab01 ffff880210477d48
ffffffff811b3986 ffff8801eba80000 ffffffff8131eb73 ffff8801eba80000
Call Trace:
[<ffffffff811da658>] inode_permission+0x18/0x50
[<ffffffff811dab01>] link_path_walk+0x71/0x870
[<ffffffff811b3986>] ? kmem_cache_alloc_trace+0x1c6/0x1f0
[<ffffffff8131eb73>] ? apparmor_file_alloc_security+0x23/0x40
[<ffffffff812e41d6>] ? security_file_alloc+0x16/0x20
[<ffffffff811defac>] path_openat+0x9c/0x670
[<ffffffff8120c121>] ? send_to_group+0xd1/0x1b0
[<ffffffff811dfd8a>] do_filp_open+0x3a/0x90
[<ffffffff811ec8f7>] ? __alloc_fd+0xa7/0x130
[<ffffffff811ce898>] do_sys_open+0x128/0x220
[<ffffffff81021ac5>] ? syscall_trace_enter+0x145/0x250
[<ffffffff811ce9ae>] SyS_open+0x1e/0x20
[<ffffffff817426ff>] tracesys+0xe1/0xe6
Code: 41 5d 5d c3 66 2e 0f 1f 84 00 00 00 00 00 8b 43 4c 85 c0 75 36 44 89 e6 48 89 df e8 9e 98 10 00 5b 41 5c 41 5d 5d c3 48 8b 43 20 <48> 8b 40 10 48 85 c0 74 35 44 89 e6 48 89 df ff d0 eb bb f6 47
RIP [<ffffffff811da5dd>] __inode_permission+0x5d/0xc0
RSP <ffff880210477cc8>
---[ end trace 45c4f49310fca543 ]---

(In reply to comment #15)
> Here is another dmesg after first resume. There are some more warnings
> compared to the stock Ubuntu kernel, but the main one (PLL related) is the
> same:
>
> [ 83.583002] [drm:intel_ddi_pll_select] Using SPLL on pipe A
> [ 83.583003] ------------[ cut here ]------------
> [ 83.583023] WARNING: CPU: 1 PID: 173 at drivers/gpu/drm/i915/intel_ddi.c:911 intel_ddi_pll_enable+0x248/0x250 [i915] ()
> [ 83.583023] SPLL already enabled
> (...)
>
We have opened a bug to track this issue.
> Updated Git kernel image is building right now. I noticed there is a
> "intel-fixes-2014-07-03" branch in the Git repo, is this something worth
> trying out for me?
>
I don't think it will help you out of trouble.
> Anything else I can help with? Provide access to appropriate hardware
> perhaps?
(In reply to comment #9)
> I have checked out the current drm-intel-next-queued branch as of 2 hours
> ago, built it and tested hibernate with this kernel.
> I get a whole bunch of new WARNINGs, like
>
> WARNING: CPU: 0 PID: 6 at drivers/gpu/drm/i915/i915_irq.c:154 ironlake_disable_display_irq+0x75/0x80 [i915]()
> WARNING: CPU: 0 PID: 6 at drivers/gpu/drm/i915/i915_irq.c:423 ibx_display_interrupt_update+0x90/0xa0 [i915]()
> WARNING: CPU: 3 PID: 6 at drivers/gpu/drm/i915/i915_irq.c:423 ibx_display_interrupt_update+0x90/0xa0 [i915]()
> WARNING: CPU: 1 PID: 171 at drivers/gpu/drm/i915/intel_ddi.c:911 intel_ddi_pll_enable+0x248/0x250 [i915]()
> WARNING: CPU: 0 PID: 6 at drivers/gpu/drm/i915/i915_irq.c:154 ironlake_disable_display_irq+0x75/0x80 [i915]()
> WARNING: CPU: 0 PID: 6 at drivers/gpu/drm/i915/i915_irq.c:423 ibx_display_interrupt_update+0x90/0xa0 [i915]()
> WARNING: CPU: 3 PID: 6 at drivers/gpu/drm/i915/i915_irq.c:423 ibx_display_interrupt_update+0x90/0xa0 [i915]()
> WARNING: CPU: 1 PID: 171 at drivers/gpu/drm/i915/intel_ddi.c:911 intel_ddi_pll_enable+0x248/0x250 [i915]()
> WARNING: CPU: 0 PID: 7177 at drivers/gpu/drm/i915/i915_irq.c:154 ironlake_disable_display_irq+0x75/0x80 [i915]()
> WARNING: CPU: 0 PID: 7177 at drivers/gpu/drm/i915/i915_irq.c:423 ibx_display_interrupt_update+0x90/0xa0 [i915]()
> WARNING: CPU: 0 PID: 7177 at drivers/gpu/drm/i915/i915_irq.c:423 ibx_display_interrupt_update+0x90/0xa0 [i915]()
> WARNING: CPU: 1 PID: 7178 at drivers/gpu/drm/i915/intel_ddi.c:911 intel_ddi_pll_enable+0x248/0x250 [i915]()
>
We have opened bugs for the above issues, too, so it is expected that you see them.
> but I do not get the spontaneous reboots any more.
> Instead, after a couple of suspend/resume cycles (3 in my case) I get tons of these:
>
> [ 665.971162] BUG: Bad page map in process lxpanel pte:dd000000dc0000 pmd:35d6b067
> [ 665.971163] addr:00007f7fe682e000 vm_flags:08000070 anon_vma: (null) mapping:ffff880211963220 index:1de
> [ 665.971164] vma->vm_ops->fault: filemap_fault+0x0/0x430
> [ 665.971165] vma->vm_file->f_op->mmap: ext4_file_mmap+0x0/0x60
> [ 665.971166] CPU: 3 PID: 3270 Comm: lxpanel Tainted: G B W OE 3.16.0-rc2+ #4
> [ 665.971166] Hardware name: MSI MS-7817/CSM-B85M-E45 (MS-7817), BIOS V10.5 05/30/2014
> [ 665.971167] ffff8800352b1000 ffff88003631bad8 ffffffff81739d4e 00007f7fe682e000
> [ 665.971168] ffff88003631bb20 ffffffff8118704a 00dd000000dc0000 00000000000001de
> [ 665.971169] ffff880035d6b170 00dd000000dc0000 00007f7fe6955000 00007f7fe682e000
> [ 665.971171] Call Trace:
> [ 665.971173] [<ffffffff81739d4e>] dump_stack+0x45/0x56
> [ 665.971174] [<ffffffff8118704a>] print_bad_pte+0x1aa/0x250
> [ 665.971175] [<ffffffff811883de>] unmap_single_vma+0x5de/0x8d0
> [ 665.971176] [<ffffffff81189699>] unmap_vmas+0x49/0x90
> [ 665.971177] [<ffffffff811920fc>] exit_mmap+0x9c/0x170
> [ 665.971179] [<ffffffff8111b893>] ? __delayacct_add_tsk+0x153/0x170
> [ 665.971180] [<ffffffff8106992c>] mmput+0x5c/0x120
> [ 665.971182] [<ffffffff8106ecdc>] do_exit+0x26c/0xa60
> [ 665.971183] [<ffffffff8173d62e>] ? schedule_timeout_killable+0x1e/0x20
> [ 665.971185] [<ffffffff81161a8c>] ? out_of_memory+0x49c/0x4d0
> [ 665.971186] [<ffffffff8106f54f>] do_group_exit+0x3f/0xa0
> [ 665.971187] [<ffffffff8107ee40>] get_signal_to_deliver+0x1d0/0x6f0
> [ 665.971189] [<ffffffff81012548>] do_signal+0x48/0x9d0
> [ 665.971190] [<ffffffff8101c1d5>] ? native_sched_clock+0x35/0x90
> [ 665.971192] [<ffffffff8101c239>] ? sched_clock+0x9/0x10
> [ 665.971194] [<ffffffff8111ce9c>] ? acct_account_cputime+0x1c/0x20
> [ 665.971195] [<ffffffff810a335b>] ? account_user_time+0x8b/0xa0
> [ 665.971197] [<ffffffff810a3924>] ? vtime_account_user+0x54/0x60
> [ 665.971198] [<ffffffff81012f39>] do_notify_resume+0x69/0xb0
> [ 665.971199] [<ffffffff817432d8>] retint_signal+0x48/0x90
> [ 665.971200] swap_free: Bad swap offset entry 37c000003780
Did the machine crash with these call traces?
(In reply to comment #13)
> What hardware are you using? I have a
>
> MSI MS-7817/CSM-B85M-E45 (MS-7817), BIOS V10.5 05/30/2014
>
> see also http://www.msi.com/product/mb/B85ME45.html.
>
> Maybe this is MSI (or bios setting) specific.
>
> Can you get access to a comparable chipset to test this?
>
> Do the errors logged to syslog make any sense to you at all?
>
> Thank you!
Our board is ASUSTeK Z87-EXPERT, BIOS Revision: 4.6 05/17/2013

Hello, thank you for replying!
>> [ 83.583023] WARNING: CPU: 1 PID: 173 at
>> drivers/gpu/drm/i915/intel_ddi.c:911 intel_ddi_pll_enable+0x248/0x250 [i915]
> This issue we have opened a bug to track.
Which bug is it (I thought it was this one)?
By the way, this is the only WARNING I get with the Ubuntu 14.04 stock kernel. All the other WARNING messages appear only with newer kernels (mainline or compiled from Git).
Very roughly, how far away are you from a solution? Days, months? (I need to decide whether to exchange the hardware, I badly need the suspend functionality.)
>> [ 665.971200] swap_free: Bad swap offset entry 37c000003780
> Did machine get crash with these Call Trace?
Yes, this happened after a resume. The machine was alive but unusable: I could access the local console, but there was no network and no new processes were starting. I had to reset it.
> Our board is ASUSTeK Z87-EXPERT BIOS Revision: 4.6 05/17/2013
Can you get access to an MSI B85 or B81 board?
If not, would it help if I donated one?

> >> [ 83.583023] WARNING: CPU: 1 PID: 173 at
> >> drivers/gpu/drm/i915/intel_ddi.c:911 intel_ddi_pll_enable+0x248/0x250 [i915]
> > This issue we have opened a bug to track.
>
> Which bug is it (I thought it was this one)?
>
Oh, my mistake. This is the bug we opened to track the issue.
>
> Very roughly, how far away are you from a solution? Days, months? (I need to
> decide whether to exchange the hardware, I badly need the suspend
> functionality.)
>
I can't say. I don't think it is our i915 module that causes your machine to crash. I removed the i915 module on a laptop whose board is MSI MS-16GC, BIOS version E16GCIMS.509, and that machine can't even finish S4.
>
> Can you get access to a MSI B85 or B81 board?
> If not, will it help if I donated one?
>

> I don't think it's our i915 module that cause your machine crash. I remove i915 module on laptop whose board is MSI MS-16GC
Well, I don't know. When I move the i915.ko module somewhere else, call 'update-initramfs -u' (to update the ramdisk), reboot (so it doesn't get loaded) and then try 'sudo pm-suspend' or 'sudo pm-hibernate', the system freezes at once and has to be hard reset.
With loading the i915.ko module, the system survives 'pm-suspend' and resumes perfectly, but has the problems stated above with 'sudo pm-hibernate'.
This is with 3.16.0rc2+ as of my last comment.
Created attachment 102574 [details]
pm-suspend and pm-hibernate (including resume) works on 3.16.0rc2+ without i915 module loaded
I tried again, using the following procedure, to confirm i915 as the cause of the S4 resume problems:
* booted with parameter "i915.modeset=0 text" (to avoid lightdm startup)
* echo 0 > /sys/class/vtconsole/vtcon1/bind (which froze the local console, I had to continue using SSH)
* pkill alsactl
* rmmod snd_hda_intel
* rmmod i915
* sudo pm-suspend (worked)
* resume (worked)
* sudo pm-hibernate (worked)
* resume (worked, even without using the same kernel parameters)
So it seems the i915 module is actually the cause of the resume problem. Or is it? Log (dmesg) attached.
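The unload-then-hibernate procedure above can be scripted so that it is repeated identically on each test cycle. This is only a minimal Python sketch under stated assumptions: the machine was already booted with "i915.modeset=0 text", the script runs as root over SSH (since unbinding vtcon1 froze the local console), and the names `UNLOAD_I915`, `build_plan` and `run_plan` are made up for illustration. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Commands quoted from the procedure above, in the order they were run.
UNLOAD_I915 = [
    "echo 0 > /sys/class/vtconsole/vtcon1/bind",
    "pkill alsactl",
    "rmmod snd_hda_intel",
    "rmmod i915",
]


def build_plan(power_command="pm-hibernate"):
    """Return the shell commands for one test cycle, unload steps first."""
    if power_command not in ("pm-suspend", "pm-hibernate"):
        raise ValueError(f"unexpected power command: {power_command}")
    return UNLOAD_I915 + [power_command]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in plan:
        print(f"+ {command}")
        if not dry_run:
            # shell=True is needed for the redirection in the first step.
            result = subprocess.run(command, shell=True)
            if result.returncode != 0:
                # pkill exits non-zero when no alsactl process exists,
                # so a failure is reported rather than treated as fatal.
                print(f"  exit status {result.returncode}")
```

Calling `run_plan(build_plan("pm-suspend"))` prints the five commands without touching the system; passing `dry_run=False` actually runs them.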
Anything else I can do to fix this issue?

I upgraded to 3.16.0 (drm-intel-next Git repo, as of 2014-08-08, 2c0827cffca8ac0c654b888c58a1989a5172f007) and I still get a frozen machine after a hibernate/resume cycle (and a KernelOops apport report after a subsequent reboot) exactly when the i915 module is loaded when suspending.
Interestingly, when suspending, my screen goes black and then a frozen image of my desktop reappears while (I suppose) the RAM image is written to the disk. Once, and only once, this did not happen (the screen went black and stayed black), and this was the one single successful resume I have had so far. I don't know if this helps.
Where can I find the oops backtrace to dig out the source of the Oops?

Still does not work with 3.16.0-final. How can I further help fix this issue?

Booted 3.16.1, tried again:
* hibernate once (worked),
* resume (worked, with the "WARNING: SPLL already enabled" message just like before),
* second hibernate (froze before hibernation was complete).
To double check, tried without i915 loaded, worked perfectly three times in a row.
Still the same MSI-B85M (MSI-7817) chipset.

(In reply to comment #27)
> Booted 3.16.1, tried again:
>
> * hibernate once (worked),
> * resume (worked, with the WARNING: SPLL already enabled" message just like
> before),
> * second hibernate (froze before hibernation was complete).
>
> To double check, tried without i915 loaded, worked perfectly three times in
> a row.
>
> Still the same MSI-B85M (MSI-7817) chipset.
I tried the latest -nightly and ran S4. I didn't run into the call trace issue, but the system still sporadically fails to come back. Maybe you can try our latest -nightly kernel to see if the call trace issue still exists.

I pulled the current "drm-intel-nightly" code and tried again.
Setup:
* Ubuntu 14.04 LTS, Kernel 3.16.0+ (3.17rc1 as of now)
* MSI-7817 chipset with i5-i4570
* Boot Lubuntu desktop, start "make -j4" in git checkout, start Firefox with Youtube video, then hibernate and resume in a loop
Results:
* No more WARNING: messages upon resume
* Multiple resumes work fine
* About every fifth resume the machine grinds to a halt with dozens of OOM killer messages
So: A big improvement (I can hibernate and resume multiple times in a row, even with a loaded machine!). But we're not quite there yet: where do the OOM errors come from? When I hibernate, only ~1.5G out of 8G RAM are actually used.
Thank you!

Since the latest -nightly kernel works without this call trace, I am closing this bug.

Unfortunately, this isn't the end of it. I posted some hibernation resume failures (dmesg output) here: https://bugzilla.kernel.org/show_bug.cgi?id=59321#c42
In short: every 3..5 resumes the OOM killer runs amok and kills half my system. Before that, I always see log messages (sometimes hundreds) like
Purging GPU memory, X bytes freed, Y bytes still pinned.
After that, the system is, most of the time, unusable and has to be hard reset.
What causes these messages and why is the OOM killer invoked? I have 8G of memory, of which usually 7G are not even filled with buffer cache when I hibernate the system, i.e. completely empty.
Do you want me to open another bug report because the symptoms changed?

(In reply to comment #31)
> Unfortunately, this isn't the end of it. I posted some hibernation resume
> failures (dmesg output) here:
> https://bugzilla.kernel.org/show_bug.cgi?id=59321#c42
> Do you want me to open another bug report because the symptoms changed?
Yes please. Attach the dmesgs from the above bug as plain text.

Done. See #82864. Thank you!

Closing verified+fixed.