Bug 78424

Summary:

[HSW Bisected]WARNING: SPLL already enabled

Product:

DRI

Reporter:

liulei <lei.a.liu>

Component:

DRM/Intel

Assignee:

Daniel Vetter <daniel>

Status:

CLOSED FIXED

QA Contact:

Intel GFX Bugs mailing list <intel-gfx-bugs>

Severity:

normal

Priority:

highest

CC:

intel-gfx-bugs, yi.sun

Version:

unspecified

Hardware:

Other

OS:

Linux (All)

Whiteboard:

i915 platform:

i915 features:

Attachments:

Description	Flags
I attach a dmesg.	none
Oops on first resume with 3.16.0rc2+ on MSI B85M-E45 mainboard	none
pm-suspend and pm-hibernate (including resume) works on 3.16.0rc2+ without i915 module loaded	none

Description liulei 2014-05-08 07:08:49 UTC

Created attachment 98664 [details]
I attach a dmesg.

*System Environment:
--------------------------
Regression: Yes. Good commit: 484b41dd70a9fbea894632d8926bbb93f05021c7(-drm-intel-next-queued). 
commit 484b41dd70a9fbea894632d8926bbb93f05021c7
Author:     Jesse Barnes <jbarnes@virtuousgeek.org>
AuthorDate: Fri Mar 7 08:57:55 2014 -0800
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Sat Mar 8 11:31:58 2014 +0100

    drm/i915: remove early fb allocation dependency on CONFIG_FB v2

    By stuffing the fb allocation into the crtc, we get mode set lifetime
    refcounting for free, but have to handle the initial pin & fence
    slightly differently.  It also means we can move the shared fb handling
    into the core rather than leaving it out in the fbdev code.

    v2: null out crtc->fb on error (Daniel)
        take fbdev fb ref and remove unused error path (Daniel)

Non-working platforms: Haswell
 *kernel: 
--------------------------
-nightly: dd28119c31cf06fc4c3bb548699018a91e45a676 (S4 causes a call trace)
-queued: 10efa9321e (S4 causes a call trace)
   commit 10efa9321efe5f62637b189587539e4086726a2b
   Author:     Ville Syrjälä <ville.syrjala@linux.intel.com>
   AuthorDate: Mon Apr 28 15:53:25 2014 +0300
   Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
   CommitDate: Tue May 6 10:18:04 2014 +0200

    drm/i915: Remove useless checks from primary enable/disable

    We won't be calling intel_enable_primary_plane() or
    intel_disable_primary_plane() with the primary plane in the
    wrong state. So remove the useless DISPLAY_PLANE_ENABLE checks.

    v2: Convert the checks to WARNs instead (Daniel,Paulo)

    Detail of the commit
-fixes: e4c610fe0 (S4 causes a call trace)
   commit e4c610fe051579ba0a1fadf339905b0231c6ef94
   Author:     Egbert Eich <eich@suse.de>
   AuthorDate: Fri Apr 11 19:07:44 2014 +0200
   Commit:     Jani Nikula <jani.nikula@intel.com>
   CommitDate: Wed May 7 15:01:50 2014 +0300

    drm/i915/SDVO: For sysfs link put directory and target in correct order

    When linking the i2c sysfs file into the connector's directory
    pass directory and link target in the right order.
    This code was introduced with:

      commit 931c1c26983b4f84e33b78579fc8d57e4a14c6b4
      Author: Imre Deak <imre.deak@intel.com>
      Date:   Tue Feb 11 17:12:51 2014 +0200

        drm/i915: sdvo: add i2c sysfs symlink to the connector's directory

        This is the same what we do for DP connectors, so make things more
        consistent.

*Bug detailed description:
-----------------------------
Do S4 (hibernate), then check dmesg.

[  101.206995] ------------[ cut here ]------------
[  101.207009] WARNING: CPU: 4 PID: 1123 at drivers/gpu/drm/i915/intel_ddi.c:960 intel_ddi_pll_enable+0x191/0x1ef [i915]()
[  101.207009] SPLL already enabled
[  101.207016] Modules linked in: dm_mod snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi iTCO_wdt iTCO_vendor_support serio_raw pcspkr i2c_i801 lpc_ich mfd_core snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_timer snd soundcore battery wmi tpm_infineon tpm_tis tpm acpi_cpufreq i915 video button drm_kms_helper drm
[  101.207018] CPU: 4 PID: 1123 Comm: kworker/u16:6 Not tainted 3.15.0-rc3_drm-intel-next-queued_1cf0ba_20140508+ #2418
[  101.207018] Hardware name: ASUS All Series/Z87-EXPERT, BIOS 1008 05/17/2013
[  101.207021] Workqueue: events_unbound async_run_entry_fn
[  101.207023]  0000000000000000 0000000000000009 ffffffff817233c0 ffff8802514b1ad8
[  101.207023]  ffffffff8103517a ffff8802514b1ad0 ffffffffa00b04f1 0000000000000001
[  101.207024]  ffff8802512e0000 0000000094000000 ffffffffa00e65d0 0000000000000000
[  101.207025] Call Trace:
[  101.207028]  [<ffffffff817233c0>] ? dump_stack+0x41/0x51
[  101.207030]  [<ffffffff8103517a>] ? warn_slowpath_common+0x73/0x8b
[  101.207035]  [<ffffffffa00b04f1>] ? intel_ddi_pll_enable+0x191/0x1ef [i915]
[  101.207037]  [<ffffffff8103522a>] ? warn_slowpath_fmt+0x45/0x4a
[  101.207043]  [<ffffffffa008e2fb>] ? gen6_read32+0x71/0x7c [i915]
[  101.207048]  [<ffffffffa00b04f1>] ? intel_ddi_pll_enable+0x191/0x1ef [i915]
[  101.207052]  [<ffffffffa0005546>] ? drm_vblank_get+0x1e0/0x1f2 [drm]
[  101.207058]  [<ffffffffa009a1be>] ? haswell_crtc_mode_set+0x44/0x433 [i915]
[  101.207064]  [<ffffffffa009f363>] ? __intel_set_mode+0xfd7/0x11b9 [i915]
[  101.207071]  [<ffffffffa00a2b93>] ? intel_modeset_setup_hw_state+0x8ff/0x9d8 [i915]
[  101.207077]  [<ffffffffa008e98d>] ? hsw_write64+0x9b/0x9b [i915]
[  101.207084]  [<ffffffffa008e98d>] ? hsw_write64+0x9b/0x9b [i915]
[  101.207088]  [<ffffffffa0060411>] ? __i915_drm_thaw+0xe8/0x1d9 [i915]
[  101.207090]  [<ffffffff812fe516>] ? pci_pm_default_resume+0x29/0x29
[  101.207095]  [<ffffffffa0060a4c>] ? i915_resume+0x1f/0x39 [i915]
[  101.207099]  [<ffffffffa0060a6d>] ? i915_pm_resume+0x7/0x11 [i915]
[  101.207102]  [<ffffffff813896c1>] ? dpm_run_callback.isra.8+0x24/0x52
[  101.207104]  [<ffffffff81389cb0>] ? device_resume+0x10c/0x14e
[  101.207105]  [<ffffffff81389d06>] ? async_resume+0x14/0x38
[  101.207106]  [<ffffffff8104f683>] ? async_run_entry_fn+0x55/0x10b
[  101.207108]  [<ffffffff81046844>] ? process_one_work+0x1bc/0x2ed
[  101.207110]  [<ffffffff81046db2>] ? worker_thread+0x1c7/0x2bc
[  101.207111]  [<ffffffff81046beb>] ? rescuer_thread+0x251/0x251
[  101.207112]  [<ffffffff8104b722>] ? kthread+0xc5/0xcd
[  101.207113]  [<ffffffff8104b65d>] ? kthread_freezable_should_stop+0x40/0x40
[  101.207114]  [<ffffffff8172dafc>] ? ret_from_fork+0x7c/0xb0
[  101.207115]  [<ffffffff8104b65d>] ? kthread_freezable_should_stop+0x40/0x40
[  101.207116] ---[ end trace db4d3e1dc7c3c334 ]---


*Reproduce steps:
---------------------------- 
1. echo 0 > /sys/class/rtc/rtc0/wakealarm ; 
   echo +10 > /sys/class/rtc/rtc0/wakealarm; 
   echo disk > /sys/power/state
   (do S4)
2. dmesg
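
The reproduce steps above could be wrapped in a small script. This is only a sketch: the function names `s4_cycle` and `count_spll_warnings` are made up for illustration, `s4_cycle` needs root, and the 10-second wake alarm is taken from step 1.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative and not part of any tool.

# One S4 cycle using the RTC wake alarm, as in step 1 (needs root).
s4_cycle() {
    echo 0 > /sys/class/rtc/rtc0/wakealarm
    echo +10 > /sys/class/rtc/rtc0/wakealarm
    echo disk > /sys/power/state
}

# Count lines containing "SPLL already enabled" in a dmesg dump on stdin.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# function's exit status clean while still printing "0".
count_spll_warnings() {
    grep -c 'SPLL already enabled' || true
}
```

A typical run would be `s4_cycle && dmesg | count_spll_warnings`; a count above zero means the warning has been logged at least once since boot.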

Comment 1 liulei 2014-05-08 07:11:59 UTC

I will append the bisect result and then assign this bug to the commit's author.

Comment 2 liulei 2014-05-11 06:41:55 UTC

==Bisect results==
----------------------------
branch : drm-intel-next-queued
Bisect shows: 0882dae983707455e97479e5e904e37673517ebc is the first bad commit

commit 0882dae983707455e97479e5e904e37673517ebc
Author:     Paulo Zanoni <paulo.r.zanoni@intel.com>
AuthorDate: Wed Jan 8 11:12:27 2014 -0200
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Wed Jan 8 15:54:09 2014 +0100

    drm/i915: fix DDI PLLs HW state readout code

    Properly zero the refcounts and crtc->ddi_pll_set so the previous HW
    state doesn't affect the result of reading the current HW state.

    This fixes WARNs about WRPLL refcount if we have an HDMI monitor on
    HSW and then suspend/resume.

    Cc: stable@vger.kernel.org
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64379
    Tested-by: Qingshuai Tian <qingshuai.tian@intel.com>
    Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Comment 3 Daniel Vetter 2014-05-15 16:04:24 UTC

Probably fixed with my runtime pm dpms series which completely reworks the hsw ddi pll code.

Now if someone would actually review that pile of crap ...

Comment 4 Jens 2014-06-03 20:36:32 UTC

I am having the exact same problem (WARNING reported at intel_ddi_pll_mode_set) with Ubuntu 14.04 LTS, plus frequent crashes/freezes/spontaneous reboots upon resume from hibernation.

Both with 3.13 stock kernel (3.13.0-27-generic)
and with the current "3.15.0-031500rc8-generic #201406012235" mainline kernel

i915 version (?) from dmesg:
[drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0

Detailed description, syslog and dmesg output, HW info: see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1326092

Comment 5 Jens 2014-06-17 20:50:55 UTC

This bug causes several of my development machines to crash upon every second resume from S4 (hibernate). I can always suspend/resume once (then I get the warning logged above), if I try again, the machine reboots after resuming.

What is the progress on this issue?
Is there anything I can do to help fix this?

Thanks!

Comment 6 Jani Nikula 2014-06-28 14:00:16 UTC

(In reply to comment #5)
> This bug causes several of my development machines to crash upon every
> second resume from S4 (hibernate). I can always suspend/resume once (then I
> get the warning logged above), if I try again, the machine reboots after
> resuming.

Lei, can you reproduce the crash after 2nd resume?

Comment 7 liulei 2014-06-30 01:28:06 UTC

(In reply to comment #6)
> (In reply to comment #5)
> > This bug causes several of my development machines to crash upon every
> > second resume from S4 (hibernate). I can always suspend/resume once (then I
> > get the warning logged above), if I try again, the machine reboots after
> > resuming. 
> 
> Lei, can you reproduce the crash after 2nd resume?

I can't reproduce the crash after the 2nd resume. I ran S4 (hibernate) 4 times in a row and only got the warning logged above, no crash. In fact, we have already opened a bug to track S4 (hibernate) sporadically causing a system hang.

https://bugs.freedesktop.org/show_bug.cgi?id=65496

Comment 8 Jens 2014-06-30 18:08:44 UTC

Can you reproduce the crash using Ubuntu 14.04 stock kernel?

If not, can I help reproducing it by using some other kernel, or any other bootable image? I'd be happy to help.

Also, there's additional info at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1326092, including a log created with "drm.debug=0xe".

Thank you!

Comment 9 Jens 2014-06-30 20:38:03 UTC

I have checked out the current drm-intel-next-queued branch as of 2 hours ago, built it and tested hibernate with this kernel. I get a whole bunch of new WARNINGs, like

WARNING: CPU: 0 PID: 6 at drivers/gpu/drm/i915/i915_irq.c:154 ironlake_disable_display_irq+0x75/0x80 [i915]()
WARNING: CPU: 0 PID: 6 at drivers/gpu/drm/i915/i915_irq.c:423 ibx_display_interrupt_update+0x90/0xa0 [i915]()
WARNING: CPU: 3 PID: 6 at drivers/gpu/drm/i915/i915_irq.c:423 ibx_display_interrupt_update+0x90/0xa0 [i915]()
WARNING: CPU: 1 PID: 171 at drivers/gpu/drm/i915/intel_ddi.c:911 intel_ddi_pll_enable+0x248/0x250 [i915]()
WARNING: CPU: 0 PID: 6 at drivers/gpu/drm/i915/i915_irq.c:154 ironlake_disable_display_irq+0x75/0x80 [i915]()
WARNING: CPU: 0 PID: 6 at drivers/gpu/drm/i915/i915_irq.c:423 ibx_display_interrupt_update+0x90/0xa0 [i915]()
WARNING: CPU: 3 PID: 6 at drivers/gpu/drm/i915/i915_irq.c:423 ibx_display_interrupt_update+0x90/0xa0 [i915]()
WARNING: CPU: 1 PID: 171 at drivers/gpu/drm/i915/intel_ddi.c:911 intel_ddi_pll_enable+0x248/0x250 [i915]()
WARNING: CPU: 0 PID: 7177 at drivers/gpu/drm/i915/i915_irq.c:154 ironlake_disable_display_irq+0x75/0x80 [i915]()
WARNING: CPU: 0 PID: 7177 at drivers/gpu/drm/i915/i915_irq.c:423 ibx_display_interrupt_update+0x90/0xa0 [i915]()
WARNING: CPU: 0 PID: 7177 at drivers/gpu/drm/i915/i915_irq.c:423 ibx_display_interrupt_update+0x90/0xa0 [i915]()
WARNING: CPU: 1 PID: 7178 at drivers/gpu/drm/i915/intel_ddi.c:911 intel_ddi_pll_enable+0x248/0x250 [i915]()

but I do not get the spontaneous reboots any more. Instead, after a couple of suspend/resume cycles (3 in my case) I get tons of these:

[  665.971162] BUG: Bad page map in process lxpanel  pte:dd000000dc0000 pmd:35d6b067
[  665.971163] addr:00007f7fe682e000 vm_flags:08000070 anon_vma:          (null) mapping:ffff880211963220 index:1de
[  665.971164] vma->vm_ops->fault: filemap_fault+0x0/0x430
[  665.971165] vma->vm_file->f_op->mmap: ext4_file_mmap+0x0/0x60
[  665.971166] CPU: 3 PID: 3270 Comm: lxpanel Tainted: G    B   W  OE 3.16.0-rc2+ #4
[  665.971166] Hardware name: MSI MS-7817/CSM-B85M-E45 (MS-7817), BIOS V10.5 05/30/2014
[  665.971167]  ffff8800352b1000 ffff88003631bad8 ffffffff81739d4e 00007f7fe682e000
[  665.971168]  ffff88003631bb20 ffffffff8118704a 00dd000000dc0000 00000000000001de
[  665.971169]  ffff880035d6b170 00dd000000dc0000 00007f7fe6955000 00007f7fe682e000
[  665.971171] Call Trace:
[  665.971173]  [<ffffffff81739d4e>] dump_stack+0x45/0x56
[  665.971174]  [<ffffffff8118704a>] print_bad_pte+0x1aa/0x250
[  665.971175]  [<ffffffff811883de>] unmap_single_vma+0x5de/0x8d0
[  665.971176]  [<ffffffff81189699>] unmap_vmas+0x49/0x90
[  665.971177]  [<ffffffff811920fc>] exit_mmap+0x9c/0x170
[  665.971179]  [<ffffffff8111b893>] ? __delayacct_add_tsk+0x153/0x170
[  665.971180]  [<ffffffff8106992c>] mmput+0x5c/0x120
[  665.971182]  [<ffffffff8106ecdc>] do_exit+0x26c/0xa60
[  665.971183]  [<ffffffff8173d62e>] ? schedule_timeout_killable+0x1e/0x20
[  665.971185]  [<ffffffff81161a8c>] ? out_of_memory+0x49c/0x4d0
[  665.971186]  [<ffffffff8106f54f>] do_group_exit+0x3f/0xa0
[  665.971187]  [<ffffffff8107ee40>] get_signal_to_deliver+0x1d0/0x6f0
[  665.971189]  [<ffffffff81012548>] do_signal+0x48/0x9d0
[  665.971190]  [<ffffffff8101c1d5>] ? native_sched_clock+0x35/0x90
[  665.971192]  [<ffffffff8101c239>] ? sched_clock+0x9/0x10
[  665.971194]  [<ffffffff8111ce9c>] ? acct_account_cputime+0x1c/0x20
[  665.971195]  [<ffffffff810a335b>] ? account_user_time+0x8b/0xa0
[  665.971197]  [<ffffffff810a3924>] ? vtime_account_user+0x54/0x60
[  665.971198]  [<ffffffff81012f39>] do_notify_resume+0x69/0xb0
[  665.971199]  [<ffffffff817432d8>] retint_signal+0x48/0x90
[  665.971200] swap_free: Bad swap offset entry 37c000003780
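
The WARNING lines listed above repeat a few source locations many times. One way to summarize such a log is to tally the warnings by file and line; the sketch below assumes the `WARNING: ... at <file>:<line>` layout shown in this comment, and `tally_warnings` is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: print "count file:line" for each distinct WARNING
# source location found in a kernel log on stdin, most frequent first.
tally_warnings() {
    grep 'WARNING:' |
        grep -o 'at [^ ]*:[0-9][0-9]*' |   # keep only "at path/file.c:NNN"
        sed 's/^at //' |                   # drop the leading "at "
        sort | uniq -c | sort -rn |        # count identical locations
        awk '{print $1, $2}'               # normalize uniq's padding
}
```

Usage would be something like `dmesg | tally_warnings`.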

Comment 10 liulei 2014-07-01 07:18:04 UTC

(In reply to comment #8)
> Can you reproduce the crash using Ubuntu 14.04 stock kernel?
> 
> If not, can I help reproducing it by using some other kernel, or any other
> bootable image? I'd be happy to help.
> 
I would appreciate it if you could provide an image that reproduces the problem.
> Also, there's additional info at
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1326092, including a
> log created with "drm.debug=0xe".
> 
> Thank you!

Comment 11 Jens 2014-07-01 20:10:50 UTC

Here you are:

https://rb-hosting.de/owncloud/public.php?service=files&t=7d4ea1d33dfae6f5d0868425601cfe44

This directory contains
- two .deb files (kernel-headers, kernel-image) built on Ubuntu 14.04
- the corresponding syslog file showing the first resume with the WARNINGs and drm.debug=0xe

(available until 2014-07-31)

With the new kernel I have to suspend/resume 2..4 times to get a crash. After each crash and subsequent reboot, there's an Ubuntu app asking "System problem detected, do you want to report it?" It's called 'whoopsie' and it will create a crash report at launchpad.net, AFAIK. Its logfile is also stored at the above URL.

Unfortunately I was not (yet) able to find the actual Oops message it complains about; in the past it was logged to syslog, but now I can't find it. I will post it when I get it to display.

Comment 12 liulei 2014-07-02 07:56:19 UTC

(In reply to comment #11)
> Here you are:
> 
> https://rb-hosting.de/owncloud/public.
> php?service=files&t=7d4ea1d33dfae6f5d0868425601cfe44
> 
> This directory contains
> - two .deb files (kernel-headers, kernel-image) built on Ubuntu 14.04
> - appropriate syslog file showing the first resume with the WARNINGs and
> dri.debug=0xe
> 
> (available until 2014-07-31)
> 
> With the new kernel I have to suspend/resume 2..4 times to get a crash.
> After each crash and subsequent reboot, there's an Ubuntu app asking "System problem
> detected, do you want to report it"? It's called 'whoopsie' and it will
> create a crash report at launchpad.net, AFAIK. Its logfile is also stored at
> the above URL.
> 
> Unfortunately I was not (yet) able to find the actual Oops message it
> complains about, in the past it was logged to syslog but now I can't find
> it. I will post it when I get it to display.

With the image you provided, I didn't get a crash after 5 suspend/resume cycles.

Comment 13 Jens 2014-07-02 17:39:33 UTC

What hardware are you using? I have a

   MSI MS-7817/CSM-B85M-E45 (MS-7817), BIOS V10.5 05/30/2014

see also http://www.msi.com/product/mb/B85ME45.html.

Maybe this is MSI (or bios setting) specific.

Can you get access to a comparable chipset to test this?

Do the errors logged to syslog make any sense to you at all?

Thank you!

Comment 14 Jens 2014-07-02 18:17:11 UTC

Created attachment 102153 [details]
Oops on first resume with 3.16.0rc2+ on MSI B85M-E45 mainboard

This is an oops log with drm.debug=0xe after the first resume from hibernation. Note that resume from sleep is never an issue, just hibernation.

Also I have this problem on two machines, B81M chipset and B85 chipset.

Comment 15 Jens 2014-07-06 08:50:00 UTC

Here is another dmesg after first resume. There are some more warnings compared to the stock Ubuntu kernel, but the main one (PLL related) is the same:

[   83.583002] [drm:intel_ddi_pll_select] Using SPLL on pipe A
[   83.583003] ------------[ cut here ]------------
[   83.583023] WARNING: CPU: 1 PID: 173 at drivers/gpu/drm/i915/intel_ddi.c:911 intel_ddi_pll_enable+0x248/0x250 [i915]()
[   83.583023] SPLL already enabled
(...)

Updated Git kernel image is building right now. I noticed there is a "intel-fixes-2014-07-03" branch in the Git repo, is this something worth trying out for me?

Anything else I can help with? Provide access to appropriate hardware perhaps?

Comment 16 Jens 2014-07-06 11:58:23 UTC

No change with 3.16.0rc2+ image (taken from Git) and compiled today. Still the same WARNINGs after hibernate and a kernel Oops after resuming three times.

Date: Wed Jul  2 20:10:08 2014
Failure: oops
OopsText:
 general protection fault: 0000 [#3] SMP 
 Modules linked in: btrfs(E) xor(E) raid6_pq(E) ufs(E) qnx4(E) hfsplus(E) hfs(E) minix(E) ntfs(E) msdos(E) jfs(E) xfs(E) libcrc32c(E) pci_stub(E) vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) bnep(E) rfcomm(E) bluetooth(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) snd_hda_intel(E) snd_hda_controller(E) snd_hda_codec(E) snd_hwdep(E) snd_pcm(E) intel_rapl(E) snd_seq_midi(E) snd_seq_midi_event(E) snd_rawmidi(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) snd_seq(E) coretemp(E) snd_seq_device(E) snd_timer(E) kvm_intel(E) kvm(E) snd(E) mei_me(E) mei(E) soundcore(E) lpc_ich(E) serio_raw(E) shpchp(E) mac_hid(E) tpm_infineon(E) intel_smartconnect(E) parport_pc(E) ppdev(E) lp(E) parport(E) dm_crypt(E) hid_generic(E) usbhid(E) hid(E) mxm_wmi(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) i915(E) ahci(E) i2c_algo_bit(E) drm_kms_helper(E) libahci(E) r8169(E) mii(E) drm(E) wmi(E) video(E)
 CPU: 1 PID: 9815 Comm: Xorg Tainted: G      D W  OE 3.16.0-rc2+ #4
 Hardware name: MSI MS-7817/CSM-B85M-E45 (MS-7817), BIOS V10.5 05/30/2014
 task: ffff8801eba80000 ti: ffff880210474000 task.ti: ffff880210474000
 RIP: 0010:[<ffffffff811da5dd>]  [<ffffffff811da5dd>] __inode_permission+0x5d/0xc0
 RSP: 0018:ffff880210477cc8  EFLAGS: 00010246
 RAX: 006f0000006e0000 RBX: ffff880036252f98 RCX: 0000000000000018
 RDX: ffff8802130cf2e0 RSI: 0000000000000081 RDI: ffff880036252f98
 RBP: ffff880210477ce0 R08: 647261632f697264 R09: ffff880210477cc4
 R10: ffff8800d4a43025 R11: 0000000000000003 R12: 0000000000000081
 R13: 0000000000000000 R14: 0000000000000000 R15: ffff880210477e50
 FS:  00007f6d65a8e9c0(0000) GS:ffff88021ea80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fff16d3ee38 CR3: 00000000d4bc4000 CR4: 00000000001407e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Stack:
  ffff8800d4a43029 ffff8801eba80000 0000000000000000 ffff880210477cf0
  ffffffff811da658 ffff880210477d98 ffffffff811dab01 ffff880210477d48
  ffffffff811b3986 ffff8801eba80000 ffffffff8131eb73 ffff8801eba80000
 Call Trace:
  [<ffffffff811da658>] inode_permission+0x18/0x50
  [<ffffffff811dab01>] link_path_walk+0x71/0x870
  [<ffffffff811b3986>] ? kmem_cache_alloc_trace+0x1c6/0x1f0
  [<ffffffff8131eb73>] ? apparmor_file_alloc_security+0x23/0x40
  [<ffffffff812e41d6>] ? security_file_alloc+0x16/0x20
  [<ffffffff811defac>] path_openat+0x9c/0x670
  [<ffffffff8120c121>] ? send_to_group+0xd1/0x1b0
  [<ffffffff811dfd8a>] do_filp_open+0x3a/0x90
  [<ffffffff811ec8f7>] ? __alloc_fd+0xa7/0x130
  [<ffffffff811ce898>] do_sys_open+0x128/0x220
  [<ffffffff81021ac5>] ? syscall_trace_enter+0x145/0x250
  [<ffffffff811ce9ae>] SyS_open+0x1e/0x20
  [<ffffffff817426ff>] tracesys+0xe1/0xe6
 Code: 41 5d 5d c3 66 2e 0f 1f 84 00 00 00 00 00 8b 43 4c 85 c0 75 36 44 89 e6 48 89 df e8 9e 98 10 00 5b 41 5c 41 5d 5d c3 48 8b 43 20 <48> 8b 40 10 48 85 c0 74 35 44 89 e6 48 89 df ff d0 eb bb f6 47 
 RIP  [<ffffffff811da5dd>] __inode_permission+0x5d/0xc0
  RSP <ffff880210477cc8>
 ---[ end trace 45c4f49310fca543 ]---

Comment 17 liulei 2014-07-07 00:58:14 UTC

(In reply to comment #15)
> Here is another dmesg after first resume. There are some more warnings
> compared to the stock Ubuntu kernel, but the main one (PLL related) is the
> same:
> 
> [   83.583002] [drm:intel_ddi_pll_select] Using SPLL on pipe A
> [   83.583003] ------------[ cut here ]------------
> [   83.583023] WARNING: CPU: 1 PID: 173 at
> drivers/gpu/drm/i915/intel_ddi.c:911 intel_ddi_pll_enable+0x248/0x250 [i915]
> ()
> [   83.583023] SPLL already enabled
> (...)
> 

We have opened a bug to track this issue.

> Updated Git kernel image is building right now. I noticed there is a
> "intel-fixes-2014-07-03" branch in the Git repo, is this something worth
> trying out for me?
> 

I don't think it will solve your problem.

> Anything else I can help with? Provide access to appropriate hardware
> perhaps?

Comment 18 liulei 2014-07-07 01:06:26 UTC

(In reply to comment #9)
> I have checked out the current drm-intel-next-queued branch as of 2 hours
> ago, built it and tested hibernate with this kernel. I get a whole bunch of
> new WARNINGs, like
> 
> WARNING: CPU: 0 PID: 6 at drivers/gpu/drm/i915/i915_irq.c:154
> ironlake_disable_display_irq+0x75/0x80 [i915]()
> WARNING: CPU: 0 PID: 6 at drivers/gpu/drm/i915/i915_irq.c:423
> ibx_display_interrupt_update+0x90/0xa0 [i915]()
> WARNING: CPU: 3 PID: 6 at drivers/gpu/drm/i915/i915_irq.c:423
> ibx_display_interrupt_update+0x90/0xa0 [i915]()
> WARNING: CPU: 1 PID: 171 at drivers/gpu/drm/i915/intel_ddi.c:911
> intel_ddi_pll_enable+0x248/0x250 [i915]()
> WARNING: CPU: 0 PID: 6 at drivers/gpu/drm/i915/i915_irq.c:154
> ironlake_disable_display_irq+0x75/0x80 [i915]()
> WARNING: CPU: 0 PID: 6 at drivers/gpu/drm/i915/i915_irq.c:423
> ibx_display_interrupt_update+0x90/0xa0 [i915]()
> WARNING: CPU: 3 PID: 6 at drivers/gpu/drm/i915/i915_irq.c:423
> ibx_display_interrupt_update+0x90/0xa0 [i915]()
> WARNING: CPU: 1 PID: 171 at drivers/gpu/drm/i915/intel_ddi.c:911
> intel_ddi_pll_enable+0x248/0x250 [i915]()
> WARNING: CPU: 0 PID: 7177 at drivers/gpu/drm/i915/i915_irq.c:154
> ironlake_disable_display_irq+0x75/0x80 [i915]()
> WARNING: CPU: 0 PID: 7177 at drivers/gpu/drm/i915/i915_irq.c:423
> ibx_display_interrupt_update+0x90/0xa0 [i915]()
> WARNING: CPU: 0 PID: 7177 at drivers/gpu/drm/i915/i915_irq.c:423
> ibx_display_interrupt_update+0x90/0xa0 [i915]()
> WARNING: CPU: 1 PID: 7178 at drivers/gpu/drm/i915/intel_ddi.c:911
> intel_ddi_pll_enable+0x248/0x250 [i915]()
> 
We have opened bugs for the above issues, too, so it is expected that you see them.
> but I do not get the spontaneous reboots any more. Instead, after a couple
> of suspend/resume cycles (3 in my case) I get tons of these:
> 
> [  665.971162] BUG: Bad page map in process lxpanel  pte:dd000000dc0000
> pmd:35d6b067
> [  665.971163] addr:00007f7fe682e000 vm_flags:08000070 anon_vma:         
> (null) mapping:ffff880211963220 index:1de
> [  665.971164] vma->vm_ops->fault: filemap_fault+0x0/0x430
> [  665.971165] vma->vm_file->f_op->mmap: ext4_file_mmap+0x0/0x60
> [  665.971166] CPU: 3 PID: 3270 Comm: lxpanel Tainted: G    B   W  OE
> 3.16.0-rc2+ #4
> [  665.971166] Hardware name: MSI MS-7817/CSM-B85M-E45 (MS-7817), BIOS V10.5
> 05/30/2014
> [  665.971167]  ffff8800352b1000 ffff88003631bad8 ffffffff81739d4e
> 00007f7fe682e000
> [  665.971168]  ffff88003631bb20 ffffffff8118704a 00dd000000dc0000
> 00000000000001de
> [  665.971169]  ffff880035d6b170 00dd000000dc0000 00007f7fe6955000
> 00007f7fe682e000
> [  665.971171] Call Trace:
> [  665.971173]  [<ffffffff81739d4e>] dump_stack+0x45/0x56
> [  665.971174]  [<ffffffff8118704a>] print_bad_pte+0x1aa/0x250
> [  665.971175]  [<ffffffff811883de>] unmap_single_vma+0x5de/0x8d0
> [  665.971176]  [<ffffffff81189699>] unmap_vmas+0x49/0x90
> [  665.971177]  [<ffffffff811920fc>] exit_mmap+0x9c/0x170
> [  665.971179]  [<ffffffff8111b893>] ? __delayacct_add_tsk+0x153/0x170
> [  665.971180]  [<ffffffff8106992c>] mmput+0x5c/0x120
> [  665.971182]  [<ffffffff8106ecdc>] do_exit+0x26c/0xa60
> [  665.971183]  [<ffffffff8173d62e>] ? schedule_timeout_killable+0x1e/0x20
> [  665.971185]  [<ffffffff81161a8c>] ? out_of_memory+0x49c/0x4d0
> [  665.971186]  [<ffffffff8106f54f>] do_group_exit+0x3f/0xa0
> [  665.971187]  [<ffffffff8107ee40>] get_signal_to_deliver+0x1d0/0x6f0
> [  665.971189]  [<ffffffff81012548>] do_signal+0x48/0x9d0
> [  665.971190]  [<ffffffff8101c1d5>] ? native_sched_clock+0x35/0x90
> [  665.971192]  [<ffffffff8101c239>] ? sched_clock+0x9/0x10
> [  665.971194]  [<ffffffff8111ce9c>] ? acct_account_cputime+0x1c/0x20
> [  665.971195]  [<ffffffff810a335b>] ? account_user_time+0x8b/0xa0
> [  665.971197]  [<ffffffff810a3924>] ? vtime_account_user+0x54/0x60
> [  665.971198]  [<ffffffff81012f39>] do_notify_resume+0x69/0xb0
> [  665.971199]  [<ffffffff817432d8>] retint_signal+0x48/0x90
> [  665.971200] swap_free: Bad swap offset entry 37c000003780
Did the machine crash with these call traces?

Comment 19 liulei 2014-07-07 01:27:33 UTC

(In reply to comment #13)
> What hardware are you using? I have a
> 
>    MSI MS-7817/CSM-B85M-E45 (MS-7817), BIOS V10.5 05/30/2014
> 
> see also http://www.msi.com/product/mb/B85ME45.html.
> 
> Maybe this is MSI (or bios setting) specific.
> 
> Can you get access to a comparable chipset to test this?
> 
> Do the errors logged to syslog make any sense to you at all?
> 
> Thank you!
Our board is ASUSTeK Z87-EXPERT    BIOS Revision: 4.6 05/17/2013

Comment 20 Jens 2014-07-07 20:00:59 UTC

Hello,

thank you for replying!

>> [   83.583023] WARNING: CPU: 1 PID: 173 at
>> drivers/gpu/drm/i915/intel_ddi.c:911 intel_ddi_pll_enable+0x248/0x250 [i915]
> This issue we have opened a bug to track.

Which bug is it (I thought it was this one)?

btw this is the only WARNING I get with the Ubuntu 14.04 stock kernel. All the other WARNING messages appear only with newer kernels (mainline or compiled from Git).

Very roughly, how far away are you from a solution? Days, months? (I need to decide whether to exchange the hardware, I badly need the suspend functionality.)


>> [  665.971200] swap_free: Bad swap offset entry 37c000003780
> Did machine get crash with these Call Trace?

Yes, this happened after a resume. It was alive, but unusable - I could access the local console but no network and no new processes were starting. I had to reset it.

> Our board is ASUSTeK Z87-EXPERT    BIOS Revision: 4.6 05/17/2013

Can you get access to a MSI B85 or B81 board? 
If not, will it help if I donated one?

Comment 21 liulei 2014-07-08 08:30:40 UTC

> >> [   83.583023] WARNING: CPU: 1 PID: 173 at
> >> drivers/gpu/drm/i915/intel_ddi.c:911 intel_ddi_pll_enable+0x248/0x250 [i915]
> > This issue we have opened a bug to track.
> 
> Which bug is it (I thought it was this one)?
> 
Oh, my mistake. This is the bug that we opened to track the issue.
> 
> Very roughly, how far away are you from a solution? Days, months? (I need to
> decide whether to exchange the hardware, I badly need the suspend
> functionality.)
>
I can't say. I don't think it's our i915 module that causes your machine to crash. I removed the i915 module on a laptop whose board is MSI MS-16GC, BIOS version E16GCIMS.509, and that machine can't even finish S4.
> 
> Can you get access to a MSI B85 or B81 board? 
> If not, will it help if I donated one?

Comment 22 Jens 2014-07-09 19:59:13 UTC

> I don't think it's our i915 module that cause your machine crash. I remove i915 module on laptop whose board is MSI MS-16GC

Well, I don't know. When I move the i915.ko module somewhere else, call 'update-initramfs -u' (to update the ramdisk), reboot (so it doesn't get loaded) and then try 'sudo pm-suspend' or 'sudo pm-hibernate', the system freezes at once and has to be hard reset.

With the i915.ko module loaded, the system survives 'pm-suspend' and resumes perfectly, but has the problems stated above with 'sudo pm-hibernate'.

This is with 3.16.0rc2+ as of my last comment.

Comment 23 Jens 2014-07-10 19:23:03 UTC

Created attachment 102574 [details]
pm-suspend and pm-hibernate (including resume) works on 3.16.0rc2+ without i915 module loaded

I tried again, using the following procedure, to confirm i915 as the cause of the S4 resume problems:

* booted with parameter "i915.modeset=0 text" (to avoid lightdm startup)
* echo 0 > /sys/class/vtconsole/vtcon1/bind   (which froze the local console, I had to continue using SSH)
* pkill alsactl
* rmmod snd_hda_intel
* rmmod i915
* sudo pm-suspend    (worked)
* resume             (worked)
* sudo pm-hibernate  (worked)
* resume             (worked, even without using the same kernel parameters)

So it seems the i915 module is actually the cause of the resume problem. Or is it? Log (dmesg) attached.
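
The procedure above could be scripted roughly as follows. This is a sketch under assumptions: the `run` and `test_without_i915` names and the `DRY_RUN` switch are invented here, the script only prints the commands by default, and setting `DRY_RUN=0` would execute them as root and really suspend and hibernate the machine.

```shell
#!/bin/sh
# Hypothetical wrapper around the manual steps; names are illustrative.

# Run a command, or only print it when DRY_RUN=1 (the default, for safety).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

test_without_i915() {
    # Unbind the virtual console from vtcon1 (this froze the local console
    # in the report, so continue over SSH).
    run sh -c 'echo 0 > /sys/class/vtconsole/vtcon1/bind'
    run pkill alsactl
    run rmmod snd_hda_intel
    run rmmod i915
    run pm-suspend      # wake the machine manually before the next step
    run pm-hibernate
}
```

Calling `test_without_i915` with the default settings just lists the six commands, which makes it easy to review the sequence before running it for real.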

Comment 24 Jens 2014-07-23 19:10:10 UTC

Anything else I can do to fix this issue?

Comment 25 Jens 2014-08-09 18:16:57 UTC

I upgraded to 3.16.0 (drm-intel-next Git repo, as of 2014-08-08, 2c0827cffca8ac0c654b888c58a1989a5172f007) and I still get a frozen machine after a hibernate/resume process (and a KernelOops apport report after a subsequent reboot) exactly when the i915 module is loaded when suspending.

Interestingly, when suspending, my screen goes black and then a frozen image of my desktop reappears while (I suppose) the RAM image is written to the disk. Once - only once - this did not happen (the screen went black and stayed black), and this was the one single successful resume process I had so far. I don't know if this helps.

Where can I find the oops backtrace to dig out the source of the Oops?

Comment 26 Jens 2014-08-15 18:26:02 UTC

Still does not work with 3.16.0-final. How can I further help fix this issue?

Comment 27 Jens 2014-08-16 20:56:56 UTC

Booted 3.16.1, tried again:

* hibernate once (worked),
* resume (worked, with the "WARNING: SPLL already enabled" message just like before),
* second hibernate (froze before hibernation was complete).

To double check, tried without i915 loaded, worked perfectly three times in a row.

Still the same MSI-B85M (MSI-7817) chipset.

Comment 28 liulei 2014-08-18 01:29:49 UTC

(In reply to comment #27)
> Booted 3.16.1, tried again:
> 
> * hibernate once (worked),
> * resume (worked, with the "WARNING: SPLL already enabled" message just like
> before),
> * second hibernate (froze before hibernation was complete).
> 
> To double check, tried without i915 loaded, worked perfectly three times in
> a row.
> 
> Still the same MSI-B85M (MSI-7817) chipset.
I tried the latest -nightly and did S4. I didn't run into the call trace issue, but I still sporadically can't get the system back. Maybe you can try our latest -nightly kernel to see if the call trace issue still exists.

Comment 29 Jens 2014-08-18 13:32:11 UTC

I pulled the current "drm-intel-nightly" code and tried again.

Setup:
* Ubuntu 14.04 LTS, Kernel 3.16.0+ (3.17rc1 as of now)
* MSI-7817 chipset with i5-4570
* Boot Lubuntu desktop, start "make -j4" in git checkout, start Firefox with Youtube video, then hibernate and resume in a loop

Results:
* No more WARNING: messages upon resume
* Multiple resumes work fine
* About one in every fifth resume the machine grinds to a halt with dozens of OOM killer messages

So: A big improvement (I can hibernate and resume multiple times in a row, even with a loaded machine!). But we're not quite there yet - where do the OOM errors come from? When I hibernate, only ~1.5G out of 8G RAM are actually used.

Thank you!

Comment 30 liulei 2014-08-19 05:50:47 UTC

Since the latest -nightly kernel works without this call trace, I am closing this bug.

Comment 31 Jens 2014-08-19 07:35:25 UTC

Unfortunately, this isn't the end of it. I posted some hibernation resume failures (dmesg output) here: https://bugzilla.kernel.org/show_bug.cgi?id=59321#c42

In short: every 3..5 resumes the OOM killer runs amok and kills half my system. Before that, I always see log messages (sometimes hundreds) like

  Purging GPU memory, X bytes freed, Y bytes still pinned.

After that, the system is - most of the time - unusable and has to be hard reset.

What causes these messages and why is the OOM killer invoked? I have 8G of memory of which -usually- 7G are not even filled with buffer cache when I hibernate the system, i.e. completely empty.

Do you want me to open another bug report because the symptoms changed?

Comment 32 Jani Nikula 2014-08-19 07:55:55 UTC

(In reply to comment #31)
> Unfortunately, this isn't the end of it. I posted some hibernation resume
> failures (dmesg output) here:
> https://bugzilla.kernel.org/show_bug.cgi?id=59321#c42

> Do you want me to open another bug report because the symptoms changed?

Yes please. Attach the dmesgs from the above bug as plain text.

Comment 33 Jens 2014-08-20 11:50:05 UTC

Done. See #82864. Thank you!

Comment 34 Jari Tahvanainen 2016-10-19 12:41:54 UTC

Closing verified+fixed.
